Data & Knowledge Graphs

We model your domain as an ontology, unify scattered records into one graph, and turn raw source — including your own code — into a queryable structure that downstream retrieval and agents can trust.

A knowledge graph is only useful if it reflects how your business actually works. We design the ontology, resolve duplicate and conflicting records into stable entities, and build an enterprise knowledge graph that downstream retrieval, graph RAG, and agents can query with confidence. The same discipline applies to source code: we extract a code graph of pages, symbols, calls, and execution chains. Strong data architecture for AI starts here — canonical entities, explicit relationships, and provenance on every value.

An ontology that matches your domain, not a generic schema

We start by writing down the ontology — the entities, the relationships, and the rules that govern them. That means deciding what a canonical record is, which fields are authoritative, and where one system wins over another when they disagree. We model tenancy, soft-delete, and audit columns from day one so the graph stays governable as it grows. Canonical dictionaries — for technologies, use cases, and delivery patterns — give every record a stable slug instead of free text, so 'AWS Lambda' and 'aws-lambda' never fork into two nodes. The ontology is documented next to the schema and treated as the contract: if a field exists in the running system, it exists in the model, and the migration lands with the doc.
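The slug rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the production dictionary: the names `CANONICAL_TECHNOLOGIES`, `ALIASES`, `slugify`, and `canonical_slug` are hypothetical, and the entries are sample data.

```python
import re
from typing import Optional

# Hypothetical canonical dictionary: stable slug -> display name (sample data).
CANONICAL_TECHNOLOGIES = {
    "aws-lambda": "AWS Lambda",
    "postgresql": "PostgreSQL",
}

# Hypothetical alias table for spellings that do not slugify to the canonical form.
ALIASES = {
    "postgres": "postgresql",
    "lambda": "aws-lambda",
}


def slugify(text: str) -> str:
    """Lowercase the text and collapse each run of non-alphanumerics into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def canonical_slug(raw: str) -> Optional[str]:
    """Map a free-text value to its canonical slug, or None if it is not in the dictionary."""
    slug = slugify(raw)
    slug = ALIASES.get(slug, slug)
    return slug if slug in CANONICAL_TECHNOLOGIES else None
```

With this shape, `canonical_slug("AWS Lambda")` and `canonical_slug("aws-lambda")` resolve to the same slug, and an unknown value returns `None` so it can be reviewed instead of silently creating a new node.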

Entity resolution that unifies records into one graph

Real data arrives duplicated, partial, and contradictory. We build the resolution layer that collapses it — a deterministic, multi-source match that links a record by exact identifier, then by domain, then by fuzzy signals, with each match carrying a method tag so you can see why two rows became one entity. We handle the messy cases explicitly: the same company entered twice under sibling domains, a contact split across accounts, an inbound message that belongs to a deal nobody linked. The output is one enterprise knowledge graph where every node has a single identity and every edge says what evidence created it — shared origin, a direct reference, or a shared write-then-read.
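The match cascade described above could look roughly like the following sketch. It assumes a simplified record with a hypothetical `external_id`, `domain`, and `name`, and it uses the standard library's `difflib.SequenceMatcher` as a stand-in for whatever fuzzy signals a real deployment would combine. The method tags and the 0.85 threshold are illustrative choices.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass
class Record:
    name: str
    external_id: Optional[str] = None  # assumed field, e.g. a registry or CRM id
    domain: Optional[str] = None


@dataclass
class Match:
    entity_index: int  # position of the matched entity in the candidate list
    method: str        # "exact_id", "domain", or "fuzzy_name"


def resolve(record: Record, entities: List[Record],
            fuzzy_threshold: float = 0.85) -> Optional[Match]:
    """Try each matching tier in order and tag the result with the tier that fired."""
    # Tier 1: exact identifier.
    if record.external_id:
        for i, entity in enumerate(entities):
            if entity.external_id == record.external_id:
                return Match(i, "exact_id")
    # Tier 2: same domain, compared case-insensitively.
    if record.domain:
        for i, entity in enumerate(entities):
            if entity.domain and entity.domain.lower() == record.domain.lower():
                return Match(i, "domain")
    # Tier 3: fuzzy name similarity, accepted only above the threshold.
    best_index, best_score = -1, 0.0
    for i, entity in enumerate(entities):
        score = SequenceMatcher(None, record.name.lower(), entity.name.lower()).ratio()
        if score > best_score:
            best_index, best_score = i, score
    if best_score >= fuzzy_threshold:
        return Match(best_index, "fuzzy_name")
    return None  # no tier matched: the record becomes a new entity
```

Because the tiers run in a fixed order and the tag is stored with the merge, the same inputs always produce the same match and the reason for it stays visible.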

Code graph — turning a repository into a queryable structure

We extract a code graph by statically analysing source rather than guessing from documentation. Language-specific analysers walk each repository and populate a graph of pages, symbols with their bodies, calls with their arguments at every site, route registrations, and terminal effects such as SQL queries, queue publishes, and file writes. From that we compute root-to-terminal execution chains, the functional DNA of the system, handling cycles explicitly instead of imposing a shallow depth cap. Chains are grouped into atomic feature clusters by service, route prefix, and tables touched, then merged where they overlap. The result is a graph that can answer 'what does this feature actually do, end to end' with a grounded response.
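As a rough illustration of chain assembly with cycle handling, the sketch below enumerates every root-to-terminal path over a call graph and skips any call that would revisit a node already on the current path. The function name `execution_chains`, the adjacency-dict input, and the sample node labels are assumptions made for the example, not the extractor's actual data model.

```python
from typing import Dict, List, Set


def execution_chains(calls: Dict[str, List[str]], root: str,
                     terminals: Set[str]) -> List[List[str]]:
    """Enumerate root-to-terminal paths, breaking cycles instead of capping depth."""
    chains: List[List[str]] = []
    stack: List[List[str]] = [[root]]  # explicit stack avoids Python's recursion limit
    while stack:
        path = stack.pop()
        node = path[-1]
        if node in terminals:
            chains.append(path)  # reached a terminal effect: the chain is complete
            continue
        # Push callees in reverse so they are explored in call order.
        for callee in reversed(calls.get(node, [])):
            if callee not in path:  # a node already on the path would close a cycle
                stack.append(path + [callee])
    return chains


# Sample call graph with a cycle between list_orders and load_user.
calls = {
    "GET /orders": ["list_orders"],
    "list_orders": ["load_user", "sql:SELECT orders"],
    "load_user": ["list_orders", "sql:SELECT users"],
}
terminals = {"sql:SELECT orders", "sql:SELECT users"}
chains = execution_chains(calls, "GET /orders", terminals)
```

Here `chains` holds two paths, one ending at each SQL terminal, and the back-edge from `load_user` to `list_orders` is dropped rather than followed. Path enumeration can grow quickly on large graphs, so a real pipeline would likely add memoisation or pruning.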

Provenance, versioning, and a graph you can audit

A graph that feeds AI needs to be auditable, so we attach provenance to every derived value. Any field a model or pipeline wrote carries its source and the run that produced it, so you can trace a score, a summary, or a label back to the exact execution. We historicise rather than overwrite: recomputes create new versioned rows, and a stable logical identity links every snapshot across re-extractions so re-running on a new commit attaches to the existing entity instead of forking a duplicate. An append-only, hash-chained audit log records every state change. The graph stops being a black box and becomes something a procurement or security review can actually inspect.
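One common way to build an append-only, hash-chained log is to have each entry store the hash of its predecessor, so editing any earlier entry invalidates everything after it. The sketch below shows that idea with SHA-256 from the standard library. The entry layout, the `GENESIS` value, and the payload fields (`entity`, `field`, `value`, `source`, `run_id`) are assumptions chosen for illustration.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def _digest(prev_hash: str, payload: Dict[str, Any]) -> str:
    """Hash the previous entry's hash together with a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: List[Dict[str, Any]], payload: Dict[str, Any]) -> Dict[str, Any]:
    """Append a state change, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"payload": payload, "prev_hash": prev_hash,
             "hash": _digest(prev_hash, payload)}
    log.append(entry)
    return entry


def verify(log: List[Dict[str, Any]]) -> bool:
    """Recompute every hash in order; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


log: List[Dict[str, Any]] = []
append_entry(log, {"entity": "company:acme", "field": "score", "value": 0.8,
                   "source": "model", "run_id": "run-1"})
append_entry(log, {"entity": "company:acme", "field": "score", "value": 0.9,
                   "source": "model", "run_id": "run-2"})
```

The second entry is a new versioned value rather than an overwrite, and both carry the source and run that produced them. In a database the same chain would typically live in an insert-only table.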

Built so graph RAG and agents can read it

The graph only earns its keep when retrieval and agents can traverse it. We expose the structure through stable query surfaces (list the routes, fetch a chain, read a symbol, get a table schema or the foreign-key graph) so an agent can explore relationships hop by hop instead of being handed a flat blob of text. This is what makes graph RAG work: answers are grounded in real nodes and edges, with the path back to source attached, which leaves the model far less room to hallucinate. The same query layer powers analytics, internal tools, and a second verification agent that re-reads the graph to check the first agent's claims. Data architecture for AI, end to end.
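To make hop-by-hop traversal concrete, here is a small in-memory sketch of two query tools: one returns the typed edges leaving a node, the other returns the edge path between two nodes so an answer can cite how it got there. The class name `GraphTools`, the edge tuple layout, and the node labels are hypothetical and stand in for whatever query surface a real deployment exposes.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, str]  # (source node, relation type, target node)


class GraphTools:
    """Hypothetical read-only query surface over a set of typed edges."""

    def __init__(self, edges: List[Edge]) -> None:
        self._out: Dict[str, List[Edge]] = {}
        for edge in edges:
            self._out.setdefault(edge[0], []).append(edge)

    def neighbors(self, node: str) -> List[Edge]:
        """One hop: every typed edge leaving the node."""
        return list(self._out.get(node, []))

    def path(self, start: str, goal: str) -> Optional[List[Edge]]:
        """Breadth-first search returning the shortest edge path, or None if unreachable."""
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            node, trail = queue.popleft()
            if node == goal:
                return trail
            for edge in self.neighbors(node):
                if edge[2] not in seen:
                    seen.add(edge[2])
                    queue.append((edge[2], trail + [edge]))
        return None


edges = [
    ("route:GET /orders", "handled_by", "symbol:list_orders"),
    ("symbol:list_orders", "reads", "table:orders"),
    ("table:orders", "references", "table:customers"),
]
tools = GraphTools(edges)
citation = tools.path("route:GET /orders", "table:customers")
```

In this example `citation` is the list of three edges from the route to the `customers` table, which is the kind of path an agent can attach to its answer as evidence.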

What this includes
  • Ontology design — canonical entities, relationship rules, authoritative-source and conflict policy
  • Entity resolution — multi-signal record matching with a method tag on every merge
  • Canonical dictionaries — stable slugs for technologies, use cases, and patterns instead of free text
  • Code graph extraction — pages, symbols, calls with arguments, routes, and terminal effects
  • Execution-chain assembly and clustering — feature-level grouping with typed relationship evidence
  • Provenance, versioning, and hash-chained audit on every derived value and state change

What you get
  • One enterprise knowledge graph where every node has a single identity and every edge has evidence
  • A queryable code graph that answers what a feature does, end to end, grounded in real source
  • Retrieval and agents that traverse explicit relationships: graph RAG with the path back to source attached

Where it fits

Use cases

Unifying duplicated customer records

Records arrive across CRM, mailboxes, and internal systems, duplicated under sibling domains and split across accounts. We resolve them into single canonical entities with a match-method tag on every merge, so reporting and AI both read one consistent graph.

A graph of what your codebase does

We statically extract pages, symbols, calls, and execution chains from your repositories, then cluster them into features. Teams query the code graph to understand impact, onboard faster, and ground automated reasoning in what the code actually executes.

A trustworthy substrate for retrieval and agents

We turn scattered documents and records into an ontology-backed graph with provenance on every value, then expose stable query tools. Graph RAG and agents traverse real relationships and cite their path back to source instead of guessing.

FAQ

Common questions

A knowledge graph is the structure — canonical entities and the typed relationships between them. Graph RAG is how you retrieve over it: instead of pulling flat text chunks, an agent traverses the graph hop by hop and grounds its answer in real nodes and edges, with the path back to source attached. The graph makes the retrieval trustworthy; the retrieval makes the graph useful.

You usually do not need a dedicated graph database. Most of our enterprise knowledge graph work runs on Postgres, with junction tables for relationships, JSONB for flexible attributes, and views for derived patterns, which keeps your data architecture for AI in one well-understood system. We reach for a dedicated graph engine only when traversal depth or query shape genuinely demands it, not by default.

A code graph is your repository represented as queryable structure — pages, symbols, calls with their arguments, routes, and the execution chains that connect a request to its database writes. It lets you ask what a feature actually does end to end, assess change impact, and ground agents in real behaviour rather than documentation that drifts. We build it by static analysis, not by recording sessions.

The graph stays auditable through provenance and versioning. Every derived value carries its source and the run that produced it, recomputes create new versioned rows instead of overwriting, and a stable logical identity links snapshots across re-extractions so re-running never forks duplicates. An append-only, hash-chained audit log records every state change, so the graph stays inspectable for a security or procurement review.

Building something that needs this?

Tell us what you're working on. The first call is always free.
