
Agent Runtimes & Orchestration

We build agent runtimes that run real work to completion — bounded tool loops, a durable job queue, and a receipt on every action — so autonomy stays accountable.

Diagram: Research → Draft → Act → Classify → Learn (human approves)

Most demos of AI agents collapse the moment they hit production: unbounded loops, hallucinated tool calls, and no way to explain what happened. We build agent runtimes that hold up — a bounded reason-act loop, read-only and write tools with explicit permissions, a durable job queue that survives restarts, and a recorded receipt for every step. This is the orchestration layer beneath agentic AI: the part that turns a clever prompt into a system that runs real work to completion, safely, and tells you exactly what it did.

The agent runtime is a loop with a budget

An agent runtime is a controlled loop: the model reasons, picks a tool, the runtime executes it, feeds the result back, and repeats until the agent declares done or hits a limit. The limits are the point. We cap the number of tool hops per run so a confused agent can't spin forever, set per-provider rate limits and timeouts so one task can't exhaust a shared budget, and enforce structured output so the final answer parses cleanly instead of arriving as prose we have to scrape. Every run carries a model, a temperature, and a token budget chosen for the job — a fast cheap model for high-volume extraction, a stronger one for synthesis. The loop is boring on purpose; boring loops are the ones that run unattended.
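The loop above can be sketched in a few lines of Java. This is a minimal sketch that assumes a hypothetical `Model` interface returning either a `ToolCall` or a `Done` step. The names, the plain-string tool arguments, and the single wall-clock budget are illustrative simplifications, not any particular SDK's API. Token budgets and per-provider rate limits would wrap the `model.next` and `tool.apply` calls.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// One model decision: call a named tool, or declare the run done.
sealed interface Step permits ToolCall, Done {}
record ToolCall(String tool, String argument) implements Step {}
record Done(String answer) implements Step {}

// Hypothetical model interface: reads the transcript so far, returns the next step.
interface Model {
    Step next(List<String> transcript);
}

enum Verdict { OK, FAILED }

record RunResult(Verdict verdict, String answer, int hops) {}

final class BoundedLoop {
    // Reason -> act -> observe, until the model says Done or a hard limit trips.
    static RunResult run(Model model, Map<String, Function<String, String>> tools,
                         String task, int maxHops, Duration wallClock) {
        Instant deadline = Instant.now().plus(wallClock);
        List<String> transcript = new ArrayList<>(List.of("task: " + task));
        for (int hop = 0; hop < maxHops; hop++) {
            if (Instant.now().isAfter(deadline)) {
                return new RunResult(Verdict.FAILED, "wall-clock budget exceeded", hop);
            }
            Step step = model.next(transcript);
            if (step instanceof Done done) {
                return new RunResult(Verdict.OK, done.answer(), hop);
            }
            ToolCall call = (ToolCall) step;   // Step is sealed: not Done means ToolCall
            Function<String, String> tool = tools.get(call.tool());
            // An unknown tool is reported back to the model rather than guessed at.
            String observation = (tool == null)
                    ? "error: unknown tool " + call.tool()
                    : tool.apply(call.argument());
            transcript.add(call.tool() + "(" + call.argument() + ") -> " + observation);
        }
        // Hop cap reached without a declared answer: fail cheaply and visibly.
        return new RunResult(Verdict.FAILED, "hop cap reached", maxHops);
    }
}
```

The loop has two exits. `Done` means the agent declared completion. The hop cap is the hard limit that stops a confused agent from spinning.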

Tools, permissions, and the Model Context Protocol

Agents are only as safe as the tools you hand them. We design typed tool catalogs in which each tool has a described schema the model can reason about, and we split them sharply: read-only tools that pull records, activity counts, transcripts, and external research, versus write tools gated behind explicit permission and approval. Default tools (say, reading a call transcript) can be shared across every agent, while sensitive ones stay scoped per agent. We expose these through a clean boundary, increasingly the Model Context Protocol (MCP), so the same tool surface serves multiple runtimes and models without rewiring. The system prompt instructs the agent to ground every claim in a tool call and to treat an empty result as a real signal, not an invitation to invent.
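Here is a minimal sketch of such a catalog. It assumes tools take and return plain strings and that approval arrives as a boolean from an upstream gate. `Tool`, `Access`, and `ToolCatalog` are hypothetical names, and the schema here is only a descriptive string. Exposing the same catalog over MCP would sit on top of this layer and is not shown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

enum Access { READ_ONLY, WRITE }

// A typed tool: name, a schema the model can read, its access level, and the implementation.
record Tool(String name, String inputSchema, Access access, Function<String, String> impl) {}

final class ToolCatalog {
    private final Map<String, Tool> shared = new HashMap<>();                // defaults every agent sees
    private final Map<String, Map<String, Tool>> scoped = new HashMap<>();   // agent -> its sensitive tools

    void registerShared(Tool tool) { shared.put(tool.name(), tool); }

    void registerFor(String agent, Tool tool) {
        scoped.computeIfAbsent(agent, a -> new HashMap<>()).put(tool.name(), tool);
    }

    // The tool surface one agent is shown: shared defaults plus its own scoped tools.
    Set<String> visibleTo(String agent) {
        Set<String> names = new TreeSet<>(shared.keySet());
        names.addAll(scoped.getOrDefault(agent, Map.of()).keySet());
        return names;
    }

    // Read-only tools run directly; write tools refuse to run without approval.
    String invoke(String agent, String toolName, String argument, boolean approved) {
        Tool tool = scoped.getOrDefault(agent, Map.of()).get(toolName);
        if (tool == null) tool = shared.get(toolName);
        if (tool == null) {
            throw new IllegalArgumentException(toolName + " is not available to " + agent);
        }
        if (tool.access() == Access.WRITE && !approved) {
            throw new SecurityException(toolName + " is a write tool and needs approval");
        }
        return tool.impl().apply(argument);
    }
}
```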

Orchestrating multi-agent systems as a crew

Real work rarely fits one agent. We build multi-agent systems as a named crew of specialists: one sources and scores, one researches an account end to end, one extracts commitments, one drafts, one classifies replies, and one composes a deliverable, each with its own prompt, model, and tool set. AI agent orchestration is the wiring between them: which agent runs when, what it hands the next, and where a human gate sits in the chain. A second, independent agent can re-read the source and verify the first agent's output, because an independent check catches errors that a single pass misses. Each agent also self-critiques before handing work back, so quality is enforced inside the loop rather than discovered downstream.
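This sketch shows the hand-off chain under simplifying assumptions. Each specialist is a hypothetical `UnaryOperator<String>`. The independent verifier is a predicate over (input, output) pairs. A human gate stops the chain with a `pending-approval` status so the queue can resume it later. In a real runtime each specialist would be a run of the bounded loop, not a string function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.UnaryOperator;

// One specialist: its own work step, and whether a human must approve its output.
record Specialist(String name, UnaryOperator<String> work, boolean humanGate) {}

record CrewResult(String status, String output, List<String> log) {}

final class Crew {
    // Runs specialists in order, handing each one the previous specialist's output.
    // `verifier` is a separate, independent check of what each specialist produced.
    static CrewResult run(String source, List<Specialist> crew,
                          BiPredicate<String, String> verifier) {
        List<String> log = new ArrayList<>();
        String current = source;
        for (Specialist s : crew) {
            String output = s.work().apply(current);
            if (!verifier.test(current, output)) {
                log.add(s.name() + ": rejected by verifier");
                return new CrewResult("needs-review", output, log);
            }
            log.add(s.name() + ": ok");
            if (s.humanGate()) {
                // Park here; a person approves `output` before the chain continues.
                return new CrewResult("pending-approval", output, log);
            }
            current = output;
        }
        return new CrewResult("ok", current, log);
    }
}
```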

Durable execution on a Postgres-backed job queue

Autonomy needs durable execution, not fire-and-forget calls. We run agents on a database-backed job queue using Postgres FOR UPDATE SKIP LOCKED, so worker threads claim work without stepping on each other and no Redis or extra broker is required; the database is the only stateful dependency. Each step is an enqueued job with a status, retries, and an idempotency key derived from the source record's natural id, which makes a restart a non-event: in-flight work is reclaimed and replayed, not lost. The same queue is the natural place to pause: a send job can park in a pending-approval state until a human acts. On Java 21 we use virtual threads for concurrent I/O to models and external providers, keeping the topology to a single node that scales out later through configuration rather than a rewrite.
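Here is a sketch of the queue's SQL and worker loop in plain JDBC. It assumes a hypothetical `jobs` table (columns listed in the comment) and a Postgres `DataSource`. The statement names, the idle backoff, and the five-minute lease are illustrative. Each claim moves a row to `running` in one autocommitted statement, so the row lock lasts only as long as the claim. `RECLAIM`, run periodically, requeues rows whose worker died mid-job.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Optional;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import javax.sql.DataSource;

// Hypothetical table:
//   jobs(id bigserial primary key, kind text, payload text, idempotency_key text unique,
//        status text, attempts int not null default 0, claimed_at timestamptz)
record Job(long id, String kind, String payload) {}

final class JobQueue {
    // Keyed on the source's natural id, so re-enqueueing the same work is a no-op.
    // Status is 'queued', or 'pending_approval' for steps that wait on a human.
    static final String ENQUEUE = """
        INSERT INTO jobs (kind, payload, idempotency_key, status)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (idempotency_key) DO NOTHING
        """;

    // SKIP LOCKED lets concurrent workers pass over a row another worker is claiming.
    static final String CLAIM = """
        UPDATE jobs SET status = 'running', attempts = attempts + 1, claimed_at = now()
        WHERE id = (
            SELECT id FROM jobs
            WHERE status = 'queued'
            ORDER BY id
            LIMIT 1
            FOR UPDATE SKIP LOCKED
        )
        RETURNING id, kind, payload
        """;

    static final String COMPLETE = "UPDATE jobs SET status = 'done' WHERE id = ?";

    // Run periodically: requeue rows whose worker died mid-job.
    static final String RECLAIM = """
        UPDATE jobs SET status = 'queued'
        WHERE status = 'running' AND claimed_at < now() - interval '5 minutes'
        """;

    static Optional<Job> claim(Connection conn) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(CLAIM);
             ResultSet rs = ps.executeQuery()) {
            return rs.next()
                    ? Optional.of(new Job(rs.getLong("id"), rs.getString("kind"), rs.getString("payload")))
                    : Optional.empty();
        }
    }

    static void complete(Connection conn, long id) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(COMPLETE)) {
            ps.setLong(1, id);
            ps.executeUpdate();
        }
    }

    // One virtual thread per worker; each polls, claims, runs, and marks its job done.
    static void runWorkers(DataSource ds, int workers, Consumer<Job> handler) {
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < workers; i++) {
                pool.submit(() -> {
                    while (!Thread.currentThread().isInterrupted()) {
                        try (Connection conn = ds.getConnection()) {
                            Optional<Job> job = claim(conn);
                            if (job.isEmpty()) {
                                Thread.sleep(500);   // idle backoff
                                continue;
                            }
                            handler.accept(job.get());
                            complete(conn, job.get().id());
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                            return;
                        } catch (SQLException | RuntimeException e) {
                            // Leave the row 'running'; RECLAIM puts it back for a retry.
                        }
                    }
                });
            }
        } // close() waits for every worker to exit
    }
}
```

Parking a send amounts to enqueueing it with status `pending_approval`. Because `CLAIM` only selects `queued` rows, no worker picks the job up until an approval flips its status.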

Every agent action ships a receipt

No black boxes. Every agent run writes a summary row: model used, tokens in and out, cost, turn count, duration, a verdict (ok, needs-review, failed, skipped), the trigger that started it, and the prose result. One trace id threads the whole causal chain, from the originating request through the job, the agent run, each model call, and each external provider call, so any action can be traced end to end across the agent runtime. That telemetry surfaces in the product as an observability view and as footnoted receipts on each action, and it backs the cost rollups and audit trail. When an agent does something surprising, you don't guess; you read the run.
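Here is a minimal sketch of the summary row and the trace id. It assumes per-model prices in USD per million tokens. `RunReceipt`, `Pricing`, and the footnote format are hypothetical names, and persistence is left out. The trace id is minted once at the originating request and then passed unchanged to the job, the run, and each model or provider call.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.time.Duration;
import java.util.UUID;

enum RunVerdict { OK, NEEDS_REVIEW, FAILED, SKIPPED }

// Hypothetical price card for one model, in USD per million tokens.
record Pricing(BigDecimal inPerMillion, BigDecimal outPerMillion) {
    BigDecimal cost(long tokensIn, long tokensOut) {
        return inPerMillion.multiply(BigDecimal.valueOf(tokensIn))
                .add(outPerMillion.multiply(BigDecimal.valueOf(tokensOut)))
                .divide(BigDecimal.valueOf(1_000_000), 6, RoundingMode.HALF_UP);
    }
}

// One summary row per agent run; every row carries the trace id of the request that caused it.
record RunReceipt(String traceId, String model, long tokensIn, long tokensOut, BigDecimal costUsd,
                  int turns, Duration duration, RunVerdict verdict, String trigger, String result) {

    // Minted once at the originating request, then passed down unchanged.
    static String newTraceId() { return UUID.randomUUID().toString(); }

    // The short receipt shown as a footnote next to the action in the product.
    String footnote() {
        return "%s | %d in / %d out tokens | $%s | %d turns | %s".formatted(
                model, tokensIn, tokensOut, costUsd.toPlainString(), turns, verdict);
    }
}
```

With every row keyed by trace id, a cost rollup could be a plain `SUM` over these rows grouped by trace, agent, or day.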

What this includes
  • Bounded reason-act loop with a per-run hop cap, timeouts, and per-provider rate limits
  • Typed tool catalogs split into read-only and permissioned write tools, exposed over a clean boundary such as the Model Context Protocol (MCP)
  • Durable, retryable execution on a Postgres-backed job queue (FOR UPDATE SKIP LOCKED), no extra broker
  • Multi-agent orchestration: a named crew of specialists with hand-offs and a second verifying agent
  • Structured, schema-validated output so agent results parse cleanly into your data model
  • Per-run telemetry — model, tokens, cost, verdict, trigger — under one end-to-end trace id
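Structured output can be enforced with a small check before results touch your data model. This sketch assumes the model's JSON has already been parsed into a `Map<String, Object>` by whatever JSON library you use. `FieldSpec` and `OutputValidator` are hypothetical names, and a production system would more likely validate against a full JSON Schema.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// One field of a hypothetical output schema: name, expected Java type after parsing, required or not.
record FieldSpec(String name, Class<?> type, boolean required) {}

final class OutputValidator {
    // Returns every problem found; an empty list means the output fits the schema.
    static List<String> validate(Map<String, Object> output, List<FieldSpec> schema) {
        List<String> problems = new ArrayList<>();
        for (FieldSpec field : schema) {
            Object value = output.get(field.name());
            if (value == null) {
                if (field.required()) problems.add("missing: " + field.name());
            } else if (!field.type().isInstance(value)) {
                problems.add("wrong type: " + field.name() + " is " + value.getClass().getSimpleName());
            }
        }
        // Reject keys the schema does not declare, so stray prose cannot ride along.
        for (String key : output.keySet()) {
            if (schema.stream().noneMatch(f -> f.name().equals(key))) {
                problems.add("unexpected: " + key);
            }
        }
        return problems;
    }
}
```

If the list comes back empty, the result can be written. If not, the runtime can re-prompt or record a needs-review verdict instead of silently writing bad data.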
What you get
  • Agents that run real multi-step work to completion unattended, instead of demos that stall
  • A full receipt and trace for every action, so autonomy stays auditable and debuggable
  • A runtime that survives restarts, paces provider load, and pauses cleanly for human approval
Where it fits

Use cases

Account research agent

An agent pulls internal records, activity history, and recorded-call transcripts, researches the company externally, and returns a structured brief — every claim traceable to a tool call, gaps surfaced rather than invented.
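Here is one way to enforce the grounding rule, sketched under two assumptions: every proposed claim carries the id of the tool call it rests on, and the runtime keeps the rows each call returned. The record names are hypothetical. A claim citing a missing or empty call is dropped and reported as a gap, and each empty call is surfaced as a gap in its own right.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A claim in the brief, citing the tool call that supports it.
record Claim(String text, String toolCallId) {}

// What one tool call returned, as recorded by the runtime.
record ToolResult(String callId, String tool, List<String> rows) {}

record Brief(List<Claim> claims, List<String> gaps) {}

final class BriefBuilder {
    static Brief build(List<Claim> proposed, List<ToolResult> results) {
        Map<String, ToolResult> byCallId = new HashMap<>();
        List<String> gaps = new ArrayList<>();
        for (ToolResult r : results) {
            byCallId.put(r.callId(), r);
            // An empty result is a real signal: report it instead of filling it in.
            if (r.rows().isEmpty()) gaps.add("no data from " + r.tool());
        }
        List<Claim> grounded = new ArrayList<>();
        for (Claim c : proposed) {
            ToolResult source = byCallId.get(c.toolCallId());
            if (source != null && !source.rows().isEmpty()) {
                grounded.add(c);
            } else {
                gaps.add("unsupported claim dropped: " + c.text());
            }
        }
        return new Brief(grounded, gaps);
    }
}
```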

Extraction with precision guarantees

A focused agent reads one document and extracts only high-confidence facts, tuned for precision over recall so it returns nothing rather than a false positive — then watermarks the source so it is never reprocessed.
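Here is a sketch of the precision-first filter. It assumes the model returns candidate facts, each with a confidence score. The threshold is illustrative, and the in-memory set stands in for a watermark persisted alongside the source (for example, a processed-at column). `Candidate` and `PrecisionExtractor` are hypothetical names.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// A fact the model proposes, with its own confidence score.
record Candidate(String fact, double confidence) {}

final class PrecisionExtractor {
    private final double threshold;                          // high bar: precision over recall
    private final Set<String> watermarked = new HashSet<>(); // stand-in for a persisted marker

    PrecisionExtractor(double threshold) { this.threshold = threshold; }

    // Returns only facts at or above the threshold (possibly none) and marks the
    // source so a second call for the same id does no work.
    List<String> extract(String sourceId, List<Candidate> candidates) {
        if (!watermarked.add(sourceId)) {
            return List.of();   // already processed
        }
        return candidates.stream()
                .filter(c -> c.confidence() >= threshold)
                .map(Candidate::fact)
                .toList();
    }
}
```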

Approval-gated outbound loop

A crew sources, scores, drafts, and classifies across a multi-step sequence, parking each send as a job in a pending-approval state until a human approves. Autopilot stays opt-in; review-before-send is the default.

FAQ

Common questions

How do you stop an agent from looping forever or running up costs?

Every run is bounded. We cap tool hops per run, set timeouts and per-provider rate limits, and give each agent a token budget and a model chosen for its job. The loop ends on a declared completion or a hard limit, and the run records its cost, so a confused agent fails cheaply and visibly instead of spinning unattended.
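Per-provider rate limits can be as simple as one token bucket per provider. This is a minimal sketch that assumes an injectable nanosecond clock so it can be tested. The class name, provider names, and limits are illustrative, and timeouts are a separate concern not shown here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// One token bucket per provider: a burst allowance refilled at a steady rate.
final class ProviderRateLimiter {
    private static final class Bucket {
        final double capacity;
        final double perSecond;
        double tokens;
        long lastNanos;

        Bucket(double capacity, double perSecond, long now) {
            this.capacity = capacity;
            this.perSecond = perSecond;
            this.tokens = capacity;   // start full
            this.lastNanos = now;
        }
    }

    private final Map<String, Bucket> buckets = new HashMap<>();
    private final LongSupplier clock;   // nanoseconds; System::nanoTime in production

    ProviderRateLimiter(LongSupplier clock) { this.clock = clock; }

    synchronized void configure(String provider, double burst, double perSecond) {
        buckets.put(provider, new Bucket(burst, perSecond, clock.getAsLong()));
    }

    // Non-blocking: true if a call to `provider` may go now, false if the caller should back off.
    synchronized boolean tryAcquire(String provider) {
        Bucket b = buckets.get(provider);
        if (b == null) throw new IllegalArgumentException("no limit configured for " + provider);
        long now = clock.getAsLong();
        b.tokens = Math.min(b.capacity, b.tokens + (now - b.lastNanos) / 1e9 * b.perSecond);
        b.lastNanos = now;
        if (b.tokens >= 1) {
            b.tokens -= 1;
            return true;
        }
        return false;
    }
}
```

Each provider gets its own bucket, so a burst against one enrichment API cannot starve calls to the model provider.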

What is the Model Context Protocol, and is it required?

The Model Context Protocol (MCP) is a standard way to expose tools and data to models over a clean boundary, so the same tool surface serves multiple agents and model providers without bespoke wiring. It is not mandatory (we design typed tool catalogs regardless), but for multi-agent systems and swappable models it keeps the runtime portable and the tool layer reusable.

How can we see what an agent actually did?

Every agent action ships a receipt: model, tokens, cost, turn count, verdict, and trigger, all under one trace id that threads the request, job, agent run, model calls, and provider calls. You can follow any action end to end and read exactly what the agent did. Observability and audit are built into the runtime, not bolted on after.

Can a human approve actions before they happen?

Yes. Human-in-the-loop is the default. Because execution runs on a durable job queue, any sensitive step (a send, a write, a commitment) can park in a pending-approval state until a person acts. Approval modes range from fully manual to review-before-send to autopilot, set per workflow, so you grant autonomy as trust is earned.
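Here is a sketch of how an approval mode might set a job's starting status on the queue. The enum names and the exact meaning of each mode are assumptions. In this sketch `MANUAL` parks every step, `REVIEW_BEFORE_SEND` parks only sensitive steps, and `AUTOPILOT` parks none. Any workflow without an explicit setting falls back to review-before-send.

```java
import java.util.Map;

enum ApprovalMode { MANUAL, REVIEW_BEFORE_SEND, AUTOPILOT }

enum JobStatus { QUEUED, PENDING_APPROVAL }

final class ApprovalPolicy {
    // Workflows without an explicit setting fall back to review-before-send.
    static ApprovalMode modeFor(Map<String, ApprovalMode> settings, String workflow) {
        return settings.getOrDefault(workflow, ApprovalMode.REVIEW_BEFORE_SEND);
    }

    // Sensitive steps (sends, writes, commitments) park unless the workflow is on autopilot.
    static JobStatus initialStatus(ApprovalMode mode, boolean sensitive) {
        return switch (mode) {
            case MANUAL -> JobStatus.PENDING_APPROVAL;
            case REVIEW_BEFORE_SEND -> sensitive ? JobStatus.PENDING_APPROVAL : JobStatus.QUEUED;
            case AUTOPILOT -> JobStatus.QUEUED;
        };
    }
}
```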

Building something that needs this?

Tell us what you're working on. The first call is always free.
