Databases for Production AI Agents
a2a cloud is making Postgres a first-class agent resource: declared by agents, governed by the platform, durable at rest, and able to suspend compute when idle.
AI agents are becoming production software. That means they need more than a prompt, a model key, and a webhook.
They need identity. They need scoped authority. They need a runtime. They need receipts. They need a way to expose A2A and MCP surfaces without every team rebuilding the same infrastructure.
And a lot of them need a database.
Until now, that database usually lived outside the agent contract. A builder would write setup notes, paste a connection string into an environment variable, and hope the deployment surface, runtime secrets, and billing model stayed aligned.
That is not good enough for agent infrastructure.
In a2a cloud, the direction is simple: an agent should be able to declare the resources it needs, and the platform should provision, govern, bind, suspend, resume, and audit those resources like part of the agent itself.
The Product Shape
The a2a cloud landing page states the product plainly: deploy AI agents like production microservices.
That promise already includes frontend, API, MCP, auth, OpenAPI, signed receipts, and managed runtime. Databases belong in the same product surface.
For a builder, the workflow should feel like this:
resources:
  databases:
    - name: app
      scope: org
      branch: main
      env:
        url: DATABASE_URL
The agent declares the need. The platform owns the boring parts.
That gives users and organizations a clean control plane:
- create databases for agents and workflows;
- bind connection material into runtime secrets;
- keep resources scoped to a user or organization;
- enforce quotas, policy, and lifecycle rules;
- expose the state in the same place as agent deploys, receipts, and grants.
The goal is not a separate database product bolted onto an agent platform. The goal is a database primitive inside the agent platform.
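To make the declare-then-bind idea concrete, here is a minimal sketch of how a declaration like the one above might be validated and turned into runtime environment variables. The class and function names, the "user" scope value, the "main" default branch, and the placeholder connection string are all assumptions for illustration, not the platform's actual API.

```python
from dataclasses import dataclass, field

# Assumption: "org" appears in the declaration above; "user" is inferred from
# the text's "scoped to a user or organization" and may be spelled differently.
ALLOWED_SCOPES = {"user", "org"}


@dataclass
class DatabaseDeclaration:
    name: str
    scope: str
    branch: str
    env: dict = field(default_factory=dict)  # e.g. {"url": "DATABASE_URL"}


def parse_declarations(manifest: dict) -> list:
    """Validate the resources.databases section of a (hypothetical) agent manifest."""
    declarations = []
    for raw in manifest.get("resources", {}).get("databases", []):
        if raw.get("scope") not in ALLOWED_SCOPES:
            raise ValueError(
                f"database {raw.get('name')!r}: scope must be one of {sorted(ALLOWED_SCOPES)}"
            )
        declarations.append(
            DatabaseDeclaration(
                name=raw["name"],
                scope=raw["scope"],
                branch=raw.get("branch", "main"),  # assumed default
                env=dict(raw.get("env", {})),
            )
        )
    return declarations


def bind_runtime_env(decl: DatabaseDeclaration, connection_material: dict) -> dict:
    """Map provisioned connection material onto the env var names the agent asked for."""
    missing = [key for key in decl.env if key not in connection_material]
    if missing:
        raise KeyError(f"database {decl.name!r}: no connection material for {missing}")
    return {env_name: connection_material[key] for key, env_name in decl.env.items()}


manifest = {
    "resources": {
        "databases": [
            {"name": "app", "scope": "org", "branch": "main", "env": {"url": "DATABASE_URL"}}
        ]
    }
}
decl = parse_declarations(manifest)[0]
runtime_env = bind_runtime_env(decl, {"url": "postgres://placeholder"})
```

In this sketch the agent only names the environment variable it wants. The connection material itself comes from the platform side, which is the separation the declaration is meant to enforce.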
What Scale-to-Zero Means
For agent workloads, scale-to-zero matters because many agents are bursty. They wake up for a task, call tools, write state, produce receipts, and then sit idle.
Keeping full database compute hot for every idle agent is wasteful.
Scale-to-zero Postgres does not mean deleting data. It means separating durable storage from query compute. Storage and metadata remain durable. Compute can suspend when there are no active queries, then wake again when a new connection or operation arrives.
That is the useful architectural lesson from Neon-style Postgres: storage stays real, compute becomes elastic.
For a2a cloud, this maps directly onto the agent model. Agents can be dormant. Their databases can be dormant too. The data stays; the compute bill should not.
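The split between durable storage and elastic compute can be illustrated with a toy state machine. This is a sketch under stated assumptions, not how a2a cloud or Neon implements it: the class name, the in-memory dict standing in for durable storage, and the 300-second default timeout are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class ComputeState(Enum):
    ACTIVE = "active"
    SUSPENDED = "suspended"


@dataclass
class ScaleToZeroDatabase:
    idle_timeout_s: float = 300.0  # assumed default, not a platform value
    storage: dict = field(default_factory=dict)  # stands in for durable storage
    state: ComputeState = ComputeState.SUSPENDED
    last_activity: float = 0.0
    wake_count: int = 0

    def _wake(self, now: float) -> None:
        # Any operation wakes compute if it was suspended.
        if self.state is ComputeState.SUSPENDED:
            self.state = ComputeState.ACTIVE
            self.wake_count += 1
        self.last_activity = now

    def write(self, key: str, value: str, now: float) -> None:
        self._wake(now)
        self.storage[key] = value

    def read(self, key: str, now: float):
        self._wake(now)
        return self.storage.get(key)

    def tick(self, now: float) -> None:
        """Suspend compute once idle past the timeout; storage is never touched."""
        idle_for = now - self.last_activity
        if self.state is ComputeState.ACTIVE and idle_for >= self.idle_timeout_s:
            self.state = ComputeState.SUSPENDED
```

Only `state` changes when the database goes idle. The `storage` field is untouched by `tick`, which is the whole point: suspending compute is a billing and scheduling event, not a data event.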
What We Built
We added the platform foundation for first-class agent databases.
The control plane now understands database resources declared by an agent package. It records projects, branches, roles, bindings, provision events, idle suspend policy, and runtime environment mapping.
The production infrastructure now has a dedicated database substrate. Database namespace ownership, storage class, quotas, limits, and RBAC live in the infra layer, not inside the control-plane app deployment. That keeps Helm and Argo ownership clean and makes the database substrate a shared platform primitive.
On the production cluster, the storage layer is backed by Longhorn with a two-replica database StorageClass. That matches the current two-node production footprint and gives us a practical base for durable database volumes while the higher-level database controller comes online.
We also built a disposable Kubernetes feedback loop that proves the lifecycle behavior before it becomes a product workflow. It exercises wake, write, suspend, resume, and read-after-resume behavior, plus the awkward session cases that matter for real clients.
The Gotchas We Care About
A fake implementation can scale a Deployment to zero. A real one has to survive database behavior.
The tests focus on the cases that break user trust:
- first connection wakes compute;
- writes survive suspend and resume;
- idle compute actually suspends;
- read-after-resume returns the same data;
- open idle clients are handled intentionally;
- idle-in-transaction sessions block suspension;
- listener-style clients reconnect cleanly.
Those details matter because agents will not all use databases the same way. Some will write run state. Some will queue work. Some will listen for notifications. Some will hold connections badly. The platform needs lifecycle rules that are explicit and testable.
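One way to make those lifecycle rules explicit and testable is a pure function that decides whether compute may suspend given the current sessions. The sketch below is illustrative only. The "active", "idle", and "idle in transaction" values follow the state names Postgres reports in pg_stat_activity, while the "listening" label and the policy of disconnecting idle and listener clients on suspend are assumptions, not documented platform behavior.

```python
from dataclasses import dataclass
from enum import Enum


class SessionState(Enum):
    ACTIVE = "active"                            # running a query
    IDLE = "idle"                                # connected, no open transaction
    IDLE_IN_TRANSACTION = "idle in transaction"  # open transaction, no query running
    LISTENING = "listening"                      # hypothetical label for listener-style clients


@dataclass
class Session:
    client_id: str
    state: SessionState


@dataclass
class SuspendDecision:
    can_suspend: bool
    disconnect: list  # client ids to drop before suspending
    reason: str


BLOCKING_STATES = (SessionState.ACTIVE, SessionState.IDLE_IN_TRANSACTION)


def decide_suspend(sessions: list) -> SuspendDecision:
    """Block suspension on in-flight work; otherwise drop remaining clients."""
    blockers = [s.client_id for s in sessions if s.state in BLOCKING_STATES]
    if blockers:
        return SuspendDecision(False, [], "blocked by: " + ", ".join(blockers))
    # Assumed policy: idle and listener clients are disconnected on purpose
    # and are expected to reconnect, which wakes compute again.
    return SuspendDecision(True, [s.client_id for s in sessions], "no active work")
```

Keeping the decision separate from the mechanism that actually stops compute means each case in the list above can be asserted directly in a unit test, without a cluster.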
Why This Matters for Agents
Databases are not just persistence. For production agents, they become part of the authority and audit model.
A database binding should answer:
- which user or organization owns this data;
- which agent can access it;
- which branch or role it receives;
- which runtime secret exposes it;
- what happens when the agent is undeployed;
- how idle compute is suspended;
- what provision events happened along the way.
That is exactly the kind of infrastructure a2a cloud is built to provide: governed agent runtime, scoped access, receipts, replay, and managed deployment surfaces.
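As a rough sketch, the questions above could map onto a single binding record with an append-only event list. Every field name, the retain/delete policy values, and the event strings here are hypothetical. The post does not describe the actual control-plane schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class UndeployPolicy(Enum):
    RETAIN = "retain"  # assumed option: keep data when the agent is undeployed
    DELETE = "delete"  # assumed option: remove data with the agent


@dataclass
class DatabaseBinding:
    owner: str                  # which user or organization owns the data
    agent: str                  # which agent can access it
    branch: str                 # which branch it receives
    role: str                   # which database role it receives
    secret_name: str            # which runtime secret exposes it
    on_undeploy: UndeployPolicy # what happens when the agent is undeployed
    idle_suspend_after_s: int   # how idle compute is suspended
    events: list = field(default_factory=list)  # provision events, in order

    def record(self, event: str) -> None:
        """Append a provision event; earlier events are never rewritten."""
        self.events.append(event)

    def audit_summary(self) -> dict:
        """Answer the ownership and access questions in one place."""
        return {
            "owner": self.owner,
            "agent": self.agent,
            "branch": self.branch,
            "role": self.role,
            "secret": self.secret_name,
            "on_undeploy": self.on_undeploy.value,
            "idle_suspend_after_s": self.idle_suspend_after_s,
            "event_count": len(self.events),
        }
```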
Where It Goes Next
The foundation is now in place:
- database declarations in agent packages;
- control-plane metadata and APIs;
- production database substrate with Longhorn-backed storage;
- clean infra ownership for namespace, RBAC, quotas, and StorageClass;
- lifecycle tests for suspend and wake behavior.
The next step is the reconciler that turns pending database resources into live projects, branches, roles, secrets, migrations, runtime bindings, and dashboard state automatically.
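A reconciler of this kind is usually written as an idempotent loop: run whatever steps are still missing, stop at the first failure, and pick up from there on the next pass. The sketch below shows that pattern only. The step names come from the sentence above, but their order, the `apply_step` callback, and the status strings are assumptions, and dashboard state is treated as derived from the resource status rather than as its own step.

```python
from dataclasses import dataclass, field

# Assumed ordering; the real reconciler may sequence these differently.
STEPS = ("project", "branch", "role", "secret", "migrations", "runtime_binding")


@dataclass
class DatabaseResource:
    name: str
    completed: set = field(default_factory=set)

    @property
    def status(self) -> str:
        # A dashboard could render this value directly.
        return "live" if self.completed.issuperset(STEPS) else "pending"


def reconcile(resource: DatabaseResource, apply_step) -> str:
    """Apply missing steps in order; stop at the first failure so a retry resumes there."""
    for step in STEPS:
        if step in resource.completed:
            continue  # already done on an earlier pass
        if not apply_step(resource, step):
            break
        resource.completed.add(step)
    return resource.status
```

Because completed steps are skipped, calling `reconcile` repeatedly on the same resource is safe, which is what lets a controller retry after a transient failure without redoing earlier work.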
That is the product direction: deploy an AI agent, give it the resources it declares, govern those resources through the platform, and let idle compute go quiet without making the user think about Kubernetes.
Agents should ship like production microservices. Their databases should too.