How it works

The proxy is the wedge.

Three components. Each does one job. Together they govern every request your AI agents make to your data.

Category definition

What is an agent data proxy?

An agent data proxy is a governance layer between AI agents and the data sources they query that authenticates each request, evaluates it against policy, masks fields per role, and writes the decision to an immutable audit log.

Architecture

One proxy, in your environment. Five mechanisms, on every request.

The proxy authenticates, evaluates policy, masks fields, scores trust, and writes the decision to an immutable audit log. App code, IDE clients, your data, and operator-registered upstream MCP servers all flow through the same governance pipeline.

The proxy runs in your environment. The control plane sees rollup telemetry only, never query content, row values, or PII.

Components

Three components, drawn for the whiteboard.

The proxy runs in your environment. The control plane stays small. The console is static.

Component one
The proxy
A Python service that runs in your environment, behind your network controls, with your credentials. It speaks your data sources natively (Postgres, MySQL, Mongo, S3, Salesforce), and your agents talk to it instead of the source. Every request flows through the proxy. No request content, row values, or PII ever leaves your environment.
Component two
The control plane
A small AWS-hosted SaaS that holds your policies, contracts, trust scores, compliance settings, and audit rollups. The proxy syncs from it every minute and pushes telemetry rollups every five minutes. Telemetry is metadata only: counts, top names, latency percentiles. No query content.
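To make "metadata only" concrete, here is a minimal sketch of how a rollup could be reduced from per-request records. The record fields (`source`, `latency_ms`, `query`) and the `build_rollup` function are hypothetical, not DataDam's actual schema; the point is only that counts, top names, and latency percentiles survive while query text is never read.

```python
from collections import Counter
from statistics import quantiles


def build_rollup(records, top_n=3):
    """Reduce per-request records to counts, top names, and latency percentiles.

    Assumed record shape: {"source": str, "latency_ms": float, "query": str}.
    The "query" field is deliberately never read.
    """
    latencies = sorted(r["latency_ms"] for r in records)
    by_source = Counter(r["source"] for r in records)
    rollup = {
        "request_count": len(records),
        "top_sources": [name for name, _ in by_source.most_common(top_n)],
    }
    if len(latencies) >= 2:
        # quantiles(n=100) returns the 99 cut points p1..p99.
        cuts = quantiles(latencies, n=100)
        rollup["latency_p50_ms"] = cuts[49]
        rollup["latency_p95_ms"] = cuts[94]
    return rollup
```

Because the function never touches the `query` field, nothing from the request body can reach the rollup by accident.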
Component three
The console
A static Next.js admin app where operators author contracts, set trust thresholds, view the lineage graph, triage anomalies, and export audit evidence. Talks to the control plane over HTTPS. No long-running backend.

Step one

Connect your data sources.

Point the proxy at your sources. It runs in your environment, authenticates with credentials that never leave your environment, and starts emitting metadata-only telemetry to the control plane.

Native database support

Postgres via asyncpg. MySQL via aiomysql. MongoDB via motor. Each connector reads schema, executes queries, and applies field-level transforms before the response leaves the proxy.

Object stores

S3 connector with bucket-level scoping by default. Optional in-line content scanning runs built-in PII detection over text bodies and masks detected PII before serving the object. Opt in per source.
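As a rough illustration of in-line masking over a text body, the sketch below replaces two kinds of PII with placeholders and reports how many of each it found. The regular expressions and the `mask_text` function are stand-ins chosen for this example; the built-in detector is not described on this page and would cover far more entity types.

```python
import re

# Illustrative patterns only; assumptions for this sketch, not the built-in detector.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def mask_text(body):
    """Return (masked_body, counts); counts maps entity type to matches replaced."""
    counts = {}
    for label, pattern in PII_PATTERNS.items():
        body, n = pattern.subn(f"[{label.upper()}]", body)
        if n:
            counts[label] = n
    return body, counts
```

Returning the counts alongside the masked body mirrors the idea of reporting PII entities and counts without reporting the values themselves.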

SaaS APIs

Salesforce, Slack, Jira, ServiceNow, HubSpot, SharePoint. The same proxy that governs your databases governs your SaaS data. Field-level masking applies to REST responses the same way it applies to SQL rows.

Custom connectors

On Enterprise, the connector framework is documented and pluggable. Write the five-method Connector interface and the proxy treats your custom source like every other source.

Step two

Configure contracts and policies.

Author what each role can see, in version-controlled files or in the console. The proxy enforces them on every request, no agent code change required.

Data contracts in ODCS v3

Open Data Contract Standard v3 is the schema. PII tags on contract columns drive auto-masking. The proxy detects schema drift on every snapshot push and emits a violation when a column type changes upstream. Contract authors set freshness SLAs per source.
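Schema drift detection can be pictured as a comparison between the column types a contract declares and the types seen in the latest snapshot. The sketch below assumes both are plain name-to-type mappings; the `detect_drift` function and the violation fields are hypothetical and not part of ODCS or the proxy.

```python
def detect_drift(contract_columns, snapshot_columns):
    """Compare contract column types with a snapshot and list violations.

    Both arguments map column name -> type string (an assumed shape).
    """
    violations = []
    for name, expected in contract_columns.items():
        actual = snapshot_columns.get(name)
        if actual is None:
            violations.append(
                {"column": name, "kind": "missing", "expected": expected}
            )
        elif actual != expected:
            violations.append(
                {
                    "column": name,
                    "kind": "type_changed",
                    "expected": expected,
                    "actual": actual,
                }
            )
    return violations
```

An empty list means the snapshot still matches the contract; any entry is a candidate violation to emit.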

Field policies in YAML

Operator-authored YAML declares per-role allow lists, mask modes, and tokenization opt-ins. The Sales role sees customer email but not SSN. The Support role sees neither. The Audit role sees both, masked.
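The Sales, Support, and Audit example can be sketched as a deny-by-default allow list with a mask mode. The in-memory `POLICY` structure, the `name` field, and the `apply_policy` function are assumptions for illustration; the actual YAML schema is not shown on this page.

```python
# Hypothetical in-memory form of a per-role policy: field -> "allow" or "mask".
# Fields not listed for a role are dropped (deny by default).
POLICY = {
    "sales": {"name": "allow", "email": "allow"},
    "support": {"name": "allow"},
    "audit": {"name": "allow", "email": "mask", "ssn": "mask"},
}

MASK = "***"


def apply_policy(role, row):
    """Return a copy of row containing only what the role may see."""
    rules = POLICY.get(role, {})
    out = {}
    for field, value in row.items():
        mode = rules.get(field)
        if mode == "allow":
            out[field] = value
        elif mode == "mask":
            out[field] = MASK
        # No rule for this field: it is omitted from the result.
    return out
```

An unknown role gets an empty rule set, so it sees nothing rather than everything.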

Screenshot: the DataDam console at app.mydatadam.com/contracts/pg-master-v1, showing the contract-detail view for pg-master-v1: the active ODCS v3 YAML body with apiVersion v3.0.0, schema for the customers table, and a pii.email classification on the email column.

Step three

Point your agents at the proxy.

Same request shape, same response shape, with policy enforcement and audit logging in between. No agent code change. The proxy is a drop-in URL swap.

Decision order

Kill switch, then trust check, then anomaly correlation, then PII enforcement, then field policy, then optional tokenization. Each gate passes, warns, or blocks with a structured reason.
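An ordered gate pipeline of this kind might look like the sketch below. Only the first two gates are shown; the request fields (`killed`, `trust`), the threshold values, and all function names are hypothetical. The remaining gates would be appended to the list in the order given above and follow the same pattern.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    verdict: str  # "pass", "warn", or "block"
    reason: str = ""


def kill_switch(req):
    if req.get("killed"):
        return Outcome("block", "kill switch engaged")
    return Outcome("pass")


def trust_check(req):
    # Threshold values are illustrative assumptions.
    score = req.get("trust", 1.0)
    if score < 0.3:
        return Outcome("block", "trust score below block threshold")
    if score < 0.6:
        return Outcome("warn", "trust score below warn threshold")
    return Outcome("pass")


GATES = [("kill_switch", kill_switch), ("trust_check", trust_check)]


def evaluate(req, gates=GATES):
    """Run gates in order; stop at the first block, collect warnings otherwise."""
    warnings = []
    for name, gate in gates:
        outcome = gate(req)
        if outcome.verdict == "block":
            return {
                "allowed": False,
                "gate": name,
                "reason": outcome.reason,
                "warnings": warnings,
            }
        if outcome.verdict == "warn":
            warnings.append({"gate": name, "reason": outcome.reason})
    return {"allowed": True, "warnings": warnings}
```

Stopping at the first block means a later gate never runs for a request an earlier gate has already rejected, which is why the order matters.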

Identity

Each agent carries an Ed25519 cryptographic identity issued at registration. Identity rotates on a schedule. Lost or stolen identities are revoked from the console.

Audit

Every request lands in an append-only rollup table. A SHA-256 hash chain makes tampering detectable. Audit retention is configurable per org, and the log is exportable to CSV, JSON, or your SIEM.
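For readers unfamiliar with hash chains, the sketch below shows the general technique: each entry's hash covers the previous entry's hash, so editing any earlier entry invalidates everything after it. The entry layout, the genesis value, and the function names are assumptions for this example, not the proxy's storage format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first entry


def _digest(prev_hash, decision):
    payload = json.dumps(decision, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(chain, decision):
    """Append a decision, linking it to the hash of the previous entry."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"decision": decision, "prev": prev, "hash": _digest(prev, decision)})


def verify(chain):
    """Recompute every hash; return False if any entry was altered or reordered."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["decision"]):
            return False
        prev = entry["hash"]
    return True
```

Note that a chain like this makes tampering detectable, not impossible; verification has to be run for the property to matter.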

How agents reach the proxy

Three transports, one governance pipeline.

Your application code, your IDE, and the LLM tools you already run all talk to the same proxy. No second policy language; no second audit log.

HTTP API

POST /:source/query

Plain HTTP for application code that already calls REST APIs. Bearer key auth, JSON request and response. Same shape across Postgres, MySQL, Mongo, and S3 connectors. Wrap it as a tool in the Anthropic Messages API, OpenAI Chat Completions, OpenAI Responses, LangChain, or the Vercel AI SDK.
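A call to the HTTP API could be assembled as in the sketch below, which builds the request without sending it. The base URL, the source name `pg-master`, and the `query` field in the JSON body are placeholders assumed for illustration; the OpenAPI reference is the authority on the actual request shape.

```python
import json
import urllib.request


def build_query_request(base_url, source, api_key, payload):
    """Build (but do not send) a POST /:source/query request with bearer auth."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{source}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; the body field name "query" is an assumption.
req = build_query_request(
    "https://proxy.internal.example",
    "pg-master",
    "YOUR_API_KEY",
    {"query": "SELECT id, email FROM customers LIMIT 5"},
)
# To send it: urllib.request.urlopen(req), then json.load() the response.
```

The same function can be wrapped as a tool handler in whichever agent framework you already use, since it depends only on the standard library.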

MCP

Streamable HTTP /mcp

First-party Model Context Protocol endpoint. Your data sources surface as MCP tools alongside operator-registered upstreams (GitHub, Postgres, Notion, Slack, Linear, custom internal tools). Tool arguments are scanned outbound; tool results are scanned inbound. Cursor, Continue, and Cline connect directly.

Stdio shim

npx @datadam/mcp

For IDE clients that only speak stdio MCP, the published shim bridges to the proxy over Streamable HTTP. Pure transport translator: never inspects, rewrites, or persists message content. One line in the Claude Desktop config and you are governed.

Copy-paste integration snippets at /docs/integrations. OpenAPI 3.1 reference at /docs/api.

Works with what you have

Your data catalog gets smarter about agents.

Catalogs have lineage of pipelines. Nobody has lineage of agents. DataDam emits it natively because the proxy sits on the data path. We do not replace your existing catalog; we feed it the slice it does not have.

Two integration shapes match how catalogs ingest

OpenLineage push. Same shape as our SIEM sinks. DataHub, Marquez, and OpenMetadata speak it natively. Per-edge audit events flow with column-level granularity.
Catalog API pull. For catalogs that prefer scheduled crawls (Atlan, Collibra), we expose /catalog/columns and /catalog/agents/{id}/access endpoints with the agent-access slice your catalog cannot see.

What we feed your catalog

Per-column access counts: which agents read which columns, how often.
Allowed vs denied access by role, per column, per source.
Agent-to-source edges (the lineage of agents, not pipelines).
Contract-derived PII classifications, surfaced in your catalog UI.
Runtime PII entities and counts, alongside the contract tags.
Trust score, freshness SLA, ownership: the operational signal your data team already cares about.
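The first two items in the list above amount to an aggregation over audit events. A minimal sketch, assuming each event carries `agent`, `source`, `columns`, and `decision` fields (a hypothetical shape, not the actual audit schema):

```python
from collections import Counter


def column_access_counts(events):
    """Count accesses per (agent, source, column, decision) across audit events."""
    counts = Counter()
    for event in events:
        for column in event["columns"]:
            key = (event["agent"], event["source"], column, event["decision"])
            counts[key] += 1
    return counts
```

Keying on the decision as well as the column keeps allowed and denied accesses separate, so a catalog can show both side by side.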

What we explicitly do not claim

DataDam is not a data catalog and does not pretend to be one. We do not ship a business glossary, BI tool metadata, dbt model docs, legacy data contracts, or a "single pane of glass" for your data estate. We know what flows through the proxy and we are honest about not seeing the rest. The integration story is feed-your-catalog, not replace-your-catalog.

Coming: native push to DataHub, Atlan, Collibra, OpenMetadata. Ask if a different catalog is in your stack; the OpenLineage shape we already emit covers most receivers today.

Read the docs or start building.

Production-ready in an afternoon. Free tier covers evaluation. Pro and Business tiers unlock contract enforcement, compliance, and lineage.


Frequently asked

Implementation questions.

What platform teams ask before deploying. Answers follow the flow described above.

Does my agent need an SDK to talk to DataDam?

No. Agents point their existing connection string at the proxy. Postgres on the wire, MySQL on the wire, Mongo, S3, MCP. Same protocol, same drivers. The proxy speaks each source natively. No code change in the agent, no library to import.

How long does deployment take?

Three steps. (1) Run the proxy container in your environment (cloud, on-prem, or air-gapped) pointed at one data source: minutes. (2) Author one Open Data Contract Standard contract for that source with PII tags: under an hour for a typical schema. (3) Point your agent at the proxy: instant. Most customers have one source under governance the same day.

Which data sources are supported?

Native: Postgres (asyncpg), MySQL (aiomysql), MongoDB (motor), S3 (aioboto3 with optional in-line content masking). SaaS: Salesforce, Slack, Jira, ServiceNow, HubSpot, SharePoint via REST connectors. MCP: any MCP-speaking upstream registered through the console gets the same governance gates as a database.

What is the MCP gateway?

A first-party MCP server hosted by the proxy. Agents that speak the Model Context Protocol (Claude Desktop, Cursor, Cline, ChatGPT, and others through the @datadam/mcp stdio shim) treat the proxy as their MCP server. The same field-level access control, PII masking, and audit log apply to MCP traffic as to SQL traffic.

How are field-level policies authored?

Two surfaces: version-controlled files or the console. In either, operators write per-role YAML (allow lists, mask modes, tokenization opt-ins). The Sales role might see customer email but not SSN; Support sees neither; Audit sees both, masked. The proxy enforces them on every request without an agent code change.

What happens to a query that violates a policy?

The proxy returns the response with denied columns stripped, a warning header naming the policy that fired, and an audit row recording the decision. For SQL, the WHERE / SELECT clauses are rewritten to drop the denied fields; the agent sees a partial result rather than a 403 (configurable per source). Every decision is hash-chained into the audit log.