Glossary
The language we use.
Atomic definitions of every agent-data-governance term DataDam ships with. The first sentence of each entry is the canonical definition.
- Agent data proxy
A governance layer between AI agents and the data sources they query that authenticates each request, evaluates it against policy, masks fields per role, and writes the decision to an immutable audit log.
Agents talk to the proxy instead of talking directly to the database, the SaaS API, or the MCP server. The proxy speaks each source natively (the Postgres and MySQL wire protocols, S3, Salesforce, MCP), so the agent's connection string and code stay unchanged. Every request is authenticated, policy-evaluated, masked, and logged before any byte reaches the agent.
- Data contract
A versioned, machine-readable declaration of what a data source provides, including schema, PII tags, freshness SLAs, and ownership.
DataDam uses Open Data Contract Standard (ODCS) v3. PII tags on contract columns drive automatic masking; the proxy detects schema drift on every snapshot push and emits a violation when a column type changes upstream. Contract authors set freshness SLAs per source. Active versions are immutable; editing means a new row at a bumped semver.
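As an illustration of the drift check described above, here is a minimal sketch that compares the column types declared in a contract with those seen in a snapshot. The function name, the dict-based inputs, and the violation message format are assumptions made for this example; they are not DataDam's API or the ODCS file layout.

```python
def detect_type_drift(contract_columns: dict, snapshot_columns: dict) -> list:
    """Return one violation message per contract column whose type changed upstream.

    Both arguments map column name -> type string (hypothetical shape).
    Columns missing from the snapshot, or new in it, are ignored in this sketch.
    """
    violations = []
    for name, declared in contract_columns.items():
        observed = snapshot_columns.get(name)
        if observed is not None and observed != declared:
            violations.append(f"{name}: {declared} -> {observed}")
    return violations


contract = {"id": "bigint", "email": "text"}
snapshot = {"id": "bigint", "email": "varchar(255)", "signup_ts": "timestamp"}
drift = detect_type_drift(contract, snapshot)
```

Here `drift` holds a single entry for `email`, because only that column's type differs from the contract.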
- Trust score
A composite score between 0 and 1000 that the proxy consults on every request, computed from five deterministic components: contract validity, freshness SLA, schema stability, availability, and ownership.
No LLM, no decay function. Each component is worth 200 points and is recomputed every five minutes against a rolling window. Below the org's threshold, requests get a warning header or a 403 depending on enforcement mode. Above, requests flow. Long-form post: /blog/the-five-components-of-the-trust-score.
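The arithmetic above can be sketched as follows. The class and function names, the mode strings "warn" and "block", and the `X-Trust-Warning` header name are hypothetical; the sketch also assumes a score exactly at the threshold passes, which the definition above does not specify.

```python
from dataclasses import dataclass

COMPONENT_MAX = 200  # each of the five components is worth 200 points


@dataclass
class TrustComponents:
    contract_validity: int
    freshness_sla: int
    schema_stability: int
    availability: int
    ownership: int

    def total(self) -> int:
        parts = (
            self.contract_validity,
            self.freshness_sla,
            self.schema_stability,
            self.availability,
            self.ownership,
        )
        # Clamp each component to 0..200 so the total stays within 0..1000.
        return sum(max(0, min(COMPONENT_MAX, p)) for p in parts)


def enforce(score: int, threshold: int, mode: str):
    """Return (HTTP status, extra response headers) for one request."""
    if score >= threshold:  # assumption: at-threshold counts as passing
        return 200, {}
    if mode == "block":  # hypothetical name for the 403 enforcement mode
        return 403, {}
    # Hypothetical header name for the warning enforcement mode.
    return 200, {"X-Trust-Warning": f"score {score} below threshold {threshold}"}


score = TrustComponents(200, 150, 200, 180, 100).total()  # 830
```

With a threshold of 900, `enforce(score, 900, "block")` yields a 403, while the same score in "warn" mode yields a 200 with the warning header attached.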
- MCP gateway
A first-party Model Context Protocol server hosted by the DataDam proxy that aggregates data-source tools and operator-registered upstream MCP servers into a single governed endpoint.
IDE clients (Cursor, Cline, Claude Desktop, ChatGPT, others) treat the proxy as their MCP server. The gateway namespaces upstream tools as alias__tool, applies the proxy's field-level access control + PII masking + audit log to MCP traffic exactly as it does to SQL traffic, and gates upstream tool visibility per the operator's allowlist or denylist.
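A minimal sketch of the alias__tool namespacing and list-based visibility gate described above. The function names and the choice to match lists against the namespaced name are assumptions for illustration, not the gateway's actual configuration format.

```python
def namespaced(alias: str, tool: str) -> str:
    """Prefix an upstream tool with its server alias: alias__tool."""
    return f"{alias}__{tool}"


def visible_tools(upstreams: dict, allowlist=None, denylist=None) -> list:
    """List the tools exposed to clients.

    upstreams maps alias -> list of tool names. An operator supplies either an
    allowlist or a denylist of namespaced names (assumed shape); with neither,
    every upstream tool is visible.
    """
    names = [namespaced(alias, tool) for alias, tools in upstreams.items() for tool in tools]
    if allowlist is not None:
        return [n for n in names if n in allowlist]
    if denylist is not None:
        return [n for n in names if n not in denylist]
    return names


upstreams = {"github": ["search", "create_issue"], "jira": ["search"]}
allowed = visible_tools(upstreams, allowlist={"github__search"})
```

Namespacing keeps `github__search` and `jira__search` distinct even though both upstream servers expose a tool called `search`.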
- PII masking
Automatic removal or transformation of personally identifiable information from query responses before they leave the proxy.
Two paths. For sources with an active data contract, columns tagged pii.email, pii.phone, or pii.health are masked using the contract's declared mode (generalize / redact / hash / tokenize). For sources without a contract, runtime PII detection runs inline as a fallback, covering more than 200 entity types across 7 recognizer packs (secrets, country IDs, healthcare, financial, network, crypto wallets, vehicle and asset identifiers). Both paths run inside the customer environment.
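To make the contract-driven path concrete, here is a rough sketch of applying a declared mask mode per tagged column. The output shapes (the `[REDACTED]` placeholder, the `*@domain` generalization for emails, a bare SHA-256 hex digest) are assumptions chosen for the example, and tokenize is left out because it needs a token store.

```python
import hashlib


def mask_value(value: str, mode: str) -> str:
    """Apply one mask mode to a single value (illustrative output formats)."""
    if mode == "redact":
        return "[REDACTED]"
    if mode == "hash":
        return hashlib.sha256(value.encode("utf-8")).hexdigest()
    if mode == "generalize":
        # Assumed generalization for emails: keep only the domain.
        _, sep, domain = value.partition("@")
        return f"*@{domain}" if sep else "*"
    raise ValueError(f"unsupported mask mode in this sketch: {mode}")


def mask_row(row: dict, column_tags: dict, tag_modes: dict) -> dict:
    """Mask every column whose PII tag has a declared mode; pass the rest through."""
    masked = {}
    for column, value in row.items():
        mode = tag_modes.get(column_tags.get(column))
        masked[column] = mask_value(str(value), mode) if mode else value
    return masked


row = {"id": 1, "email": "ada@example.com", "phone": "555-0100"}
tags = {"email": "pii.email", "phone": "pii.phone"}
modes = {"pii.email": "generalize", "pii.phone": "redact"}
masked = mask_row(row, tags, modes)
```

The untagged `id` column passes through unchanged, while `email` and `phone` are transformed according to the mode declared for their tags.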
- Tokenization
A reversible mask mode that replaces a sensitive value with a deterministic token, with the plaintext-token mapping stored on the customer side and accessible only via an audited admin call.
Activation is opt-in per field via operator YAML; contract auto-derive and runtime PII fallback never emit tokenize automatically. The token store lives on the proxy in the customer's Postgres, schema _datadam_token_store, with HMAC-keyed deterministic tokens and AES-256-GCM-encrypted plaintext. Detokenizing requires an admin key and a stated reason; every detokenize call is audited.
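The deterministic-token idea can be sketched with the standard library's `hmac` module. Everything below is an assumption for illustration: the class name, the `tok_` prefix, the token length, and the in-memory dict standing in for the Postgres store. The AES-256-GCM encryption of stored plaintext is deliberately omitted, since it needs a third-party crypto library.

```python
import hashlib
import hmac


class TokenStore:
    """In-memory stand-in for a token store (hypothetical, not DataDam's schema)."""

    def __init__(self, hmac_key: bytes):
        self._key = hmac_key
        # The real store encrypts plaintext at rest; this sketch keeps it in the clear.
        self._plaintext_by_token = {}
        self.audit = []

    def tokenize(self, value: str) -> str:
        # Same key + same value always yields the same token (deterministic).
        digest = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        token = f"tok_{digest[:32]}"  # assumed token format
        self._plaintext_by_token[token] = value
        return token

    def detokenize(self, token: str, is_admin: bool, reason: str) -> str:
        if not is_admin:
            raise PermissionError("detokenize requires an admin key")
        if not reason.strip():
            raise ValueError("detokenize requires a reason")
        # Record the call before returning the plaintext.
        self.audit.append({"action": "detokenize", "token": token, "reason": reason})
        return self._plaintext_by_token[token]


store = TokenStore(b"k" * 32)
token = store.tokenize("ada@example.com")
```

Because the token is a keyed HMAC of the value, the same email always maps to the same token, so joins and group-bys on the tokenized column still work without exposing the plaintext.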
- Kill switch
A traffic-side soft stop that returns 503 to any matching agent / source / org request until the operator deactivates it.
Distinct from process termination. The kill switch operates on the request path, scoped per agent, per source, or per org. Activation requires owner or admin role and a written reason; deactivation is a soft delete so the audit history of who killed what when stays queryable.
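A small sketch of request-path matching with soft-delete deactivation, as described above. The dataclass fields and the request dict shape are hypothetical; the role check on activation is left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KillSwitch:
    scope: str  # "agent", "source", or "org"
    target: str  # identifier the scope applies to
    reason: str  # written reason supplied at activation
    activated_by: str
    deactivated_at: Optional[float] = None  # soft delete: the row is kept for history


def status_for(request: dict, switches: list) -> int:
    """Return 503 if any active switch matches the request, else 200."""
    for switch in switches:
        if switch.deactivated_at is None and request.get(switch.scope) == switch.target:
            return 503
    return 200


switches = [KillSwitch("source", "billing-db", "suspected credential leak", "alice")]
request = {"agent": "agent-7", "source": "billing-db", "org": "acme"}
blocked = status_for(request, switches)  # 503

# Deactivation only stamps the row; the record of who killed what stays in place.
switches[0].deactivated_at = 1700000000.0
restored = status_for(request, switches)  # 200
```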
- Anomaly detection
Three deterministic detectors (volume z-score, novelty access, time-of-day) that run every fifteen minutes over a fourteen-day rolling baseline of audit rollups.
No machine learning, no LLM. Per-agent volume is flagged when the z-score on request_count crosses 4 (high), 3 (medium), or 2 (low). Novelty fires when an agent accesses a source that does not appear in its baseline. Time-of-day fires for activity in a normally quiet hour. A cold-start guard at MIN_BASELINE_SAMPLES = 100 keeps detectors silent during onboarding.
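The volume detector's thresholds can be sketched as below. Several details are assumptions not stated above: the baseline is taken to be a list of per-window request counts, the population standard deviation is used, only upward spikes are flagged, and "crosses" is read as greater than or equal to.

```python
import statistics

MIN_BASELINE_SAMPLES = 100  # cold-start guard from the definition above


def volume_severity(baseline_counts: list, current_count: float):
    """Return "high", "medium", "low", or None for one agent's request_count."""
    if len(baseline_counts) < MIN_BASELINE_SAMPLES:
        return None  # stay silent during onboarding
    mean = statistics.fmean(baseline_counts)
    stdev = statistics.pstdev(baseline_counts)
    if stdev == 0:
        return None  # flat baseline: z-score is undefined, so this sketch skips it
    z = (current_count - mean) / stdev
    if z >= 4:
        return "high"
    if z >= 3:
        return "medium"
    if z >= 2:
        return "low"
    return None


baseline = [10, 12] * 50  # 100 samples, mean 11, standard deviation 1
severity = volume_severity(baseline, 15.5)  # z = 4.5
```

With this baseline a current count of 15.5 sits 4.5 standard deviations above the mean, so it lands in the high band; the same count against a 50-sample baseline returns `None` because of the cold-start guard.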
- Audit log
An append-only, hash-chained record of every governed agent request including agent identity, source, fields requested and allowed, mask outcomes, latency, and policy decisions.
Each row carries a SHA-256 of the previous row, so modifying or deleting a row breaks the chain for the rows that follow it and the next integrity check fails closed. The retention cron is the only code path with permission to delete; database-level BEFORE triggers always reject UPDATE, and reject DELETE unless the explicit per-transaction GUC app.audit_retention_active is set.
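Hash chaining can be illustrated with a short sketch. The row layout, the all-zero genesis value, and the canonical JSON serialization are assumptions made for this example, not the actual audit table schema.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first row


def row_hash(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous row's hash plus a canonical form of this row."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list, payload: dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev_hash": prev, "payload": payload, "hash": row_hash(prev, payload)})


def verify(log: list) -> bool:
    """Walk the chain from the start; any edited or removed interior row fails."""
    prev = GENESIS
    for row in log:
        if row["prev_hash"] != prev or row["hash"] != row_hash(prev, row["payload"]):
            return False
        prev = row["hash"]
    return True


log = []
for decision in ("allow", "mask", "deny"):
    append(log, {"agent": "agent-7", "decision": decision})
intact = verify(log)  # True

log[1]["payload"]["decision"] = "allow"  # tamper with a middle row
tampered = verify(log)  # False
```

Editing the middle row changes what its hash should be, so verification stops at that row without needing any external reference copy.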
- Governance loop
The five-step pipeline every governed request flows through: authenticate, evaluate policy, mask fields, score trust, audit.
Authentication checks the agent's API key against the synced registry. Policy evaluation applies per-role field allow lists, contract mask declarations, and runtime PII fallback. Mask + tokenize transforms run in-memory before the response is serialized. Trust score is consulted against the per-source threshold. Every outcome lands in the hash-chained audit log before the proxy returns. No LLM in the loop; every decision traces back to a deterministic rule, threshold, or policy line.
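The five steps can be lined up in one function to show their order. This is a simplified sketch under stated assumptions: the function signature, the 401 for a failed key check, the `[REDACTED]` placeholder, and the plain list standing in for the hash-chained audit log are all hypothetical, and trust enforcement is shown in blocking mode only.

```python
def handle(request, source_row, registry, role_fields, pii_fields,
           trust_score, threshold, audit_log):
    """Run one request through authenticate, policy, mask, trust, audit."""
    # 1. Authenticate: look the API key up in the synced registry.
    agent = registry.get(request["api_key"])
    if agent is None:
        audit_log.append({"agent": None, "decision": "deny", "reason": "auth"})
        return 401, {}

    # 2. Evaluate policy: keep only the fields this role is allowed to read.
    allowed_for_role = role_fields.get(agent["role"], set())
    allowed = [f for f in request["fields"] if f in allowed_for_role]

    # 3. Mask fields: transform PII in memory before anything is serialized.
    row = {f: ("[REDACTED]" if f in pii_fields else source_row[f]) for f in allowed}

    # 4. Score trust: compare the source's score with the threshold.
    status = 200 if trust_score >= threshold else 403

    # 5. Audit: record the outcome before returning to the agent.
    audit_log.append({
        "agent": agent["id"],
        "requested": list(request["fields"]),
        "allowed": allowed,
        "status": status,
    })
    return status, (row if status == 200 else {})


registry = {"key-1": {"id": "agent-7", "role": "analyst"}}
role_fields = {"analyst": {"id", "email"}}
source_row = {"id": 1, "email": "ada@example.com", "ssn": "000-00-0000"}
audit_log = []
request = {"api_key": "key-1", "fields": ["id", "email", "ssn"]}
status, body = handle(request, source_row, registry, role_fields,
                      {"email"}, 800, 700, audit_log)
```

In this run the `ssn` field is dropped by policy, `email` is masked, and a single audit entry records both what was requested and what was allowed.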