The five components of the trust score.

A composite, deterministic score that recomputes every five minutes against a rolling window. No LLM in the loop. No decay function. Five things, two hundred points each, weighted equally on purpose.

Josh Tepen, Founder, DataDam | April 24, 2026 | 8 min read

Engineering | Trust score

Every governed source in DataDam carries a trust score between 0 and 1000. The proxy consults it on every request. Below the org's threshold, the request gets a warning header or a 403, depending on enforcement mode. At or above the threshold, the request flows. The score is recomputed every five minutes against a rolling window. There is no decay function. There is no model. There is a deterministic formula and a cron job.

This post is about why the formula is shaped the way it is, what each component measures, and what we deliberately decided not to do.

The formula.

Five components. Each is normalized to 0 through 200 points. Sum is the composite score, range 0 through 1000. The components are weighted equally on purpose. The same cron writes the composite plus the components to a current-state table for fast reads and an append-on-change-only history table for the time series.
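As a rough illustration of the paragraph above, here is a minimal Python sketch of the sum and the append-on-change write. The names `Components` and `record_score` are hypothetical, and a dict and a list stand in for the current-state and history tables; the production code is SQL plus a Lambda and will look different.

```python
from dataclasses import dataclass, astuple

COMPONENT_MAX = 200  # each component is normalized to 0..200


@dataclass(frozen=True)
class Components:
    contract_validity: int
    freshness_sla: int
    schema_stability: int
    availability: int
    ownership: int

    def composite(self) -> int:
        # Equal weighting: the composite is a plain sum, range 0..1000.
        parts = astuple(self)
        for value in parts:
            if not 0 <= value <= COMPONENT_MAX:
                raise ValueError(f"component out of range: {value}")
        return sum(parts)


def record_score(source_id, components, current, history, now):
    """Overwrite current state; append to history only if something changed.

    `current` (dict) and `history` (list) are stand-ins for the two tables.
    Returns True when a history row was appended.
    """
    previous = current.get(source_id)
    current[source_id] = components
    if previous != components:
        history.append((now, source_id, components.composite(), components))
        return True
    return False
```

Because the history write is skipped when nothing changed, a stable source produces one history row rather than one row per five-minute run.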

Equal weighting is the load-bearing decision. The moment you weight one component more heavily than another, you are making a judgment call that should be a customer's, not a vendor's. The override knob is "set the threshold."

The five components.

1. Contract validity.

Does the source have an active data contract in Open Data Contract Standard v3? Does the contract validate against the live source schema? Are the contract's column definitions consistent with the column types the proxy observes when it queries?

A source with no contract scores zero on this component. A source with a contract that passes schema validation scores 200. A source with a contract that has minor mismatches scores in between, weighted by the percentage of columns that align.
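The three cases above can be sketched as follows. This is an illustration under assumptions, not the production rule: the function name is hypothetical, columns are modeled as a plain name-to-type mapping, and "align" is taken to mean that the observed type equals the contracted type.

```python
def contract_validity_score(contract_columns, observed_columns):
    """Score 0..200 from how many contracted columns match the live schema.

    contract_columns: dict of column name -> type, or None when the source
    has no contract. observed_columns: dict of column name -> observed type.
    Assumption: columns seen upstream but absent from the contract are ignored.
    """
    if not contract_columns:
        return 0  # no contract (or an empty one) scores zero
    aligned = sum(
        1
        for name, declared_type in contract_columns.items()
        if observed_columns.get(name) == declared_type
    )
    return round(200 * aligned / len(contract_columns))
```

With four contracted columns and one type mismatch, three of four align and the component lands at 150.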

2. Freshness SLA.

Every contract declares a freshness expectation per logical entity ("orders are at most 15 minutes stale"). The proxy compares the contract's commitment to the actual observed lag at query time. Score is 200 when freshness is within SLA across the rolling window. Score degrades as observed lag exceeds the SLA, with a hard floor at zero.
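The post does not specify the degradation curve, so the sketch below picks one for illustration: each observation scores 200 within SLA, falls linearly to zero at a hypothetical multiple of the SLA (2x by default), and the component is the mean over the window. The function name and the `zero_at_multiple` parameter are assumptions.

```python
def freshness_score(observed_lags_s, sla_s, zero_at_multiple=2.0):
    """Score 0..200 from observed lags (seconds) against a freshness SLA.

    Assumed curve: full marks at or under the SLA, linear falloff above it,
    hard floor of zero once lag reaches zero_at_multiple * sla_s.
    """
    if sla_s <= 0 or zero_at_multiple <= 1:
        raise ValueError("sla_s must be > 0 and zero_at_multiple must be > 1")
    if not observed_lags_s:
        raise ValueError("no observations in the window")
    total = 0.0
    for lag in observed_lags_s:
        if lag <= sla_s:
            total += 200
        else:
            overshoot = (lag - sla_s) / (sla_s * (zero_at_multiple - 1))
            total += max(0.0, 200 * (1 - overshoot))
    return round(total / len(observed_lags_s))
```

For the "orders are at most 15 minutes stale" example, `sla_s` would be 900; a single observation at 1350 seconds sits halfway to the assumed zero point and scores 100.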

Stale data is a trust problem. An agent that gets a stale response and confidently acts on it is producing a wrong outcome that the audit log will tie back to a governable component.

3. Schema stability.

How often is the upstream schema drifting against the contract? A source whose schema has not drifted in the rolling window scores 200. Each drift event reduces the score by an amount proportional to the severity of the drift. In increasing order of severity: an added column, a removed column, a type change, a column rename.
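A minimal sketch of severity-weighted drift penalties follows. Only the ordering (added, then removed, then type change, then rename) comes from the text; the point values in `PENALTY`, the enum, and the function name are hypothetical.

```python
from enum import Enum


class Drift(Enum):
    COLUMN_ADDED = "column_added"
    COLUMN_REMOVED = "column_removed"
    TYPE_CHANGED = "type_changed"
    COLUMN_RENAMED = "column_renamed"


# Hypothetical penalties; only their relative order reflects the post.
PENALTY = {
    Drift.COLUMN_ADDED: 10,
    Drift.COLUMN_REMOVED: 30,
    Drift.TYPE_CHANGED: 50,
    Drift.COLUMN_RENAMED: 70,
}


def schema_stability_score(drift_events):
    """Score 0..200: start at 200, subtract a penalty per drift event."""
    penalty = sum(PENALTY[event] for event in drift_events)
    return max(0, 200 - penalty)  # components never go below zero
```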

Schema instability is operationally interesting on its own. As a trust component it is powerful: a source that keeps changing shape underneath you is a source whose contract you cannot rely on for downstream policy decisions.

4. Availability.

What percentage of requests in the rolling window completed without a connector-level error? The score is 200 at 100% availability and degrades linearly from there. Connector-level errors are the right thing to count: a source that the agent cannot reach is not just an availability issue, it is a governance issue, because it removes the proxy's ability to enforce policy on data the agent might otherwise be tempted to fabricate.
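Linear degradation is simple enough to write down directly. In this sketch the function name and the choice to reject an empty window are assumptions; the mapping from success rate to points follows the description above.

```python
def availability_score(total_requests, connector_errors):
    """Score 0..200, linear in the share of requests without a connector error."""
    if total_requests <= 0:
        raise ValueError("no requests in the window")
    if not 0 <= connector_errors <= total_requests:
        raise ValueError("connector_errors must be between 0 and total_requests")
    succeeded = total_requests - connector_errors
    return round(200 * succeeded / total_requests)
```

At 1000 requests with 25 connector errors, availability is 97.5% and the component is 195.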

5. Ownership.

Does the source have a named owner in the contract? Has the owner reviewed and signed off on the contract within the rolling review window? Is there a fallback owner for out-of-office handling?

Ownership is the soft component, but it is the load-bearing one for compliance posture. An auditor will accept a low score on freshness and a fix plan; an auditor will not accept a source with no named owner. We weight it the same as the others to keep the formula clean and let the threshold do the work.

What the proxy does with the score.

Org-level enforcement is configured in two values: a threshold (default 400) and a mode (default warn). Per-source overrides for both are stored in a separate table and resolve via a clean fallback chain: the override wins, otherwise the default applies.
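The fallback chain can be sketched as below. The class and function names are hypothetical, and the sketch assumes threshold and mode can be overridden independently per source, which the post implies but does not spell out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OrgPolicy:
    threshold: int = 400  # org default from the post
    mode: str = "warn"    # "warn" or "block"


@dataclass(frozen=True)
class SourceOverride:
    threshold: Optional[int] = None  # None means "no override for this value"
    mode: Optional[str] = None


def effective_policy(org: OrgPolicy, override: Optional[SourceOverride]) -> OrgPolicy:
    """The override wins where it is set; otherwise the org default applies."""
    if override is None:
        return org
    return OrgPolicy(
        threshold=org.threshold if override.threshold is None else override.threshold,
        mode=org.mode if override.mode is None else override.mode,
    )
```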

On every governed query, the proxy compares the source's composite score to the effective threshold. Score equal to or above threshold: pass. Score below threshold and mode is warn: response carries an X-DataDam-Trust-Warning header so the agent can log and continue. Score below threshold and mode is block: 403, and the audit row carries the full breakdown of which components contributed to the deny.

Sources that have not yet been scored (cron warm-up, brand-new sources) default to ALLOW. We default to permissive at warm-up because a default-deny on a fresh source creates a chicken-and-egg problem the customer will solve by misconfiguring the threshold.
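Putting the rules above together, the per-request decision might look like this sketch. The `Decision` type, the `enforce` function, the header value format, and the audit payload shape are all assumptions; only the header name, the 403, the at-or-above comparison, and the permissive warm-up default come from the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Decision:
    action: str                        # "pass", "warn", or "block"
    http_status: Optional[int] = None  # set only when the proxy itself responds
    headers: dict = field(default_factory=dict)
    audit: Optional[dict] = None


def enforce(score, components, threshold, mode):
    """Compare a source's composite score to the effective threshold."""
    if score is None:
        return Decision("pass")  # not yet scored: default to ALLOW
    if score >= threshold:
        return Decision("pass")  # equal to or above threshold passes
    if mode == "warn":
        # Header value format is illustrative only.
        warning = f"score={score}; threshold={threshold}"
        return Decision("warn", headers={"X-DataDam-Trust-Warning": warning})
    if mode == "block":
        audit = {"score": score, "threshold": threshold, "components": components}
        return Decision("block", http_status=403, audit=audit)
    raise ValueError(f"unknown enforcement mode: {mode}")
```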

What we did not build.

Five things we deliberately rejected during design. Each one is the kind of feature someone will ask for, and each one is the kind of feature that turns the score into a suggestion.

We did not add a component-weighting knob. The formula is equal-weighted by design. If a customer cares more about freshness than ownership, they should set thresholds per source, not reweight the score globally.

We did not add a model-based "anomaly" component. The score is descriptive of source state, not predictive. Anomaly detection is a separate signal that runs on the audit log, not a component of the trust score.

We did not add a decay function. The rolling window is the time axis. A score that decays without new evidence is a fiction. A score recomputed against a rolling window is a measurement.

We did not add a global "best practices" component that would compare the source to a platonic ideal. The trust score is comparable across customers only at the margins; it is most useful when used by one customer to compare two of their own sources to each other.

We did not add an LLM-based explanation. When the score deviates, the proxy returns the component breakdown. The user reads the breakdown. There is no narrative layer.

What it took to get here.

About six weeks of engineering. The cron is one Lambda. The current-state and history tables together are 50 lines of SQL. The proxy push path adds about 200 lines on the TypeScript side and 80 lines on the Python side. The hardest parts were the SECURITY DEFINER function for cross-org enumeration and the append-on-change-only history table; the rest is bookkeeping.

That is the right ratio. A trust score whose underlying engineering is harder to review than the score itself is a trust score that has lost the plot. Read the migrations on GitHub if you want to see the production code.
