Worked-example report v1 / Delimit team / 2026-06-17 / self-governance

We don't just sell the merge gate. We ship our own releases through it.

A real release, through our own merge gate.

On 2026-06-17 we shipped delimit-cli@4.11.0 and then delimit-cli@4.11.1 (compiled Pro engine, proModuleVersion 3.10.0). Both are installable from npm now. Every merge that went into the release passed through Delimit's own governance: multi-model deliberation, signed evidence, and a deploy-gate chain. Not a demo.

  • What we shipped: delimit-cli 4.11.0 → 4.11.1
  • How it was reviewed: 5 PRs, 3 models each
  • Security audit: 0 / 0 / 0 / 0 (crit / high / med / low)
  • Verdict: shipped; 1 strategic held

The dogfooding point. The other reports in this series run the merge gate against external API surfaces. This one points it at ourselves. The external promise is the merge gate for AI-written code, with a signed, replayable attestation. The test of that promise is whether we hold our own releases to it. On 2026-06-17 we did: five pull requests, each adjudicated by three independent models; a clean security audit with a signed evidence bundle; a deploy-gate chain run before each publish; compiled-engine guards re-asserted per artifact; and post-ship verification that a free user is gated on both the Python-fallback path and the live compiled-.so path. The artifact a customer hands an auditor is the same kind of artifact this release produced for us.
Why a self-governance report. External reports answer whether the gate works on someone else's API. This one answers a harder question: when we ship a real, paid, customer-facing release, does the same governance run on us?
  • The release is real and live: npm install -g delimit-cli installs 4.11.1 today, with the compiled Pro engine at proModuleVersion 3.10.0.
  • Each of the five merges was adjudicated by Claude Opus 4.8, GPT-5.3-codex, and Grok 4.3 independently, under the deliberation-as-review pattern.
  • Each deliberation's full transcript is committed to the audit trail (e.g. 2026-06-17-pr237-pr238-merge-review.md) and its path is embedded in the merge commit body.

What we did

We took five pull requests that needed to ship for the 4.11.0 / 4.11.1 release and ran each through Delimit's own governance before merge. The delimit-ai org has a single human contributor, so the GitHub review-required check assumes a second reviewer that does not exist. Under the deliberation-as-review pattern, a fresh unanimous multi-model verdict over the actual diff substitutes for that second human reviewer — but only for operational, infracore-authored, green-CI changes, and never for pricing, naming, public-facing copy, or doctrine.
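The scope rule above can be sketched as a small eligibility check. This is a minimal illustration, not Delimit's implementation: the Change fields and the category labels are hypothetical stand-ins for however the real gate classifies a pull request.

```python
from dataclasses import dataclass

# Categories the report says deliberation-as-review never covers.
# The string labels themselves are illustrative assumptions.
EXCLUDED_CATEGORIES = {"pricing", "naming", "public-facing copy", "doctrine"}


@dataclass(frozen=True)
class Change:
    category: str             # e.g. "operational" (hypothetical label)
    infracore_authored: bool  # authored by infracore, per the report's wording
    ci_green: bool            # CI passed on the PR head


def eligible_for_deliberation_as_review(change: Change) -> bool:
    """Return True only when the carve-out described above applies."""
    if change.category in EXCLUDED_CATEGORIES:
        return False
    return (
        change.category == "operational"
        and change.infracore_authored
        and change.ci_green
    )
```

Anything that fails this check would still need a human decision; the sketch only decides whether a panel verdict is allowed to stand in for the second reviewer at all.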

For each PR we ran a multi-model deliberation over the diff and the PR description with three independent models: Claude Opus 4.8, GPT-5.3-codex, and Grok 4.3. The verdict had to be unanimous to merge; a non-unanimous result stops the auto-merge and surfaces to the founder. Every transcript was committed to the audit trail, and the transcript path was written into the merge commit body so the substitute review is reconstructable from git history alone, not asserted.
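The unanimity rule is simple enough to state as code. The sketch below assumes verdicts arrive as a mapping from model name to a verdict string; "AGREE" is the value the report uses, while the function name, the "hold" return value, and any other verdict string are assumptions made for illustration.

```python
from typing import Mapping

REQUIRED_MODELS = 3  # the report uses three independent models per PR


def decide(verdicts: Mapping[str, str]) -> str:
    """Return "merge" only on a unanimous AGREE from the full panel.

    Anything else (a dissent, a missing model, an unknown verdict string)
    returns "hold", which in the report's terms stops the auto-merge and
    surfaces the PR to the founder.
    """
    if len(verdicts) < REQUIRED_MODELS:
        return "hold"
    if all(v == "AGREE" for v in verdicts.values()):
        return "merge"
    return "hold"
```

Note that a missing verdict is treated the same as a dissent: the default outcome is a hold, and only a complete, unanimous panel can change it.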

Before each npm publish we ran the full deploy-gate chain — security audit, tests, changelog, then the authoritative publish.yml dry-run, then the tag. The compiled Pro engine build re-asserted its ungate and bypass-identifier guards on every artifact, and the published tarball checksum was verified against the compiled artifact. After the release went live we verified a free user is gated end to end on both the Python-fallback path and the live compiled-.so path.
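The ordering of the chain matters as much as its contents. A minimal sketch of a fail-closed sequence follows; the step names come from the paragraph above, but run_chain and the placeholder checks are hypothetical and do not reflect how publish.yml is actually wired.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_chain(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order and stop at the first failure.

    Returns (all_passed, names_of_steps_that_ran).
    A step that raises is treated as a failure, so errors fail closed.
    """
    ran: List[str] = []
    for name, check in steps:
        ran.append(name)
        try:
            ok = check()
        except Exception:
            ok = False
        if not ok:
            return False, ran
    return True, ran


# Step order taken from the report; the lambdas are placeholders
# standing in for the real checks.
chain: List[Step] = [
    ("security audit", lambda: True),
    ("tests", lambda: True),
    ("changelog", lambda: True),
    ("publish.yml dry-run", lambda: True),
    ("tag", lambda: True),
]
```

Because the tag is the last step, a failure anywhere earlier means the list of steps that ran never includes it, which is the property the chain exists to guarantee.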

Headline numbers

  • Pull requests adjudicated: 5 (all unanimous AGREE)
  • Models per deliberation: 3 (independent)
  • Security audit (shipped code): 0 / 0 / 0 / 0 (crit / high / med / low)
  • Versions published: 4.11.0 / 4.11.1 (engine 3.10.0)

The three models on every deliberation were Claude Opus 4.8, GPT-5.3-codex, and Grok 4.3. The five pull requests, by verdict: PR #235 + #134 (leaky-gate revenue-consistency fix + Golden-Path docs) — unanimous AGREE; PR #236 (security_audit duplicate-finding dedup) — unanimous AGREE; PR #237 + #238 (Twitter freshness gate + Pro-gate denial telemetry) — unanimous AGREE. The security audit on the exact shipped code came back clean across every band, with a signed evidence bundle collected. Both publishes ran the deploy-gate chain end to end, and post-ship a free user was confirmed gated on both the Python-fallback and the live compiled-.so path.

Not everything reached consensus, and that is in the record on purpose. A separate strategic/roadmap deliberation the same day hit MAX-ROUNDS with no consensus and was correctly held — not shipped. During the audit work a model hallucinated a phantom security finding that was caught and corrected before it could enter the evidence bundle. Governance that fails closed and catches confabulation is more credible than governance that is always unanimous.
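The held deliberation is easiest to understand as a bounded loop with no tie-break. The sketch below is an assumption about the shape of such a loop, not the shipped engine: ask_panel, the result dictionary, and the round limit are all hypothetical.

```python
from typing import Callable, Dict, List


def deliberate(
    ask_panel: Callable[[int], Dict[str, str]],
    max_rounds: int,
) -> Dict[str, object]:
    """Run up to max_rounds; return a hold if no round is unanimous."""
    history: List[Dict[str, str]] = []
    for round_no in range(1, max_rounds + 1):
        verdicts = ask_panel(round_no)
        history.append(verdicts)
        if verdicts and all(v == "AGREE" for v in verdicts.values()):
            return {"status": "consensus", "rounds": round_no, "history": history}
    # No tie-break and no majority vote: running out of rounds is a stop.
    return {
        "status": "held",
        "reason": "MAX-ROUNDS",
        "rounds": max_rounds,
        "history": history,
    }
```

The history is returned on both paths, so a hold keeps every round's dissent available to whoever picks the question up next.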

Findings

3 multi-model adjudications, 4 deploy-gate steps, 1 fail-closed observation. Each finding cites the exact pull requests, the models that adjudicated, and what the gate recorded. There is no "breaking" partition on this report because it is not an API diff; it is the record of how a real release moved through the gate.

  1. multi-model adjudication (finding F1)
    change type: deliberation_as_review (PR #235 + #134 — unanimous AGREE)
    surface: leaky-gate revenue-consistency fix (delimit-ai/delimit-mcp-server#235) + Golden-Path docs (delimit-ai/delimit-gateway#134)

    The first pair of merges closed a revenue-consistency gap in the Pro gate and shipped the Golden-Path onboarding docs. The delimit-ai org has one human contributor, so a single GitHub review-required check cannot be satisfied the way a multi-person team satisfies it. Under the deliberation-as-review pattern, a fresh unanimous multi-model verdict over the actual diff substitutes for the second human reviewer. Three independent models — Claude Opus 4.8, GPT-5.3-codex, and Grok 4.3 — read the diff and the PR description and each returned AGREE. The verdict was unanimous, the full transcript was committed to the audit trail, and the transcript path was embedded in the merge commit body so the substitute review is reconstructable from history rather than asserted in passing.

  2. multi-model adjudication (finding F2)
    change type: deliberation_as_review (PR #236 — unanimous AGREE)
    surface: security_audit duplicate-finding dedup (delimit-ai/delimit-mcp-server#236)

    The second merge deduplicated repeated findings in the security-audit output so a single underlying issue is reported once rather than fanned out across every file it touches. The same three models adjudicated the diff independently and each returned AGREE. The dedup change is small and bounded, which is exactly the kind of operational, infracore-authored, green-CI change the deliberation-as-review carve-out is scoped to: it is not pricing, naming, public-facing copy, or doctrine, so a unanimous panel verdict over the diff is a valid stand-in for the missing second reviewer. Transcript committed, path embedded in the merge commit.

  3. multi-model adjudication (finding F3)
    change type: deliberation_as_review (PR #237 + #238 — unanimous AGREE)
    surface: Twitter freshness gate + Pro-gate denial telemetry (delimit-ai/delimit-mcp-server#237, #238)

    The third pair added a freshness gate to the social auto-post path and denial telemetry to the Pro gate so a free user hitting a Pro-only surface is recorded as a measurable denial rather than a silent failure. Three models adjudicated the combined diff and each returned AGREE. The deliberation transcript for this pair is committed at a named path in the audit trail (2026-06-17-pr237-pr238-merge-review.md) and that path is written into the merge commit body. The point of embedding the path is concrete: a future auditor reading git history alone can locate the exact transcript that stood in for a second reviewer on this specific merge, open it, and read the three models reasoning to consensus. The review is not a claim in the commit message; it is a pointer to the evidence.

  4. deploy gate (finding F4)
    change type: security_audit (0 critical / 0 high / 0 medium / 0 low) + signed evidence bundle
    surface: the exact shipped delimit-cli@4.11.0 / 4.11.1 source tree

    Before each publish, a security audit ran against the exact code that would ship — not a representative sample, the tree itself. The audit came back clean across every severity band: zero critical, zero high, zero medium, zero low. A signed evidence bundle was collected for the run, so the clean verdict is an artifact a reader can verify rather than a sentence in a changelog. The discipline here is the part worth naming: a clean audit is as much an attestable artifact as a failing one. The bundle records what was scanned, what version of the scanner ran, and the empty findings set, signed, so the absence of findings is itself recorded evidence.

  5. deploy gate (finding F5)
    change type: deploy_gate_chain (security audit -> tests -> changelog -> dry-run -> tag)
    surface: each npm publish of delimit-cli@4.11.0 and delimit-cli@4.11.1

    npm publish is a production deploy: every publish goes to real users. Each one ran the full deploy-gate chain in order — security audit, then the test suite, then the changelog, then the authoritative publish.yml dry-run, then the tag. The dry-run is the gate that matters most because it exercises the real publish workflow against the real bundle without shipping, so a packaging defect surfaces before the tag rather than after the artifact is on the registry. Only after the dry-run passed clean was the version tag pushed to trigger the real publish. The chain is the same fail-closed shape Delimit ships externally: every gate must pass in sequence, and any failure stops the release before it reaches a user.

  6. deploy gate (finding F6)
    change type: compiled_engine_guards (ungate + bypass-identifier re-assert per artifact) + tarball checksum verify
    surface: the compiled Pro engine .so build (proModuleVersion 3.10.0) and the published tarball

    The Pro engine ships as a compiled .so artifact, not as readable source, which means the gating logic inside it cannot be eyeballed in a diff the way the CLI can. So the build re-asserts the guards on every artifact it produces: the ungate check (that the engine is not silently shipping the Pro surface for free) and the bypass-identifier check (that no build-time bypass flag survived into the shipped binary) run against each compiled artifact, not once at the top of the matrix. After the build, the published tarball checksum is verified against the artifact that was actually compiled, so the bytes a customer downloads are the bytes that passed the guards. This closes the gap where a compiled artifact could pass review in source form but ship altered.

  7. deploy gate (finding F7)
    change type: post_ship_verification (free user gated on Python-fallback AND live compiled-.so path)
    surface: the published delimit-cli@4.11.1 install, both execution paths

    After the release was live on npm, the gating was verified end to end against the actual published install — not the local working tree. A free user was confirmed gated on both execution paths the CLI can take: the Python-fallback path (used when the compiled engine is unavailable for the host platform) and the live compiled-.so path (the optimized path the build produces). Verifying both matters because a gate that holds on one path and leaks on the other is a revenue leak that source review alone would not catch; the two paths are different code. Confirming a free user is correctly gated on both, against the shipped artifact, is the post-ship evidence that the pre-ship guards actually held in production.

  8. fail-closed / dissent (finding F8)
    change type: held_no_consensus (MAX-ROUNDS, correctly NOT shipped) + caught_confabulation
    surface: a separate same-day strategic/roadmap deliberation, and a model-hallucinated security finding

    On the same day, two things did not go to plan, and both are reported here because governance that fails closed and catches confabulation is more credible than governance that is always unanimous. First: a separate strategic/roadmap deliberation hit MAX-ROUNDS with no consensus. It was correctly held: not shipped, not forced through, not resolved by a tie-break that pretended at agreement. A no-consensus result is a stop, not a go. Second: during the audit work a model hallucinated a phantom security finding, an issue that did not exist in the code. It was caught and corrected rather than written into the evidence bundle. The five operational merges above shipped on genuine unanimous verdicts over real diffs; this strategic deliberation did not reach one, so nothing shipped from it. The gate that blocks your release when the models disagree is the same gate that blocks ours.
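The checksum step in finding F6 can be illustrated with a short digest comparison. This is a sketch under assumptions: the report does not name the hash algorithm or say how the two files are obtained, so SHA-256 and both function names are illustrative choices.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tarball_matches(built: Path, published: Path) -> bool:
    """True when the published bytes are identical to the built bytes."""
    return sha256_of(built) == sha256_of(published)
```

A mismatch here would mean the bytes on the registry are not the bytes that passed the guards, which is the condition the verification exists to rule out.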

What this demonstrates

The moat is three things, and this release shows each of them rather than asserting it.

  1. An adjudicated-evidence corpus. Every enforced decision on this release — each unanimous verdict, the clean security audit, each deploy-gate pass — was captured as a record and is replayable. The verdicts are not summarized in a changelog; the transcripts are committed and their paths are in the merge commits. The record grows with every release that goes through the gate.
  2. Multi-model deliberation. Three independent models reached consensus on each of the five merges, and where they did not reach consensus — the separate strategic deliberation — the result was a hold, surfaced with the dissent intact rather than averaged away. Disagreement is a signal the system preserves, not noise it suppresses.
  3. Signed, replayable attestation. The clean security audit was collected as a signed evidence bundle. That is the audit-grade artifact a customer hands an auditor, an LP, or a D&O underwriter: a record they can verify without rerunning the gate and without taking anyone's word for the verdict.

Hold all three and the claim is not marketing — it is the same governance that ran on our own release, available to run on yours.
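To make "a clean audit is a signed artifact" concrete, here is a minimal sketch of signing and verifying a bundle with an empty findings set. The report does not describe the signing scheme, so HMAC-SHA256 with a shared key is only a stand-in (a real attestation would more plausibly use an asymmetric signature), and the bundle field names and scanner version are hypothetical.

```python
import hashlib
import hmac
import json


def sign_bundle(bundle: dict, key: bytes) -> str:
    # Canonical JSON so the same bundle always yields the same bytes.
    payload = json.dumps(bundle, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_bundle(bundle: dict, signature: str, key: bytes) -> bool:
    # Constant-time comparison of the recomputed and supplied signatures.
    return hmac.compare_digest(sign_bundle(bundle, key), signature)


# Illustrative bundle: what was scanned, which scanner ran, and the
# empty findings set. Field names and scanner_version are placeholders.
bundle = {
    "target": "delimit-cli@4.11.1",
    "scanner_version": "0.0.0-example",
    "findings": [],
    "counts": {"critical": 0, "high": 0, "medium": 0, "low": 0},
}
```

The point of the sketch is that verification does not rerun the scan: a reader checks the signature over the recorded result, and any edit to the findings set invalidates it.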

What this report is not

Not a claim that the gate is infallible. The opposite, in fact: the most credible line in this report is that a strategic deliberation the same day failed to reach consensus and was held, and that a model hallucinated a security finding that the process caught and threw out. A gate that always agrees with itself is not a gate. The five operational merges shipped on genuine unanimous verdicts over real diffs; the strategic question did not reach one, so nothing shipped from it. That is the system working, not the system failing.

Not a claim of a live shareable attestation link for these specific merges. The attestation producer for this release class is not built yet, so there is no public /att/<id> URL to point at for any of the five pull requests (#134 and #235 through #238). What a signed, replayable attestation looks like end to end, including the signed verdict and the public transparency-log entry, is shown in the signed-attestation walkthrough. This report is the governance-of-our-own-release surface of the same primitive that powers the merge gate for AI-written code.

The attestation artifact

The audit-grade record this release produced is the clean security audit collected as a signed evidence bundle, plus the five committed deliberation transcripts whose paths are embedded in the merge commits. Read together, they let any reader reconstruct how the release moved through the gate: which models adjudicated each merge, what they concluded, that the shipped code scanned clean, and that the deploy-gate chain passed in order before each publish. A clean verdict is as much an artifact as a fail; a clean verdict backed by a signed bundle and five reconstructable transcripts is the dense form.
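Reconstructing a review from git history depends on the transcript path being machine-findable in the merge commit body. The sketch below assumes a trailer-style line; the "Deliberation-Transcript:" key is a hypothetical convention, since the report says only that the path is embedded in the body, not how it is formatted.

```python
import re
from typing import Optional

# Hypothetical convention: one line in the commit body of the form
#   Deliberation-Transcript: <path-to-transcript>.md
TRAILER = re.compile(r"^Deliberation-Transcript:\s*(\S+)\s*$", re.MULTILINE)


def transcript_path(commit_body: str) -> Optional[str]:
    """Return the transcript path named in a commit body, or None."""
    match = TRAILER.search(commit_body)
    return match.group(1) if match else None
```

An auditor could feed this the output of git log for a merge commit and then open the returned file; a commit with no pointer returns None, which is itself worth flagging.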

For what the fully-signed, transparency-logged attestation looks like on a single PR, the form a customer hands an auditor, see the signed-attestation walkthrough.

Reproduce locally

The release itself is the most reproducible part: install it and verify the version and the Pro-gate behavior yourself.

# Install the shipped release
npm install -g delimit-cli

# Confirm the published version
delimit --version          # 4.11.1

# Run it on your own repo — the diff + lint + gate is free
npx delimit-cli init

# A free user hitting a Pro-only surface is gated, by design:
delimit deliberate         # prompts to upgrade; multi-model
                           # deliberation is the Pro tier

The merge gate (diff, lint, semver classification) is free and runs on your repo with no account. The multi-model deliberation and the signed, replayable attestation are the Pro tier — the same Pro surface that gated correctly on both execution paths in this release's post-ship verification.

Run it on your repo

Start with npx delimit-cli init. The merge gate for AI-written code is free for individual maintainers; the GitHub Action is on the Marketplace at delimit-ai/delimit-action. The multi-model deliberation and the signed, replayable attestation are Pro.

The signed, replayable attestation is the artifact your reviewers, auditors, or underwriters can read without rerunning the gate. The same governance that shipped delimit-cli@4.11.0 / 4.11.1 is the governance you can point at your own releases.