Worked-example report v1 / Delimit team / 2026-05-08

What 76 days of Anthropic API evolution looks like under a cross-vendor merge gate

A real cross-vendor merge gate, end to end.

Delimit ran the same diff engine and 27-type taxonomy against the public Anthropic OpenAPI spec between SDK-pinned snapshots 7ecac54 (2026-02-19) and 2c15407 (2026-05-06). The result: 88 surface deltas and a major semver verdict. The spec's info.version field is absent because the version contract is carried in the anthropic-version header.

What we scanned: Anthropic API (SDK-pinned OpenAPI)
What we found: 34 breaking · 42 additive (88 surface deltas)
Time window: 76 days (7ecac54 → 2c15407)
Semver verdict: major (info.version absent)
The cross-vendor point. Anthropic is one of the four model vendors named on the delimit.ai homepage, and report #6 in this series ran the same gate against the OpenAI public OpenAPI. Pairing the two reports under one taxonomy demonstrates cross-vendor symmetry: the same diff engine, the same 27-type change taxonomy, and the same signed attestation run against either vendor's public spec without a vendor-specific code path. The merge gate does not care which AI lab publishes the spec. The same primitive that gates a pull request to your service runs here against the API for the model that gates the pull request.
Why a worked example. A public report against a recognizable AI-lab API is the cleanest way to show what the gate sees on a fast-moving contract.
  • We use the SDK-pinned form (content-addressed Stainless storage URLs) on both ends so the input does not move.
  • No relationship to Anthropic; no coordination on this report; the public spec is the only input.
  • The byte-identical inputs are the right shape for a reproducible attestation.

What we did

We cloned github.com/anthropics/anthropic-sdk-python and walked the git history of .stats.yml, which the Stainless code generator updates on every spec push with the current openapi_spec_url, openapi_spec_hash, and configured_endpoints count.

The earliest commit on that file in our window (the second half of Q1 2026) is 7ecac54 (2026-02-19, feat: Add top-level cache control), pinned at configured_endpoints: 34 and content hash 29a6b7ba. The most recent commit at the time of this report is 2c15407 (2026-05-06, fix(api): Adjust webhook configuration), pinned at configured_endpoints: 97 and content hash 0df2c793.

We pulled both content-addressed Stainless storage URLs and used each as one side of the diff. The pair was passed to delimit lint (old spec first, new spec second) in its standard configuration. The diff engine classified each change against its 27-type taxonomy, and the semver classifier produced a bump recommendation.
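The history walk can be sketched in shell. This is a minimal sketch, not the tooling we used: it assumes a local clone of anthropic-sdk-python, Git 2.25 or newer (for the %as short-date format), and that .stats.yml keeps its flat "key: value" layout. The helper names stats_field and list_spec_pins are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not Delimit tooling: walk the pinned-spec history
# recorded in a Stainless-generated SDK's .stats.yml.

# Print the value of a top-level "key: value" line read from stdin,
# with any surrounding quotes stripped.
stats_field() {
  sed -n "s/^$1:[[:space:]]*//p" | tr -d "\"'"
}

# For every commit that touched .stats.yml (newest first), print
#   <short sha> <author date> <configured_endpoints> <openapi_spec_url>
# Run from the root of a clone of github.com/anthropics/anthropic-sdk-python.
list_spec_pins() {
  local sha date stats endpoints url
  git log --format='%h %as' -- .stats.yml |
  while read -r sha date; do
    stats=$(git show "$sha:.stats.yml")
    endpoints=$(printf '%s\n' "$stats" | stats_field configured_endpoints)
    url=$(printf '%s\n' "$stats" | stats_field openapi_spec_url)
    printf '%s %s %s %s\n' "$sha" "$date" "$endpoints" "$url"
  done
}
```

Run list_spec_pins from the clone root and pick the two rows that bound your window; the URL column is the input for each side of the diff.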

Headline numbers

Total surface deltas: 88
Breaking: 34
Operations (old / new): 47 / 97
Semver bump: major

The path count went from 31 to 61 (+30 paths), the operation count from 47 to 97 (+50 operations), and the schema count from 430 to 733 (+303 schemas). The diff engine classified 88 changes: 34 breaking, 42 additive, 0 patch. The largest single category is 37 endpoint_added events, every one of them under ?beta=true on six new product surfaces (Managed Agents, Sessions, Memory Stores, Vaults, Environments, User Profiles). The breaking partition is 7 endpoint_removed events and 15 field_removed events, all on the GA (non-beta) Files and Skills surface that narrowed back to beta-only (see findings F2 and F6), plus 12 required_field_added events (finding F4). The info.version field is absent on both ends: Anthropic carries its version contract in the anthropic-version request header rather than in the OpenAPI document. That is a clean, spec-neutral choice, and it means the gate's own semver classification (major) is the only version signal the diff produces.
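The headline counts reconcile with the per-finding counts below. A back-of-envelope check using only numbers stated in this report, assuming each path counted in F1 and F2 is a distinct path key:

```shell
#!/usr/bin/env bash
# Reconcile the headline tallies using only counts stated in this report.
old_paths=31
added_paths=37     # endpoint_added, all ?beta=true (finding F1)
removed_paths=7    # endpoint_removed, GA Files and Skills (finding F2)
new_paths=$(( old_paths + added_paths - removed_paths ))

endpoint_removed=7
field_removed=15          # finding F6
required_field_added=12   # finding F4
breaking=$(( endpoint_removed + field_removed + required_field_added ))

echo "paths: $old_paths -> $new_paths"   # prints: paths: 31 -> 61
echo "breaking: $breaking"               # prints: breaking: 34
```

Both results match the headline figures: net +30 paths, and a breaking partition made up entirely of the three breaking findings.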

Findings

3 breaking, 4 additive, 0 flagged as spec hygiene. Each finding cites the exact change type from the 27-type taxonomy, the surface affected, and the consumer impact. Two of the three breaking findings (F2 endpoint_removed and F6 field_removed) are the same underlying story told from both sides: the GA Files and Skills surface narrowed back to beta-only in the window, and the gate flags both the path side and the schema side of that contraction.

  1. additivefinding F1
    change type: endpoint_added (37 new beta paths across six new product surfaces)
    surface: /v1/agents, /v1/sessions, /v1/sessions/{session_id}/threads, /v1/sessions/{session_id}/events, /v1/memory_stores, /v1/memory_stores/{memory_store_id}/memories, /v1/memory_stores/{memory_store_id}/memory_versions, /v1/vaults, /v1/vaults/{vault_id}/credentials, /v1/environments, /v1/user_profiles (all under ?beta=true)

    Thirty-seven new paths landed in the window, every one of them a ?beta=true path variant. The shape of the additions is the Anthropic agent-platform stack arriving in plain text: a Managed Agents surface (agents, agent versions, agent archive), a Sessions surface for stateful multi-turn conversation context (sessions, session threads, session events with both poll and stream variants, session resources), a Memory Stores surface for content-addressed long-term memory with versioning and per-version redact (memories, memory versions, memory version redact), a Vaults surface for credential brokerage including an mcp_oauth_validate operation (vaults, credentials, archive), an Environments surface for sandbox configuration, and a User Profiles surface with an enrollment_url generator. Each surface has its full create / list / retrieve / update / archive lifecycle modeled. The diff engine flags every one of these as endpoint_added with is_breaking=false. What this finding gives a downstream consumer is the exact inventory of beta paths it can begin to integrate against.

  2. breakingfinding F2
    change type: endpoint_removed (7 GA paths narrowed back to beta only)
    surface: /v1/files, /v1/files/{file_id}, /v1/files/{file_id}/content, /v1/skills, /v1/skills/{skill_id}, /v1/skills/{skill_id}/versions, /v1/skills/{skill_id}/versions/{version}

    Seven paths were removed in the window. On inspection, every one of them is the GA (non-beta) variant of a path whose ?beta=true twin still exists on both ends of the window. Files (list, retrieve, retrieve content, delete) and Skills (CRUD plus versions CRUD) were each modeled both ways in the older snapshot; the newer snapshot keeps the ?beta=true variants and drops the un-headered GA variants. The wire-shape reading is that Files and Skills retreated from a soft-GA stance to a beta-only stance: the spec no longer describes calling these surfaces without the anthropic-beta header, so a consumer that relies on the GA form should expect those calls to fail, and the migration is to add the appropriate beta header to the request. The merge gate flags each removal as is_breaking=true regardless of stability tier, which is the correct verdict for a wire change of this shape. The gate output is what makes the contraction visible without diffing 30,000 lines of YAML.

  3. additivefinding F3
    change type: enum_value_added (advisor tool, xhigh effort) plus Model union additions
    surface: #/components/schemas/BetaRequestServerToolUseBlock.name, #/components/schemas/BetaResponseServerToolUseBlock.name, #/components/schemas/BetaEffortLevel, #/components/schemas/EffortLevel, plus the Model anyOf union (claude-opus-4-7, claude-mythos-preview, claude-opus-4-1)

    Two new values landed on closed enums, each on a pair of schemas: advisor on the server-tool-use name field of both the request and response blocks (a new beta server-side tool the model can call), and xhigh on EffortLevel and BetaEffortLevel (a new ceiling above the prior high tier). Both are non-breaking by the rules of enum widening. Separately, the Model schema is an anyOf [string, const ...] union whose string branch still accepts any model identifier; the const list shifted across the window to add claude-opus-4-7 (the current frontier slug, on the new side), claude-opus-4-1, and claude-mythos-preview, and to drop a set of older const entries (claude-3-5-haiku, claude-3-7-sonnet, claude-3-opus, the dated 2025-05 opus and sonnet 4 slugs). Because the union keeps the open-string branch, dropping a const is not a wire break: the spec still accepts the older slug as a string, and the server-side end-of-life policy (anthropic-version header plus model deprecation schedule) governs whether the call succeeds. The diff engine correctly does not flag the const-list churn as enum_value_removed; the version-stability primitive on Anthropic is the request header, not the schema constraint.

  4. breakingfinding F4
    change type: required_field_added (12 new required fields across Message, ModelInfo, MessageDelta, BetaCompactionContentBlockDelta, BetaResponseCompactionBlock)
    surface: #/components/schemas/Message.stop_details, #/components/schemas/MessageDelta.stop_details, #/components/schemas/BetaMessage.stop_details, #/components/schemas/BetaMessageDelta.stop_details, #/components/schemas/ModelInfo.{max_input_tokens, max_tokens, capabilities}, #/components/schemas/BetaModelInfo.{max_input_tokens, max_tokens, capabilities}, #/components/schemas/BetaCompactionContentBlockDelta.encrypted_content, #/components/schemas/BetaResponseCompactionBlock.encrypted_content

    Twelve fields became required in the window. The largest cluster is stop_details on the Message and MessageDelta shapes (both GA and Beta variants), reflecting the formalization of the structured stop-reason record on message responses (commit a304ccc, 2026-04-01, feat(api): add structured stop_details to message responses). The next cluster is ModelInfo and BetaModelInfo: max_input_tokens, max_tokens, and capabilities all became required, so a consumer that lists models now gets the explicit token-budget contract and capability matrix on every entry. The third cluster is on the Beta compaction-block path: encrypted_content became required on BetaCompactionContentBlockDelta and BetaResponseCompactionBlock, formalizing the encrypted intermediate-state contract for the compaction stream. Code that serializes these shapes without the new keys (test fixtures, mocks, proxies) will fail validation against the new spec; a consumer that regenerates its parser from the new spec knows exactly which response shapes must now carry the new required keys. The visibility of the required-field list in the attestation is the artifact.

  5. additivefinding F5
    change type: deprecated_added (sampling parameters flagged across four request schemas)
    surface: #/components/schemas/CreateMessageParams.{temperature, top_k, top_p}, #/components/schemas/CreateMessageParamsWithoutStream.{temperature, top_k, top_p}, #/components/schemas/BetaCreateMessageParams.{temperature, top_k, top_p}, #/components/schemas/CompletionRequest.{temperature, top_k, top_p}

    Twelve deprecated_added events landed across the request-parameter surface, all on the three classical sampling controls: temperature, top_k, and top_p. Each was flagged with deprecated=true on the GA Messages request, the Messages-without-stream request, the Beta Messages request, and the legacy Completion request. The fields are still accepted on the wire (a deprecation marker is non-breaking by spec convention), and a consumer that sends temperature today keeps working. The likely signal is that the sampling-control surface is being narrowed in favor of the effort level (EffortLevel, including the new xhigh tier from F3), which gives the model a structured intelligence-budget knob rather than a numeric sampling-temperature knob. This is intentional surface evolution: announce the deprecation in the spec on the way to a future surface change. The non-breaking flag is correct, and the visibility of the deprecation is the artifact a downstream client uses to plan its migration.

  6. breakingfinding F6
    change type: field_removed (15 schemas in the Skills and Files components surface)
    surface: #/components/schemas/Skill, #/components/schemas/SkillVersion, #/components/schemas/CreateSkillResponse, #/components/schemas/CreateSkillVersionResponse, #/components/schemas/GetSkillResponse, #/components/schemas/GetSkillVersionResponse, #/components/schemas/ListSkillsResponse, #/components/schemas/ListSkillVersionsResponse, #/components/schemas/DeleteSkillResponse, #/components/schemas/DeleteSkillVersionResponse, #/components/schemas/Body_create_skill_v1_skills_post, #/components/schemas/Body_create_skill_version_v1_skills__skill_id__versions_post, #/components/schemas/FileMetadataSchema, #/components/schemas/FileListResponse, #/components/schemas/FileDeleteResponse

    Fifteen component schemas were removed in the window. Every one of them is the GA (non-Beta-prefixed) Skill, SkillVersion, or File response or request body shape that backed the seven removed GA paths in F2. The Beta-prefixed equivalents (BetaSkill, BetaFileMetadataSchema, and the like) still exist on both ends, which is consistent with the F2 reading that the GA Files and Skills surfaces narrowed back to beta-only. The diff engine flags each removal as is_breaking=true; the wire-shape reading is that a code generator that pinned to the GA Skill type needs to repoint at BetaSkill (and add the anthropic-beta header to the call), and a pinned-to-Beta consumer is unaffected. This is the textbook shape of a stability-tier contraction: the GA surface gets retracted, the beta surface keeps shipping, and the merge gate output names exactly which schema identifiers a downstream client has to update.

  7. additivefinding F7
    change type: optional_param_added (scope_id query parameter on /v1/files?beta=true GET)
    surface: /v1/files?beta=true GET, query parameter scope_id

    One optional parameter landed: scope_id on the GET /v1/files?beta=true list operation. The diff engine flags this as is_breaking=false because optional-parameter additions are non-breaking by spec convention (a client that does not send the parameter keeps the prior server-default behavior). Consumer impact is the new ability to filter the file list by scope identifier; existing list calls keep returning the prior result set unchanged. This is the smallest delta in the report and the cleanest shape: one new optional knob, zero migration cost.
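The F2 reading (a removed GA path whose ?beta=true twin survives on both sides of the window) is mechanical enough to script. A minimal sketch, assuming each spec's path keys have already been extracted one per line (extraction not shown); ga_to_beta_contractions is a hypothetical helper, not how the Delimit engine implements the check.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not the Delimit engine: report paths removed between
# two snapshots whose "?beta=true" twin survives on both sides, i.e. a
# GA-to-beta contraction of the kind described in finding F2.
# $1, $2: newline-separated path keys of the old and new spec.
ga_to_beta_contractions() {
  local old=$1 new=$2
  LC_ALL=C comm -23 <(printf '%s\n' "$old" | LC_ALL=C sort -u) \
                    <(printf '%s\n' "$new" | LC_ALL=C sort -u) |
  while IFS= read -r p; do
    # A removed beta path is an ordinary removal, not a contraction.
    case $p in *'?beta=true') continue ;; esac
    if grep -qxF -- "$p?beta=true" <<<"$old" &&
       grep -qxF -- "$p?beta=true" <<<"$new"; then
      printf 'contraction %s\n' "$p"
    fi
  done
}
```

Run against the two path inventories from this report, a check of this shape would be expected to list the seven GA Files and Skills paths from F2; a removed path with no beta twin is left out because it is a plain removal.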
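The enum rule in finding F3 fits in a few lines: values only on the new side widen the schema, values only on the old side narrow it, and narrowing stops being breaking when an open string branch still accepts the dropped value. A minimal sketch with hypothetical function names and illustrative output labels (they are not Delimit's taxonomy names):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the enum rule described in finding F3; the output
# labels are illustrative, not Delimit's taxonomy names.

# Normalize a newline-separated list: drop blank lines, sort, dedupe.
_value_set() { printf '%s\n' "$1" | sed '/^$/d' | LC_ALL=C sort -u; }

# $1, $2: newline-separated values of the old and new enum (or const list).
# $3: "open" if the schema is an anyOf that also keeps a plain string branch.
classify_enum_change() {
  local old=$1 new=$2 mode=${3:-closed}
  # Values only in the new list widen the schema: never breaking.
  LC_ALL=C comm -13 <(_value_set "$old") <(_value_set "$new") |
    while IFS= read -r v; do printf 'added %s: non-breaking\n' "$v"; done
  # Values only in the old list narrow it: breaking on a closed enum, but an
  # open string branch still accepts the dropped value.
  LC_ALL=C comm -23 <(_value_set "$old") <(_value_set "$new") |
    while IFS= read -r v; do
      if [ "$mode" = open ]; then
        printf 'removed %s: non-breaking (open string branch)\n' "$v"
      else
        printf 'removed %s: breaking\n' "$v"
      fi
    done
}
```

For example, classify_enum_change with old values low, medium, high and new values low, medium, high, xhigh reports only the xhigh addition as non-breaking, which is the EffortLevel case from F3.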
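The F4 failure mode (a fixture or mock that predates the new required keys) can be caught with a set difference. A minimal sketch, assuming you have already listed the schema's required keys from the new spec and the fixture's top-level keys, one per line; missing_required_keys is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list keys the new spec marks required that a fixture,
# mock, or recorded response does not carry (the F4 failure mode).
# $1: newline-separated required keys for one schema in the new spec.
# $2: newline-separated top-level keys present in the fixture.
missing_required_keys() {
  LC_ALL=C comm -23 \
    <(printf '%s\n' "$1" | sed '/^$/d' | LC_ALL=C sort -u) \
    <(printf '%s\n' "$2" | sed '/^$/d' | LC_ALL=C sort -u)
}
```

Passing the three keys F4 lists for ModelInfo (max_input_tokens, max_tokens, capabilities) alongside a fixture's keys prints whichever of them the fixture lacks; an empty result means the fixture already carries them.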

What this report is not

Not a defect claim. Not a security advisory. Not a judgment of the Anthropic team's release process. Anthropic ships a deeply watched API surface against an installed base measured in millions of integrations, under a window that introduced six new beta product surfaces (Managed Agents, Sessions, Memory Stores, Vaults, Environments, User Profiles), the claude-opus-4-7 model tier, the claude-mythos-preview research preview, and a structured stop-reason contract on every message response. The changes flagged above are the textbook shape of an AI-lab API surface evolving fast: 37 new beta paths, 12 new required fields that name the explicit token-budget and capability contract on ModelInfo, 12 deprecation flags on the classical sampling controls (temperature, top_k, top_p) on the way to the new EffortLevel.xhigh tier, and one stability-tier contraction (Files and Skills retracting from GA back to beta-only). The merge gate flagged a major-class semver bump and 34 breaking changes; both reflect the actual shape of the diff.

The findings above do not say Anthropic did anything wrong, and they do not say Anthropic did anything special either. They say: here is exactly which paths landed, which fields became required, which enums widened, which sampling parameters were flagged for deprecation, and which surfaces were narrowed in 76 days. That visibility is the artifact. A downstream consumer who reads this attestation knows exactly which client code paths to update, and an auditor knows exactly what shipped without taking anyone's word for it.

The attestation artifact

A Delimit attestation is a bounded evidence record at a single commit pair (or, as in this report, a pair of content-addressed snapshots pinned by two commits). The same Delimit version run against the same two inputs produces the same bytes; that is the replayable property. The attestation does not opine on whether a change should have shipped, only on what shipped and how the change-type taxonomy classifies it. A clean pass is as much an artifact as a fail; a major-class fail with a 34-line breaking-change inventory is an even denser artifact, because every one of those lines is something a downstream consumer needs to know.
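The replayable property can be spot-checked by running the same command twice and comparing digests of its output. A minimal sketch: replay_check is a hypothetical helper, it assumes GNU coreutils sha256sum (shasum -a 256 on macOS), and it tests only that stdout is byte-identical across runs, not the signature on an attestation or the command's exit status.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command twice and succeed only if both runs
# write byte-identical stdout. Returns 2 if hashing fails.
replay_check() {
  local first second
  first=$("$@" | sha256sum) || return 2
  second=$("$@" | sha256sum) || return 2
  [ "$first" = "$second" ]
}
```

For example, replay_check delimit lint /tmp/anthropic-old.yml /tmp/anthropic-new.yml && echo replayable. A command whose output embeds a timestamp or random value fails the check, which is exactly the property being tested.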

For the precise list of checks, the explicit out-of-scope list, and the reproducibility guarantee, see the attestation methodology v1. This report is the OpenAPI-diff surface of the same primitive that powers the merge gate for AI-written code.

Reproduce locally

Anyone can re-run the analysis above against the same two content-addressed snapshots and verify the same diff comes out. The full command sequence:

```shell
# Install the CLI
npm install -g delimit-cli

# Pull the SDK-pinned old spec (uploaded 2026-02-18, pinned 2026-02-19).
# -f makes curl fail on an HTTP error instead of saving the error page.
curl -fsSL -o /tmp/anthropic-old.yml \
  "https://storage.googleapis.com/stainless-sdk-openapi-specs/anthropic%2Fanthropic-29a6b7ba51942cd606e5bf4b533e5aac1bef42f6d4b1f7f45f756304cf676782.yml"

# Pull the SDK-pinned new spec (uploaded 2026-05-06, pinned 2026-05-06)
curl -fsSL -o /tmp/anthropic-new.yml \
  "https://storage.googleapis.com/stainless-sdk-openapi-specs/anthropic/anthropic-0df2c793ea4c3ad955e8e488be39d7041a0a95e2fe144dd69ae4d9fb72835190.yml"

# Run the merge gate
delimit lint /tmp/anthropic-old.yml /tmp/anthropic-new.yml
```

If the diff you get differs from the one in this report, that is itself a finding worth reporting; raise it on the Delimit repo and we will look. The two URLs above are content-addressed: the hash component of each URL is the SHA-256 of the spec bytes, so a spec with different bytes would live at a different URL. The current SDK-pinned URL is whatever the most recent .stats.yml commit on github.com/anthropics/anthropic-sdk-python references; pulling the head version will produce a different new-side input and therefore a different diff, which is the expected shape of a moving-target attestation.
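Because each URL carries the SHA-256 of the spec bytes, a download can be checked before it is diffed. A minimal sketch, assuming (as stated above) that the 64-hex-digit component of the URL is a plain SHA-256 of the file, and assuming GNU coreutils sha256sum; verify_pinned_spec is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a downloaded spec's SHA-256 matches the
# 64-hex-digit hash embedded in its content-addressed URL.
# $1: the storage URL; $2: the local file it was saved to.
verify_pinned_spec() {
  local expected actual
  expected=$(printf '%s\n' "$1" | grep -oE '[0-9a-f]{64}' | tail -n 1)
  actual=$(sha256sum "$2" | cut -d' ' -f1)
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

For example, verify_pinned_spec "https://storage.googleapis.com/stainless-sdk-openapi-specs/anthropic%2Fanthropic-29a6b7ba51942cd606e5bf4b533e5aac1bef42f6d4b1f7f45f756304cf676782.yml" /tmp/anthropic-old.yml succeeds only if the local bytes hash to the value in the URL. If a genuine download fails the check, the hash scheme is something other than a plain SHA-256 of the bytes.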

For your own API surface

If you ship a public API and want this kind of pre-merge attestation in your CI pipeline, install delimit-cli and run delimit lint <old> <new> against your own specs. The GitHub Action is on the Marketplace at delimit-ai/delimit-action. Free for individual maintainers. Pro tier $10/month for teams.

The signed, replayable attestation is the artifact your reviewers, auditors, or downstream consumers can read without rerunning the gate. A major-class fail with a full breaking-change inventory is worth publishing too; it is what disciplined evolution looks like on a deeply watched API surface.