unsorry

Gate A Statement-Binding Red Team — Round 002 (2026-06-10)

Round 001 (gate-a-redteam-001.md) closed every soundness hole but recorded a deeper limitation: no Gate A layer bound a library theorem’s statement to its claimed canonical goal, so a vacuously-true or mis-stated theorem under a plausible name still passed (sound but meaningless). That gap is exactly what the statement-binding gate (Stage D, ADR-011 / SPEC-011-A, landed in #123 / #128) was built to close.

Round 002 attacks that gate directly. Three adversarial agents each opened a real PR that keeps the goal’s name and ships an honest-building library term whose type is not the goal’s statement, while pointing the index entry at the goal’s real canonical sha — the smuggle is the asymmetry between what the index claims and what the library proves. A fourth PR is an honest control. An independent adjudicator verified every PR’s CI state from gh pr checks / gh run view --log-failed. An attack is “blocked” iff gate-a is RED (cannot merge); the control must be GREEN; a survivor is any attack PR whose gate-a is green.

Verdict

The ADR-011 binding gate holds against every vector. 3/3 attacks blocked, 0 survivors; the honest control passes gate-a. Each attack is caught at the regenerated statement-binding obligation: Gate A lowers the goal’s canonical statement to a Lean obligation <name>_binding_check : <goal-∀-type> := <library-name> and type-checks it. Because the obligation demands the library term inhabit the goal’s type, a vacuous True, a specialised 0 ≤ 0, and an autoImplicit-vacuous ∀ {p} {n}, p → ¬p → 0 ≤ n all fail to type-check — the meaningfulness gap recorded in round 001 is now gated, not merely reviewed.

Per-vector results

Vector	PR	gate-a	Caught by	Evidence (the binding obligation rejected the smuggle)
vacuous-true	#124	❌ fail	statement-binding obligation (built under `--wfail`)	`✖ [575/591] Building Unsorry.NatMulOneThmBinding` → `nat_mul_one_thm has type True but is expected to have type ∀ (n : Nat), n * 1 = n` → Lean exited 1 (run 27300731227)
weakened-hyp	#125	❌ fail	statement-binding obligation (built under `--wfail`)	`✖ [595/600] Building Unsorry.NatZeroLeBinding` → `nat_zero_le_thm has type 0 ≤ 0` (expected `∀ (n : Nat), 0 ≤ n`) → build failed (run 27300741087)
autoimplicit-vacuity	#127	❌ fail	statement-binding obligation (built under `--wfail`)	`✖ [594/600] Building Unsorry.NatZeroLeBinding` → `nat_zero_le_thm has type ?m.5 → ¬?m.5 → 0 ≤ ?m.6` (expected `∀ (n : Nat), 0 ≤ n`) → build failed (run 27300765050)

Control:

Vector	PR	gate-a	Result	Evidence
control-correct	#126	✅ pass	honest proof not false-rejected	`nat_mul_one_thm := Nat.mul_one n`; binding obligation regenerated to `∀ (n : Nat), n * 1 = n` and type-checked: `✔ [589/600] Built Unsorry.NatMulOneThmBinding`; gate-a passed (run 27300751229, 5m30s)

All four PRs are OPEN, unmerged, autoMerge=null — verified, nothing merged or closed by the adjudicator.

Why the binding step is the load-bearing layer here

Every attack module built fine on its own (✔ Built Unsorry.NatMulOneThm, ✔ Built Unsorry.NatZeroLe) — these are honest, sound Lean terms; round-001’s sorry/axiom layers have nothing to bite on. The defense is the binding obligation, regenerated by tools.gate_a.check_statement_binding generate from the goal’s canonical AISP statement, then type-checked in the --wfail library build:

The obligation <name>_binding_check : <goal-type> := <library-name> forces the merged term to inhabit the goal’s type, not merely a type. True, 0 ≤ 0, and the autoImplicit metavariable type ?m → ¬?m → 0 ≤ ?m are all distinct from ∀ (n : Nat), …, so elaboration fails with a type mismatch on exactly the tampered module — every other *Binding.lean in all three runs built cleanly, isolating the failure to the smuggle.
The obligation is CI-ephemeral and regenerated from the goal, never read from the PR: no *Binding.lean is committed on any attack branch (verified via git ls-tree). This is the non-bypassable-by-construction property — an attacker cannot ship a forged-passing binding, because Gate A overwrites it from the canonical goal before building.
The index goal/sha linkage that round 001 called advisory is now load-bearing: the binding generator keys off the goal the index entry claims, so claiming the goal’s real sha is precisely what summons the obligation that catches the mismatched body.

The autoimplicit-vacuity vector (#127) is the explicit replay of round 001’s PR-#64 survivor class. In round 001 it was only a soundness-clean vacuity that check_library_options had to catch as a forbidden option; in round 002 the binding obligation rejects it one step earlier on type-mismatch (?m → ¬?m → 0 ≤ ?m ≠ ∀ (n : Nat), 0 ≤ n). check_library_options is sequenced after the --wfail build and so never ran (the build halted the job first); the agent verified locally that it would also exit 1 on the payload, so the defense is layered, not single-point.

Caveats (declared, not silently estimated)

gate-b also failed on all three attacks and on the control. This is orthogonal to the gate under test. On the control it is a real Gate B invariant, not a binding result: GB005 rejects goals/nat-mul-one-thm.aisp: status≡proved requires sha as 64 lowercase hex — the red-team task flipped status to proved while leaving the goal record’s content-address sha≜∅, which Gate B’s validator rejects (a correctly-merged proof, e.g. nat-mul-comm-thm.aisp, also carries the sha). gate-b’s state does not change any verdict here: the authoritative result for every vector is gate-a’s binding/build outcome, and the control’s gate-a (the gate under test) is green.
The failing CI step is literally named “Library build — zero-warning bar (--wfail)”, because the regenerated binding obligations are type-checked inside that build rather than in a separate step. The cause of each failure is the binding obligation (*Binding.lean), as the per-module trace shows. We classify all three as caught by the statement-binding gate.

Reproduce

The three attack PRs (#124 vacuous-true, #125 weakened-hyp, #127 autoimplicit-vacuity) and the control (#126 control-correct) are OPEN/unmerged, branches redteam2/* retained for audit. Each attack branch carries its tampered library/Unsorry/*.lean, an index entry under library/index/<goal-sha>.aisp claiming the goal’s canonical statement, and the flipped goals/*.aisp. Re-running gate-a on any attack branch regenerates the binding obligation from the goal and fails the --wfail build on the corresponding *Binding.lean.

This site is open source. Improve this page.