unsorry

unsorry — status report (2026-06-12)

What the project has actually achieved, stated against verified ground truth rather than ambition. The house style is honesty: each claim is followed by its limit.

One paragraph

In roughly three days (v0.1.0 → v1.6.2, ~195 merged PRs), unsorry went from a coordination skeleton to a swarm that has proved five mathlib-absent results and demonstrated its two load-bearing mechanisms, decomposition and dependency reuse, end-to-end. It has survived its own operational failure modes: three CLI quota outages in one day, now absorbed unattended, and its first gate false negative, which was caught, fixed, and canary-guarded within two hours. It has passed three adversarial red-team rounds, including the CRITICAL hole an external review found, and it has built, and now auto-runs, a policy-compliant pipeline to upstream its novel lemmas to mathlib. The honest frontier is unchanged: every result so far is elementary, the hard-target ceiling is untested, and nothing has landed in the commons yet.

What is true, with verification

The architecture works as designed

The kernel is the only correctness oracle. Every merge passes Gate A (build --wfail, axiom audit against the {propext, Classical.choice, Quot.sound} whitelist, leanchecker kernel replay, a regenerated statement-binding obligation) and Gate B (deterministic AISP hygiene). No human is in the correctness path.
Soundness has been adversarially tested three times and holds: round 001 (9/9 bypass vectors blocked), round 002 (3/3 statement-vacuity attacks blocked by the binding gate), and round 003 (the #190 same-PR goal-tampering attack blocked by goal-statement immutability — the only layer that catches a fully self-consistent rewrite).
The gate’s one observed defect failed in the sound direction (#231). A statement whose named hypothesis follows an implicit binder, such as {n : ℕ} (hn : 1 < n), made the regenerated binding obligation trip linter.unusedVariables under --wfail, so Gate A rejected every correct proof of that goal: a false negative, never a false positive. The fix (#225) scopes the lint suppression to the generated bindings, whose force is the type-check, not lints. It is pinned by binder-shape-canary (#233), a permanent sound goal carrying exactly that binder shape, whose binding the gate regenerates and rebuilds every run, so a regression now goes red at the gate, not on a contributor’s PR.
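The axiom-audit step of Gate A can be pictured as a set comparison against the whitelist. The sketch below is an illustration only, not the project's gate code: it assumes the audit reads the text Lean 4 prints for `#print axioms`, and the function names `parse_axioms` and `disallowed_axioms` are hypothetical. Unrecognized output raises instead of passing, mirroring the fail-closed behaviour described above.

```python
import re

# Whitelist quoted from the report; everything else in this block is illustrative.
ALLOWED_AXIOMS = frozenset({"propext", "Classical.choice", "Quot.sound"})


def parse_axioms(report: str) -> set[str]:
    """Extract axiom names from the output of `#print axioms` for one declaration.

    Assumed shapes (the usual Lean 4 messages):
      'foo' depends on axioms: [propext, Quot.sound]
      'foo' does not depend on any axioms
    """
    if "does not depend on any axioms" in report:
        return set()
    match = re.search(r"depends on axioms:\s*\[(.*?)\]", report, re.DOTALL)
    if match is None:
        # Fail closed: unreadable output is never treated as a clean audit.
        raise ValueError(f"unrecognized axiom report: {report!r}")
    return {name.strip() for name in match.group(1).split(",") if name.strip()}


def disallowed_axioms(report: str) -> set[str]:
    """Return every axiom in the report that is outside the whitelist."""
    return parse_axioms(report) - ALLOWED_AXIOMS
```

A proof that leans on `sorryAx` (Lean's placeholder axiom for `sorry`) would show up in the returned set, so a gate built this way would pass only when the set is empty.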

It has produced novel, kernel-verified mathematics

Five results verified mathlib-absent before the run and proved by the swarm:

Result	Statement	Method
`nicomachus_sum_cubes`	∑k³ = (∑k)²	direct, 24-line multi-step (phase2-run-001)
`sum_range_pow_four_closed`	Faulhaber k=4 closed form	direct induction
`platonic_schlafli_pairs`	the five Platonic Schläfli pairs	forced depth-3 decomposition, 13 lemmas (phase3-run-001)
`sum_range_cube_eq_triangular_sq`	triangular closed form of Nicomachus	dependency reuse of the above, 4m57s (phase3-run-002)
`not_prime_pow_four_add_four`	n⁴ + 4 is never prime for n > 1 (Sophie Germain)	direct, by external machine `binto-labs` (#221) — the proof that surfaced gate bug #231
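For readers who want the statements in the table spelled out, here is a quick numeric sanity check of the classical forms of each identity. It is a sketch under assumptions: the report does not show the exact Lean statements (indexing conventions, or how the Schläfli pairs are phrased), so the formulas and function names below are the textbook versions, not the repository's. Checking finitely many cases is of course no substitute for the kernel-verified proofs.

```python
def nicomachus_holds(n: int) -> bool:
    """Sum of the first n cubes equals the square of the sum of 1..n."""
    ks = range(1, n + 1)
    return sum(k**3 for k in ks) == sum(ks) ** 2


def triangular_form_holds(n: int) -> bool:
    """Same sum of cubes, written with the triangular number n(n+1)/2."""
    return sum(k**3 for k in range(1, n + 1)) == (n * (n + 1) // 2) ** 2


def faulhaber4_holds(n: int) -> bool:
    """Closed form for fourth powers: 30 * sum k^4 = n(n+1)(2n+1)(3n^2+3n-1)."""
    lhs = 30 * sum(k**4 for k in range(1, n + 1))
    return lhs == n * (n + 1) * (2 * n + 1) * (3 * n * n + 3 * n - 1)


def germain_factors(n: int) -> tuple[int, int]:
    """Sophie Germain factorization: n^4 + 4 = (n^2 + 2n + 2)(n^2 - 2n + 2)."""
    return (n * n + 2 * n + 2, n * n - 2 * n + 2)


def schlafli_pairs(limit: int = 50) -> list[tuple[int, int]]:
    """Pairs {p, q} with p, q >= 3 and 1/p + 1/q > 1/2, i.e. (p-2)(q-2) < 4."""
    return [
        (p, q)
        for p in range(3, limit)
        for q in range(3, limit)
        if (p - 2) * (q - 2) < 4
    ]
```

For n > 1 the smaller Germain factor is (n - 1)² + 1 ≥ 2, which is why n⁴ + 4 cannot be prime; the search bound `limit` is arbitrary, since (p - 2)(q - 2) < 4 already forces both entries to be at most 5.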

Both load-bearing mechanisms are demonstrated, not just built

The chain (v1.4.0): a target too big for one attempt became a depth-3 tree of 13 lemmas, each kernel-verified, recomposed level by level — the parent’s proof is literally the composition of its sub-lemmas.
Compounding (v1.5.0): a merged lemma was imported and invoked by a later proof, turning a difficulty-2 target into a two-line module. The chain’s four recompositions are four more instances of the same reuse mechanism.
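The "recomposed level by level" ordering can be sketched as a bottom-up walk over a goal tree. This is a toy illustration with hypothetical names (`recomposition_levels`, the goal labels), not the shape of the actual run-001 tree, which the report does not list.

```python
def recomposition_levels(children: dict[str, list[str]], root: str) -> list[list[str]]:
    """Group goals by depth and return the groups deepest-first.

    `children` maps a goal to the sub-lemmas it was decomposed into.
    Every sub-lemma sits one level deeper than its parent, so proving the
    groups in the returned order guarantees each parent is composed only
    from lemmas that are already verified.
    """
    levels: list[list[str]] = []
    frontier = [root]
    while frontier:
        levels.append(frontier)
        frontier = [child for goal in frontier for child in children.get(goal, [])]
    return levels[::-1]


# A made-up tree: the root splits into two lemmas, one of which splits again.
toy_tree = {"target": ["lemma_a", "lemma_b"], "lemma_a": ["lemma_a1", "lemma_a2"]}
order = recomposition_levels(toy_tree, "target")
# order == [["lemma_a1", "lemma_a2"], ["lemma_a", "lemma_b"], ["target"]]
```

The sketch assumes a tree, where each lemma has one parent. A queue that shares lemmas across parents would need a topological sort instead.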

It survives its operating environment

Three real quota outages in one day produced two mechanisms: the infrastructure-failure guard (ADR-016), under which a fast-failed call plus a failed health probe is never read as goal evidence, and the supervisor (ADR-017), which adds exponential backoff, PR hygiene, and a claim guard. The third outage was absorbed with zero queue corruption and no human intervention; the first two had each demoted a whole goal tree and needed manual repair. The progressive effort ladder (ADR-015) spends deep reasoning only where a cheaper attempt failed: 11 of 13 run-001 proofs closed at the cheapest rung.
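The guard and the backoff can be sketched in a few lines. Everything concrete here is an assumption made for illustration: the `Attempt` fields, the five-second fast-fail threshold, and the backoff parameters are hypothetical and are not taken from ADR-016 or ADR-017.

```python
from dataclasses import dataclass

FAST_FAIL_SECONDS = 5.0  # hypothetical threshold for "the call never really ran"


@dataclass(frozen=True)
class Attempt:
    succeeded: bool
    duration_s: float
    health_probe_ok: bool


def classify(attempt: Attempt) -> str:
    """Decide what a finished attempt says about the goal.

    A fast failure combined with a failed health probe is treated as an
    outage, so it must not count against the goal's difficulty.
    """
    if attempt.succeeded:
        return "proved"
    if attempt.duration_s < FAST_FAIL_SECONDS and not attempt.health_probe_ok:
        return "infrastructure"
    return "goal_evidence"


def backoff_delays(base_s: float = 30.0, factor: float = 2.0,
                   cap_s: float = 3600.0, retries: int = 6) -> list[float]:
    """Exponential backoff schedule, capped so waits never grow unbounded."""
    return [min(cap_s, base_s * factor**i) for i in range(retries)]
```

Only an attempt classified as `goal_evidence` would be allowed to demote a goal or climb the effort ladder; an `infrastructure` result would instead trigger the next wait in the schedule.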

It is hardened against the external review

Issue #190’s CRITICAL (goal tampering), HIGH (workflow self-protection), MEDIUM (unpinned actions) and LOW (audit corpus) findings are all addressed — goal-statement immutability (ADR-018), SHA-pinned actions + CODEOWNERS + a settings checklist (ADR-019), and an opaque audit fixture. The review’s own verdict — “serious, well-engineered work… README’s framing is accurate” — and its one stale finding (“decomposition never completed”, true when written, closed by run-001) are recorded honestly.

The public-good path is built and self-starting

The upstream pipeline (ADR-020) scans nightly for packet-eligible lemmas (proved + absence-verified + unpacketed), re-checks absence at mathlib HEAD, and auto-opens a sponsor packet PR. A packet is a git apply-able patch with the human author’s header, gate evidence, a paste-ready factual AI disclosure, and an explicit rewrite-in-own-words boundary (mathlib forbids LLM-written conversation). The first run produced 9 packets, and the first of them (Nicomachus) is stamped kernel-verified at mathlib HEAD. The last mechanical mile is now one command (ADR-021, v1.6.1): python3 -m tools.upstream.raise_pr turns a HEAD-verified packet into a draft mathlib PR (clone, apply, push, pre-filled factual disclosure), with the policy boundary enforced in code: it refuses without the sponsor’s --understood attestation, refuses an unverified packet, opens drafts only, and writes no review words. The irreducibly human stages (understand the proof, ask Zulip, write the narrative, face review) stay human (docs/upstreaming.md). The model is policy-compliant by construction: the machine prepares, and the signed-up human sponsor owns every word the community reads.
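The eligibility rule and the refusals described above amount to a handful of boolean checks. The following is a minimal sketch of that logic, not the code in tools.upstream: the `Lemma` fields, `PolicyRefusal`, and `plan_draft_pr` are hypothetical names, and only the `--understood` attestation is taken from the report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lemma:
    name: str
    proved: bool
    absence_verified: bool
    packeted: bool


def packet_eligible(lemma: Lemma) -> bool:
    """Nightly-scan rule: proved + absence-verified + not yet packeted."""
    return lemma.proved and lemma.absence_verified and not lemma.packeted


class PolicyRefusal(Exception):
    """Raised when a precondition of the sponsor helper is not met."""


def plan_draft_pr(head_verified: bool, understood: bool) -> dict:
    """Return a PR plan, or refuse; the checks run before any side effect."""
    if not understood:
        raise PolicyRefusal("sponsor has not passed the --understood attestation")
    if not head_verified:
        raise PolicyRefusal("packet is not verified at mathlib HEAD")
    # Drafts only, and no machine-written review text.
    return {"draft": True, "review_text": ""}
```

Putting the refusals ahead of any clone or push is the design point: a helper structured this way cannot open a PR on behalf of a sponsor who has not attested, because the code path that would do so is never reached.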

It is no longer a single machine

Merged work now comes from the agent swarm, the maintainer, and three external contributor machines (mac-158f, binto-labs, ohdearquant) — proving lemmas and fixing tooling through the same gates, with no coordination beyond the claims branch. One of those machines (binto-labs) contributed the fifth headline result above and, in doing so, found the gate’s first false negative — external traffic is now hardening the gates, not just adding lemmas. The contributor path is documented (CONTRIBUTING.md, v1.6.2), and the Apache-2.0 LICENSE file that the README and every packet’s copyright header already referenced now actually exists in the tree.

The honest limits (unchanged where they were honest)

Every result is elementary. The decomposition in run-001 was forced by a strangled budget; whether decomposition beats one-shotting on a genuinely hard target is untested. This is the central open question.
Reuse is depth-1. Run-002 was one declared dependency edge; a chosen result reached through a deep dependency tree routed bottom-up is not yet done.
Nothing is upstreamed. The pipeline is built and the packets are ready; mathlib may decline the candidates, a valid, recorded outcome that still validates the machinery.
One known operational bug: a claim race past PROVE_CLAIM_CAP=1 is real but unreproduced; its cost is bounded by PR dedupe, not eliminated.
The gate can fail closed. #231 blocked a correct proof; nothing unsound was admitted. The canary pins the one binder shape that bit — a different statement shape tripping a different lint would be found the same way: by blocking a contributor, not by polluting the library.
Absence is a grep pre-filter, not a proof; a result proved today could be added to mathlib by someone else tomorrow. The recorded revision makes that detectable, not impossible.
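To make the last limit concrete, a name-based absence pre-filter might look like the sketch below. It is an assumption-laden illustration: the report does not describe how the project's check is implemented, and `absence_prefilter`, its arguments, and the result fields are hypothetical.

```python
import re
from pathlib import Path


def absence_prefilter(checkout: Path, name: str, revision: str) -> dict:
    """Search a mathlib checkout for `name` and record the revision searched.

    Matches the identifier as a whole token, so `foo` does not match
    `foo_bar` or `Ns.foo`. Finding nothing is only weak evidence: the same
    theorem may exist under another name.
    """
    pattern = re.compile(rf"(?<![\w.']){re.escape(name)}(?![\w'])")
    hits = sorted(
        str(path.relative_to(checkout))
        for path in checkout.rglob("*.lean")
        if pattern.search(path.read_text(encoding="utf-8", errors="ignore"))
    )
    return {"name": name, "revision": revision, "absent": not hits, "hits": hits}
```

Storing the revision alongside the verdict is what allows a later re-check at a newer HEAD to notice that the result has since appeared.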

Where it goes next

The frontier is the difficulty ceiling: a target a working mathematician would call non-trivial, reached by a queue that genuinely compounds at depth — and a first lemma actually merged into the commons. Everything built so far is the apparatus for asking that question honestly.

Generated as part of the v1.6.0 release; updated the same day through v1.6.2 (the fifth result #221, the #231 gate fix #225 and canary #233, the ADR-021 sponsor helper #229, CONTRIBUTING/LICENSE #232). Verified against the repository at the time of the update: 40 library modules, 39 proved goals (one of them the permanent binder-shape canary), ~195 merged PRs, 9 upstream-eligible lemmas (the fifth result awaits the nightly packet scan), ADRs through ADR-021, three green red-team rounds.
