unsorry

unsorry — status report (2026-06-12)

What the project has actually achieved, stated against verified ground truth rather than ambition. The house style is honesty: each claim is followed by its limit.

One paragraph

In roughly three days (v0.1.0 → v1.6.2, ~195 merged PRs), unsorry went from a coordination skeleton to a swarm that has proved five mathlib-absent results, demonstrated its two load-bearing mechanisms end-to-end (decomposition and dependency reuse), survived its own operational failure modes (three CLI quota outages in one day, now absorbed unattended; its first gate false negative caught, fixed, and canary-guarded within two hours), passed three adversarial red-team rounds including the CRITICAL hole an external review found, and built — and auto-runs — a policy-compliant pipeline to upstream its novel lemmas to mathlib. The honest frontier is unchanged: every result so far is elementary, the hard-target ceiling is untested, and nothing has landed in the commons yet.

What is true, with verification

The architecture works as designed

It has produced novel, kernel-verified mathematics

Five results verified mathlib-absent before the run and proved by the swarm:

Result What How
nicomachus_sum_cubes ∑k³ = (∑k)² direct, 24-line multi-step (phase2-run-001)
sum_range_pow_four_closed Faulhaber k=4 closed form direct induction
platonic_schlafli_pairs the five Platonic Schläfli pairs forced depth-3 decomposition, 13 lemmas (phase3-run-001)
sum_range_cube_eq_triangular_sq triangular closed form of Nicomachus dependency reuse of the above, 4m57s (phase3-run-002)
not_prime_pow_four_add_four n⁴ + 4 is never prime for n > 1 (Sophie Germain) direct, by external machine binto-labs (#221) — the proof that surfaced gate bug #231

Both load-bearing mechanisms are demonstrated, not just built

It survives its operating environment

Born from three real quota outages in one day: the infrastructure-failure guard (ADR-016 — a fast-failed call + failed health probe is never read as goal evidence) and the supervisor (ADR-017 — exponential backoff, PR hygiene, claim guard). The third outage was absorbed with zero queue corruption and no human intervention; the first two had each demoted a whole goal tree and needed manual repair. The progressive effort ladder (ADR-015) spends deep reasoning only where a cheaper attempt failed — 11 of 13 run-001 proofs closed at the cheapest rung.

It is hardened against the external review

Issue #190’s CRITICAL (goal tampering), HIGH (workflow self-protection), MEDIUM (unpinned actions) and LOW (audit corpus) findings are all addressed — goal-statement immutability (ADR-018), SHA-pinned actions + CODEOWNERS + a settings checklist (ADR-019), and an opaque audit fixture. The review’s own verdict — “serious, well-engineered work… README’s framing is accurate” — and its one stale finding (“decomposition never completed”, true when written, closed by run-001) are recorded honestly.

The public-good path is built and self-starting

The upstream pipeline (ADR-020) scans nightly for packet-eligible lemmas (proved + absence-verified + unpacketed), re-checks absence at mathlib HEAD, and auto-opens a sponsor packet PR — a git apply-able patch with the human author’s header, gate evidence, a paste-ready factual AI disclosure, and an explicit rewrite-in-own-words boundary (mathlib forbids LLM-written conversation). The first run produced 9 packets, and the first of them (Nicomachus) is stamped kernel-verified at mathlib HEAD. The last mechanical mile is now one command (ADR-021, v1.6.1): python3 -m tools.upstream.raise_pr turns a HEAD-verified packet into a draft mathlib PR — clone, apply, push, pre-filled factual disclosure — with the policy boundary enforced in code: it refuses without the sponsor’s --understood attestation, refuses an unverified packet, opens drafts only, and writes no review words. The irreducibly human stages — understand the proof, ask Zulip, write the narrative, face review — stay human (docs/upstreaming.md). The model is policy-compliant by construction: machine prepares, the human sponsor (signed up) owns every word the community reads.

It is no longer a single machine

Merged work now comes from the agent swarm, the maintainer, and three external contributor machines (mac-158f, binto-labs, ohdearquant) — proving lemmas and fixing tooling through the same gates, with no coordination beyond the claims branch. One of those machines (binto-labs) contributed the fifth headline result above and, in doing so, found the gate’s first false negative — external traffic is now hardening the gates, not just adding lemmas. The contributor path is documented (CONTRIBUTING.md, v1.6.2), and the Apache-2.0 LICENSE file that the README and every packet’s copyright header already referenced now actually exists in the tree.

The honest limits (unchanged where they were honest)

Where it goes next

The frontier is the difficulty ceiling: a target a working mathematician would call non-trivial, reached by a queue that genuinely compounds at depth — and a first lemma actually merged into the commons. Everything built so far is the apparatus for asking that question honestly.


Generated as part of the v1.6.0 release; updated the same day through v1.6.2 (the fifth result #221, the #231 gate fix #225 and canary #233, the ADR-021 sponsor helper #229, CONTRIBUTING/LICENSE #232). Verified against the repository at the time of the update: 40 library modules, 39 proved goals (one of them the permanent binder-shape canary), ~195 merged PRs, 9 upstream-eligible lemmas (the fifth result awaits the nightly packet scan), ADRs through ADR-021, three green red-team rounds.