AAL-F-001 · Confirmed

Detection accuracy clusters near ceiling on objective exceptions

Findings · Jul 2026

Evidence level: Confirmed

Models: 4 · 3 labs

Range: 97.9% – 99.6%

Claude Sonnet: 98.8%

Across GPT-4o, Gemini 2.5 Pro, Gemini 2.5 Flash, and Claude Sonnet 4.6, first-pass exception detection accuracy ranges from 97.9% to 99.6% on AAL-D-001. The pattern holds across four independent models from three different labs (OpenAI, Google DeepMind, Anthropic). Detection on objective, value-level discrepancies is near-solved at this benchmark's difficulty level. This finding is now Confirmed under the AAL evidence lifecycle.
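As a rough illustration of how a per-model accuracy figure and the reported range could be derived, the sketch below scores each model's exception flags against ground-truth labels and takes the minimum and maximum across models. It assumes detection accuracy means the fraction of cases where the model's flag matches the label, which may differ from how AAL-D-001 is actually scored. The function names, model names, and toy labels are hypothetical and are not benchmark data.

```python
from typing import Dict, List, Tuple


def detection_accuracy(predicted: List[bool], actual: List[bool]) -> float:
    """Fraction of cases where the model's exception flag matches the label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")
    if not actual:
        raise ValueError("need at least one case")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def accuracy_range(per_model: Dict[str, float]) -> Tuple[float, float]:
    """Lowest and highest accuracy across the models."""
    values = list(per_model.values())
    return min(values), max(values)


# Hypothetical toy labels (True = case contains an exception); not AAL-D-001 data.
actual = [True, True, False, True, False, False, True, False]
runs = {
    "model_a": [True, True, False, True, False, False, True, False],
    "model_b": [True, True, False, True, False, True, True, False],
}

scores = {name: detection_accuracy(flags, actual) for name, flags in runs.items()}
low, high = accuracy_range(scores)
print(f"range: {low:.1%} - {high:.1%}")
```

On the real benchmark the same calculation would run over each model's first-pass output for the full case set, and the range tile would report the lowest and highest of the resulting scores.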

Why confirmed

Four models from three labs, with different architectures and parameter counts, all achieve near-ceiling detection on the same 250-case set. The pattern is stable and generalizes across the current frontier models tested. The open question is no longer whether detection is reliable, but whether the models can be trusted on the downstream dimensions: classification, exposure quantification, and escalation judgment.
