AAL-F-001 · Confirmed

Detection accuracy clusters near ceiling on objective exceptions

Findings · Jul 2026
Evidence level: Confirmed
Models: 4 · 3 labs
Range: 97.9% – 99.6%
Claude Sonnet: 98.8%

Across GPT-4o, Gemini 2.5 Pro, Gemini 2.5 Flash, and Claude Sonnet 4.6, first-pass exception detection accuracy ranges from 97.9% to 99.6% on AAL-D-001. The pattern holds across four independent models from three different labs (OpenAI, Google DeepMind, Anthropic). Detection on objective, value-level discrepancies is near-solved at this benchmark's difficulty level. This finding is now Confirmed under the AAL evidence lifecycle.
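As a rough illustration of how a per-model accuracy and a cross-model range like the one above might be computed, here is a minimal sketch. It assumes detection is scored as a simple match between a model's exception flag and a ground-truth label per case; the actual AAL-D-001 scoring procedure is not described here. The function names, model names, and data are hypothetical toy values, not benchmark results.

```python
from typing import Dict, List, Tuple


def detection_accuracy(predicted: List[bool], expected: List[bool]) -> float:
    """Fraction of cases where the model's exception flag matches the label.

    Assumption: one boolean flag per case; this is not the documented
    AAL-D-001 scoring rule.
    """
    if len(predicted) != len(expected) or not expected:
        raise ValueError("predicted and expected must be non-empty and equal length")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def accuracy_range(per_model: Dict[str, float]) -> Tuple[float, float]:
    """Lowest and highest accuracy across models."""
    return min(per_model.values()), max(per_model.values())


# Hypothetical toy data: four cases, two placeholder models.
expected = [True, False, True, True]
runs = {
    "model_a": [True, False, True, True],
    "model_b": [True, False, False, True],
}

scores = {name: detection_accuracy(flags, expected) for name, flags in runs.items()}
low, high = accuracy_range(scores)
print(f"range: {low:.1%} to {high:.1%}")
```

On a real run, the same two functions would be applied to each model's flags over the full case set, and the reported range is simply the minimum and maximum of the per-model scores.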

Why confirmed

Four independent models from three labs, spanning different architectures and parameter counts, all achieve near-ceiling detection on the same 250-case set. The pattern is stable and appears to generalize across the current frontier. The open question is no longer whether detection is reliable, but whether the models can be trusted on downstream dimensions: classification, exposure quantification, and escalation judgment.
