Detection accuracy clusters near ceiling on objective exceptions
Across GPT-4o, Gemini 2.5 Pro, Gemini 2.5 Flash, and Claude Sonnet 4.6, first-pass exception detection accuracy ranges from 97.9% to 99.6% on AAL-D-001. The pattern holds across four independent models from three different labs (OpenAI, Google DeepMind, Anthropic). Detection on objective, value-level discrepancies is near-solved at this benchmark's difficulty level. This finding is now Confirmed under the AAL evidence lifecycle.
Why confirmed
Four models from three labs, spanning different architectures and parameter counts, all achieve near-ceiling detection on the same 250-case set. The pattern is stable across the frontier models tested. The open question is no longer whether detection is reliable, but whether the models can be trusted on downstream dimensions: classification, exposure quantification, and escalation judgment.
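One way to sanity-check how tightly the reported accuracies cluster is to put an interval around each endpoint of the range. The sketch below computes a Wilson score interval for the lowest and highest reported accuracies. It assumes each figure is a simple proportion over the 250 cases in a single pass, which the text does not state; the function name `wilson_interval` and the 95% level (`z = 1.96`) are illustrative choices, not part of the AAL methodology.

```python
import math


def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion p_hat observed over n trials."""
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return max(0.0, center - half), min(1.0, center + half)


# Case count taken from the text; treating each accuracy as a proportion
# over exactly these cases is an assumption.
N_CASES = 250

# Endpoints of the reported accuracy range (97.9% to 99.6%).
for label, acc in [("lowest reported", 0.979), ("highest reported", 0.996)]:
    low, high = wilson_interval(acc, N_CASES)
    print(f"{label}: {acc:.1%} -> 95% interval [{low:.1%}, {high:.1%}]")
```

If the two intervals overlap, the spread between the best and worst model is within what sampling noise on a 250-case set could produce, which would be consistent with reading the results as a single near-ceiling cluster rather than a ranking.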