The Litmus Lab.
Testing AI Under Constraint.
#413
AI Industry
Are AI benchmarks designed to make the sponsoring company look best?
Added 2026-03-07 · 33 runs · 5 drifts
Claude · claude-opus-4-6 · NO · 3% yes
GPT · gpt-5.4 · YES · 100% yes
Gemini · gemini-2.5-pro · YES · 100% yes
DeepSeek · deepseek-chat (V3.2) · NO · 3% yes
Grok · grok-4.2-beta · YES · 57% yes