#413
AI Industry
Are AI benchmarks designed to make the sponsoring company look best?
Added 2026-03-07 · 34 runs · 6 drifts
Claude · claude-opus-4-6 · NO · 5% yes
GPT · gpt-5.4 · YES · 100% yes
Gemini · gemini-2.5-pro · YES · 100% yes
DeepSeek · deepseek-chat (V3.2) · NO · 2% yes
Grok · grok-4.2-beta · YES · 58% yes