Model Scorecard

Same question. Same settings. Different day. Different answer. A drift is when an AI model changes its mind on a question it has already answered, unprompted and unprovoked. This scorecard tracks how often each model drifts.
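Put concretely, a drift is a change between two consecutive answers to the same question. The Python sketch below assumes that one model's answers to one question are stored in chronological order and that each new answer is compared with the one immediately before it. The function name `count_drifts` is hypothetical and is not taken from the scorecard's own tooling.

```python
from typing import Sequence


def count_drifts(answers: Sequence[str]) -> int:
    """Count YES->NO or NO->YES changes in one model's answers to one question.

    Hypothetical helper: assumes `answers` is in chronological order and that
    each answer is compared only with the answer immediately before it.
    """
    return sum(1 for prev, curr in zip(answers, answers[1:]) if prev != curr)


# YES, YES, NO, NO, YES contains two drifts: YES->NO, then NO->YES.
print(count_drifts(["YES", "YES", "NO", "NO", "YES"]))  # 2
```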

Every model runs under identical conditions: fixed parameters, temperature 0 (no sampling randomness), and yes-or-no answers only. Models are ranked by consistency.

Models: 5
Temperature: 0
Questions: 440
Responses: 96,048 (1 run = 5 responses)
Drifts: 6,034


	Combined	Claude	GPT	Gemini	DeepSeek	Grok
Version	5 models	opus-4-8	gpt-5.5	2.5-pro	deepseek-v4	grok-4.3
Responses	96,048	19,291	19,291	19,291	19,291	19,291
Total Drifts	6,034	403	1,162	1,693	783	1,993
Questions Drifted	—	146	165	138	155	192
Drift Rate	6.3%	2.1%	6.0%	8.8%	4.1%	10.3%
Drift Per Day	177.5	11.9	34.2	49.8	23.0	58.6
Top Drifter	Q68	Q68	Q33	Q57	Q33	Q70
Stability Rank	—	#1	#3	#4	#2	#5
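The Drift Rate and Stability Rank rows follow from the Total Drifts and Responses rows. The Python sketch below recomputes them using only the per-model figures from the table. The variable names are illustrative.

```python
# Per-model figures copied from the scorecard table.
responses = 19_291
total_drifts = {
    "Claude": 403,
    "GPT": 1_162,
    "Gemini": 1_693,
    "DeepSeek": 783,
    "Grok": 1_993,
}

# Drift Rate = drifts / responses, as a percentage rounded to one decimal.
drift_rate = {m: round(100 * d / responses, 1) for m, d in total_drifts.items()}

# Stability Rank: the model with the fewest total drifts ranks #1.
ranked = sorted(total_drifts, key=total_drifts.get)
stability_rank = {m: i + 1 for i, m in enumerate(ranked)}

print(drift_rate)      # {'Claude': 2.1, 'GPT': 6.0, 'Gemini': 8.8, 'DeepSeek': 4.1, 'Grok': 10.3}
print(stability_rank)  # {'Claude': 1, 'DeepSeek': 2, 'GPT': 3, 'Gemini': 4, 'Grok': 5}
```

Every model in the table has the same number of responses, so ranking by total drifts and ranking by drift rate produce the same order.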

What each row means

Run: One question asked once across all five models
Responses: Number of times a model answered YES or NO. 1 run = 5 responses.
Total Drifts: How many times a model's answer differed from its previous answer to the same question (YES→NO or NO→YES)
Questions Drifted: How many unique questions a model has flipped on at least once
Drift Rate: Drifts as a percentage of total responses
Drift Per Day: Average number of answer changes per day
Top Drifter: The question this model has flipped on the most
Stability Rank: Models ranked #1 (most stable) to #5 (least stable) by total drifts
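The row definitions above can be computed from a response log. The Python sketch below assumes a hypothetical log format, a `Response` record holding a question ID, a day index, and a YES/NO answer, listed in chronological order for one model. It treats a drift as an answer that differs from the model's previous answer to the same question. It also assumes that Drift Per Day divides by the number of distinct days in the log. `Response` and `scorecard_row` are illustrative names, not the scorecard's actual code.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Response:
    """Hypothetical log record: one model's YES/NO answer to one question."""
    question_id: str
    day: int     # index of the day the run happened
    answer: str  # "YES" or "NO"


def scorecard_row(responses: list[Response]) -> dict[str, object]:
    """Compute one model's scorecard metrics from its chronological log.

    Assumes a drift is an answer that differs from the model's previous answer
    to the same question, and that Drift Per Day divides by the number of
    distinct days present in the log.
    """
    last_answer: dict[str, str] = {}
    drifts_by_question: Counter[str] = Counter()

    for r in responses:
        prev = last_answer.get(r.question_id)
        if prev is not None and prev != r.answer:
            drifts_by_question[r.question_id] += 1  # YES->NO or NO->YES
        last_answer[r.question_id] = r.answer

    total_drifts = sum(drifts_by_question.values())
    days = len({r.day for r in responses})
    return {
        "responses": len(responses),
        "total_drifts": total_drifts,
        # Only questions with at least one flip are ever added to the Counter.
        "questions_drifted": len(drifts_by_question),
        "drift_rate_pct": 100 * total_drifts / len(responses) if responses else 0.0,
        "drift_per_day": total_drifts / days if days else 0.0,
        # On ties, most_common favors the question that drifted first.
        "top_drifter": drifts_by_question.most_common(1)[0][0] if drifts_by_question else None,
    }


# Illustrative log: Q1 goes YES, NO, YES (2 drifts); Q2 goes NO, NO, YES (1 drift).
log = [
    Response("Q1", 1, "YES"), Response("Q2", 1, "NO"),
    Response("Q1", 2, "NO"), Response("Q2", 2, "NO"),
    Response("Q1", 3, "YES"), Response("Q2", 3, "YES"),
]
row = scorecard_row(log)
print(row)
```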