LLM Comparator
Compare different LLMs on your tasks
Prompt
Expected Result
Or upload a test file (YAML/JSON)
Models
GPT-4o (openai)
GPT-4o Mini (openai)
o3-mini (reasoning) (openai)
GPT-5 (openai)
GPT-5 (thinking) (openai)
GPT-5.1 (openai)
GPT-5.1 (thinking) (openai)
Claude Haiku 4.5 (anthropic)
Claude Sonnet 4.5 (anthropic)
Claude Sonnet 4.5 (thinking) (anthropic)
Claude Sonnet 4.6 (anthropic)
Claude Sonnet 4.6 (thinking) (anthropic)
Claude Opus 4.6 (anthropic)
Gemini 2.5 Flash (google_genai)
Criteria
Correctness
Stability
Latency
Cost
Token Usage
Verbosity
Instruction Following
Number of runs (for stability)
Run Comparison