
Build an eval suite

Define the task, two variants, and a few test cases with deterministic checks, then run it in mock mode.
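As a rough illustration of that workflow, here is a minimal Python sketch. It is not this tool's API: `EvalCase`, `contains`, `max_length`, `mock_model`, `run_suite`, and the variant labels are all hypothetical names chosen for the example. It assumes that "mock mode" means the model call is replaced by a canned, deterministic reply, so the checks can be exercised without calling a real model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical sketch: none of these names come from the eval tool itself.
Check = Callable[[str], bool]


@dataclass
class EvalCase:
    name: str
    prompt: str
    checks: List[Check]


def contains(needle: str) -> Check:
    """Deterministic check: the output must include `needle`."""
    return lambda output: needle in output


def max_length(limit: int) -> Check:
    """Deterministic check: the output must be at most `limit` characters."""
    return lambda output: len(output) <= limit


def mock_model(variant: str, prompt: str) -> str:
    """Assumed mock mode: no model call, just a canned reply."""
    return f"[{variant}] echo: {prompt}"


def run_suite(
    variants: List[str],
    cases: List[EvalCase],
    model: Callable[[str, str], str] = mock_model,
) -> Dict[Tuple[str, str], bool]:
    """Run every case against every variant; a case passes if all checks pass."""
    results = {}
    for variant in variants:
        for case in cases:
            output = model(variant, case.prompt)
            results[(variant, case.name)] = all(check(output) for check in case.checks)
    return results


# Two variants (placeholder labels) and two test cases.
variants = ["variant-a", "variant-b"]
cases = [
    EvalCase("echoes-prompt", "hello", [contains("hello"), max_length(80)]),
    EvalCase("mentions-refund", "hello", [contains("refund")]),
]

results = run_suite(variants, cases)
for (variant, case_name), passed in sorted(results.items()):
    print(f"{variant:10} {case_name:16} {'PASS' if passed else 'FAIL'}")
```

Because each check is a pure function of the output string, rerunning the suite in mock mode gives the same pass/fail table every time, which makes it a cheap way to debug the checks before pointing the suite at a real model.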

Variants

claude-opus-4-8
claude-opus-4-8

Test cases

Checks: