5 trust-loop checks
noop · reference · adversaries
stable · export-redacted
Hidden graders
Behavioral contracts, not visible unit tests.
0 flake rate
Deterministic, reward-hack-resistant grading.
Contamination-resistant
A private and synthetically generated corpus.
How it works
Author a task
Problem, workspace, and a hidden grader
Hidden grader
Behavioral contracts, not visible unit tests
Trust loop
noop fails · reference passes · repeat-stable
Differential oracle
A different-vendor solver re-derives the fix
Promote
Survivors copied in, re-gated, archived
Export
SWE-bench-ready, grader material redacted
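The page does not say how the differential oracle compares the independently derived fix with the reference. One common form of differential testing, sketched here as an assumption, runs both implementations on the same probe inputs and reports any disagreement. The function name `differential_check` and its parameters are hypothetical and are not part of any Veyl API.

```python
from typing import Any, Callable, Iterable, List, Tuple

# Hypothetical signature: each "solution" is a callable taking one input.
Solution = Callable[[Any], Any]


def differential_check(
    reference: Solution,
    candidate: Solution,
    inputs: Iterable[Any],
) -> List[Tuple[Any, Any, Any]]:
    """Return (input, reference_output, candidate_output) for every disagreement."""
    disagreements = []
    for x in inputs:
        expected = reference(x)
        actual = candidate(x)
        if expected != actual:
            disagreements.append((x, expected, actual))
    return disagreements


def reference_abs(x: int) -> int:
    """Toy reference fix."""
    return abs(x)


def independent_abs(x: int) -> int:
    """Toy fix derived independently; it should agree with the reference."""
    return x if x >= 0 else -x


def buggy_abs(x: int) -> int:
    """Toy fix that is wrong for negative inputs."""
    return x
```

An empty result from `differential_check(reference_abs, independent_abs, range(-3, 4))` suggests the task is solvable as specified. A non-empty result, as with `buggy_abs`, flags either the candidate or the task for review.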
task.yaml · veyl
Performance
Deterministic, every run
noop fails, reference passes, and adversarial probes get rejected — stable across repeats, flake_rate 0.
Determinism
Trust-loop checks (every task, every run)

Check        Expected    Result
noop         must fail   ✓
reference    must pass   ✓
adversaries  rejected    ✓
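The checks above can be sketched as a small gate. This is a minimal illustration, not the actual implementation. The names `run_trust_loop` and `TrustReport`, the string-valued patches, and the definition of `flake_rate` as the fraction of patches whose verdict changes between repeats are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical: a grader takes a patch and returns True when the hidden checks pass.
Grader = Callable[[str], bool]


@dataclass
class TrustReport:
    noop_fails: bool
    reference_passes: bool
    adversaries_rejected: bool
    flake_rate: float

    @property
    def ok(self) -> bool:
        """A task clears the gate only if every check holds and nothing flaked."""
        return (
            self.noop_fails
            and self.reference_passes
            and self.adversaries_rejected
            and self.flake_rate == 0.0
        )


def run_trust_loop(
    grader: Grader,
    noop: str,
    reference: str,
    adversaries: Sequence[str],
    repeats: int = 5,
) -> TrustReport:
    patches: Dict[str, str] = {"noop": noop, "reference": reference}
    for i, patch in enumerate(adversaries):
        patches[f"adversary_{i}"] = patch

    # Grade every patch several times so unstable verdicts become visible.
    verdicts: Dict[str, List[bool]] = {
        name: [grader(patch) for _ in range(repeats)]
        for name, patch in patches.items()
    }

    # Assumed definition: a patch is flaky if its verdict differs between repeats.
    flaky = sum(1 for runs in verdicts.values() if len(set(runs)) > 1)

    return TrustReport(
        noop_fails=not any(verdicts["noop"]),
        reference_passes=all(verdicts["reference"]),
        adversaries_rejected=all(
            not any(runs)
            for name, runs in verdicts.items()
            if name.startswith("adversary_")
        ),
        flake_rate=flaky / len(verdicts),
    )
```

With a toy grader such as `lambda patch: patch == "fix"`, the call `run_trust_loop(grader, "", "fix", ["hardcoded output"])` yields a report whose `ok` property is true. A grader that accepts everything fails the noop check, so `ok` is false.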
Contact us to run the gate on your own tasks.
The trust loop runs on every task — hand- or AI-authored.
A grader that isn't fooled — every run
Hidden behavioral graders, hardened against reward-hacking and screened by an independent-vendor oracle. Determinism is by design: repeat grading is stable and exports redact every grader.
Pack at a glance
Registered tasks: 42
AI-authored, promoted: 2
Flake rate: 0.0
Adversaries: rejected
Hardened: passed
The agent never sees the grader
The agent solves against public material; the grader runs hidden behavioral checks; exports redact every grader.
The public / private split
┌───────────────────────────────┐
│ Agent phase                   │
│ /workspace (editable)         │
│ public task material          │
└───────────────────────────────┘
       ↕ the phase boundary
┌───────────────────────────────┐
│ Grader phase                  │
│ private grader root           │
│ hidden tests                  │
│ reference patch               │
│ adversary patches             │
└───────────────────────────────┘
The agent solves against public material. The grader runs hidden behavioral checks. Exports redact it all.
✓ Agent never sees the grader
✓ Hidden behavioral contracts
✓ Exports redact grader material
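One way to picture export redaction is as a filter that drops every grader-phase field before a task leaves the private side. The sketch below assumes a task is a plain dictionary. The key names mirror the grader-phase items listed above, but the names and the `export_task` and `assert_redacted` helpers are hypothetical.

```python
import copy
from typing import Any, Dict

# Hypothetical field names for material that must stay on the grader side.
PRIVATE_KEYS = frozenset({"hidden_tests", "reference_patch", "adversary_patches"})


def export_task(task: Dict[str, Any]) -> Dict[str, Any]:
    """Return a deep copy of the task with every private grader field removed."""
    return {
        key: copy.deepcopy(value)
        for key, value in task.items()
        if key not in PRIVATE_KEYS
    }


def assert_redacted(exported: Dict[str, Any]) -> None:
    """Raise if any private field leaked into an exported task."""
    leaked = PRIVATE_KEYS.intersection(exported)
    if leaked:
        raise ValueError(f"private grader material in export: {sorted(leaked)}")
```

Running `assert_redacted(export_task(task))` as the last step of an export turns a leak into a hard failure rather than a silent one.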
Generation is getting cheap; verification is the scarce resource. Verified tasks are the raw material to train on, with the trust needed to evaluate.
✓ AI authors more candidates than needed
✓ The trust gate filters every one
✓ Humans move to judgment, not production
Contamination-resistant
A private and synthetically generated corpus — not scraped public benchmark rows. Private grading material never enters public git history.
Hidden tests, reference solutions, and adversarial probes stay private — so tasks can’t be gamed, scraped, or memorized.
┌──────────── THE PACK ─────────────┐
│                                   │
│          █ THE CORPUS █           │
│                                   │
│       Graders, probes, refs       │
│                                   │
└───────────────────────────────────┘
                  │
                  ▼
┌────────── TRUST GATE ─────────────┐
│                                   │
│         █ VERIFIED SET █          │
│                                   │
│         Hidden behavioral         │
│     Ready to train and eval.      │
│                                   │
└───────────────────────────────────┘

✓ Private corpus
✓ Synthetic generation
✓ Redacted exports
Veyl
© 2026