For teams building, fine-tuning & shipping coding agents

Verifiable environments to train and trust coding agents.

Hard, diverse SWE tasks with hidden, reward-hack-resistant graders — verifiable enough to train on, independent enough to trust. Every task is proven on a trust loop before it counts.

5 trust-loop checks
noop · reference · adversaries
stable · export-redacted
Hidden graders
Behavioral contracts, not visible unit tests.
0 flake rate
Deterministic, reward-hack-resistant grading.
Contamination-resistant
A private and synthetically generated corpus.
How it works
Author a task
Problem, workspace, and a hidden grader
Hidden grader
Behavioral contracts, not visible unit tests
Trust loop
noop fails · reference passes · repeat-stable
Differential oracle
A different-vendor solver re-derives the fix
Promote
Survivors copied in, re-gated, archived
Export
SWE-bench-ready, grader material redacted
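The page does not say how the differential oracle compares the two fixes. As a minimal sketch of one common reading, assume the reference fix and an independently derived fix are both callable, and that a hypothetical set of probe inputs is available. The names `differential_check` and `_outcome` are illustrative, not part of the product.

```python
from typing import Any, Callable, Iterable, List, Tuple


def _outcome(fn: Callable[[Any], Any], x: Any) -> Tuple[str, Any]:
    """Capture a return value or the exception type, so crashes are comparable too."""
    try:
        return ("ok", fn(x))
    except Exception as exc:  # broad on purpose: any failure is an observable behavior
        return ("error", type(exc).__name__)


def differential_check(
    reference: Callable[[Any], Any],
    independent: Callable[[Any], Any],
    probes: Iterable[Any],
) -> List[Tuple[Any, Any, Any]]:
    """Return every probe on which the two implementations disagree."""
    disagreements = []
    for x in probes:
        a, b = _outcome(reference, x), _outcome(independent, x)
        if a != b:
            disagreements.append((x, a, b))
    return disagreements


# Toy usage: two independently written normalizers should agree on every probe.
reference_fix = lambda s: s.strip().lower()
independent_fix = lambda s: s.lower().strip()
print(differential_check(reference_fix, independent_fix, [" A ", "b", ""]))
```

An empty list means the independent solver reproduced the reference behavior on these probes. Any entry points at either an ambiguous task or a grader that encodes one vendor's quirks.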
task.yaml · veyl
Performance
Deterministic, every run
noop fails, reference passes, and adversarial probes get rejected — stable across repeats, flake_rate 0.
Determinism
Trust-loop checks
every task, every run
Check         Expected result
noop          must fail
reference     must pass
adversaries   rejected
Contact us to run the gate on your own tasks.
The trust loop runs on every task — hand- or AI-authored.
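The noop, reference, adversary, and stability checks can be sketched as a small gate. This is an illustration under stated assumptions, not the product's implementation: a patch is modeled as a plain string, the grader as a hypothetical callable returning True on pass, and `Task`, `trust_loop`, and `flake_rate` are made-up names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical grader type: takes a candidate patch, returns True on pass.
Grader = Callable[[str], bool]


@dataclass
class Task:
    noop_patch: str               # a patch that changes nothing
    reference_patch: str          # the known-good fix
    adversary_patches: List[str]  # reward-hacking attempts


def trust_loop(task: Task, grade: Grader, repeats: int = 3) -> Dict[str, bool]:
    """Run each check; every value must be True for the task to count."""
    candidates = [task.noop_patch, task.reference_patch, *task.adversary_patches]
    # Grade every candidate several times so unstable verdicts show up.
    runs = [[bool(grade(p)) for p in candidates] for _ in range(repeats)]
    first = runs[0]
    return {
        "noop": not first[0],               # noop must fail
        "reference": first[1],              # reference must pass
        "adversaries": not any(first[2:]),  # every adversary must be rejected
        "stable": all(r == first for r in runs),
    }


def flake_rate(verdicts: List[bool]) -> float:
    """Fraction of repeat verdicts that disagree with the majority verdict."""
    if not verdicts:
        return 0.0
    majority = 2 * sum(verdicts) >= len(verdicts)
    return sum(v != majority for v in verdicts) / len(verdicts)


# Toy usage with a grader that only accepts the real fix.
task = Task(noop_patch="", reference_patch="fix", adversary_patches=["hardcode-output"])
print(trust_loop(task, lambda patch: patch == "fix"))
```

A grader that accepts everything would fail the `noop` and `adversaries` entries here, which is the point of the gate: a grader has to reject wrong answers, not only accept the right one.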
A grader that isn't fooled — every run
Hidden behavioral graders, hardened against reward-hacking and screened by an independent-vendor oracle. Determinism is by design: repeat grading is stable and exports redact every grader.
Pack at a glance
Registered tasks: 42
AI-authored, promoted: 2
Flake rate: 0.0
Adversaries: rejected
Hardened: passed
The agent never sees the grader
The agent solves against public material; the grader runs hidden behavioral checks; exports redact every grader.
The public / private split
┌───────────────────────────────┐
│ Agent phase                   │
│ /workspace (editable)         │
│ public task material          │
└───────────────────────────────┘

↕ the phase boundary

┌───────────────────────────────┐
│ Grader phase                  │
│ private grader root           │
│ hidden tests                  │
│ reference patch               │
│ adversary patches             │
└───────────────────────────────┘

The agent solves against public material. The grader runs hidden behavioral checks. Exports redact it all.

Agent never sees the grader
Hidden behavioral contracts
Exports redact grader material
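A redacted export can be pictured as a filter over the task's files. The sketch below assumes a task is a mapping from relative path to contents and that private material lives under a few directory prefixes. Those prefixes and the function names are hypothetical, chosen only to mirror the diagram above.

```python
from typing import Dict

# Hypothetical private locations, mirroring the grader-phase box in the diagram.
PRIVATE_PREFIXES = ("grader/", "hidden_tests/", "reference/", "adversaries/")


def export_task(files: Dict[str, str]) -> Dict[str, str]:
    """Return only the public files; anything under a private prefix is dropped."""
    return {
        path: content
        for path, content in files.items()
        if not path.startswith(PRIVATE_PREFIXES)
    }


def is_export_redacted(exported: Dict[str, str]) -> bool:
    """The export-redacted check: no private path may survive the export."""
    return not any(path.startswith(PRIVATE_PREFIXES) for path in exported)


# Toy usage: only the problem statement and workspace survive.
task_files = {
    "problem.md": "Fix the parser.",
    "workspace/parser.py": "def parse(s): ...",
    "hidden_tests/test_contract.py": "...",
    "reference/fix.patch": "...",
}
exported = export_task(task_files)
print(sorted(exported), is_export_redacted(exported))
```

Running the check as a separate step, rather than trusting the filter, keeps the gate honest if the export logic changes later.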

Generation is getting cheap; verification is the scarce resource. Verified tasks are the raw material to train on, and they carry the trust needed to evaluate.

AI over-generates candidate tasks
The trust gate filters every one
Humans move to judgment, not production
Contamination-resistant

A private and synthetically generated corpus — not scraped public benchmark rows. Private grading material never enters public git history.

Hidden tests, reference solutions, and adversarial probes stay private — so tasks can’t be gamed, scraped, or memorized.

Hidden

 ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
 ▓░                                         ░▓
 ▓░  ┌──────────── THE PACK ─────────────┐  ░▓
 ▓░  │                                   │  ░▓
 ▓░  │   █ THE CORPUS █                  │  ░▓
 ▓░  │                                   │  ░▓
 ▓░  │   Graders, probes, refs           │  ░▓
 ▓░  │                                   │  ░▓
 ▓░  │                                   │  ░▓
 ▓░  └───────────────────────────────────┘  ░▓
 ▓░                                         ░▓
 ▓░                                         ░▓
 ▓░                                         ░▓
 ▓░  ═════════════════════════════════════  ░▓
 ▓░  ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  ░▓
 ▓░                                         ░▓
 ▓░  ┌────────── TRUST GATE ─────────────┐  ░▓
 ▓░  │                                   │  ░▓
 ▓░  │   █ VERIFIED SET  █               │  ░▓
 ▓░  │                                   │  ░▓
 ▓░  │   Hidden behavioral               │  ░▓
 ▓░  │   Ready to train and eval.        │  ░▓
 ▓░  │                                   │  ░▓
 ▓░  └───────────────────────────────────┘  ░▓
 ▓░                                         ░▓
 ▓░  ✓ Private corpus                       ░▓
 ▓░  ✓ Synthetic generation                 ░▓
 ▓░  ✓ Redacted exports                     ░▓
 ▓░                                         ░▓
 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░▓
 ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒

Request a walkthrough

See the full factory run on your tasks.

A private corpus of hardened environments, an independent-vendor oracle, and AI-authored tasks that clear the same gate — run on your agent in a walkthrough.

Veyl
© 2026