โ† All Updates

Ornith 1.0: Self-Scaffolding Coding Agents

A new family of open-source coding models learns to write its own agent harness during training, and the 397B variant beats Claude Opus 4.7 on Terminal-Bench.

June 29, 2026 · 5 min read · New Release Analysis

📋 In This Article

  • 🔑 Self-Scaffolding Explained: The model jointly learns to generate both the agent harness and the solution, instead of relying on human-designed scaffolding
  • 🏗️ Two-Stage RL Process: Scaffold generation → solution rollout, with the GRPO reward flowing back to both stages
  • 🛡️ Anti-Reward-Hacking: A three-layer defense (immutable environment, deterministic monitor, LM judge veto)
  • 📊 Benchmarks: 397B scores 82.4% on SWE-bench and 77.5% on Terminal-Bench; 9B scores 69.4% on SWE-bench and is edge-deployable
  • 💡 Why It Matters: The first framework in which the agent orchestration layer is learned rather than hand-engineered

The Innovation

DeepReinforce's Ornith-1.0 is a family of four models (9B Dense, 31B Dense, 35B MoE, 397B MoE) fine-tuned from Qwen 3.5 and Gemma 4. The breakthrough is not the base architecture but the training method: during reinforcement learning, the model learns to write its own agent harness instead of relying on human-designed scaffolding.

How Self-Scaffolding Works

Traditional coding agents use a fixed harness (memory, tools, error handling) designed by humans. Ornith treats the scaffold as a learnable object that co-evolves with the model's policy:

  1. Scaffold Generation: Given a task and the previous scaffold, the model proposes a refined harness.
  2. Solution Rollout: Using that scaffold, the model generates a solution. The resulting reward flows back to both stages.
  3. GRPO Optimization: Group Relative Policy Optimization updates the weights with that reward, so the policy improves at both scaffold generation and solution quality.
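The loop above can be sketched in Python. This is a toy illustration under stated assumptions, not DeepReinforce's implementation: `propose_scaffold`, `solve`, and `score` are hypothetical stand-ins for the two generation stages and the task verifier, the group size of 4 and the rule of carrying the best-scoring scaffold forward are assumptions, and only GRPO's group-relative advantage step is shown (the clipped policy-ratio objective and KL penalty are omitted). What it illustrates is the source's point that one reward per rollout is credited to both the scaffold and the solution it produced.

```python
import statistics
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Rollout:
    scaffold: str  # harness proposed in stage 1 (hypothetical text form)
    solution: str  # solution generated under that harness in stage 2
    reward: float  # verifier score for the solution


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: each reward normalized by its group's mean and std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def self_scaffold_step(
    task: str,
    prev_scaffold: str,
    propose_scaffold: Callable[[str, str], str],  # hypothetical stage-1 sampler
    solve: Callable[[str, str], str],             # hypothetical stage-2 sampler
    score: Callable[[str, str], float],           # hypothetical verifier
    group_size: int = 4,
) -> Tuple[List[Tuple[str, str, float]], str]:
    """Sample a group of (scaffold, solution) rollouts and credit each
    rollout's advantage to both of its stages."""
    rollouts = []
    for _ in range(group_size):
        scaffold = propose_scaffold(task, prev_scaffold)  # stage 1: refine the harness
        solution = solve(task, scaffold)                  # stage 2: solve using it
        rollouts.append(Rollout(scaffold, solution, score(task, solution)))

    advantages = group_relative_advantages([r.reward for r in rollouts])
    updates = []
    for rollout, adv in zip(rollouts, advantages):
        # The same advantage weights the scaffold tokens and the solution tokens.
        updates.append(("scaffold", rollout.scaffold, adv))
        updates.append(("solution", rollout.solution, adv))

    # Assumption: the highest-reward scaffold becomes the next "previous scaffold".
    best = max(rollouts, key=lambda r: r.reward)
    return updates, best.scaffold


# Toy usage with deterministic stubs in place of a real model.
_candidates = iter(["retries=0", "retries=1", "retries=2", "retries=3"])


def toy_propose(task: str, prev: str) -> str:
    return next(_candidates)


def toy_solve(task: str, scaffold: str) -> str:
    return f"{task} solved with {scaffold}"


def toy_score(task: str, solution: str) -> float:
    # Pretend only harnesses with at least two retries pass the tests.
    return 1.0 if solution.endswith(("retries=2", "retries=3")) else 0.0


updates, next_scaffold = self_scaffold_step(
    "fix-bug", "retries=0", toy_propose, toy_solve, toy_score
)
```

Because advantages are normalized within the group, a scaffold is rewarded only for beating its siblings on the same task, so no learned value function is needed.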

Anti-Reward-Hacking Defenses

Letting a model write its own harness invites cheating. Ornith uses three layers of protection:

  1. Immutable environment
  2. Deterministic monitor
  3. LM judge veto
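The three layers (immutable environment, deterministic monitor, LM judge veto) can be read as a gate on the reward. The sketch below is an assumption about how such a gate might compose, not Ornith's actual checks: the source names the layers but not their rules, so the protected-file hash, the `SUSPICIOUS_PATTERNS` list, and the `judge_vetoes` callable are hypothetical, and a failure at any layer is assumed to zero the reward.

```python
import hashlib
from typing import Callable, Dict

# Hypothetical deterministic rules; a real monitor would be far more thorough.
SUSPICIOUS_PATTERNS = ("pytest.skip", "sys.exit(0)", "assert True")


def fingerprint(files: Dict[str, str]) -> str:
    """SHA-256 over protected files (path and contents) in sorted path order."""
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode() + b"\0" + files[path].encode() + b"\0")
    return h.hexdigest()


def gated_reward(
    raw_reward: float,
    env_before: Dict[str, str],
    env_after: Dict[str, str],
    transcript: str,
    judge_vetoes: Callable[[str], bool],  # hypothetical LM-judge wrapper
) -> float:
    # Layer 1: immutable environment. Any change to protected files voids the reward.
    if fingerprint(env_before) != fingerprint(env_after):
        return 0.0
    # Layer 2: deterministic monitor. Cheap rule-based scan of the agent's actions.
    if any(pattern in transcript for pattern in SUSPICIOUS_PATTERNS):
        return 0.0
    # Layer 3: LM judge veto. Consulted only if the cheaper checks passed.
    if judge_vetoes(transcript):
        return 0.0
    return raw_reward


def never_veto(transcript: str) -> bool:
    return False


def always_veto(transcript: str) -> bool:
    return True


protected = {"tests/test_core.py": "def test_add():\n    assert add(1, 2) == 3\n"}
tampered = {"tests/test_core.py": "def test_add():\n    assert True\n"}

clean = gated_reward(1.0, protected, protected, "edited src/core.py", never_veto)
edited_tests = gated_reward(1.0, protected, tampered, "edited tests/test_core.py", never_veto)
skipped = gated_reward(1.0, protected, protected, "added pytest.skip to the failing test", never_veto)
vetoed = gated_reward(1.0, protected, protected, "edited src/core.py", always_veto)
```

Ordering the layers from cheapest to most expensive is a design choice here, not something the source specifies. It means the LM judge is only consulted on rollouts that already passed the hash and rule checks.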

Benchmark Results

Model              SWE-Bench Verified (%)   Terminal-Bench 2.1 (%)   Params
Ornith-1.0 397B    82.4                     77.5                     397B MoE
Ornith-1.0 35B     75.6                     64.4                     35B MoE
Ornith-1.0 9B      69.4                     43.1                     9B Dense
Claude Opus 4.7    80.8                     70.3                     ~500B+
DeepSeek V4 Pro    80.6                     67.9                     1.6T/49B
Key takeaway: The 397B variant beats Claude Opus 4.7 on Terminal-Bench 2.1 (77.5 vs 70.3) and edges it out on SWE-Bench Verified (82.4 vs 80.8). The 9B model punches well above its weight at 69.4% on SWE-Bench Verified, competitive with models 3-4x its size. All models are MIT-licensed with open weights.
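The margins behind the takeaway can be checked directly against the benchmark table. This small sketch copies the scores from the table and rounds each difference to one decimal to hide floating-point noise:

```python
# Scores (%) copied from the benchmark table in this article.
SCORES = {
    "Ornith-1.0 397B": {"SWE-Bench Verified": 82.4, "Terminal-Bench 2.1": 77.5},
    "Ornith-1.0 35B": {"SWE-Bench Verified": 75.6, "Terminal-Bench 2.1": 64.4},
    "Ornith-1.0 9B": {"SWE-Bench Verified": 69.4, "Terminal-Bench 2.1": 43.1},
    "Claude Opus 4.7": {"SWE-Bench Verified": 80.8, "Terminal-Bench 2.1": 70.3},
    "DeepSeek V4 Pro": {"SWE-Bench Verified": 80.6, "Terminal-Bench 2.1": 67.9},
}


def margin(model: str, baseline: str, bench: str) -> float:
    """Percentage-point lead of `model` over `baseline`, rounded to one decimal."""
    return round(SCORES[model][bench] - SCORES[baseline][bench], 1)


for baseline in ("Claude Opus 4.7", "DeepSeek V4 Pro"):
    for bench in ("SWE-Bench Verified", "Terminal-Bench 2.1"):
        lead = margin("Ornith-1.0 397B", baseline, bench)
        print(f"397B vs {baseline}, {bench}: {lead:+.1f} pts")
```

The 397B model leads Claude Opus 4.7 by 1.6 points on SWE-Bench Verified and 7.2 on Terminal-Bench 2.1. Against DeepSeek V4 Pro, the leads are 1.8 and 9.6 points. In both comparisons, most of the advantage comes from Terminal-Bench.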

Official Announcement · Model Weights (Hugging Face)