โ† All Updates

Qwythos 9B: Claude Mythos Distillation into Open Weights

A 9B model fine-tuned on 500M+ tokens of Claude Mythos reasoning traces gains +34 points on MMLU and native tool calling, showing how far targeted distillation can push small open models.

June 29, 2026 · 4 min read · Fine-tune Analysis

📋 In This Article

  • 🔑 What Is Qwythos: Full-parameter fine-tune of Qwen3.5-9B on Claude Mythos traces, Apache 2.0 licensed
  • 📈 Performance Gains: +34 MMLU, +30 gsm8k-strict, +19 gsm8k-flex over base Qwen3.5-9B
  • 🔧 Capabilities: 1M context via YaRN, native function calling, optional vision projector
  • 📊 Benchmark Comparison: How it stacks up against base Qwen3.5 and other 9B-class models
  • ⚠️ Trade-offs: GPQA regression, narrow improvement domain, not general-purpose

What It Is

Qwythos-9B is a full-parameter fine-tune of Qwen3.5-9B by Empero AI, trained on 500M+ tokens of Claude Mythos chain-of-thought reasoning traces. It ships with a 1M-token context window (extended from the native 262K via YaRN RoPE scaling), native function calling, and an optional vision projector. It is released under the Apache 2.0 license.
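The context extension can be made concrete with a little arithmetic. The sketch below assumes that "262K" means 262,144 tokens and "1M" means 1,048,576 tokens (the article gives only the rounded figures), and it builds a config fragment in the style of the Hugging Face `rope_scaling` entry. The fragment is illustrative and is not taken from the model's published configuration.

```python
# Assumption: "262K" = 262,144 tokens and "1M" = 1,048,576 tokens.
# The article only states the rounded figures.
NATIVE_CONTEXT = 262_144
TARGET_CONTEXT = 1_048_576


def yarn_factor(native: int, target: int) -> float:
    """Return the scaling factor needed to stretch `native` to `target` tokens."""
    if native <= 0:
        raise ValueError("native context must be positive")
    if target < native:
        raise ValueError("target context must be at least the native context")
    return target / native


factor = yarn_factor(NATIVE_CONTEXT, TARGET_CONTEXT)

# Hypothetical config fragment, modeled on the Hugging Face `rope_scaling`
# convention. The values actually shipped with Qwythos-9B may differ.
rope_scaling = {
    "rope_type": "yarn",
    "factor": factor,
    "original_max_position_embeddings": NATIVE_CONTEXT,
}

print(rope_scaling)
```

Under these assumptions the extension works out to a factor of 4, which is a useful sanity check when comparing against the values in the model's own config file.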

Performance Gains Over Base

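| Metric | Qwen3.5-9B (base) | Qwythos-9B | Delta (points) |
| --- | --- | --- | --- |
| MMLU | 23.2% | 57.5% | +34.3 |
| gsm8k (strict) | 51.0% | 81.0% | +30.0 |
| gsm8k (flex) | 67.0% | 86.0% | +19.0 |
| GPQA Diamond | 63.0% | 58.0% | -5.0 |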
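The deltas are absolute differences in percentage points, not relative improvements. A minimal sketch that recomputes them from the reported scores, using names chosen only for illustration:

```python
# Reported scores in percent: (base Qwen3.5-9B, Qwythos-9B).
scores = {
    "MMLU": (23.2, 57.5),
    "gsm8k (strict)": (51.0, 81.0),
    "gsm8k (flex)": (67.0, 86.0),
    "GPQA Diamond": (63.0, 58.0),
}


def delta_points(base: float, tuned: float) -> float:
    """Absolute change in percentage points, rounded to one decimal place."""
    return round(tuned - base, 1)


deltas = {name: delta_points(base, tuned) for name, (base, tuned) in scores.items()}

for name, delta in deltas.items():
    print(f"{name}: {delta:+.1f} points")
```

Recomputing the column this way confirms that the three gains and the single GPQA Diamond regression are consistent with the reported scores.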

Key Observations

Key takeaway: Qwythos demonstrates that targeted fine-tuning on high-quality reasoning traces can substantially improve small-model performance on structured tasks: +34.3 points on MMLU and +30.0 on strict gsm8k over the base model, against a 5-point regression on GPQA Diamond. It is not a general-purpose replacement for frontier models, but for math, code, and tool-use workflows at the 9B scale, it is one of the strongest open options available.

Model Card (HuggingFace) · GGUF Quantizations