SubQ: First Sub-Quadratic LLM with 12M Token Context
A startup out of Miami claims to have escaped the quadratic attention bottleneck that has defined transformers since 2017, with a 12-million-token context window to prove it.
In This Article
- The Quadratic Problem: Every transformer wastes compute on token-to-token relationships that don't matter
- Sparse Attention (SSA): Content-based token selection instead of position-based. Linear scaling, not quadratic.
- Efficiency Numbers: 64.5x less compute at 1M tokens. 52x faster than Flash Attention 2. 12M context window.
- Quality Benchmarks: RULER 95.6%, needle-in-haystack 100% at 2M tokens, SWE-bench 81.8%
- Honest Caveats: Vendor-reported, no open weights, no independent reproduction yet
The Problem
Every transformer LLM since 2017 has been bottlenecked by quadratic attention: doubling input length quadruples compute. This is why 1M-token contexts cost $5-25 per query and why RAG pipelines exist: not because retrieval is better, but because feeding full documents is too expensive.
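The "doubling quadruples" claim is easy to check with arithmetic. The sketch below counts query-key pairs only, on the assumption that pair scoring dominates attention cost. The function name is illustrative and not taken from any library.

```python
def dense_attention_pairs(n_tokens: int) -> int:
    """Query-key pairs scored by dense self-attention over n_tokens."""
    # Every token attends to every token, so the count is n squared.
    return n_tokens * n_tokens


if __name__ == "__main__":
    base = 128_000  # assumed decimal reading of "128K"
    for n in (base, 2 * base, 4 * base):
        growth = dense_attention_pairs(n) / dense_attention_pairs(base)
        print(f"{n:>9,} tokens: {dense_attention_pairs(n):.3e} pairs ({growth:.0f}x base)")
```

Each doubling of `n` multiplies the pair count by four, which is the scaling the paragraph describes.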
The Architecture: Subquadratic Sparse Attention (SSA)
SubQ, built by Miami startup Subquadratic (founded by ex-Meta AI leads, $29M seed), replaces dense attention with content-based sparse selection. Each token learns to select a small subset of other tokens that are semantically relevant, then full attention math runs only on those pairs.
This differs from prior approaches:
- Longformer/BigBird skip based on position (nearby tokens). SSA skips based on content.
- Mamba/RWKV compress into fixed memory. SSA still computes exact attention on selected tokens.
- RAG/chunking discard potentially relevant context. SSA keeps everything but only attends to what matters.
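To make "select by content, then run exact attention on the selection" concrete, here is a minimal sketch using plain top-k selection. It is not SubQ's method: the source does not say how SSA picks tokens, and the top-k rule, function names, and the budget `k` are all assumptions made for illustration. This naive selection also scores every key, so the sketch as a whole is still quadratic. A genuinely sub-quadratic system would need a cheaper selection step.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def topk_sparse_attention(queries, keys, values, k):
    """For each query, keep the k best-scoring keys, then run exact
    softmax attention over only those keys (hypothetical selection rule)."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        # Naive selection: score all keys. This step is O(n) per query,
        # so it does NOT by itself remove the quadratic cost.
        scores = [dot(q, key) * scale for key in keys]
        selected = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
        # Exact attention math, restricted to the selected pairs.
        weights = softmax([scores[i] for i in selected])
        outputs.append(
            [sum(w * values[i][d] for w, i in zip(weights, selected)) for d in range(dim)]
        )
    return outputs


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    print(topk_sparse_attention([[10.0, 0.0]], keys, values, k=2))
```

With `k` equal to the number of keys this reduces to ordinary dense attention. With a small `k`, low-scoring tokens such as the third key above contribute nothing to the output.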
Efficiency Gains
| Context Length | Compute vs Dense | Speed vs FA2 |
|---|---|---|
| 128K tokens | 8x less | 8x faster |
| 512K tokens | 31x less | 31x faster |
| 1M tokens | 64.5x less | 52x faster |
| 12M tokens | ~1000x less | n/a |
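One way to sanity-check the compute column against the linear-scaling claim: if each token attended to a fixed budget of k tokens instead of all n, the reduction over dense attention would be roughly n / k. The sketch below inverts that to get the budget each reported figure would imply. The fixed-budget model, the decimal reading of "K" and "M", and the function name are all assumptions. Selection overhead is ignored, and the source does not state any actual budget.

```python
# Vendor-reported "compute vs dense" reductions from the table above.
# Token counts assume decimal K/M (128K = 128,000).
REPORTED_REDUCTION = {
    128_000: 8.0,
    512_000: 31.0,
    1_000_000: 64.5,
    12_000_000: 1000.0,  # listed as approximate ("~1000x")
}


def implied_budget(n_tokens: int, reduction: float) -> float:
    """Per-token attention budget k implied by reduction ~= n / k."""
    return n_tokens / reduction


if __name__ == "__main__":
    for n, r in REPORTED_REDUCTION.items():
        print(f"{n:>10,} tokens, {r:>6}x less -> k ~ {implied_budget(n, r):,.0f}")
```

Under these assumptions the implied budgets fall between roughly 12,000 and 16,500 tokens at every context length. That is what a roughly constant per-token cost would look like, but it is a consistency check on vendor numbers, not an independent measurement.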
Quality Benchmarks
- RULER 128K: 95.6% (vs 94.8% for Claude Opus 4.6)
- Needle-in-Haystack: 100% at 1M and 2M tokens, 98% at 6M and 12M
- SWE-Bench Verified: 81.8%
- GPQA Diamond: 85.4% (below frontier ~93%)