SubQ: First Sub-Quadratic LLM with 12M Token Context

A startup out of Miami claims to have escaped the quadratic attention bottleneck that has defined transformers since 2017 — with a 12-million-token context window to prove it.

June 29, 2026 · 5 min read · Architecture Analysis

📋 In This Article

🔑 The Quadratic Problem: Every transformer wastes compute on token-to-token relationships that don't matter
⚙️ Subquadratic Sparse Attention (SSA): Content-based token selection instead of position-based. Linear scaling, not quadratic.
📊 Efficiency Numbers: 64.5x less compute and 52x faster than FlashAttention-2 at 1M tokens. 12M context window.
✅ Quality Benchmarks: RULER 128K 95.6%, needle-in-haystack 100% at 1M and 2M tokens, SWE-Bench Verified 81.8%
⚠️ Honest Caveats: Mostly vendor-reported, no open weights, no independent reproduction yet

The Problem

Every transformer LLM since 2017 has been bottlenecked by quadratic attention: doubling the input length quadruples the attention compute. This is why 1M-token contexts cost $5-25 per query and why RAG pipelines exist. Retrieval is not inherently better; feeding full documents is simply too expensive.
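To make the scaling concrete, here is a minimal sketch that counts query-key pairs in dense self-attention. The function names are illustrative, and the count ignores constants such as head count and hidden size, so it shows only how cost grows with length, not absolute cost.

```python
def dense_attention_pairs(n_tokens: int) -> int:
    """Number of query-key scores dense self-attention computes: one per token pair."""
    return n_tokens * n_tokens


def cost_ratio(n_short: int, n_long: int) -> float:
    """How many times more pairwise scores the longer context needs."""
    return dense_attention_pairs(n_long) / dense_attention_pairs(n_short)


if __name__ == "__main__":
    # Doubling the context quadruples the pairwise work.
    print(cost_ratio(128_000, 256_000))  # 4.0
    # Going from 128K to 1M tokens (assuming K = 1,000 and M = 1,000,000).
    print(cost_ratio(128_000, 1_000_000))
```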

The Architecture: Subquadratic Sparse Attention (SSA)

SubQ, built by Miami startup Subquadratic (founded by ex-Meta AI leads, $29M seed), replaces dense attention with content-based sparse selection. Each token learns to select a small subset of other tokens that are semantically relevant, and exact attention is then computed only on those pairs.

This differs from prior approaches:

Longformer/BigBird skip based on position (nearby tokens). SSA skips based on content.
Mamba/RWKV compress into fixed memory. SSA still computes exact attention on selected tokens.
RAG/chunking discard potentially relevant context. SSA keeps everything but only attends to what matters.
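The general idea of content-based selection followed by exact attention on the selected pairs can be sketched as top-k sparse attention. This is an illustration only, not SubQ's method: the announcement does not describe how tokens are selected, so the top-k rule, the function names, and the parameter k are all assumptions. This toy also scores every key before selecting, so its selection step is still quadratic; a genuinely subquadratic system would need a cheaper selector.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topk_sparse_attention(queries, keys, values, k):
    """For each query, attend only to the k keys with the highest scaled dot product.

    Hypothetical selector: a plain top-k over content scores. With k == len(keys)
    this reduces to ordinary dense attention.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        # Content-based selection: rank keys by similarity to this query.
        scores = [dot(q, key) * scale for key in keys]
        selected = sorted(range(len(keys)), key=lambda j: scores[j], reverse=True)[:k]
        # Exact softmax attention, restricted to the selected keys.
        weights = softmax([scores[j] for j in selected])
        outputs.append(
            [sum(w * values[j][d] for w, j in zip(weights, selected)) for d in range(dim)]
        )
    return outputs


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    values = [[10.0], [20.0], [30.0]]
    # With k=1 the query attends only to its best-matching key.
    print(topk_sparse_attention([[5.0, 0.0]], keys, values, k=1))  # [[10.0]]
```

The contrast with position-based schemes is in the selection line: the chosen indices depend on the query's content, not on how far apart the tokens are.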

Efficiency Gains

Context Length	Compute vs Dense Attention	Speed vs FlashAttention-2 (FA2)
128K tokens	8x less	8x faster
512K tokens	31x less	31x faster
1M tokens	64.5x less	52x faster
12M tokens	~1000x less	—
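One way to read the compute column is to ask what per-token attention budget it would imply. The sketch below assumes a deliberately simple cost model that does not come from the source: dense attention costs n² pair computations and sparse attention costs n × k, so the reduction factor is n / k. It also assumes K = 1,000 and M = 1,000,000, and treats the approximate 12M figure as exactly 1000x.

```python
# Reported compute reductions from the table: context length -> "x less compute".
REPORTED = {
    128_000: 8.0,
    512_000: 31.0,
    1_000_000: 64.5,
    12_000_000: 1000.0,  # the table gives this only approximately
}


def implied_budget(n_tokens: int, reduction: float) -> float:
    """Implied tokens attended per query, under the assumed n^2 vs n*k cost model."""
    return n_tokens / reduction


if __name__ == "__main__":
    for n_tokens, reduction in REPORTED.items():
        k = implied_budget(n_tokens, reduction)
        print(f"{n_tokens:>12,} tokens: implied budget of about {k:,.0f} tokens")
```

Under these assumptions the implied budget stays in the low tens of thousands of tokens in every row, which is the pattern a roughly linear cost curve would produce. It says nothing about whether the reported figures themselves hold up.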

Quality Benchmarks

RULER 128K: 95.6% (vs 94.8% for Claude Opus 4.6)
Needle-in-Haystack: 100% at 1M and 2M tokens, 98% at 6M and 12M
SWE-Bench Verified: 81.8%
GPQA Diamond: 85.4% (below the frontier score of ~93%)

Honest caveat: Most benchmarks are vendor-reported. The model weights are not public, and independent reproduction is pending. The architecture is genuinely novel, but real-world performance at 12M tokens remains unverified by third parties. SubQ is available in private beta with SubQ Code (CLI agent) and SubQ Search (long-context research tool).
