💪 Strengths
GPT-5.5: Best all-rounder, native audio, enterprise agents
Claude 4.8: SWE-bench king, nuanced reasoning, long context
DS V4 Pro: Best open math/code, MIT, 1M context
Qwen3.7: Top Chinese model, 1M ctx, Apache 2.0
GLM-5.2: SOTA open coding/agents, MIT, async RL
Gemini 3.1: Best reasoning (ARC-AGI-2 77%), 1M ctx, multimodal
MiniMax M3: 1M ctx + multimodal, 23B active, cheap
Gemma 4: Best small model, Apache 2.0, edge-ready
VibeThinker: 3B params, frontier math reasoning, MIT
Ornith-1.0: SOTA open coding at each size tier, MIT
Kimi K2.6: Agent swarms (300 sub-agents), multimodal
⚠️ Weaknesses
GPT-5.5: Expensive, closed, rate limits
Claude 4.8: Highest cost, export restrictions
DS V4 Pro: Chinese data law, needs multi-GPU
Qwen3.7: Proprietary (no weights), hallucination
GLM-5.2: 744B total, needs serious hardware
MiniMax M3: Community license, self-reported scores
Gemma 4: Lower ceiling on hard reasoning
VibeThinker: Reasoning-only, bad at general chat
Ornith-1.0: Brand new, limited independent verification
Kimi K2.6: Modified MIT, API-only for best variant