The AI landscape shifted dramatically today with the emergence of Step-3.5-Flash, a model that challenges everything we thought we knew about the relationship between model size, performance, and cost.
A Shanghai-based AI lab called StepFun quietly published a technical report for Step-3.5-Flash that has sent shockwaves through the open-source AI community.
| Model | Decoding Cost (relative) | LiveCodeBench-V6 | AIME 2025 |
|-------|--------------------------|------------------|-----------|
| Step-3.5-Flash | 1.0x | 86.4 | 97.3 |
| Kimi K2.5 | 18.9x | 85.0 | — |
| DeepSeek V3.2 | 6.0x | 83.3 | 92.1 |
| GLM-4.7 | 18.9x | 81.2 | — |
This isn't a marginal improvement. Step-3.5-Flash runs at roughly 1/19th the decoding cost of Kimi K2.5 while outperforming it on coding benchmarks. It achieves 97.3% on AIME 2025—the highest of any open-source model tested.
Inference economics are being rewritten — The industry narrative has been "bigger models, better results." Step-3.5-Flash proves you can achieve frontier performance with dramatically lower compute costs.
Agentic AI just got cheaper — The model leads on τ²-Bench (88.2), GAIA (84.5), and ResearchRubrics (65.3), beating both Gemini DeepResearch and OpenAI DeepResearch.
The 11B activation is key — While the total parameter count is 196B, only 11B are activated per token, making deployment feasible for more organizations.
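The 11B-of-196B arithmetic follows from how Mixture-of-Experts routing works: a learned gate picks only a few experts per token, so most parameters sit idle on any given forward pass. Here is a minimal sketch of top-k gating; the expert count and k value are illustrative assumptions, not figures from the Step-3.5-Flash report:

```python
import math
import random

random.seed(0)

NUM_EXPERTS = 16   # hypothetical; the report summary doesn't give expert counts
TOP_K = 2          # experts activated per token in this sketch

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_token(gate_logits, top_k=TOP_K):
    """Pick the top-k experts for one token; only those experts run."""
    probs = softmax(gate_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    # Renormalize the chosen experts' gate weights so they sum to 1.
    total = sum(probs[i] for i in chosen)
    return [(i, probs[i] / total) for i in chosen]

# Random stand-ins for a learned router's output on one token.
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
active = route_token(logits)
print(f"active experts: {[i for i, _ in active]} of {NUM_EXPERTS}")
```

With only `TOP_K / NUM_EXPERTS` of the expert weights touched per token, per-token compute scales with activated parameters (11B) rather than total parameters (196B), which is what makes the deployment cost story plausible.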
Meta has acquired the entire team from AI startup Dreamer, including co-founder Hugo Barra (former Meta VP), to strengthen its lagging AI agent ambitions. This marks Meta's second agent-focused acquisition this year.
Luma AI released Uni-1, a unified model combining image understanding and generation in a single architecture. It's being hailed as the first real challenger to Google's image dominance.
OpenAI is offering private equity firms a guaranteed 17.5% minimum return to win investment for enterprise joint ventures—an aggressive move in its race against Anthropic.
NVIDIA announced IGX Thor, powering industrial, medical, and robotics edge AI applications. The platform brings enterprise-grade AI to physical environments.
The White House framework mixes popular AI ideas with sweeping preemption that could block state-level AI protections—a move analysts say could undercut key safeguards.
New research introduces "hyperagents"—self-referential agents that can modify their own learning mechanisms, not just their outputs. This could enable open-ended AI systems that improve their own improvement processes.
Researchers found that automated prompt optimization techniques can systematically bypass LLM safety measures. Using DSPy, they increased Qwen 3 8B's "danger score" from 0.09 to 0.79—highlighting the need for adaptive red-teaming.
A new paper proposes "expert prefetching" for Mixture-of-Experts models, achieving up to 14% reduction in time per output token by overlapping CPU-GPU transfers with computation.
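The overlap idea can be sketched in a toy form: while the GPU computes with the current expert, the next expert's weights are copied in on a separate thread. This is only a schematic, with `time.sleep` standing in for transfers and expert forward passes; the real technique would use CUDA streams, pinned memory, and the paper's expert-prediction logic, none of which are shown here, and all timings below are made up:

```python
import threading
import time

# Illustrative timings, not from the paper.
TRANSFER_TIME = 0.02   # stand-in for a CPU->GPU copy of one expert's weights
COMPUTE_TIME = 0.03    # stand-in for one expert's forward pass

def transfer_expert(expert_id, cache):
    time.sleep(TRANSFER_TIME)
    cache[expert_id] = f"weights[{expert_id}]"

def compute_expert(expert_id, cache):
    assert expert_id in cache, "expert must be resident before compute"
    time.sleep(COMPUTE_TIME)

def run(schedule, prefetch=True):
    cache = {}
    transfer_expert(schedule[0], cache)   # the first expert must block either way
    start = time.perf_counter()
    for step, expert in enumerate(schedule):
        worker = None
        if prefetch and step + 1 < len(schedule):
            # Start copying the next expert while this one computes.
            worker = threading.Thread(target=transfer_expert,
                                      args=(schedule[step + 1], cache))
            worker.start()
        compute_expert(expert, cache)
        if worker:
            worker.join()
        elif step + 1 < len(schedule):
            transfer_expert(schedule[step + 1], cache)  # serial fallback
    return time.perf_counter() - start

schedule = list(range(8))
serial = run(schedule, prefetch=False)
overlapped = run(schedule, prefetch=True)
print(f"serial {serial:.3f}s vs overlapped {overlapped:.3f}s")
```

Because each simulated transfer is shorter than a compute step, prefetching hides the copies almost entirely, which is the mechanism behind the reported reduction in time per output token.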
Step-3.5-Flash proves that frontier AI performance no longer requires frontier-scale compute—efficient architecture design is becoming as important as parameter count, potentially democratizing access to state-of-the-art AI capabilities.
Full report: https://ai-briefing.pages.dev