Railway, a San Francisco-based cloud platform, raised $100 million in Series B funding to challenge AWS and Google Cloud with AI-native infrastructure. The company claims deployments complete in under one second, fast enough to keep pace with AI-generated code. Customers report 10x developer velocity and up to 65% cost savings. Notably, Railway built its own data centers after abandoning Google Cloud and offers pricing 50% lower than the hyperscalers.
Block (formerly Square) released Goose, an open-source AI coding agent with more than 26,100 GitHub stars. Unlike Claude Code, which costs $20-200/month and imposes rate limits, Goose runs entirely locally using Ollama with any LLM: no subscription fees, no cloud dependency, no rate limits. The project reflects a growing developer rebellion against AI coding tool pricing.
Listen Labs secured $69M in Series B funding for its AI-powered customer research platform. The company has conducted more than 1 million AI-mediated interviews, replacing traditional surveys with open-ended video conversations. Notable clients include Microsoft, Sweetgreen, and MGM Resorts. Revenue grew 15x in nine months, reaching eight figures.
Microsoft Research released AsgardBench, a benchmark evaluating whether AI agents can use visual observations to revise their plans mid-task. The benchmark covers 108 task instances across 12 task types in the AI2-THOR simulator. Key finding: visual input more than doubles success rates compared to text-only descriptions, which suggests that embodied AI depends on perception-based reasoning.
Microsoft also released GroundedPlanBench, which evaluates whether VLMs can plan actions and, at the same time, determine where to execute them. The V2GP framework converts robot demonstration videos into spatially grounded training data. Results show that grounded planning outperforms decoupled approaches, in which planning and grounding are handled separately.
ServiceNow AI and Hugging Face introduced EVA, a new framework for evaluating voice agents. EVA addresses the lack of standardized benchmarks for measuring voice AI performance across real-world conversational scenarios.
Google DeepMind released Lyria 3, now available in the Gemini API and Google AI Studio. The model enables developers to build music generation applications with professional-quality output. Lyria 3 Pro extends these capabilities to longer tracks in Google products.
Google launched Gemini 3.1 Flash Live, making audio AI more natural and reliable across Google products. The model supports real-time voice interactions with improved latency and accuracy.
Search Live is now available globally in all languages and locations where AI Mode is available, bringing real-time visual search capabilities to users worldwide.
Open-source models like Kimi K2 and GLM 4.5 now benchmark near Claude Sonnet 4 levels while being freely available. As open-source AI infrastructure matures, the quality advantage that justified premium pricing is eroding. Developers increasingly have genuine alternatives that prioritize cost, privacy, and flexibility.
| Paper | Description |
|-------|-------------|
| PLDR-LLMs: Self-Organized Criticality | Shows LLMs pretrained at criticality exhibit reasoning at inference time |
| Environment Maps | Structured representations for long-horizon agents in software workflows |
| EnterpriseArena | First benchmark for CFO-style resource allocation under uncertainty |
| DUPLEX | Agentic dual-system planning via LLM-driven information extraction |
| AI-Supervisor | Multi-agent framework with persistent research world model |
The AI infrastructure battle heats up: Railway's $100M bet on AI-native cloud challenges the hyperscalers, while Block's free Goose alternative threatens Claude Code's pricing of up to $200/month, signaling a shift toward open, cost-efficient developer tools.
Full Report: https://ai-briefing.pages.dev