April 16, 2026 09:23 AM
Google's Gemini 3.1 Flash TTS improves the expressivity and controllability of text-to-speech and posts an Elo score of 1,211 on the Artificial Analysis TTS leaderboard. The model supports over 70 languages and introduces audio tags for granular control of vocal style, which can be steered with natural language commands. All generated audio is watermarked with SynthID so that it can be identified as AI-generated, which is intended to help curb misinformation.
OpenAI introduced updates to its Agents SDK, adding a model-native harness for cross-file and tool workflows along with sandboxed execution for safer task handling.
Humwork has launched what it describes as the first Agent-to-Person (A2P) marketplace, which connects AI agents with verified human experts when AI tools get stuck. The platform integrates with AI-centric tools like Claude Code and Replit, and handoffs take under 30 seconds with full session context shared securely. With more than 1,000 experts available globally, Humwork reports an 87% resolution rate and is backed by Y Combinator's P26 batch.
IBM Research uses an executable benchmark with thousands of APIs and documents to test multi-step agent reasoning and tool use, revealing consistent performance gaps and common failure modes.
Many teams are claiming extraordinary things about their agents. The evidence behind these claims is usually disappointing. ScienceWorld and DiscoveryWorld are benchmarks developed to test whether AI agents can actually do science. ScienceWorld asks whether agents can 're-make' classic scientific discoveries at roughly an elementary school level, while DiscoveryWorld tests open-ended discovery at a college or PhD level. These benchmarks, open and freely available, help test what science agents are actually capable of.
Diffusion Language Models (dLLMs) suffer training collapse during Reinforcement Learning because their log-likelihood must be estimated with high-variance Monte Carlo sampling, which produces noisy importance ratios. These noisy ratios induce gradient spikes that drive policy drift in a positive feedback loop, a problem that methods designed for autoregressive (AR) models, such as conditional clipping, fail to solve. The newly proposed StableDRL framework stabilizes the update by combining unconditional clipping, which suppresses extreme values, with self-normalization tied to the effective information in the batch.
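To make the two ingredients concrete, here is a minimal sketch of clipping importance ratios unconditionally and then self-normalizing them. It is an illustration under assumptions, not the StableDRL implementation: the function names, the `eps` clipping range, and the choice to normalize by the sum of the clipped ratios are all hypothetical, and the framework's actual normalizer may differ.

```python
import math

def clipped_self_normalized_weights(logp_new, logp_old, eps=0.2):
    """Hypothetical helper: clip noisy importance ratios, then normalize them."""
    # Importance ratio per sample, from (noisy) log-likelihood estimates.
    ratios = [math.exp(n - o) for n, o in zip(logp_new, logp_old)]
    # Unconditional clipping: bound every ratio, whatever the sign of its advantage.
    clipped = [min(max(r, 1.0 - eps), 1.0 + eps) for r in ratios]
    # Self-normalization (assumed form): divide by the sum so the weights sum to 1.
    total = sum(clipped)
    return [c / total for c in clipped]

def surrogate_objective(logp_new, logp_old, advantages, eps=0.2):
    """Weighted sum of advantages using the clipped, normalized weights."""
    weights = clipped_self_normalized_weights(logp_new, logp_old, eps)
    return sum(w * a for w, a in zip(weights, advantages))
```

In this sketch a single wildly overestimated ratio cannot dominate the update, because its weight is capped at `(1 + eps)` divided by the batch total.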
Cost per token is a key metric for assessing the total cost of ownership (TCO) of AI infrastructure because it measures delivered intelligence, folding hardware, software, and utilization efficiency into one number. Unlike traditional metrics such as compute cost or FLOPS per dollar, cost per token reflects real-world performance, which is what determines whether AI can be scaled profitably. NVIDIA presents evidence that its Blackwell platform substantially reduces cost per token compared to Hopper.
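As a rough illustration of how such a metric combines hardware cost, throughput, and utilization, the sketch below computes a cost per million tokens. The formula and all input numbers are assumptions chosen for the example, not figures from NVIDIA, and a real TCO model would include many more terms (power, networking, staffing, depreciation).

```python
def cost_per_million_tokens(hourly_cost_usd, tokens_per_second, utilization=1.0):
    """Hypothetical simplified metric: dollars spent per one million generated tokens."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    # Tokens actually delivered in one hour at the given utilization.
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return hourly_cost_usd / tokens_per_hour * 1_000_000

# Illustrative inputs only: a $3/hour accelerator serving 1,000 tokens/s.
full = cost_per_million_tokens(3.0, 1000)        # about $0.83 per million tokens
half = cost_per_million_tokens(3.0, 1000, 0.5)   # about $1.67 at 50% utilization
```

Halving utilization doubles the cost per token even though the hardware price and peak throughput are unchanged, which is the kind of effect a FLOPS-per-dollar figure does not capture.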
Parcae is one of the first stable architectures for looped language models. It achieves the quality of a Transformer twice its size with clean, predictable training. Rather than scaling data alone, Parcae increases recurrence, which opens a new axis for scaling quality. The name Parcae is a homage to the three Roman Fates: Nona, Decima, and Morta.
Lyra 2.0 is a framework for generating long, camera-controlled videos that maintain consistency, using geometry-guided retrieval to prevent spatial forgetting and self-augmented training to reduce temporal drift.
Researchers propose a Many-Tier Instruction Hierarchy (ManyIH) to resolve instruction conflicts in LLM agents, going beyond traditional schemes that use a fixed set of privilege levels. They introduce ManyIH-Bench, which assesses models across 12 privilege levels and 853 agent tasks and finds that current models perform poorly, at 40% accuracy. The result highlights the need for scalable conflict resolution in complex agentic environments.
This post features a transcript of an interview with Jensen Huang. He discusses TPU competition, Nvidia's lock on the supply chain needed to make advanced chips, whether the US should sell AI chips to China, why Nvidia isn't a hyperscaler, how the company makes its investments, and more. Links to audio and video of the interview are available.
Users have accused Anthropic of nerfing Claude Code, but there's no evidence that Anthropic has done this. The strongest public reports still lack independent raw data. However, Anthropic didn't need to nerf Claude for Claude Code to become a different product. Effort defaults, adaptive thinking, cache duration, context compaction, quota policy, and status incidents can all change the experience while the model name stays the same.
Cloudflare has rebranded its Browser Rendering service as Browser Run and introduced new features for AI agents.
Google is testing a Shopping Cart feature within the Gemini app, allowing in-app product browsing and purchasing.
Google released a native Gemini app for Mac with system-wide access, screen context sharing, and support for image and video generation using Nano Banana and Veo.
Jane Street is betting that the returns on AI-driven trading will justify $6 billion in cloud spending and a $1 billion equity position in the company providing it.