May 06, 2026 09:18 AM
OpenAI released GPT-5.5 Instant, updating its default ChatGPT model with improved factual accuracy, reduced hallucinations, and stronger personalization based on user context.
Subquadratic has launched a new AI model with a 12-million-token context window. It outperforms GPT-5.5 on retrieval benchmarks. Attention cost scales quadratically with context length, so doubling the input quadruples the work. Subquadratic claims to have solved the problem. It plans to offer a model with a 50-million-token context window soon.
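To make the scaling claim concrete, here is a minimal sketch of how the cost of the attention score matrix grows with context length. The function name and the head dimension of 128 are illustrative assumptions, not details of Subquadratic's model or any other.

```python
def attention_score_flops(context_len: int, head_dim: int = 128) -> int:
    """Approximate FLOPs to compute the n x n score matrix for one attention head.

    Each of the n * n entries is a dot product of length head_dim,
    counted as 2 FLOPs (multiply + add) per element.
    head_dim=128 is an assumed, illustrative value.
    """
    return 2 * context_len * context_len * head_dim


base = attention_score_flops(1_000_000)
doubled = attention_score_flops(2_000_000)

# Doubling the context length quadruples the work.
print(doubled / base)  # 4.0
```

The ratio is independent of the head dimension, which is why the quadratic term dominates at multi-million-token context lengths.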
Meta is building a highly personalized AI assistant that will be able to carry out everyday tasks. The digital assistant will be powered by the company's new Muse Spark AI model. It can connect several hardware and software tools and learn from data with less human intervention than a chatbot. Meta is targeting a launch before the fourth quarter of this year.
Much of LLM inference consists of moving data from one place to another and then computing on it once it arrives. The most frustrating bottleneck in the system is when compute units sit idle because the data bus feeding them isn't fast enough. The solution is to transform memory into compute. Quantization is a nice trick, but it doesn't actually trade memory for compute: it transfers half as much data to a place to do twice as much computation.
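The bottleneck described here can be illustrated with a back-of-the-envelope estimate. This is a rough sketch only: the parameter count, memory bandwidth, and peak throughput below are hypothetical round numbers, and the model (every weight read once per generated token, one multiply-add per weight) is a simplifying assumption.

```python
def decode_step_times(param_count: int, bytes_per_param: float,
                      mem_bandwidth: float, peak_flops: float):
    """Return (transfer_time, compute_time) in seconds for one decoded token.

    Assumes every weight is read from memory once per token and
    contributes one multiply-add (2 FLOPs).
    """
    transfer_time = param_count * bytes_per_param / mem_bandwidth
    compute_time = 2 * param_count / peak_flops
    return transfer_time, compute_time


# Hypothetical hardware and model sizes, chosen only for illustration.
PARAMS = 7_000_000_000
BANDWIDTH = 1e12      # bytes per second
PEAK_FLOPS = 1e14     # FLOPs per second

fp16_transfer, fp16_compute = decode_step_times(PARAMS, 2, BANDWIDTH, PEAK_FLOPS)
int8_transfer, int8_compute = decode_step_times(PARAMS, 1, BANDWIDTH, PEAK_FLOPS)

# The step is memory-bound: moving the weights takes far longer than the math.
print(fp16_transfer > fp16_compute)
# Halving bytes per weight halves transfer time; the compute time is unchanged.
print(fp16_transfer / int8_transfer)
```

Under these assumed numbers, the compute units are idle for most of each step in both cases, which is the situation the article describes.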
Vision agents are the default for operating web apps that don't expose APIs. Most teams default to vision agents because the alternative, writing an MCP or REST surface, is too expensive to build. The cost of the vision approach is treated as a fixed price. Current vision agents require detailed prompts to succeed in tasks, and they are still prone to making mistakes. Better vision models reduce error rates, but they do not reduce the number of screenshots required to reach the relevant data, each of which costs thousands of input tokens.
Gemma 4 models reduce latency bottlenecks and achieve improved responsiveness for developers by using Multi-Token Prediction drafters. These drafters deliver up to a 3x speedup without any degradation in output quality or reasoning logic due to a specialized speculative decoding architecture. Speculative decoding decouples token generation from verification. It utilizes idle compute to 'predict' several future tokens at once with the drafter in less time than it takes for the target model to process just one token. The target model then verifies all of these suggested tokens in parallel.
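The draft-then-verify loop can be sketched with toy models. This is a generic greedy speculative decoding sketch, not Gemma 4's implementation: `target_model` and `draft_model` are hypothetical stand-ins that map a token prefix to the next token, and the verification loop runs sequentially here where a real system would score all drafted positions in one parallel forward pass.

```python
from typing import Callable, List

Model = Callable[[List[int]], int]  # token prefix -> next token (greedy)


def target_model(prefix: List[int]) -> int:
    # Toy stand-in for the large target model.
    return (sum(prefix) * 7 + len(prefix)) % 10


def draft_model(prefix: List[int]) -> int:
    # Toy stand-in for the drafter: wrong whenever len(prefix) % 4 == 0.
    guess = target_model(prefix)
    return (guess + 1) % 10 if len(prefix) % 4 == 0 else guess


def greedy_decode(model: Model, prompt: List[int], num_tokens: int) -> List[int]:
    tokens = list(prompt)
    for _ in range(num_tokens):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: Model, drafter: Model, prompt: List[int],
                       num_tokens: int, k: int = 4) -> List[int]:
    tokens = list(prompt)
    goal = len(prompt) + num_tokens
    while len(tokens) < goal:
        # 1. The drafter proposes k tokens autoregressively.
        draft: List[int] = []
        for _ in range(k):
            draft.append(drafter(tokens + draft))
        # 2. The target checks each drafted position (sequential here for clarity).
        accepted: List[int] = []
        for i in range(k):
            expected = target(tokens + draft[:i])
            if draft[i] == expected:
                accepted.append(draft[i])
            else:
                # First mismatch: keep the target's token and discard the rest.
                accepted.append(expected)
                break
        else:
            # Every draft was accepted, so the target contributes one extra token.
            accepted.append(target(tokens + draft))
        tokens.extend(accepted)
    return tokens[:goal]


prompt = [1, 2, 3]
# The speculative output matches plain greedy decoding with the target model.
print(speculative_decode(target_model, draft_model, prompt, 20)
      == greedy_decode(target_model, prompt, 20))  # True
```

Because every emitted token is either confirmed or supplied by the target model, the output is identical to decoding with the target alone; the speedup comes only from how many drafted tokens are accepted per verification pass.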
Multimodal support, custom metadata filtering, and page-level citations are now available in the Gemini API File Search tool. The features can help developers bring structure to unstructured data for efficient, verifiable RAG. Users' RAG systems can now natively process and better organize text and visual data. The File Search tool handles the heavy infrastructure so users can focus on building products.
MolmoAct 2 is an upgraded action reasoning model that improves real-world robot task performance and is paired with a large open bimanual manipulation dataset.
This book discusses the science of scaling language models. It covers how TPUs and GPUs work, how they communicate with each other, how LLMs run on real hardware, and how to parallelize models during training and inference so they run efficiently at massive scale. The book answers questions about how expensive training a model should be, how much memory is needed to serve models, and more.
The paper reframed hallucinations as failures to express uncertainty rather than gaps in knowledge, proposing “faithful uncertainty” as a mechanism for aligning model confidence with actual reliability.
Google is testing upgrades for its Gemini Flash model, with a candidate seen on LM Arena performing competitively against Gemini 3.1 Pro. Users received notices to transition from Gemini 2 Flash to 3 or 3.1 Flash-Lite, hinting at an imminent general availability release. Signs also suggest a potential Flash 3.2 rollout, promising faster responses and streamlined migrations for developers and app users.
Anthropic plans to spend $200 billion on Google Cloud over the next five years. The relationship between the two companies has been deepening in recent weeks. Google plans to invest up to $40 billion in Anthropic. Anthropic's success has led to compute constraints, which have left some users frustrated by usage caps. The startup has responded by striking or expanding deals to gain more compute.
Apple reportedly planned a system allowing users to select third-party AI models within iOS 27, integrating them into features like Siri and writing tools.
OpenAI has released a new iOS app created specifically for school and work organizations.
Anthropic has released 10 ready-to-run templates for the most time-consuming work in financial services, including building pitchbooks, screening KYC files, and closing the books at month-end.
Google partnered with XPRIZE and Range Media to launch a global competition encouraging short films about optimistic, tech-driven futures, with AI tools supported in production.