February 20, 2026 09:44 AM
Google released Gemini 3.1 Pro as the upgraded core model behind recent Gemini 3 “Deep Think” improvements, and began rolling it out to the Gemini API/AI Studio, Vertex AI, Android Studio, the Gemini app, and NotebookLM. The post highlighted a verified 77.1% score on ARC-AGI-2, more than doubling Gemini 3 Pro's result.
February 20, 2026 09:44 AM
Google is testing a NotebookLM integration within Opal workflows, aimed at improving data extraction, automation, and overall workflow efficiency for users.
February 20, 2026 09:44 AM
ARC-AGI-3 is an Interactive Reasoning Benchmark designed to measure an AI agent's ability to generalize in novel, unseen environments. Opus 4.6 demonstrates better reasoning and memory use than Gemini 3.1 Pro and solves more levels. Current models may be able to solve ARC-AGI-3 given access to a harness with a simple memory, and such memory scaffolds may be enough for pseudo-continual learning, pushing models past a self-improvement or research-agent threshold within the next two years.
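The "harness with a simple memory" idea is concrete enough to sketch. A minimal version keeps an append-only log of past steps and feeds the recent tail back into the model's context on every turn. Everything below is hypothetical: `query_model` and the `env` interface are stand-ins for whatever benchmark harness and model API you actually use.

```python
# A minimal memory scaffold for an interactive benchmark agent.
# Hypothetical sketch: `query_model` and `env` are stand-ins, not a real API.

def query_model(prompt: str) -> str:
    """Stand-in for a call to whatever LLM backs the agent."""
    raise NotImplementedError

def run_episode(env, max_steps: int = 100) -> bool:
    memory: list[str] = []          # append-only log of past steps
    obs = env.reset()
    for step in range(max_steps):
        # Inject a compact tail of prior steps into every prompt, giving
        # the model pseudo-continual learning within the episode.
        recent = "\n".join(memory[-20:])
        prompt = (
            f"Past steps:\n{recent}\n\n"
            f"Current observation:\n{obs}\n"
            "Choose the next action."
        )
        action = query_model(prompt)
        obs, reward, done = env.step(action)
        memory.append(f"step {step}: did {action!r} -> reward {reward}")
        if done:
            return reward > 0       # solved the level
    return False
```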
February 20, 2026 09:44 AM
This post contains a roundup of what happened in AI this week. It focuses on projections of job and economic impacts, and on timelines for how soon the world will be transformed. It also covers recent podcasts with Dario Amodei and Elon Musk. A linked table of contents with a short description of each section is available.
February 20, 2026 09:44 AM
Cursor described an “agent sandboxing” system that let local coding agents run freely inside a constrained environment and only asked for approval when leaving the sandbox (often for internet access).
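The approval model reads like a policy check in front of every tool call: run freely inside the sandbox, prompt the user only for actions that escape it. The sketch below illustrates that pattern and is not Cursor's implementation; the command classification is a deliberately crude heuristic.

```python
# Illustrative sketch of approval-gated sandboxing (not Cursor's actual code).
# Commands run freely inside the sandbox; anything that would leave it
# (approximated here as network access) requires explicit user approval.

import shlex
import subprocess

NETWORK_TOOLS = {"curl", "wget", "git", "pip", "npm"}  # crude heuristic

def needs_approval(command: str) -> bool:
    """Flag commands that likely reach outside the sandbox."""
    first = shlex.split(command)[0]
    return first in NETWORK_TOOLS

def run_agent_command(command: str) -> subprocess.CompletedProcess:
    if needs_approval(command):
        answer = input(f"Agent wants to run {command!r} (leaves sandbox). Allow? [y/N] ")
        if answer.strip().lower() != "y":
            raise PermissionError(f"denied: {command}")
    # Inside the sandbox, the command runs without any prompt.
    return subprocess.run(shlex.split(command), capture_output=True, text=True)
```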
February 20, 2026 09:44 AM
optimize_anything is a declarative API that optimizes any artifact representable as text. Users declare what to optimize and how to measure it, and the system handles the search. It consistently matches or outperforms domain-specific tools. A surprisingly wide range of problems can be formulated as optimizing a text artifact. If it can be serialized to a string and its quality measured, a large language model can reason about it and propose improvements.
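The loop behind such an API is easy to sketch: serialize the artifact, score it, ask a model to propose a better candidate, keep the best. The `optimize` signature, the `llm` callable, and the `score` function below are all invented for illustration, not the library's actual interface.

```python
# Hedged sketch of a declarative "optimize anything" loop.
# `llm` stands in for any text-completion call; `score` is user-supplied.

from typing import Callable

def optimize(artifact: str,
             score: Callable[[str], float],
             llm: Callable[[str], str],
             steps: int = 20) -> str:
    """Hill-climb on any text artifact using an LLM as the proposer."""
    best, best_score = artifact, score(artifact)
    for _ in range(steps):
        prompt = (
            f"Here is an artifact with quality score {best_score:.3f}:\n"
            f"{best}\n\n"
            "Propose an improved version. Return only the new artifact."
        )
        candidate = llm(prompt)
        s = score(candidate)
        if s > best_score:            # greedy: keep only improvements
            best, best_score = candidate, s
    return best

# Usage: anything serializable and measurable fits, e.g. a prompt, a config,
# or a SQL query scored by (negated) runtime:
#   tuned = optimize(sql_query, score=lambda q: -measure_latency(q), llm=call_model)
```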
February 20, 2026 09:44 AM
For popular models run without reasoning, repeating the input prompt improves performance without increasing the number of generated tokens or latency. It is interesting that tricks like this are still possible despite the amount of work going into improving large language models; the discovery shows how much room for improvement remains in current models.
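The trick itself is nearly a one-liner: concatenate the prompt with itself before sending it. A minimal sketch against a generic completion call (`complete` is a placeholder, not a specific vendor SDK):

```python
# Prompt repetition in one line: send the input twice in a single user turn.
# `complete` is a placeholder for any chat-completion call.

def ask_with_repetition(complete, question: str) -> str:
    # Repeating the input adds only prompt tokens (cheap, processed in
    # parallel), not generated tokens, so latency is essentially unchanged.
    doubled = f"{question}\n\n{question}"
    return complete(doubled)
```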
February 20, 2026 09:44 AM
Building on Google's Transformer architecture, the authors proposed training sequence-model agents against many different opponents so they learned to adapt within each game without hardcoded assumptions about how others learn.
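A hedged sketch of the training setup as described: each episode pairs the learner with an opponent sampled from a diverse pool, so adaptation has to happen in-context rather than being baked into the weights. `play_game` and the `agent.update` call below are placeholders for the paper's actual components.

```python
# Sketch of opponent-diverse training for a sequence-model agent.
# All names here are placeholders, not the paper's code.

import random

def play_game(agent, opponent) -> list:
    """Placeholder: returns the episode as a sequence of (obs, action) pairs."""
    raise NotImplementedError

def train(agent, opponent_pool: list, num_episodes: int = 10_000) -> None:
    for _ in range(num_episodes):
        # A fresh opponent each episode: the agent cannot hardcode
        # assumptions about how any single opponent learns or behaves.
        opponent = random.choice(opponent_pool)
        trajectory = play_game(agent, opponent)
        # The sequence model conditions on the trajectory so far, so
        # within-game adaptation is learned rather than scripted.
        agent.update(trajectory)
```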
February 20, 2026 09:44 AM
OpenAI has a big user base, but limited engagement and stickiness and no network effect. The company has no unique technology: incumbents have already matched it and are leveraging their existing products and distribution. This post takes a look at OpenAI's strategy and how the company can compete in today's landscape.
February 20, 2026 09:44 AM
Prototype with the best models and polish small gems. Use teams of agents as micromanagers, and experiment with different tools and workflows. Document everything to create improvement loops that raise success rates without manual intervention. Skills are easier to debug than code.
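The "document everything" advice can be made mechanical: after each failed run, append the lesson to a notes file the agent reads on its next attempt. A hypothetical sketch; the file name and the shape of `run_agent`'s return value are inventions.

```python
# Hypothetical sketch of a documentation-driven improvement loop.
# The agent reads accumulated notes before each run and appends a new
# note after each failure, so success rates improve without manual edits.

from pathlib import Path

NOTES = Path("agent_notes.md")   # invented file name

def run_with_improvement_loop(run_agent, task: str, attempts: int = 3) -> bool:
    for attempt in range(attempts):
        notes = NOTES.read_text() if NOTES.exists() else ""
        ok, lesson = run_agent(task, context=notes)   # agent returns a postmortem
        if ok:
            return True
        # Persist the lesson so the next attempt (and future tasks) see it.
        with NOTES.open("a") as f:
            f.write(f"\n- Task {task!r}, attempt {attempt}: {lesson}")
    return False
```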
February 20, 2026 09:44 AM
Sam Altman and Dario Amodei avoided holding hands at an AI summit, highlighting their tense rivalry.
February 20, 2026 09:44 AM
DuckDuckGo has launched AI-powered image editing on Duck.ai, allowing users to edit images without needing an account, though subscribers have higher daily limits.