February 03, 2026 09:43 AM
Anthropic plans to release Claude Sonnet 5, with early testing showing strong math and coding capabilities, potentially outperforming Claude Opus 4.5. The model, observed with a 128K context window, aims to be a cost-effective solution for developers. Its release may coincide with Super Bowl LX week, aligning with AI labs' marketing strategies against ChatGPT and Google's Gemini.
OpenAI released a new macOS app for Codex designed to coordinate multiple agents, run tasks in parallel, and manage long-running software projects. It's now available to ChatGPT Free and Go users for a limited time, with doubled rate limits for paid tiers.
SpaceX announced that xAI is joining its organization, integrating efforts between Elon Musk's AI lab and the space company. The collaboration aims to combine advanced AI research with aerospace engineering, potentially accelerating autonomous systems and robotics in space missions. This merger signals a strategic alignment of AI development with real‑world hardware and exploration initiatives.
Tinker recently made it possible to post-train Kimi K2, Moonshot's 1 trillion parameter model. This post takes a look at how to train the model on a qualitative reward. It shows readers how to train a model to decompose jokes into verifiable properties. The resulting model can make jokes and explain why jokes are funny. The models, code, and data needed to replicate the model in the post are available.
E-commerce took a while to take off because the infrastructure to make it safe took time to develop. The same is true of agents: the technology has great potential, but it is currently full of risks. Agents need a security stack, just like e-commerce, where each layer handles what the others can't. This post discusses the different security layers AI agents need. Each layer represents an opportunity for companies to build infrastructure that will make the entire ecosystem possible.
Open-source models like GPT-OSS 120B and Qwen3 235B are fine-tuned using Direct Preference Optimization (DPO) to potentially outperform GPT-5.2 on human preference tasks. RewardBench 2 is used for evaluation, highlighting areas like Math and Safety where these models excel. Cost-efficient, these open models also offer transparency, enabling better alignment with specific use cases while significantly reducing reliance on expensive, closed-source alternatives.
Context rot is unavoidable with today's models, and you cannot work around it. The best way to deal with it is to use subagents, which offer considerable flexibility in how problems can be approached. It's not a perfect solution, but it works better than other current fixes in many use cases, and it embraces the limitations of the models.
Golden Goose enables the synthesis of large-scale RL with Verifiable Rewards (RLVR) tasks from unverifiable web text. The resulting GooseReason dataset helps revive model performance in math, science, and cybersecurity, surpassing prior state-of-the-art in multiple domains.
ChatGPT responses now contain references to ads in the source code. While not yet visible to users in the UI, the addition of the code signals that ChatGPT ads are moving from concept to near-launch. It is likely that targeting and eligibility are already being tested. OpenAI will sell ads on an impression basis. Early indications suggest they won't be cheap.
Experts frequently underestimate AI's capabilities, as shown by Yann LeCun's dismissal of AI learning real-world physics and a recent NYT Op-Ed's claim that judgment is human-exclusive. AI models like GPT-3.5 and tools by Anthropic already demonstrate sophisticated decision-making, challenging these assertions. Studies in "Science" and "Nature" also reveal that AI often outperforms humans in complex judgments, suggesting the need to reassess AI's role and potential in decision-making contexts.
ChatGPT ads are being tested, requiring marketers to focus on user behavior and psychology rather than traditional targeting strategies.
Any response from the default model is extremely likely to be wrong unless users enable web search.
Some people are buying Mac Minis just to host Moltbot (a locally running agent that can wire itself into calendars, messages, and other personal workflows) full-time.
OpenClaw agents do things by combining tools in ways that haven't been combined before.
Google DeepMind's Game Arena now includes Werewolf and poker to evaluate AI reasoning under uncertainty.