May 26, 2026 09:19 AM
Grok Build, a new coding agent and CLI, has launched in beta for SuperGrok and X Premium Plus subscribers. It supports complex coding projects through a plan mode for reviewing its plans, and it follows users' existing conventions. Headless mode enables automation, and specialized subagents allow parallel processing.
Pope Leo XIV recently released a document on the ethics of integrating AI into modern society. It touches on the environmental impact of the technology, covers the risks of algorithmic systems that make decisions affecting people's lives, discusses how AI amplifies the power of those with resources, and more. A link to the document is available in the article. The writing is approachable, even for non-Catholic readers.
The market is becoming a stack of memory problems. Hardware changes slowly, while software and model architectures can move quickly. Hardware companies will need to build architectures that remain useful as the bottleneck shifts.
Zvi judges Gemini 3.5 Flash to be the best model at its speed point but unconvincing against Opus 4.7 or GPT-5.5 outside latency-sensitive workloads. Google positions it as a daily driver for agentic workflows, noting that it outscores 3.1 Pro on Terminal-Bench and MCP Atlas while running 4x faster.
On-policy distillation trains a student model on trajectories sampled from its own policy while a teacher provides dense, token-level supervision through KL-based regularization, closing the train-inference distribution mismatch that off-policy methods suffer from. The canonical formulation unifies forward-KL, reverse-KL, and JSD losses, with reverse KL, which is mode-seeking, emerging as the default for smaller students. On top of an RL stack such as Tinker, the technique can be implemented with a one-line code change that swaps the regularizer model for the teacher.
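A minimal sketch of the three per-token losses in plain Python, assuming dense vocabulary logits from both models at every position of a sequence the student itself sampled. The function names are hypothetical and this is not Tinker's API; a real implementation would work on batched tensors and backpropagate through the student only.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one position's vocabulary logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def kl(logp: Sequence[float], logq: Sequence[float]) -> float:
    """KL(p || q) = sum_x p(x) * (log p(x) - log q(x)), from log-probabilities."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(logp, logq))


def jsd(logp: Sequence[float], logq: Sequence[float], beta: float = 0.5) -> float:
    """Generalized JSD: beta*KL(p||m) + (1-beta)*KL(q||m), m = beta*p + (1-beta)*q."""
    logm = [math.log(beta * math.exp(a) + (1 - beta) * math.exp(b))
            for a, b in zip(logp, logq)]
    return beta * kl(logp, logm) + (1 - beta) * kl(logq, logm)


def distillation_loss(student_logits, teacher_logits, kind: str = "reverse") -> float:
    """Mean per-token divergence over the positions of a student-sampled sequence.

    Both arguments hold one list of vocabulary logits per position, obtained by
    scoring the student's own sample with each model (the on-policy part).
    """
    per_token = []
    for s, t in zip(student_logits, teacher_logits):
        ls, lt = log_softmax(s), log_softmax(t)
        if kind == "reverse":    # KL(student || teacher): mode-seeking
            per_token.append(kl(ls, lt))
        elif kind == "forward":  # KL(teacher || student): mode-covering
            per_token.append(kl(lt, ls))
        elif kind == "jsd":      # generalized JSD, symmetric at beta=0.5
            per_token.append(jsd(ls, lt))
        else:
            raise ValueError(f"unknown divergence: {kind}")
    return sum(per_token) / len(per_token)


def sampled_reverse_kl(student_token_logprobs, teacher_token_logprobs) -> List[float]:
    """Single-sample estimate per sampled token: log p_s(x_t) - log p_t(x_t).

    Its expectation under the student's own sampling equals the reverse KL, so
    its negation can act as a dense per-token reward inside an RL loop.
    """
    return [s - t for s, t in zip(student_token_logprobs, teacher_token_logprobs)]
```

The `sampled_reverse_kl` variant only needs the log-probability each model assigns to the token the student actually sampled, which is why it slots naturally into an RL stack that already computes a per-token KL penalty against some reference model.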
Models.dev consolidates the specifications and pricing of many AI models in one place and makes them accessible via an API.
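A minimal sketch of consuming such a catalog with only the Python standard library. The endpoint URL and the field names (`models`, `cost`, `input`) are assumptions about the response shape, not facts stated in the item, and should be checked against the live data; `fetch_catalog` and `cheapest_by_input_cost` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple

# Assumed endpoint; verify on the Models.dev site before relying on it.
MODELS_DEV_URL = "https://models.dev/api.json"


def fetch_catalog(url: str = MODELS_DEV_URL) -> Dict[str, Any]:
    """Download and decode the JSON catalog (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def cheapest_by_input_cost(catalog: Dict[str, Any], n: int = 5) -> List[Tuple[float, str, str]]:
    """Return (input_cost, provider_id, model_id) for the n cheapest priced models.

    Assumes a schema of provider_id -> {"models": {model_id: {"cost": {"input": ...}}}};
    models without an input price are skipped.
    """
    rows = []
    for provider_id, provider in catalog.items():
        for model_id, model in (provider.get("models") or {}).items():
            cost = (model.get("cost") or {}).get("input")
            if cost is not None:
                rows.append((float(cost), provider_id, model_id))
    return sorted(rows)[:n]
```

Usage would be `cheapest_by_input_cost(fetch_catalog())`. Keeping the download separate from the filtering means the parsing logic can be exercised offline against a small hand-written sample.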
BenchBench tests how well models can create benchmarks. It measures general model ability as well as self-awareness, and it rewards creativity rather than only problem-solving. In tests, GPT-5.2 was the only model to succeed; every other model, from Opus 4.6 to GPT-5.5, struggled to create a useful benchmark that other models found hard to solve.
Google DeepMind's AlphaProof Nexus autonomously solved nine out of 353 open Erdős problems, including questions unanswered for decades, at inference costs of a few hundred dollars per problem.
GPT-5.6 seems heavily focused on stronger multi-step reasoning, better agentic workflows, and improved frontend generation capabilities.
Prediction markets have failed to deliver Robin Hanson's 1990 Idea Futures vision.
DeepSeek aims to enable a $10 trillion Chinese AI hardware ecosystem and to reach a $1 trillion valuation for itself.
Apple plans to upgrade its AI image tools, Genmoji and Image Playground, in iOS 27 to improve visual quality and realism.