Top Stories

How Airtable Saved Millions by Cutting Archive Storage Costs by 100x

April 27, 2026 06:08 AM

Airtable cut archive storage costs by about 100x by moving cold, mostly immutable MySQL data into S3 as partitioned Parquet files and querying it with embedded Apache DataFusion. The dataset became about 10x smaller and S3 was about 10x cheaper per byte, and the two factors multiply to the roughly 100x saving. A Flink-based migration, bulk and shadow validation, tiered caching, custom secondary indexes, and Parquet bloom filters preserved interactive latency and enterprise guarantees.

Internal vs. External Storage: What's the Limit of External Tables?

Internal tables store and manage both data and metadata within the database system, while external tables store only metadata and point to data that lives outside the system, leaving the underlying data untouched. Internal tables enable tighter lifecycle management, whereas external tables decouple storage and compute, making it easier to scale, share, and access large datasets without moving or duplicating data.

Background Coding Agents: Supercharging Downstream Consumer Dataset Migrations

Spotify's coding agent Honk automated a complex migration of ~1,800 data pipelines by using tooling (Backstage + Fleet Management) to find dependencies, generate code changes, and manage rollout, saving 10 weeks of engineering work. This worked because the systems were standardized and well instrumented, and because testing made it possible to make and validate changes reliably at scale.

Measure Less to Learn More: Using Fewer, Higher-quality Metrics to Capture What Matters

Discord improved experimentation by removing redundant metrics, grouping related ones, and focusing on a small set of clearly defined “north-star” and guardrail metrics. Adding too many metrics to experiments increases multiple-testing issues and metric correlation, which can require stricter statistical corrections and make real effects harder to detect.
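
To make the multiple-testing point concrete, here is a minimal sketch of how the chance of at least one false positive grows with the number of metrics, and how a Bonferroni correction tightens the per-metric threshold. It assumes independent metrics and a 0.05 significance level, and the function names are illustrative; the summary does not say which correction Discord actually uses.

```python
def family_wise_error_rate(alpha: float, num_metrics: int) -> float:
    """Probability of at least one false positive across num_metrics tests.

    Assumes the tests are independent, which correlated metrics violate.
    """
    return 1.0 - (1.0 - alpha) ** num_metrics


def bonferroni_threshold(alpha: float, num_metrics: int) -> float:
    """Per-metric significance threshold that keeps the family-wise rate <= alpha."""
    return alpha / num_metrics


if __name__ == "__main__":
    alpha = 0.05  # assumed significance level, not taken from the article
    for m in (1, 5, 20, 50):
        fwer = family_wise_error_rate(alpha, m)
        threshold = bonferroni_threshold(alpha, m)
        print(f"{m:>2} metrics: FWER={fwer:.3f}, per-metric threshold={threshold:.4f}")
```

The stricter per-metric threshold is what makes real effects harder to detect as the metric count grows, which is the argument for keeping the set small.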

Databases Were Not Designed For This

Databases were built for predictable apps and human-written queries, not AI agents that generate queries on the fly, retry automatically, and can make silent mistakes at scale. Teams now need stronger guardrails like tighter permissions, timeouts, audit logs, idempotent writes, and clearer schemas so databases stay safe when AI becomes the caller.
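
As one illustration of the idempotent-write guardrail, the sketch below records an idempotency key in the same transaction as the write, so an automatic retry with the same key becomes a no-op. The table names, the `apply_credit` function, and the use of SQLite are assumptions made for this example, not details from the article.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical example tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute("CREATE TABLE applied_requests (idempotency_key TEXT PRIMARY KEY)")
    conn.execute("INSERT INTO accounts (name, balance) VALUES ('alice', 100)")
    conn.commit()
    return conn


def apply_credit(conn: sqlite3.Connection, idempotency_key: str, account: str, amount: int) -> bool:
    """Apply a credit at most once per idempotency key.

    Returns True if the write was applied, False if the key was already seen.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT OR IGNORE INTO applied_requests (idempotency_key) VALUES (?)",
            (idempotency_key,),
        )
        if cur.rowcount == 0:
            return False  # duplicate request: skip the balance update
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, account),
        )
        return True


def get_balance(conn: sqlite3.Connection, account: str) -> int:
    row = conn.execute("SELECT balance FROM accounts WHERE name = ?", (account,)).fetchone()
    return row[0]
```

With this shape, an agent that retries `apply_credit(conn, "req-1", "alice", 25)` after a timeout credits the account only once.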

When a Cloud Region Fails: Rethinking High Availability in a Geopolitically Unstable World

Cloud high availability can no longer assume regions are safe, independent failure domains: sanctions, data localization laws, conflict zones, and submarine cable cuts can take out an entire region or make it noncompliant. Treat region-level disruption as a first-class risk, with multi-region, jurisdiction-aware data placement, control-plane separation, and dependency audits. The added cost and complexity should be justified with Annual Loss Expectancy modeling rather than assumed.
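
Annual Loss Expectancy is conventionally computed as single loss expectancy multiplied by the annualized rate of occurrence. The sketch below applies that formula to a region-failure scenario; every figure in it is a made-up placeholder, and the decision rule (mitigate when the reduction in expected loss exceeds the yearly cost) is a simplifying assumption rather than the article's model.

```python
def annual_loss_expectancy(single_loss_expectancy: float, annual_rate_of_occurrence: float) -> float:
    """ALE = SLE * ARO: expected yearly loss from one risk."""
    return single_loss_expectancy * annual_rate_of_occurrence


def mitigation_is_justified(ale_before: float, ale_after: float, annual_mitigation_cost: float) -> bool:
    """True when the yearly reduction in expected loss exceeds the yearly cost."""
    return (ale_before - ale_after) > annual_mitigation_cost


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    ale_single_region = annual_loss_expectancy(2_000_000, 0.05)  # region outage, no failover
    ale_multi_region = annual_loss_expectancy(200_000, 0.05)     # same event with failover
    multi_region_cost = 60_000                                   # extra yearly spend

    print(f"ALE single-region: {ale_single_region:,.0f}")
    print(f"ALE multi-region:  {ale_multi_region:,.0f}")
    print("Justified:", mitigation_is_justified(ale_single_region, ale_multi_region, multi_region_cost))
```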

Stop Letting Tools Lead Your Platform Decisions

Data platform decisions should start with use cases, constraints, and operating requirements, not with Kafka, Spark, Snowflake, or Airflow. The key questions are latency, data freshness, cost, failure handling, and who will consume the system. Choose the simplest stack that fits the problem, team, budget, and timelines.

DuckDB Extension - Whisper

Whisper is a DuckDB extension that lets you transcribe audio into text directly with SQL, making voice data easier to search, analyze, and use alongside your normal tables.

Jaeger adopts OpenTelemetry at its core to solve the AI agent observability gap

Jaeger v2 rebuilds its core on the OpenTelemetry Collector, natively ingesting OTLP and unifying metrics, logs, and traces in one deployment model to improve ingestion and eliminate translation steps. It's also adding agent-facing interfaces like MCP, ACP, and AG-UI so engineers can use natural language to translate incident context into deterministic trace queries and collaborate with AI agents.

tda-mapper

tda-mapper is a Python library that helps find hidden shapes, clusters, and patterns in messy data using the Mapper algorithm from topological data analysis. It's built to scale to large datasets, works with scikit-learn pipelines, and includes visual tools to explore complex data more clearly.

Measurement Engineering: The Part of Data Science That Will Thrive in AI

As AI takes over more coding, SQL, and dashboard work, the most valuable data skill may become judgment: knowing what to measure, whether metrics are trustworthy, and how to make decisions when results are unclear. Future top performers won't just build analyses; they'll own the harder question of whether an analysis actually reflects reality.

Fixing What LLMs Get Wrong

Enterprise LLM systems can produce fluent but factually wrong answers against private structured knowledge, creating a “hallucination tax” on pricing, policy, org, and legal data. Fine-tuning, RAG, and static verification each help, but none learn from repeated failures. Reflexion closes the loop by storing natural-language reflections from verified errors in episodic memory and reinjecting them into future prompts.
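
The loop described above can be sketched roughly as follows: a verified error is turned into a short natural-language reflection, stored, and prepended to later prompts. The `ReflectionMemory` class and its methods are hypothetical names for this illustration, and the reflection text is built from a fixed template here, whereas a Reflexion-style system would typically have the model write it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReflectionMemory:
    """Hypothetical episodic memory of lessons learned from verified errors."""

    max_items: int = 3  # how many recent reflections to reinject
    reflections: List[str] = field(default_factory=list)

    def record_failure(self, question: str, wrong_answer: str, verified_answer: str) -> None:
        """Store a reflection after a verifier has confirmed the answer was wrong."""
        self.reflections.append(
            f"When asked '{question}', I answered '{wrong_answer}' "
            f"but the verified answer was '{verified_answer}'."
        )

    def build_prompt(self, question: str) -> str:
        """Prepend the most recent reflections to the next prompt."""
        recent = self.reflections[-self.max_items:] if self.max_items > 0 else []
        if not recent:
            return f"Question: {question}"
        lessons = "\n".join(f"- {r}" for r in recent)
        return f"Lessons from past mistakes:\n{lessons}\n\nQuestion: {question}"
```

The prompt returned by `build_prompt` would then be sent to whatever model the system uses; that call is left out because it depends on the deployment.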

HDFS Lost. How Object Storage and Table Formats Won the Data Lake

Data systems evolved to decouple storage and compute, making it cheaper and easier to scale.

Airflow 2 reaches end of life

Security patches and provider updates stopped last week.
