Michael Wan Interactive Insights

Latest · Development Jul 7. 2026

Frontier Frugality

Frontier-tier models like Claude Fable 5 cost $10/$50 per million tokens — 10x the price of Haiku 4.5 — and an agentic coding session re-sends its whole history on every turn, so the bill grows roughly quadratically with session length. This post works through the levers that tame that bill: prompt caching, delegating bulk reading to cheap subagents, difficulty-based routing, verification asymmetry, and — the lever with the longest half-life — distilling the expensive model's judgment into rules files that cheap models can execute after your frontier access ends.

Michael Wan
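
The quadratic growth claimed in the Frontier Frugality summary can be checked with a little arithmetic. The sketch below is illustrative only: it assumes every turn adds a fixed number of tokens (a hypothetical 2,000), that nothing is cached, and that only input tokens are counted, priced at the $10 per million quoted above. The function names are made up for this example.

```python
def session_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens billed when every turn re-sends the full history."""
    # Turn k sends k * tokens_per_turn tokens, so the total is an arithmetic
    # series: tokens_per_turn * (1 + 2 + ... + turns).
    return tokens_per_turn * turns * (turns + 1) // 2


def session_input_cost(turns: int, tokens_per_turn: int,
                       usd_per_million_input: float) -> float:
    """Input-side cost in USD, ignoring output tokens and caching."""
    tokens = session_input_tokens(turns, tokens_per_turn)
    return tokens * usd_per_million_input / 1_000_000


# Assumed numbers: 2,000 new tokens per turn, $10 per million input tokens.
for turns in (10, 20, 40):
    cost = session_input_cost(turns, 2_000, 10.0)
    print(f"{turns} turns: ${cost:.2f}")
# 10 turns: $1.10
# 20 turns: $4.20
# 40 turns: $16.40
```

Under these assumptions, doubling the session length roughly quadruples the input bill, which is the pattern the levers in the post are meant to break.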

Machine Learning, AI & Agents

Paper explainers, model architectures, agent systems, and the mechanics behind modern AI

Jul 4. 2026 explainer

How Reasoning Models Learn to Think

Pretraining teaches a model what to say; it does not teach it how to think through a hard problem. Reasoning models like OpenAI o1 and DeepSeek-R1 add a second training phase: reinforcement learning against rewards that can be checked mechanically, such as a math answer or a passing test suite. This post explains that recipe — the verifiable reward signal, the GRPO algorithm that makes it cheap, the behaviors that emerge, and the new scaling axis it unlocked at test time.

Michael Wan

Jun 25. 2026 explainer

GLM-5.2: The Open-Weight Model That Beats GPT-5.5

GLM-5.2 is an MIT-licensed open-weight model that extends its context window to 1M tokens and ranks highest among open-weight models on Z.ai's evaluations, beating GPT-5.5 on long-horizon coding. Its headline trick, IndexShare, reuses sparse-attention indices across groups of layers to cut per-token FLOPs by 2.9× at 1M context. This article explains the architecture, the core mechanic, the real benchmarks, and the community's reaction.

Michael Wan

Apr 24. 2026 explainer

DeepSeek V4

DeepSeek V4 matters because it makes ultra-long-context reasoning much cheaper, then couples that with a stronger agent-training stack. This article walks through the official paper's architecture figures, explains CSA and HCA visually, and asks what V4 reveals about the secret sauce of closed frontier models.

Michael Wan

Apr 1. 2026 explainer

Manifold Constrained Hyperconnections

Skip connections are the backbone of deep learning, but naive extensions cause catastrophic instability. DeepSeek's mHC constrains Hyperconnection weight matrices to the Birkhoff polytope using the Sinkhorn-Knopp algorithm, achieving a 1.8× convergence speedup while keeping gradients stable.

Michael Wan
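
The Birkhoff polytope is the set of doubly stochastic matrices (non-negative entries, every row and column summing to 1), and Sinkhorn-Knopp reaches it by alternately normalizing rows and columns. The following is a generic pure-Python sketch of that iteration, not DeepSeek's mHC implementation; the function name, the exponentiation step used to make entries positive, and the iteration count are assumptions chosen for illustration.

```python
import math


def sinkhorn_knopp(logits, n_iters=200):
    """Push a square matrix of real logits toward a doubly stochastic matrix."""
    n = len(logits)
    # Exponentiate (shifted by the max for safety) so every entry is positive.
    top = max(max(row) for row in logits)
    m = [[math.exp(x - top) for x in row] for row in logits]
    for _ in range(n_iters):
        # Row step: make every row sum to 1.
        normalized = []
        for row in m:
            s = sum(row)
            normalized.append([x / s for x in row])
        m = normalized
        # Column step: make every column sum to 1.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


# Hypothetical 3x3 input; any real-valued square matrix works.
w = sinkhorn_knopp([[0.0, 1.0, -1.0], [2.0, 0.0, 0.5], [-0.5, 0.3, 1.2]])
row_sums = [sum(row) for row in w]
col_sums = [sum(w[i][j] for i in range(3)) for j in range(3)]
```

After enough iterations both `row_sums` and `col_sums` are close to 1, so the result mixes its inputs without amplifying or shrinking their total.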

Sep 28. 2023 explainer

Vision Transformers Need Registers

Large Vision Transformers develop artifacts—high-norm tokens in low-information areas used for internal computation. Adding learnable register tokens eliminates these artifacts and improves dense prediction tasks.

Timothée Darcet, Maxime Oquab, Julien Mairal, et al.

May 27. 2022 explainer

FlashAttention

Understanding how FlashAttention achieves 2-4x speedups by respecting the GPU memory hierarchy: tiling minimizes HBM access, and online softmax lets exact attention be computed block by block, in a numerically stable way, without materializing the full attention matrix.

Tri Dao, Daniel Y. Fu, Stefano Ermon, et al.
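
Online softmax is what allows attention to be accumulated one tile at a time. The sketch below shows the idea for a single query with scalar values, in plain Python rather than a GPU kernel; the function name and the block size are illustrative assumptions, and real implementations operate on matrices in on-chip memory.

```python
import math


def online_softmax_weighted_sum(scores, values, block_size=2):
    """Compute sum(softmax(scores) * values) while seeing one block at a time."""
    running_max = -math.inf  # largest score seen so far
    denom = 0.0              # running softmax denominator
    acc = 0.0                # running weighted sum of values
    for start in range(0, len(scores), block_size):
        s_blk = scores[start:start + block_size]
        v_blk = values[start:start + block_size]
        new_max = max(running_max, max(s_blk))
        # Rescale earlier partial results to the new max.
        # On the first block this factor is exp(-inf) = 0.0.
        scale = math.exp(running_max - new_max)
        denom = denom * scale + sum(math.exp(s - new_max) for s in s_blk)
        acc = acc * scale + sum(
            math.exp(s - new_max) * v for s, v in zip(s_blk, v_blk)
        )
        running_max = new_max
    return acc / denom


def reference(scores, values):
    """Ordinary two-pass softmax, for comparison."""
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


scores = [0.5, 3.0, -1.0, 2.0, 7.5]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
blockwise = online_softmax_weighted_sum(scores, values)
two_pass = reference(scores, values)
```

The two results agree to floating-point precision, even though the blockwise version never holds all the scores at once.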

Oct 11. 2018 explainer

Understanding BERT

Pre-training deep bidirectional representations for language understanding. How masked language modeling enables a single model to master almost any NLP task.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, et al.

Jun 12. 2017 explainer

Understanding the Transformer

Understanding the building blocks and design choices of the Transformer architecture that powers GPT, BERT, and modern language models.

Ashish Vaswani, Noam Shazeer, Niki Parmar, et al.

Jan 29. 1998 explainer

PageRank: How Google Brought Order to the Web

PageRank measures a page's importance by counting not just how many pages link to it, but how important those linking pages are — a recursive definition solved by treating the web as a Markov chain and finding its stationary distribution via power iteration.

Sergey Brin, Lawrence Page
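
The recursive definition in the PageRank summary can be made concrete with a few lines of power iteration. This is a minimal sketch, not the production algorithm: the damping factor of 0.85, the even redistribution of rank from pages with no outlinks, the tolerance, and the four-page toy graph are all assumptions for illustration, and every link target is assumed to appear as a key in the graph.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iters=1000):
    """Approximate the stationary distribution of the random-surfer chain."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iters):
        # Every page receives the same small "teleport" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy web: "c" is linked to by three pages, "d" by none.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

In this toy graph `c` ends up with the highest rank, and `a` ranks above `b` largely because its single inbound link comes from the important page `c`.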

Development

Tools, workflows, and practical engineering insights

Apr 1. 2026 explainer

Inside the Claude Code Leak

On March 31, 2026, a missing .npmignore rule exposed Claude Code's entire TypeScript source to the public for several hours. Beyond the drama, the code revealed something deeper: Claude Code is not a CLI tool — it is a production-grade Agent Runtime with a ReAct engine, 7-layer fault recovery, multi-agent orchestration, and 20+ unreleased features hiding behind internal flags.

Michael Wan

Apr 1. 2026 explainer

LLM-Compiled Knowledge Bases

Modern LLMs are powerful enough to act as a personal research compiler. This article covers the full pipeline in depth: collecting raw source documents, using an LLM to incrementally compile a structured Markdown wiki with concept articles and backlinks, querying that wiki with a live LLM agent, and rendering outputs as slides, charts, and reports — all viewable in Obsidian.

Michael Wan

Mar 7. 2026 explainer

AI Agent-First Engineering

A structured learning path synthesizing four key resources: Claude Code Skills docs, Butter's taxonomy of deterministic agent approaches, OpenAI's harness engineering post, and the Everything-Claude-Code toolkit. Five chapters covering foundations, determinism strategies, skills, harness design, and production systems.

Michael Wan

Feb 3. 2026 explainer

Mastering Claude Code

How parallelism, plan-first thinking, and verification loops combine into a high-leverage workflow. Distilled from the practices of the team that built Claude Code.

Boris Cherny

Business & Industry

Compensation, industry trends, and career strategy through data

Jul 4. 2026 explainer

The Collapsing Price of Intelligence

In March 2023, GPT-4 cost about $36 per million tokens. By 2025, models of comparable capability sold for under $0.50 — a decline of roughly 100x in under three years. This post decomposes the drivers behind the collapse, examines what serving actually costs, and explores why cheaper tokens paradoxically produce bigger AI bills.

Michael Wan

Jul 3. 2026 explainer

The Trimodal Nature of Tech Compensation in 2026

An interactive, data-driven look at why tech compensation falls into three tiers — Traditional, Competitive Tech, and Big Tech+ — how the gaps widened through 2025-26, and why frontier AI labs now form a fourth spike above the classic curve. Includes a tier explorer where you can see exactly which companies make up each slice.

Michael Wan

Jul 1. 2026 explainer

The AI Investment War

By mid-2026, the AI industry has organized itself into four layers — applications, models, compute, and silicon — stitched together by a dense web of equity stakes and hundred-billion-dollar supply contracts. The striking feature is not the size of any one deal but the circularity: a chipmaker invests in a lab, the lab pays a cloud, the cloud buys the chips. This interactive map lets you trace every major deal — click a company to see its full position in the war, or click an arrow for the terms — and the essay decodes the deal structures and asks what would tell the bulls from the bears.

Michael Wan

Feb 7. 2026 explainer

98 Years of the S&P 500: How Major Events Shaped Market History

An interactive visual journey through 98 years of S&P 500 performance, showing how wars, recessions, pandemics, and policy shifts drove the market through 22 bull runs and 21 corrections — and why $100 invested in 1926 became $1.48 million.

Michael Wan