Jason Wei — 3 Key Ideas in AI in 2025
Original source: Stanford AI Club: Jason Wei on 3 Key Ideas in AI in 2025
TL;DR
- Jason Wei outlines three mental models for 2025: (1) intelligence becomes a commodity as test-time compute and agents drive the cost of “getting the right answer” toward zero; (2) Verifier’s Law: AI learns fastest on tasks that are easy to verify; (3) the edge of intelligence is jagged: capabilities and improvement rates vary wildly by task, with no single “fast takeoff.”
Core ideas
1) Intelligence becomes a commodity
- Once a capability is unlocked, its cost drops rapidly; same target performance gets cheaper each year.
- Key driver: adaptive/test-time compute (e.g., o1): spend more reasoning on hard problems and less on easy ones, without growing model size.
- Agents compress time-to-knowledge: public facts go from hours → minutes → seconds as browsing + tool use mature (e.g., finding niche stats via national databases).
- Implications: democratization of knowledge-gated fields (coding, personal health guidance), rising relative value of private/insider data, and eventual personalized, frictionless information access.
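The adaptive-compute idea above can be illustrated with a toy sketch. This is not Wei's or OpenAI's method, just a minimal stand-in: sample answers until they agree (a self-consistency-style proxy for confidence), so easy problems stop early and hard ones consume the full budget. `solve_once` and the `noise` field are hypothetical placeholders for a model's per-pass error rate.

```python
import random
from collections import Counter

def solve_once(problem, rng):
    # Stand-in for one model reasoning pass: a noisy guess at the answer.
    # A noisier (harder) problem needs more samples before a majority emerges.
    return problem["answer"] if rng.random() > problem["noise"] else rng.randint(0, 9)

def adaptive_solve(problem, min_samples=3, max_samples=32, agree_frac=0.8, seed=0):
    """Keep sampling until a clear majority answer emerges or the budget runs out."""
    rng = random.Random(seed)
    votes = Counter()
    for n in range(1, max_samples + 1):
        votes[solve_once(problem, rng)] += 1
        answer, count = votes.most_common(1)[0]
        if n >= min_samples and count / n >= agree_frac:
            return answer, n  # easy problem: stop early, spend little compute
    return votes.most_common(1)[0][0], max_samples  # hard problem: full budget

easy = {"answer": 7, "noise": 0.1}
hard = {"answer": 7, "noise": 0.6}
_, easy_cost = adaptive_solve(easy)
_, hard_cost = adaptive_solve(hard)
print(easy_cost, hard_cost)  # samples spent; the noisier problem typically costs more
```

The point is the cost asymmetry: the same target accuracy is reached with a variable, difficulty-dependent compute spend rather than a fixed-size forward pass.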
2) Verifier’s Law (asymmetry of verification)
- Principle: The easier a task is to verify, the easier it is to train AI to do it; easily verifiable tasks will be conquered first.
- Verification ease depends on: objective truth, speed of checking, scalability (batchable checks), low noise, and continuous reward (not just pass/fail).
- Examples along the spectrum:
- Easy to verify: Sudoku, rendering/click-testing an app, code with unit tests, geometry packing tasks.
- Hard to verify: factual essays, “best diet” claims (slow, noisy, costly to validate).
- Tactics: supply answer keys/tests to move tasks into the easy-to-verify quadrant (benchmarks, unit tests, graded objectives).
- Case study: AlphaEvolve-style loops (sample many candidates, grade automatically, feed the best back to the model) show rapid gains when a crisp metric exists.
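The sample-grade-select loop above can be sketched in a few lines. This is a deliberately tiny analogue, not AlphaEvolve itself: the task (maximizing a known function) and the mutation operator are assumptions standing in for an LLM proposing program variants and a benchmark grading them.

```python
import random

def grade(x):
    # Crisp, cheap, batchable metric (Verifier's Law: easy to verify).
    # Peak reward at x = 3.0.
    return -(x - 3.0) ** 2

def mutate(parent, rng, step=0.5):
    # Stand-in for the model proposing a variation of the best candidate.
    return parent + rng.uniform(-step, step)

def evolve(generations=30, pop=16, seed=0):
    rng = random.Random(seed)
    best = rng.uniform(-10, 10)  # initial candidate
    for _ in range(generations):
        # Sample many candidates, grade automatically, keep the best.
        candidates = [mutate(best, rng) for _ in range(pop)] + [best]
        best = max(candidates, key=grade)
    return best

print(round(evolve(), 2))  # converges near the optimum at 3.0
```

Nothing in the loop understands the task; progress comes entirely from the grader being objective, fast, and scalable, which is exactly why easily verifiable tasks fall first.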
3) The jagged edge of intelligence
- No singular step-function “superintelligence takeoff”; instead, capability improves at different rates per task.
- Heuristics for where progress is fastest:
- Digital > physical (iteration speed, scale).
- Easier for humans → easier for AI (in general).
- Data-abundant tasks improve faster; exception: if a single clear metric exists, RL/self-play can synthesize data (AlphaZero/AlphaEvolve pattern).
- Rough landscape: already strong on competition math/coding and translation for high-resource languages; slower on low-resource languages, robotics, hands-on trades (plumbing, hairdressing), artisanal crafts, and socially complex goals.
What this means
- Short term: automation/acceleration hits trivially verifiable, digital, data-rich workflows first (software dev, debugging with tests, structured research + fact retrieval).
- Opportunity surface: invent measurements and create verification harnesses; whoever defines the metric unlocks progress.
- Strategy: expect uneven gains; plan around task-level variance, and value private data/connectors as public knowledge commoditizes.
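A verification harness in the sense used above can be as small as the sketch below (my illustration, not from the talk): grade candidate implementations against an answer key of unit tests, and return a continuous reward (fraction passed) rather than a bare pass/fail. The toy task and candidate functions are hypothetical.

```python
def harness(candidate, test_cases):
    """Score a candidate against an answer key; continuous reward, not pass/fail."""
    passed = sum(1 for args, want in test_cases if candidate(*args) == want)
    return passed / len(test_cases)

# Answer key for a toy task: integer midpoint of two ints.
tests = [((0, 10), 5), ((2, 3), 2), ((-4, 4), 0), ((7, 7), 7)]

buggy = lambda a, b: (a + b) / 2    # float division; fails the odd-sum case
fixed = lambda a, b: (a + b) // 2   # integer division; passes all cases

print(harness(buggy, tests), harness(fixed, tests))  # partial vs. full credit
```

The continuous score is what makes the harness useful for training and selection: a candidate that passes 3 of 4 tests is visibly closer than one that passes none, which gives an optimizer a gradient to climb.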
About the speaker
- Jason Wei is an AI researcher at Meta Superintelligence Labs.
- Previously at OpenAI, where he co-created o1 and Deep Research.
- Earlier at Google Brain, helped introduce Chain-of-Thought prompting and documented emergent phenomena.