Tags: AGI · Google DeepMind · AI Research · Machine Learning · AI Safety

Google DeepMind's New Framework for Measuring Progress Toward AGI: Everything You Need to Know

Google DeepMind has released a groundbreaking cognitive framework for measuring progress toward Artificial General Intelligence. Here's what it reveals about how close we really are to AGI.

Md. Rakib · March 30, 2026 · 7 min read


One of the most debated questions in artificial intelligence is deceptively simple: How close are we to AGI? Google DeepMind just published a groundbreaking paper that attempts to answer this question with a structured, scientific framework for measuring progress toward Artificial General Intelligence.

This isn't just another think piece or prediction - it's a rigorous cognitive framework that could fundamentally change how the entire AI industry benchmarks progress. Let's dive deep into what they've proposed and what it means.

What Is AGI, Really?

Before we can measure progress toward AGI, we need to define it. And that's been one of the biggest challenges in AI research. Different organizations have different definitions:

  • OpenAI: "Highly autonomous systems that outperform humans at most economically valuable work"
  • Google DeepMind: A system with broad cognitive capabilities matching or exceeding human performance
  • Academic consensus: A machine that can understand, learn, and apply intelligence across any domain

DeepMind's new framework moves beyond these vague definitions to create measurable, testable criteria.

The DeepMind AGI Framework: Key Components

1. Levels of AGI

Rather than treating AGI as a binary (either we have it or we don't), DeepMind proposes a 6-level scale:

  • Level 0 (No AI): Rule-based systems only. Current example: basic calculators
  • Level 1 (Emerging): Equal to or somewhat better than an unskilled human. Current example: GPT-3, early LLMs
  • Level 2 (Competent): At least 50th percentile of skilled adults. Current example: GPT-4, Claude Opus, Gemini 3
  • Level 3 (Expert): At least 90th percentile of skilled adults. Narrow examples: AlphaFold, coding agents
  • Level 4 (Virtuoso): At least 99th percentile of skilled adults. Not yet achieved broadly
  • Level 5 (Superhuman): Outperforms all humans. Narrow examples: chess and Go engines
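To make the thresholds concrete, here is a minimal Python sketch that maps a performance percentile (relative to skilled adults) to a level. The `AGILevel` names follow the scale above, but the exact cutoffs and the Level 1 handling are my own illustrative reading, not DeepMind's code.

```python
from enum import IntEnum

class AGILevel(IntEnum):
    """The six levels from DeepMind's proposed scale (illustrative)."""
    NO_AI = 0
    EMERGING = 1
    COMPETENT = 2
    EXPERT = 3
    VIRTUOSO = 4
    SUPERHUMAN = 5

def level_from_percentile(percentile: float) -> AGILevel:
    """Map a performance percentile (vs. skilled adults) to a level.

    Cutoffs follow the table above; treating any nonzero score as
    'Emerging' is a simplification for illustration.
    """
    if percentile >= 100:   # exceeds all humans
        return AGILevel.SUPERHUMAN
    if percentile >= 99:
        return AGILevel.VIRTUOSO
    if percentile >= 90:
        return AGILevel.EXPERT
    if percentile >= 50:
        return AGILevel.COMPETENT
    if percentile > 0:
        return AGILevel.EMERGING
    return AGILevel.NO_AI

print(level_from_percentile(92).name)  # EXPERT
```

The point of the `IntEnum` is that levels are ordered, so you can meaningfully ask whether one system's level exceeds another's.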

2. Breadth vs. Depth Matrix

One of the framework's most important innovations is separating breadth (how many domains) from depth (how well in each domain):

  • Narrow AI: Expert or superhuman in ONE domain (we have this)
  • General AI: Competent or better across MANY domains (this is the goal)
  • Broad AI: Emerging capabilities across many domains (this is roughly where we are)

Current frontier models like Claude, GPT-5, and Gemini 3 fall into what DeepMind calls "Level 2 General" - competent across many tasks but not yet expert-level across the board.
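The breadth-vs-depth distinction can be sketched as a toy classifier over per-domain level scores (using the 0-5 scale above). The domain names, scores, and thresholds here are my own invented placeholders, not figures from the paper.

```python
# Hypothetical per-domain level scores (0-5) for an imaginary system.
domain_levels = {
    "coding": 3, "writing": 2, "math": 2,
    "vision": 2, "planning": 1, "robotics": 1,
}

def classify_breadth(levels: dict[str, int]) -> str:
    """Return a coarse breadth label from per-domain level scores (0-5)."""
    scores = list(levels.values())
    if all(s >= 2 for s in scores):
        return "General"       # competent or better across all domains
    if all(s >= 1 for s in scores):
        return "Broad"         # emerging everywhere, not yet competent everywhere
    if any(s >= 3 for s in scores):
        return "Narrow"        # expert+ in at least one domain only
    return "Pre-general"

print(classify_breadth(domain_levels))  # Broad
```

Note how a single weak domain (here, planning and robotics at Level 1) pulls the whole system from "General" down to "Broad", which mirrors the article's claim that broad emerging capability is roughly where we are.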

3. Cognitive Capabilities Tested

The framework evaluates AI across these cognitive dimensions:

Reasoning and Problem-Solving

  • Abstract reasoning
  • Causal inference
  • Multi-step logical deduction
  • Novel problem-solving (not pattern matching)

Learning and Adaptation

  • Few-shot learning efficiency
  • Transfer learning across domains
  • Continuous learning without catastrophic forgetting
  • Learning from feedback

Language and Communication

  • Natural language understanding
  • Nuanced communication
  • Multilingual capability
  • Context-appropriate responses

Perception and Interaction

  • Multimodal understanding (text, image, audio, video)
  • Spatial reasoning
  • Temporal understanding
  • Physical world modeling

Social Intelligence

  • Theory of mind
  • Emotional understanding
  • Cultural awareness
  • Collaborative problem-solving

Metacognition

  • Self-awareness of limitations
  • Uncertainty quantification
  • Strategic planning
  • Resource allocation
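One practical way to use these dimensions is as a scoring rubric: record a score per dimension, then surface the weakest ones as research priorities. The scores below are invented placeholders purely for illustration.

```python
def weakest_dimensions(scores: dict[str, float], n: int = 2) -> list[str]:
    """Return the n lowest-scoring dimensions, i.e. the biggest capability gaps."""
    return sorted(scores, key=scores.get)[:n]

# Made-up scores on the six dimensions named above (not measurements).
example_scores = {
    "reasoning": 2.4, "learning": 2.1, "language": 3.0,
    "perception": 2.2, "social": 2.0, "metacognition": 1.3,
}

print(weakest_dimensions(example_scores))  # ['metacognition', 'social']
```

Even a crude rubric like this makes the framework actionable: instead of one aggregate "AGI score", you get a profile that points at specific gaps.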

Where Are We Now?

Based on DeepMind's framework, here's an honest assessment of where current AI stands:

What We've Achieved (Level 2-3)

  • Language understanding: Frontier models comprehend and generate text at expert human levels
  • Code generation: AI can write, debug, and explain code at a professional level
  • Knowledge synthesis: Models can combine information across vast domains
  • Creative tasks: AI produces compelling writing, art, and music

Where We're Still Struggling (Level 1-2)

  • Physical reasoning: Understanding how the real world works remains challenging
  • Long-term planning: Multi-step plans over extended periods still fail frequently
  • Novel problem-solving: AI excels at pattern matching but struggles with truly new problems
  • Common sense: Everyday reasoning that humans find trivial can trip up AI
  • Reliable agency: AI agents still make critical errors in autonomous operation

What's Still Far Away (Level 0-1)

  • True understanding: AI processes information but may not truly "understand" it
  • Consciousness: If it's even relevant to AGI (debatable)
  • Robust generalization: Performing well in genuinely novel situations

Why This Framework Matters

1. It Ends the Hype vs. Doom Debate

Instead of arguing about whether AGI is "2 years away" or "50 years away," we can now have nuanced discussions about specific capabilities and levels. We might achieve Level 3 General AI in 5 years but Level 5 might take 30 years - and that distinction matters enormously.

2. It Guides Research Priorities

By identifying exactly where current AI falls short, researchers can focus their efforts. The framework reveals that metacognition and novel problem-solving are the biggest gaps - these should be priority research areas.

3. It Helps with Safety Planning

Different levels of AGI require different safety measures:

  • Level 2-3: Current alignment techniques may be sufficient
  • Level 4: We need significant advances in interpretability and control
  • Level 5: This is where existential risk discussions become critical

4. It Sets Industry Standards

Having a common measurement framework allows:

  • Meaningful comparison between different AI systems
  • Clear communication about capabilities to the public
  • Informed policy and regulation decisions
  • Better investment allocation in AI research

The Controversy

Not everyone agrees with DeepMind's approach. Critics raise several points:

"You Can't Measure AGI Like This"

Some researchers argue that intelligence is too complex to fit into a neat scale. They worry that optimizing for specific benchmarks might create the illusion of progress without genuine advancement.

"It's Self-Serving"

Skeptics note that frameworks created by AI companies tend to place their own products favorably. DeepMind's Gemini models naturally score well on their own evaluation criteria.

"It Ignores Consciousness"

Philosophers and some researchers argue that true AGI must involve some form of consciousness or understanding, not just functional equivalence. DeepMind's framework deliberately sidesteps this question.

"The Goalposts Will Move"

History shows that as AI achieves milestones, we tend to redefine AGI to exclude them. "Playing chess" was once considered a sign of intelligence; now it's "just computation."

What This Means for the Future

For AI Researchers

This framework provides clear targets. The next frontier is Level 3 General - expert-level performance across diverse cognitive tasks. Key areas to crack:

  1. Robust reasoning that works on novel problems
  2. Reliable long-term planning and execution
  3. True understanding vs. sophisticated pattern matching
  4. Self-aware systems that know what they don't know

For the Industry

Companies can now:

  • Set realistic product roadmaps based on capability levels
  • Communicate more honestly about what their AI can and can't do
  • Plan safety measures appropriate to their system's level
  • Make better hiring and research investment decisions

For Society

The public now has a clearer way to understand:

  • What AI can actually do today (it's impressive but has clear limits)
  • What's coming next (more capable but not omniscient AI)
  • When to be concerned (Level 4+ requires serious safety work)
  • How to plan (education, careers, policy)

My Perspective

Having followed AI research closely, I believe DeepMind's framework is the most useful attempt yet at measuring AGI progress. It's not perfect - no single framework can capture the full complexity of intelligence - but it gives us a shared language and measurement system that the field desperately needs.

We're currently at Level 2 General AI with pockets of Level 3 in specific domains. The jump to Level 3 General is likely achievable within the next few years. But Level 4 and 5 represent fundamentally harder challenges that may require breakthroughs we haven't yet imagined.

The most important takeaway? AGI isn't a single moment or event. It's a gradual progression, and we're already further along than most people realize - while also further away from "true" AGI than the hype suggests.


How do you think we should measure progress toward AGI? Do you find DeepMind's framework useful, or do you think we're missing something fundamental? Share your thoughts below.
