What LLMs can’t learn from text — and why human-like understanding may require bodies, not bigger models.

The Fundamental Limitation

Large Language Models (LLMs) have achieved remarkable success by learning from text. But there’s a fundamental gap: they lack sensorimotor experience.

What is the Sensorimotor Gap?

The sensorimotor gap refers to the difference between:

  • Textual knowledge: What can be learned from reading
  • Embodied knowledge: What requires physical interaction

Examples

Text can teach:

  • “A cup is used for drinking”
  • “Red means stop”
  • “Gravity pulls objects down”

But text alone cannot teach:

  • The weight of a cup in your hand
  • The feeling of acceleration
  • The texture of different materials
  • Spatial relationships through movement
  • Cause and effect through manipulation

Why This Matters

1. Understanding vs. Knowledge

LLMs can:

  • Generate text about concepts
  • Answer questions about topics
  • Explain relationships

But they often cannot:

  • Reliably reason about physical causality
  • Predict physical outcomes from first principles
  • Generalize to novel physical situations
  • Reason robustly about space and time

2. The Grounding Problem

Without sensorimotor experience:

  • Words lack referents
  • Concepts are abstract
  • Knowledge is disconnected
  • Understanding is shallow

3. Limitations in Reasoning

LLMs struggle with:

  • Physical reasoning: Predicting object behavior
  • Spatial reasoning: Understanding 3D relationships
  • Temporal reasoning: Tracking the order and duration of events
  • Causal reasoning: Identifying true causes

What Text Can’t Convey

Embodied Knowledge

Proprioception:

  • Knowing where your body is in space
  • Understanding movement and balance
  • Feeling muscle tension and effort

Haptic Feedback:

  • Texture and material properties
  • Weight and resistance
  • Temperature and pressure

Spatial Understanding:

  • Distance and scale
  • Orientation and perspective
  • Navigation and wayfinding

Experiential Learning

Trial and Error:

  • Learning from mistakes
  • Understanding consequences
  • Developing intuition

Cause and Effect:

  • Manipulating objects
  • Observing outcomes
  • Building mental models
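
The difference between observing and manipulating can be shown with a toy example. The sketch below is only an illustration under invented assumptions: the `world` function, its hidden switch, and the light and fan are all hypothetical. A passive observer sees the light and the fan always agree and might conclude that one drives the other. An agent that can force the light on or off finds that the fan does not follow.

```python
from itertools import product
from typing import List, Optional, Tuple


def world(switch: bool, force_light: Optional[bool] = None) -> Tuple[bool, bool]:
    """Toy world: a hidden switch drives both the light and the fan.

    Passing force_light models an intervention: the agent sets the light
    by hand, which cuts the light's link to the hidden switch.
    """
    light = switch if force_light is None else force_light
    fan = switch
    return light, fan


def agreement(samples: List[Tuple[bool, bool]]) -> float:
    """Fraction of samples in which the light and the fan are in the same state."""
    return sum(light == fan for light, fan in samples) / len(samples)


# Passive observation: the agent only watches while the switch changes.
observed = [world(s) for s in (True, False, True, False)]

# Manipulation: the agent forces the light in every switch state.
intervened = [world(s, force_light=l) for s, l in product((True, False), repeat=2)]

if __name__ == "__main__":
    print("agreement when observing:", agreement(observed))
    print("agreement when intervening:", agreement(intervened))
```

Under observation the two devices always match. Under intervention they match only half the time, which shows that the light does not cause the fan to run. The watched data alone could not separate the two explanations.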

Social Interaction:

  • Reading body language
  • Understanding tone and emotion
  • Responding to feedback

The Path Forward

1. Multimodal Models

Combine text with:

  • Vision: Understanding visual information
  • Audio: Processing sound and speech
  • Video: Learning from motion
  • Sensors: Getting real-world data

2. Embodied AI

Robots that:

  • Interact with the physical world
  • Learn from manipulation
  • Develop sensorimotor skills
  • Build grounded understanding

3. Simulation

Virtual environments where AI can:

  • Practice physical interactions
  • Learn from simulated physics
  • Develop spatial reasoning
  • Understand cause and effect
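
As a minimal sketch of learning from simulated physics, the example below lets a learner find a launch angle purely from the landing distances it observes. Everything here is a simplifying assumption for illustration: the drag-free projectile model, the function names `simulate_throw` and `learn_angle`, and the learner's prior knowledge that distance grows with angle between 0 and 45 degrees.

```python
import math

G = 9.81  # gravitational acceleration of the simulated world, in m/s^2


def simulate_throw(speed: float, angle_deg: float) -> float:
    """Return the landing distance on flat ground, ignoring air resistance."""
    angle = math.radians(angle_deg)
    return speed ** 2 * math.sin(2 * angle) / G


def learn_angle(target: float, speed: float, trials: int = 40) -> float:
    """Find a launch angle by trial and error.

    The learner never sees the physics formula. It only throws, watches
    where the projectile lands, and narrows its guess by bisection.
    """
    low, high = 0.0, 45.0  # distance increases with angle on this interval
    for _ in range(trials):
        angle = (low + high) / 2
        landed = simulate_throw(speed, angle)
        if landed < target:
            low = angle   # fell short, so throw steeper
        else:
            high = angle  # overshot, so throw flatter
    return (low + high) / 2


if __name__ == "__main__":
    angle = learn_angle(target=20.0, speed=15.0)
    print(f"learned angle: {angle:.2f} degrees")
    print(f"landing distance: {simulate_throw(15.0, angle):.2f} m")
```

A real embodied learner would face noise, drag, and far higher-dimensional control. The point of the sketch is only that the feedback loop of act, observe, and adjust is cheap and safe to run in simulation.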

4. Hybrid Approaches

Combine:

  • Text learning: Broad knowledge
  • Embodied learning: Grounded understanding
  • Simulation: Safe experimentation
  • Human feedback: Guided learning
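
One simple way to picture such a combination is to treat text-derived knowledge as a broad prior and embodied measurements as evidence that sharpens it. The sketch below assumes both can be modeled as Gaussian estimates and fuses them by precision weighting. The numbers, the `Estimate` class, and the `fuse` function are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    mean: float      # best guess, here a mass in grams
    variance: float  # uncertainty of that guess


def fuse(prior: Estimate, measurement: Estimate) -> Estimate:
    """Combine two Gaussian estimates, weighting each by its precision (1 / variance)."""
    precision = 1.0 / prior.variance + 1.0 / measurement.variance
    mean = (prior.mean / prior.variance
            + measurement.mean / measurement.variance) / precision
    return Estimate(mean=mean, variance=1.0 / precision)


# Hypothetical text-derived prior: "a cup weighs a few hundred grams", very uncertain.
text_prior = Estimate(mean=300.0, variance=100.0 ** 2)

# Hypothetical sensor readings from lifting one particular cup, each fairly precise.
readings = [412.0, 405.0, 409.0]

belief = text_prior
for reading in readings:
    belief = fuse(belief, Estimate(mean=reading, variance=5.0 ** 2))

if __name__ == "__main__":
    print(f"fused estimate: {belief.mean:.1f} g, variance {belief.variance:.1f}")
```

The broad prior gives a sensible starting point before any contact with the object. A handful of measurements then dominates the estimate, which mirrors the division of labor between broad text knowledge and grounded experience.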

Implications for AI Development

Current Limitations

We should recognize that:

  • LLMs excel at language tasks
  • They struggle with physical reasoning
  • They lack grounded understanding
  • Scaling up models alone is unlikely to close this gap

Future Directions

Focus on:

  • Multimodal learning: Beyond text
  • Embodied systems: Physical interaction
  • Causal reasoning: Understanding mechanisms
  • Grounded knowledge: Connecting symbols to reality

The Philosophical Question

Can Understanding Exist Without Experience?

Arguments for:

  • Mathematical truths hold independently of physical experience
  • Abstract concepts don’t require embodiment
  • Symbolic reasoning can be sufficient

Arguments against:

  • Understanding requires grounding
  • Concepts need referents
  • Knowledge requires experience
  • Meaning comes from interaction

The Middle Ground

Perhaps:

  • Some understanding is possible from text
  • But full understanding requires experience
  • Different types of knowledge need different approaches
  • Hybrid systems may be the answer

Practical Implications

For Developers

When building AI systems:

  • Recognize text-only limitations
  • Consider multimodal approaches
  • Use simulation when possible
  • Combine different learning methods

For Users

When using AI:

  • Understand its limitations
  • Don’t expect reliable physical reasoning
  • Verify important claims
  • Use AI as a tool, not a replacement

Conclusion

The sensorimotor gap highlights a fundamental limitation of text-only learning. While LLMs have achieved remarkable success, true understanding may require:

  • Embodied experience: Physical interaction with the world
  • Multimodal learning: Beyond just text
  • Causal reasoning: Understanding mechanisms
  • Grounded knowledge: Connecting symbols to reality

Bigger models alone are unlikely to solve this. We need different approaches:

  • Embodied AI systems
  • Multimodal learning
  • Simulation environments
  • Hybrid architectures

The future of AI isn’t just bigger language models—it’s systems that can learn from experience, reason about the physical world, and develop true understanding through interaction.

This is the challenge—and opportunity—for the next generation of AI systems.