
As robotics systems move into real-world environments, AI is undergoing a fundamental shift—from understanding the present to predicting the future.
In this fireside chat, Michael Ryoo, Angjoo Kanazawa, and Jeannette Bohg explore why Motion World Models may be the missing piece in building truly capable robotic systems.
About the Speakers:
Michael Ryoo - Associate Professor at Stony Brook University and Principal Research Scientist at Salesforce AI Research. Former Google DeepMind researcher and co-author of RT-1, RT-2, and SARA-RT.
Angjoo Kanazawa - Assistant Professor at UC Berkeley and Amazon Scholar. Known for her pioneering work in NeRFs, 3D human reconstruction, and visual understanding of humans.
Jeannette Bohg - Associate Professor at Stanford University and Director of the Interactive Perception and Robot Learning (IPRL) Lab. Her research focuses on robot manipulation, multimodal perception, and learning-based systems.
Hosted by Tristan Tao, a partner at Rora and former founder/CEO of HyperArc (acquired by Zendesk).
Moderated by Kanu Gulati, a partner at Khosla Ventures investing in AI, robotics, and autonomous systems. She has backed companies such as PolyAI, Waabi, and FieldAI.

This session was also made possible by Khosla Ventures, a venture capital firm known for backing technically ambitious founders tackling consequential problems across AI, robotics, climate, healthcare, and enterprise infrastructure.
Khosla Ventures was the first institutional investor in OpenAI and has supported category-defining companies including DoorDash, Square, Stripe, Instacart, and Impossible Foods. The firm is known for its conviction-driven approach, partnering closely with founders from inception through scale.
Michael Ryoo highlights a key limitation of current AI systems: they are largely reactive.
Most models today excel at identifying what is happening right now -- objects, scenes, or patterns -- but struggle to anticipate what comes next. In robotics, this is a critical gap. Every action depends on forecasting outcomes, whether it’s grasping an object or navigating a space.
A central theme in Ryoo’s perspective is that motion isn’t just an attribute of the world -- it’s the essence of it, especially for robots.
Real-world environments are constantly changing: objects move, people act, and scenes evolve from one moment to the next. Static models trained on images fail to capture this complexity. By explicitly modeling motion, AI systems can better understand cause-and-effect relationships and respond more effectively in dynamic settings.
All three speakers point toward a future where AI systems simulate outcomes before acting.
Instead of learning purely from trial and error in the real world, models can rehearse candidate actions internally and evaluate their likely outcomes before committing to one.
Jeannette Bohg emphasizes that this kind of predictive capability is essential for safe and efficient interaction—especially in robotics, where mistakes can be costly.
This marks a shift toward simulation-driven intelligence, where thinking happens before acting.
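The "think before acting" loop described here can be sketched as a simple sampling-based planner: roll candidate action sequences through a learned dynamics model, score the imagined outcomes, and execute only the first action of the best sequence. Everything below (the toy linear dynamics, the quadratic cost) is a hypothetical stand-in for illustration, not a method presented in the session.

```python
import numpy as np

def dynamics_model(state, action):
    # Stand-in for a learned motion/world model: predicts the next state.
    return state + 0.1 * action  # toy linear dynamics for illustration

def cost(state, goal):
    # Quadratic distance to the goal state.
    return np.sum((state - goal) ** 2)

def plan_action(state, goal, horizon=5, n_candidates=64, rng=None):
    """Random-shooting planner: imagine rollouts, return the best first action."""
    if rng is None:
        rng = np.random.default_rng(0)
    best_action, best_cost = None, np.inf
    for _ in range(n_candidates):
        seq = rng.uniform(-1.0, 1.0, size=(horizon, state.shape[0]))
        s = state.copy()
        for a in seq:  # simulate the outcome instead of acting in the world
            s = dynamics_model(s, a)
        c = cost(s, goal)
        if c < best_cost:
            best_cost, best_action = c, seq[0]
    return best_action
```

Replanning at every step (receding horizon) keeps the robot responsive: only the first imagined action is ever executed, and the plan is refreshed from the newly observed state.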
Angjoo Kanazawa underscores the importance of learning from real-world, unstructured data.
While simulation and curated datasets are useful, they often fail to capture the variability and messiness of real environments. For models to generalize effectively, they must be trained on data that reflects the richness of the real world—not just controlled settings.
Jeannette Bohg points out a long-standing issue in robotics: perception and control are often treated as separate problems.
In reality, they are deeply interconnected: what a robot perceives determines how it can act, and how it acts changes what it perceives next.
To build robust systems, these components must be tightly integrated. Motion World Models offer a way to unify perception and action through shared representations of dynamics and interaction.
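One common way to realize this unification, sketched minimally below, is a single encoder whose latent state feeds both a dynamics head (prediction) and a policy head (action). The architecture and all weights here are illustrative assumptions, not a specific model from the talk.

```python
import numpy as np

class SharedLatentModel:
    """Perception and control sharing one latent representation."""

    def __init__(self, obs_dim, latent_dim, action_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One shared encoder; two heads consume the same latent state.
        self.W_enc = rng.normal(0, 0.1, (latent_dim, obs_dim))
        self.W_dyn = rng.normal(0, 0.1, (latent_dim, latent_dim + action_dim))
        self.W_pol = rng.normal(0, 0.1, (action_dim, latent_dim))

    def encode(self, obs):
        # Shared perception: observation -> latent state.
        return np.tanh(self.W_enc @ obs)

    def predict_next(self, z, action):
        # Dynamics head: (latent, action) -> next latent.
        return np.tanh(self.W_dyn @ np.concatenate([z, action]))

    def act(self, z):
        # Policy head: latent -> action.
        return np.tanh(self.W_pol @ z)
```

Because both heads read from the same latent, gradients from prediction and control shape one representation, which is the sense in which a world model can tie perception and action together.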
Video models are emerging as a powerful tool because they naturally capture temporal structure.
They allow systems to learn how scenes evolve over time and to anticipate the consequences of motion. However, the speakers note that current video-based approaches still fall short: they often lack physical consistency and a grounded notion of dynamics.
This highlights the need for more physics-aware and structured representations of motion.
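The basic appeal of video, that temporal structure is learnable directly from frames, can be shown with a toy example: a dot shifting one cell per frame, and a least-squares next-frame predictor that recovers the shift. The data and model are purely synthetic illustrations, not anything discussed in the session.

```python
import numpy as np

def make_sequence(n_frames=20, size=8):
    # Synthetic "video": a one-hot dot moving one cell per frame (cyclic).
    frames = np.zeros((n_frames, size))
    for t in range(n_frames):
        frames[t, t % size] = 1.0
    return frames

def fit_next_frame(frames):
    # Least-squares linear predictor: frames[1:] ~ frames[:-1] @ W.
    X, Y = frames[:-1], frames[1:]
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

frames = make_sequence()
W = fit_next_frame(frames)
pred = frames[:-1] @ W
err = np.mean((pred - frames[1:]) ** 2)
```

The motion here is exactly linear, so the predictor recovers the shift perfectly; real video models face far richer, nonlinear dynamics, which is where the physics-aware representations mentioned above come in.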
A recurring challenge discussed is the lack of scalable robotics data.
Unlike text or images, robotics data requires physical hardware interacting with the real world, which makes it slow and expensive to collect. This creates a bottleneck for progress. As a result, there is growing interest in new ways of collecting, sharing, and scaling robot data.
For builders and founders, this represents a significant opportunity in robotics infrastructure.
The future of robotics is increasingly tied to advances in foundation models.
We are seeing a convergence of advances in language, vision, and robot learning.
Motion World Models may serve as the glue that connects these components, enabling systems to reason about the world in a unified way.
This points toward a new class of systems: general-purpose, embodied AI.
Across all perspectives, one idea stands out:
The next generation of AI systems won’t just understand the world -- they will predict, simulate, and interact with it.
Motion World Models represent an important step in that direction, bringing us closer to AI that can operate reliably in the real world.
We want to give a massive thank you to Michael Ryoo, Angjoo Kanazawa, and Jeannette Bohg for sharing their valuable knowledge and time with us.
Also, a huge thanks to Kanu and Khosla Ventures for making this event possible.
Brian is the founder and CEO of Rora. He has spent his career in education, first building Leada, a Y Combinator-backed ed-tech startup that was a "Codecademy for data science."
Brian founded Rora in 2018 with a mission to shift power to candidates and employees and has helped hundreds of people negotiate for fairer pay, better roles, and more power at work.
Brian is a graduate of UC Berkeley's Haas School of Business.
Over 1,000 individuals have used Rora to negotiate more than $10M in pay increases at companies like Amazon, Google, and Meta, at hundreds of startups, and at firms such as Vanguard, Cornerstone, BCG, Bain, and McKinsey. Their work has been featured in Forbes, ABC News, The TODAY Show, and theSkimm.
Step 1 is defining the strategy, which often starts by helping you create leverage for your negotiation (e.g. setting up conversations with FAANG recruiters).

Step 2 is deciding on anchor numbers and target numbers, based on our internal verified data sets, with the goal of securing a top-of-band offer.

Step 3 is creating custom scripts for each of your calls, running multiple 1:1 mock negotiations, and joining your recruiter calls to guide you via chat.