
As robotics systems move into real-world environments, AI is undergoing a fundamental shift—from understanding the present to predicting the future.
In this fireside chat, Michael Ryoo, Angjoo Kanazawa, and Jeannette Bohg explore why Motion World Models may be the missing piece in building truly capable robotic systems.
About the Speakers:
Michael Ryoo - Associate Professor at Stony Brook University and Principal Research Scientist at Salesforce AI Research. Former Google DeepMind researcher and co-author of RT-1, RT-2, and SARA-RT.
Angjoo Kanazawa - Assistant Professor at UC Berkeley and Amazon Scholar. Known for her pioneering work in NeRFs, 3D human reconstruction, and visual understanding of humans.
Jeannette Bohg - Associate Professor at Stanford University and Director of the Interactive Perception and Robot Learning (IPRL) Lab. Her research focuses on robot manipulation, multimodal perception, and learning-based systems.
Hosted by Tristan Tao, a partner at Rora and former founder/CEO of HyperArc (acquired by Zendesk).
Moderated by Kanu Gulati, a partner at Khosla Ventures investing in AI, robotics, and autonomous systems. She has backed companies such as PolyAI, Waabi, and FieldAI.

This session was also made possible by Khosla Ventures, a venture capital firm known for backing technically ambitious founders tackling consequential problems across AI, robotics, climate, healthcare, and enterprise infrastructure.
Khosla Ventures was the first institutional investor in OpenAI and has supported category-defining companies including DoorDash, Square, Stripe, Instacart, and Impossible Foods. The firm is known for its conviction-driven approach, partnering closely with founders from inception through scale.
Michael Ryoo highlights a key limitation of current AI systems: they are largely reactive.
Most models today excel at identifying what is happening right now -- objects, scenes, or patterns -- but struggle to anticipate what comes next. In robotics, this is a critical gap. Every action depends on forecasting outcomes, whether it’s grasping an object or navigating a space.
A central theme in Ryoo’s perspective is that motion isn’t just an attribute of the world -- it’s the essence of it, especially for robots.
Real-world environments are constantly changing: objects move, people act, and scenes evolve from one moment to the next. Static models trained on images fail to capture this complexity. By explicitly modeling motion, AI systems can better understand cause-and-effect relationships and respond more effectively in dynamic settings.
All three speakers point toward a future where AI systems simulate outcomes before acting.
Instead of learning purely from trial and error in the real world, models can rehearse candidate actions internally and evaluate their likely outcomes before committing to one.
Jeannette Bohg emphasizes that this kind of predictive capability is essential for safe and efficient interaction—especially in robotics, where mistakes can be costly.
This marks a shift toward simulation-driven intelligence, where thinking happens before acting.
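The "think before acting" loop described here can be sketched as a simple sampling-based planner: roll candidate action sequences through a learned dynamics model, score the imagined outcomes, and execute only the first action of the best sequence. Everything below (the toy linear dynamics, the quadratic cost) is a hypothetical stand-in for illustration, not a method presented in the session.

```python
import numpy as np

def dynamics_model(state, action):
    # Stand-in for a learned motion/world model: predicts the next state.
    return state + 0.1 * action  # toy linear dynamics for illustration

def cost(state, goal):
    # Quadratic distance to the goal state.
    return np.sum((state - goal) ** 2)

def plan_action(state, goal, horizon=5, n_candidates=64, rng=None):
    """Random-shooting planner: imagine rollouts, return the best first action."""
    if rng is None:
        rng = np.random.default_rng(0)
    best_action, best_cost = None, np.inf
    for _ in range(n_candidates):
        seq = rng.uniform(-1.0, 1.0, size=(horizon, state.shape[0]))
        s = state.copy()
        for a in seq:  # simulate the outcome instead of acting in the world
            s = dynamics_model(s, a)
        c = cost(s, goal)
        if c < best_cost:
            best_cost, best_action = c, seq[0]
    return best_action
```

Replanning at every step (receding horizon) keeps the robot responsive: only the first imagined action is ever executed, and the plan is refreshed from the newly observed state.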
Angjoo Kanazawa underscores the importance of learning from real-world, unstructured data.
While simulation and curated datasets are useful, they often fail to capture the variability and messiness of real environments. For models to generalize effectively, they must be trained on data that reflects the richness of the real world—not just controlled settings.
Jeannette Bohg points out a long-standing issue in robotics: perception and control are often treated as separate problems.
In reality, they are deeply interconnected: what a robot perceives determines how it can act, and how it acts changes what it perceives next.
To build robust systems, these components must be tightly integrated. Motion World Models offer a way to unify perception and action through shared representations of dynamics and interaction.
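One common way to realize this unification, sketched minimally below, is a single encoder whose latent state feeds both a dynamics head (prediction) and a policy head (action). The architecture and all weights here are illustrative assumptions, not a specific model from the talk.

```python
import numpy as np

class SharedLatentModel:
    """Perception and control sharing one latent representation."""

    def __init__(self, obs_dim, latent_dim, action_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One shared encoder; two heads consume the same latent state.
        self.W_enc = rng.normal(0, 0.1, (latent_dim, obs_dim))
        self.W_dyn = rng.normal(0, 0.1, (latent_dim, latent_dim + action_dim))
        self.W_pol = rng.normal(0, 0.1, (action_dim, latent_dim))

    def encode(self, obs):
        # Shared perception: observation -> latent state.
        return np.tanh(self.W_enc @ obs)

    def predict_next(self, z, action):
        # Dynamics head: (latent, action) -> next latent.
        return np.tanh(self.W_dyn @ np.concatenate([z, action]))

    def act(self, z):
        # Policy head: latent -> action.
        return np.tanh(self.W_pol @ z)
```

Because both heads read from the same latent, gradients from prediction and control shape one representation, which is the sense in which a world model can tie perception and action together.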
Video models are emerging as a powerful tool because they naturally capture temporal structure.
They allow systems to learn how scenes evolve over time and to anticipate the consequences of motion. However, the speakers note that current video-based approaches still fall short: they often lack physical consistency and a grounded notion of dynamics.
This highlights the need for more physics-aware and structured representations of motion.
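The basic appeal of video, that temporal structure is learnable directly from frames, can be shown with a toy example: a dot shifting one cell per frame, and a least-squares next-frame predictor that recovers the shift. The data and model are purely synthetic illustrations, not anything discussed in the session.

```python
import numpy as np

def make_sequence(n_frames=20, size=8):
    # Synthetic "video": a one-hot dot moving one cell per frame (cyclic).
    frames = np.zeros((n_frames, size))
    for t in range(n_frames):
        frames[t, t % size] = 1.0
    return frames

def fit_next_frame(frames):
    # Least-squares linear predictor: frames[1:] ~ frames[:-1] @ W.
    X, Y = frames[:-1], frames[1:]
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

frames = make_sequence()
W = fit_next_frame(frames)
pred = frames[:-1] @ W
err = np.mean((pred - frames[1:]) ** 2)
```

The motion here is exactly linear, so the predictor recovers the shift perfectly; real video models face far richer, nonlinear dynamics, which is where the physics-aware representations mentioned above come in.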
A recurring challenge discussed is the lack of scalable robotics data.
Unlike text or images, robotics data requires physical hardware interacting with the real world, which makes it slow and expensive to collect. This creates a bottleneck for progress. As a result, there is growing interest in new ways of collecting, sharing, and scaling robot data.
For builders and founders, this represents a significant opportunity in robotics infrastructure.
The future of robotics is increasingly tied to advances in foundation models.
We are seeing a convergence of advances in language, vision, and robot learning.
Motion World Models may serve as the glue that connects these components, enabling systems to reason about the world in a unified way.
This points toward a new class of systems: general-purpose, embodied AI.
Across all perspectives, one idea stands out:
The next generation of AI systems won’t just understand the world -- they will predict, simulate, and interact with it.
Motion World Models represent an important step in that direction, bringing us closer to AI that can operate reliably in the real world.
We want to give a massive thank you to Michael Ryoo, Angjoo Kanazawa, and Jeannette Bohg for sharing their valuable knowledge and time with us.
Also, a huge thanks to Kanu and Khosla Ventures for making this event possible.
Brian is the founder and CEO of Rora. He has spent his career in education, first building Leada, a Y Combinator-backed ed-tech startup that was a "Codecademy for data science."
Brian founded Rora in 2018 with a mission to shift power to candidates and employees and has helped hundreds of people negotiate for fairer pay, better roles, and more power at work.
Brian is a graduate of UC Berkeley's Haas School of Business.
Over 1,000 individuals have used Rora to negotiate more than $10M in pay increases at companies like Amazon, Google, and Meta, at hundreds of startups, and at firms such as Vanguard, Cornerstone, BCG, Bain, and McKinsey. Their work has been featured in Forbes, ABC News, The TODAY Show, and theSkimm.
Step 1 is defining the strategy, which often starts by helping you create leverage for your negotiation (e.g. setting up conversations with FAANG recruiters).

Step 2 is deciding on anchor numbers and target numbers, based on our internal verified data sets, with the goal of securing a top-of-band offer.

Step 3 is creating custom scripts for each of your calls, running multiple 1:1 mock negotiations, and joining your recruiter calls to guide you via chat.