Next Webinar Registration
Ranjay Krishna
University of Washington
Title:
Visual Reasoning will be bigger than language reasoning
Summary:
I will argue that visual reasoning is a fundamental capability, and one with tremendous potential in multimodal language models. I will start by outlining the types of tasks that multimodal models still fall short on, drawing on decades of computer vision research. Next, I will introduce the concept of sketching, which operationalizes visual reasoning by using external computer vision models as tools. I will demonstrate the potential of visual reasoning with sketching and outline its limitations. I will then show how to overcome these limitations by incorporating visual reasoning directly into the language model using perception tokens. Finally, I will describe how visual reasoning can enable robots to reason in space, allowing them to surpass non-reasoning proprietary robotics foundation models.
Bio:
Ranjay Krishna is an Assistant Professor at the Allen School of Computer Science & Engineering. He co-directs the RAIVN lab at UW and directs the PRIOR team at the Allen Institute. His research lies at the intersection of computer vision, natural language processing, robotics, and human-computer interaction. This research has received best paper honorable mentions at CVPR'25 and CSCW'23, outstanding paper awards at NeurIPS'21 and ACL'21, and dozens of oral presentations at CVPR, ACL, CSCW, NeurIPS, UIST, and ECCV, and has been covered by Science, Forbes, the Wall Street Journal, and PBS NOVA. He was also recognized as one of MIT Technology Review's 35 Under 35 Asia Pacific '25. His research has been supported by Google, Apple, Ai2, Amazon, Cisco, Toyota Motor Inc, Toyota Research Institute, NSF, ONR, and Yahoo. He holds a bachelor's degree in Electrical & Computer Engineering and in Computer Science from Cornell University, and a master's degree and a Ph.D. in Computer Science from Stanford University.
