Bio | Makarand Tapaswi

Hi! I am a Principal Machine Learning Scientist at Wadhwani AI, a non-profit that uses AI for Social Impact, and an Assistant Professor in the Computer Vision group at IIIT Hyderabad, India.

I am very interested in using AI for Social Good. At Wadhwani AI, I work on projects across all our domains: healthcare, agriculture, and education. A couple are highlighted below. Millions of newborns in India face a life of prolonged and serious ill-health because they are born with low birth weight and the health system misses timely interventions. We have developed and deployed a video-based neonatal weight estimation approach that identifies underweight newborns and monitors their growth, so that those in need receive timely care. We are now extending this work to anthropometry of children up to 6 years of age. In education, I am working on tools for automated assessment of reading fluency in school children. Our model has been deployed in Gujarat and Rajasthan and has been used to assess over 6 million students.

My academic research interests revolve around machine understanding of videos, language, and people. I enjoy working with movies and TV series, especially teaching machines about human behavior and analyzing storylines. Two large projects from my past are MovieGraphs and MovieQA. My journey in machine learning started with clustering and identifying characters in videos, a crucial building block for high-level understanding; this remains a favorite problem that I revisit often.

Previously, I was a Postdoctoral Fellow at Inria Paris, working in the Willow group with Ivan Laptev and Josef Sivic. Before that, I was with the Machine Learning group at the University of Toronto and the Vector Institute, working with Sanja Fidler. I completed my PhD at the Computer Vision for Human Computer Interaction (CVHCI) lab at the Karlsruhe Institute of Technology, Germany, advised by Rainer Stiefelhagen.

Professional service

In a typical year, I review for 4-6 major CV, ML, or NLP conferences and about 3 journal articles. Venues I currently review for, or have reviewed for in the past, include:

Area Chair: ECCV 2026, CVPR 2026, CVPR 2025, ECCV 2024, ACCV 2024, ICCV 2023
Journals: IJCV, T-PAMI, T-IP, T-CYB, T-NNLS, T-CSVT, T-MM, ACM TMC, IMAVIS
Conferences: CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR, ACL RR, EMNLP, EACL, ACCV, ICMI, AAAI, WACV, ICVGIP, NCVPRIPG

I have been recognized as an outstanding reviewer at:
CVPR 2019, ACCV 2020, ECCV 2020, CVPR 2021, ICML 2022

Research in Media

CVIT (the vision research center at IIIT-H) had several papers at CVPR 2025; here's a nice roundup: IIIT-H Blog.
Our work on predicting the emotions of characters and of the overall movie scene, presented at CVPR 2023, has been featured in multiple outlets: IIIT-H Blog, Times of India, Hindu Businessline, Telangana Today, Deccan Chronicle.
Our work on weaving soundtracks for books, which won the Brave New Idea Award at ISMIR 2022, has been featured in multiple articles: IIIT-H Blog, Hindu Businessline, Eenadu (Telugu).
Our work on neonatal anthropometry won Fast Company's 2021 World Changing Ideas Award! Note that the article contains some factual errors, as our work is still in progress.
Our work on understanding human behaviors by watching movies, MovieGraphs, was featured in UofT news and phys.org. We also received the Nvidia AI Pioneer Award for this at CVPR 2018.
Our work on enabling 4D experiences for movies, Movie4D, was featured in UofT news, phys.org, and The Inquisitr. I also had the unique opportunity to be a guest on Here and Now, a CBC Radio show in Toronto.
Our joint ICCV 2017 workshop on MovieQA and LSMDC was covered by the RSIP Vision magazine.
Our work on question-answering on movie stories, MovieQA, was featured in MIT Tech Review and Nvidia Developer News.
Our work on identifying characters in movies was presented at CeBIT 2013, an annual trade show in Hannover, and covered very nicely by ITWorld.