Makarand Tapaswi

Hi! I am a Principal Machine Learning Scientist at Wadhwani AI, a non-profit focused on using AI for social good, and an Assistant Professor in the Computer Vision group at IIIT Hyderabad, India.
At Wadhwani AI, we are developing AI solutions that create social impact. In particular, I work on several projects in education and MNCH (maternal, newborn, and child health).
At IIIT, I continue to work on projects at the intersection of video and language understanding, especially those related to analyzing stories.
News [archives]
Oct 2025 | Giving a talk at the ICCV 2025 Workshop on Story Understanding. Happy to share our work on Audio Descriptions (slides soon). |
Oct 2025 | Giving a talk at the AI Alignment Workshop on our work at Wadhwani AI. Together with Daksha Dixit, a senior design researcher, we share insights on alignment with humans to build solutions for social impact. |
Sep 2025 | Our work on layout-guided image generation, specifically for multiple subjects with multiple attributes and for reducing dependence on “lucky” random seeds, is accepted to the Transactions on Graphics journal! It will be presented at SIGGRAPH Asia. This is our group’s first paper at a graphics venue! arXiv soon! |
Sep 2025 | Moved back to 🇮🇳 India 5 years ago. Lucky to work at both institutes and to have the opportunity to develop AI solutions of societal value and mentor young researchers. |
Aug 2025 | Audio Descriptions (ADs) make video content accessible to blind and low vision (BLV) viewers. While there is interest in automatic generation, evaluation is challenging. In our long paper accepted to EMNLP 2025, we show the subjective nature of ADs and propose ADQA to evaluate AD generation. arXiv |
Aug 2025 | Happy to be serving as Area Chair for CVPR 2026! |
Jul 2025 | Many students graduated from our group! Darshan (MS Research), Eshika (Dual degree), Haran (B.Tech); and co-supervised students Darshana (MS Research), Kawshik (Dual degree), and Prajneya (Dual degree). They are off to various industry research positions (Google DeepMind, Adobe, MSR) or PhD programs (US, Germany). Congratulations to all of them! |
May 2025 | Thank you Adobe Research for extending the research gift for 2025! |
Feb 2025 | One paper accepted to CVPR 2025 on benchmarking video-language models on their ability to understand compositionality! We propose a strict form of video-language entailment that is amenable to modern VLMs. Try it out! arXiv, HuggingFace |
Jan 2025 | Are LLMs good at resolving coreference between people in complicated stories? Our benchmark paper, IdentifyMe, accepted to NAACL 2025, suggests they are not! arXiv |