Makarand Tapaswi

Wadhwani AI, IIIT Hyderabad


Hi! I am a Principal Machine Learning Scientist at Wadhwani AI, a non-profit using AI for Social Good, and an Assistant Professor at CVIT, the Computer Vision group at IIIT Hyderabad, India.

At Wadhwani AI, we are developing and deploying AI solutions that create social impact in education, health, and agriculture. In particular, I have the privilege of working on and advising multiple projects across these domains.

At IIIT, I continue to work on projects at the intersection of video and language understanding, especially related to analyzing or generating stories. Thus, Katha AI is a natural name for our research group!

News [archives]

Nov 2025 Thanks to Sony Research India for extending our collaboration on analyzing TV shows and creating shorts.
Oct 2025 Talk at the ICCV 2025 Workshop on Story-Level Movie Understanding. Happy to share ADQA, our work on improving evaluation of Audio Descriptions: slides.
Oct 2025 Talk at the AI Alignment Workshop on our Wadhwani AI work. Together with Daksha Dixit, a senior design researcher, we shared insights on how we align our ORF solution with humans for maximum impact.
Sep 2025 Our work on layout-guided text-to-image generation, specifically robust generation of multiple subjects with multiple attributes, is accepted to ACM Transactions on Graphics! It will be presented at SIGGRAPH Asia. This is our group’s first foray into the graphics community! arXiv soon!
Sep 2025 Moved back to 🇮🇳 India 5 years ago. Lucky to work at both institutes and to have the opportunity to develop AI solutions of societal value and mentor young researchers.
Aug 2025 Audio Descriptions (ADs) make video content accessible to blind and low vision (BLV) viewers. While there is interest in automatic generation, evaluation is challenging. In our long paper accepted to EMNLP 2025, we show the subjective nature of ADs and propose ADQA to evaluate AD generation. arXiv
Aug 2025 Happy to be serving as Area Chair for CVPR 2026!
Jul 2025 Many students graduated from our group! Darshan (MS Research), Eshika (Dual degree), Haran (BTech); and co-supervised students Darshana (MS Research), Kawshik (Dual degree), and Prajneya (Dual degree). They are off to various industry research positions (Google DeepMind, Adobe, MSR) or PhD programs (US, Germany). Congratulations to all of them!
May 2025 Thank you Adobe Research for extending the research gift for 2025!
Feb 2025 One paper accepted to CVPR 2025 on benchmarking video-language models for their ability to understand compositionality! We propose a strict form of video-language entailment that is amenable to modern VLMs. Try it out! arXiv, HuggingFace