Makarand Tapaswi

Wadhwani AI, IIIT Hyderabad

Hi! I am a Principal Machine Learning Scientist at Wadhwani AI, a non-profit that uses AI for social good, and an Assistant Professor at CVIT, the Computer Vision group at IIIT Hyderabad, India.

At Wadhwani AI, we are developing and deploying AI solutions that create social impact in education, health, and agriculture. In particular, I have the privilege to work on and advise multiple projects across all three domains.

At IIIT, I continue to work on projects at the intersection of video and language understanding, especially towards analyzing or generating stories. Thus, Katha AI is a natural name for our research group! (katha or कथा = story in Hindi/Marathi)

News [archives]

Feb 2026 One paper accepted to the CVPR 2026 Findings track on Multimodal Entity Coreference. We perform spatio-temporal grounding of entities across videos with shot boundaries and describe them consistently for video situation recognition. arXiv soon!
Jan 2026 Participated in a panel discussion at India’s AI impact summit on the multi-stakeholder collaboration required to make AI benefit society at large! Impressed by the enthusiasm and energy about AI in India; time to galvanize this towards deeper research and useful solutions.
Jan 2026 Thank you Amazon Prime Video for supporting our work on audio descriptions with a research gift!
Dec 2025 Thank you Google Research for supporting our work on video-language understanding with a research gift and generous Gemini credits!
Nov 2025 Had the pleasure of sharing our (Wadhwani AI) work on Oral Reading Fluency at the Indian National Science Academy (INSA) - Royal Society UK Workshop (LinkedIn)! An interesting two days immersed in thinking about the challenges and opportunities of AI, with amazing speakers from India and the UK.
Nov 2025 🎉 In another first, our amazing students extended their Computer Vision course project on improving text-fidelity in Gaussian Splats to a paper at WACV 2026! Congrats Abhinav and Gaurav! arXiv
Nov 2025 Thanks to Sony Research India for extending our collaboration on analyzing TV shows and creating shorts.
Oct 2025 Talk at the ICCV 2025 Workshop on Story-Level Movie Understanding. Happy to share ADQA, our work on improving evaluation of Audio Descriptions: slides.
Oct 2025 Talk at the AI Alignment Workshop on our Wadhwani AI work. Together with Daksha Dixit, a senior design researcher, we shared insights on how we align our ORF solution with humans for maximum impact.
Sep 2025 Our work on layout-guided text-to-image generation, specifically for robust generation of multiple subjects with multiple attributes, is accepted to ACM Transactions on Graphics! It will be presented at SIGGRAPH Asia. This is our group’s first foray into the graphics community! arXiv