Makarand Tapaswi

Hi! I am a Principal Machine Learning Scientist at Wadhwani AI, a non-profit focused on using AI for Social Good, and an Assistant Professor at CVIT, the Computer Vision group at IIIT Hyderabad, India.

At Wadhwani AI, we are developing and deploying AI solutions that create social impact in education, health, and agriculture. In particular, I have the privilege to work on and advise multiple projects across all three domains.

At IIIT, I continue to work on projects at the intersection of video and language understanding, especially towards analyzing or generating stories. Thus, Katha AI is a natural name for our research group! (katha or कथा = story in Hindi/Marathi)

News [archives]

Jun 2026	Our work on Steering Visual Representations has been accepted to ECCV 2026! Read more about this vision-first paradigm on arXiv or Manu’s Twitter/X thread.
Jun 2026	Congratulations to Darshan and Zeeshan for winning the 🏆 Best Paper Award at the CV with Small Data workshop at CVPR 2026! Read more about our work on efficient adaptation of CLIP for videos using Semantic Role Labels.
Apr 2026	We will be co-organizing the Story-Level Movie Understanding & Audio Description workshop (SLoMO) at ECCV 2026! Excited to contribute a challenge on audio descriptions with our evaluation framework ADQA published at EMNLP 2025.
Feb 2026	One paper on Multimodal Entity Coreference accepted to the CVPR 2026 Findings track. We perform spatio-temporal grounding of entities across videos with shot boundaries and describe them consistently for video situation recognition. arXiv
Jan 2026	Participated in a panel discussion at India’s AI impact summit on the multi-stakeholder collaboration required to make AI benefit society at large! Impressed by the enthusiasm and energy about AI in India; time to galvanize this towards deeper research and useful solutions.
Jan 2026	Thank you Amazon Prime Video for supporting our work on audio descriptions with a research gift and a generous AWS compute grant!
Dec 2025	Thank you Google Research for supporting our work on video-language understanding with a research gift and generous Gemini credits!
Nov 2025	Had the pleasure of sharing our (Wadhwani AI) work on Oral Reading Fluency at the Indian National Science Academy (INSA) - Royal Society UK Workshop (LinkedIn)! An interesting 2 days immersed in thinking about the challenges and opportunities of AI with amazing speakers from India and the UK.
Nov 2025	🎉 In another first, our amazing students extended their Computer Vision course project on improving text-fidelity in Gaussian Splats to a paper at WACV 2026! Congrats Abhinav and Gaurav! arXiv
Nov 2025	Thanks to Sony Research India for extending our collaboration on analyzing TV shows and creating shorts.