Makarand Tapaswi

Wadhwani AI, IIIT Hyderabad

Hi! I am a Senior Machine Learning Scientist at Wadhwani AI, a non-profit that uses AI for Social Good, and an Assistant Professor in the Computer Vision group at IIIT Hyderabad, India.

At Wadhwani AI, we are developing AI solutions that create social impact. In particular, my primary project is estimating the weight of newborns from a video, with the goal of empowering primary healthcare workers and facilities to improve the lives of at-risk low-birth-weight babies.

At IIIT, I continue to work on projects at the intersection of video and language understanding, especially related to analyzing stories.

news [archives]

May 2023 Wrote an article explaining Transformers for the newspaper The Hindu. link (paywall) | pdf
Feb 2023 Two papers accepted to CVPR 2023! The first is on emotion recognition in movies arXiv, and the second is on understanding time in videos arXiv.
Dec 2022 Super excited that our paper at ISMIR 2022 on automatic soundtracking for books (by using movie soundtracks) was awarded the Brave New Idea Award! Fantastic example of do what you love and good things will happen :)
Nov 2022 Excited to receive the Google India Research Award 2022! Thanks to all involved in this process. Looking forward to doing more fun video-language work.
Nov 2022 Giving a talk at the Deep Video Understanding workshop co-located with ICMI 2022 on our recent NeurIPS 2022 work on incorporating grounding into video situation recognition.
Nov 2022 Honored to be serving as Area Chair for ICCV 2023!
Oct 2022 One paper accepted to WACV 2023! We introduce a new audio-video-language dataset of lecture videos and show that contrastive learning of narrations and video clips helps learn suitable representations for unsupervised lecture segmentation. ArXiv
Sep 2022 Two papers accepted to NeurIPS 2022! The first is on incorporating grounding information in Video Situation Recognition, and the second is on 3D Object Grounding based on natural language instructions. Grounded VidSitu arXiv, 3D Object Grounding arXiv
Sep 2022 One paper accepted to CoRL 2022! We show how Transformers can combine state history, multiple camera views, and natural language instructions to perform a variety of manipulation tasks on the RLBench benchmark. ArXiv
Jul 2022 One paper accepted to ISMIR 2022, my first in music! Building upon my PhD work on book-movie alignment, we show how to generate soundtracks for books by sourcing music from their movie adaptations. A fun project on the first book-movie pair of Harry Potter! ArXiv