At Wadhwani AI, we are developing AI solutions that create social impact. In particular, my primary project is estimating the weight of newborns from a video, with the goal of empowering primary healthcare workers and facilities to improve the lives of at-risk low-birth-weight babies.
At IIIT, I continue to work on projects at the intersection of video and language understanding, especially related to analyzing stories.
|Dec 2022||Super excited that our paper at ISMIR 2022 on automatic soundtracking for books (by using movie soundtracks) was awarded the Brave New Idea Award! Fantastic example of "do what you love and good things will happen" :)|
|Nov 2022||Excited to receive the Google India Research Award 2022! Thanks to all involved in this process. Looking forward to doing more fun video-language work.|
|Nov 2022||Giving a talk at the Deep Video Understanding workshop co-located with ICMI 2022 on our recent work at NeurIPS 2022 on incorporating grounding with video situation recognition.|
|Nov 2022||Honored to be serving as Area Chair for ICCV 2023!|
|Oct 2022||One paper accepted to WACV 2023! We introduce a new audio-video-language dataset of lecture videos and show that contrastive learning of narrations and video clips helps learn suitable representations to perform unsupervised lecture segmentation. ArXiv|
|Sep 2022||Two papers accepted to NeurIPS 2022! The first is on incorporating Grounding information in Video Situation Recognition, and the second is on 3D Object Grounding based on natural language instructions. Grounded VidSitu arXiv, 3D Object Grounding arXiv|
|Sep 2022||One paper accepted to CoRL 2022! We show how Transformers can combine state history, multiple camera views, and natural language instructions to perform a variety of manipulation tasks on the RLBench benchmark. ArXiv|
|Jul 2022||One paper accepted to ISMIR 2022, my first in music! Building upon work on book-movie alignment during my PhD, we show how to generate soundtracks for books by sourcing music from their movie adaptations. A fun project on the first book-movie pair of Harry Potter! ArXiv|
|Jul 2022||One paper accepted to ECCV 2022! In another work on vision-and-language navigation, we show that unlabeled 3D environments can be repurposed to generate meaningful training data with pseudo 3D object labels and GPT-2-based captions. ArXiv|
|Jun 2022||One paper accepted to IROS 2022! We show that modeling physics as a differentiable ODE allows us to dramatically improve the performance of 3D approximate trajectory reconstruction in Real2Sim. This also removes the need for expensive RL as trajectories can be re-targeted directly to the robot. ArXiv|