News

Apr 2024 Congrats to Aditya and Dhruv for successfully defending their theses and completing their MS by Research! Very proud to have them as my first (single-advisor) MS students. Not only do they have papers at CVPR, they have also been instrumental in setting a fantastic lab culture!
Apr 2024 Thank you Adobe Research for extending the research gift for 2024!
Feb 2024 Two papers accepted to CVPR 2024! The first is on using recaps to predict TV episode story summaries arXiv - coming soon, and the second is on identity-aware video captioning arXiv - coming soon.
Dec 2023 Tutorial (Slides) on Video Understanding through Language at ICVGIP 2023.
Nov 2023 Happy to be serving as Area Chair for ECCV 2024 and ACCV 2024!
Oct 2023 Visited my alma mater NITK Surathkal after 14 years! A lot has changed on campus since we graduated. Happy to have given a talk about our Wadhwani AI work.
Sep 2023 Excited to share that SERB has approved funding for my Start-up Research Grant application on video understanding! This happens to be my first proposal funded by the Indian government.
Jul 2023 Speaking about computer vision projects at Wadhwani AI in the industry session at NCVPRIPG 2023.
Jul 2023 Speaking about our Wadhwani AI work on newborn anthropometry at the Precision Public Health Asia 2023 Conference.
Jun 2023 Excited to receive a research gift from Adobe! Sincere thanks to all involved in this process. Looking forward to a collaboration with Adobe Research India.
May 2023 Wrote an article explaining Transformers for the newspaper The Hindu. link (paywall) | pdf
Feb 2023 Two papers accepted to CVPR 2023! The first is on emotion recognition in movies arXiv, and the second is on understanding time in videos arXiv.
Dec 2022 Super excited that our paper at ISMIR 2022 on automatic soundtracking for books (by using movie soundtracks) was awarded the Brave New Idea Award! A fantastic example of "do what you love and good things will happen" :)
Nov 2022 Excited to receive the Google India Research Award 2022! Thanks to all involved in this process. Looking forward to doing more fun video-language work.
Nov 2022 Giving a talk at the Deep Video Understanding workshop, co-located with ICMI 2022, about our recent NeurIPS 2022 work on incorporating grounding with video situation recognition.
Nov 2022 Honored to be serving as Area Chair for ICCV 2023!
Oct 2022 One paper accepted to WACV 2023! We introduce a new audio-video-language dataset of lecture videos and show that contrastive learning of narrations and video clips helps learn suitable representations for unsupervised lecture segmentation. ArXiv
Sep 2022 Two papers accepted to NeurIPS 2022! The first is on incorporating Grounding information in Video Situation Recognition, and the second is on 3D Object Grounding based on natural language instructions. Grounded VidSitu arXiv, 3D Object Grounding arXiv
Sep 2022 One paper accepted to CoRL 2022! We show how Transformers can combine state history, multiple camera views, and natural language instructions to perform a variety of manipulation tasks on the RLBench benchmark. ArXiv
Jul 2022 One paper accepted to ISMIR 2022, my first in music! Building upon my PhD work on book-movie alignment, we show how to generate soundtracks for books by sourcing music from their movie adaptations. A fun project on Harry Potter as the first book-movie pair! ArXiv
Jul 2022 One paper accepted to ECCV 2022! In another work on vision-and-language navigation, we show that unlabeled 3D environments can be repurposed to generate meaningful training data with pseudo 3D object labels and GPT-2 based captions. ArXiv
Jun 2022 One paper accepted to IROS 2022! We show that modeling physics as a differentiable ODE dramatically improves the performance of approximate 3D trajectory reconstruction in Real2Sim. This also removes the need for expensive RL, as trajectories can be re-targeted directly to the robot. ArXiv
Mar 2022 One paper accepted to CVPR 2022! Vision-and-language navigation can benefit strongly from graph transformers. ArXiv
Jan 2022 Promoted to Senior ML Scientist at Wadhwani AI.
Nov 2021 Gave the keynote talk at a really interesting workshop on media understanding focusing on context and environment. Hosted by Google and USC’s Center for Computational Media Intelligence (CCMI).
Oct 2021 Happy to give a talk at Adobe Research Bengaluru a few days ago! Some exciting work on document processing there.
Sep 2021 One paper on long-tail image classification accepted to ICVGIP 2021. Rather than re-sampling from the “tail class”, we adapt a recent few-shot learning work to analyze the impact of feature generation.
Jul 2021 One paper accepted to ICCV 2021! We propose in-domain, self-supervised pretraining using Airbnb listings to improve Vision-and-Language Navigation models. ArXiv GitHub
Jul 2021 Launched new website based on the al-folio theme. Time to say goodbye to my old self-made Jinja+Python website and embrace Liquid+Jekyll!
Jul 2021 Excited to join IIIT Hyderabad as an Assistant Professor!
Jun 2021 Analyzing longer videos helps improve spatio-temporal action detection. Read more about it in our CVIU article in the Special Issue on Recent Advances in Modeling, Methodology and Applications of Action Recognition and Detection.
May 2021 Outstanding reviewer award for CVPR 2021.
May 2021 Visual Weighing Machine wins the Best World Changing Idea - APAC at Fast Company’s competition.
Dec 2020 Outstanding reviewer award for ACCV 2020.
Oct 2020 My first work on robotics accepted to CoRL 2020! We try to teach robots simple object manipulations by learning to translate videos into a 3D state space, Real2Sim.
Aug 2020 Outstanding reviewer award for ECCV 2020.
Feb 2020 One paper accepted to CVPR 2020! We show that joint modeling of interactions and relationships between movie characters helps improve the performance of both in a weakly supervised setting.
Jul 2019 Two papers accepted to ICCV 2019! In the first, we present HowTo100M, a large-scale dataset consisting of 130 million video-language clips obtained from over a million instructional YouTube videos. Our second work, Ball Cluster Learning, is a novel loss function for clustering face tracks without knowing the number of characters.
May 2019 Best Paper Award at FG 2019 for our work on self-supervised face clustering.
Feb 2018 Two papers accepted to CVPR 2018! In MovieGraphs we present a dataset for analyzing how people behave in social situations through an in-depth analysis of 51 movies. In Movie4D we propose an approach to make any movie suitable for 4D cinema by predicting effects experienced by the main characters!

Note: Older items can be found strewn across my old website.