News

Apr 2024 Congrats to Aditya and Dhruv for successfully defending their theses and completing their MS by Research! Very proud to have them as my first (single-advisor) MS students. Not only do they have papers at CVPR, they have also been instrumental in setting a fantastic lab culture!
Apr 2024 Thank you Adobe Research for extending the research gift for 2024!
Feb 2024 Two papers accepted to CVPR 2024! The first is on using recaps to predict TV episode story summaries arXiv - coming soon, and the second is on identity-aware video captioning arXiv - coming soon.
Dec 2023 Tutorial (Slides) on Video Understanding through Language at ICVGIP 2023.
Nov 2023 Happy to be serving as Area Chair for ECCV 2024 and ACCV 2024!
Oct 2023 Visited my alma mater NITK Surathkal after 14 years! A lot has changed on campus since we graduated. Happy to have given a talk about our Wadhwani AI work.
Sep 2023 Excited to share that SERB has approved funding for my Start-up Research Grant application on video understanding! This happens to be my first proposal funded by the Indian government.
Jul 2023 Speaking about computer vision projects at Wadhwani AI in the industry session at NCVPRIPG 2023.
Jul 2023 Speaking about our Wadhwani AI work on newborn anthropometry at the Precision Public Health Asia 2023 Conference.
Jun 2023 Excited to receive a research gift from Adobe! Sincere thanks to all involved in this process. Looking forward to a collaboration with Adobe Research India.
May 2023 Wrote an article explaining Transformers for the newspaper The Hindu. link (paywall) | pdf
Feb 2023 Two papers accepted to CVPR 2023! The first is on emotion recognition in movies arXiv, and the second is on understanding time in videos arXiv.
Dec 2022 Super excited that our paper at ISMIR 2022 on automatic soundtracking for books (by using movie soundtracks) was awarded the Brave New Idea Award! A fantastic example of "do what you love and good things will happen" :)
Nov 2022 Excited to receive the Google India Research Award 2022! Thanks to all involved in this process. Looking forward to doing more fun video-language work.
Nov 2022 Giving a talk at the Deep Video Understanding workshop, co-located with ICMI 2022, about our recent NeurIPS 2022 work on incorporating grounding with video situation recognition.
Nov 2022 Honored to be serving as Area Chair for ICCV 2023!
Oct 2022 One paper accepted to WACV 2023! We introduce a new audio-video-language dataset of lecture videos and show that contrastive learning of narrations and video clips helps learn suitable representations for unsupervised lecture segmentation. ArXiv
Sep 2022 Two papers accepted to NeurIPS 2022! The first is on incorporating Grounding information in Video Situation Recognition, and the second is on 3D Object Grounding based on natural language instructions. Grounded VidSitu arXiv, 3D Object Grounding arXiv
Sep 2022 One paper accepted to CoRL 2022! We show how Transformers can combine state history, multiple camera views, and natural language instructions to perform a variety of manipulation tasks on the RLBench benchmark. ArXiv
Jul 2022 One paper accepted to ISMIR 2022, my first in music! Building upon my PhD work on book-movie alignment, we show how to generate soundtracks for books by sourcing music from their movie adaptations. A fun project on Harry Potter as the first book-movie pair! ArXiv
Jul 2022 One paper accepted to ECCV 2022! In another work on vision-and-language navigation, we show that unlabeled 3D environments can be repurposed to generate meaningful training data with pseudo 3D object labels and GPT-2 based captions. ArXiv
Jun 2022 One paper accepted to IROS 2022! We show that modeling physics as a differentiable ODE dramatically improves the performance of approximate 3D trajectory reconstruction in Real2Sim. This also removes the need for expensive RL, as trajectories can be re-targeted directly to the robot. ArXiv
Mar 2022 One paper accepted to CVPR 2022! Vision-and-language navigation can benefit strongly from graph transformers. ArXiv
Jan 2022 Promoted to Senior ML Scientist at Wadhwani AI.
Nov 2021 Gave the keynote talk at a really interesting workshop on media understanding focusing on context and environment. Hosted by Google and USC’s Center for Computational Media Intelligence (CCMI).
Oct 2021 Happy to give a talk at Adobe Research Bengaluru a few days ago! Some exciting work on document processing there.
Sep 2021 One paper on long-tail image classification accepted to ICVGIP 2021. Rather than re-sampling from the “tail class”, we adapt a recent few-shot learning work to analyze the impact of feature generation.
Jul 2021 One paper accepted to ICCV 2021! We propose in-domain, self-supervised pretraining using Airbnb listings to improve Vision-and-Language Navigation models. ArXiv GitHub
Jul 2021 Launched new website based on the al-folio theme. Time to say goodbye to my old self-made Jinja+Python website and embrace Liquid+Jekyll!
Jul 2021 Excited to join IIIT Hyderabad as an Assistant Professor!
Jun 2021 Analyzing longer videos helps improve spatio-temporal action detection. Read more about it in our CVIU article in the Special Issue on Recent Advances in Modeling, Methodology and Applications of Action Recognition and Detection.
May 2021 Outstanding reviewer award for CVPR 2021.
May 2021 Visual Weighing Machine wins the Best World Changing Idea - APAC at Fast Company’s competition.
Dec 2020 Outstanding reviewer award for ACCV 2020.
Oct 2020 My first work on robotics accepted to CoRL 2020! We try to teach robots simple object manipulations by learning to translate videos into a 3D state space, Real2Sim.
Aug 2020 Outstanding reviewer award for ECCV 2020.
Feb 2020 One paper accepted to CVPR 2020! We show that joint modeling of interactions and relationships between movie characters helps improve the performance of both in a weakly supervised setting.
Jul 2019 Two papers accepted to ICCV 2019! In the first, we present HowTo100M, a large-scale dataset consisting of 130 million video-language clips obtained from over a million instructional YouTube videos. Our second work, Ball Cluster Learning, is a novel loss function for clustering face tracks without knowing the number of characters.
May 2019 Best Paper Award at FG 2019 for our work on self-supervised face clustering.
Feb 2018 Two papers accepted to CVPR 2018! In MovieGraphs we present a dataset for analyzing how people behave in social situations through an in-depth analysis of 51 movies. In Movie4D we propose an approach to make any movie suitable for 4D cinema by predicting effects experienced by the main characters!

Note: Older items can be found strewn across my old website.