Publications

In reverse chronological order, generated by jekyll-scholar.

2025

  1. VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment
    In Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2025
    New!
  2. IdentifyMe: A Challenging Mention Resolution Benchmark for LLMs
    Kawshik Manikantan, Makarand Tapaswi, Vineet Gandhi, and Shubham Toshniwal
    In Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL), May 2025
    New! Short paper
  3. The Sound of Water: Inferring Physical Properties from Pouring Liquids
    Piyush Bagad, Makarand Tapaswi, Cees G M Snoek, and Andrew Zisserman
    In International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr 2025
    New! (long version on arXiv)
  4. Seeing Eye to AI: Comparing Human Gaze and Model Attention in Video Memorability
    In Winter Conference on Applications of Computer Vision (WACV), Feb 2025
  5. No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning
    Manu Gaur, Darshan Singh, and Makarand Tapaswi
    Transactions on Machine Learning Research (TMLR), Jan 2025
  6. Generalized Cross-domain Multi-label Few-shot Learning for Chest X-rays
    Aroof Aimen, Arsh Verma, Makarand Tapaswi, and Narayanan C Krishnan
    In International Symposium on Biomedical Imaging (ISBI), Apr 2025
    long version on arXiv

2024

  1. Major Entity Identification: A Generalizable Alternative to Coreference Resolution
    Kawshik Manikantan, Shubham Toshniwal, Makarand Tapaswi, and Vineet Gandhi
    In Empirical Methods in Natural Language Processing (EMNLP), Nov 2024
  2. Detect, Describe, Discriminate: Moving Beyond VQA for MLLM Evaluation
    Manu Gaur, Darshan Singh, and Makarand Tapaswi
    In ECCV Workshop on Emergent Visual Abilities and Limits of Foundation Models (EVAL-FoMo), Sep 2024
  3. Localizing Auditory Concepts in CNNs
    Pratyaksh Gautam, Makarand Tapaswi*, and Vinoo Alluri*
    In ICML Workshop on Mechanistic Interpretability (ICMLW-MI), Jul 2024
  4. "Previously on ..." From Recaps to Story Summarization
    Aditya Kumar Singh, Dhruv Srivastava, and Makarand Tapaswi
    In Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2024
    Media: Talk at Twelve Labs  
  5. MICap: A Unified Model for Identity-aware Movie Descriptions
    Haran Raajesh*, Naveen Reddy Desanur*, Zeeshan Khan, and Makarand Tapaswi
    In Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2024
    Media: RSIP Vision, Talk at Twelve Labs
  6. NurtureNet: A Multi-task Video-based Approach for Newborn Anthropometry
    Yash Khandelwal, Mayur Arvind, Sriram Kumar, Ashish Gupta, Sachin Kumar Danisetty, Piyush Bagad, Anish Madan, Mayank Lunayach, Aditya Annavajjala, Abhishek Maiti, Sansiddh Jain, Aman Dalmia, Namrata Deka, Jerome White, Jigar Doshi, Angjoo Kanazawa, Rahul Panicker, Alpan Raval, Srinivas Rana, and Makarand Tapaswi
    In CVPR Workshop on Computer Vision for Physiological Measurements (CVPM), Jun 2024
    Best Paper Award
  7. FiGCLIP: Fine-Grained CLIP Adaptation via Densely Annotated Videos
    Darshan Singh, Zeeshan Khan, and Makarand Tapaswi
    arXiv Preprint, Jun 2024

2023

  1. How you feelin’? Learning Emotions and Mental States in Movie Scenes
    Dhruv Srivastava, Aditya Kumar Singh, and Makarand Tapaswi
    In Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2023
    Media: IIIT-H Blog, Times of India, Hindu Businessline, Telangana Today, Deccan Chronicle
  2. Test of Time: Instilling Video-Language Models with a Sense of Time
    Piyush Bagad, Makarand Tapaswi, and Cees G M Snoek
    In Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2023
  3. GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering
    Dhaval Taunk, Lakshya Khanna, Pavan Kandru, Vasudeva Varma, Charu Sharma, and Makarand Tapaswi
    In WWW Workshop on Natural Language Processing for Knowledge Graph Construction (NLP4KGc), May 2023
  4. Unsupervised Audio-Visual Lecture Segmentation
    Darshan Singh, Anchit Gupta, C V Jawahar, and Makarand Tapaswi
    In Winter Conference on Applications of Computer Vision (WACV), Jan 2023

2022

  1. Sonus Texere! Automated Dense Soundtrack Construction for Books using Movie Adaptations
    Jaidev Shriram, Makarand Tapaswi, and Vinoo Alluri
    In International Society for Music Information Retrieval Conference (ISMIR), Dec 2022
    Brave New Idea Award!
    Media: IIIT-H Blog, Hindu Businessline, Eenadu (Telugu), Times of India
  2. Grounded Video Situation Recognition
    Zeeshan Khan, C V Jawahar, and Makarand Tapaswi
    In Neural Information Processing Systems (NeurIPS), Dec 2022
  3. Language Conditioned Spatial Relation Reasoning for 3D Object Grounding
    In Neural Information Processing Systems (NeurIPS), Dec 2022
  4. Can we Adopt Self-supervised Pretraining for Chest X-Rays?
    Arsh Verma and Makarand Tapaswi
    In Machine Learning for Healthcare (ML4H) (Extended Abstract), Nov 2022
  5. Instruction-driven History-aware Policies for Robotic Manipulations
    Pierre-Louis Guhur, Shizhe Chen, Ricardo Garcia Pinel, Makarand Tapaswi, Ivan Laptev, and Cordelia Schmid
    In Conference on Robot Learning (CoRL), Dec 2022
  6. Learning from Unlabeled 3D Environments for Vision-and-Language Navigation
    In European Conference on Computer Vision (ECCV), Oct 2022
  7. Learning Object Manipulation Skills from Video via Approximate Differentiable Physics
    In International Conference on Intelligent Robots and Systems (IROS), Oct 2022
  8. Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation
    In Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2022

2021

  1. Long term Spatio-Temporal Modeling for Action Detection
    Makarand Tapaswi*, Vijay Kumar*, and Ivan Laptev
    Computer Vision and Image Understanding (CVIU), Jun 2021
  2. Feature Generation for Long-tail Classification
    In Indian Conference on Computer Vision, Graphics, and Image Processing (ICVGIP), Dec 2021
  3. Airbert: In-domain Pretraining for Vision-and-Language Navigation
    In International Conference on Computer Vision (ICCV), Oct 2021

2020

  1. Learning Object Manipulation Skills via Approximate State Estimation from Real Videos
    Vladimir Petrik*, Makarand Tapaswi*, Ivan Laptev, and Josef Sivic
    In Conference on Robot Learning (CoRL), Nov 2020
  2. Learning Interactions and Relationships between Movie Characters
    Anna Kukleva, Makarand Tapaswi, and Ivan Laptev
    In Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2020
  3. Clustering based Contrastive Learning for Improving Face Representations
    Vivek Sharma, Makarand Tapaswi, Saquib Sarfraz, and Rainer Stiefelhagen
    In IEEE International Conference on Automatic Face and Gesture Recognition (FG), May 2020
  4. Video Face Clustering with Self-Supervised Representation Learning
    Vivek Sharma, Makarand Tapaswi, M. Saquib Sarfraz, and Rainer Stiefelhagen
    IEEE Transactions on Biometrics (T-BIOM), May 2020

2019

  1. Video Face Clustering with Unknown Number of Clusters
    Makarand Tapaswi, Marc T. Law, and Sanja Fidler
    In International Conference on Computer Vision (ICCV), Oct 2019
  2. HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
    In International Conference on Computer Vision (ICCV), Oct 2019
    Media: Data Skeptic Podcast  
  3. Self-Supervised Learning of Face Representations for Video Face Clustering
    Vivek Sharma, Makarand Tapaswi, Saquib Sarfraz, and Rainer Stiefelhagen
    In IEEE International Conference on Automatic Face and Gesture Recognition (FG), May 2019
    Best Paper Award!
  4. Visual Reasoning by Progressive Module Networks
    Seung Wook Kim, Makarand Tapaswi, and Sanja Fidler
    In International Conference on Learning Representations (ICLR), May 2019
  5. The Shmoop Corpus: A Dataset of Stories with Loosely Aligned Summaries
    Atef Chaudhury, Makarand Tapaswi, Seung Wook Kim, and Sanja Fidler
    arXiv:1912.13082, May 2019
  6. Deep Multimodal Feature Encoding for Video Ordering
    Vivek Sharma, Makarand Tapaswi, and Rainer Stiefelhagen
    In ICCV Workshop on Large Scale Holistic Video Understanding, May 2019

2018

  1. MovieGraphs: Towards Understanding Human-Centric Situations from Videos
    Paul Vicol, Makarand Tapaswi, Lluis Castrejon, and Sanja Fidler
    In Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2018
    Media: UofT News, phys.org
  2. Now You Shake Me: Towards Automatic 4D Cinema
    Yuhao Zhou, Makarand Tapaswi, and Sanja Fidler
    In Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2018
    Media: UofT News, Inquisitr, CBC Radio

2017

  1. Situation Recognition with Graph Neural Networks
    Ruiyu Li, Makarand Tapaswi, Renjie Liao, Jiaya Jia, Raquel Urtasun, and Sanja Fidler
    In International Conference on Computer Vision (ICCV), Oct 2017

2016

  1. MovieQA: Understanding Stories in Movies through Question-Answering
    Makarand Tapaswi, Yukun Zhu, Rainer Stiefelhagen, Antonio Torralba, Raquel Urtasun, and Sanja Fidler
    In Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2016
    Media: MIT Tech Review, NVidia Developer News
  2. Recovering the Missing Link: Predicting Class-Attribute Associations for Unsupervised Zero-Shot Learning
    Ziad Al-Halah, Makarand Tapaswi, and Rainer Stiefelhagen
    In Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2016
  3. Naming TV Characters by Watching and Analyzing Dialogs
    In Winter Conference on Applications of Computer Vision (WACV), Mar 2016
  4. A Closed-form Gradient for the 1D Earth Mover’s Distance for Spectral Deep Learning on Biological Data
    Manuel Martinez, Makarand Tapaswi, and Rainer Stiefelhagen
    In ICML Workshop on Computational Biology (CompBio-ICML16), Jun 2016

2015

  1. Accio: A Data Set for Face Track Retrieval in Movies Across Age
    In International Conference on Multimedia Retrieval (ICMR), Jun 2015
  2. Book2Movie: Aligning Video scenes with Book chapters
    Makarand Tapaswi, Martin Baeuml, and Rainer Stiefelhagen
    In Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2015
  3. Aligning Plot Synopses to Videos for Story-based Retrieval
    Makarand Tapaswi, Martin Baeuml, and Rainer Stiefelhagen
    International Journal of Multimedia Information Retrieval (IJMIR), Jun 2015
  4. Improved Weak Labels using Contextual Cues for Person Identification in Videos
    Makarand Tapaswi, Martin Baeuml, and Rainer Stiefelhagen
    In International Conference on Automatic Face and Gesture Recognition (FG), May 2015
  5. KIT at MediaEval 2015 – Evaluating Visual Cues for Affective Impact of Movies Task
    Marin Vlastelica Pogančić, Sergey Hayrapetyan, Makarand Tapaswi, and Rainer Stiefelhagen
    In MediaEval2015 Multimedia Benchmark Workshop (MediaEval2015), Sep 2015

2014

  1. Total Cluster: A person agnostic clustering method for broadcast videos
    Makarand Tapaswi, Omkar M. Parkhi, Esa Rahtu, Eric Sommerlade, Rainer Stiefelhagen, and Andrew Zisserman
    In Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP), Dec 2014
  2. Cleaning up after a Face Tracker: False Positive Removal
    Makarand Tapaswi, Cemal Çağrı Çörez, Martin Baeuml, Hazım Kemal Ekenel, and Rainer Stiefelhagen
    In International Conference on Image Processing (ICIP), Oct 2014
  3. A Time Pooled Track Kernel for Person Identification
    Martin Baeuml, Makarand Tapaswi, and Rainer Stiefelhagen
    In Conference on Advanced Video and Signal-based Surveillance (AVSS), Aug 2014
  4. StoryGraphs: Visualizing Character Interactions as a Timeline
    Makarand Tapaswi, Martin Baeuml, and Rainer Stiefelhagen
    In Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2014
  5. Story-based Video Retrieval in TV series using Plot Synopses
    Makarand Tapaswi, Martin Baeuml, and Rainer Stiefelhagen
    In International Conference on Multimedia Retrieval (ICMR), Apr 2014

2013

  1. Semi-supervised Learning with Constraints for Person Identification in Multimedia Data
    Martin Baeuml, Makarand Tapaswi, and Rainer Stiefelhagen
    In Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2013
  2. QCompere @ Repere 2013
    Hervé Bredin, Johann Poignant, Guillaume Fortier, Makarand Tapaswi, Viet Bac Le, Anindya Roy, Claude Barras, Sophie Rosset, Achintya Sarkar, Hua Gao, Alexis Mignon, Jakob Verbeek, Laurent Besacier, Georges Quénot, Hazım Kemal Ekenel, and Rainer Stiefelhagen
    In Workshop on Speech, Language and Audio in Multimedia (SLAM), Aug 2013

2012

  1. Contextual Constraints for Person Retrieval in Camera Networks
    Martin Baeuml, Makarand Tapaswi, Arne Schumann, and Rainer Stiefelhagen
    In Conference on Advanced Video and Signal-based Surveillance (AVSS), Sep 2012
  2. “Knock! Knock! Who is it?” Probabilistic Person Identification in TV series
    Makarand Tapaswi, Martin Baeuml, and Rainer Stiefelhagen
    In Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2012
    Media: ITWorld  
  3. Fusion of Speech, Faces and Text for Person Identification in TV Broadcast
    Hervé Bredin, Johann Poignant, Makarand Tapaswi, Guillaume Fortier, Viet Bac Le, Thibault Napoleon, Hua Gao, Claude Barras, Sophie Rosset, Laurent Besacier, Jakob Verbeek, Georges Quénot, Frédéric Jurie, and Hazım Kemal Ekenel
    In Workshop on Information Fusion in Computer Vision for Concept Recognition (held with ECCV 2012) (IFCVCR), Oct 2012
  4. KIT at MediaEval2012 - Content-based Genre Classification with Visual Cues
    Tomas Semela, Makarand Tapaswi, Hazım Kemal Ekenel, and Rainer Stiefelhagen
    In MediaEval2012 Multimedia Benchmark Workshop (MediaEval2012), Oct 2012

2010

  1. Direct modeling of spoken passwords for text-dependent speaker recognition by compressed time-feature representations
    Amitava Das and Makarand Tapaswi
    In International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Mar 2010

2008

  1. Audio-Visual Person Authentication with Multiple Visualized-Speech Features and Multiple Face Profiles
    Amitava Das, Ohil K. Manyam, and Makarand Tapaswi
    In Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP), Dec 2008
  2. Multilingual spoken-password based user authentication in emerging economies using cellular phone networks
    Amitava Das, Ohil K. Manyam, Makarand Tapaswi, and Veeresh Taranalli
    In Workshop on Spoken Language Technology (SLT), Dec 2008

Disclaimer

This publication material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.