2026 02 21 | Makarand Tapaswi

One paper on Multimodal Entity Coreference accepted to the CVPR 2026 Findings track. We perform spatio-temporal grounding of entities across videos that contain shot boundaries and describe them consistently for video situation recognition. arXiv version coming soon!