2026 02 21
One paper on Multimodal Entity Coreference accepted to the CVPR 2026 Findings track. We perform spatio-temporal grounding of entities in videos with shot boundaries and describe them consistently for video situation recognition. arXiv preprint coming soon!