One paper accepted to ECCV 2022! In another work on vision-and-language navigation, we show that unlabeled 3D environments can be repurposed to generate meaningful training data, using pseudo 3D object labels and GPT-2-based captions. ArXiv