Abstract: We introduce the task of localizing a flexible number of objects in real-world 3D scenes using natural language descriptions. Existing 3D visual grounding tasks focus on localizing a unique ...
一些您可能无法访问的结果已被隐去。
显示无法访问的结果一些您可能无法访问的结果已被隐去。
显示无法访问的结果