VLM Visual Language Model Perception

Eye Movement-enhanced Visual Looming Model for Driver Emergency Brake Reaction Time

Abstract: Accurate prediction of driver brake reaction time (BRT) in emergencies is essential for the design of advanced driver assistance systems. However, current BRT models lack sufficient ...

IEEE

Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding

Abstract: Visual grounding seeks to localize the image region corresponding to a free-form text description. Recently, the strong multimodal capabilities of Large Vision-Language Models (LVLMs) have ...
