Abstract: Multimodal large models (MLLMs) have shown strong performance in language tasks, but there is still room for improvement in visual capabilities. To address this challenge, we introduce ...