Abstract: To address the trade-off between cost and performance in large language model (LLM) inference, an adaptive routing strategy (MA-Router) with multi-modal attention ...
The rapid progress of large language models (LLMs) has catalyzed the emergence of multimodal large language models (MLLMs) that unify visual understanding and image generation within a single ...