Abstract: To address the trade-off between high cost and high performance in large language model (LLM) inference scenarios, an adaptive routing strategy (MA-Router) with multi-modal attention ...