Multimodal chain-of-thought (MCoT) reasoning has garnered attention for its ability to enhance step-by-step reasoning in multimodal contexts, particularly within multimodal large language models ...
Through systematic experiments DeepSeek found the optimal balance between computation and memory with 75% of sparse model ...
Recent advances in Large Language Models (LLMs) have showcased impressive reasoning abilities in structured tasks like mathematics and programming, largely driven by Reinforcement Learning with ...
Abstract: Visual Question Answering (VQA) is an advanced artificial intelligence task that combines computer vision and natural language processing technologies. Its core objective is to enable ...
Abstract: The current automatic speech recognition (ASR) technology has achieved remarkable success, but it is vulnerable to adversarial attacks. To solve the problem that existing adversarial attack ...