MiniMax has introduced OctoCodingBench, a benchmark designed to evaluate how well coding agents follow process-level instructions. The study found that while many models can pass output tests, they ...