This repository contains the code and experiment results for an NLP project that evaluates test-time scaling strategies on Gemini-2.5-Flash for math reasoning. We compare five inference-time ...
Because the training samples are highly heterogeneous in length, this sample packing greatly improves SFT efficiency. The authors chose 8192 as the target sequence length to match Llama 3.1's native training context window, and achieved an overall packing efficiency of 96%, meaning only 4% of tokens are padding.
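To make the padding arithmetic concrete, here is a minimal sketch of one common packing strategy, greedy first-fit-decreasing bin packing. The source does not specify which algorithm was actually used; `pack_samples`, `packing_efficiency`, and the randomly drawn sample lengths below are all illustrative assumptions. Efficiency is the fraction of non-padding tokens across all packed sequences, so 96% efficiency corresponds to 4% padding.

```python
import random
from typing import List

TARGET_LEN = 8192  # matches Llama 3.1's native training context window


def pack_samples(sample_lengths: List[int], target_len: int = TARGET_LEN) -> List[List[int]]:
    """Greedy first-fit-decreasing packing: place each sample (longest first)
    into the first packed sequence with enough remaining room, else open a new one."""
    bins: List[List[int]] = []  # lengths packed into each output sequence
    free: List[int] = []        # remaining capacity of each output sequence
    for length in sorted(sample_lengths, reverse=True):
        for i, cap in enumerate(free):
            if length <= cap:
                bins[i].append(length)
                free[i] -= length
                break
        else:
            bins.append([length])
            free.append(target_len - length)
    return bins


def packing_efficiency(bins: List[List[int]], target_len: int = TARGET_LEN) -> float:
    """Fraction of real (non-padding) tokens across all packed sequences."""
    used = sum(sum(b) for b in bins)
    return used / (len(bins) * target_len)


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical, highly length-heterogeneous samples (50 to 6000 tokens each).
    lengths = [random.randint(50, 6000) for _ in range(1000)]
    packed = pack_samples(lengths)
    print(f"{len(packed)} sequences, efficiency = {packing_efficiency(packed):.1%}")
```

With heterogeneous lengths like these, naive one-sample-per-sequence batching would pad most of each 8192-token window; packing several samples into each window is what pushes the non-padding fraction toward the reported 96%.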