Test-time compute scaling laws for open-weight MoE models

arXiv cs.LG·2mo ago

This paper presents empirical scaling laws for inference-time compute in sparse Mixture-of-Experts models, demonstrating that careful expert routing yields 3–5× effective compute improvements on math benchmarks.
