Test-time compute scaling laws for open-weight MoE models
arXiv cs.LG · 2mo ago
This paper presents empirical scaling laws for inference-time compute in sparse Mixture-of-Experts (MoE) models and shows that careful expert routing yields 3–5× gains in effective compute on math benchmarks.
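The summary reports its gains as an "effective compute" multiplier. One common way to read such a figure is shown below. This reading is an assumption, not taken from the paper. Fit a power law, error ≈ a·C^(−b), to the baseline's error as a function of inference compute C. Then ask how much compute the baseline would need to match the error that the improved method reaches. The helpers `fit_power_law` and `effective_compute_multiplier` are hypothetical. The data points are synthetic, chosen to lie exactly on a curve with a = 2 and b = 0.5, and are not results from the paper.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(compute: Sequence[float], error: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of error = a * compute**(-b) in log-log space.

    Assumption: the baseline follows a pure power law; all values must be positive.
    Returns (a, b).
    """
    xs = [math.log(c) for c in compute]
    ys = [math.log(e) for e in error]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # slope of log(error) vs log(compute), i.e. -b
    intercept = mean_y - slope * mean_x   # log(a)
    return math.exp(intercept), -slope


def effective_compute_multiplier(a: float, b: float,
                                 compute_used: float, error_achieved: float) -> float:
    """Hypothetical helper: baseline compute needed to reach error_achieved,
    divided by the compute the improved method actually used."""
    # Invert error = a * C**(-b)  =>  C = (a / error) ** (1 / b)
    baseline_compute = (a / error_achieved) ** (1.0 / b)
    return baseline_compute / compute_used


# Synthetic baseline points (illustrative only, lying on error = 2 * C**-0.5).
baseline_compute = [1.0, 4.0, 16.0, 64.0]
baseline_error = [2.0, 1.0, 0.5, 0.25]
a, b = fit_power_law(baseline_compute, baseline_error)

# Suppose a routed variant reaches error 0.25 using 16 units of compute.
multiplier = effective_compute_multiplier(a, b, compute_used=16.0, error_achieved=0.25)
print(round(multiplier, 6))  # → 4.0
```

In this toy case the baseline needs 64 units of compute to reach the error that the routed variant reaches with 16, which gives a 4× multiplier. A real analysis would fit noisy benchmark measurements, and the multiplier would depend on the error level at which it is read.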