μpscaling small models: Principled warm starts and hyperparameter transfer
Yuxin Ma, Nan Chen, Mateo Díaz, Soufiane Hayou, Dmitriy Kunisky, Soledad Villar
Preprint (2026). We propose a μP-based model upscaling method that allows hyperparameter transfer.
model upscaling
hyperparameter transfer
mup
tensor program
training dynamics
infinite-width limit