
Random matrix theory (RMT) offers a host of tools for making sense of neural networks. In this paper, we look at the heavy-tailed random matrix theory developed by Martin and Mahoney (2021). From the eigenvalue spectra of a model’s weight matrices, it is possible to derive generalization metrics that are independent of the data, and to decompose the training process into five distinct phases. Additionally, the theory predicts, and lets us test for, a key form of learning bias known as “self-regularization.” We extend these results from computer vision to language models, finding many similarities and a few potentially meaningful differences. This provides a glimpse of what more “top-down” interpretability approaches might accomplish: from a deeper understanding of the training process and path-dependence to inductive bias and generalization.
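For concreteness, here is a minimal sketch of how such a data-independent spectral metric can be computed, assuming NumPy and PyTorch and a simple Hill-style tail estimator. This is illustrative only, not the paper's implementation (Martin and Mahoney's WeightWatcher library performs a more careful power-law fit with an automatically selected cutoff).

```python
# Illustrative sketch only: estimate a heavy-tailed exponent (alpha) per layer
# from the eigenvalue spectrum of W^T W, and average it over the model.
import numpy as np
import torch

def layer_alpha(weight: torch.Tensor, tail_fraction: float = 0.5) -> float:
    """Power-law tail exponent of a layer's eigenvalue spectrum.

    Smaller alpha means a heavier tail, i.e. more "self-regularization"
    in the Martin-Mahoney picture.
    """
    W = weight.detach().cpu().numpy().astype(np.float64)
    if W.ndim > 2:                       # conv kernels: flatten to a matrix
        W = W.reshape(W.shape[0], -1)
    N = max(W.shape)
    # Singular values of W give the eigenvalues of W^T W / N directly.
    svals = np.linalg.svd(W, compute_uv=False)
    eigs = np.sort(svals ** 2 / N)
    # Hill estimator on the largest `tail_fraction` of eigenvalues
    # (assumption: fixed-fraction tail cut, rather than a goodness-of-fit
    # based cutoff as in WeightWatcher).
    k = max(2, int(tail_fraction * len(eigs)))
    tail = eigs[-k:]
    alpha = 1.0 + k / np.sum(np.log(tail / tail[0]))
    return float(alpha)

def model_alpha(model: torch.nn.Module) -> float:
    """Average alpha over all weight matrices: a rough, data-free score."""
    alphas = [layer_alpha(p) for n, p in model.named_parameters()
              if p.ndim >= 2 and "weight" in n]
    return float(np.mean(alphas))
```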

Download

rmt-llms.pdf 1.6 MB
