This post was originally published on Substack. Click the link to read the full article.
Why linear learning-rate scaling silently kills SGD’s implicit regularization and destroys test accuracy.
Read the full article on Substack