Links: Concerning Momentum Methods [Goh]
Main link content: Gabriel Goh, 2017. “Why Momentum Really Works”.
(What momentum? This momentum.) Content warning: scattered thoughts.
This was originally going to be part of the technical links posts, but every time I read an article or blog post with enough information content to count as great, I end up with a bunch of tabs open pointing to other references … so anyway, here is a list of links and some brief notes.
Fascinating things / main takeaways (in addition to the explanation of the momentum algorithm itself):
- Note: the author spends most of their time analyzing how the method at hand performs on a convex quadratic. In other words, on nice objective functions whose gradients have a particularly nice (linear) form; see the first sketch after the reference list.
- Note: the analysis is moved to eigenspace, including a study of the dynamics of the algorithm in that basis (working with the individual eigenvalues instead of the full matrix); see the second sketch after the reference list.
- Note: the convergence rates can be analyzed in a spectral fashion, via the eigenvalues of the update's iteration matrix; the third sketch after the reference list computes that rate.
- Note: all the references look more or less interesting, but I consider the following worth writing down:
Moritz Hardt, 2013. “Zen of Gradient Descent”. Blog post.
Ning Qian, 1999. “On the momentum term in gradient descent learning algorithms.” doi: 10.1016/S0893-6080(98)00116-6
Brendan O’Donoghue, Emmanuel Candès, 2013. “Adaptive Restart for Accelerated Gradient Schemes”. doi: 10.1007/s10208-013-9150-3
Yu. Nesterov, 2008. “Accelerating the cubic regularization of Newton’s method on convex problems”. doi: 10.1007/s10107-006-0089-x
Yu. Nesterov, 2004. Introductory Lectures on Convex Optimization. A bit expensive, like all English-language textbooks.
Weijie Su, Stephen Boyd, Emmanuel J. Candes, 2015. “A Differential Equation for Modeling Nesterov’s Accelerated Gradient Method: Theory and Insights”.
Nicolas Flammarion, Francis Bach, 2015. “From Averaging to Acceleration, There is Only a Step-size”.
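
To make the first note concrete, here is a minimal numpy sketch (entirely my own; the matrix A, vector b, and the alpha/beta values are arbitrary toy choices, not taken from the article) of a heavy-ball / momentum update on a convex quadratic f(w) = 0.5 wᵀAw - bᵀw, whose gradient is just Aw - b:

```python
import numpy as np

# A small convex quadratic: f(w) = 0.5 * w^T A w - b^T w, so grad f(w) = A w - b.
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])   # symmetric positive definite (toy example)
b = np.array([1.0, -1.0])

def grad(w):
    return A @ w - b

# Momentum / heavy-ball update: z <- beta*z + grad(w), then w <- w - alpha*z.
alpha, beta = 0.1, 0.9       # arbitrary, untuned step size and momentum
w = np.zeros(2)
z = np.zeros(2)
for _ in range(500):
    z = beta * z + grad(w)
    w = w - alpha * z

print("iterate:", w)
print("optimum:", np.linalg.solve(A, b))   # the quadratic's minimizer w* = A^{-1} b
```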
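
For the eigenspace note: a quick numerical check (again my own sketch, same toy quadratic and parameters as above) that after the change of basis x = Qᵀ(w - w*), with A = Q diag(λ) Qᵀ, the momentum iteration decouples into independent scalar recursions, one per eigenvalue:

```python
import numpy as np

A = np.array([[3.0, 0.5],
              [0.5, 1.0]])
b = np.array([1.0, -1.0])
w_star = np.linalg.solve(A, b)
lam, Q = np.linalg.eigh(A)          # A = Q diag(lam) Q^T
alpha, beta = 0.1, 0.9

# Full-space iterates (w, z) vs. eigenspace iterates x = Q^T (w - w*), y = Q^T z.
w, z = np.zeros(2), np.zeros(2)
x, y = Q.T @ (w - w_star), np.zeros(2)

for _ in range(50):
    # full space
    z = beta * z + (A @ w - b)
    w = w - alpha * z
    # eigenspace: each coordinate only sees its own eigenvalue lam[i]
    y = beta * y + lam * x
    x = x - alpha * y

print(np.allclose(Q.T @ (w - w_star), x))   # True: same dynamics, coordinate by coordinate
```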
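
And for the spectral note: each per-eigenvalue recursion above is a linear map of the pair (y_i, x_i), so the asymptotic convergence rate of the whole iteration is the largest spectral radius among the corresponding 2x2 matrices. A small sketch of that computation (same assumed alpha and beta as above; setting beta = 0 recovers plain gradient descent):

```python
import numpy as np

A = np.array([[3.0, 0.5],
              [0.5, 1.0]])
lam = np.linalg.eigvalsh(A)
alpha, beta = 0.1, 0.9

def momentum_rate(alpha, beta, lam):
    # Per eigenvalue l, the pair (y, x) evolves via the 2x2 matrix
    #   R = [[beta,        l          ],
    #        [-alpha*beta, 1 - alpha*l]]
    # and the worst-case asymptotic rate is the largest spectral radius over all l.
    radii = [max(abs(np.linalg.eigvals(np.array([[beta,          l],
                                                 [-alpha * beta, 1 - alpha * l]]))))
             for l in lam]
    return max(radii)

print(momentum_rate(alpha, beta, lam))   # < 1 means the iteration converges
print(momentum_rate(alpha, 0.0,  lam))   # beta = 0: plain gradient descent, rate max|1 - alpha*l|
```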
Search terms for future reference:
- Linear First Order Methods
- Gradient methods
- Convex optimization