SVRG++

2 minute read

Published:

论文阅读笔记: Improved SVRG for non-strongly-convex or sum-of-non-convex objectives

原始的 SVRG 或者 L-SVRG 需要函数满足强凸性质, SVRG++ 将该方法拓展到普通凸函数,算法使用两层循环,外层循环 $s$ 每次的轮此递增,而内层循环 $k$ 使用方差缩减技术

\[\begin{align*} \mathbb{E} [f(x_{k+1}) - f(x_{\ast})] &\le \mathbb{E}[ f(x_k) - f(x_{\ast}) + \nabla f(x_k)^\top(x_{k+1}- x_k) + \frac{L}{2} \Vert x_{k+1} - x_k \Vert^2] \\ &=f(x_k) - f(x_{\ast}) + \mathbb{E} [\nabla f(x_k)^\top(x_{k+1} - x_k)] + \frac{L}{2} \Vert x_{k+1} - x_k \Vert^2 \\ &\le \nabla f(x_k)^\top(x_k - x_{\ast}) + \mathbb{E} [\nabla f(x_k)^\top(x_{k+1} - x_k)] + \frac{L}{2} \Vert x_{k+1} - x_k \Vert^2\\ &= \mathbb{E} [g_k^\top (x_{k+1} - x_{\ast})] + \frac{L}{2} \Vert x_{k+1} - x_k \Vert^2+ \mathbb{E} [ g_k - \nabla f(x_k))^\top(x_{k+1} - x_{k}) \\ &= \frac{1}{\eta_k} \mathbb{E} [(x_k - x_{k+1})^\top(x_{k+1} - x_{\ast})] + \frac{L}{2} \Vert x_{k+1} - x_k \Vert^2 + \mathbb{E} [ (g_k - \nabla f(x_k))^\top(x_{k+1} - x_{k}) \\ &= \frac{1}{2 \eta_k} \mathbb{E}[\Vert x_k - x_{\ast} \Vert^2 - \Vert x_{k+1} - x_\ast \Vert^2 ] + (\frac{L}{2} - \frac{1} {2 \eta_k}) \Vert x_{k+1} - x_k \Vert^2 + \mathbb{E} [ g_k - \nabla f(x_k))^\top(x_{k+1} - x_{k}) \\ &\le \frac{1}{2 \eta_k} \mathbb{E}[\Vert x_k - x_{\ast} \Vert^2 - \Vert x_{k+1} - x_\ast \Vert^2 ] + \frac{\eta_k}{2 (1 -\eta_k L)} \mathbb{E} [ \Vert g_k - \nabla f(x_k,\xi) \Vert^2] \\ \text{Where } g_k &= \nabla f(x_k,\xi) - \nabla f(\tilde s,\xi) + \nabla f(\tilde x) \end{align*}\]

根据方差缩减的性质,

\[\begin{align*} \mathbb{E} [ g_k - \nabla f(x_k) \Vert^2] &\le \mathbb{E} [\Vert \nabla f(x_k,\xi) - \nabla f(\tilde x,\xi) \Vert^2] \\ &\le2 \mathbb{E}[ \Vert \nabla f(x_k,\xi) - \nabla f(x_{\ast},\xi) \Vert^2] +2 \mathbb{E}[ \Vert \nabla f(\tilde x,\xi) - \nabla f(x_{\ast},\xi) \Vert^2] \\ &\le 4L (f(x_k) - f(x_{\ast}) + f(\tilde x) - f(x_{\ast}) ) \end{align*}\]

取 $ \eta_k = 1/7L$, 对每一轮次 $s$,

\[\begin{align*} \mathbb{E} [ f(x_{k+1}^{(s)}) - f(x_{\ast})] &\le \frac{1}{3} \mathbb{E} [ f(x_k^{(s)}) -f(x_{\ast})+ f(\tilde x^{(s)}) - f(x_{\ast})] +\frac{1}{2 \eta_k} \mathbb{E}[ \Vert x_k^{(s)} - x_{\ast} \Vert^2 - \Vert x_{k+1}^{(s)} - x_{\ast} \Vert^2] \\ \end{align*}\]

对该轮次中的所有 $m_s$ 次迭代求和,

\[\begin{align*} \frac{2}{m_s} \sum_{k=0}^{m_s - 1}\mathbb{E} [ f(x_{0}^{(s) } - f(x_{\ast}))] &\le \frac{1}{m_s} \mathbb{E} [ f(x_0^{(s)}) -f(x_{m_s}^{(s)})] +\frac{3}{2 \eta_k m_s} \mathbb{E}[\Vert x_0^{(s)} - x_{\ast} \Vert^2 - \Vert x_{m_s}^{(s)} - x_{\ast} \Vert^2] - +f(\tilde x^{(s)}) - f(x_{\ast}) \\ \end{align*}\]

取下一轮的Sanpshot $\tilde x^{(s+1)} = \sum_{k=0}^{m_s} f(x_{k+1}^{(s)})/m_s$ , 根据函数的凸性,

\[\begin{align*} 2(f(\tilde x^{(s+1)}) - f(x_{\ast})) &\le \frac{2}{m_s} \sum_{k=0}^{m_s - 1}\mathbb{E} [ f(x_{k+1}^{(s) } - f(x_{\ast}))] \\ &\le \frac{1}{m_s} \mathbb{E} [ f(x_0^{(s)}) -f(x_{0}^{(s+1)})] +\frac{3}{2 \eta_k m_s} \mathbb{E} [\Vert x_0^{(s) } - x_{\ast} \Vert^2 - \Vert x_0^{(s+1)} - x_{\ast} \Vert^2] +f(\tilde x^{(s)}) - f(x_{\ast}) \\ \end{align*}\]

令 $m_{s+1} =2 m_s$, 以及Lyapunov函数为,

\[\begin{align*} \mathcal{V}^{(s)} &= f(\tilde x^{(s)}) - f(x_{\ast}) + \frac{3}{2 \eta_k m_s} \Vert x_0^{(s)} - x_{\ast} \Vert^2 + \frac{1}{m_s} (f(x_0^{(s)} - f(x_{\ast})) \\ &\le (\tilde x^{(s)}) - f(x_{\ast}) + \frac{21L}{2 m_s} \Vert x_0^{(s)} - x_{\ast} \Vert^2 + \frac{1}{m_s} (f(x_0^{(s)} - f(x_{\ast})) \\ \end{align*}\]

此时Lyapunov函数呈指数下降,

\[\begin{align*} \mathcal{V}^{(s+1)} &\le \frac{1}{2} \mathcal{V}^{(s)} \end{align*}\]

因此选取合适的参数后计算复杂度为,

\[\begin{align*} \mathcal{O}( Sn + 2^S m_0) &= \mathcal{O}(n \log \frac{1}{\epsilon} + \frac{L}{\epsilon}),\text{ with } m_0 = \mathcal{O}(L), S = \mathcal{O}( \log \frac{1}{\epsilon}) \end{align*}\]