EG+: ExtraGradient+
Paper reading notes: Efficient Methods for Structured Nonconvex-Nonconcave Min-Max Optimization
Method Overview
Similar to ADGA, this paper studies nonconvex-nonconcave min-max problems, i.e.,
\[\begin{align} \min_x \max_y f(x,y) \end{align}\]The core difference from ADGA is that EG+ updates $x$ and $y$ simultaneously rather than alternately. The updates are written in terms of the operator
\[\begin{align} F(z) = [\nabla_x f(z), - \nabla_y f(z)]^\top, \text{ with } z = [x,y]^\top \end{align}\]It is natural to first assume that this operator is Lipschitz continuous, i.e., that the objective is $L$-smooth,
\[\begin{align} \Vert F(z) - F(z') \Vert \le L \Vert z - z' \Vert \end{align}\]and, in addition, that it satisfies the Weak Minty Variational Inequality (Weak MVI) condition,
\[\begin{align} F(z)^\top(z - z_{\ast}) \ge -\frac{\rho}{2} \Vert F(z) \Vert^2, \quad \rho \in [0, \frac{1}{4L}) \end{align}\]where $z_{\ast}$ denotes a solution; setting $\rho = 0$ recovers the classical Minty condition, so Weak MVI relaxes it. The algorithm iterates as follows,
\[\begin{align} z_{k+1/2} &= z_k - \frac{a_k}{\beta} F(z_k) \\ z_{k+1} &= z_k - a_k F(z_{k+1/2}) \end{align}\]i.e., an extrapolation step from $z_k$ with step size $a_k/\beta$, followed by the actual update from $z_k$ with step size $a_k$. A minimal runnable sketch of this update is given below.
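The sketch below (my own illustration, not the authors' code) runs the EG+ update on a toy bilinear objective $f(x,y) = xy$, for which $F(z) = [y, -x]$, $L = 1$, and the Weak MVI condition holds with $\rho = 0$; the problem choice, default parameters, and the names `F` and `eg_plus` are assumptions made only for this example.

```python
import numpy as np

# Toy bilinear objective f(x, y) = x * y (illustrative choice, not from the paper):
# grad_x f = y and grad_y f = x, so F(z) = [y, -x]. This F is 1-Lipschitz (L = 1)
# and satisfies the Weak MVI condition with rho = 0 at the saddle point z* = (0, 0).
def F(z):
    x, y = z
    return np.array([y, -x])

def eg_plus(z0, L=1.0, beta=0.5, num_iters=50):
    """EG+ iteration: extrapolate with step a/beta, then update with step a."""
    a = 1.0 / (2.0 * L)                   # a_k = 1/(2L), the step size used in the analysis
    z = np.asarray(z0, dtype=float)
    for _ in range(num_iters):
        z_half = z - (a / beta) * F(z)    # z_{k+1/2} = z_k - (a_k / beta) F(z_k)
        z = z - a * F(z_half)             # z_{k+1}   = z_k - a_k F(z_{k+1/2})
    return z

print(eg_plus([1.0, 1.0]))                # approaches the saddle point (0, 0)
```

On this toy problem the parameter choice $\beta = \frac{1}{2}$, $a_k = \frac{1}{2L}$ from the analysis below applies directly, and the iterates contract toward the saddle point.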
The proof analyzes the quantity
\[\begin{align} \mathcal{V}_k = a_k F(z_{k+1/2})^\top (z_{k+1/2} - z_{\ast}) + \frac{a_k \rho }{2} \Vert F(z_{k+1/2}) \Vert^2 \end{align}\]By the Weak MVI condition this quantity is nonnegative (see the one-line check below), and it vanishes whenever $z_{k+1/2}$ is a stationary point, i.e., whenever $F(z_{k+1/2}) = 0$.
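For completeness, the nonnegativity follows by factoring out $a_k$ and applying the Weak MVI condition at $z = z_{k+1/2}$:
\[\begin{align} \mathcal{V}_k = a_k \Big( F(z_{k+1/2})^\top (z_{k+1/2} - z_{\ast}) + \frac{\rho}{2} \Vert F(z_{k+1/2}) \Vert^2 \Big) \ge a_k \Big( -\frac{\rho}{2} \Vert F(z_{k+1/2}) \Vert^2 + \frac{\rho}{2} \Vert F(z_{k+1/2}) \Vert^2 \Big) = 0 \end{align}\]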
Convergence
This section proves convergence of the algorithm, using the following three-point identity,
\[\begin{align} (x- y)^\top(y-z) &= \frac{1}{2} \Vert x - z \Vert^2 - \frac{1}{2} \Vert x- y\Vert^2 - \frac{1}{2} \Vert y - z \Vert^2 \end{align}\]which is verified by the expansion shown below.
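The identity is just the expansion of the squared norm of a sum:
\[\begin{align} \Vert x - z \Vert^2 = \Vert (x-y) + (y-z) \Vert^2 = \Vert x - y \Vert^2 + \Vert y - z \Vert^2 + 2 (x-y)^\top (y-z) \end{align}\]and rearranging gives the stated form.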
From the update rules, $a_k F(z_{k+1/2}) = z_k - z_{k+1}$ and $a_k F(z_k) = \beta (z_k - z_{k+1/2})$, and hence
\[\begin{align} a_k F(z_{k+1/2})^\top (z_{k+1} - z) &= (z_k - z_{k+1})^\top(z_{k+1} - z), \quad \forall z \\ a_k F(z_k)^\top(z_{k+1/2} -z) &= \beta (z_k - z_{k+1/2})^\top(z_{k+1/2} -z), \quad \forall z \end{align}\]Combining these with the definition of $\mathcal{V}_k$,
\[\begin{align} \mathcal{V}_k &= a_k F(z_{k+1/2})^\top (z_{k+1/2} - z_{\ast}) + \frac{a_k \rho }{2} \Vert F(z_{k+1/2}) \Vert^2 \\ &= a_k F(z_{k+1/2})^\top(z_{k+1} - z_{\ast} ) + a_k F(z_{k+1/2})^\top(z_{k+1/2} - z_{k+1} ) + \frac{a_k \rho }{2} \Vert F(z_{k+1/2}) \Vert^2 \\ &= a_k F(z_{k+1/2})^\top(z_{k+1} - z_{\ast} ) + a_k F(z_{k})^\top(z_{k+1/2} - z_{k+1} ) + a_k(F(z_{k+1/2}) - F(z_k))^\top(z_{k+1/2} - z_{k+1})+ \frac{a_k \rho }{2} \Vert F(z_{k+1/2}) \Vert^2 \\ &= (z_k - z_{k+1})^\top(z_{k+1} - z_{\ast}) + \beta (z_k - z_{k+1/2})^\top(z_{k+1/2} - z_{k+1}) + a_k(F(z_{k+1/2}) - F(z_k))^\top(z_{k+1/2} - z_{k+1})+ \frac{a_k \rho }{2} \Vert F(z_{k+1/2}) \Vert^2 \\ &\le \frac{1}{2} \Vert z_k - z_{\ast} \Vert^2 - \frac{1}{2} \Vert z_k - z_{k+1} \Vert^2 - \frac{1}{2} \Vert z_{k+1} - z_{\ast} \Vert^2 + \frac{\beta}{2} \Vert z_k - z_{k+1} \Vert^2 - \frac{\beta}{2 } \Vert z_k -z_{k+1/2} \Vert^2 - \frac{\beta}{2 } \Vert z_{k+1/2} - z_{k+1} \Vert^2 \\ &\quad + a_k L \Vert z_{k+1/2} - z_k \Vert \Vert z_{k+1} - z_{k+1/2} \Vert + \frac{a_k \rho }{2} \Vert F(z_{k+1/2}) \Vert^2 \\ &\le \frac{1}{2} \Vert z_k - z_{\ast} \Vert^2 - \frac{1}{2} \Vert z_{k+1} - z_{\ast}\Vert^2 \\ &\quad + (\frac{\beta}{2} - \frac{1}{2}) \Vert z_k - z_{k+1} \Vert^2 + \frac{a_k \rho}{2} \Vert F(z_{k+1/2}) \Vert^2 + (\frac{a_k L}{2} - \frac{\beta}{2} ) \Vert z_{k+1/2} - z_k \Vert^2 + (\frac{a_k L}{2} - \frac{\beta}{2}) \Vert z_{k+1/2} - z_{k+1} \Vert^2 \\ &= \frac{1}{2} \Vert z_k - z_{\ast} \Vert^2 - \frac{1}{2} \Vert z_{k+1} - z_{\ast}\Vert^2 \\ &\quad+ (\frac{a_k^2 \beta}{2 } - \frac{a_k^2}{2} + \frac{a_k \rho}{2}) \Vert F(z_{k+1/2}) \Vert^2 + (\frac{a_k L}{2} - \frac{\beta}{2 }) \Vert z_{k+1/2} - z_k \Vert^2 + (\frac{a_k L}{2} - \frac{\beta}{2}) \Vert z_{k+1/2} - z_{k+1} \Vert^2 \\ \end{align}\]where the first inequality applies the three-point identity to the first two terms and Cauchy-Schwarz together with $L$-Lipschitzness of $F$ to the third, the second inequality uses Young's inequality $ab \le \frac{1}{2}(a^2 + b^2)$ on the product term, and the final equality substitutes $\Vert z_k - z_{k+1} \Vert = a_k \Vert F(z_{k+1/2}) \Vert$. Choosing the parameters suitably now yields convergence: set
\[\begin{align} \beta = \frac{1}{2}, \quad a_k = \frac{1}{2L} \end{align}\]With this choice the coefficients of $\Vert z_{k+1/2} - z_k \Vert^2$ and $\Vert z_{k+1/2} - z_{k+1} \Vert^2$ vanish and the coefficient of $\Vert F(z_{k+1/2}) \Vert^2$ simplifies, as checked below.
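Indeed, substituting $\beta = \frac{1}{2}$ and $a_k = \frac{1}{2L}$:
\[\begin{align} \frac{a_k L}{2} - \frac{\beta}{2} &= \frac{1}{4} - \frac{1}{4} = 0 \\ \frac{a_k^2 \beta}{2} - \frac{a_k^2}{2} + \frac{a_k \rho}{2} &= -\frac{a_k^2}{4} + \frac{a_k \rho}{2} = -\frac{a_k}{2}\Big(\frac{a_k}{2} - \rho\Big) = -\frac{a_k}{2}\Big(\frac{1}{4L} - \rho\Big) \end{align}\]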
Hence,
\[\begin{align} \mathcal{V}_k &\le \frac{1}{2} \Vert z_k - z_{\ast} \Vert^2 - \frac{1}{2} \Vert z_{k+1} - z_{\ast}\Vert^2 + (\frac{a_k^2 \beta}{2 } - \frac{a_k^2}{2} + \frac{a_k \rho}{2}) \Vert F(z_{k+1/2}) \Vert^2 \\ &= \frac{1}{2} \Vert z_k - z_{\ast} \Vert^2 - \frac{1}{2} \Vert z_{k+1} - z_{\ast}\Vert^2 -\frac{a_k}{2} (\frac{1}{4L} - \rho) \Vert F(z_{k+1/2}) \Vert^2 \\ \end{align}\]Since $\mathcal{V}_k \ge 0$ and $\frac{a_k}{2} = \frac{1}{4L}$, rearranging gives
\[\begin{align} \frac{1}{4L}(\frac{1}{4L} - \rho) \Vert F(z_{k+1/2}) \Vert^2 \le \frac{1}{2} \Vert z_k -z_{\ast} \Vert^2 - \frac{1}{2} \Vert z_{k+1} -z_{\ast} \Vert^2 \end{align}\]This holds for every iteration, so summing it telescopes the right-hand side, as shown below.
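Concretely, summing from $i = 0$ to $k$,
\[\begin{align} \frac{1}{4L}\Big(\frac{1}{4L} - \rho\Big) \sum_{i=0}^{k} \Vert F(z_{i+1/2}) \Vert^2 \le \frac{1}{2} \Vert z_0 - z_{\ast} \Vert^2 - \frac{1}{2} \Vert z_{k+1} - z_{\ast} \Vert^2 \le \frac{1}{2} \Vert z_0 - z_{\ast} \Vert^2 \end{align}\]Bounding the minimum by the average, $\min_{0 \le i \le k} \Vert F(z_{i+1/2}) \Vert^2 \le \frac{1}{k+1} \sum_{i=0}^{k} \Vert F(z_{i+1/2}) \Vert^2$, then yields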
\[\begin{align} \min_{0 \le i \le k} \Vert F(z_{i+1/2}) \Vert^2 &\le \frac{2L \Vert z_0 - z_{\ast} \Vert^2}{(\frac{1}{4L} - \rho)(k+1)} \end{align}\]This establishes an $\mathcal{O}(\frac{1}{k})$ convergence rate for the squared operator norm; equivalently, $\mathcal{O}(\frac{1}{\epsilon})$ iterations suffice to find a point with $\Vert F(z_{i+1/2}) \Vert^2 \le \epsilon$.