Adaptive Extra Gradient for Minimax Problems
Published:
Paper Reading: Adaptive extra-gradient methods for min-max optimization and games. [ICLR 21]
给定一个单调算子 $F$, 文章关注于求解如下的变分不等式问题
\[\begin{align*} \langle F(z^\ast), z - z^\ast \rangle \ge 0, \forall z \in Z. \end{align*}\]考虑如下的外梯度法,
\[\begin{align*} z_{t+1/2} &= \Pi_Z (z_t - \gamma_t F(z_t)) \\ z_{t+1} &= \Pi_Z (z_t - \gamma_t F(z_{t+1/2})). \end{align*}\]算法所输出的点为如下的加权平均
\[\begin{align*} \bar z_T = \frac{\sum_{t=1}^T \gamma_t z_{t+1/2}}{\sum_{t=1}^T \gamma_t}. \end{align*}\]假设 $F$ 为 $L$-Lipschitz 算子。
文章给出一种自适应的步长选取方式,使得对于两种设定都可以在无需知道问题参数的前提同时达到最优的收敛率。
具体来说,步长选取如下
\[\begin{align*} \gamma_t = \frac{1}{\sqrt{1 + \sum_{s=1}^{t-1}} \delta_s^2 }, \quad \delta_s = \Vert F(z_{s+1/2}) - F(z_s) \Vert. \end{align*}\]根据更新公式,以及投影的最有条件,我们知道对于任意的 $z \in Z$, 成立
\[\begin{align*} &\quad \langle z_{t+1} - z, \gamma_t F(z_{t+1/2}) \rangle \\ &\le \langle z- z_{t+1} ,z_{t+1} - z_t \rangle \\ &= \frac{1}{2} \Vert z_t -z \Vert^2 - \frac{1}{2} \Vert z_{t+1} - z \Vert^2 - \frac{1}{2} \Vert z_{t+1} - z_t \Vert^2. \end{align*}\]同理,我们也有
\[\begin{align*} &\quad \langle z_{t+1/2} - z, \gamma_t F(z_t) \rangle \\ &\le \langle z- z_{t+1/2} , z_{t+1/2} - z_t \rangle \\ &= \frac{1}{2} \Vert z_t - z \Vert^2 - \frac{1}{2} \Vert z_{t+1/2} - z \Vert^2 - \frac{1}{2} \Vert z_{t+1/2} - z_t \Vert^2. \end{align*}\]根据恒等式, 对于 $z^\ast$, 我们有
\[\begin{align*} &\quad \langle z_{t+1/2} - z^\ast , \gamma_t F(z_{t+1/2}) \rangle \\ &= \langle z_{t+1} - z^\ast, \gamma_t F(z_{t+1/2}) \rangle + \langle z_{t+1/2} - z_{t+1}, \gamma_t F(z_{t+1/2} ) \rangle \\ &= \langle z_{t+1} - z^\ast, \gamma_t F(z_{t+1/2}) \rangle + \langle z_{t+1/2} - z_{t+1}, \gamma_t F(z_t) \rangle + \gamma_t \langle z_{t+1/2} - z_{t+1} , F(z_{t+1/2}) - F(z_t) \rangle \\ &\le \frac{1}{2} \Vert z_t -z^\ast \Vert^2 - \frac{1}{2} \Vert z_{t+1} - z^\ast \Vert^2 - \frac{1}{2} \Vert z_{t+1} - z_t \Vert^2 \\ &\quad + \frac{1}{2} \Vert z_t - z_{t+1} \Vert^2 - \frac{1}{2} \Vert z_{t+1/2} - z_{t+1} \Vert^2 - \frac{1}{2} \Vert z_{t+1/2} - z_t \Vert^2 \\ &\quad + \frac{1}{2} \Vert z_{t+1/2} - z_{t+1} \Vert^2 + \frac{\gamma_t^2}{2 } \Vert F(z_{t+1/2} ) - F(z_t) \Vert^2 \\ &= \frac{1}{2} \Vert z_t -z^\ast \Vert^2 - \frac{1}{2} \Vert z_{t+1} - z^\ast \Vert^2 - \frac{1}{2} \Vert z_{t+1/2} - z_t \Vert^2 + \frac{\gamma_t^2}{2 } \Vert F(z_{t+1/2} ) - F(z_t) \Vert^2 \\ &\le \frac{1}{2} \Vert z_t -z^\ast \Vert^2 - \frac{1}{2} \Vert z_{t+1} - z^\ast \Vert^2 - \left(\frac{1}{2L^2} - \frac{\gamma_t^2}{2} \right)\Vert F(z_{t+1/2}) - F(z_t) \Vert^2 \end{align*}\]移项后得到
\[\begin{align*} &\quad \frac{1}{4L^2} \Vert F(z_{t+1/2}) - F(z_t) \Vert^2 + \langle z_{t+1/2} - z^\ast , \gamma_t F(z_{t+1/2}) \rangle \\ &\le \frac{1}{2} \Vert z_t -z^\ast \Vert^2 - \frac{1}{2} \Vert z_{t+1} - z^\ast \Vert^2 - \left(\frac{1}{4L^2} - \frac{\gamma_t^2}{2} \right)\Vert F(z_{t+1/2}) - F(z_t) \Vert^2. \end{align*}\]注意到 $\gamma_t$ 为单调递减有上界序列,其一定存在极限 $\gamma_{\infty}$.
我们证明该极限 $\gamma_{\infty} >0$, 否则,存在 $t_0$ 使得上式的最后一项为对于 $t \ge t_0$ 都为负,那么递推后得到
\[\begin{align*} +\infty &= \frac{1}{4L^2} \left(\frac{1}{\gamma_{\infty} } - 1 \right)\\ &=\frac{1}{4L^2} \sum_{t=1}^T \Vert F(z_{t+1/2}) - F(z_t) \Vert^2 \\ &\le \frac{1}{2} \Vert z_1 - z^\ast \Vert^2 + \left(\frac{\gamma_t^2}{2} -\frac{1}{4L^2} \right)\sum_{t=1}^{t_0} \Vert F(z_{t+1/2}) - F(z_t) \Vert^2 \\ &:= D \le +\infty. \end{align*}\]这就导出了矛盾,这说明 $\gamma_t \ge \gamma_{\infty} >0$. 那么$ \sum_{t=1}^T \gamma_t = \Omega(T)$.
这可以说明如下的收敛率
\[\begin{align*} \frac{1}{\sum_{t=1}^T \gamma_t}\langle F( z_{t+1/2}) , z_{t+1/2} - z^\ast \rangle \le \frac{D}{\sum_{t=1}^T \gamma_t} = \mathcal{O}\left( \frac{1}{T} \right). \end{align*}\]