GDA for Nonconvex-Strongly-Concave Minimax Optimization

1 minute read

Published:

Paper Reading: On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems

Consider using the gradient descent ascent (GDA) algorithm

\[\begin{align*} x_{t+1} &= x_t - \eta_x \nabla_x f(x_t,y_t) \\ y_{t+1} &= y_t + \eta_y \nabla_y f(x_t,y_t) \end{align*}\]

to solve the following minimax optimization problem

\[\begin{align*} \min_x \left\{\Phi(x) := \max_{y \in \mathcal{Y}} f(x,y) \right\}. \end{align*}\]

We assume that $f$ is $L$-smooth and $\mu$-strongly concave in $y$, and we aim to establish convergence of the algorithm in terms of $\Phi(x)$.
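As a concrete illustration of the update rule above, here is a minimal numerical sketch of simultaneous GDA. The toy objective $f(x,y) = \sin(x) + xy - y^2/2$ (smooth, nonconvex in $x$, $1$-strongly concave in $y$), the helper names `grad_x`, `grad_y`, `gda`, and the step sizes are my own illustrative choices, not taken from the paper.

```python
import numpy as np

# Toy nonconvex-strongly-concave objective (an assumption for illustration,
# not the paper's example): f(x, y) = sin(x) + x*y - 0.5*y**2.
def grad_x(x, y):
    return np.cos(x) + y      # partial derivative of f with respect to x

def grad_y(x, y):
    return x - y              # partial derivative of f with respect to y

def gda(x0, y0, eta_x, eta_y, num_iters):
    """Simultaneous gradient descent ascent: descend in x, ascend in y."""
    x, y = x0, y0
    for _ in range(num_iters):
        gx, gy = grad_x(x, y), grad_y(x, y)   # both gradients evaluated at (x_t, y_t)
        x, y = x - eta_x * gx, y + eta_y * gy
    return x, y

# The analysis suggests eta_y = 1/L and a much smaller eta_x = O(1/(kappa^2 * L)).
x_hat, y_hat = gda(x0=2.0, y0=0.0, eta_x=0.02, eta_y=0.5, num_iters=2000)
```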

Define the condition number $\kappa := L/\mu$. One can verify that $y^*(x) := \arg\max_{y \in \mathcal{Y}} f(x,y)$ is $\kappa$-Lipschitz, and hence $\Phi(x)$ is $(\kappa+1)L$-smooth. Since $(\kappa+1)L/2 \le \kappa L$ for $\kappa \ge 1$, it follows that

\[\begin{align*} \Phi(x_{t+1}) &\le \Phi(x_t) + \nabla \Phi(x_t)^\top (x_{t+1} - x_t) + \kappa L \Vert x_{t+1} - x_t \Vert^2 \\ &= \Phi(x_t) - \eta_x \nabla \Phi(x_t)^\top \nabla_x f(x_t,y_t) + \eta_x^2\kappa L \Vert \nabla_x f(x_t,y_t) \Vert^2 \\ &= \Phi(x_t) - \frac{\eta_x}{2} \Vert \nabla \Phi(x_t) \Vert^2 - \left( \frac{\eta_x}{2} - \eta_x^2 \kappa L \right)\Vert \nabla_x f(x_t,y_t) \Vert^2 + \frac{\eta_x}{2} \Vert \nabla \Phi(x_t) - \nabla_x f(x_t,y_t) \Vert^2. \end{align*}\]
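Here the last equality uses the identity $-2 a^\top b = \Vert a - b \Vert^2 - \Vert a \Vert^2 - \Vert b \Vert^2$ with $a = \nabla \Phi(x_t)$ and $b = \nabla_x f(x_t,y_t)$, i.e.,

\[\begin{align*} - \eta_x \nabla \Phi(x_t)^\top \nabla_x f(x_t,y_t) = \frac{\eta_x}{2} \Vert \nabla \Phi(x_t) - \nabla_x f(x_t,y_t) \Vert^2 - \frac{\eta_x}{2} \Vert \nabla \Phi(x_t) \Vert^2 - \frac{\eta_x}{2} \Vert \nabla_x f(x_t,y_t) \Vert^2. \end{align*}\]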

Taking $\eta_x \le 1/(4 \kappa L)$, we obtain

\[\begin{align*} \Phi(x_{t+1}) \le \Phi(x_t) - \frac{\eta_x}{2} \Vert \nabla \Phi(x_t) \Vert^2 - \frac{\eta_x}{4} \Vert \nabla_x f(x_t,y_t) \Vert^2 + \frac{\eta_x}{2} \Vert \nabla \Phi(x_t) - \nabla_x f(x_t,y_t) \Vert^2. \end{align*}\]

Observe that the convergence rate is governed by the last term. By Danskin's theorem, $\nabla \Phi(x) = \nabla_x f(x,y^*(x))$. Therefore,

\[\begin{align*} \Phi(x_{t+1}) \le \Phi(x_t) - \frac{\eta_x}{2} \Vert \nabla \Phi(x_t) \Vert^2 - \frac{\eta_x}{4} \Vert \nabla_x f(x_t,y_t) \Vert^2 + \frac{\eta_x L^2}{2} \Vert y_t - y^*(x_t) \Vert^2. \end{align*}\]
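The last term is obtained by combining Danskin's theorem with the $L$-Lipschitz continuity of $\nabla_x f(x_t, \cdot)$:

\[\begin{align*} \Vert \nabla \Phi(x_t) - \nabla_x f(x_t,y_t) \Vert = \Vert \nabla_x f(x_t, y^*(x_t)) - \nabla_x f(x_t,y_t) \Vert \le L \Vert y_t - y^*(x_t) \Vert. \end{align*}\]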

This means we need to control the error of the inner maximization. Taking $\eta_y = 1/L$ and using the standard convergence guarantee of gradient ascent on a smooth, strongly concave function, we know that

\[\begin{align*} \Vert y_{t+1} -y^*(x_{t}) \Vert^2 \le \left( 1 - \frac{1}{\kappa} \right) \Vert y_t - y^*(x_t) \Vert^2. \end{align*}\]
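One way to see this contraction (assuming the maximizer $y^*(x_t)$ is unconstrained, so that $\nabla_y f(x_t, y^*(x_t)) = 0$) is the standard argument for gradient ascent on an $L$-smooth, $\mu$-strongly concave function: expanding the update and applying the coercivity inequality for strongly concave smooth functions,

\[\begin{align*} \Vert y_{t+1} - y^*(x_t) \Vert^2 &= \Vert y_t - y^*(x_t) \Vert^2 + 2 \eta_y \nabla_y f(x_t,y_t)^\top (y_t - y^*(x_t)) + \eta_y^2 \Vert \nabla_y f(x_t,y_t) \Vert^2 \\ &\le \left( 1 - \frac{2 \eta_y \mu L}{\mu + L} \right) \Vert y_t - y^*(x_t) \Vert^2 + \eta_y \left( \eta_y - \frac{2}{\mu + L} \right) \Vert \nabla_y f(x_t,y_t) \Vert^2, \end{align*}\]

and for $\eta_y = 1/L$ the second term is nonpositive while $1 - \frac{2\mu}{\mu + L} \le 1 - \frac{\mu}{L} = 1 - \frac{1}{\kappa}$.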

Further, by Young's inequality, we obtain

\[\begin{align*} \Vert y_{t+1} - y^*(x_{t+1}) \Vert^2 \le \left( 1 - \frac{1}{2 \kappa} \right) \Vert y_t - y^*(x_t) \Vert^2 + 2 \kappa \Vert y^*(x_{t+1}) - y^*(x_{t}) \Vert^2. \end{align*}\]
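Here Young's inequality is applied in the form $\Vert a + b \Vert^2 \le (1+c) \Vert a \Vert^2 + (1 + 1/c) \Vert b \Vert^2$ with $a = y_{t+1} - y^*(x_t)$, $b = y^*(x_t) - y^*(x_{t+1})$ and $c = 1/(2\kappa - 1)$, combined with the contraction above; indeed, $1 + 1/c = 2\kappa$ and

\[\begin{align*} \left( 1 + \frac{1}{2\kappa - 1} \right) \left( 1 - \frac{1}{\kappa} \right) = \frac{2(\kappa - 1)}{2\kappa - 1} \le 1 - \frac{1}{2\kappa}. \end{align*}\]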

Furthermore, using the $\kappa$-Lipschitz continuity of $y^*(\cdot)$ and the update rule for $x$,

\[\begin{align*} \Vert y^*(x_{t+1}) - y^*(x_t) \Vert^2 \le \kappa^2 \Vert x_{t+1} - x_t \Vert^2 = \kappa^2 \eta_x^2 \Vert \nabla_x f(x_t,y_t) \Vert^2. \end{align*}\]

Combining the above and choosing $\eta_x = 1/(3 \kappa^2 L)$, we obtain

\[\begin{align*} \Phi(x_{t+1}) + \eta_x \kappa L^2 \Vert y_{t+1} - y^*(x_{t+1}) \Vert^2 \le \Phi(x_{t}) + \eta_x \kappa L^2 \Vert y_{t} - y^*(x_{t}) \Vert^2 - \frac{\eta_x}{2} \Vert \nabla \Phi(x_t) \Vert^2. \end{align*}\]
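To verify this, add $\eta_x \kappa L^2$ times the recursion for $\Vert y_{t+1} - y^*(x_{t+1}) \Vert^2$ to the descent inequality for $\Phi(x_{t+1})$: the coefficient of $\Vert y_t - y^*(x_t) \Vert^2$ on the right-hand side is $\frac{\eta_x L^2}{2} + \eta_x \kappa L^2 \left(1 - \frac{1}{2\kappa}\right) = \eta_x \kappa L^2$, while the coefficient of $\Vert \nabla_x f(x_t,y_t) \Vert^2$ is

\[\begin{align*} -\frac{\eta_x}{4} + 2 \eta_x^3 \kappa^4 L^2 \le 0 \quad \text{whenever} \quad \eta_x \le \frac{1}{2\sqrt{2}\, \kappa^2 L}, \end{align*}\]

which holds for $\eta_x = 1/(3\kappa^2 L)$, so this term can be dropped.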

Denote the initial optimality gap $\Delta := \Phi(x_0) - \Phi^*$, where $\Phi^* := \min_x \Phi(x)$, and the initial distance to the inner optimum $D := \Vert y_0 - y^*(x_0) \Vert^2$. Then the algorithm finds a point $\hat x$ satisfying $\Vert \nabla \Phi(\hat x) \Vert \le \epsilon$ within

\[\begin{align*} \mathcal{O} \left( \frac{\kappa^2 L \Delta + \kappa L^2 D}{\epsilon^2} \right) \end{align*}\]

first-order oracle calls.
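This follows by telescoping the potential inequality above over $t = 0, \dots, T-1$, using $\Phi(x_T) \ge \Phi^*$ and dropping the nonnegative terminal distance term:

\[\begin{align*} \frac{\eta_x}{2} \sum_{t=0}^{T-1} \Vert \nabla \Phi(x_t) \Vert^2 \le \Delta + \eta_x \kappa L^2 D \quad \Longrightarrow \quad \min_{0 \le t < T} \Vert \nabla \Phi(x_t) \Vert^2 \le \frac{6 \kappa^2 L \Delta + 2 \kappa L^2 D}{T}, \end{align*}\]

so $T = \mathcal{O}\left( (\kappa^2 L \Delta + \kappa L^2 D)/\epsilon^2 \right)$ iterations, each requiring one gradient in $x$ and one in $y$, suffice.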