4.1 Proximal minimization
The proximal minimization (proximal point) iteration

$$x^{k+1} := \mathrm{prox}_{\lambda f}(x^k)$$

can be interpreted as:

- Standard gradient method applied to the Moreau envelope.
- Fixed-point iteration of the proximal operator (which is non-expansive).
- Disappearing Tikhonov regularization, iterative refinement.
- Backward Euler (implicit) discretization of the gradient flow.
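As an illustration, a minimal sketch of the iteration, assuming $\mathrm{prox}_{\lambda f}$ is available in closed form; here $f$ is taken to be the quadratic $f(x) = \tfrac{1}{2}x^T P x + q^T x$, for which $\mathrm{prox}_{\lambda f}(v) = (I + \lambda P)^{-1}(v - \lambda q)$ ($P$, $q$, $\lambda$, and the tolerance below are illustrative choices):

```python
import numpy as np

def prox_quadratic(v, P, q, lam):
    """Proximal operator of f(x) = 0.5 x^T P x + q^T x."""
    n = len(v)
    return np.linalg.solve(np.eye(n) + lam * P, v - lam * q)

def proximal_point(P, q, x0, lam=1.0, max_iter=100, tol=1e-8):
    """Proximal minimization: iterate x^{k+1} = prox_{lam f}(x^k) until the step is small."""
    x = x0.copy()
    for _ in range(max_iter):
        x_new = prox_quadratic(x, P, q, lam)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Small example: minimize a strongly convex quadratic and compare with the exact minimizer -P^{-1} q.
P = np.array([[3.0, 1.0], [1.0, 2.0]])
q = np.array([1.0, -1.0])
print(proximal_point(P, q, x0=np.zeros(2)), -np.linalg.solve(P, q))
```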
4.2 Proximal gradient method
Consider the problem in the following form:
$$\text{minimize} \quad f(x) + g(x)$$

In this form, we split the objective into two terms, one of which is differentiable. This splitting is not unique, so different splittings lead to different implementations of the proximal gradient method for the same original problem.
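As a concrete illustrative example of such a splitting (the lasso problem, with $A$, $b$, and $\gamma > 0$ introduced here only for illustration):

$$\text{minimize} \quad \underbrace{\tfrac{1}{2}\|Ax - b\|_2^2}_{f(x)} + \underbrace{\gamma \|x\|_1}_{g(x)}$$

Here the quadratic term $f$ is differentiable and the $\ell_1$ term $g$ has an inexpensive proximal operator (soft thresholding). One could instead take $f = 0$ and $g$ equal to the whole objective, which recovers plain proximal minimization.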
The proximal gradient method is:
$$x^{k+1} := \mathrm{prox}_{\lambda^k g}\!\left(x^k - \lambda^k \nabla f(x^k)\right)$$
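A minimal sketch of this iteration for the lasso splitting above; the fixed step size $\lambda = 1/L$ with $L = \|A\|_2^2$ (the Lipschitz constant of $\nabla f$) and the fixed iteration count are illustrative choices.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Proximal operator of kappa * ||.||_1 (soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def proximal_gradient_lasso(A, b, gamma, max_iter=500):
    """x^{k+1} = prox_{lam g}(x^k - lam * grad f(x^k)) with
    f(x) = 0.5 ||Ax - b||^2 and g(x) = gamma * ||x||_1."""
    x = np.zeros(A.shape[1])
    lam = 1.0 / np.linalg.norm(A, 2) ** 2   # step size 1/L, L = ||A||_2^2
    for _ in range(max_iter):
        grad = A.T @ (A @ x - b)            # gradient of the smooth part f
        x = soft_threshold(x - lam * grad, lam * gamma)  # prox step on g
    return x
```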
4.3 Accelerated proximal gradient method

So-called ‘accelerated’ versions of the basic proximal gradient algorithm include an extrapolation step in the algorithm. One simple version is:
$$\begin{aligned}
y^{k+1} &:= x^k + \omega^k (x^k - x^{k-1}) \\
x^{k+1} &:= \mathrm{prox}_{\lambda^k g}\!\left(y^{k+1} - \lambda^k \nabla f(y^{k+1})\right)
\end{aligned}$$
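Continuing the lasso sketch above (and reusing `numpy` and `soft_threshold` from it), the accelerated variant only adds the extrapolation step; the weight $\omega^k = k/(k+3)$ is one common, illustrative choice.

```python
def accelerated_proximal_gradient_lasso(A, b, gamma, max_iter=500):
    """FISTA-style iteration: extrapolate, then take a proximal gradient step at y^{k+1}."""
    x = x_prev = np.zeros(A.shape[1])
    lam = 1.0 / np.linalg.norm(A, 2) ** 2        # step size 1/L as before
    for k in range(max_iter):
        omega = k / (k + 3.0)                    # extrapolation weight omega^k
        y = x + omega * (x - x_prev)             # y^{k+1} = x^k + omega^k (x^k - x^{k-1})
        x_prev = x
        grad = A.T @ (A @ y - b)                 # gradient of f at the extrapolated point
        x = soft_threshold(y - lam * grad, lam * gamma)
    return x
```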
4.4 Alternating direction method of multipliers

For the same problem of minimizing $f(x) + g(x)$, the alternating direction method of multipliers (ADMM), also known as Douglas–Rachford splitting, is:
$$\begin{aligned}
x^{k+1} &:= \mathrm{prox}_{\lambda f}(z^k - u^k) \\
z^{k+1} &:= \mathrm{prox}_{\lambda g}(x^{k+1} + u^k) \\
u^{k+1} &:= u^k + x^{k+1} - z^{k+1}
\end{aligned}$$

- It can be shown that ADMM converges to the optimal value of the original problem.
- ADMM allows parts of the optimization to be performed in parallel.
- The algorithm is particularly efficient when updating x and z separately has a low computational cost compared to updating them together (as in a standard descent step).
- Although the algorithm converges when there are two blocks of variables, convergence is not guaranteed with three or more blocks, even though it often performs well in practice.
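A minimal sketch of the ADMM iteration above, again for the illustrative lasso problem: the x-update is the proximal operator of the quadratic term (a ridge-type linear solve), the z-update is soft thresholding, and stopping criteria and cached factorizations are omitted.

```python
import numpy as np

def admm_lasso(A, b, gamma, lam=1.0, max_iter=200):
    """ADMM in prox form for 0.5 ||Ax - b||^2 + gamma * ||x||_1."""
    n = A.shape[1]
    x = z = u = np.zeros(n)
    M = np.eye(n) + lam * (A.T @ A)   # system matrix for prox_{lam f}, f(x) = 0.5 ||Ax - b||^2
    for _ in range(max_iter):
        x = np.linalg.solve(M, z - u + lam * (A.T @ b))                    # x-update: prox_{lam f}(z - u)
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam * gamma, 0.0)  # z-update: prox_{lam g}(x + u)
        u = u + x - z                                                      # scaled dual (running residual) update
    return z
```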
4.5 LM interpretations
So far we have the following interpretations of the Levenberg–Marquardt (LM) algorithm:
- L2 regularized Newton.
- Trust region algorithm.
- Proximal operator applied to the second-order approximation.
- Asymptotically disappearing Tikhonov regularization of quadratic functions, iterative refinement.
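To connect the first and third interpretations: applying the proximal operator with parameter $\lambda$ to the second-order approximation $\hat f(x) = f(x^k) + \nabla f(x^k)^T (x - x^k) + \tfrac{1}{2}(x - x^k)^T \nabla^2 f(x^k)(x - x^k)$ gives

$$\mathrm{prox}_{\lambda \hat f}(x^k) = \operatorname*{argmin}_x \left( \hat f(x) + \frac{1}{2\lambda}\|x - x^k\|_2^2 \right) = x^k - \left( \nabla^2 f(x^k) + \tfrac{1}{\lambda} I \right)^{-1} \nabla f(x^k),$$

which is exactly an L2-regularized (damped) Newton step; as $\lambda \to \infty$ it recovers the pure Newton step, and as $\lambda \to 0$ it approaches a short gradient step.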
4.6 The augmented Lagrangian method
Consider the primal conic convex optimization problem:
$$\begin{array}{ll}
\text{minimize} & f(x) \\
\text{subject to} & Ax - b \in \mathcal{K} \quad (\text{i.e., } Ax \succeq_{\mathcal{K}} b)
\end{array}$$

Its Lagrangian is:
$$L(x,\lambda) = f(x) - \lambda^T (Ax - b) = -\left( (A^T \lambda)^T x - f(x) \right) + b^T \lambda,$$

so minimizing over $x$ gives the dual function

$$g(\lambda) \triangleq \inf_x L(x,\lambda) = b^T \lambda - f^*(A^T \lambda),$$

with $\lambda \in \mathcal{K}^*$. Then the dual problem is:
$$\begin{array}{ll}
\text{maximize} & g(\lambda) \\
\text{subject to} & \lambda \in \mathcal{K}^*
\end{array}$$

- If g is differentiable we can use gradient ascent methods.
- If g is non-differentiable we can use subgradient methods.
- If g is non-differentiable we can also consider the Moreau–Yosida approximation (as follows).
Be careful that here we are dealing with a concave function g, which we want to maximize. We can write its Moreau–Yosida approximation as:
$$g_\mu(\lambda) = \sup_{z \in \mathcal{K}^*} \left( g(z) - \frac{1}{2\mu}\|u\|_2^2 \right) \quad \text{s.t.} \quad u = \lambda - z$$
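Since $g$ is concave, $g_\mu$ is concave and differentiable even when $g$ is not, and the unconstrained maximizers of $g_\mu$ coincide with the maximizers of $g$ over $\mathcal{K}^*$; a standard property of the Moreau envelope gives its gradient as

$$\nabla g_\mu(\lambda) = \frac{1}{\mu}\bigl(z^\star(\lambda) - \lambda\bigr), \qquad z^\star(\lambda) = \operatorname*{argmax}_{z \in \mathcal{K}^*} \left( g(z) - \frac{1}{2\mu}\|\lambda - z\|_2^2 \right),$$

so gradient ascent on $g_\mu$ can be carried out by repeatedly solving this proximal subproblem.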