
4.1 Proximal minimization

x^{k+1} := \operatorname{prox}_{\lambda f}(x^k)
  • Standard gradient method applied to the Moreau envelope.
  • Fixed-point iteration of the proximal operator (which is non-expansive).
  • Disappearing Tikhonov regularization, iterative refinement.
  • Gradient flow.
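As a concrete illustration of the proximal point iteration above, here is a minimal sketch in Python for an assumed quadratic objective f(x) = (1/2)xᵀAx − bᵀx, whose proximal operator reduces to a linear solve; the data, the step size, and the iteration count are illustrative choices, not part of the original notes.

```python
import numpy as np

# Minimal sketch of the proximal point iteration x^{k+1} := prox_{lam*f}(x^k)
# for the (assumed) quadratic objective f(x) = 0.5*x'Ax - b'x, where the
# proximal operator has the closed form (A + I/lam)^{-1} (b + v/lam).
# A, b, lam and the iteration count are illustrative choices.

def prox_quadratic(v, A, b, lam):
    """prox_{lam*f}(v) for f(x) = 0.5 x'Ax - b'x."""
    n = A.shape[0]
    return np.linalg.solve(A + np.eye(n) / lam, b + v / lam)

def proximal_point(A, b, lam=1.0, iters=50):
    x = np.zeros(A.shape[0])
    for _ in range(iters):
        x = prox_quadratic(x, A, b, lam)   # x^{k+1} := prox_{lam*f}(x^k)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.standard_normal((5, 5))
    A = M @ M.T + np.eye(5)               # positive definite
    b = rng.standard_normal(5)
    x = proximal_point(A, b)
    print(np.allclose(x, np.linalg.solve(A, b), atol=1e-6))  # iterates approach the minimizer
```

With this choice of f, each iteration is a Tikhonov-regularized solve whose effect vanishes as the iterates converge, which is one way to see the "disappearing Tikhonov regularization / iterative refinement" interpretation.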

4.2 Proximal gradient method

Consider the problem in the following form:

\text{minimize}\quad f(x) + g(x)

In this form, we split the objective into two terms, where f is assumed differentiable (and g typically is not). This splitting is not unique, so different splittings lead to different implementations of the proximal gradient method for the same original problem.

The proximal gradient method is:

x^{k+1} := \operatorname{prox}_{\lambda^k g}\bigl(x^k - \lambda^k \nabla f(x^k)\bigr)
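For instance, with the common splitting f(x) = (1/2)‖Ax − b‖₂² and g(x) = γ‖x‖₁ (a lasso problem, chosen here purely as an example), the proximal operator of g is soft-thresholding and the iteration becomes the sketch below; the step size 1/‖A‖₂² and all data are assumptions for the example.

```python
import numpy as np

# Sketch of the proximal gradient method for the assumed splitting
#   f(x) = 0.5*||Ax - b||^2 (differentiable),  g(x) = gamma*||x||_1,
# whose proximal operator is soft-thresholding. Data and constants are illustrative.

def soft_threshold(v, t):
    """prox_{t*||.||_1}(v)"""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient(A, b, gamma, iters=500):
    lam = 1.0 / np.linalg.norm(A, 2) ** 2          # step size 1/L with L = ||A||_2^2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)                    # grad f(x^k)
        x = soft_threshold(x - lam * grad, lam * gamma)  # x^{k+1} := prox_{lam*g}(x^k - lam*grad)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((30, 10))
    b = rng.standard_normal(30)
    print(proximal_gradient(A, b, gamma=1.0))
```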

4.3 Accelerated proximal gradient method

So-called ‘accelerated’ versions of the basic proximal gradient algorithm include an extrapolation step in the iteration. One simple version is:

y^{k+1} := x^k + \omega^k (x^k - x^{k-1})
x^{k+1} := \operatorname{prox}_{\lambda^k g}\bigl(y^{k+1} - \lambda^k \nabla f(y^{k+1})\bigr)
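Here is a minimal sketch of the accelerated variant for the same assumed lasso example, using ω^k = k/(k+3), one common choice of extrapolation weight; the weight, data, and step size are illustrative assumptions.

```python
import numpy as np

# Sketch of the accelerated proximal gradient method for the same assumed
# lasso example (f = 0.5*||Ax - b||^2, g = gamma*||x||_1), with the common
# extrapolation weight omega_k = k/(k+3). Everything here is illustrative.

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def accelerated_proximal_gradient(A, b, gamma, iters=500):
    lam = 1.0 / np.linalg.norm(A, 2) ** 2
    x_prev = x = np.zeros(A.shape[1])
    for k in range(iters):
        omega = k / (k + 3.0)
        y = x + omega * (x - x_prev)                 # y^{k+1} := x^k + omega^k (x^k - x^{k-1})
        grad = A.T @ (A @ y - b)                     # grad f(y^{k+1})
        x_prev, x = x, soft_threshold(y - lam * grad, lam * gamma)
    return x
```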

4.4 Alternating direction method of multipliers

For the same problem of minimizing f(x) + g(x), the alternating direction method of multipliers (ADMM), also known as Douglas–Rachford splitting, is:

x^{k+1} := \operatorname{prox}_{\lambda f}(z^k - u^k)
z^{k+1} := \operatorname{prox}_{\lambda g}(x^{k+1} + u^k)
u^{k+1} := u^k + x^{k+1} - z^{k+1}
  • It can be shown that ADMM converges to the optimal value of the original problem.
  • The ADMM structure makes it possible to carry out the optimization in parallel or distributed fashion.
  • The algorithm is particularly efficient when updating x and z separately has a low computational cost compared to updating them jointly, as in a standard descent step on f + g (see the sketch after this list).
  • Although the algorithm converges for a splitting into two blocks of variables, convergence is not guaranteed for three or more blocks; nevertheless, the multi-block version often performs well in practice.
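To make the updates concrete, here is a minimal ADMM sketch for the same assumed lasso splitting, where prox_{λf} is a linear solve and prox_{λg} is soft-thresholding; λ, the data, and the iteration count are illustrative choices.

```python
import numpy as np

# Sketch of ADMM for the same assumed lasso splitting:
#   f(x) = 0.5*||Ax - b||^2,  g(z) = gamma*||z||_1.
# prox_{lam*f} is a linear solve, prox_{lam*g} is soft-thresholding.
# Problem data, lam and the iteration count are illustrative choices.

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(A, b, gamma, lam=1.0, iters=200):
    n = A.shape[1]
    x = z = u = np.zeros(n)
    AtA, Atb = A.T @ A, A.T @ b
    for _ in range(iters):
        x = np.linalg.solve(AtA + np.eye(n) / lam, Atb + (z - u) / lam)  # x^{k+1} := prox_{lam*f}(z^k - u^k)
        z = soft_threshold(x + u, lam * gamma)                           # z^{k+1} := prox_{lam*g}(x^{k+1} + u^k)
        u = u + x - z                                                    # u^{k+1} := u^k + x^{k+1} - z^{k+1}
    return z
```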

4.5 LM interpretations

So far we have the following interpretations of the Levenberg–Marquardt (LM) algorithm:

  • L2-regularized Newton step.
  • Trust-region algorithm.
  • Proximal operator applied to a second-order (quadratic) approximation (see the sketch after this list).
  • Asymptotically disappearing Tikhonov regularization of quadratic functions; iterative refinement.
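To make the third interpretation concrete, here is a small sketch (with assumed inputs) showing that applying the proximal operator to the second-order model of f at x^k yields a Tikhonov-damped Newton step, i.e. an LM-style step with damping 1/λ.

```python
import numpy as np

# Sketch of the "proximal operator of a second-order approximation" view:
# applying prox_lam to the quadratic model of f at x_k gives the damped Newton /
# Levenberg-Marquardt step with damping 1/lam, i.e. d = -(H + I/lam)^{-1} grad.
# The gradient and Hessian are assumed to be supplied by the caller.

def lm_step(grad, hess, lam):
    """argmin_d  grad'd + 0.5*d'Hd + (1/(2*lam))*||d||^2  (Tikhonov-damped Newton)."""
    n = grad.shape[0]
    return np.linalg.solve(hess + np.eye(n) / lam, -grad)

# As lam grows the damping 1/lam vanishes and the step approaches the pure Newton
# step; as lam shrinks the step approaches a small (scaled) gradient step.
```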

4.6 The augmented Lagrangian method

Consider the primal conic convex optimization problem:

\text{minimize}\quad f(x) \quad \text{subject to}\quad Ax - b \in K \quad (\text{equivalently } Ax \succeq_K b)

Its Lagrangian is:

L(x,\lambda) = f(x) - \lambda^T (Ax - b)

and the corresponding dual function is

g(\lambda) = \inf_x L(x,\lambda) = -\sup_x \bigl( (A^T\lambda)^T x - f(x) \bigr) + \lambda^T b = b^T \lambda - f^*(A^T \lambda)

with λ ∈ K^* (the dual cone of K). Then the dual problem is:

\text{maximize}\quad g(\lambda) \quad \text{subject to}\quad \lambda \in K^*
  • If g is differentiable, we can use a gradient ascent method.
  • If g is non-differentiable, we can use subgradient methods.
  • If g is non-differentiable, we can also consider its Moreau–Yosida approximation, as follows.
g_\mu(\lambda) = \sup_{z \in K^*} \Bigl( g(z) - \frac{1}{2\mu}\,\|\lambda - z\|_2^2 \Bigr)

Be careful: here we are dealing with a concave function g that we want to maximize, so the quadratic term is subtracted rather than added as in the usual (convex) Moreau envelope.

We can rewrite it as:

g_\mu(\lambda) = \sup_{z \in K^*} \Bigl( g(z) - \frac{1}{2\mu}\,\|u\|_2^2 \Bigr) \quad \text{subject to } u = \lambda - z
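As a rough numerical illustration of this smoothing, the sketch below evaluates g_μ(λ) on a grid for an assumed concave example g(z) = −|z − 1|, with the nonnegative reals standing in for K^*; the example function, the feasible set, and μ are illustrative assumptions only.

```python
import numpy as np

# Numerical sketch of the Moreau-Yosida smoothing of a concave dual function:
#   g_mu(lambda) = sup_{z in K*} ( g(z) - (1/(2*mu)) * ||lambda - z||^2 ).
# For illustration we take g(z) = -|z - 1| (concave, nonsmooth at z = 1) and let
# the nonnegative reals stand in for K*; g, the set and mu are assumed examples.

def g(z):
    return -np.abs(z - 1.0)

def g_mu(lam, mu=0.5, zgrid=np.linspace(0.0, 5.0, 2001)):
    # crude sup over a grid of feasible z (z >= 0 stands in for K*)
    return np.max(g(zgrid) - (lam - zgrid) ** 2 / (2.0 * mu))

if __name__ == "__main__":
    for lam in [0.0, 0.5, 1.0, 2.0]:
        print(lam, g(lam), g_mu(lam))   # g_mu upper-bounds g on the feasible set and smooths its kink
```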
