Description from the paper A first-order primal-dual algorithm for convex problems with applications to imaging : computing the solution of a convex optimization problem is equivalent to finding the zeros of the maximal monotone operator T associated with the subgradient of the objective. The proximal point algorithm is probably the most fundamental algorithm for finding zeros of T. It is written as the recursion :
\[w^{n+1} = (I+\tau^{n}T)^{-1}(w^{n})\]where $\tau^{n} >0$ are the step sizes. Unfortunately, in most interesting cases $(I+\tau^{n}T)^{-1}$ is hard to evaluate, and hence the practical interest of the proximal point algorithm is limited.
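As a concrete illustration (not from the paper), here is a minimal Python sketch of the recursion, assuming a toy operator $T = \nabla f$ with $f(w)=\tfrac12\|w-c\|^{2}$ so that the resolvent has a closed form; the names `proximal_point` and `resolvent` are hypothetical:

```python
import numpy as np

def proximal_point(resolvent, w0, tau=1.0, n_iters=100):
    """Iterate w^{n+1} = (I + tau*T)^{-1}(w^n) with a fixed step tau."""
    w = w0
    for _ in range(n_iters):
        w = resolvent(w, tau)
    return w

# Example: T = grad f with f(w) = 0.5*||w - c||^2, so that
# (I + tau*T)^{-1}(w) = (w + tau*c) / (1 + tau) in closed form.
c = np.array([1.0, -2.0])
resolvent = lambda w, tau: (w + tau * c) / (1.0 + tau)

print(proximal_point(resolvent, np.zeros(2)))  # approaches the zero of T, i.e. c
```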
If the operator T can be split into a sum of two maximal monotone operators A and B such that T = A+B, and $(I+\tau A)^{-1}$ and $(I+\tau B)^{-1}$ are easier to evaluate than $(I+\tau T)^{-1}$, then one can devise algorithms which only need to evaluate the resolvent operators of A and B. Douglas-Rachford splitting (DRS) is one such algorithm, and it is known to be a special case of the proximal point algorithm.
1. Splitting algorithms
From the paper Splitting algorithms for the sum of two nonlinear operators
For solving the stationary equation (optimality condition), where C is a multivalued (set-valued) monotone operator on a Hilbert space H:
\[0\in C(u^{*})\]Splitting Algorithm : consider the case where the operator C splits as C = A+B, with A and B maximal monotone. The standard methods (as of 1979) are :
Forward-Backward Algorithm :
\[u^{n+1} := (I+\lambda A)^{-1}(I-\lambda B)u^{n}\]It is not unconditionally stable [1]_ , but it converges if $\lambda$ is sufficiently small and B is Lipschitz continuous.
The corresponding update equations are :
\[\begin{cases} \frac{u^{n+1/2}- u^{n}}{\lambda} + Bu^{n} = 0\\ \frac{u^{n+1}-u^{n+1/2}}{\lambda} + Au^{n+1} = 0 \end{cases}\]
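For intuition, this is exactly the proximal-gradient (ISTA) scheme when $A=\partial f$ for a nonsmooth $f$ and $B=\nabla g$ for a smooth $g$. A minimal sketch (not from the paper), assuming the lasso-type objective $\tfrac12\|Mu-y\|_{2}^{2} + \mu\|u\|_{1}$; the helper names are illustrative:

```python
import numpy as np

def soft_threshold(x, t):
    """Resolvent (I + t*A)^{-1} for A = subdifferential of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def forward_backward(M, y, mu, lam, n_iters=500):
    """u^{n+1} = (I + lam*A)^{-1}(I - lam*B) u^n with
    B u = M^T (M u - y)   (forward / explicit step on the smooth part) and
    A = subdiff of mu*||.||_1  (backward / implicit step via soft-thresholding)."""
    u = np.zeros(M.shape[1])
    for _ in range(n_iters):
        u = soft_threshold(u - lam * M.T @ (M @ u - y), lam * mu)
    return u

rng = np.random.default_rng(0)
M = rng.standard_normal((20, 10))
y = rng.standard_normal(20)
# lam must be small enough (here lam = 1/L with L the Lipschitz constant of B)
lam = 1.0 / np.linalg.norm(M, 2) ** 2
print(forward_backward(M, y, mu=0.5, lam=lam))
```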
Double-Backward Algorithm :

\[u^{n+1} := (I+\lambda A)^{-1}(I+\lambda B)^{-1}u^{n}\]It is unconditionally stable, but it does not converge except with some special modifications.
Assumption : assume the stationary problem $0\in C(u^{*})$ has at least one solution. Hence :
\[\text{there exist } u\in H,\ a\in A(u),\ b\in B(u)\quad \text{such that}\quad a+b=0\]Notation :
\[v = (I + \lambda B)(u), \qquad u = J_{B}^{\lambda}(v) = (I + \lambda B)^{-1}(v)\]

.. [1] unconditionally stable : $u^{n}$ remains bounded independently of n for any $\lambda$.
1.1 Peaceman-Rachford
\[u^{n+1} := (I+\lambda B)^{-1}(I-\lambda A)(I+\lambda A)^{-1}(I-\lambda B)u^{n}\]In terms of $v^{n} = (I+\lambda B)u^{n}$, the algorithm is :
\[v^{n+1} := (2J_{A}^{\lambda} - I)(2J_{B}^{\lambda}-I)v^{n}\]Convergence : if B is single-valued and satisfies the following property: for all $x_{n},x\in D(B)$ such that $Bx_{n}$ is bounded, $x_{n}\to \bar{x}$ weakly, and
\[(Bx_{n}-Bx, x_{n}-x)\to 0 \quad \text{as } n\to \infty,\]one has $x=\bar{x}$; then $u^{n}$ converges weakly to $u$, the solution of $0\in C(u^{*})$, which is unique.
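A small numerical sketch of the Peaceman-Rachford iteration in the reflected form above; the affine operators below are toy choices (not from the paper) picked so that both resolvents are explicit:

```python
lam, a, b = 0.5, 2.0, -4.0

# A(u) = u - a and B(u) = u - b are monotone; 0 in (A + B)(u) holds at u = (a + b)/2.
J_A = lambda v: (v + lam * a) / (1.0 + lam)   # (I + lam*A)^{-1}
J_B = lambda v: (v + lam * b) / (1.0 + lam)   # (I + lam*B)^{-1}

v = 0.0
for _ in range(100):
    r = 2 * J_B(v) - v            # reflected resolvent of B
    v = 2 * J_A(r) - r            # reflected resolvent of A
print(J_B(v))   # u^n = J_B(v^n) approaches (a + b)/2 = -1.0
```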
1.2 Douglas-Rachford
Convergence of this algorithm holds for more general operators A and B than for the previous algorithm.
\[u^{n+1} := (I+\lambda B)^{-1}[(I+\lambda A)^{-1}(I-\lambda B) +\lambda B]u^{n}\]In terms of $v^{n}$, the algorithm is :
\[v^{n+1} := J_{A}^{\lambda}(2J_{B}^{\lambda} - I)v^{n} + (I - J_{B}^{\lambda})v^{n}\]Convergence : assume that $J_{B}^{\lambda}$ is weakly closed; then, under the previous Assumption, $u^{n}$ converges weakly to a solution $u=J_{B}^{\lambda}v$ of $0\in C(u^{*})$. (If $J_{A}^{\lambda}$ is compact, then the convergence is strong.)
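A minimal sketch of this $v$-form iteration (not from the paper), assuming $B(u)=u-c$ (the gradient of $\tfrac12\|u-c\|^{2}$) and $A=\partial(\mu\|\cdot\|_{1})$, for which both resolvents are explicit and the zero of $A+B$ is the soft-thresholding of $c$:

```python
import numpy as np

lam, mu = 1.0, 1.5
c = np.array([3.0, -0.5])

# B(u) = u - c, A = subdifferential of mu*||.||_1; the zero of A + B is soft_threshold(c, mu).
J_B = lambda v: (v + lam * c) / (1.0 + lam)                         # (I + lam*B)^{-1}
J_A = lambda v: np.sign(v) * np.maximum(np.abs(v) - lam * mu, 0.0)  # (I + lam*A)^{-1}

v = np.zeros(2)
for _ in range(200):
    v = J_A(2 * J_B(v) - v) + (v - J_B(v))            # v^{n+1} = J_A(2J_B - I)v^n + (I - J_B)v^n
print(J_B(v))                                         # approaches [1.5, 0.0]
print(np.sign(c) * np.maximum(np.abs(c) - mu, 0.0))   # reference solution
```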
General case : without these assumptions, the algorithm is only “almost convergent”.
1.3 Applications to evolution equations
See the original paper.
2. Monotonicity
2.1 Proximal Algorithm
Monotone : see the monotone page. Important theorem : an operator T on H is monotone if and only if its resolvent $J_{cT} = (I+ cT)^{-1}$ is firmly nonexpansive.
Recall the Proximal Algorithm :
\[\bar{x} = \mathrm{prox}_{\lambda f} (v) = \arg\min_{x}\ f(x) + \frac{1}{2\lambda}\|x-v\|_{2}^{2}\]The optimality condition for $\bar{x}$ is :
\[0\in \partial\Big(f(x) + \frac{1}{2\lambda}\|x-v\|_{2}^{2}\Big)\Big|_{x=\bar{x}} = \partial f(\bar{x}) + \frac{1}{\lambda}(\bar{x} - v)\] \[v\in (I+\lambda\partial f)\bar{x}\] \[\bar{x} = (I + \lambda \partial f)^{-1}(v)\]where $(I + \lambda \partial f)^{-1}$ is the resolvent operator. The proximal algorithm finds a fixed point of this resolvent, and at a fixed point we have :
\[v\in (I+\lambda\partial f)v\] \[0\in \partial f(v)\]which is exactly the optimality condition for minimizing the objective function f.
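A concrete sketch of this fixed-point reading (my own example), assuming $f(x)=|x-3|$ in one dimension so that the resolvent is a shifted soft-thresholding; iterating the resolvent converges to the minimizer of $f$, where $0\in\partial f$:

```python
import numpy as np

lam = 0.5

def prox_f(v):
    """Resolvent (I + lam*df)^{-1} for f(x) = |x - 3|: shifted soft-thresholding."""
    z = v - 3.0
    return 3.0 + np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

x = 10.0
for _ in range(50):
    x = prox_f(x)   # the proximal algorithm is fixed-point iteration on the resolvent
print(x)            # converges to 3.0, the minimizer of f, where 0 lies in df(x)
```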
Generalized Proximal Algorithm : relaxing the basic iteration $u^{n+1}:=J_{T}^{\lambda}(u^{n})$ gives
\[u^{n+1} := (1-\rho_{n})u^{n} + \rho_{n}J_{T}^{\lambda}(u^{n})\]

2.2 DRS
This article broadens the analysis by exploiting the connection between firm nonexpansiveness and maximal monotonicity.
Recall the Douglas-Rachford splitting algorithm $v^{n+1}= G_{A,B}^{\lambda}(v^{n})$ , where :
\[G_{A,B}^{\lambda} = J_{A}^{\lambda}(2J_{B}^{\lambda} - I) + (I - J_{B}^{\lambda})\]Using Corollary 2.3 from Monotone , this is the same iteration written in the $(u, v)$ notation of the paper from the previous chapter :
\[\begin{cases} (I - J_{B}^{\lambda})v^{n} = \lambda Bu^{n} = \lambda b^{n}\\ (2J_{B}^{\lambda} - I )v^{n} = u^{n} - \lambda b^{n} \\ J_{A}^{\lambda}(u^{n} - \lambda b^{n}) = J_{A}^{\lambda}(w^{n+1} + \lambda a^{n+1}) = w^{n+1} \end{cases}\]The algorithm process is (note $v^{n} = (I+\lambda B)u^{n}$; a code sketch of these steps follows the list) :
- From the last step, we have $u^{n}$ and \(\lambda b^{n} = \lambda Bu^{n}\).
- Find $w^{n+1}$ such that \((J_{A}^{\lambda})^{-1} w^{n+1} =u^{n} - \lambda b^{n}\) (equivalent to \(w^{n+1}+\lambda A w^{n+1} = u^{n} - \lambda Bu^{n}\)); denote \(a^{n+1}=Aw^{n+1}\).
- Find $u^{n+1}$ such that \((J_{B}^{\lambda})^{-1}u^{n+1}=w^{n+1} + \lambda Bu^{n}\) (equivalent to \(u^{n+1}+\lambda Bu^{n+1} = w^{n+1} + \lambda Bu^{n}\)).
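A minimal sketch of these three steps (not the paper's code), reusing $A=\partial(\mu\|\cdot\|_{1})$ and $B(u)=u-c$ from the earlier $v$-form sketch so the two bookkeepings can be compared; it converges to the same limit:

```python
import numpy as np

lam, mu = 1.0, 1.5
c = np.array([3.0, -0.5])

J_A = lambda v: np.sign(v) * np.maximum(np.abs(v) - lam * mu, 0.0)  # (I + lam*A)^{-1}
J_B = lambda v: (v + lam * c) / (1.0 + lam)                         # (I + lam*B)^{-1}, B(u) = u - c

u = np.zeros(2)
b = u - c                                  # b^0 = B u^0
for _ in range(200):
    w = J_A(u - lam * b)                   # w^{n+1} + lam*A w^{n+1} = u^n - lam*b^n
    u_next = J_B(w + lam * b)              # u^{n+1} + lam*B u^{n+1} = w^{n+1} + lam*b^n
    b = (w + lam * b - u_next) / lam       # recover b^{n+1} = B u^{n+1} from the implicit step
    u = u_next
print(u)   # same limit as the v-form iteration: the soft-thresholding of c at level mu
```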
The operator $G_{A,B}^{\lambda}$ can also be expressed as :
\[G_{A,B}^{\lambda} = \{(u+\lambda b, w+\lambda b) \mid (u,b)\in B, (w,a)\in A, w+\lambda a = u-\lambda b \}\]

2.3 Splitting Operator S
Consider the operator $S_{A,B}^{\lambda}$, the splitting operator of A and B with respect to $\lambda$ :
\[S_{A,B}^{\lambda} = (G_{A,B}^{\lambda})^{-1} - I\]It can be expressed as :
\[S_{A,B}^{\lambda} = \{(w+\lambda b, u-w) \mid (u,b)\in B, (w,a)\in A, w+\lambda a = u-\lambda b \}\]Theorem : if A and B are (maximal) monotone, then $S_{A,B}^{\lambda}$ is (maximal) monotone.
Theorem : the Douglas-Rachford iteration is equivalent to applying the proximal point algorithm to the maximal monotone operator $S_{A,B}^{\lambda}$, with the proximal point stepsize fixed at 1, and exact evaluation of resolvents.
\[v^{n+1}= G_{A,B}^{\lambda}(v^{n}) = (I + S_{A,B}^{\lambda})^{-1}(v^{n})\]

2.4 ADMM
Consider the problem :
\[\text{minimize}_{x\in \mathcal{R}^{n}} \quad f(x) + g(Mx)\]which is equivalent to the constrained form \[\begin{align*} &\text{minimize}_{x\in \mathcal{R}^{n}} \quad f(x) + g(w) \\ &\text{subject to} \quad Mx = w, \end{align*}\]whose dual problem is \[\text{maximize}_{p\in \mathcal{R}^{m}} \quad -(f^{*}(-M^{T}p) + g^{*}(p)).\]Let :
\[\begin{cases} A = \partial[f^{*}(-M^{T}\cdot)] \\ B = \partial g^{*} \end{cases}\]Then applying Douglas-Rachford splitting to A and B yields the ADMM (alternating direction method of multipliers), as shown by Gabay in Applications of the method of multipliers to variational inequalities .
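A minimal sketch of the resulting ADMM iterations on a concrete instance (my own example), assuming $f(x)=\tfrac12\|x-y\|_{2}^{2}$ and $g=\mu\|\cdot\|_{1}$ so that both subproblems are explicit; the scaled dual variable $p$ and the helper names follow the common textbook presentation, not the notation above:

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def admm(M, y, mu, rho=1.0, n_iters=300):
    """ADMM for  minimize_x  0.5*||x - y||^2 + mu*||M x||_1,
    written as f(x) + g(w) with the constraint M x = w and scaled dual variable p."""
    m, n = M.shape
    x, w, p = np.zeros(n), np.zeros(m), np.zeros(m)
    Q = np.linalg.inv(np.eye(n) + rho * M.T @ M)    # the x-update is a linear solve
    for _ in range(n_iters):
        x = Q @ (y + rho * M.T @ (w - p))           # argmin_x f(x) + (rho/2)*||Mx - w + p||^2
        w = soft_threshold(M @ x + p, mu / rho)     # argmin_w g(w) + (rho/2)*||Mx - w + p||^2
        p = p + M @ x - w                           # scaled dual (multiplier) update
    return x

rng = np.random.default_rng(0)
M = rng.standard_normal((15, 8))
y = rng.standard_normal(8)
print(admm(M, y, mu=0.3))
```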
2.5 ECE236C
See ECE236C for a more detailed description of the realization of the DRS algorithm.
- DR iteration as a fixed-point iteration.
- DRS is not invariant with respect to scaling (so we need to adapt the scaling parameter $\lambda$ from the previous chapters).
- With relaxation
- Primal-dual formulation
- Applying DRS to the dual problem gives ADMM.
- Convergence proof.