
Model fitting seeks a model that best fits the measurements. The objective normally consists of an observation error term ($l$, for loss) and a regularization term ($r$), with $l$ and $r$ chosen to be convex. The following shows a problem of fitting a linear model:

$$\text{minimize} \quad l(Ax - b) + r(x)$$

Assume $l$ is additive:

$$l(Ax - b) = \sum_{i=1}^{m} l_i(a_i^T x - b_i)$$

Here $l_i$ is the loss of the $i$th training example, $a_i$ is the feature vector of example $i$ (the $i$th system input), and $b_i$ is the output (response) of example $i$ (the $i$th observation).

  • Choosing $r$ as the squared $\ell_2$ norm, $r(x) = \lambda\|x\|_2^2$, gives Tikhonov regularization (a ridge penalty).
  • Choosing $r$ as the $\ell_1$ norm, $r(x) = \lambda\|x\|_1$, gives a lasso penalty.
  • In some cases, part of the parameters should not be regularized (e.g., offset parameters).
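
For concreteness (these standard instances are not spelled out above), taking $l$ to be the squared $\ell_2$ loss gives the two classic problems:

$$\begin{aligned} \text{ridge regression:} &\quad \text{minimize} \;\; (1/2)\|Ax - b\|_2^2 + \lambda\|x\|_2^2 \\ \text{lasso:} &\quad \text{minimize} \;\; (1/2)\|Ax - b\|_2^2 + \lambda\|x\|_1 \end{aligned}$$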

There are two natural ways to split the problem for distributed solving:

  • Split across training examples.
  • Split across features.

8.1 Examples

  • Regression
  • Classification
  • Image segmentation, denoising, and decomposition.

8.2 Splitting across examples

This setting has a large amount of relatively low-dimensional data. The goal is to solve the fitting problem in a distributed way. Partition $A$ and $b$ (example inputs and measurements) by rows:

$$A = \begin{bmatrix} A_1 \\ \vdots \\ A_N \end{bmatrix}, \qquad b = \begin{bmatrix} b_1 \\ \vdots \\ b_N \end{bmatrix}$$

The problem becomes:

$$\text{minimize} \quad \sum_{i=1}^{N} l_i(A_i x - b_i) + r(x)$$

Rewrite the problem in consensus form to enable distributed computation (turning it into a standard ADMM-type problem):

$$\begin{array}{ll} \text{minimize} & \sum_{i=1}^{N} l_i(A_i x_i - b_i) + r(z) \\ \text{subject to} & x_i - z = 0, \quad i = 1, \ldots, N \end{array}$$

The scaled form of the ADMM updates is (see Section 7.3 on sharing problems for more details):

$$\begin{aligned}
x_i^{k+1} &:= \operatorname*{argmin}_{x_i}\left( l_i(A_i x_i - b_i) + (\rho/2)\|x_i - z^k + u_i^k\|_2^2 \right) \\
z^{k+1} &:= \operatorname*{argmin}_{z}\left( r(z) + (N\rho/2)\|\bar{x}^{k+1} - z + \bar{u}^k\|_2^2 \right) \\
u_i^{k+1} &:= u_i^k + x_i^{k+1} - z^{k+1}
\end{aligned}$$
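
As a rough illustration (not part of the original notes), one iteration of these updates might look as follows in MATLAB. Here `prox_loss` and `prox_reg` are hypothetical helpers for the two argmin subproblems, and the local variables are assumed to be stored column-wise in `X` and `U`:

  % One scaled-form consensus ADMM iteration (sketch).
  % X, U: n-by-N matrices (one column per block); z: n-by-1 consensus variable.
  for i = 1:N
      % x-update: prox_loss is a hypothetical solver for
      % argmin_x l_i(A{i}*x - b{i}) + (rho/2)*norm(x - v)^2, with v = z - U(:,i).
      X(:,i) = prox_loss(A{i}, b{i}, z - U(:,i), rho);
  end
  xbar = mean(X, 2);                  % gather and average
  ubar = mean(U, 2);
  % z-update: prox_reg is a hypothetical proximal operator of r.
  z = prox_reg(xbar + ubar, N*rho);
  U = U + X - repmat(z, 1, N);        % dual (u) updates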
The following subsections specialize these updates to three examples:

  • Lasso
  • Sparse Logistic Regression
  • Support Vector Machine

8.2.1 Lasso

For the lasso, the loss is the squared $\ell_2$ norm, $l_i(A_i x_i - b_i) = (1/2)\|A_i x_i - b_i\|_2^2$, and $r$ is the $\ell_1$ norm. The ADMM updates are:

$$\begin{aligned}
x_i^{k+1} &:= \operatorname*{argmin}_{x_i}\left( (1/2)\|A_i x_i - b_i\|_2^2 + (\rho/2)\|x_i - z^k + u_i^k\|_2^2 \right) \\
z^{k+1} &:= S_{\lambda/\rho N}\left( \bar{x}^{k+1} + \bar{u}^k \right) \\
u_i^{k+1} &:= u_i^k + x_i^{k+1} - z^{k+1}
\end{aligned}$$

See here for details about the $x$-update. The differences from the serial version are:

  • The updates of the different blocks of variables $x_i$ can be carried out in parallel.
  • The additional collection and averaging steps that form $\bar{x}^{k+1}$ and $\bar{u}^k$.
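
A minimal MATLAB sketch of these lasso-specific updates, under the same (assumed) column-wise storage as the sketch above; the $x_i$-update has the closed form $(A_i^T A_i + \rho I)^{-1}(A_i^T b_i + \rho(z^k - u_i^k))$, and the $z$-update is elementwise soft thresholding:

  for i = 1:N
      % Closed-form x-update; in practice one would cache a factorization
      % of (Ai'*Ai + rho*I) across iterations.
      X(:,i) = (A{i}'*A{i} + rho*eye(n)) \ (A{i}'*b{i} + rho*(z - U(:,i)));
  end
  xbar = mean(X, 2);
  ubar = mean(U, 2);
  % z-update: soft thresholding S_{lambda/(rho*N)} applied elementwise.
  kappa = lambda/(rho*N);
  v = xbar + ubar;
  z = max(0, v - kappa) - max(0, -v - kappa);
  U = U + X - repmat(z, 1, N);        % dual updates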

8.2.2 Distributed $\ell_1$-Regularized Logistic Regression

This can be compared with the serial version; see the function and script using L-BFGS for the distributed computation.
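
As an illustrative stand-in (the referenced scripts use L-BFGS; this sketch instead uses MATLAB's `fminunc` quasi-Newton solver from the Optimization Toolbox, and assumes labels `y{i}` in $\{-1, 1\}$ plus the same column-wise storage as before), the $x_i$-update for the logistic loss could be solved as:

  % x-update for block i with logistic loss (sketch; fminunc stands in
  % for the L-BFGS solver used in the referenced scripts).
  obj = @(x) sum(log(1 + exp(-y{i} .* (A{i}*x)))) ...
        + (rho/2)*sum((x - z + U(:,i)).^2);
  opts = optimoptions('fminunc', 'Algorithm', 'quasi-newton', 'Display', 'off');
  X(:,i) = fminunc(obj, X(:,i), opts);  % warm start from the previous iterate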

8.2.3 Support Vector Machine

Here we model a linear support vector machine, which is a linear model fitting problem: find the linear model $w^T x_j + b$, applied to the feature vectors $x_j$, that best fits the observations $y_j$, where each $y_j$ is a binary variable.

That is, if the observation $y_j$ is $1$ we want $w^T x_j + b \ge 1$, and if $y_j$ is $-1$ we want $w^T x_j + b \le -1$. This suggests the optimization problem:

$$\text{minimize} \quad \sum_{j=1}^{M} \left( 1 - y_j (w^T x_j + b) \right)$$

In practice, an example that already satisfies its margin should incur no loss, so it is better to penalize only violations by taking the positive part (the hinge loss):

$$\text{minimize} \quad \sum_{j=1}^{M} \left( 1 - y_j (w^T x_j + b) \right)_+$$

where there are $M$ observations in total. The problem is equivalent to:

$$\text{minimize} \quad \sum_{j=1}^{M} \left( 1 + \begin{bmatrix} -y_j x_j^T & -y_j \end{bmatrix} \begin{bmatrix} w \\ b \end{bmatrix} \right)_+$$

By forming

$$A = \begin{bmatrix} -y_1 x_1^T & -y_1 \\ \vdots & \vdots \\ -y_M x_M^T & -y_M \end{bmatrix}, \qquad x = \begin{bmatrix} w \\ b \end{bmatrix},$$

we obtain the reformulated problem:

$$\text{minimize} \quad \mathbf{1}^T (Ax + \mathbf{1})_+$$

Adding a regularization term on the linear model weights $w$:

$$\text{minimize} \quad \mathbf{1}^T (Ax + \mathbf{1})_+ + (1/2\lambda)\|w\|_2^2$$

Applying the distributed splitting, where $i$ indexes a subset of the examples, we have:

$$\text{minimize} \quad \sum_{i=1}^{N} \mathbf{1}^T (A_i x + \mathbf{1})_+ + (1/2\lambda)\|w\|_2^2$$

Introducing the consensus variable $z$:

$$\begin{array}{ll} \text{minimize} & \sum_{i=1}^{N} \mathbf{1}^T (A_i x_i + \mathbf{1})_+ + (1/2\lambda)\|w\|_2^2 \\ \text{subject to} & x_i = z, \quad i = 1, \ldots, N \end{array}$$

We further simplify the problem with a small adjustment to the regularization term: instead of regularizing $w$, we regularize $z$ directly. The problem is then:

$$\begin{array}{ll} \text{minimize} & \sum_{i=1}^{N} \mathbf{1}^T (A_i x_i + \mathbf{1})_+ + (1/2\lambda)\|z\|_2^2 \\ \text{subject to} & x_i = z, \quad i = 1, \ldots, N \end{array}$$

The corresponding ADMM updates are:

$$\begin{aligned}
x_i^{k+1} &:= \operatorname*{argmin}_{x_i}\left( \mathbf{1}^T (A_i x_i + \mathbf{1})_+ + (\rho/2)\|x_i - z^k + u_i^k\|_2^2 \right) \\
z^{k+1} &:= \operatorname*{argmin}_{z}\left( (1/2\lambda)\|z\|_2^2 + \sum_{i=1}^{N} (\rho/2)\|x_i^{k+1} - z + u_i^k\|_2^2 \right) \\
u_i^{k+1} &:= u_i^k + x_i^{k+1} - z^{k+1}
\end{aligned}$$

The $x$-update is itself solved as an optimization problem, e.g. with CVX:

  cvx_begin
      variable x_var(n)
      % Hinge term for block i plus the proximal penalty from the x-update;
      % here z and u store one column per block.
      minimize ( sum(pos(A{i}*x_var + 1)) + rho/2*sum_square(x_var - z(:,i) + u(:,i)) )
  cvx_end

The $z$-update is simple; using the first-order optimality condition, we have:

$$(1/\lambda)\, z^{k+1} - \sum_{i=1}^{N} \rho\left( x_i^{k+1} - z^{k+1} + u_i^k \right) = 0 \quad\Longrightarrow\quad z^{k+1} = \frac{\rho N}{(1/\lambda) + N\rho}\left( \bar{x}^{k+1} + \bar{u}^k \right)$$

This gives the final ADMM updates for the linear SVM:

$$\begin{aligned}
x_i^{k+1} &:= \operatorname*{argmin}_{x_i}\left( \mathbf{1}^T (A_i x_i + \mathbf{1})_+ + (\rho/2)\|x_i - z^k + u_i^k\|_2^2 \right) \\
z^{k+1} &:= \frac{\rho N}{(1/\lambda) + N\rho}\left( \bar{x}^{k+1} + \bar{u}^k \right) \\
u_i^{k+1} &:= u_i^k + x_i^{k+1} - z^{k+1}
\end{aligned}$$
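
Putting the pieces together, a rough sketch of the outer loop (this is not the Stanford reference implementation; it reuses the CVX subproblem above and follows its column-wise storage of x, z, and u):

  for k = 1:MAX_ITER
      for i = 1:N
          % x-update: hinge subproblem solved with CVX, as shown above.
          cvx_begin quiet
              variable x_var(n)
              minimize ( sum(pos(A{i}*x_var + 1)) ...
                         + rho/2*sum_square(x_var - z(:,i) + u(:,i)) )
          cvx_end
          x(:,i) = x_var;
      end
      % Closed-form z-update, broadcast to every column.
      zbar = (N*rho/(1/lambda + N*rho)) * (mean(x,2) + mean(u,2));
      z = repmat(zbar, 1, N);
      u = u + x - z;                 % dual update
  end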

Code and a script can be found on the ADMM Stanford page (a distributed version, but solved serially).

8.3 Splitting across Features

This setting covers model fitting problems with a modest number of examples and a large number of features, for example:

  • NLP (natural language processing): pairs of adjacent words (bigrams), etc.
  • Bioinformatics: DNA mutations, etc.

Partition the parameter vector $x$ as $x = (x_1, \ldots, x_N)$ and $A$ by columns as $A = [A_1, \ldots, A_N]$; the problem becomes:

$$\text{minimize} \quad l\left( \sum_{i=1}^{N} A_i x_i - b \right) + \sum_{i=1}^{N} r_i(x_i)$$

Reformulate as a consensus problem by introducing variables $z_i$:

$$\begin{array}{ll} \text{minimize} & l\left( \sum_{i=1}^{N} z_i - b \right) + \sum_{i=1}^{N} r_i(x_i) \\ \text{subject to} & A_i x_i - z_i = 0, \quad i = 1, \ldots, N \end{array}$$

The corresponding scaled form of ADMM is:

$$\begin{aligned}
x_i^{k+1} &:= \operatorname*{argmin}_{x_i}\left( r_i(x_i) + (\rho/2)\|A_i x_i - z_i^k + u_i^k\|_2^2 \right) \\
z^{k+1} &:= \operatorname*{argmin}_{z}\left( l\left( \sum_{i=1}^{N} z_i - b \right) + \sum_{i=1}^{N} (\rho/2)\|A_i x_i^{k+1} - z_i + u_i^k\|_2^2 \right) \\
u_i^{k+1} &:= u_i^k + A_i x_i^{k+1} - z_i^{k+1}
\end{aligned}$$
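
Although the $z$-update above looks coupled across blocks, the $z_i$ enter $l$ only through their sum, so (as in the sharing problem of Section 7.3) it reduces to a single averaged problem. A sketch of this reduction, writing $a_i = A_i x_i^{k+1} + u_i^k$ and $\bar{a}^{k+1}$ for the average of the $a_i$:

$$\bar{z}^{k+1} := \operatorname*{argmin}_{\bar{z}}\left( l(N\bar{z} - b) + (N\rho/2)\|\bar{z} - \bar{a}^{k+1}\|_2^2 \right), \qquad z_i^{k+1} = a_i + \bar{z}^{k+1} - \bar{a}^{k+1}$$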

8.3.1 Group Lasso

Code and a script are available.
