7.1 Global Variable Consensus
Corresponding to the distribution across samples in the next chapter.
Consider the problem with one single objective function, but can be distributed into several parts:
minimizef(x)=N∑i=1fi(x)We can always extend the definition of the function fi to be plus infinity if x is out of its domain. The advantage of ADMM is that each split of the objective function could be implemented into different processors, such that the algorithm could run in parallex.
Then we can rewrite the problem into a global consensus problem (since that the constraints are that all the local variables should agree) :
minimizeN∑i=1fi(xi)subject toxi=z, i=1,...,NThe augmented Lagrangian is :
L(x,z,y)=N∑i=1fi(xi)+yTi(xi−z)+(ρ/2)‖xi−z‖22)The updates of ADMM will be :
xk+1i:=argminxi(fi(xi)+ykTi(xi−zk)+(ρ/2)‖xi−zk‖22)zk+1:=argminzN∑i=1(ykTi(xk+1i−z)+(ρ/2)‖xk+1i−z‖22)yk+1i:=yki+ρ(xk+1i−zk+1)The update of z could be further simplified by :
zk+1=argminzN∑i=1(ykTi(xk+1i−z)+(ρ/2)‖xk+1i−z‖22)=argminzN∑i=1−(yki+ρxk+1i)Tz+(ρ/2)NzTz zk+1=(1/N)N∑i=1(xk+1i+(1/ρ)yki)we can further simpliy it by introcuding the notation of averages:
zk+1:=ˉxk+1+(1/ρ)ˉyk ˉyk+1:=ˉyki+ρ(ˉxk+1−zk+1)=0So we have :
zk+1:=ˉxk+1Finally we have the ADMM updates:
xk+1i:=argminxi(fi(xi)+ykTi(xi−ˉxk)+(ρ/2)‖xi−ˉxk‖22)zk+1=ˉxk+1yk+1i:=yki+ρ(xk+1i−ˉxk+1)7.2 Sharing
Corresponding to the distribution across features in the next chapter.
minimizeN∑i=1fi(xi)+g(N∑i=1xi)In the next chapter, we will have function f corresponding to the regularization term, and g corresponding to the objective function splitted by feature subsets.
The ADMM form the equivalent problem is, where each subset corresponding to a subset of features:
minimizeN∑i=1fi(xi)+g(N∑i=1zi)subject toxi=zi, i=1,...,NThe scaled form of ADMM is :
xk+1i:=argminxi(fi(xi)+(ρ/2)‖xi−zki+uki‖22)zk+1:=argminz(g(N∑i=1zi)+(ρ/2)N∑i=1‖xk+1i−zi+uki‖22)uk+1i:=uki+xk+1i−zk+1iWe can further simplify the z update by the problem:
minimizeg(Nˉz)+(ρ/2)N∑i=1‖zi−ai‖22subject toˉz=(1/N)N∑i=1ziWhere ai=uki+xk+1i, minimize the objective function while fixing the ˉz, we have the lagrangian is:
L(zi,λ)=g(Nˉz)+(ρ/2)N∑i=1‖zi−ai‖22+λT(ˉz−(1/N)N∑i=1zi)From the first order optimal condition we have :
ρ(zi−ai)−λ/N=0With the dual function d being:
d(λ)=g(Nˉz)+Nρ/2‖λρN‖22+λT(ˉz−(1/N)N∑i=1(ai+λρN))Using the first order optimal condition of the dual function :
(ˉz−ˉa)−λρN=0Finally, we have :
zi=ai+(ˉz−ˉa)To solve ˉz, we solve :
minimizeg(Nˉz)+(ρ/2)N∑i=1‖ˉz−ˉa‖22=g(Nˉz)+(Nρ/2)‖ˉz−ˉa‖22And applying the update of zi into the udpate of x , we have the ADMM udpates expression:
xk+1i:=argminxi(fi(xi)+(ρ/2)‖xi−xki+ˉxk−ˉzk+ˉuk‖22)ˉzk+1:=argminˉz(g(Nˉz)+(Nρ/2)‖ˉz−ˉuk−ˉxk+1‖22)ˉuk+1:=ˉuk+ˉxk+1−ˉzk+1