
Introduction

In optimization, duality refers to a pair of closely related problems, known as the primal problem and the dual problem. The primal problem is the original optimization problem that one wants to solve, while the dual problem is a different optimization problem that is derived from the primal problem.

The concept of duality is important in convex optimization because it provides useful insights into the structure of optimization problems and allows for the development of efficient algorithms for solving them. Specifically, the duality theorem states that under certain conditions, the optimal value of the primal problem is equal to the optimal value of the dual problem. This provides a way to check the optimality of a solution obtained from one problem by verifying its optimality in the other problem. Duality also allows for the development of optimization algorithms that solve the dual problem rather than the primal problem. This can be advantageous in certain situations, such as when the dual problem is easier to solve or has a simpler structure.

The Lagrangian

Let’s start with the standard form of the optimization problem:

minimize $f_0(x)$, subject to $f_i(x) \le 0, i = 1..m$, $h_i(x) = 0, i = 1..p$, with $x \in \mathbb{R}^n$. The domain $\mathcal{D}$ is the intersection of all the functions’ domains, assumed nonempty, and the optimal value is $p^\star$. The main idea of Lagrangian duality is to augment the objective function with a weighted sum of the constraints, as follows:

$$L: \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R}: \quad L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x).$$

The domain of $L$ is $\mathcal{D} \times \mathbb{R}^m \times \mathbb{R}^p$. The $\lambda_i, \nu_i$ are called the Lagrange multipliers. We then define the Lagrangian dual function $g: \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R}$ as the minimum value of the Lagrangian over $x$: for $\lambda \in \mathbb{R}^m, \nu \in \mathbb{R}^p$:

$$g(\lambda, \nu) = \inf_{x \in \mathcal{D}} L(x, \lambda, \nu) = \inf_{x \in \mathcal{D}} \left( f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x) \right).$$

The dual function gives us lower bounds on the optimal value $p^\star$ of the problem: for any $\lambda \ge 0$ and any $\nu$, $g(\lambda, \nu) \le p^\star$. We can prove it as follows. Suppose $\tilde{x}$ is a feasible point, so $f_i(\tilde{x}) \le 0$ and $h_i(\tilde{x}) = 0$, and let $\lambda \ge 0$. Then $\sum_{i=1}^m \lambda_i f_i(\tilde{x}) + \sum_{i=1}^p \nu_i h_i(\tilde{x}) \le 0$. Therefore the Lagrangian $L(\tilde{x}, \lambda, \nu) = f_0(\tilde{x}) + \sum_{i=1}^m \lambda_i f_i(\tilde{x}) + \sum_{i=1}^p \nu_i h_i(\tilde{x}) \le f_0(\tilde{x})$. This makes $g(\lambda, \nu) = \inf_{x \in \mathcal{D}} L(x, \lambda, \nu) \le L(\tilde{x}, \lambda, \nu) \le f_0(\tilde{x})$. Since $g(\lambda, \nu) \le f_0(\tilde{x})$ for every feasible $\tilde{x}$, it follows that $g(\lambda, \nu) \le p^\star$.
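As a quick numerical sanity check (a toy problem of my own, not from the text): take minimize $f_0(x) = x^2$ subject to $f_1(x) = 1 - x \le 0$, whose optimal value is $p^\star = 1$. Minimizing the Lagrangian over $x$ gives the dual function $g(\lambda) = \lambda - \lambda^2/4$, and we can verify the lower-bound property on a grid:

```python
import numpy as np

# Toy problem: minimize x^2 subject to 1 - x <= 0, so x* = 1 and p* = 1.
# Lagrangian: L(x, lam) = x^2 + lam*(1 - x), minimized at x = lam/2,
# which gives the dual function g(lam) = lam - lam**2 / 4.
p_star = 1.0

def g(lam):
    return lam - lam**2 / 4.0

# Lower-bound property: g(lam) <= p* for every lam >= 0.
for lam in np.linspace(0.0, 10.0, 101):
    assert g(lam) <= p_star + 1e-12
print("g(lam) <= p* for all sampled lam >= 0")
```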

To find the best lower bound, we consider the problem: maximize $g(\lambda, \nu)$ subject to $\lambda \ge 0$. This is called the Lagrange dual problem. The optimal value of the Lagrange dual problem, $d^\star$, is the best lower bound on $p^\star$. The inequality $d^\star \le p^\star$ is called weak duality, and the difference $p^\star - d^\star$ is referred to as the optimal duality gap. If equality holds, the gap is zero, and we say strong duality holds.
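Continuing with a toy convex problem of my own (minimize $x^2$ subject to $x \ge 1$, so $p^\star = 1$ and $g(\lambda) = \lambda - \lambda^2/4$), we can solve the dual problem numerically and observe a zero duality gap:

```python
from scipy.optimize import minimize_scalar

# Toy problem (my own example): minimize x^2 s.t. 1 - x <= 0, with p* = 1.
# Dual function: g(lam) = lam - lam**2 / 4 for lam >= 0.
# The Lagrange dual problem maximizes g over lam >= 0 (maximum at lam* = 2).
res = minimize_scalar(lambda lam: -(lam - lam**2 / 4.0),
                      bounds=(0.0, 10.0), method="bounded")
d_star = -res.fun          # best lower bound found by the dual
p_star = 1.0               # primal optimum f0(1) = 1
gap = p_star - d_star      # optimal duality gap, zero here
print(d_star, gap)
```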

Strong duality does not hold in general, but if the primal problem is convex — minimize $f_0(x)$ subject to $f_i(x) \le 0, i = 1..m$, $Ax = b$, with $f_0, ..., f_m$ convex — we usually have strong duality. There are many results establishing conditions on the problem, beyond convexity, under which strong duality holds. Those conditions are called constraint qualifications. One simple constraint qualification is Slater’s condition: there exists an $x \in \operatorname{relint} \mathcal{D}$ such that $f_i(x) < 0, i = 1..m$, and $Ax = b$. Such a point is called strictly feasible, since the inequality constraints hold with strict inequalities. Slater’s theorem states that strong duality holds if Slater’s condition holds and the problem is convex.

Mixed strategies in matrix games

Consider a game between two players. Player 1 makes a move $k \in \{1, ..., n\}$ and player 2 makes a move $l \in \{1, ..., m\}$. Player 1 then makes a payment $P_{kl}$ to player 2, where $P \in \mathbb{R}^{n \times m}$ is the payoff matrix for the game. The goal of player 1 is to make the payment as small as possible, while the goal of player 2 is to maximize it. The players use mixed strategies, with the following probability distributions over the choices: $\operatorname{prob}(k = i) = u_i, i = 1...n$; $\operatorname{prob}(l = i) = v_i, i = 1...m$. The expected payoff from player 1 to player 2 is then $\sum_{k=1}^n \sum_{l=1}^m u_k v_l P_{kl} = u^T P v$. Player 1 chooses $u$ to minimize $u^T P v$, while player 2 chooses $v$ to maximize it.

If player 2 knows player 1’s strategy $u$ in advance, she will maximize the expected payoff: $\sup \{u^T P v \mid v \ge 0, \mathbf{1}^T v = 1\} = \max_{i=1..m} (P^T u)_i$. Player 1 then chooses $u$ to minimize this worst-case payoff: minimize $\max_{i=1..m} (P^T u)_i$ subject to $u \ge 0, \mathbf{1}^T u = 1$. The smallest expected payoff is denoted $p_1^\star$.
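The reduction $\sup \{u^T P v \mid v \ge 0, \mathbf{1}^T v = 1\} = \max_i (P^T u)_i$ just says that a linear function over the probability simplex is maximized at a vertex, i.e. at a pure strategy. A quick empirical check with a random (hypothetical) payoff matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 5
P = rng.normal(size=(n, m))        # arbitrary payoff matrix (made up)
u = rng.dirichlet(np.ones(n))      # a fixed mixed strategy for player 1

best = (P.T @ u).max()             # payoff of player 2's best pure response
for _ in range(1000):
    v = rng.dirichlet(np.ones(m))  # random mixed strategy for player 2
    assert u @ P @ v <= best + 1e-12
print("no mixed strategy beats the best pure response")
```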

Conversely, if player 1 knows player 2’s strategy $v$ in advance, she will minimize the payoff: $\inf \{u^T P v \mid u \ge 0, \mathbf{1}^T u = 1\} = \min_{i=1...n} (P v)_i$. Player 2 then maximizes this with her choice of $v$: maximize $\min_{i=1...n} (P v)_i$ subject to $v \ge 0, \mathbf{1}^T v = 1$. The optimal value of this scenario is $p_2^\star$.

It seems that knowing the opponent’s strategy is advantageous, so $p_1^\star \ge p_2^\star$, but with mixed strategies, using duality, we can prove that $p_1^\star = p_2^\star$.

Let’s start by formulating player 1’s problem as a linear program: minimize $t$ subject to $u \ge 0$, $\mathbf{1}^T u = 1$, $P^T u \le t \mathbf{1}$, with $t \in \mathbb{R}$. Introducing the multipliers $\lambda$ for $P^T u \le t \mathbf{1}$, $\mu$ for $u \ge 0$, and $\nu$ for $\mathbf{1}^T u = 1$, the Lagrangian is:

$$t + \lambda^T (P^T u - t \mathbf{1}) - \mu^T u + \nu (1 - \mathbf{1}^T u) = \nu + (1 - \mathbf{1}^T \lambda) t + (P \lambda - \nu \mathbf{1} - \mu)^T u.$$

The dual function is $$g(\lambda, \mu, \nu) = \begin{cases} \nu, & \mathbf{1}^T \lambda = 1 \text{ and } P \lambda - \nu \mathbf{1} = \mu \\ -\infty, & \text{otherwise.} \end{cases}$$

The dual problem is then: maximize $\nu$ subject to $\lambda \ge 0, \mathbf{1}^T \lambda = 1, \mu \ge 0, P \lambda - \nu \mathbf{1} = \mu$. This is equivalent to: maximize $\nu$ subject to $\lambda \ge 0, \mathbf{1}^T \lambda = 1, P \lambda \ge \nu \mathbf{1}$ — which is exactly player 2’s problem. Both LPs are feasible, so strong duality holds and the optimal values of the two problems are equal: $p_1^\star = p_2^\star$.
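We can also check this numerically with an off-the-shelf LP solver. The sketch below (with a small made-up payoff matrix) solves both players' LPs with scipy and compares their optimal values:

```python
import numpy as np
from scipy.optimize import linprog

P = np.array([[3.0, 1.0, -2.0],
              [0.0, 2.0,  1.0]])             # made-up 2x3 payoff matrix
n, m = P.shape

# Player 1: minimize t  s.t.  P^T u <= t*1, u >= 0, 1^T u = 1.
# Decision variables z = (u, t).
c1 = np.r_[np.zeros(n), 1.0]
A_ub1 = np.hstack([P.T, -np.ones((m, 1))])   # P^T u - t*1 <= 0
A_eq1 = np.r_[np.ones(n), 0.0].reshape(1, -1)
res1 = linprog(c1, A_ub=A_ub1, b_ub=np.zeros(m), A_eq=A_eq1, b_eq=[1.0],
               bounds=[(0, None)] * n + [(None, None)])

# Player 2: maximize s  s.t.  P v >= s*1, v >= 0, 1^T v = 1.
c2 = np.r_[np.zeros(m), -1.0]                # minimize -s
A_ub2 = np.hstack([-P, np.ones((n, 1))])     # s*1 - P v <= 0
A_eq2 = np.r_[np.ones(m), 0.0].reshape(1, -1)
res2 = linprog(c2, A_ub=A_ub2, b_ub=np.zeros(n), A_eq=A_eq2, b_eq=[1.0],
               bounds=[(0, None)] * m + [(None, None)])

p1, p2 = res1.fun, -res2.fun
print(p1, p2)   # the two game values coincide
```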

More examples

Find the minimum and maximum of the function $f_0(x, y) = x + y$ subject to $f_1(x, y) = x^2 + y^2 = 2$. The Lagrangian is $L(x, y, \lambda) = x + y + \lambda (x^2 + y^2 - 2)$. Set the derivatives to zero:

$$\nabla_{x, y, \lambda} L(x, y, \lambda) = 0$$

This is equivalent to $$\begin{cases} 1 + 2 \lambda x = 0 \\ 1 + 2 \lambda y = 0 \\ x^2 + y^2 = 2 \end{cases}$$ Hence $x = y = -\frac{1}{2\lambda}$ and $\lambda^2 = \frac{1}{4}$, so $\lambda = \pm \frac{1}{2}$.

And we have two candidate points: $(x, y) \in \{(1, 1), (-1, -1)\}$. The maximum is $f_0(1, 1) = 2$ and the minimum is $f_0(-1, -1) = -2$.
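A quick numerical cross-check (my own sketch): parametrize the constraint circle as $(x, y) = (\sqrt{2} \cos t, \sqrt{2} \sin t)$ and scan $f_0$ over a fine grid:

```python
import numpy as np

# Points on the constraint x^2 + y^2 = 2, parametrized by angle t.
t = np.linspace(0.0, 2.0 * np.pi, 100001)
x = np.sqrt(2.0) * np.cos(t)
y = np.sqrt(2.0) * np.sin(t)
f = x + y

# Extremes of f0 on the circle: approximately 2 at (1, 1)
# and approximately -2 at (-1, -1).
print(f.max(), f.min())
```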

Another example concerns two probability distributions and their cross entropy. The cross entropy function measures the similarity of two probability distributions: the smaller the value, the closer the two distributions. Take a probability distribution $p = [p_1, p_2, ..., p_n]^T$, $p_i \in [0, 1]$, $\sum_{i=1}^n p_i = 1$. For another probability distribution $q = [q_1, q_2, ..., q_n]^T$, $q_i \ge 0$, the cross entropy between $p$ and $q$ is $f_0(q) = -\sum_{i=1}^n p_i \log(q_i)$. We want to find the $q$ that minimizes the cross entropy. With the constraint $\sum_{i=1}^n q_i = 1$, the Lagrangian is: $L(q_1, q_2, ..., q_n, \lambda) = -\sum_{i=1}^n p_i \log(q_i) + \lambda (\sum_{i=1}^n q_i - 1)$. To find the optimum, we set the derivatives to zero:

$$\nabla_{q_1, ..., q_n, \lambda} L(q_1, ..., q_n, \lambda) = 0$$

This is equivalent to:

$$\begin{cases} -\frac{p_i}{q_i} + \lambda = 0, \ i = 1..n \\ q_1 + q_2 + ... + q_n = 1 \end{cases}$$

So $p_i = \lambda q_i$, hence $1 = \sum_{i=1}^n p_i = \lambda \sum_{i=1}^n q_i = \lambda$, so $\lambda = 1$ and $q_i = p_i$ for all $i$. The cross entropy achieves its minimum when the two probability distributions are the same.
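A quick empirical check with random distributions (my own sketch): no sampled $q$ beats $q = p$.

```python
import numpy as np

rng = np.random.default_rng(1)
p = rng.dirichlet(np.ones(6))      # a fixed target distribution

def cross_entropy(p, q):
    return -np.sum(p * np.log(q))

h_pp = cross_entropy(p, p)         # value at the claimed minimizer q = p
for _ in range(1000):
    q = rng.dirichlet(np.ones(6))  # random competing distribution
    assert cross_entropy(p, q) >= h_pp - 1e-12
print("q = p minimizes the cross entropy over sampled q")
```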

Optimality conditions

Complementary slackness

Suppose we have strong duality. Let $x^\star$ be a primal optimal point and $(\lambda^\star, \nu^\star)$ be a dual optimal point. This means:

$$\begin{aligned} f_0(x^\star) &= g(\lambda^\star, \nu^\star) \\ &= \inf_x \left( f_0(x) + \sum_{i=1}^m \lambda_i^\star f_i(x) + \sum_{i=1}^p \nu_i^\star h_i(x) \right) \\ &\le f_0(x^\star) + \sum_{i=1}^m \lambda_i^\star f_i(x^\star) + \sum_{i=1}^p \nu_i^\star h_i(x^\star) \\ &\le f_0(x^\star) \end{aligned}$$

The first line states that the optimal duality gap is zero. The second line is the definition of the dual function. The third line follows since the infimum of the Lagrangian over $x$ is less than or equal to its value at $x = x^\star$. The last inequality follows from $\lambda_i^\star \ge 0, f_i(x^\star) \le 0, i = 1..m$ and $h_i(x^\star) = 0, i = 1..p$. Since the chain starts and ends with $f_0(x^\star)$, the two inequalities must hold with equality.

An important conclusion follows: $\sum_{i=1}^m \lambda_i^\star f_i(x^\star) = 0$, and since each term is nonpositive, $\lambda_i^\star f_i(x^\star) = 0, i = 1...m$. This condition is called complementary slackness. It can be expressed as: $\lambda_i^\star > 0 \Rightarrow f_i(x^\star) = 0$, or equivalently, $f_i(x^\star) < 0 \Rightarrow \lambda_i^\star = 0$.

KKT optimality condition

Assume that the functions $f_0, ..., f_m, h_1, ..., h_p$ are differentiable. For a (possibly nonconvex) problem, assume strong duality holds and let $x^\star, (\lambda^\star, \nu^\star)$ be primal and dual optimal points. Since $x^\star$ minimizes $L(x, \lambda^\star, \nu^\star)$ over $x$, the gradient vanishes at $x^\star$:

$$\nabla f_0(x^\star) + \sum_{i=1}^m \lambda_i^\star \nabla f_i(x^\star) + \sum_{i=1}^p \nu_i^\star \nabla h_i(x^\star) = 0$$

Thus,

$$\begin{aligned} f_i(x^\star) &\le 0, \ i = 1...m \\ h_i(x^\star) &= 0, \ i = 1...p \\ \lambda_i^\star &\ge 0, \ i = 1...m \\ \lambda_i^\star f_i(x^\star) &= 0, \ i = 1...m \\ \nabla f_0(x^\star) + \sum_{i=1}^m \lambda_i^\star \nabla f_i(x^\star) + \sum_{i=1}^p \nu_i^\star \nabla h_i(x^\star) &= 0 \end{aligned}$$

which are called the Karush-Kuhn-Tucker (KKT) conditions. To summarize: for any optimization problem with differentiable objective and constraint functions for which strong duality obtains, any pair of primal and dual optimal points must satisfy the KKT conditions.
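To make the conditions concrete, here is a tiny check on a toy problem of my own: minimize $f_0(x) = x^2$ subject to $f_1(x) = 1 - x \le 0$, where the primal-dual optimum is $x^\star = 1$, $\lambda^\star = 2$:

```python
# Toy problem: minimize f0(x) = x^2 subject to f1(x) = 1 - x <= 0.
# Claimed primal-dual optimum: x* = 1, lam* = 2.
x_star, lam_star = 1.0, 2.0

f1 = 1.0 - x_star                        # primal feasibility: f1(x*) <= 0
grad = 2.0 * x_star + lam_star * (-1.0)  # stationarity: f0'(x*) + lam* f1'(x*)
slack = lam_star * f1                    # complementary slackness term

assert f1 <= 0.0          # primal feasibility
assert lam_star >= 0.0    # dual feasibility
assert abs(grad) < 1e-12  # gradient of the Lagrangian vanishes
assert abs(slack) < 1e-12 # complementary slackness holds
print("KKT conditions satisfied at (x*, lam*)")
```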