
Lecture Notes on Optimization

by Jane Ye, March 14, 2011

Notations:

• A set is any collection of items. For example, R is the set of all real numbers.

• x ∈ R means x is in the set R.

• R+ = {x ∈ R : x ≥ 0} is the set of all nonnegative numbers.

• R++ = {x ∈ R : x > 0} is the set of all positive numbers.

• R2 = {(x, y) : x ∈ R, y ∈ R}.

• R2+ = {(x, y) : x ≥ 0, y ≥ 0}.

• R2++ = {(x, y) : x > 0, y > 0}.

• A ∪ B = {x : either x ∈ A or x ∈ B}.

• A ∩ B = {x : x ∈ A and x ∈ B}.

Convex Sets

• A convex combination of two points x, x0 in Rn is a point x̄ = λx + (1 − λ)x0 for some given λ ∈ [0, 1]. The set of all convex combinations of two points x, x0 is the line segment joining the two points x and x0.

• A set X in Rn is said to be a convex set if for any two points in the set X, all their convex combinations are also contained in the set X. In other words, a set is convex if and only if the line segment joining any two points in the set is contained in the set (a numeric sketch follows this list).

• Note that there is no such thing as a “concave” set.
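As an illustration (a minimal Python sketch; the helper name convex_combinations and the sample points are ours), the snippet below samples convex combinations of two points in R2+ and confirms that each one stays in the set:

    import numpy as np

    def convex_combinations(x, x0, num=5):
        """Sample convex combinations lam*x + (1 - lam)*x0 for lam in [0, 1]."""
        x, x0 = np.asarray(x, float), np.asarray(x0, float)
        return [lam * x + (1 - lam) * x0 for lam in np.linspace(0.0, 1.0, num)]

    # R2+ is convex: every combination of two nonnegative points is nonnegative.
    for p in convex_combinations([2.0, 0.0], [0.0, 3.0]):
        assert (p >= 0).all()
        print(p)  # points on the segment joining (2, 0) and (0, 3)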

Convex functions and concave functions

• A function f defined on a convex set X is concave (or so-called concave down in calculus) if for every two points x, x0 in X and any λ ∈ [0, 1],

f (λx + (1 − λ)x0) ≥ λf (x) + (1 − λ)f (x0).

Geometrically, a function of one variable is concave if every secant line lies on or below the graph of the function, or equivalently if every tangent line lies on or above the graph of the function.

• A function f defined on a convex set X is strictly concave if for every two different
points x, x0 in X and any λ ∈ (0, 1),

f (λx + (1 − λ)x0 ) > λf (x) + (1 − λ)f (x0 ).

Geometrically, a function of one variable is strictly concave if every secant line lies strictly below the graph of the function (except at its endpoints). If the graph of a function has a straight part, then the function cannot be strictly concave.

• A function f is (strictly) convex (or so-called concave up in calculus) if and only if −f is (strictly) concave.

• A strictly concave (convex) function must be concave (convex). But the converse may
not be true. For example the function f (x) = |x| is convex but not strictly convex;
f (x) = −|x| is concave but not strictly concave.
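The defining inequality can be checked numerically on sample points. The sketch below is our illustration (sampling can detect a violation but cannot prove concavity); it confirms that f (x) = −|x| passes the concavity inequality on the samples while f (x) = x³ does not:

    import numpy as np

    def concave_on_samples(f, xs, lams, tol=1e-12):
        """Check f(lam*x + (1-lam)*x0) >= lam*f(x) + (1-lam)*f(x0) on samples."""
        for x in xs:
            for x0 in xs:
                for lam in lams:
                    lhs = f(lam * x + (1 - lam) * x0)
                    rhs = lam * f(x) + (1 - lam) * f(x0)
                    if lhs < rhs - tol:
                        return False
        return True

    xs = np.linspace(-2.0, 2.0, 9)
    lams = np.linspace(0.0, 1.0, 11)
    print(concave_on_samples(lambda x: -abs(x), xs, lams))  # True: concave
    print(concave_on_samples(lambda x: x**3, xs, lams))     # False: not concave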

Level sets, better sets and worse sets

• The level set (or level curve if the function has only two variables) of a function
y = f (x) with a domain X in Rn is the set

L(c) = {x ∈ X : f (x) = c}

for some constant c. If y = f (x) is a production function, then its level set is known as an isoquant, and if it is a utility function, then its level set is known as an indifference curve.

• The better set of the point x0 ∈ Rn for function y = f (x) with a domain X in Rn is
the region
B(x0 ) = {x ∈ X : f (x) ≥ f (x0 )}.

• The worse set of the point x0 ∈ Rn for function y = f (x) with a domain X in Rn is
the region
W (x0 ) = {x ∈ X : f (x) ≤ f (x0 )}.

The first order differentials

• The first order total differential of a function y = f (x1, x2) is

dy = f1 (x1, x2)dx1 + f2 (x1, x2)dx2.

When f is a linear function, dy = Δy. When f is not linear, dy ≈ Δy, and the approximation is better when dx is small compared to x.

• The first order total differential can be used to calculate the slope of a level curve of a function, say z = f (x, y). The procedure is as follows: consider the equation f (x, y) = c, where c is a constant, and take the total differential on both sides. Since c is a constant, its total differential is zero, so we have 0 = fx dx + fy dy. Solving for dy/dx, we have dy/dx = −fx/fy.
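A short sympy sketch of this procedure (the particular function f = x² + xy + y² is our choice for illustration):

    import sympy as sp

    x, y, c = sp.symbols('x y c')
    f = x**2 + x*y + y**2                    # an example f(x, y)

    # dy/dx along the level curve f(x, y) = c is -f_x / f_y.
    slope = -sp.diff(f, x) / sp.diff(f, y)
    print(sp.simplify(slope))                # -(2*x + y)/(x + 2*y)

    # Cross-check against sympy's implicit differentiation of f - c = 0.
    print(sp.simplify(sp.idiff(f - c, y, x) - slope))  # 0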

Quasiconvexity and quasiconcavity of functions

• A function f with domain X in Rn is quasiconcave if for every point x in X, the better set B(x) is convex.

• A function f with domain X in Rn is quasiconvex if for every point x in X, the worse set W (x) is convex. (A numeric sketch follows this list.)

• A concave (convex) function must be quasiconcave (quasiconvex), but the converse may not be true. A bell-shaped function opening down (opening up) is quasiconcave (quasiconvex) but not concave (convex).

• A monotone function is both quasiconvex and quasiconcave.
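The better-set characterization can be illustrated numerically. In one dimension a set is convex exactly when it is an interval, so the sketch below (our illustration, on a finite grid) checks that every sampled better set of the bell-shaped function f (x) = e^(−x²) is an interval:

    import numpy as np

    def better_set_is_interval(f, x0, grid):
        """On a 1-D grid, check that {x : f(x) >= f(x0)} is a contiguous block."""
        idx = np.flatnonzero(f(grid) >= f(x0))
        return idx.size == 0 or np.all(np.diff(idx) == 1)

    grid = np.linspace(-5.0, 5.0, 1001)
    bell = lambda x: np.exp(-x**2)   # bell-shaped: quasiconcave but not concave
    print(all(better_set_is_interval(bell, x0, grid) for x0 in grid[::50]))  # True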

Quadratic Forms:

• Given an n × n matrix A and an n × 1 vector x, the scalar-valued function

q(x) = xᵀAx

is a quadratic form. There is no loss of generality in assuming that the matrix which generates the form is symmetric, since the matrix A∗ with elements

a∗ij = (aij + aji)/2

is symmetric and generates the same quadratic form as A (a numeric check follows below).

• Given a matrix A, one can write the associated quadratic form q(x) = xᵀAx; conversely, given a quadratic function q(x), one can find a symmetric matrix A such that q(x) is the quadratic form associated with the matrix A.
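The no-loss-of-generality claim is easy to verify numerically; in this sketch (our illustration, with a random matrix) a non-symmetric A and its symmetrization A∗ generate the same quadratic form:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(3, 3))       # arbitrary, generally non-symmetric
    A_star = (A + A.T) / 2            # a*_ij = (a_ij + a_ji)/2, symmetric

    x = rng.normal(size=3)
    # Both matrices generate the same quadratic form x^T A x.
    print(np.isclose(x @ A @ x, x @ A_star @ x))  # True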

Signs of quadratic forms: Let A be an n × n symmetric matrix and x an n × 1 vector.

• If

q(x) = xᵀAx > 0 (< 0) for all x ≠ 0,

then q(x) is said to be a positive (negative) definite quadratic form and A is said to be a positive (negative) definite matrix.

• If

q(x) = xᵀAx ≥ 0 (≤ 0) for all x,

then q(x) is said to be a positive (negative) semidefinite quadratic form and A is said to be a positive (negative) semidefinite matrix.

• If q(x) is neither a positive semidefinite nor a negative semidefinite quadratic form, then q(x) is said to be an indefinite quadratic form and A is said to be an indefinite matrix. Equivalently, if there exist two vectors x ≠ y such that xᵀAx and yᵀAy have opposite signs, then A is said to be indefinite.

It is obvious that if x is the zero vector, then the quadratic form q(x) = xᵀAx = 0. The
quadratic form is positive definite if and only if the quadratic form as a function is strictly
convex and has a unique minimum at the origin. The shape of the surface is like a bowl.
The quadratic form is negative definite if and only if the quadratic form as a function is
strictly concave and has a unique maximum at the origin. The shape of the surface is like
a dome. If the quadratic form is positive (negative) semidefinite then the quadratic form
as a function is convex (concave) and has a minimum (maximum) at the origin but the
extremum may not be unique. The shape of the surface may look like a sheet of paper
rolled upwards (downwards). If the quadratic form is indefinite, then the quadratic form as
a function is neither convex nor concave. The shape of the surface is like a saddle.

Tests for definiteness in terms of determinants:
Let An×n be a symmetric matrix of size n. A leading principal submatrix of order k is obtained by deleting the last n − k rows and columns. Let Ak denote the kth order leading principal submatrix. Then the determinant |Ak| is called the kth order leading principal minor.

(a) A is positive definite if and only if |Ak | > 0 for all k = 1, 2, . . . , n;

(b) A is negative definite if and only if the leading principal minors alternate in sign, |A1| < 0, |A2| > 0, |A3| < 0, . . . , i.e., (−1)^k |Ak| > 0 for all k = 1, 2, . . . , n;

(c) if |A1 | > 0, . . . , |An−1 | > 0, |An | = 0, then A is positive semidefinite.

(d) if |A1| < 0, |A2| > 0, . . . , (−1)^(n−1) |An−1| > 0, |An| = 0, then A is negative semidefinite.

(e) if aii ajj − aij² < 0 for some i ≠ j, then A is indefinite.

Note that (c)-(e) above are not given in the textbook.

For example, if n = 3 and

    A = [ a11  a12  a13 ]
        [ a12  a22  a23 ]
        [ a13  a23  a33 ]

is a symmetric matrix, then

|A1| = |a11| = a11 is the leading principal minor of order 1.

|A2| = a11 a22 − a12² is the leading principal minor of order 2.

|A3| = |A| is the leading principal minor of order 3.

• A is positive definite if and only if |A1 | > 0, |A2 | > 0, |A3 | > 0.

• A is negative definite if and only if |A1 | < 0, |A2 | > 0, |A3 | < 0.

• if |A1 | > 0, |A2 | > 0, |A3 | = 0, then A is positive semidefinite.

• if |A1 | < 0, |A2 | > 0, |A3 | = 0, then A is negative semidefinite.

• if either a11 a22 − a12² < 0 or a11 a33 − a13² < 0 or a22 a33 − a23² < 0, then A is indefinite.
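A numeric sketch of the determinant tests (our illustration; the helper names and the example matrix are ours, and the semidefinite cases (c)-(d) would need the extra minor patterns stated above):

    import numpy as np

    def leading_principal_minors(A):
        """Determinants |A_1|, ..., |A_n| of the leading principal submatrices."""
        return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

    def definiteness(A, tol=1e-12):
        """Classify a symmetric matrix via tests (a) and (b)."""
        minors = leading_principal_minors(A)
        if all(d > tol for d in minors):
            return "positive definite"
        if all((-1) ** k * d > tol for k, d in enumerate(minors, start=1)):
            return "negative definite"
        return "needs further checks (semidefinite or indefinite)"

    A = np.array([[2.0, -1.0, 0.0],
                  [-1.0, 2.0, -1.0],
                  [0.0, -1.0, 2.0]])
    print(leading_principal_minors(A))  # [2.0, 3.0, 4.0] (up to rounding)
    print(definiteness(A))              # positive definite
    print(definiteness(-A))             # negative definite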

For a function of n variables y = f (x), x ∈ Rn , the first order differential is

df (x) = f1 (x)dx1 + . . . + fn (x)dxn .

Using the gradient notation we can now express this as

df (x) = ∇f (x)ᵀ dx,

where ∇f (x) is the gradient vector and dx is the vector dx = (dx1 , dx2 , . . . , dxn ). The
second order total differential is

d2 f (x) = (dx)ᵀ ∇2 f (x) dx,

where ∇2 f (x) = (fij (x)) is the Hessian matrix of f .
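A small sympy sketch (the example function is our choice) showing that d2 f is exactly the quadratic form (dx)ᵀ ∇2 f (x) dx of the Hessian:

    import sympy as sp

    x1, x2, dx1, dx2 = sp.symbols('x1 x2 dx1 dx2')
    f = x1**3 + x1*x2 + x2**2            # an example function

    H = sp.hessian(f, (x1, x2))          # Hessian matrix (f_ij)
    dx = sp.Matrix([dx1, dx2])

    d2f = (dx.T * H * dx)[0, 0]          # d^2 f = dx^T H dx
    print(sp.expand(d2f))                # 6*x1*dx1**2 + 2*dx1*dx2 + 2*dx2**2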

Tests of Concavity and Convexity of functions by the signs of the second order total differential:

• A twice continuously differentiable function f (x) is convex (concave) if and only if d2 f (x) ≥ 0 (≤ 0) for all x and all dx;

• if d2 f (x) > 0 (< 0) for all x and all dx ≠ 0, then f is strictly convex (concave).

Note that the second order total differential is the quadratic form generated by the Hessian matrix. Hence testing the definiteness of the Hessian matrix is equivalent to testing the sign of the second order total differential, and we can use the Hessian matrix to test convexity and concavity just as we use the second derivative to test concavity in one-variable calculus. For functions of more than one variable, the Hessian matrix replaces the second derivative; indeed, when n = 1 the Hessian matrix of f is simply the second derivative of f. So it is natural that we have the following test:

Tests of convexity and concavity by definiteness of the Hessian matrix:
For a twice continuously differentiable function f (x),

• f is convex (concave) if and only if the Hessian matrix ∇2 f (x) is positive (negative) semidefinite for all x;

• If the Hessian matrix ∇2 f (x) is positive (negative) definite for all x, then f is strictly
convex (concave).
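For instance (our example), f (x, y) = x² + xy + y² has the constant Hessian [[2, 1], [1, 2]], whose leading principal minors 2 and 3 are positive; by test (a) the Hessian is positive definite for all (x, y), so f is strictly convex. A sympy sketch:

    import sympy as sp

    x, y = sp.symbols('x y', real=True)
    f = x**2 + x*y + y**2

    H = sp.hessian(f, (x, y))
    print(H)                    # Matrix([[2, 1], [1, 2]])
    # Leading principal minors: |A1| = 2 > 0, |A2| = det(H) = 3 > 0,
    # so H is positive definite everywhere and f is strictly convex.
    print(H[0, 0], H.det())     # 2 3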

6
Tests of quasiconvexity and quasiconcavity by bordered Hessian:
The bordered Hessian of y = f (x1, x2) is

    H̄ = [ 0    f1   f2  ]
        [ f1   f11  f12 ]
        [ f2   f12  f22 ]

It can be verified easily that

det(H̄) = 2f1 f2 f12 − f1² f22 − f2² f11.

det(H̄) > 0 for all x =⇒ f (x1 , x2 ) is quasiconcave

det(H̄) < 0 for all x =⇒ f (x1 , x2 ) is quasiconvex.

Curvature properties of the Cobb-Douglas function
For any A > 0, α > 0, β > 0, the Cobb-Douglas function y = A x1^α x2^β is quasiconcave on x1 > 0, x2 > 0. It is strictly concave if and only if α + β < 1. When α + β = 1, it is concave but not strictly concave. If α + β > 1, it is quasiconcave but not concave.
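The quasiconcavity claim can be checked with the bordered-Hessian test above; the sympy sketch below (our illustration) computes det(H̄) for the Cobb-Douglas function symbolically, and the result is positive whenever A, α, β, x1, x2 > 0:

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2', positive=True)
    A, a, b = sp.symbols('A alpha beta', positive=True)
    f = A * x1**a * x2**b                      # Cobb-Douglas function

    f1, f2 = sp.diff(f, x1), sp.diff(f, x2)
    Hbar = sp.Matrix([[0,  f1,                  f2],
                      [f1, sp.diff(f, x1, 2),   sp.diff(f, x1, x2)],
                      [f2, sp.diff(f, x1, x2),  sp.diff(f, x2, 2)]])

    # det(Hbar) simplifies to alpha*beta*(alpha + beta)*f**3/(x1*x2)**2 > 0,
    # consistent with quasiconcavity on x1 > 0, x2 > 0.
    print(sp.simplify(Hbar.det()))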

FOC and SOSC for unconstrained optimization problems
Assume that f is twice continuously differentiable.

• Stationary point: If ∇f (x∗ ) = 0, then x∗ is called a stationary point.

• First order condition for optimality (FOC): A local extremum must be a stationary
point.

• Second order sufficient condition for global extremum (SOSC): Suppose that x∗ is a stationary point, i.e., ∇f (x∗) = 0. If the Hessian matrix ∇2 f (x) is negative semidefinite for ALL x, then f is concave and x∗ is a global maximizer; if the Hessian matrix ∇2 f (x) is negative definite for ALL x, then f is strictly concave and x∗ is the unique global maximizer. If the Hessian matrix ∇2 f (x) is positive semidefinite for all x, then f is convex and x∗ is a global minimizer; if the Hessian matrix ∇2 f (x) is positive definite for all x, then f is strictly convex and x∗ is the unique global minimizer.

• Second order sufficient condition for local extremum (SOSC): Suppose that x∗ is a stationary point, i.e., ∇f (x∗) = 0. If the Hessian matrix ∇2 f (x∗) is negative (positive) definite, then f is concave (convex) around x∗ and x∗ is a local maximizer (minimizer). If the Hessian matrix ∇2 f (x∗) is indefinite, then x∗ is a saddle point (i.e., neither a maximum nor a minimum).
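A worked sympy sketch of the FOC and the global SOSC (the objective is our example):

    import sympy as sp

    x, y = sp.symbols('x y', real=True)
    f = x**2 + y**2 - x*y - 3*x               # example objective

    grad = [sp.diff(f, v) for v in (x, y)]
    print(sp.solve(grad, (x, y), dict=True))  # [{x: 2, y: 1}]: stationary point

    H = sp.hessian(f, (x, y))                 # [[2, -1], [-1, 2]] for all (x, y)
    # Minors 2 > 0 and det(H) = 3 > 0: H is positive definite everywhere,
    # so f is strictly convex and (2, 1) is the unique global minimizer.
    print(H, H.det())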

FOC and SOSC for constrained optimization problems
Consider the constrained optimization problem

(P ) max(min) f (x1 , x2 )

s.t. g(x1 , x2 ) = 0.

The Lagrange function for (P) is the function

L(x1 , x2 , λ) = f (x1 , x2 ) + λg(x1 , x2 ).

• Critical points (or stationary points): A solution (x1∗, x2∗) to the following system of equations is called a critical point, and λ∗ is called a Lagrange multiplier:

∂L/∂x1 = f1 (x1∗, x2∗) + λ∗ g1 (x1∗, x2∗) = 0
∂L/∂x2 = f2 (x1∗, x2∗) + λ∗ g2 (x1∗, x2∗) = 0
∂L/∂λ = g(x1∗, x2∗) = 0.
• First order necessary condition for constrained optimization (FOC): If (x1∗, x2∗) is a local extremum of the problem (P) and ∇g(x1∗, x2∗) ≠ 0, then (x1∗, x2∗) is a critical point.

• Second order sufficient condition (SOSC): The Hessian matrix of the Lagrange function is

    H(x1, x2, λ) = [ L11  L12  g1 ]
                   [ L12  L22  g2 ]
                   [ g1   g2   0  ]

It varies with (x1, x2, λ). If we fix (x1∗, x2∗, λ∗), then we denote

    det H∗ = det H(x1∗, x2∗, λ∗) = 2g1∗ g2∗ L12∗ − (g1∗)² L22∗ − (g2∗)² L11∗,

where L11∗ := f11 (x1∗, x2∗) + λ∗ g11 (x1∗, x2∗), etc. Assume that (x1∗, x2∗, λ∗) satisfies the FOCs. The restricted second order total differential is defined by

    “d2 L∗” = [g2∗, −g1∗] [ L11∗  L12∗ ; L12∗  L22∗ ] [g2∗, −g1∗]ᵀ = (g2∗)² L11∗ − 2g1∗ g2∗ L12∗ + (g1∗)² L22∗,

so we have “d2 L∗” = − det(H∗) (i.e. the signs of the determinant of the matrix H∗ and “d2 L∗” are opposite). Therefore

det(H∗) > 0 =⇒ (x1∗, x2∗) yields a local maximum

det(H∗) < 0 =⇒ (x1∗, x2∗) yields a local minimum.
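A worked sympy sketch (the objective and constraint are our example): maximize f = x1 x2 subject to g = x1 + x2 − 2 = 0.

    import sympy as sp

    x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
    f = x1 * x2                               # example objective
    g = x1 + x2 - 2                           # example constraint g(x1, x2) = 0

    L = f + lam * g
    foc = [sp.diff(L, v) for v in (x1, x2, lam)]
    sol = sp.solve(foc, (x1, x2, lam), dict=True)[0]
    print(sol)                                # {lam: -1, x1: 1, x2: 1}

    # Hessian of L in (x1, x2, lam), evaluated at the critical point.
    H = sp.hessian(L, (x1, x2, lam)).subs(sol)
    print(H.det())                            # 2 > 0: (1, 1) is a local maximum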

The meaning of Lagrange multiplier—shadow price
Consider the parametric constrained optimization problem

P (α) max(min) f (x1 , x2 )

s.t. g(x1 , x2 ) = α,

where α is a parameter.
The Lagrange function for P(α) is the function

L(x1 , x2 , λ) = f (x1 , x2 ) + λ(α − g(x1 , x2 )).

Suppose that we can solve the problem P(α) and find a solution (x1∗(α), x2∗(α)) with Lagrange multiplier λ∗(α), and assume that the derivatives dx1∗/dα and dx2∗/dα exist. Then the value function V (α) = f (x1∗(α), x2∗(α)) is differentiable with derivative equal to the Lagrange multiplier, i.e.,

V′(α) = λ∗(α).

Note that this is a very useful formula, since one does not need to find the value function in order to find its derivative: one gets this information for free from solving the optimization problem.
In particular, let α0 be a fixed number. Then λ∗(α0), the Lagrange multiplier for problem P(α0), measures the rate of change of the maximum (minimum) value of the objective function when α changes from α0 (i.e. when the constraint g(x1, x2) = α0 is relaxed or tightened slightly). For this reason, λ∗(α) is referred to as the “shadow price” of α.
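A sympy sketch verifying V′(α) = λ∗(α) on a concrete problem (our example: maximize x1 x2 subject to x1 + x2 = α):

    import sympy as sp

    x1, x2, lam, alpha = sp.symbols('x1 x2 lam alpha', positive=True)
    f = x1 * x2                                # example objective
    g = x1 + x2                                # constraint g(x1, x2) = alpha

    L = f + lam * (alpha - g)
    foc = [sp.diff(L, v) for v in (x1, x2, lam)]
    sol = sp.solve(foc, (x1, x2, lam), dict=True)[0]

    V = f.subs(sol)                            # value function V(alpha)
    print(sp.simplify(V))                      # alpha**2/4
    print(sp.simplify(sp.diff(V, alpha) - sol[lam]))  # 0, i.e. V'(alpha) = lam*(alpha)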
In the consumer problem (with p1 , p2 given):

max u(x1 , x2 )

s.t. p1 x1 + p2 x2 = m.

The value function V (m) is called the indirect utility function, and the Lagrange multiplier λ∗(m0) is the rate of change of the optimal utility when the budget is tightened or relaxed slightly from m0. It is “the marginal utility of income”.
In the expenditure minimization problem (with p1, p2 given):

min p1 x1 + p2 x2

s.t. u(x1 , x2 ) = ū.

The value function E(ū) is called the expenditure function, and the Lagrange multiplier λ∗(u0) is the rate of change of the cost when the utility constraint is tightened or relaxed slightly from ū = u0. It is “the marginal cost of utility”.
