
Mathematical Methods of Physics

N. Read
0. SOME GENERAL NOTATION
A set $S$ is a collection of distinct objects (elements) in no particular order: $S = \{a, b, c, \ldots\}$. Say $a \in S$ if $a$ is an element of $S$. If $S$ has no elements, then it is the empty set, $S = \emptyset$. Most mathematical objects can be viewed as sets with some additional structure.
Say $S_1$ is a subset of $S_2$, $S_1 \subseteq S_2$, if every element of $S_1$ is an element of $S_2$ ($S_1$ may be equal to $S_2$). Some authors write $\subset$ for $\subseteq$, but I will try to use $S_1 \subset S_2$ only for $S_1 \subseteq S_2$ and $S_1 \neq S_2$. (As always, ask if it is not clear or if you think I made a mistake.)
Intersection and union: the intersection $S_1 \cap S_2$ = set of elements in both $S_1$ and $S_2$. The union $S_1 \cup S_2$ of $S_1$ and $S_2$ is the set of elements that are in either $S_1$ or $S_2$ (or both, in which case such an element, as always, appears only once in the union). Sometimes we also use the complement in $S$ of a subset $S_1 \subseteq S$; it is written as $\overline{S_1}$ or $S \setminus S_1$, and is defined as the set of elements of $S$ that are not in $S_1$.
Given two sets, a function or map $f : S_1 \to S_2$ from $S_1$ to $S_2$ is defined for every element of $S_1$, that is for every $a \in S_1$ there is an element $f(a)$ in $S_2$. $S_1$ is called the domain, $S_2$ the codomain of $f$. The image set of $f$, written $f(S_1)$, is the set of all elements $b$ in $S_2$ such that there is an element $a \in S_1$ such that $f(a) = b$. Thus $f(S_1) \subseteq S_2$.
A function $f$ is called one to one, or one-one, or injective, if $f(a) = f(b)$ only if $a = b$. It is called onto, or surjective, if for all $c \in S_2$ there is an $a \in S_1$ such that $c = f(a)$. If $f$ is both injective and surjective, then it is called bijective, or said to be a bijection. Then the inverse function $f^{-1} : S_2 \to S_1$ exists, with the obvious definition, and is also bijective. (If this fact is not clear to you, construct a proof.)
Notation: $|S|$ = the number of elements in set $S$.
Moving from set theory to logic, we will use the abbreviations $\forall$ for "for all" and $\exists$ for "there exists". So for example, $f$ is onto if $\forall c \in S_2$, $\exists a \in S_1$ such that $f(a) = c$. (In fact, it is "if and only if", but one usually omits "and only if" in a definition.)
Further, I sometimes use $\neg$ for (logical) NOT (negation), $\wedge$ for AND, $\vee$ for OR. Also $\Rightarrow$ for "logically implies", as in $p \Rightarrow q$, or "$p$ implies $q$", where $p$, $q$ are statements. It can also be read as "$q$ follows from $p$", "$p$ only if $q$", "$p$ is sufficient for $q$", or "$q$ is necessary for $p$". $\Leftrightarrow$ means "if and only if" (or "iff"), or "is logically equivalent to", that is $p$ is necessary and sufficient for $q$.
Note that $(p \Rightarrow q) \Leftrightarrow (\neg p \vee q)$ (proof by truth table; note that in expressions like $\neg p \vee q$ the $\neg$ acts only on the symbol immediately to its right, in this case $p$, not on $p \vee q$). Further, rather than proving $p \Rightarrow q$, it is equivalent, and often more convenient, to prove the contrapositive $\neg q \Rightarrow \neg p$, because $(p \Rightarrow q) \Leftrightarrow (\neg q \Rightarrow \neg p)$. [This is not proof by contradiction, which rather is proving $(p \wedge \neg q) \Rightarrow F$ (false). This is logically equivalent also, but more difficult to do correctly.]
I. LINEAR ALGEBRA
A. Vector spaces
Let $R$ stand for the field of real numbers, and $C$ that of complex numbers. Both sets are endowed with their usual arithmetic operations. Let $K$ stand for either $R$ or $C$.
A real or complex vector space $V$ is a set of elements called vectors, with the following additional structures:
1) any vector can be multiplied by a scalar (in $K$) to get another vector in $V$, that is if $v \in V$, $\lambda \in K$, then there exists a vector $\lambda v \in V$. That is, we really have a function from the set of all pairs $(\lambda, v)$ to $V$.
2) any two vectors $v_1, v_2 \in V$ can be added to obtain another vector $v_1 + v_2 \in V$. Again, this is a function from pairs of vectors to vectors.
3) $\exists$ a vector $0 \in V$, such that $0 + v = v$ for all $v$ in $V$. For any $v \in V$, $\exists$ a vector $-v$ such that $-v + v = 0$.
4) These structures obey various properties: $v_1 + v_2 = v_2 + v_1$ for all $v_1, v_2 \in V$ (commutativity of addition). Also associativity of addition, distributivity of multiplication of a vector by a scalar over vector addition and over scalar addition, and compatibility of multiplication in $K$ and into vectors: $\lambda_1(\lambda_2 v) = (\lambda_1 \lambda_2) v$. Finally, for $0 \in K$ and $1 \in K$ (with their usual meanings), $1v = v$, $0v = 0$ for all $v$.
Example: Column vectors, containing $n$ elements of $K$,
\[
a = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}. \qquad (1)
\]
When the vector space operations are defined by 1) scalars are multiplied into all the entries simultaneously, and 2) vectors are added by adding the corresponding elements, these can be shown to obey all the axioms for a vector space.
Definition: A set of $n$ vectors ($n > 0$) $v_1, \ldots, v_n$ in a vector space $V$ is linearly dependent if $\exists$ scalars $\lambda_1, \ldots, \lambda_n$ s.t.
\[
\lambda_1 v_1 + \cdots + \lambda_n v_n = 0, \qquad (2)
\]
with $\lambda_1, \ldots, \lambda_n$ not all zero. They are called linearly independent if no such set of scalars exists (i.e. if such linear combinations are non-zero except when all $\lambda_i$ are zero). Note that if any $v_i$ is zero, then the set is automatically linearly dependent.
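As a small numerical aside (not part of the original notes), linear independence of a finite set of column vectors can be checked by stacking them as the columns of a matrix and computing its rank; the function name below is my own.

```python
import numpy as np

def linearly_independent(vectors, tol=1e-12):
    """Return True if the given equal-length 1-D arrays are linearly independent.

    Stacked as the columns of a matrix, the vectors are independent exactly
    when the matrix rank equals the number of vectors.
    """
    M = np.column_stack(vectors)
    return np.linalg.matrix_rank(M, tol=tol) == M.shape[1]

v1 = np.array([1.0, 0.0, 2.0])
v2 = np.array([0.0, 1.0, 1.0])
v3 = v1 + 2 * v2                               # deliberately dependent on v1 and v2

print(linearly_independent([v1, v2]))          # True
print(linearly_independent([v1, v2, v3]))      # False: eq. (2) holds with coefficients (1, 2, -1)
```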
Idea: using a linearly-independent set of vectors $v_1, \ldots, v_n$, write all vectors in $V$ as linear combinations
\[
v = a_1 v_1 + \cdots + a_n v_n. \qquad (3)
\]
For any set of vectors $v_1, \ldots, v_n$, the set of all linear combinations of the form (3) forms a subspace of $V$. A subspace of a vector space is a subset of the vectors that is itself a vector space (with the same field of scalars $K$).
If the set of vectors $v_1, \ldots, v_n$ is not linearly independent, then a linear combination representing $v$ is not unique: we can add to it a linear combination that equals the zero vector, giving the same $v$. So we want the representation of $v$ as a linear combination to 1) exist for all $v$, so we need a large enough set of vectors (such a set is said to be complete); 2) be unique, that is $a_1, \ldots, a_n$ should be unique, so we need a linearly-independent set.
Definition: a basis is a complete linearly-independent set of vectors $v_1, \ldots, v_n$. The components of a vector $v$ are the coefficients $a_i$ in its representation as in eq. (3).
Note: from the equations, it appears that we have specified that a basis is a finite set of vectors. (By the way, by "finite" I always mean "not infinite", and not the appalling, but common, physics usage for "non-zero".) It is possible to use similar definitions for infinite basis sets, but a more fruitful approach is to consider Hilbert spaces, which we will touch on later. Here we will simply say that the number of vectors in a basis is $\infty$ if no finite basis set exists (that is, if there exist infinite sets of linearly-independent vectors). Thus the cardinality of the basis set is the number $n = 0, 1, 2, \ldots$ if a finite basis set exists, and $\infty$ otherwise.
Theorem: Any two bases for $V$ have the same cardinality (in the sense of the preceding definition). This number is called the dimension of $V$.
Proof: Easy: show that if two bases have different cardinality, then either linear independence or completeness is violated, a contradiction. Simply write elements of one basis as linear combinations of the elements of the other, use linear independence and completeness, etc. QED.
In the remainder of this section, we consider only finite-dimensional vector spaces.
Note that if $\dim V = n$, then any set of $n + 1$ vectors is linearly dependent. Finally, by using a basis, any vector space is equivalent to the space of column vectors, in which the entries in the columns are the coefficients in the basis. This relation is not unique, in the sense that the choice of a basis is not unique; there are infinitely many bases for $V$, in fact (for $n > 0$).
I want to emphasize that nowhere so far have we used a scalar or inner product of vectors, nor the related notion of orthogonality.
B. Linear maps of vector spaces
Our next goal is to formulate the natural version of a function for use with vector spaces. These are the linear maps. A linear map (or linear transformation) between vector spaces $V_1$, $V_2$ (both over the same scalars $K$) is a function $T : V_1 \to V_2$ that obeys $T(\lambda_1 v_1 + \lambda_2 v_2) = \lambda_1 T v_1 + \lambda_2 T v_2$, $\forall \lambda_1, \lambda_2 \in K$, and $\forall v_1, v_2 \in V_1$. Note we write the map on the left, as with functions, but usually drop the bracket, so $T(v)$ becomes $Tv \in V_2$ for $v \in V_1$. Notice how the definition utilizes the vector space structure in $V_1$ and $V_2$.
If we use bases $v_1, \ldots, v_n$ for $V_1$, and $w_1, \ldots, w_m$ for $V_2$, then $Tv_j$ is in $V_2$ for each $j$, and can be expressed as a linear combination of basis elements, giving
\[
T v_j = \sum_i w_i T_{ij}. \qquad (4)
\]
Then by writing any vector in $V_1$ as a column vector (using the basis),
\[
(v) = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}, \qquad (5)
\]
we can do the same for its image under $T$, which is represented by the column vector
\[
(Tv) = \begin{pmatrix} T_{11} & \cdots & T_{1n} \\ \vdots & & \vdots \\ T_{m1} & \cdots & T_{mn} \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}, \qquad (6)
\]
in which the usual way of multiplying a matrix (here $m \times n$) into a column vector to obtain a column vector (with $m$ entries) is understood. Thus,
a matrix can represent a linear map between vector spaces
by using bases. This is the "active" view of a matrix. For a given linear map $T$, its matrix elements depend on the basis chosen. If $V_1 = V_2$, naturally we might choose to use the same basis for both purposes (i.e. for rows and columns).
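As an illustration (my own, not from the notes), the sketch below builds the matrix of a linear map $T : R^3 \to R^2$ column by column, as in eq. (4), and checks that matrix multiplication as in eq. (6) reproduces the map.

```python
import numpy as np

# A linear map T : R^3 -> R^2, given directly as a function of the components.
def T(v):
    x, y, z = v
    return np.array([2 * x + y, y - 3 * z])

# Column j of the matrix of T (standard bases) is T applied to the j-th basis vector.
E = np.eye(3)
T_matrix = np.column_stack([T(E[:, j]) for j in range(3)])

v = np.array([1.0, 4.0, -2.0])
print(T_matrix @ v)    # image computed from the matrix acting on the column of components
print(T(v))            # same result from the map itself
```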
Let us point out that the set of all linear maps from $V_1$ to $V_2$ (or of the corresponding $m \times n$ matrices) is itself a vector space. Given two maps $T_1, T_2 : V_1 \to V_2$, and scalars $\lambda_1$, $\lambda_2$, we can define $\lambda_1 T_1 + \lambda_2 T_2$ by
\[
(\lambda_1 T_1 + \lambda_2 T_2) v = \lambda_1 (T_1 v) + \lambda_2 (T_2 v), \qquad (7)
\]
and $\lambda_1 T_1 + \lambda_2 T_2$ is easily seen to be linear. The dimension of this space is $mn$. The (vector) space of all $m \times n$ matrices with elements in $K$ is sometimes denoted $M_{mn}(K)$, or by $M_n(K)$ when $m = n$.
A basis change can also be represented by a (square) matrix. Suppose that $v_1, \ldots, v_n$ and $w_1, \ldots, w_n$ are two bases for the same space $V$. Then there is a unique set of scalars $\lambda_{ij}$ (why?) such that
\[
\begin{aligned}
w_1 &= \lambda_{11} v_1 + \cdots + \lambda_{n1} v_n, \\
&\;\;\vdots \\
w_n &= \lambda_{1n} v_1 + \cdots + \lambda_{nn} v_n. \qquad (8)
\end{aligned}
\]
If we (formally) place the basis vectors as elements into column vectors, then this is a matrix equation (it contains $\lambda^T$, the transpose of $\lambda$). For the components of a vector, however, if $v = a_1 v_1 + \cdots + a_n v_n = b_1 w_1 + \cdots + b_n w_n$, then
\[
\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} = \begin{pmatrix} \lambda_{11} & \cdots & \lambda_{1n} \\ \vdots & & \vdots \\ \lambda_{n1} & \cdots & \lambda_{nn} \end{pmatrix} \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}, \qquad (9)
\]
and this matrix is the transpose of the one ($\lambda^T$) that we obtained for the change of basis (notice that $(\lambda^T)^T = \lambda$). However, note that here the $a_i$'s appear on the left hand side, whereas the corresponding vectors $v_i$ appeared on the right hand side, so maybe it is not so surprising. This view of a matrix is "passive", just a change of basis.
A square matrix can represent either a linear map of V to itself (active view), or if it is
invertible, a change of basis (passive view).
A note on changing basis on a linear map: suppose that $T : V \to V$ is a linear map. If we have two bases for the same space $V$, $v_1, \ldots, v_n$, and $w_1, \ldots, w_n$, as before, and $T$ is given by the matrix $(T)$ with respect to the $v_i$ basis (as before), and $(T')$ with respect to the $w_i$ basis, then these matrices are related by
\[
(T) = (\lambda)(T')(\lambda)^{-1}, \qquad (10)
\]
where $(\lambda)^{-1}$ is the inverse matrix of $(\lambda)$: $(\lambda)(\lambda)^{-1} = I$, the identity matrix. This inverse matrix always exists. The important point in this equation is that the matrices on the two sides of $(T')$ are inverses of each other. (Such a relation is sometimes called "conjugation by $\lambda$", or said to be a similarity transformation.)
More generally, if $T : V_1 \to V_2$, then we can change basis on $V_1$ and simultaneously on $V_2$ by matrices $\lambda_1$ and $\lambda_2$ similarly, and then the transformation of $T$ becomes $(T) = (\lambda_2)(T')(\lambda_1)^{-1}$. (Of course, $(\lambda_1)$ or $(\lambda_2)$ could be the identity matrix, representing no change of basis.)
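A quick numerical sketch of eq. (10) (my own example; the random matrices below stand in for $(T)$ and the change-of-basis matrix $\lambda$):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
T_v = rng.normal(size=(n, n))     # matrix of T in the v-basis
lam = rng.normal(size=(n, n))     # change-of-basis matrix (invertible with probability 1)

# Matrix of the same map in the w-basis, obtained by inverting eq. (10).
T_w = np.linalg.inv(lam) @ T_v @ lam

# Eq. (10): conjugation by lam recovers the matrix in the original basis.
print(np.allclose(T_v, lam @ T_w @ np.linalg.inv(lam)))                   # True
# Basis-independent quantities such as trace and determinant agree in both bases.
print(np.isclose(np.trace(T_v), np.trace(T_w)),
      np.isclose(np.linalg.det(T_v), np.linalg.det(T_w)))                 # True True
```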
The notion of the inverse map or matrix will now be discussed.
Definition: The identity map $\mathrm{id} : V \to V$ is defined by $\mathrm{id}\, v = v$ $\forall v \in V$. In a basis, it is represented by the identity or unit matrix $1$ or $I$.
Definition: A linear map $T : V_1 \to V_2$ is an isomorphism if an inverse linear map $T^{-1} : V_2 \to V_1$ exists, with $T^{-1} T = \mathrm{id}$ on $V_1$, and $T T^{-1} = \mathrm{id}$ on $V_2$. Note that such a map is a bijection, and the inverse is unique if it exists. For finite-dimensional matrices, a left inverse obeying $T^{-1} T = \mathrm{id}$ is also a right inverse, $T T^{-1} = \mathrm{id}$, and vice versa.
We say a matrix is invertible if an inverse matrix exists, and that spaces are isomorphic if there exists an isomorphism between them.
Theorem: Two finite-dimensional vector spaces over the same field $K$ are isomorphic if and only if they are of the same dimension. A finite-dimensional and an infinite-dimensional vector space are never isomorphic.
Proof: Using the usual bases for $V_1$, $V_2$, if they have the same finite dimension $m = n$, define $T$ by $T v_i = w_i$, and for other vectors using linearity. Then it is clear what $T^{-1}$ is. If the dimensions are different, suppose without loss of generality that $n < m$, and define $T$ the same way as before. Then the image of $T$ is a subspace of $V_2$, and does not contain $w_m$. That is, $T$ is not surjective, and no inverse can exist. The same is true whatever we define the images of the $v_i$ to be. QED
The isomorphism is not unique, as we may see by repeating the construction using different bases for $V_2$.
Going further, suppose we have linear maps $A : V_1 \to V_2$, $B : V_2 \to V_3$; then we can define a composite map $BA : V_1 \to V_3$ by $(BA)v = B(Av)$ for all $v \in V_1$. Using bases, the composite is given by matrix multiplication $BA$, or in index notation by
\[
(BA)_{ij} = \sum_k B_{ik} A_{kj}. \qquad (11)
\]
The process can be repeated again, so if $C : V_3 \to V_4$, we can form for example $C(BA)$, and then verify that
\[
C(BA) = (CB)A \qquad (12)
\]
(associativity of composition of linear maps, or of matrix multiplication).
When applied to $A : V \to V$, we are allowed to form powers of $A$ by matrix multiplication, such as $A^2$, $A^3 = A^2 A = A A^2$, $A^4, \ldots$. This means that we can also form polynomials of $A$, such as $x_1 I + x_2 A + \cdots + x_m A^m$, and even infinite series provided we consider convergence. For example, we can define the exponential of the (map or matrix) $A$ (or $tA$, for $t$ a scalar), which is itself a map or matrix, by
\[
\exp tA = \sum_{r=0}^{\infty} \frac{t^r A^r}{r!}. \qquad (13)
\]
In this example the series converges for any $t$ and (finite dimensional) $A$, just like the exponential series of a real or complex number does. Note this approach to the definition of functions of a map or matrix from $V$ to $V$ does not require that the matrix be diagonalizable (a concept we will discuss later). But it only applies to functions that can be defined by a power series; this will also be discussed later.
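A minimal sketch (my own, assuming numpy and scipy are available) of eq. (13): the truncated series agrees with scipy's matrix exponential, here for the generator of rotations in the plane.

```python
import numpy as np
from scipy.linalg import expm

def exp_series(A, t=1.0, terms=30):
    """Partial sum of exp(tA) = sum_r (tA)^r / r!, eq. (13)."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for r in range(1, terms):
        term = term @ (t * A) / r          # builds (tA)^r / r! iteratively
        result = result + term
    return result

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])
t = 0.7
print(np.allclose(exp_series(A, t), expm(t * A)))   # True: series agrees with scipy
print(exp_series(A, t))                             # approximately [[cos t, sin t], [-sin t, cos t]]
```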
C. General theory of linear maps
Physical applications of matrices are numerous and well-known. A typical problem is to solve a linear system of equations, like
\[
\begin{aligned}
a_{11} x_1 + a_{12} x_2 + \ldots + a_{1n} x_n &= y_1, \\
&\;\;\vdots \\
a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n &= y_m,
\end{aligned} \qquad (14)
\]
which needs to be solved for the $x_j$, where the sets of $a_{ij}$ and $y_i$ are assumed to be known. Questions immediately arise: does a solution for the $x_j$ even exist? If so, is it unique? How do we find solutions efficiently in practice?
Using index notation we write this as $\sum_j a_{ij} x_j = y_i$, and recognize it as of the form $Ax = y$ for vectors $x$, $y$ and a linear map $A$; $A$ is a map from $V_1$ to $V_2$, where $\dim V_1 = n$, $\dim V_2 = m$, in which $x$ and $y$ (respectively) live.
Ideally, we would like to find an inverse $A^{-1}$ so that the solution is simply
\[
x = A^{-1} y, \qquad (15)
\]
exactly as we would solve the simple algebraic equation $ax = y$ for $x$. The difficulty is that $A^{-1}$ or $x$ may not exist (as in the case of the algebraic equation when $a = 0$) or may not be unique. Further, calculating the solution is harder than in the case of $x = y/a$.
Actually, given a particular $y$, we don't really need $A^{-1}$; that would give us the solution for any $y$. It may be that solution(s) for $x$ exist for some vectors $y$ and not for others, so we will need to distinguish between these cases, and solve the cases that can be solved. When this occurs, there is no inverse $A^{-1}$.
This discussion can be repeated for differential operators $D$ in place of a map $A$ (i.e. to solve linear differential equations), in which case the inverse $D^{-1}$ is called a Green's function. This will be taken up later, but keep in mind that a Green's function is nothing but an inverse operator (and operators are just linear maps that act on functions instead of on vectors).
To address these issues, we first introduce some more terminology. Like any function, the linear map $A$ has a domain $V_1$, a codomain $V_2$, and an image $A V_1 \subseteq V_2$, written $\operatorname{im} A$. The image $\operatorname{im} A$ is not only a subset of $V_2$, it is a subspace. This follows directly from the fact that $A$ is linear. Clearly, in our problem, a solution for $x$ can exist if and only if $y$ lies in $\operatorname{im} A$.
Similarly, we can define the kernel of $A$, written $\ker A \subseteq V_1$, to be the subspace of $V_1$ containing all vectors annihilated by $A$, that is $v \in \ker A \Leftrightarrow Av = 0 \in V_2$. This is a subspace because if $A v_1 = 0$ and $A v_2 = 0$, then $A(\lambda_1 v_1 + \lambda_2 v_2) = 0$ for all $\lambda_1$, $\lambda_2$. The kernel has an immediate application to our problem: if $\ker A$ is non-zero, then if any solutions for $x$ exist, they are not unique, as $x + v$ is also a solution, for any $v \in \ker A$. Notice how similar this is to the addition of a "complementary function" to the "particular solution" in the solution of an inhomogeneous linear differential equation (to be discussed later).
As vector spaces, $\ker A$ and $\operatorname{im} A$ are finite dimensional, and
Theorem: $\dim \ker A + \dim \operatorname{im} A = \dim V_1$.
We could prove this now, but I prefer to delay the proof until we discuss some additional concepts.
The dimension of $\operatorname{im} A$ is also called the rank of $A$, written $\operatorname{rk} A$. Notice that $\operatorname{im} A$ is spanned by the images of a basis set in $V_1$ (however, the images of these vectors may not be linearly independent, which is why $\operatorname{rk} A = \dim \operatorname{im} A$ can be less than $\dim V_1$). Equivalently, we can define the rank of $A$ as the dimension of the subspace of $V_2$ spanned by the columns of $A$, viewed as vectors in $V_2$. We can also define the kernel of $A$ as the space of linear relations among the columns of $A$ (together with the zero vector).
So now we can conclude that if $m > n$, then $\operatorname{rk} A < m$ (strictly less than), and $\exists y$ for which there is no solution for $x$. Similarly, if $m < n$, then $\operatorname{rk} A \leq m < n$, so $\ker A \neq 0$, and if solutions exist, they are not unique. Basically these conditions say that the number of unknowns ($n$) is less than/greater than the number of equations ($m$).
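A numerical sketch of the rank and kernel statements above (my own example; the kernel is extracted from the singular value decomposition, a standard numerical tool that is not needed for the argument here):

```python
import numpy as np

def kernel_basis(A, tol=1e-12):
    """Columns of the returned array form an orthonormal basis of ker A
    (right singular vectors belonging to vanishing singular values)."""
    _, s, vh = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return vh[rank:].T

# m = 2 equations, n = 3 unknowns; the second column is twice the first.
A = np.array([[1.0, 2.0, 0.0],
              [2.0, 4.0, 1.0]])
K = kernel_basis(A)
r = np.linalg.matrix_rank(A)

print(r, K.shape[1])                   # rank 2, kernel of dimension 1
print(r + K.shape[1] == A.shape[1])    # True: dim ker A + dim im A = dim V_1
print(np.allclose(A @ K, 0))           # True: kernel vectors are annihilated by A
```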
Now concentrate on the square case, $m = n$, viewed as $A : V \to V$ using the same basis for rows and columns of $A$. We have existence and uniqueness if and only if $\ker A = 0$ (let me write $0$ now for the zero vector, and for the zero vector space, as the meaning should be clear in context), that is if $\operatorname{rk} A = n$. This holds iff all the columns of $A$ are linearly independent, and then $A^{-1}$ exists as a map/matrix.
Geometrically, this condition means that the images of the basis vectors form a linearly-independent set. Picture them as forming a parallelepiped in $V$ (or a parallelogram in the 2D case). If we can describe the volume of this object, then it should be non-zero only when the image vectors are linearly independent. This is accomplished by the determinant.
D. Determinants
Formally, we can define the determinant in terms of a multilinear function of $n$ vectors in $V$ that takes values in $K$. Define $f : V \times V \times \cdots \times V \to K$ (written as $f(v_1, v_2, \ldots, v_m) \in K$) as multilinear if, when viewed as a function of $v_i$ with all other arguments (inputs) $v_j$, $j \neq i$, held fixed, it is a linear map of $V$ to $K$, and this holds for all $i$ (note that the definition of linearity holds for $f : V \to K$ just as for $f : V_1 \to V_2$, as we can view $K$ as a one-dimensional vector space). (Note that this clearly should hold with $m = n = \dim V$ for our volume function.) Further, we want the determinant to vanish if any two of its inputs are equal, since then the parallelepiped collapses into a hyperplane. This then implies that $f(v_2, v_1, v_3, \ldots, v_n) = -f(v_1, v_2, v_3, \ldots, v_n)$ by using linearity (make the first two inputs both $v_1 + v_2$, and see what happens), and similarly for the exchange of any two inputs. It is then said to be alternating or antisymmetric. We now define the determinant of a matrix $A$ (in this subsection, we write simply $A$ for the matrix $(A)$) to be a multilinear, alternating function of the $n$ columns of the matrix that is equal to $1$ for the identity matrix. It turns out that this makes it unique (without the last condition, it would be unique up to a constant
factor). It is given by the standard formula:
\[
\det A = \sum_{i,j,k,\ldots,z} \epsilon_{ijk\cdots z}\, a_{i1} a_{j2} a_{k3} \cdots a_{zn}, \qquad (16)
\]
or by several other formulas. Here $\epsilon$ is the usual tensor in $n$ dimensions,
\[
\epsilon_{ijk\cdots z} = \begin{cases} 0 & \text{if any two of } i, j, k, \ldots, z \text{ are equal,} \\ \text{the sign of the permutation } 123\cdots n \to ijk\cdots z & \text{otherwise.} \end{cases} \qquad (17)
\]
The sign of a permutation is $+1$ for the identity, that is if $ijk\cdots z$ is $123\cdots n$, and changes sign for each transposition (i.e. exchange of two indices). To see it as an alternating multilinear function of the column vectors $a_j$ (each with elements $a_{ij}$), we may write
\[
f(a_1, \ldots, a_n) = \det A. \qquad (18)
\]
You may check the multilinearity and alternation properties from these formulas.
The determinant can be expanded in "minors" in the familiar way, expanding along a row (or column). In index notation this becomes
\[
\det A = \sum_{j=1}^{n} (-1)^{j+1} a_{1j} M_{1j}. \qquad (19)
\]
This is for expansion along the first row. $M_{ij}$, the $i,j$th minor, is the determinant of the matrix obtained by deleting row $i$ and column $j$ from $A$. More generally, we can use any row or column, and if we define the $i,j$th cofactor as $C_{ij} = (-1)^{i+j} M_{ij}$, we have
\[
\det A = \sum_{j=1}^{n} a_{ij} C_{ij} = \sum_{i=1}^{n} a_{ij} C_{ij}, \qquad (20)
\]
which holds for any $i$ in the first form, and any $j$ in the second. These can be proved using the preceding definition.
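For illustration only (not an efficient algorithm), here is a direct implementation of the expansion along the first row, eq. (19), checked against numpy's determinant:

```python
import numpy as np

def det_cofactor(A):
    """Determinant by recursive expansion along the first row, eq. (19)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)   # delete row 1 and column j+1
        total += (-1) ** j * A[0, j] * det_cofactor(minor)      # (-1)^{1+(j+1)} = (-1)^j
    return total

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 4.0],
              [0.0, 1.0, 1.0]])
print(det_cofactor(A), np.linalg.det(A))   # both give -3.0 (up to rounding)
```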
We can define the determinant of a linear map $A : V \to V$ to be the determinant of the matrix of $A$ using a basis. However, we will have to prove that the result is independent of the choice of basis, so that it is well-defined. We will come back to this in a moment.
Important properties of the determinant: there aren't many.
\[
\det AB = \det A \det B = \det BA, \qquad (21)
\]
and by substituting $B = \lambda I$, we obtain
\[
\det \lambda A = \lambda^n \det A. \qquad (22)
\]
Proof that $\det AB = \det A \det B = \det BA$: we defined $\det A$ by using all the columns of $A$: in the $j$th column $a_j$ ($j = 1, \ldots, n$), the entries are $a_{ij}$ as $i$ runs from $1$ to $n$. Similarly, $\det AB$ can be written using the columns of $AB$, which in index notation are $\sum_k a_{ik} b_{kj}$. Each of these is a linear combination (with coefficients $b_{kj}$) of the columns $a_k$ of $A$. Recall that the determinant is a multilinear function of its $n$ input vectors. So we can expand out the determinant in linear combinations of the determinants of various choices from among the columns of $A$; but choosing the same column twice gives zero. For other choices, we get $\pm \det A$. Then it is not too hard to see (with a little care and patience) that the coefficients, each of which is a product of $n$ elements of $B$, sum up to give the required result.
We needed to give the proof for matrices, not for linear maps, because the proof that the determinant of a linear map is independent of the basis uses $\det AB = \det A \det B$. Recall that under a change of basis, the matrix $A$ in the first basis becomes $A' = \lambda A \lambda^{-1}$ in the second, where $\lambda$ is the change of basis matrix, and is invertible. It follows that $\det \lambda$ is not zero, $\det \lambda^{-1} = 1/\det \lambda$, and so $\det A' = \det A$.
There are no similarly simple results for $\det(A + B)$ in general. However, if $A$ contains two square diagonal blocks $A_1$, $A_2$ of sizes $n_1$, $n_2$ (i.e. $a_{ij} = 0$ if $i \leq n_1$ and $j > n_1$, or if $i > n_1$ and $j \leq n_1$), which means that $A$ is the (direct) sum of two block diagonal matrices, one with $A_1$ and $0$ on the diagonal, the other with $0$ and $A_2$ on the diagonal, then we have
\[
\det A = \det A_1 \det A_2. \qquad (23)
\]
(In fact, this can also be viewed as a special case of the product result, because this block diagonal matrix can also be factored into a matrix with $A_1$ and $I$ on the diagonal times (matrix product!) one with $I$ and $A_2$ on the diagonal.)
The inverse of a matrix can be written in terms of determinants. We have Cramer's formula
\[
(A^{-1})_{ij} = C_{ji} / \det A. \qquad (24)
\]
This clearly makes no sense if $\det A = 0$, but in all other cases this gives the inverse. If $\det A = 0$, we say that $A$ is degenerate, or singular, or non-invertible. If not, put "non" in front of these terms (and cancel the double negative).
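A sketch of Cramer's formula, eq. (24), with the cofactors computed by brute force (numpy's determinant is used for the minors; this is purely illustrative, not a practical way to invert):

```python
import numpy as np

def inverse_by_cofactors(A):
    """Inverse via (A^{-1})_{ij} = C_{ji} / det A, eq. (24)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)    # cofactor C_ij
    return C.T / np.linalg.det(A)                               # note the transpose (indices ji)

A = np.array([[2.0, 1.0],
              [5.0, 3.0]])
print(inverse_by_cofactors(A))   # [[ 3. -1.] [-5.  2.]]
print(np.linalg.inv(A))          # same
```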
Summary: Putting together what we have seen, we now know that, if $A$ is a linear map $A : V \to V$, or an $n \times n$ matrix,
\[
A \text{ is invertible, i.e. } A^{-1} \text{ exists as a linear map/matrix, if and only if } \det A \neq 0. \qquad (25)
\]
An important corollary to this is that
\[
Ax = 0 \qquad (26)
\]
has a non-zero solution for $x$ if and only if $\det A = 0$. That is, if $\det A \neq 0$, the kernel of $A$ is the trivial subspace, while if $\det A = 0$, then the kernel of $A$ is non-trivial (and there are reverse statements for the image space: it is the whole space, or not). It is difficult to overstate the importance of these statements: they tell us a lot about existence and uniqueness of solutions to linear equations involving $A$.
E. Gaussian elimination
Cramer's formula may be useful in the course of a theoretical argument. For computation, however, it is very inefficient (for large $n$). Calculating one determinant is not too bad, but here one has to calculate $n^2$ of them (of size $(n-1) \times (n-1)$). A much more efficient procedure is to use Gaussian elimination. In this procedure, to solve a set of linear equations one adds a multiple of one equation to another, seeking to eliminate some variables so that the others are determined, one by one. (I refrain from details, as these are described in many textbooks, and you probably learned this in high school.) Variations on the procedure exist, but the best when doing it by hand on paper is to keep it simple so as to avoid errors. Difficulties may be encountered when one of the diagonal elements of the matrix (or its version after earlier transformations) is zero, and something clearly must fail if $\det A = 0$.
The process is equivalent to multiplying the matrix form of the equation, $Ax = y$, on the left (where else?) by an elementary lower-triangular matrix of a form like
\[
L = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & & \vdots \\ \vdots & * & 1 & & \\ & & & \ddots & 0 \\ 0 & 0 & \cdots & 0 & 1 \end{pmatrix}, \qquad (27)
\]
which has $1$s on the diagonal, and a single non-zero entry (marked $*$ above) anywhere below the diagonal. After multiplying by enough such matrices, and then by sufficiently many upper-triangular matrices $U$ (with a single non-zero entry above the diagonal), we aim to reduce $A$ to diagonal elements only, at which point we can find all the $x_i$ by dividing by these diagonal elements, unless any of them are zero. That is, we multiply by the diagonal matrix whose entries are the inverses of those diagonal elements.
The same procedure can be used to calculate $\det A$. Multiplying $A$ on the left by a matrix of the form $L$, the determinant is unchanged because $\det L = 1$. Notice that the determinant of an upper- (or lower-) triangular matrix, that is one satisfying $a_{ij} = 0$ if $i > j$ (resp., if $i < j$), is equal to the product of diagonal elements, by expanding along the first column (resp., row). Hence reducing $A$ to upper or lower triangular form is enough to obtain the determinant; it is not necessary to make it diagonal.
We can obtain the inverse itself, instead of solving $Ax = y$. Suppose we find a sequence of row operations $L_i$ and $U_j$'s (still multiplying on the left), such that $A$ becomes diagonal: $L_1 L_2 \cdots U_N A = D$, where $D$ is diagonal (i.e. $D_{ij} = 0$ if $i \neq j$). Then division by $D$ gives $D^{-1} L_1 \cdots U_N A = I$. So it must be that $A^{-1} = D^{-1} L_1 \cdots U_N$. Again, this requires that $D$ not contain vanishing diagonal elements, and as $\det A = \det D$, this is also the condition $\det A \neq 0$.
These techniques are much more efficient in practice for large $n$.
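A minimal sketch of Gaussian elimination for $Ax = y$ (my own code; it adds row swaps, i.e. partial pivoting, to avoid dividing by zero or tiny pivots, which does not change the solution):

```python
import numpy as np

def gauss_solve(A, y):
    """Solve A x = y by forward elimination plus back-substitution; assumes det A != 0."""
    A = np.array(A, dtype=float)
    y = np.array(y, dtype=float)
    n = len(y)
    for k in range(n - 1):
        p = k + np.argmax(np.abs(A[k:, k]))            # choose the largest pivot in column k
        A[[k, p]], y[[k, p]] = A[[p, k]], y[[p, k]]     # swap rows k and p
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]                       # multiple of row k to subtract (an "L" step)
            A[i, k:] -= m * A[k, k:]
            y[i] -= m * y[k]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):                      # back-substitution (the "U" steps)
        x[i] = (y[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

A = [[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]]
y = [8.0, -11.0, -3.0]
print(gauss_solve(A, y))        # [ 2.  3. -1.]
print(np.linalg.solve(A, y))    # same
```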
A final note on changing basis on a matrix: suppose that $A : V_1 \to V_2$ is a linear map; then we can change basis on $V_1$ and simultaneously on $V_2$ by matrices $\lambda_1$ and $\lambda_2$ similarly, and then the transformation of $A$ is $(A') = (\lambda_2)(A)(\lambda_1)^{-1}$. In this case, $\det A$ is not invariant in general (but it is if $V_1 = V_2$, so $\lambda_1 = \lambda_2$). In Gaussian elimination, we viewed $A$ as a map $V_1 \to V_2$, and performed basis transformations on $V_2$ only, so $(\lambda_1) = I$. (It is also legitimate to change basis on $V_1$ when solving $Ax = y$. That means we solve for known linear combinations of the original $x_i$'s, which might be more convenient, and is just as good as solving for the $x_i$ themselves, because we can invert the change of basis to recover the $x_i$.)
F. Eigenvalues, eigenvectors, and diagonalization
Now we begin to consider the problem of trying to diagonalize the matrix of a linear map $A : V \to V$ by finding a basis on $V$ in which it is diagonal, using the transformation rule of eq. (10). A linear map $V \to V$ is sometimes called an endomorphism.
A common problem that arises is to find a non-zero vector $x$ and a corresponding scalar $\lambda$ such that
\[
Ax = \lambda x. \qquad (28)
\]
That is, we only know $A$ and must solve for both $x$ and $\lambda$. Then $x$ is called an eigenvector and $\lambda$ is called an eigenvalue (and said to be associated with the eigenvector $x$); the problem of finding a solution for the pair $\lambda$, $x$ is called an eigenvalue problem. (Note an eigenvector is required to be non-zero, to avoid the trivial solution.) If we can find a basis of eigenvectors of $A$, then the matrix of $A$ in that basis will be diagonal with entries $\lambda_i$, the corresponding eigenvalues.
Remark: if $\det A = 0$, then $A$ has eigenvectors of eigenvalue $0$. Why?
The approach that must be followed in general is in two steps. Before starting we write the problem in the form $(A - \lambda I)x = 0$.
Step 1: This equation says that we look for vectors $x$ in the kernel of $A - \lambda I$. The kernel is non-zero if and only if $\det(A - \lambda I) = 0$. Solving this equation for $\lambda$ will give us all the eigenvalues. Working out the determinant gives us the characteristic equation,
\[
\lambda^n - a_1 \lambda^{n-1} + \cdots + (-1)^n a_n = 0, \qquad (29)
\]
where $a_1, \ldots, a_n$ are a set of numbers related to the elements of $A$, in particular $a_n = \det A$ and $a_1 = \operatorname{tr} A = \sum_{i=1}^{n} A_{ii}$ (the latter is called the trace of $A$). The left-hand side is also called the characteristic polynomial of $A$. Notice that the characteristic polynomial is invariant under a change of basis for the matrix $(A)$ (because $\det(A - \lambda I)$ is), so is an invariant of the linear map $A$. This gives us a set of $n$ numerical invariants $a_i$ of $A$, which include $\det A$, $\operatorname{tr} A$, and others.
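A quick numerical check of these invariants (my own example, using numpy; np.poly returns the coefficients of the characteristic polynomial $\det(\lambda I - A)$):

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

eigvals = np.linalg.eigvals(A)
coeffs = np.poly(A)   # [1, c_1, ..., c_n] with det(lambda I - A) = lambda^n + c_1 lambda^{n-1} + ...

print(np.isclose(np.sum(eigvals), np.trace(A)))          # sum of eigenvalues = tr A = a_1
print(np.isclose(np.prod(eigvals), np.linalg.det(A)))    # product of eigenvalues = det A = a_n
print(np.isclose(-coeffs[1], np.trace(A)))               # c_1 = -a_1 in the convention of eq. (29)
```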
The following result is of interest:
Theorem: (Cayley-Hamilton) $A$ satisfies its own characteristic equation, that is $A^n - a_1 A^{n-1} + \cdots + (-1)^n a_n I = 0$. Note that this is a matrix equation, and is less obvious than you may think. (Proof omitted.)
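Although the proof is omitted, the Cayley-Hamilton theorem is easy to check numerically for a small example (a sketch of my own, not part of the notes):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
n = A.shape[0]

c = np.poly(A)   # coefficients of det(lambda I - A) = lambda^n + c_1 lambda^{n-1} + ... + c_n

# Substitute A for lambda, using matrix powers (the constant term multiplies the identity).
p_of_A = sum(c[k] * np.linalg.matrix_power(A, n - k) for k in range(n + 1))
print(np.allclose(p_of_A, 0))   # True: A satisfies its own characteristic equation
```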
If we are willing to use complex values for $\lambda$, then the Fundamental Theorem of Algebra tells us that a polynomial equation of degree $n$ has exactly $n$ solutions (or roots of the polynomial), possibly some being repeated [example: $x^3 - 3ax^2 + 3a^2 x - a^3$ has roots $x = a$, $a$, and $a$, because it equals $(x - a)^3$]. That is, any degree $n$ polynomial can be factored into $\prod_{i=1}^{n} (\lambda - \lambda_i)$, in which some factors may occur more than once. This assures us that complex solutions for the eigenvalues do exist. But if we are working with a real vector space, eigenvalues that are not real are problematic.
Step 2: For each distinct root $\lambda_i$, solve
\[
Ax - \lambda_i x = 0 \qquad (30)
\]
as a linear system of equations to find $x$. For example, try Gaussian elimination; you will not get a unique solution for $x$, but only relations among its entries, because the solutions form a subspace (the kernel of $A - \lambda_i I$) of $V$, not just a single vector. But at least the relations can be made simpler than the equation with which we started. At least one solution does exist for each distinct $\lambda_i$, because $\ker(A - \lambda_i I)$ is non-zero. If the kernel is one dimensional, you will get a solution something like $\beta_1 x_1 = \beta_2 x_2$, $\beta_2 x_2 = \beta_3 x_3$, and so on, where the $\beta_i$ are numbers in $K$, some (but not all) of which might be zero. This determines $x$ up to multiplication by a scalar.
If $\lambda_i$ is complex, then the resulting eigenvector will have complex entries even when the matrix $(A)$ is real. If our vector space was supposed to be real [so $(A)$ has real entries], then these do not lie in (or do not exist in) that space. Put another way, complex solutions may not be physical. We can handle this as follows. First, because $(A)$ is real, a root $\lambda$ of the characteristic equation [i.e. eigenvalue of $(A)$] may be a real number, or if it has non-zero imaginary part then $\bar{\lambda}$ is also a root (eigenvalue). Furthermore, from the eigenvalue equation $(A)v = \lambda v$ viewed as an equation of column vectors, we see by complex conjugating the components that $\bar{v}$ (the complex conjugate column vector to $v$) is also an eigenvector, with eigenvalue $\bar{\lambda}$ (again, we see that non-real $\lambda$ come in complex conjugate pairs). By taking the real and imaginary parts of $v$, neither of which vanishes (why?), we obtain two real non-zero vectors, and the two-dimensional space they span is mapped into itself by $A$. So for real vector spaces, diagonalization may not be possible even when there are $n$ eigenvectors, because those vectors may be complex. The best we can do in such cases is to reduce it to $2 \times 2$ blocks on the diagonal. Examples of this will be discussed later in the context of orthogonal (or rotation) matrices.
More importantly, there may not be $n$ linearly-independent eigenvectors even in the case of $K = C$. (A vanishing linear combination, with all coefficients non-zero, of eigenvectors can always be analyzed into linear dependencies of eigenvectors all having the same eigenvalue. Prove this.) If there are, they form a basis of eigenvectors, and the matrix of $A$ relative to that basis is diagonal. This does not always occur. In general, the closest to a diagonal matrix one can come is described by the following
Theorem: (Jordan canonical form) For $K = C$, for any map $A : V \to V$, $\exists$ a basis in which $A$ has the block form
\[
A = \begin{pmatrix} \Lambda_1 & 0 & \cdots & 0 \\ 0 & \Lambda_2 & & 0 \\ \vdots & & \ddots & \\ 0 & 0 & & \Lambda_k \end{pmatrix}, \qquad (31)
\]
where the $0$s are zero matrices, and the $\Lambda$s are square matrices each of the form (called a Jordan block)
\[
\Lambda = \begin{pmatrix} \lambda_i & 1 & 0 & \cdots & 0 & 0 \\ 0 & \lambda_i & 1 & & & 0 \\ & & \lambda_i & 1 & & \\ & & & \ddots & \ddots & \\ & & & & \lambda_i & 1 \\ 0 & 0 & 0 & \cdots & 0 & \lambda_i \end{pmatrix}. \qquad (32)
\]
Each distinct eigenvalue $\lambda_i$ may occur in more than one Jordan block, but the total number of occurrences of each distinct eigenvalue $\lambda_i$ on the diagonal of $A$ must equal its multiplicity as a solution of the characteristic equation.
Proof: Omitted. But you should check that the characteristic equation of $A$ has the expected factors.
An $m \times m$ Jordan block is of the form $\lambda_i I + N$, where $\lambda_i I$ is diagonal, and $N$ obeys $N^m = 0$ (but for a Jordan block, no lower power of $N$ is zero). A map or matrix obeying $N^m = 0$ for some positive integer $m$ is called nilpotent (the case $m = 1$ means $N = 0$, which is not very interesting). A nilpotent matrix is like a raising or lowering operator in quantum mechanics. It can always be brought to the form of the block above (with $\lambda_i$ replaced by $0$) by a change of basis, or to a block diagonal form with several such blocks, which may have different sizes. Notice that as $\det(\lambda_i I + N) = \lambda_i^m$, the block is invertible if $\lambda_i \neq 0$ (whereas $N$ is not invertible).
Also, for each Jordan block, the column vector $(1, 0, 0, \ldots, 0)$ is an eigenvector, but the other basis vectors for the same block are not. There is just one eigenvector per Jordan block. Moreover, the number of distinct eigenvalues must be $\leq$ the number of Jordan blocks. Consequently, if there are no repeated roots of the characteristic polynomial, all Jordan blocks have size $1$, and there exists a basis in which the map/matrix is diagonal (i.e. there exists a complete set of eigenvectors).
Ex: If $m = 2$, an example is
\[
A = \begin{pmatrix} a & 1 \\ 0 & a \end{pmatrix}. \qquad (33)
\]
The characteristic equation is $(\lambda - a)^2 = 0$. $A$ has an eigenvector $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ with eigenvalue $a$. If we try to solve $Ax = ax$ to find all (linearly-independent) eigenvectors, we find that $x_2 = 0$, so this is the only eigenvector, apart from multiplying by a scalar. (This generalizes: if $A$ is in the form of a single Jordan block, one can easily show that for an eigenvector $x_2 = x_3 = \cdots = x_m = 0$.) Note that $A$ is invertible if $\det A \,(= a^2) \neq 0$, using the formula for the inverse of a $2 \times 2$ matrix above.
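For comparison, a computer algebra system can produce the Jordan canonical form directly. A small sketch with sympy (the 4×4 matrix is my own example, built from the Jordan block of eq. (33) with $a = 2$ together with two 1×1 blocks):

```python
import sympy as sp

A = sp.Matrix([[2, 1, 0, 0],
               [0, 2, 0, 0],
               [0, 0, 3, 0],
               [0, 0, 0, 5]])

P, J = A.jordan_form()                      # A = P J P^{-1}, with J in Jordan canonical form
sp.pprint(J)                                # one 2x2 block with eigenvalue 2, two 1x1 blocks
print(sp.simplify(P * J * P.inv() - A) == sp.zeros(4, 4))   # True
```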
To obtain the Jordan canonical form basis for $A$ in the general case, we can proceed as follows. We focus on one of the eigenvalues $\lambda_i$ (roots of the characteristic equation). We find a linearly-independent set of solutions to $(A - \lambda_i I)x = 0$. If the number of these vectors equals the multiplicity of the root $\lambda_i$, we are done and we can go on to the next eigenvalue. If not, we look for solutions to $(A - \lambda_i I)^2 x = 0$. A linearly-independent set of solutions will include (or at least will span) the eigenvectors already obtained, but there will be additional "generalized eigenvectors". Each of these additional vectors will be mapped by $A - \lambda_i I$ into the space spanned by the true eigenvectors, as predicted by the Jordan canonical form. If the size of this set equals the multiplicity of $\lambda_i$, we are done, but if not we repeat by studying $(A - \lambda_i I)^3 x = 0$, and so on. Eventually, by studying how these vectors are mapped, we can find the basis that brings $A$ to Jordan canonical form.
Application: we can prove the formula
\[
\det A = e^{\operatorname{tr} \ln A} \qquad (34)
\]
for any invertible linear map $A$ (invertible implies $\det A \neq 0$). [This would be easy to prove if we could make $A$ diagonal by a change of basis, but that is not always possible.] We will actually prove the (equivalent) result
\[
\det e^A = e^{\operatorname{tr} A} \qquad (35)
\]
(and from this it follows that $e^A$ is always invertible, by using the (analogous) fact that for numbers $a$, $e^a$ is nonzero).
Proof: we can always find a change of basis such that the matrix $A$ in the new basis is an upper triangular matrix $T$, $A = \lambda^{-1} T \lambda$ (e.g. we can use the transformation to Jordan canonical form). Then
\[
e^A = \lambda^{-1} e^T \lambda, \qquad (36)
\]
and from $e^T = I + T + \frac{1}{2} T^2 + \ldots$, $e^T$ is upper triangular, and the diagonal entries are $(e^T)_{ii} = e^{T_{ii}}$. Now
\[
\begin{aligned}
\det e^A &= \det e^T & (37) \\
&= \prod_i e^{T_{ii}} & (38) \\
&= \exp \sum_i T_{ii} & (39) \\
&= e^{\operatorname{tr} T} & (40) \\
&= e^{\operatorname{tr} A}, & (41)
\end{aligned}
\]
QED.
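A one-line numerical check of eq. (35) for a random (generically non-diagonal) matrix, assuming scipy is available:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))

print(np.isclose(np.linalg.det(expm(A)), np.exp(np.trace(A))))   # True: det e^A = e^{tr A}
```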
Now it is time for a confession. The method above for finding eigenvalues and eigenvectors is the essential technique, which you should use for these problems in this course. But as with the basic definitions for determinants and inverses, it is not efficient in practice for large matrices. In fact, as you may know, the characteristic equation (like any polynomial equation) cannot in general be solved in terms of algebraic operations, including taking $m$th roots, except when $n \leq 4$. There are other methods for finding eigenvalues that are efficient for larger matrices, but in the end they produce only approximate eigenvalues that can be made as accurate as one wishes. These methods are beyond the scope of this course. Once you have the eigenvalues, you can find the eigenvectors using the methods described above.
G. Bilinear and sesquilinear forms
First we consider real scalars, $K = R$. Recall that we defined multilinearity when discussing determinants. Here we consider the case of $m = 2$ inputs, and call it bilinearity. Thus we consider a function $B : V \times V \to K$ that is bilinear: $B(v_1, v_2)$ is linear in $v_1$ for $v_2$ fixed, and vice versa. A bilinear form is symmetric if also $B(v_2, v_1) = B(v_1, v_2)$ for all $v_1$, $v_2$. We will usually call such a symmetric bilinear form $S$. It is suitable for use as the definition of a scalar or inner product of vectors.
Ex: Column vectors: we could define the symmetric bilinear form of two column vectors, say
\[
\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}, \qquad \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}, \qquad (42)
\]
to be e.g. $S = \pm a_1 b_1 \pm a_2 b_2 \pm \ldots \pm a_m b_m$ (with independent signs, and $m \leq n$), or, as another example, $S = a_1 b_2 + a_2 b_1$.
In general, using a basis, we could write the form as $S(u, v) = \sum_{i,j} u_i S_{ij} v_j$, in which $u_i$, $v_i$ are the components of the vectors, and if the $v_i$ are the basis vectors, then $S_{ij} = S(v_i, v_j) = S_{ji}$ (check this). We can also define the norm-square of a vector as
\[
N^2(v) = S(v, v). \qquad (43)
\]
$N^2$ is a quadratic form; it obeys $N^2(\lambda v) = \lambda^2 N^2(v)$. Given a quadratic form $N^2$, we can recover a symmetric bilinear form $S$ from the polarization identity
\[
S(u, v) = \frac{1}{2} \left[ N^2(u + v) - N^2(u) - N^2(v) \right]. \qquad (44)
\]
You can check that this is bilinear and symmetric (if $N^2$ was defined from $S$, then we get the same $S$ back).
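A small numerical check of the polarization identity (44) for a randomly chosen symmetric (not necessarily positive) form, written in components as above (my own sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
M = rng.normal(size=(n, n))
S = M + M.T                      # a symmetric matrix S_ij = S(v_i, v_j)

def S_form(u, v):
    return u @ S @ v             # S(u, v) = sum_{i,j} u_i S_ij v_j

def N2(v):
    return S_form(v, v)          # quadratic form, eq. (43)

u, v = rng.normal(size=n), rng.normal(size=n)
recovered = 0.5 * (N2(u + v) - N2(u) - N2(v))     # polarization identity, eq. (44)
print(np.isclose(recovered, S_form(u, v)))         # True
```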
Ex. of symmetric bilinear and quadratic forms: 1) the scalar product in Euclidean space of three or any other number of dimensions, and the corresponding mod-square of a vector; 2) the metric in special relativity, $X^2 = X_0^2 - X_1^2 - X_2^2 - X_3^2$; 3) kinetic energy terms in classical mechanics are quadratic forms, e.g. $\frac{1}{2} m v^2$, or $\frac{1}{2} \omega \cdot I \omega$ (rotational; $I$ is the inertia tensor).
Diagonalization of a symmetric bilinear form, or existence of an orthogonal basis set (i.e. a basis of vectors $v_i$ such that $S(v_i, v_j) = 0$ if $i \neq j$):
First define the kernel of $S$ as the set of singular vectors $v$ such that $S(u, v) = 0$ for all $u \in V$; the kernel is a subspace. $S$ is degenerate if $\ker S \neq 0$, non-degenerate otherwise.
Proof of existence of orthogonal basis: First suppose $S$ is non-degenerate. The idea is proof by induction, so first we observe that if $n = 1$, any non-zero vector will do, as there is nothing to check. If $n > 1$, then choose a non-zero vector $v_1$ such that $N^2(v_1) \neq 0$. Such a vector exists, because if on the contrary $N^2(v) = 0$ for all $v$, then using polarization we would have $S(u, v) = 0$ for all $u$, $v$, and this $S$ is degenerate, big time. Then $S$ restricted to the subspace $F = \{\lambda v_1 : \lambda \in K\}$ is non-degenerate, and is also non-degenerate when restricted to the orthogonal subspace, $F^{\perp} = \{v \in V : S(v, v_1) = 0\}$. Further we can show that $F \cap F^{\perp} = 0$, and that all $v \in V$ can be written as $v = \alpha v_1 + w$ for some scalar $\alpha$ and vector $w \in F^{\perp}$. (Proof: take $\alpha = S(v, v_1)/N^2(v_1)$; then $v - \alpha v_1 \in F^{\perp}$.) As $S$ is non-degenerate on $F^{\perp}$, and $\dim F^{\perp} = n - 1$, this reduces the problem to one dimension less. If the theorem is proved for dimension $n - 1$, then by applying it to $F^{\perp}$, we can include the vector $v_1$, which is orthogonal to $F^{\perp}$, and we have an orthogonal basis for $V$. (Note that the orthogonal set for $V$ is linearly independent also, by using orthogonality, so it really is a basis.) Hence the result holds for all $n$.
If the form is degenerate, let $E = \ker S$. We can find a subspace $W \subseteq V$ such that $W \cap E = 0$, and any vector in $V$ can be written as $v = e + w$, where $e \in E$ and $w \in W$. Then $S$ restricted to $W$ is non-degenerate. Using the proof for the non-degenerate case, we can find an orthogonal basis for $W$. Combine this with any linearly-independent set of vectors in $E$; the resulting set is linearly-independent, and orthogonal, as vectors in $E$ are orthogonal to all vectors in $V$ by definition of $E = \ker S$. QED
Note: the proof is a little lengthy, and will not appear on tests. The result is the important part. But the proof is illustrative of a mode of thought, and similar techniques will be used again.
For basis vectors such that $N^2(v_i) \neq 0$, we can rescale them and change the magnitude of $N^2$, but never change the sign, as our scalars are real numbers, and $\lambda^2 \geq 0$. In this way, we can bring the inner product $S$ to the form $\operatorname{diag}(+1, +1, \ldots, -1, -1, \ldots, 0, 0, \ldots)$ ("diag" means the matrix with these diagonal elements), in which in any particular case the number of $+1$s, $-1$s or $0$s might be zero. However, it is a theorem that the number of occurrences of $+1$, $-1$, and $0$ in this diagonal version of a given form $S$ does not depend on the orthogonal basis chosen.
Note: under a change of basis, $v_i \to v'_i = \sum_j \lambda_{ij} v_j$, the matrix of $S$, $S_{ij} = S(v_i, v_j)$, transforms as
\[
S'_{ij} = S(v'_i, v'_j) = \sum_{k,l} \lambda_{ik} S_{kl} \lambda_{jl} = (\lambda S \lambda^T)_{ij}. \qquad (45)
\]
This differs from the behavior of the matrix of a linear map, showing that we should pay attention to what our matrices mean.
Define the rank of $S$ to be $n - \dim \ker S$. Define the signature of $S$ using the above diagonal version, but ignoring any zeroes, to be the sequence $+ + \cdots + - \cdots -$ (if you are a physicist), or the number $\sigma(S)$ of $+$'s minus the number of $-$'s (if you are a mathematician). In either case, the rank and the signature, together with $\dim V$, fully determine the diagonal version of $S$ up to rescaling of basis vectors.
If the number of negative diagonal elements in the diagonal version is zero (i.e. if the rank equals $\sigma$), or equivalently if $N^2(v) \geq 0$ for all $v$, say that $S$ or $N^2$ is positive. If $N^2(v) = 0$ implies that $v = 0$, or equivalently if all the diagonal elements in diagonal form are $+$, then say that $S$ or $N^2$ is positive definite. (Similarly define negative and negative definite by using $-N^2$ in place of $N^2$.)
Ex: Euclidean space, $n = 3$, form is $+ + +$, positive definite.
$n = 4$, $+ - - -$ (or minus this, in some texts) is Lorentz signature, as in special relativity. For kinetic energy, a positive quadratic form is desirable!
Until further notice, we consider only a positive-definite symmetric bilinear form (or scalar product). We can write it as $S(u, v) = u \cdot v$, and define the norm $N(v) = |v| = \sqrt{N^2(v)}$.
Some very important results/techniques follow:
Triangle inequality:
\[
|v + w| \leq |v| + |w|. \qquad (46)
\]
Schwarz inequality:
\[
|v \cdot w| \leq |v|\,|w|. \qquad (47)
\]
Proof: $0 \leq |a v - b w|^2$. Expand and put $a = |w|$, $b = |v|$, then divide by $|v|\,|w|$. (The case in which the latter quantity is zero is trivial.) QED
Gram-Schmidt orthogonalization: Given a basis set $v_1, \ldots, v_n$, we first normalize the first one, $v'_1 = v_1 / |v_1|$, then form $u_2 = v_2 - (v_2 \cdot v'_1) v'_1$, and $v'_2 = u_2 / |u_2|$. Thus $v'_1$ and $v'_2$ are orthogonal and normalized. Continue similarly to obtain an orthonormal basis set.
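A sketch of the procedure in code, for the standard positive-definite dot product on column vectors (the names are my own):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list of vectors, in the order given."""
    ortho = []
    for v in vectors:
        u = np.array(v, dtype=float)
        for e in ortho:
            u -= (u @ e) * e                 # subtract the component along each earlier vector
        ortho.append(u / np.linalg.norm(u))  # normalize
    return ortho

basis = [np.array([1.0, 1.0, 0.0]),
         np.array([1.0, 0.0, 1.0]),
         np.array([0.0, 1.0, 1.0])]
e = gram_schmidt(basis)
G = np.array([[a @ b for b in e] for a in e])
print(np.allclose(G, np.eye(3)))             # True: the resulting set is orthonormal
```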
A transformation between two orthonormal bases $\{v_i\}$, $\{w_i\}$ has a special form (note that $v_i \cdot v_j = \delta_{ij}$, and sim. for the $w_i$). If $w_i = \sum_j \lambda_{ij} v_j$, then it follows that $\lambda \lambda^T = I$, the identity matrix, so $\lambda^T = \lambda^{-1}$, and such a matrix is called orthogonal.
Similarly, a linear map $O$ is called orthogonal if it preserves inner products/norms: $N^2(Ov) = N^2(v)$. Then in an arbitrary (not necessarily orthonormal) fixed basis, this becomes $O^T S O = S$ (as matrices). Then in an orthonormal basis, $S = I$ (as matrices), and $O^T = O^{-1}$ (as matrices).
If $S$ is not positive definite, then call its matrix in an orthonormal basis $\eta$ (we should assume the non-degenerate case); then in either the case of basis change (passive), or linear map (active), we arrive at the condition
\[
O^T \eta\, O = \eta. \qquad (48)
\]
In the example of the Minkowski metric $\eta$, such a matrix $O$ represents a Lorentz transformation, for either passive or active transformations. For these non-positive-definite cases, the term pseudo-orthogonal is sometimes also used (probably assuming non-degeneracy).
H. Complex version
For complex scalars $K = C$, we will introduce a different concept from bilinear forms (it is possible to consider bilinear forms also for complex scalars, but this is typically less useful in physics). Many results for bilinear forms over $R$ have analogs over $C$, as we will see.
A sesquilinear form is a function $Q : V \times V \to K$ which is linear in the second input when the first is fixed (as for a bilinear form), but is antilinear in the first input:
\[
Q(\lambda_1 v_1 + \lambda_2 v_2, v_3) = \bar{\lambda}_1 Q(v_1, v_3) + \bar{\lambda}_2 Q(v_2, v_3), \qquad (49)
\]
where $\bar{\lambda}$ = complex conjugate of $\lambda$. In place of "symmetric" for a bilinear form (which would not make sense for sesquilinear), define a sesquilinear form to be Hermitian if $Q(v_2, v_1) = \overline{Q(v_1, v_2)}$. Other common notations for a Hermitian form are $Q(v_1, v_2) = (v_1, v_2) = \langle v_1 | v_2 \rangle$. Notice the latter Dirac-like notation.
Ex: For column vectors with components $a_i$, $b_i$, examples of Hermitian forms are $Q(a, b) = \pm \bar{a}_1 b_1 \pm \bar{a}_2 b_2 + \ldots$ (where all choices of the signs are possible), or $\bar{a}_1 b_2 + \bar{a}_2 b_1$.
In general, using a basis,
\[
Q(u, v) = \sum_{i,j} \bar{u}_i Q_{ij} v_j, \qquad (50)
\]
where the $v_i$ form the basis set, and $Q_{ij} = Q(v_i, v_j)$. Hermitian implies that $Q_{ji} = \overline{Q_{ij}}$. Under a change of basis $v_i \to v'_i = \sum_j v_j \lambda_{ji}$, the matrix of $Q$, $Q_{ij} = Q(v_i, v_j)$, transforms as
\[
Q'_{ij} = Q(v'_i, v'_j) = \sum_{k,l} \bar{\lambda}_{ki} Q_{kl} \lambda_{lj} = (\bar{\lambda}^T Q \lambda)_{ij}. \qquad (51)
\]
Notation: for a matrix $A$, its conjugate-transpose $\bar{A}^T$ has entries $(\bar{A}^T)_{ij} = \overline{(A^T)_{ij}} = \bar{A}_{ji}$. The conjugate transpose of a matrix $A$ is elsewhere often called the adjoint of $A$, and written $A^{\dagger}$. $A$ is then said to be Hermitian if $A = \bar{A}^T$. Notice that if $A$ is viewed as the matrix of a linear map, then the conjugate transpose does not behave properly under a change of basis, i.e. the conjugate-transpose operation depends on the choice of basis (but is fine for the matrix of a sesquilinear form). (The same applies to the transpose in the real case.) Hence it does not give us a well-defined (basis-independent) operation on a linear map. (We will define the adjoint properly below, and show in which cases the matrix of the adjoint is the conjugate of the transpose. The different uses of the terminology may seem confusing, but the underlying idea of the usual usage is that one is using the standard Hermitian form for column vectors, so the form is positive definite and the basis is orthonormal; see below.)
Define the norm-square by
\[
N^2(v) = Q(v, v), \qquad (52)
\]
and $Q$ Hermitian implies $N^2$ is real (but not necessarily positive). Note that $N^2(\lambda v) = |\lambda|^2 N^2(v)$.
The polarization identity is more complicated in the complex case. $[N^2(u + v) - N^2(u) - N^2(v)]/2$ produces a real quantity, not a sesquilinear form. The correct identity is
\[
Q(u, v) = \frac{1}{4} \left[ \left( N^2(u + v) - N^2(u - v) \right) - i \left( N^2(u + iv) - N^2(u - iv) \right) \right]. \qquad (53)
\]
Once again, given a norm-square $N^2$, this formula can be used to define a Hermitian form $Q$ (i.e. $Q(v, u) = \overline{Q(u, v)}$), which reproduces the same norm-square.
Examples of the use of sesquilinear and Hermitian forms are found in quantum mechanics!
Theorem: Given a Hermitian sesquilinear form, $\exists$ a basis of orthogonal vectors, such that $Q(v_i, v_j) = 0$ if $i \neq j$.
Proof is similar to that for the symmetric case.
Theorem: If the form is non-degenerate, then $\exists$ an integer $r$ such that $r$ of the orthogonal basis vectors obey $N^2(v_i) > 0$, and $n - r$ obey $N^2(v_i) < 0$. The number $r$ is independent of the choice of basis.
Then by rescaling the orthogonal basis vectors, the form $Q_{ij}$ becomes $\operatorname{diag}(+1, +1, \ldots, -1, -1, \ldots, 0, 0, \ldots)$. The rank and signature can then be defined as in the real symmetric case. We say $Q$ is positive if $N^2(v) \geq 0$ for all $v$ (no $-1$s on the diagonal when $Q$ is diagonalized), and positive definite if $N^2(v) > 0$ for all $v \neq 0$ (all diagonal elements positive in an orthonormal basis). For a positive-definite Hermitian form, define $N(v) = \sqrt{N^2(v)}$, also written $||v||$ or $|v|$.
The following hold for a positive-definite Hermitian form: the triangle inequality (as before), the Schwarz inequality in the form
\[
|\langle v | w \rangle| \leq |v|\,|w|, \qquad (54)
\]
and Gram-Schmidt orthogonalization (similar to the real case).
Returning to general (not necessarily positive) Hermitian forms, under a change of basis $v_i \to v'_i = \sum_j v_j \lambda_{ji}$, the matrix of $Q$, $Q_{ij} = Q(v_i, v_j)$, transforms as
\[
Q'_{ij} = Q(v'_i, v'_j) = \sum_{k,l} \bar{\lambda}_{ki} Q_{kl} \lambda_{lj} = (\bar{\lambda}^T Q \lambda)_{ij}. \qquad (55)
\]
If both bases are orthonormal, so $Q = \eta = \operatorname{diag}(+1, +1, \ldots, -1, -1, \ldots, 0, 0, \ldots)$ in either basis, then
\[
\bar{\lambda}^T \eta\, \lambda = \eta. \qquad (56)
\]
If $\eta$ is non-degenerate, we will say that such a $\lambda$ is pseudo-unitary. In the positive-definite case $\eta = I$, the condition reduces to $\bar{\lambda}^T = \lambda^{-1}$, and $\lambda$ is unitary.
Similarly, for an active transformation, a linear map $U : V \to V$ that preserves the norm, i.e. $N^2(Uv) = N^2(v)$ for all $v \in V$, must obey (if the matrix of $U$ is defined by $U v_i = \sum_j v_j U_{ji}$ on the basis vectors $v_i$)
\[
\bar{U}^T Q\, U = Q \qquad (57)
\]
as matrices. In an orthonormal basis, $\bar{U}^T \eta\, U = \eta$. Again, for the non-degenerate case, we can call such a $U$ pseudo-unitary, and for the positive-definite case, unitary.
Again, examples for the positive-definite case are found in quantum mechanics.
I. Adjoint and self-adjoint linear maps
1. Real scalars
We will combine the notions of linear map and symmetric/Hermitian forms, and obtain some diagonalization results. Again, we begin with $K = R$.
Let $A : V \to V$ be a linear map, and let $V$ be equipped with a symmetric bilinear form $S$. Consider $S(u, Av)$, which is a function of two variables, and takes values in the real numbers, so $S(\cdot, A\cdot) : V \times V \to R$. For a fixed choice of $u$, we will consider it as a function of $v$, $S(u, A\cdot) : V \to R$, which is linear.
Fact: if $S$ is non-degenerate, any linear map from $V$ to $R$ can be represented in the form $S(u', \cdot)$ for some fixed vector $u'$ (that is, for any linear map from $V$ to $R$, there exists a $u'$ such that, . . . etc). More on this later. So if $S$ is non-degenerate, then for a given $u$, we can find a vector $u'$ such that $S(u', v) = S(u, Av)$ for any $v$. We define $A^* u$ to be this vector $u'$, for every vector $u$. That is, for $S$ non-degenerate, a linear map $A^* : V \to V$, called the adjoint of $A$, is well-defined by the statement that for all $u$ and $v$,
\[
S(A^* u, v) = S(u, Av). \qquad (58)
\]
Note that the adjoint depends on the choice of $S$, but is basis-independent. You can check that $A^*$ is a linear map from $V$ to $V$. Non-degeneracy of $S$ is required here. Otherwise, for example, if $v \in \ker S$ but $Av \notin \ker S$, the right hand side can be non-zero (for a suitable $u$), while the left hand side must vanish for any choice of $A^* u$, a contradiction.
Writing the equation in the definition in a basis $v_1, \ldots, v_n$, with $S(v_i, v_j) = S_{ij}$, we have
1) in a general basis, $A v_j = \sum_i A_{ij} v_i$, and
\[
\sum_k S_{ik} A_{kj} = \sum_k (A^*)_{ki} S_{kj}, \qquad (59)
\]
which defines the elements of the matrix of $A^*$. That is, as matrices,
\[
SA = (A^*)^T S. \qquad (60)
\]
Notice then that as matrices, $A^* = (S^T)^{-1} A^T S^T = S^{-1} A^T S$ (using $S = S^T$), which shows that if $S$ is degenerate, then $A^*$ is ill-defined, because for $S$ degenerate its inverse as a matrix $S^{-1}$ does not exist.
2) in an orthonormal basis, $S = \eta = \operatorname{diag}(+1, +1, \ldots, -1, -1, \ldots)$ (no zeroes on the diagonal), so as matrices $\eta = \eta^T = \eta^{-1}$. Then as matrices $A^* = \eta A^T \eta$.
3) if $S$ is positive definite, and we use an orthonormal basis so $S = \eta = I$, then the matrix relation reads $A^* = A^T$. We conclude that ($K = R$ real scalar case)
the matrix of the adjoint map equals the usual matrix transpose of the map only if $S$ is positive definite and an orthonormal basis is used.
If $S$ is non-degenerate, and $A^* = A$ (as linear maps), then we say that $A$ is self-adjoint, so
\[
S(Au, v) = S(u, Av). \qquad (61)
\]
Then of course $A^* = A$ as matrices in any basis.
Note on my notation/terminology: I use $A^T$ for the usual transpose of a matrix, $(A^T)_{ij} = A_{ji}$. What I call the adjoint might also be called the transpose, and does coincide with it for positive definite $S$ in an orthonormal basis (the most natural thing to use!). What I call self-adjoint would be called a symmetric map by others, but again it means the matrix is symmetric (i.e. equal to its transpose) only in the case $S = I$.
Exercise: If $O$ is an orthogonal linear map, or isometry, that is it preserves the non-degenerate symmetric bilinear form $S$, so as matrices $O^T S O = S$, then you can show that $O^* = O^{-1}$ as linear maps (independent of basis, and not only for the positive-definite case).
Finally, note that you can prove that $S(\cdot, A\cdot) : V \times V \to R$ is itself a symmetric bilinear form if and only if $A$ is self-adjoint.
2. Complex version
For $K = C$, things work out very similarly. For $Q$ non-degenerate, we can define $A^*$ on any vector by the formula
\[
Q(A^* u, v) = Q(u, Av), \qquad (62)
\]
which is required to hold for all $u$ and $v$. $A^*$ is the adjoint of the linear map $A$, but depends on $Q$.
1) In a general basis,
\[
\sum_k Q_{ik} A_{kj} = \sum_k \overline{(A^*)_{ki}}\, Q_{kj}, \qquad (63)
\]
or as matrices
\[
QA = \overline{(A^*)}^T Q. \qquad (64)
\]
2) In an orthonormal basis, $Q = \eta = \operatorname{diag}(+1, +1, \ldots, -1, -1, \ldots)$ (no zeroes on the diagonal), so as matrices $\eta = \eta^T = \eta^{-1}$. Then as matrices $A^* = \eta \bar{A}^T \eta$.
3) If $Q$ is positive definite and we use an orthonormal basis, then $\eta = I$, and
\[
A^* = \bar{A}^T \qquad (65)
\]
as matrices. The right-hand side defines the usual "adjoint" (i.e. the conjugate transpose) of a matrix, but I avoid that notation. Complex case, $K = C$:
the matrix of the adjoint map equals the conjugate of the usual matrix transpose only if $Q$ is positive definite and an orthonormal basis is used.
Again, if $A^* = A$ we say that $A$ is self-adjoint, so
\[
Q(Au, v) = Q(u, Av). \qquad (66)
\]
The term Hermitian is also sometimes used. For matrices, $A$ is Hermitian if $A = \bar{A}^T$. These concepts agree only if $Q = I$, as we saw above.
Exercises (analogous to the real case): (i) if $U$ is a unitary linear map (isometry) as defined above, show that $U^* = U^{-1}$ as linear maps (or in an arbitrary basis).
(ii) Show that $Q(\cdot, A\cdot)$ is a Hermitian form if and only if $A$ is self-adjoint.
3. Diagonalization: the Spectral Theorem
We continue with the case of complex scalars, and assume the vector space $V$ has a positive-definite Hermitian form (or scalar product) on it. Then in an orthonormal basis, the matrix of a self-adjoint map is a Hermitian matrix, i.e. equal to its conjugate transpose, $A = \bar{A}^T$. Question: can we diagonalize such a matrix/map, that is find a basis of eigenvectors? It turns out that we can, and the basis is even orthonormal. First, some well-known simple facts:
Thm: (i) Any eigenvalues of $A$ are real. (ii) Eigenvectors corresponding to distinct eigenvalues are orthogonal.
Proof: (i) If $v$ is an eigenvector of eigenvalue $\lambda$, $Av = \lambda v$, then (writing $\langle v | w \rangle$ for $Q(v, w)$, and $\langle v | v \rangle = N^2(v)$) $\langle v | Av \rangle = \lambda N^2(v) = \langle Av | v \rangle$ by self-adjointness, $= \bar{\lambda} N^2(v)$, so $\lambda = \bar{\lambda}$, and $\lambda$ is real. (ii) If $A v_1 = \lambda_1 v_1$ and $A v_2 = \lambda_2 v_2$ for non-zero $v_1$ and $v_2$, and $\lambda_1 \neq \lambda_2$, then $\langle v_1 | A v_2 \rangle = \lambda_2 \langle v_1 | v_2 \rangle = \langle A v_1 | v_2 \rangle$ by self-adjointness, $= \bar{\lambda}_1 \langle v_1 | v_2 \rangle$. Now $\bar{\lambda}_1 = \lambda_1$ by part (i), and so $(\lambda_1 - \lambda_2) \langle v_1 | v_2 \rangle = 0$. Since $\lambda_1 \neq \lambda_2$, $v_1$ and $v_2$ are orthogonal. QED
In the case of eigenvectors with the same eigenvalue, linear combinations of them are also eigenvectors, and so a given set of eigenvectors with the same eigenvalue can be orthogonalized by Gram-Schmidt. It only remains to prove that a complete set of eigenvectors exists. Note: in a finite-dimensional space, the spectrum of a linear map means the set of its eigenvalues.
Spectral Theorem, finite-dimensional case: A self-adjoint map A has a complete orthonormal set of eigenvectors.

Proof: First, choose any root λ_1 of the characteristic equation of A; then there exists a corresponding eigenvector v_1, which can be normalized so N^2(v_1) = 1. Let us make use of the matrix of A with respect to (various) orthonormal bases. We can find an orthonormal basis of V in which the first basis vector is v_1 (proof: take any basis including v_1, and use Gram-Schmidt). In this basis, A takes the form

\begin{pmatrix}
\lambda_1 & * & \cdots & * \\
0 & & & \\
\vdots & & * & \\
0 & & &
\end{pmatrix}.   (67)

But A must still be Hermitian in this basis, so the elements in the first row after the first (λ_1) must be zero. This is the key step in the proof. The remaining (n − 1) × (n − 1) block of A is again Hermitian, and we repeat the process. The eigenvectors we construct in this subspace are automatically orthogonal to v_1. After finding n orthonormal eigenvectors in this way, the proof is complete.
Corollary: Under the same conditions, the matrix of A (in an orthonormal basis) can be diagonalized by a unitary change-of-basis matrix. That is,

A = U D U^{-1}   (68)

as matrices, where U is unitary and D is diagonal, D = diag(λ_1, . . . , λ_n).

Proof: The changes of basis we constructed between orthonormal bases are unitary.

Note the consequence of this statement:

A U = U D.   (69)

This is a matrix equation, and we can consider each column of it as a column-vector equation. These equations are

A u_i = λ_i u_i   (70)

where u_i is the ith column of U. That is, the columns of U are the eigenvectors, and the fact that they are orthonormal is equivalent to the fact that U is unitary. (Another way to see this is to write D = U^{-1} A U, which is a matrix equation in the new basis. In fact D is just another name for the matrix of A in this basis. The first eigenvector of D is

(1, 0, . . . , 0)^T.   (71)

Back in the original basis, using the basis-change matrix U to map column vectors from the new to the original basis, the same vector is represented by u_1, the first column of U.)
A simple follow-up to the spectral theorem is to consider a unitary map instead of a self-adjoint one. Then we have

Theorem: The eigenvalues of a unitary map have modulus 1.

Proof: if Uv = λv, then N^2(Uv) = N^2(v) = |λ|^2 N^2(v), so |λ|^2 = 1.

Note that if U v_1 = λ_1 v_1, then U^† v_1 = λ_1^{-1} v_1 = \bar{λ}_1 v_1.

Spectral Theorem, unitary maps: A unitary map has a complete orthonormal set of eigenvectors.

Proof: If v_1 is an eigenvector with eigenvalue λ_1, U maps any vector orthogonal to v_1 to a vector orthogonal to v_1, because if ⟨v|v_1⟩ = 0, then ⟨Uv|v_1⟩ = ⟨v|U^† v_1⟩ = \bar{λ}_1 ⟨v|v_1⟩ = 0. This (or the direct use of the note before the theorem) corresponds to the key step in the self-adjoint case. Then repeat as before, etc.

Another way to see the result is to use U = e^{iA} for self-adjoint A.
Now we pass to the case of real scalars. Here the proof uses the complex case, which is why we did that first. So suppose that K = R, we have a positive-definite symmetric bilinear form (or inner product) on V, and A is a self-adjoint linear map. In an orthonormal basis, this means that the matrix of A is symmetric, that is, equal to its transpose.

Spectral Theorem, real case: A has a complete orthonormal set of (real) eigenvectors. In other words, its matrix can be diagonalized by an orthogonal change of basis.

Proof: The matrix of A is symmetric (in an orthonormal basis) and real, so we can view it as a Hermitian matrix (that is, we can extend the vector space and the scalars to be complex by using this particular orthonormal basis, and extend A in the obvious way). The eigenvalues of A are therefore real, which is noteworthy. If v_1 is an eigenvector with eigenvalue λ_1, and v_1 = a + ib with a, b real (as column vectors), then since A and λ_1 are real, Aa = λ_1 a and Ab = λ_1 b. Since a and b cannot both be zero, we can take one of them as the real eigenvector we wanted, and similarly for the other eigenvectors. Thus

A = U D U^{-1}   (72)

and D and U are real. But a real unitary matrix is orthogonal, say U = O. QED

In the real case, the spectral theorem does not extend completely to the case of an orthogonal matrix/map O, unlike the complex case. This is because the eigenvalues of an orthogonal map may be complex (if real, they can only be ±1). We try to use the result from the complex case (applied to the orthogonal map, as in the self-adjoint case above). One additional idea in the real case is that if O v_1 = λ_1 v_1, then taking complex conjugates we have O \bar{v}_1 = \bar{λ}_1 \bar{v}_1. So if λ_1 is not real, then \bar{λ}_1 ≠ λ_1 is also an eigenvalue of O (this was not true in the general unitary case). In this case we have two eigenvectors v_1, \bar{v}_1 of O, which must be linearly independent of each other (because they have different eigenvalues), and therefore are neither purely real nor purely imaginary. Taking the real and imaginary components gives us two (!) non-zero real vectors, and O maps the two-dimensional subspace that they span into itself as an isometry. In the case of eigenvalues 1 or −1, we obtain eigenvectors that can be taken to be real, as in the case of a self-adjoint map. In this way, we can find a basis of real vectors in which the matrix of O has a block-diagonal form, where the diagonal consists of 2 × 2 blocks that are proper (det = 1) rotations (real orthogonal matrices) with rotation angle θ ≠ 0 or π — these correspond to eigenvalues λ ≠ ±1 — and/or 1 × 1 blocks containing either +1 or −1.
Ex: For n = 2, consider a rotation by angle θ. This is described by a matrix with determinant 1 (write it down!). You can check that the characteristic equation is

λ^2 − 2λ cos θ + 1 = (λ − e^{iθ})(λ − e^{−iθ}) = 0,   (73)

so the eigenvalues are λ = e^{iθ} and λ = e^{−iθ}, which are complex conjugates of each other, and for θ ≠ 0 or π, these eigenvalues are not real, and not equal.

(Continued) An improper rotation is an orthogonal matrix/map with determinant −1, while a (proper) rotation has determinant 1. In the n = 2 case, an improper rotation can be brought to the form diag(+1, −1), and its eigenvalues are real. An improper rotation must involve a reflection, in any dimension.

Ex: for n = 3, this result is called Euler's theorem. It says that there exists a basis in which any proper (det = +1) rotation becomes a rotation in the two dimensions perpendicular to one of the coordinate axes. Thus a rotation is fully described by a choice of rotation axis and a rotation angle (except for the special case of the identity map). The general improper rotation in n = 3 has the block form with a −1 in one diagonal entry and a 2 × 2 rotation matrix with θ ≠ 0, π in the other, OR is a pure reflection diag(−1, +1, +1), OR is −I.
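A quick numerical sketch of these two examples (the angle and the conjugating matrix are my own choices): the 2 × 2 rotation has eigenvalues e^{±iθ}, and a generic proper rotation in n = 3 has a real eigenvector with eigenvalue 1 — the rotation axis of Euler's theorem.

```python
import numpy as np

theta = 0.7
R2 = np.array([[np.cos(theta), -np.sin(theta)],
               [np.sin(theta),  np.cos(theta)]])    # proper rotation, det = +1
print(np.linalg.eigvals(R2))                        # e^{i theta}, e^{-i theta}: complex, not real

# Euler's theorem (n = 3): a proper rotation fixes a real axis (eigenvalue 1).
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
Q, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((3, 3)))
R3 = Q @ Rz @ Q.T                                   # similar to Rz, so still det = +1
w, v = np.linalg.eig(R3)
axis = v[:, np.argmin(np.abs(w - 1))]               # eigenvector for eigenvalue 1
print(np.allclose(R3 @ axis, axis))                 # the axis is left fixed by R3
```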
4. Positive self-adjoint linear maps

A final point: we will consider here only positive definite S or Q. We saw in either the real or complex case that a self-adjoint map A defines a symmetric/Hermitian form via Q(·, A·) (similarly for real). Then we say that A is positive (positive definite) if the latter form is positive (positive definite). This is equivalent to both (i) all the eigenvalues of A are non-negative (positive), and (ii) for any v, Q(v, Av) ≥ 0 (Q(v, Av) ≥ 0 and = 0 only if v = 0).

Positivity and positive-definiteness of a self-adjoint linear map are very useful properties to have in applications. The definitions don't seem to be useful if we try to extend them to non-self-adjoint maps.
II. LAPLACE TRANSFORMS

A. The transform and applications

The Laplace integral transform is related or similar to the Fourier transform. The Fourier transform seems to be more important and fundamental.

Given a function f : [0, ∞) → C, define

ℒf(s) = ∫_0^∞ e^{−st} f(t) dt   (74)

whenever the integral makes sense. So we require f = o(e^{s_0 t}) for some s_0, as then the integral converges at ∞ for s > s_0 (or Re s > Re s_0). Then ℒf(s) is analytic in s for s > s_0, in fact analytic in the complex s plane for Re s > Re s_0.

The Laplace transform is useful for solving initial value problems for inhomogeneous ODEs with constant coefficients on the left-hand side; for example, driven electrical circuits in engineering. It also sometimes has other uses. The idea is to apply the transform in the hope of solving more easily, and then finding the inverse transform to recover the solution to the original problem.

Here are some Laplace transforms which can be easily obtained (partial list only):

    f(t)              ℒf(s)
    1                 1/s
    e^{at}            1/(s − a)
    sin at            a/(s^2 + a^2)
    e^{ct} f(t)       ℒf(s − c)
    δ(t − c)          e^{−cs}
    f^{(n)}(t)        s^n ℒf(s) − s^{n−1} f(0) − . . . − f^{(n−1)}(0)
    (−t)^n f(t)       ℒf^{(n)}(s)
                                                          (75)

The value of s_0 in each case is not listed, but can easily be obtained.

Usually, one also uses the table to find the inverse transform of a known ℒf.
Example:

y'' − y' − 2y = 0,   (76)

with initial conditions y(0) = 1, y'(0) = 0. The Laplace transform gives

s^2 ℒy − s y(0) − y'(0) − (s ℒy − y(0)) − 2 ℒy = 0.   (77)

Note that the initial values already enter! The solution is then

ℒy = (s − 1) / [(s − 2)(s + 1)]   (78)
   = (1/3)/(s − 2) + (2/3)/(s + 1),   (79)

so, using the table,

y(t) = (1/3) e^{2t} + (2/3) e^{−t}.   (80)

Yes, we could have easily got this without Laplace.
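As a check (not in the original notes), sympy reproduces the worked example: dsolve solves the initial value problem directly, and the table entry ℒ[e^{at}] = 1/(s − a) then reproduces eq. (78).

```python
import sympy as sp

t, s = sp.symbols('t s', positive=True)
y = sp.Function('y')

# Solve y'' - y' - 2y = 0 with y(0) = 1, y'(0) = 0 ...
sol = sp.dsolve(y(t).diff(t, 2) - y(t).diff(t) - 2*y(t), y(t),
                ics={y(0): 1, y(t).diff(t).subs(t, 0): 0})
print(sol)                                   # y(t) = exp(2*t)/3 + 2*exp(-t)/3

# ... and check that its transform agrees with eq. (78).
Ly = sp.laplace_transform(sol.rhs, t, s, noconds=True)
print(sp.simplify(Ly - (s - 1)/((s - 2)*(s + 1))))   # 0
```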
Finding the inverse transform is the hardest part, and you are no doubt asking if the inverse is unique. For continuous functions f, the map f ↦ ℒf is injective (one to one), i.e. there is only one continuous f with a given ℒf. So the table method works to this extent. Note I am not using an actual operator ℒ^{-1}, at least not explicitly.

Also needed in practice is the following (similar to the convolution theorem for Fourier transforms). Define the Laplace convolution integral (not the same as in the Fourier case),

(f ∗_L g)(t) = ∫_0^t dt' f(t − t') g(t') = (g ∗_L f)(t).   (81)

Then we have the

Theorem:

ℒ(f ∗_L g)(s) = ℒf(s) ℒg(s).   (82)

The proof is fairly easy. This result is also of use when looking for an inverse transform. It can be added to the table.
Example: Forced harmonic oscillator again.

x'' + 2β x' + ω_0^2 x = f(t),   (83)

where f(t) is given, and x(0) = x_0, x'(0) = x_1. Let ℒx(s) = X(s). Then

s^2 X − s x_0 − x_1 + 2β(sX − x_0) + ω_0^2 X = ℒf(s).   (84)

Rearranging,

X(s) = [s x_0 + x_1 + 2β x_0] / (s^2 + 2βs + ω_0^2) + ℒf(s) / (s^2 + 2βs + ω_0^2).   (85)

The first term contains the initial conditions and not f, while the second contains f and not the initial conditions. This use of linearity is typical when solving inhomogeneous equations.

The first part is similar to the example above. For the second part (i.e. putting x_0 = x_1 = 0), it has the form ℒf · ℒG, so x(t) can be written as a convolution,

x(t) = ∫_0^t dt' G(t − t') f(t'),   (86)

where G(t) is a function such that

ℒG(s) = 1 / (s^2 + 2βs + ω_0^2),   (87)

and is nothing but the retarded Green's function for the equation with initial conditions G(0) = G'(0) = 0; by construction it obeys

G'' + 2β G' + ω_0^2 G = δ(t)   (88)

for t > 0. Exercise: find G using the table.
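A sketch of the exercise using sympy (with ω_1 defined by ω_1^2 = ω_0^2 − β^2, i.e. assuming the underdamped case): the candidate G(t) = e^{−βt} sin(ω_1 t)/ω_1 read off from the table indeed has the required transform (87).

```python
import sympy as sp

t, s = sp.symbols('t s', positive=True)
beta, w1 = sp.symbols('beta omega_1', positive=True)   # omega_1^2 = omega_0^2 - beta^2

G = sp.exp(-beta*t) * sp.sin(w1*t) / w1                # candidate from the table
LG = sp.laplace_transform(G, t, s, noconds=True)
# the denominator s^2 + 2 beta s + omega_0^2, written in terms of omega_1:
target = 1 / ((s + beta)**2 + w1**2)
print(sp.simplify(LG - target))                        # 0, so this G is the one required
```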
The Laplace transform can be generalized to systems of constant-coefficient ODEs also. This turns the system into a set of linear equations containing the variable s. The transform can sometimes be used to solve ODEs with non-constant coefficients (see Vaughn's book).

B. The inverse transform (Bromwich integral)

The inverse transform can also be calculated as the Bromwich integral

f(t) = (1/2πi) ∫_{b−i∞}^{b+i∞} ℒf(s) e^{st} ds,   (89)

which is a contour integral parallel to the imaginary s axis. Here b > Re s_0, where as above Re s_0 is the lower bound on the region where ℒf is convergent (analytic). That is, the contour lies to the right of any points in the s plane at which ℒf is not analytic. The formula can be proved (see Arfken and Weber) from its relation to Fourier transforms (by rotating a contour), showing that when considering Fourier and Laplace transforms, complex analysis is never far away.

The Bromwich integral is not used so much in practice, it seems; the use of the table is more common.
III. LINEAR MAPS ON A HILBERT SPACE

A. Bounded and unbounded operators

Also called operators. I assume here that we know about the function spaces L^2(a, b; ρ), which are functions on (a, b) that are square integrable with respect to a weight factor ρ ≥ 0, i.e. the norm-square ||u||^2 ≡ ∫_a^b |u(x)|^2 ρ(x) dx < ∞. (Spaces of functions of more variables are similar, but we focus on ordinary differential operators.) Strictly, we should take the quotient by the subspace of functions whose norm is zero (alternatively, some arguments cannot distinguish functions that differ by a norm-zero function). Note for a = −∞ or b = +∞, the integral is viewed as a limit from the finite interval case. The spaces L^2(a, b; ρ) are Hilbert spaces. A general Hilbert space will usually be denoted V.

Linear maps or operators A : V → V obey the usual linearity property. In the infinite-dimensional case, we can distinguish bounded from unbounded operators. An operator A is bounded (or continuous) if for all v ∈ V, v ≠ 0 (think of v as a vector or as a function in L^2),

||Av|| / ||v|| ≤ C   (90)

for some (finite!) constant C (we can make a similar definition for maps A : V → W or A : V → C, and we will use the latter soon). An operator is unbounded if it is not bounded. To be pedantic, A unbounded means that A is undefined on some vectors, not that Av is defined but has infinite norm, since all vectors in V have finite (non-infinite) norm. In a finite-dimensional Hilbert space, all linear maps are bounded. From here on, we will assume that our Hilbert space is infinite dimensional.

In particular, differential operators, which have the form

Du(x) = p_n(x) d^n u/dx^n + . . . + p_0(x) u(x),   (91)

(i.e. a linear combination of u(x) and its derivatives at x with coefficients that may depend on x) are always unbounded: they cannot be defined on all of V. Some functions cannot be differentiated. Also the coefficient functions p_i might diverge at some points, and then for some u, Du is not finite, and may not lie in L^2. However, as D is linear, the set of vectors on which it is defined forms a vector subspace of the Hilbert space. Most of the theory relies on this subspace being dense. A subspace W is dense in V if for any vector v ∈ V, and any ε > 0 no matter how small, there is a vector w ∈ W such that ||v − w|| < ε. That is, there is a w ∈ W arbitrarily close to any v ∈ V.

Why not just redefine the Hilbert space to be the dense subspace W? Because then it is not closed, and a Hilbert space is closed by definition. Here, a subspace W is closed if any Cauchy sequence of vectors in W has a limit also in W. Thus the Hilbert space V is closed in V. A closed dense subspace in V must be V itself. In finite dimensions, any dense subspace is closed. The closure or completeness property of Hilbert space is very convenient in theoretical arguments.

The subspace on which an operator A is defined is called its domain, written dom A. For differential operators D, the expression for Du as above does not fully define D as an operator on V. It is only properly defined once its domain has also been specified. If the domain is dense, we say D is densely defined. The domain of a differential operator D depends crucially on the boundary conditions.
B. Adjoint of an operator and self-adjointness

The discussion closely follows the finite-dimensional version; see above in these notes. Given a bounded operator A : V → V, consider

⟨u|Av⟩   (92)

(we will use Dirac notation ⟨u|v⟩ for the inner product ∫_a^b \bar{u}(x) v(x) ρ(x) dx, corresponding to the norm ||u|| = ⟨u|u⟩^{1/2} by polarization). For fixed u, this is a linear map V → C; such a linear map is called a linear functional, or an element of the dual vector space V^*. It is bounded in the same sense as a linear operator, that is

|⟨u|Av⟩| / ||v|| ≤ ||u|| · ||Av|| / ||v|| ≤ C ||u||,   (93)

where we used the Cauchy-Schwarz inequality and the fact that A is bounded (the constant C is the one in this bound, as above). Because the inner product is non-degenerate, it can be proved (even for Hilbert space!) that for any bounded linear functional F : V → C, there exists a unique vector w ∈ V such that

F(v) = ⟨w|v⟩   (94)
for all v ∈ V.

Proof: we will use an orthonormal basis for V, say {v_n} (notice we assume this basis is countable; such bases exist in all examples we consider). Then we can describe F in terms of its effect on basis vectors, F(v_n) = f_n, and if v = Σ_n a_n v_n, then

F(v) = Σ_{n=1}^∞ f_n a_n   (95)

by linearity. Clearly, the vector w = Σ_n \bar{f}_n v_n will give the correct answer formally (notice the complex conjugate). But we must also check that this vector w is normalizable, a fact that was obvious in the finite-dimensional case. The idea will be to apply F to w itself; however, this is not quite correct, because w might not be in V (because it might not be normalizable). So we will approximate it by w_N = Σ_{n=1}^N \bar{f}_n v_n, which is in V. Then

|F(w_N)| / ||w_N|| = Σ_{n=1}^N |f_n|^2 / (Σ_{n=1}^N |f_n|^2)^{1/2} = (Σ_{n=1}^N |f_n|^2)^{1/2},   (96)

and this is less than some finite constant C' for all N, because F is bounded. Thus as N → ∞, the norm of w_N increases to a finite (bounded) limit, with ||w||^2 = Σ_n |f_n|^2. That means w is normalizable, which completes the proof.
We just proved that for a given u there is a vector w such that

⟨w|v⟩ = ⟨u|Av⟩   (97)

for all v. We call this vector w = A^† u, and do the same for all u; A^† u is linear in u. Then A^† is called the adjoint of A (with respect to the given inner product). The equation

⟨A^† u|v⟩ = ⟨u|Av⟩   (98)

holds for all u and v. It follows that A^† is bounded, and A^{††} = A when A is bounded.

Note that the left-hand side of equation (97) is bounded as a function of v for fixed w, in the sense of linear maps, that is

|⟨w|v⟩| / ||v|| ≤ ||w||,   (99)

by using the Cauchy-Schwarz inequality. Therefore, the right-hand side is bounded in the same way: for fixed u, and all v ∈ V,

|⟨u|Av⟩| / ||v|| ≤ C,   (100)

where C > 0 is a constant. Any such bounded linear map V → C can be represented by ⟨w|v⟩ for some w ∈ V. This boundedness is therefore a necessary condition to define the adjoint.
So far we assumed that A is bounded, so the expressions are well-defined for all u, v ∈ V. But even if A is not bounded, but is defined on a dense subspace dom A, we can define

dom A^† = {u ∈ V : the linear map v ↦ ⟨u|Av⟩ is bounded on v ∈ dom A},   (101)

and then because the linear map v ↦ ⟨u|Av⟩ is bounded for any such u, it can be extended to all v ∈ V by continuity (i.e. by taking limits of v_n → v with the v_n ∈ dom A — recall that dom A is dense, so these limits give all vectors in V!), and is then bounded on the whole of V. It now follows as before that there exists a unique vector w ∈ V such that

⟨w|v⟩ = ⟨u|Av⟩   (102)

for all v ∈ V (where the right-hand side is defined for all v as we just described), and this uniquely defines A^† u = w. Doing this for all u ∈ dom A^† defines A^† on all of dom A^† (and A^† is a linear map). We have the basic relation

⟨A^† u|v⟩ = ⟨u|Av⟩   (103)

for u ∈ dom A^† and v ∈ dom A, with conditions on the domains as before. This is symmetrical between A and A^†; however, we cannot conclude that A^{††} = A in general, because dom A^† may not be dense, and if not then we cannot even define A^{††}.
We also define A to be Hermitian (or symmetric in functional analysis texts) if for all u, v ∈ dom A,

⟨u|Av⟩ = ⟨Au|v⟩.   (104)

Notice this makes no reference to the adjoint of A. In the unbounded case, the term Hermitian does not mean the same as the term self-adjoint, which will be defined below. (For bounded operators A, we can already see that Hermitian and self-adjoint, that is A = A^†, are equivalent, exactly as in finite-dimensional cases.)

We can easily show that if A is densely defined and Hermitian, then

dom A ⊆ dom A^†.   (105)

[Proof: ⟨Au|v⟩ for fixed u ∈ dom A is a bounded map on v ∈ dom A, and can be extended to a bounded map V → C as before, as dom A is dense. Then the same is true for ⟨u|Av⟩ by Hermiticity, and hence u ∈ dom A^†.] Because A is densely defined, it follows that A^† is also. However, A^† need not be Hermitian, and in fact

dom A ⊆ dom A^{††} ⊆ dom A^†.   (106)

Definition: A is self-adjoint if it is densely defined, Hermitian, and dom A = dom A^†. (Note that then A = A^{††} and dom A = dom A^{††}.) For A bounded, dom A = V, so Hermitian implies self-adjoint (as we have seen), as in the finite-dimensional case.
C. Differential operators

For D a differential operator, the functions u in the domain must be normalizable, enough-times differentiable for the operator to make sense, and further Du must also be normalizable; we will usually leave these conditions implicit. There may also be some boundary conditions at the endpoints a and b. In order to construct the adjoint, or to check Hermiticity, we will need to integrate by parts, and this will bring in the boundary values of u and its low derivatives explicitly.

For example, the general second-order differential operator has the form

\tilde{D} = α(x) d^2/dx^2 + β(x) d/dx + γ(x).   (107)

In ⟨u|\tilde{D}v⟩, integration by parts ignoring the boundary terms yields an expression ⟨\tilde{D}^† u|v⟩, where \tilde{D}^† is the formal adjoint of \tilde{D}; in the present example

\tilde{D}^† u = (1/ρ(x)) [ d^2(\bar{α}(x)ρ(x)u(x))/dx^2 − d(\bar{β}(x)ρ(x)u(x))/dx + \bar{γ}(x)ρ(x)u(x) ].   (108)

The boundary terms are

⟨u|\tilde{D}v⟩ − ⟨\tilde{D}^† u|v⟩ = [ \bar{u} ρα dv/dx + \bar{u} ρβ v − d(ρα\bar{u})/dx · v ]_a^b.   (109)
We suppose that the domain of \tilde{D} has been specified, and is dense (for example, for the kind of conditions already mentioned, it will be dense). The part ⟨\tilde{D}^† u|v⟩ is bounded as a function of v, because

|⟨\tilde{D}^† u|v⟩| ≤ ||\tilde{D}^† u|| · ||v||   (110)

by Cauchy-Schwarz again. So, recalling the definition above, to find the domain of \tilde{D}^† as a true, not merely formal, adjoint, we require the functions u such that the boundary terms, divided by ||v||, can be bounded by a constant. But in general the boundary values of v and its derivatives are completely unrelated to ||v||. In particular, there are functions whose boundary values (or derivatives at the boundary) are arbitrarily large compared with ||v||, which is an integral of |v|^2 and insensitive to the precise values at the endpoints. Hence the domain of \tilde{D}^† is defined (apart from the usual finite-norm and smoothness conditions) by the boundary conditions that the boundary terms above vanish for any v ∈ dom \tilde{D}. That is,

dom \tilde{D}^† = {u ∈ L^2(a, b; ρ) : the boundary terms in eq. (109) vanish for all v ∈ dom \tilde{D}}.   (111)

(In some singular cases, the boundary terms have to be studied carefully as the upper or lower limit tends to b or a, rather than directly at b or a.) It is similar for operators of other order, and for operators in more than one variable.
If the boundary conditions defining the domain of \tilde{D} are strong enough to make the boundary terms vanish, then the domain of \tilde{D}^† has no boundary conditions at all. Conversely, if the conditions on dom \tilde{D} are weak, those on dom \tilde{D}^† will be stronger. Notice that the discussion implies that the boundary conditions on dom \tilde{D}^† are the weakest ones which, taking into account the conditions on v in dom \tilde{D}, make the boundary terms vanish.

Note that an operator D is formally self-adjoint if it is equal to its formal adjoint. If it is formally self-adjoint, then it is Hermitian precisely when the boundary terms vanish for u, v in dom D. Finally, it will be self-adjoint if it is Hermitian and the boundary conditions defining dom D and dom D^† are the same.
Example: The simplest example is D = −i d/dx on [a, b] (finite) with ρ = 1. D is formally self-adjoint. The boundary terms in this case are

⟨u|Dv⟩ − ⟨D^† u|v⟩ = −i[\bar{u}(b)v(b) − \bar{u}(a)v(a)]   (112)

for u ∈ dom D^†, v ∈ dom D. Let us see what happens for different choices of boundary condition.

1) If the domain of D is defined by v(a) = 0, then the boundary terms reduce to −i\bar{u}(b)v(b), and as v(b) can be non-zero, we require u(b) = 0 for all u in dom D^†. Clearly the domains are not contained one inside the other, and indeed D is not Hermitian (nor is D^†). The domains do intersect, and the intersection is the space of functions with u(a) = u(b) = 0.

2) If dom D is shrunk to functions obeying v(a) = v(b) = 0, then no boundary condition is required in dom D^† at all. Also, D is Hermitian, but not self-adjoint as the domains are different, though we do have dom D ⊂ dom D^†. D^† is not Hermitian, as the boundary terms do not generally vanish when both functions are in dom D^† for this case.

3) Few choices now remain if we wish to find boundary conditions for which D becomes self-adjoint. One choice is the periodic boundary condition v(a) = v(b). It can be easily checked that this forces the functions in dom D^† to obey the same condition, and so D is self-adjoint. More generally, the most general self-adjoint boundary condition for D is

v(a) = e^{iθ} v(b),   (113)

where θ is a fixed real constant (the generalized periodic boundary condition). Notice that for any θ, these conditions are non-local; they relate values of the function at distinct points.
The main reason to introduce the idea of a self-adjoint operator is so that when we examine the issues of eigenfunctions and diagonalization of operators, we can find some sort of spectral theorem, as in the finite-dimensional case. That is, we want to find solutions of

Df = λf   (114)

for f ≠ 0 in dom D, where λ is a constant; then we say f is an eigenfunction of D and λ is an eigenvalue of D (and similarly for D^†). Let us see what eigenfunctions D and D^† have in each case in the example. As D = D^† = −i d/dx, the general solution to the equation is e^{iλx} times a constant.

1) For any value λ ∈ C, there is no eigenfunction obeying f(a) = 0 or f(b) = 0. Neither D nor D^† has any eigenfunctions. Recall that neither is Hermitian.

2) Now D is Hermitian, D^† is not. Again, D has no eigenfunctions, but D^† (with no boundary conditions) has an eigenfunction for every complex value of λ!

3) Now D is self-adjoint. For θ = 0, it has eigenfunctions e^{iλx} provided λ lies in the set {2πn/(b − a) : n = . . . , −1, 0, 1, 2, . . .}. The eigenvalues are real and form a discrete set. The eigenfunctions are exactly those which enter in Fourier series, and form an orthonormal complete set. Thus D can be written in terms of them — it can be diagonalized by changing to this basis. The situation is similar for all real θ.
The behavior in 3) is almost typical of self-adjoint operators: there is a set of real numbers that are something like eigenvalues, and a corresponding set of functions something like eigenfunctions with properties analogous to orthonormality and completeness, and D can be diagonalized using them. This is the content of the Spectral Theorem for self-adjoint operators on Hilbert space. The only complication (requiring the words "something like" and "analogous" a moment ago) is that the eigenvalues may form a continuous set, and the corresponding eigenfunctions are not normalizable. The examples illustrate how none of the statements need to hold if the operator is only formally self-adjoint, or Hermitian, but not self-adjoint.

Example: consider D = −i d/dx on (−∞, ∞). In this case, D is self-adjoint with no boundary conditions required at x → ±∞ beyond normalizability of u and Du; these conditions clearly require that u → 0 as x → ±∞. The formal eigenfunctions are e^{ikx} with eigenvalue k, for any real k. These functions are not normalizable, but the norm does not diverge as badly as for e.g. e^{κx}, κ real. The formal eigenfunctions form a complete orthonormal set, in a sense of integrals instead of sums: the formulas are those of Fourier integral transforms. But it is not yet clear why for example we should accept e^{ikx} as a formal eigenfunction, but not e^{κx}. These are the shortcomings of the traditional physicists' treatment of these operators, going back to Dirac's pioneering work. Fortunately, the problems have been solved by von Neumann.
Example: in each example so far, we eventually found a way to make a formally self-adjoint operator into a truly self-adjoint one. This is not always possible. For example, consider D = −i d/dx on [0, ∞) with ρ = 1. As functions u in dom D must go to zero at ∞, the analogous boundary term is simply i\bar{u}(0)v(0). If we require v(0) = 0 for dom D, then there is no boundary condition on functions in dom D^†. Now D is Hermitian, D^† is not Hermitian, and D^{††} = D. There is no possible way to make this operator D self-adjoint. Related to this, D^† has a normalizable eigenfunction e^{iλx} (with eigenvalue λ) for all λ with Im λ > 0, and no normalizable eigenfunction if Im λ ≤ 0.
D. Sturm-Liouville systems

Refs: Stone and Goldbart; I. Stakgold, Boundary Value Problems of Mathematical Physics, Vol. I, Ch. 4.

When the theory is worked out for second-order differential operators, it is called Sturm-Liouville theory. We consider the differential operator \tilde{D} as above, and ask for what boundary conditions it is self-adjoint on (a, b) when the inner product is ⟨u|v⟩ = ∫_a^b \bar{u}(x) v(x) ρ(x) dx.

The standard form for a S-L differential operator is

Du = −d/dx [ p(x) du/dx ] + q(x) u(x).   (115)

It is always possible to find a function ρ(x) so that ρ(x)\tilde{D} = D is in the standard form for some functions p and q. Proof: We need ρα = −p, ρβ = −p', ργ = q. The last condition is trivial, while the first two imply (ρα)' = ρβ, i.e. ρ'/ρ = β/α − α'/α, which has solution

ρ(x) = (1/α(x)) exp ∫^x [β(x')/α(x')] dx'.   (116)
Now it is easy to see that D is formally self-adjoint with respect to the weight factor ρ = 1 provided that p and q are real. It follows that \tilde{D} is formally self-adjoint with respect to the weight factor ρ above, provided α, β, and γ are real. If the latter conditions hold, then ρ is real and can indeed be taken positive everywhere, provided α does not change sign by passing through either zero or infinity. From now on, we consider the operator \tilde{D} and the inner product (or norm) using ρ, but also refer to D = ρ\tilde{D}.

Using the formal adjoint \tilde{D}^†, the boundary terms (109) become

⟨u|\tilde{D}v⟩ − ⟨\tilde{D}^† u|v⟩ = −[ p(\bar{u}v' − \bar{u}'v) ]_a^b,   (117)

which is simply (up to sign) the difference at the two endpoints of p times the Wronskian of \bar{u} and v. (In singular cases, we may not wish to directly evaluate the terms at the singular endpoints, but rather approach them as limits, a_0 → a; we ignore this for now.)
We say that the problem is a S-L problem if on the open interval (a, b) both of the following conditions hold: (i) p, p', q, and ρ are continuous and real; (ii) p, ρ > 0. We say the S-L problem is regular if b − a is finite, and the conditions hold on the closed interval [a, b]. If the S-L problem is not regular, then it is a singular S-L problem. In the singular case, typically either a = −∞, or b = ∞, or p or ρ → 0 or ∞ at the endpoints (or any combination of these).

In a regular S-L problem, the boundary conditions that make the boundary terms vanish, and moreover are weak enough that the same conditions must hold in dom \tilde{D}^† as in dom \tilde{D}, so that \tilde{D} is self-adjoint, are of the general form

c_1 u(a) + c_2 u'(a) = 0,   (118)
d_1 u(b) + d_2 u'(b) = 0,   (119)

where c_1, c_2, d_1, and d_2 are real constants, and c_1, c_2 are not both zero (and similarly for d_1, d_2). Because we can divide the first by c_1 or c_2, the possible self-adjoint bcs at a are described by a single real parameter (which can be infinity), and similarly at b.

The discussion of regular S-L systems will be continued later.
E. Boundary conditions for singular S-L systems (non-examinable)

In singular cases, the derivation of self-adjoint boundary conditions is much more difficult, and also hard to find in books accessible to physicists. One reference is the book by Stakgold. Another one, which is harder, is M. Reed (no relation) and B. Simon, Methods of Mathematical Physics, Volume II, Appendix to Section X.1.

First, as one might expect, the two endpoints can be handled separately. If one is regular (i.e. finite, and p and ρ are well-behaved there), then we can use the regular boundary condition we just described. If both are singular, we can split the interval in two by picking an arbitrary regular point in-between, and the analysis reduces to the case of one singular endpoint. So we will assume the interval is a < x < b, and a is regular and b is singular (possibly, b = ∞).

The result, deriving from H. Weyl, 1910, is based on the following Theorem:

The equation

−(pu')' + qu = λρu   (120)

on (a, b) (no boundary conditions) either has two linearly-independent solutions of finite ρ-norm for all λ, or for Im λ ≠ 0 it has only one solution of finite ρ-norm (up to multiplying by a scalar). These two cases are respectively the limit-circle case and the limit-point case.

Notice that a lot is packed into the statement of the Theorem. For example, it states that for Im λ ≠ 0, a solution of finite ρ-norm always exists; the issue is whether there is another linearly-independent one. If there are two for one value of λ then there are two for all values of λ ∈ C.
The theory now instructs us on how to find the self-adjoint boundary conditions for \tilde{D}.

1) In the limit-circle case, there is a one-parameter family of possible bcs at b such that \tilde{D} is self-adjoint. If u_1 and u_2 are the linearly-independent solutions for some value of λ, then a self-adjoint bc at b is of the form

u(x) ∼ C(a_1 u_1(x) + a_2 u_2(x)) as x → b.   (121)

Note the use of the asymptotic symbol ∼ which we discussed earlier. Here a_1, a_2 are real constants, which are fixed and determine the bc and hence dom \tilde{D}. C is arbitrary, so u can be an arbitrary multiple of the specified form, asymptotically (thus the bc is linear). We can divide through by a_1 or a_2 (absorbing it into C), and then we see that this is a one-parameter family of bcs as for the regular case. In effect, the allowed functions in dom \tilde{D} are asymptotic to the specified linear combination of the leading asymptotic behaviors of the two solutions u_1, u_2 as x → b, up to an overall factor.

Incidentally, the regular case can be written in a similar form: the regular bcs require that if b is regular,

u(x) ∼ C(d_2 − d_1(x − b)) as x → b.   (122)

The forms u_1 ∼ constant, u_2(x) ∼ (x − b) are the behaviors of two solutions when b is an ordinary point of the differential equation, as in the regular endpoint S-L case. Note that if e.g. d_1 = 0, this should be interpreted as u(x) = C d_2 + O((x − b)^2), corresponding to the bc u'(b) = 0.

2) In the limit-point case, no condition is needed at x = b, beyond the usual conditions of ||u|| and ||\tilde{D}u|| being finite.
F. Eigenfunctions

Definition: an eigenfunction of an operator A is a solution y ≠ 0 to

Ay = λy   (123)

such that y lies in dom A (in particular, y is normalizable). λ ∈ C is the corresponding eigenvalue.

This definition is completely general; it applies to any operator. If V is L^2(a, b; ρ) and \tilde{D} is a Sturm-Liouville differential operator as before, then

\tilde{D}y = λy, or Dy = λρy,   (124)

and y is not identically zero.

For A a Hermitian operator (not necessarily self-adjoint), we have easily, as in the finite-dimensional case:

a) any eigenvalues of A are real:

0 = ⟨y|Ay⟩ − ⟨Ay|y⟩   (125)
  = (λ − \bar{λ}) ||y||^2,   (126)

so λ = \bar{λ}, λ is real. Note the argument uses the property of Hermiticity as defined earlier in this section.

b) Eigenvectors (-functions) belonging to distinct eigenvalues are orthogonal: suppose the eigenvectors are y_1, y_2 with real eigenvalues λ_1 ≠ λ_2. Then

0 = ⟨y_1|Ay_2⟩ − ⟨Ay_1|y_2⟩   (127)
  = (λ_2 − λ_1)⟨y_1|y_2⟩,   (128)

so ⟨y_1|y_2⟩ = 0. (Both parts are identical to the finite-dimensional proofs.)

In particular, the statements apply to self-adjoint S-L operators. The issue is then whether there is a complete set of eigenfunctions.
G. Examples of self-adjoint S-L eigenvalue problems

Many of these arise in physics, often by separating variables in a partial differential equation. They lead to many of the sets of orthonormal functions (including polynomials), each named after some nineteenth-century mathematical physicist.

1) There are many examples of regular self-adjoint S-L systems, not all of which involve distinct functions or have individual names. A simple one is D = −d^2/dx^2, ρ = 1, on [0, a], with u(0) = u(a) = 0. The eigenfunctions are u_n(x) ∝ sin(nπx/a), with eigenvalues λ_n = n^2 π^2 / a^2. The functions are orthogonal and can be normalized. These are the functions occurring in Fourier sine series. Physically, this problem can arise from the wave equation for a string with fixed ends, or for the stationary states of a particle in a box in quantum mechanics. There are many others, for example the Bessel equation on a < x < b with a > 0, b < ∞. This describes the radial behavior of a two-dimensional wave equation in the annulus a < r < b, with x as the radial variable r.
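A quick finite-difference check of the simplest example just given (the grid size is my own choice): discretizing D = −d^2/dx^2 with Dirichlet conditions gives a real symmetric matrix (the discrete analog of a self-adjoint operator) whose lowest eigenvalues approach n^2 π^2 / a^2.

```python
import numpy as np

a, N = 1.0, 400
h = a / N
main = (2.0/h**2) * np.ones(N - 1)
off = (-1.0/h**2) * np.ones(N - 2)
D = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)   # discrete -d^2/dx^2, u(0) = u(a) = 0
lam = np.sort(np.linalg.eigvalsh(D))

n = np.arange(1, 6)
print(lam[:5])                    # lowest numerical eigenvalues
print((n*np.pi/a)**2)             # exact: 9.87, 39.5, 88.8, 157.9, 246.7
```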
2) The Legendre equation, which arises from separation of variables applied to the Laplacian operator ∇^2 in spherical polar coordinates, looking at the θ variable, with x = cos θ. The equation is

(1 − x^2)u'' − 2xu' + λu = 0.   (129)

This is of S-L form with p(x) = 1 − x^2, q = 0, ρ = 1. On x ∈ [−1, 1] it is singular at both ends. Both points x = ±1 are regular singular points of the ODE. To analyze behavior near x = 1, we can therefore approximate the equation as

2(x − 1)u'' + 2u' = 0,   (130)

or

(x − 1)^2 u'' + (x − 1)u' = 0,   (131)

which is of the Euler form. The indicial equation is σ^2 = 0, so two solutions to the Legendre equation behave as u_1(x) ∼ constant, u_2(x) ∼ ln(1 − x) as x → 1 (the full solution as a Frobenius series, or the same with logs, will not be required!). Because both of these are square-integrable near x = 1 (note ∫^1 (ln(1 − x))^2 dx is finite), we are in the limit-circle case. The limit x → −1 is the same by symmetry. Hence self-adjoint boundary conditions can be written as

u(x) ∼ C(a_1 ln(1 − x) + a_2) as x → 1   (132)

(a_1, a_2 not both zero), and similarly at x = −1 (with independent constants b_1, b_2).
This may look unfamiliar. That is because in practice we seem always to encounter the choice a_1 = b_1 = 0, so the bc is simply that the solutions are bounded as x → ±1. The general series solution in powers of x diverges as x → 1 (it must agree with the generic solution, which diverges as ln(1 − x)). To obtain a finite solution (obeying the bcs), the series must terminate, which it only does for eigenvalue λ = l(l + 1) with l = 0, 1, 2, . . . . The finite power-series solutions are then the familiar Legendre polynomials P_l(x). The fact that they are orthogonal is explained because they are eigenfunctions of a self-adjoint operator.

It is interesting however to realize that these are not the only bcs making the Legendre operator \tilde{D} self-adjoint. For any other choice of the above form, there is a set of orthonormal eigenfunctions, each of which has a logarithmic divergence as x → 1, with coefficient related to the constant behavior by a fixed ratio. The Legendre functions of the second kind, Q_l(x), have leading behavior proportional to ln(1 − x) as x → 1 (but I don't know the coefficient in the subleading term ∝ 1). By solving Legendre's equation near x = 1 using ln(x − 1) times a series in x − 1, plus another series in x − 1, we could calculate the relative coefficient between ln(1 − x) and 1, and thus see which of the self-adjoint bcs is obeyed by the Q_l(x). For general self-adjoint bcs at x = ±1, the eigenfunctions will be some linear combinations of P_l and Q_l [or analogous Legendre functions but with λ ≠ l(l + 1)].
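A small numerical check of the orthogonality just claimed for the P_l (a sketch; the normalization 2/(2l + 1) quoted in the comment is the standard one):

```python
from scipy.integrate import quad
from scipy.special import eval_legendre

for l in range(4):
    for m in range(4):
        val, _ = quad(lambda x: eval_legendre(l, x) * eval_legendre(m, x), -1, 1)
        expected = 2.0/(2*l + 1) if l == m else 0.0
        assert abs(val - expected) < 1e-10
print("P_l orthogonal on [-1, 1], with norm-square 2/(2l+1)")
```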
3) The Bessel equation of order ν: it arises as the radial equation after separating variables in the Laplacian in cylindrical polar coordinates,

d^2 R/dr^2 + (1/r) dR/dr + ( k^2 − ν^2/r^2 ) R = 0   (133)

on r ∈ (0, ∞). Here λ = k^2 is the eigenvalue; we assume λ ≥ 0. (Set x = kr to obtain the standard Bessel equation.) It is of S-L form with p = r, q = ν^2/r, ρ = r. To examine r = 0 and r = ∞ separately, we first consider (0, a), a < ∞. Then a is a regular endpoint. As r → 0, we have the Euler form (simply drop k^2), and R ∼ r^σ, with σ^2 = ν^2, thus σ = ±ν (ν > 0), or R ∼ constant, R ∼ ln r (ν = 0).

Of these solutions, ∫_0^a r|R(r)|^2 dr is finite at the lower limit for R(r) ∼ r^ν for all ν ≥ 0. The integral also converges for R ∼ r^{−ν} if ν < 1 (including the case ν = 0, R ∼ ln r), but not for ν ≥ 1. Hence we are in the limit-circle case if 0 ≤ ν < 1, and the limit-point case if ν ≥ 1.

Then for ν < 1, there is once again a one-real-parameter family of possible self-adjoint bcs, similar to Legendre's equation above. A frequently occurring example is the self-adjoint bc R ∼ r^ν (no r^{−ν} part) as r → 0 (once again, there are also other possibilities). In this case the eigenfunctions (using also the regular bc at r = a) are the Bessel functions J_ν(kr), which ∼ r^ν as r → 0 (similar to the modified Bessel functions discussed earlier in class). If for example the bc at r = a is R(a) = 0, then the eigenfunctions are J_ν(k_n r), where the values k_n are the discrete set of values of k such that J_ν(k_n a) = 0. We obtain an orthonormal set of functions on (0, a).

In the limit-point cases ν ≥ 1, no bc at r = 0 is needed. Normalizability of the eigenfunctions will force them to be ∝ J_ν(kr).

For Bessel's equation we can also discuss the outer region a < r < ∞. r = ∞ is an irregular singular point of the ODE. We could write R ∼ e^{S(r)}, but it is easy in this present case to approximate the solution directly. As r → ∞, the equation can be approximated by dropping the powers of 1/r, and then has constant coefficients. So the solutions are R ∼ e^{±ikr}. k^2, the eigenvalue, should not be assumed to be real, and we see that for Im k^2 ≠ 0, there is exactly one solution (up to scalar factors) that has finite ρ = r norm as r → ∞, thus illustrating Weyl's Theorem. For k^2 real, there may or may not be normalizable solutions. For k^2 positive, there are no normalizable solutions, but the Bessel functions J_ν and N_ν have the expected behavior as r → ∞. Anyway, we are in the limit-point case, and no further bc is required at r → ∞.
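A sketch for the interior problem on (0, a) with R(a) = 0 discussed above (ν and a are my own choices): scipy supplies the zeros of J_ν, which give the eigenvalues k_n, and distinct eigenfunctions are orthogonal with weight ρ = r.

```python
import numpy as np
from scipy.special import jn_zeros, jv
from scipy.integrate import quad

nu, a = 0, 1.0
k = jn_zeros(nu, 3) / a                      # k_n a = n-th zero of J_nu, so J_nu(k_n a) = 0
print(k)                                     # ~ [2.405, 5.520, 8.654] for nu = 0

val, _ = quad(lambda r: r * jv(nu, k[0]*r) * jv(nu, k[1]*r), 0, a)
print(val)                                   # ~ 0: orthogonality with weight rho = r
```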
There are many other examples: Laguerre, Hermite, Chebyshev, Jacobi, Gegenbauer, . . . (but not Jack Bauer). The classic reference is Whittaker and Watson.

We should note that when the equations arise from separation of variables, the weight factors ρ that make it a S-L system, and thus give it a chance of being self-adjoint, are the same ones that arise from changing variables from the usual measure dx dy dz on R^3. This is natural because we begin from ∇^2, which is self-adjoint on L^2(R^3) (and requires no further bc at infinity). For example, dx dy dz = r dr dφ dz in cylindrical polars, and ρ = r is what is required for self-adjointness in connection with Bessel's equation. It is likely that one could say similar things about the choice of self-adjoint boundary conditions: they should be determined by the self-adjointness of (say) ∇^2 which they come from. But certainly it is still useful to find the possibilities directly.
H. Complete orthonormal sets of eigenfunctions

In the examples in the previous subsection, except for the case of Bessel on (a, ∞), there is an infinite set of eigenfunctions u_n(x) ∈ dom \tilde{D} obeying \tilde{D}u_n(x) = λ_n u_n(x), which can be normalized to obtain an orthonormal set. Let us label the eigenfunctions in order of increasing eigenvalue, so λ_n → ∞ as n → ∞. Moreover, each of these sets is a complete set: any function in the relevant L^2(a, b; ρ) space can be expanded as

f(x) = Σ_{n=0}^∞ a_n u_n(x),   (134)

where a_n are some coefficients. The series converges to the left-hand side in the usual Hilbert-space sense (convergence in the mean),

|| f − Σ_{n=0}^N a_n u_n || → 0   (135)

as N → ∞. As usual, this means the equality of f with the infinite series holds only in this average sense, and not necessarily at every value of x. In particular, while the partial sums Σ_{n=0}^N a_n u_n are continuous and obey the boundary conditions at the endpoints, this does not have to be true for f. We expect that if f has a discontinuity in the interior of (a, b), or does not obey the boundary condition at the ends, then behavior similar to the Gibbs phenomenon in the Fourier series example will occur at those points. Notice that eigenfunctions with larger eigenvalues will vary more rapidly with x, like the rapidly-oscillating Fourier functions, and the partial sums over n ≤ N cannot reproduce rapidly-varying behavior of f.
The coefficients in the series can as usual be found from orthonormality (think of finding the components of a vector in an orthonormal basis), which says that

⟨u_n|u_m⟩ = ∫_a^b dx \bar{u}_n(x) u_m(x) ρ(x) = δ_{nm},   (136)

and the coefficients must be

a_n = ⟨u_n|f⟩ = ∫_a^b dx \bar{u}_n(x) f(x) ρ(x).   (137)

Substituting these in integral form into the series, we find that for any f,

f(x) = Σ_{n=0}^∞ u_n(x) ∫_a^b dx' \bar{u}_n(x') f(x') ρ(x'),   (138)

and hence (taking the summation under the integral and using the fact that f is arbitrary) we must have

Σ_{n=0}^∞ u_n(x) \bar{u}_n(x') ρ(x') = δ(x − x').   (139)

This expresses the completeness of the set of functions, as we saw for Fourier series already. Because it is equal to a δ-function (or by the evident symmetry), we could equally well write ρ(x) in place of ρ(x'). We may note also the analog of Parseval's Theorem,

||f||^2 = Σ_{n=0}^∞ |a_n|^2.   (140)

Moreover, in the same spirit as the completeness relation, we can write D itself in diagonal form: as an integral kernel (\tilde{D}u = ∫_a^b dx' \tilde{D}(x, x') u(x')),

\tilde{D}(x, x') = Σ_n λ_n u_n(x) \bar{u}_n(x') ρ(x').   (141)

In each of these cases then, the complete orthonormal set gives us an isometric (i.e. norm-preserving) isomorphism of the space L^2(a, b; ρ) with ℓ^2, the space of square-summable sequences.
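Here is a numerical sketch of eqs. (134)-(140) for the sine basis of example 1) of the previous subsection (the test function and the truncation are my own choices):

```python
import numpy as np
from scipy.integrate import trapezoid

a, N = 1.0, 200
x = np.linspace(0, a, 4001)

def u(n):
    # normalized eigenfunctions of example 1), with weight rho = 1
    return np.sqrt(2.0/a) * np.sin(n*np.pi*x/a)

f = x * (1 - x)**2                                   # a test function in L^2(0, a)
coef = np.array([trapezoid(u(n) * f, x) for n in range(1, N+1)])   # a_n = <u_n|f>, eq. (137)
f_N = sum(c * u(n) for n, c in zip(range(1, N+1), coef))           # partial sum, eq. (134)

print(np.max(np.abs(f - f_N)))                       # small: the series converges to f
print(trapezoid(f**2, x), np.sum(coef**2))           # Parseval, eq. (140)
```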
We can make general statements about these properties for self-adjoint operators. There are difficulties, however, in that there are cases such as −i d/dx on (−∞, ∞) and Bessel on (a, ∞) in which there are no (normalizable) eigenfunctions, and the analog of Fourier series seems to be something like a Fourier integral transform which involves a continuous parameter (which corresponds to an eigenvalue; these are real numbers), and the corresponding "eigenfunctions" are not normalizable. (There are also cases that involve a combination of discrete eigenvalues and true eigenfunctions with the continuous version.) The correct theory for self-adjoint operators must handle all these possibilities.

First, instead of the overly-restrictive notion of eigenvalues, we will define the spectrum of any operator A to be

spec A = {λ ∈ C : (A − λI) does not have an inverse that is a bounded operator}.   (142)

If λ is an eigenvalue of A, then it is clearly in the spectrum, but spec A may also include numbers that are not eigenvalues. Notice that if λ ∉ spec A, then the inverse (A − λI)^{-1} exists and is bounded; this inverse operator, which depends on λ, is often called the resolvent of A.

Now we come to the culmination of the entire course (in rather vague language):

Spectral Theorem for self-adjoint operators: The spectrum of a self-adjoint operator is a subset of the real numbers. There is an analog of a complete set of orthonormal eigenfunctions. (Proof: omitted. For regular S-L operators, it is not too difficult.)

Applied to operators on a function space L^2(a, b; ρ), the Spectral Theorem is telling us that the spectrum of the self-adjoint operator A can be decomposed into two parts, the discrete and continuous parts, spec A = dspec A ∪ cspec A, which contain only real numbers, and there are functions u_λ(x) for all λ ∈ spec A, such that any element of the space can be expressed in the form

f(x) = Σ_{λ ∈ dspec A} a_λ u_λ(x) + ∫_{cspec A} dλ a(λ) u_λ(x).   (143)

(The discrete spectrum contains the true eigenvalues, and the functions u_λ(x) are the corresponding eigenfunctions.) The functions u_λ(x) satisfy analogs of the orthonormality and completeness relations, in which the parts for the discrete spectrum are as given earlier, but those for the continuous spectrum involve Dirac δ functions instead of Kronecker δ's, and the sums are replaced by integrals. The Fourier transform gives a fairly typical example, in which the operator could be −i d/dx, the spectrum is R, the functions are e^{ikx}, and there is no discrete spectrum.
Another example with a purely continuous spectrum is related to something called the Fourier-Bessel integral transform. The operator is the Bessel operator on (0, ∞) with weight ρ(r) = r, with the self-adjoint boundary conditions at 0 and ∞ discussed above. The eigenfunctions are J_ν(kr), parametrized by (real) k rather than the eigenvalue λ = k^2 for convenience; the orthonormality relation is

∫_0^∞ dr r J_ν(kr) J_ν(k'r) = (1/k) δ(k − k'),   (144)

and the completeness relation is similar,

∫_0^∞ dk k J_ν(kr) J_ν(kr') = (1/r) δ(r − r').   (145)

Notice that in general, just as the integrals over x involve a weight factor ρ(x), for the continuous spectrum the integrals contain a spectral weight factor σ(λ). In the Fourier-Bessel example, when the integrals are written in terms of k rather than k^2, this spectral weight becomes σ(k) = k. The Fourier-Bessel transform and the similar transform using spherical Bessel functions can be found in J. D. Jackson, Classical Electrodynamics, 3rd Ed. They can be obtained from working on (0, a), which gives a discrete spectrum, then taking the limit a → ∞.

One more remark about the self-adjoint S-L systems: the spectrum can possess a continuous part only when one or more of the endpoints is in the limit-point case. That is, the limit-point case at one or more endpoints is a necessary, but not sufficient, condition for the continuous spectrum to be non-empty. If each endpoint is in the regular or limit-circle case, then the spectrum is purely discrete.
In the above discussion of the Spectral Theorem, I have oversimplified on one point: it is possible that some eigenvalues are degenerate, so there is a multiplicity for each one. This degeneracy could even be infinite, and would have to be accounted for in the sums/integrals. In usual S-L problems, multiplicities either do not occur or are simple. For example, for −d^2/dx^2 on (−∞, ∞), the eigenfunctions are e^{ikx} and the eigenvalues are k^2 ≥ 0 (k is real). For k^2 > 0, there are two functions for each k^2.

The Spectral Theorem is von Neumann's rigorous version of Dirac's ideas. In quantum mechanics, observables are represented by self-adjoint operators on a Hilbert space. The Spectral Theorem guarantees that they can be "diagonalized" using the eigenfunctions u_λ(x). Moreover, a set of n mutually-commuting self-adjoint operators can be simultaneously diagonalized in a similar fashion, with the eigenvalue label λ replaced by the set of eigenvalues for each operator in the commuting set. With enough commuting operators (a so-called complete set), there is a single function u_{λ_1,...,λ_n} for each possible list λ_1, . . . , λ_n, so the multiplicities are removed. For example, this occurs when separating Schrodinger's equation for a spinless particle in three dimensions: three commuting operators are enough, and states can be fully labelled by three parameters, each of which is an eigenvalue in a S-L problem in one variable.

This is one reason why we insist that observables in QM be self-adjoint operators. Another reason why they are nice to deal with is that the idea that U = e^{iA} should be unitary does hold when A is self-adjoint. For example, when the Hamiltonian H is time-independent and self-adjoint, the time evolution of the system from time t = 0 to t is given by the unitary operator U = e^{−iHt}, and the fact that it is unitary means that it conserves the total probability ||ψ||^2 = 1 during time evolution of any state ψ.
I. Green's function for a self-adjoint S-L boundary value problem, I

For a regular self-adjoint S-L problem, on a finite interval [a, b], we want to solve

Dy = f   (146)

where D is in the standard form, eq. (115) (D = ρ\tilde{D}), and y should obey the b.c.s of the form (118), (119) (thus y lies in dom \tilde{D}). We seek the solution in the form

y(x) = ∫_a^b dx' G(x, x') f(x').   (147)

That is,

D_x G(x, x') = δ(x − x'),   (148)

where D_x means D acting on the x variable, not on x'. In order that the solution be unique, we require that λ = 0 not be in the spectrum of \tilde{D}. Note that we could make minor variations in the definitions, such as solving \tilde{D}y = f, or inserting a factor ρ(x') in ∫_a^b dx' G(x, x') f(x'). These can be handled by multiplying or dividing by ρ(x) or ρ(x') in appropriate places, and you may have to recognize different forms of the same problem.
The technique for constructing the Green's function in such a problem is standard and very important, so should be learned carefully. First, we find two non-zero solutions to Dy = 0, say u, v, such that u obeys the b.c. at x = a, and v obeys the b.c. at x = b (because 0 is not in the spectrum, there can be no non-zero solution that obeys the b.c.s at both ends). Then G is given by

G(x, x') = (1/A) u(x) v(x')   for x ≤ x',
G(x, x') = (1/A) u(x') v(x)   for x' ≤ x,
                                                   (149)

where A is a constant to be determined (note that u and v are undetermined up to a multiplicative factor also; these will be related). While G is continuous as a function of x at x = x' (for any fixed x'), dG/dx will not be continuous.

For x ≠ x', it is clear that D_x G(x, x') = 0 as required. It is also clear that for fixed x' ∈ (a, b), G obeys the boundary conditions for both x = a and x = b. We only have to verify that we can get the δ-function at x = x' correctly. Now eq. (148) is equivalent to

lim_{ε→0} [ −p(x) (d/dx) G(x, x') ]_{x = x'−ε}^{x = x'+ε} = 1,   (150)

as we can see by integrating the equation across the location of the δ-function (check this!). On the left-hand side of this, p(x) is continuous, so we calculate

[ (d/dx) G(x, x') ]_{x = x'−ε}^{x = x'+ε} = (1/A)(v'u − vu')|_{x=x'},   (151)

which is proportional to the Wronskian of u and v! We showed earlier that (in current notation), for any two solutions of Dy − λρy = 0,

W[u, v](x) ∝ e^{−∫^x (p'/p) dx'} ∝ 1/p(x).   (152)

We note that our u and v cannot be linearly dependent (if they were, they would both be eigenfunctions with λ = 0, obeying both b.c.s), so when the Wronskian is multiplied by p(x) it becomes a non-zero constant. So finally we only have to choose

A = −p (v'u − vu'),   (153)

to make the discontinuity equal 1 (and this A ≠ 0), and we have finished. (In practice, it may be best to calculate the discontinuity directly, rather than memorizing this formula, so as to avoid errors.) We have uniquely determined the required Green's function. Notice that G(x, x') = G(x', x) — G is a "symmetric matrix".
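Here is the construction carried out for a toy problem (my own choice, not from the notes): D = −d^2/dx^2 on [0, 1] with y(0) = y(1) = 0. Then u(x) = x, v(x) = 1 − x, A = −p(v'u − vu') = 1, and G(x, x') = x_<(1 − x_>), with x_< = min(x, x') and x_> = max(x, x').

```python
import numpy as np
from scipy.integrate import trapezoid

def G(x, xp):
    # Green's function of -d^2/dx^2 on [0, 1] with Dirichlet conditions
    return np.minimum(x, xp) * (1 - np.maximum(x, xp))

# For f = 1 the exact solution of -y'' = f, y(0) = y(1) = 0 is y = x(1 - x)/2; check eq. (147).
x = np.linspace(0.0, 1.0, 2001)
y = np.array([trapezoid(G(xi, x) * 1.0, x) for xi in x])
print(np.max(np.abs(y - x*(1 - x)/2)))       # ~ 0
```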
These expressions for Green's functions are used e.g. for the radial Green's function in QM and electrostatics (see e.g. Jackson, Electrodynamics). More general is the resolvent, the operator (\tilde{D} − λI)^{-1}, which exists and is bounded whenever λ is not in the spectrum. Its integral kernel G(x, x') obeys the differential equation

D_x G(x, x') − λ ρ(x) G(x, x') = δ(x − x').   (154)

This can be solved in a very similar way as the special case λ = 0 above, using the same b.c.s as before.

The technique can be pushed further to solve singular S-L problems, though I am not sure if it works for all those with 0 not in the spectrum.
J. Green's function for a self-adjoint S-L boundary value problem, II

Another approach to finding the Green's function is by using the eigenfunction expansion, or a generalization thereof for the case of continuous spectrum. The idea is similar to finite matrices: if we can diagonalize the matrix, and none of the entries in the diagonal version (the eigenvalues) is zero, then we can write the inverse matrix in similar diagonal form.

To begin, we will assume the spectrum is discrete. Suppose then that we have eigenfunctions

\tilde{D} u_n = λ_n u_n   (155)

and the u_n form a complete orthonormal set (i.e. a basis). The definition of G is

D D^{-1} = I,   (156)

or

D_x G(x, x') = δ(x − x') = Σ_n u_n(x) \bar{u}_n(x') ρ(x').   (157)

By completeness, we can expand G in eigenfunctions in x, for fixed x':

G(x, x') = Σ_n a_n(x') u_n(x),   (158)

which implies

D_x G(x, x') = Σ_n a_n(x') λ_n ρ(x) u_n(x).   (159)

Then by orthonormality of the u_n(x), we can find a_n(x'):

λ_n a_n(x') = \bar{u}_n(x'),   (160)

and so

G(x, x') = Σ_n u_n(x) \bar{u}_n(x') / λ_n.   (161)

Notice the similarity of this expression to the diagonal form of D, eq. (141), and to the completeness identity.
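For the same toy problem as in the previous subsection, the series (161) can be checked against the closed form x_<(1 − x_>) (a sketch; the eigenfunctions are √2 sin nπx with eigenvalues n^2 π^2):

```python
import numpy as np

x, xp = 0.3, 0.7
series = sum(2*np.sin(n*np.pi*x) * np.sin(n*np.pi*xp) / (n*np.pi)**2
             for n in range(1, 2001))           # eq. (161), truncated
closed = min(x, xp) * (1 - max(x, xp))          # direct construction from subsection I
print(series, closed)                           # both ~ 0.09
```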
For the kernel of the resolvent, we have similarly

G(x, x') = Σ_n u_n(x) \bar{u}_n(x') / (λ_n − λ).   (162)

In these expressions we see clearly that 0 (or λ) should not equal any eigenvalue λ_n, otherwise the result fails to make sense. (However, in such a case one can sometimes make progress by inverting D in the subspace orthogonal to the eigenvector in question.)

In all cases, if the spectrum contains a continuous part, then an integral over that part will also appear; formally, the derivation is similar. The integral over the continuous spectrum may make sense even when λ tends to a value in that spectrum; however, the resolvent cannot be a bounded operator in that limit.

Remarks on the two techniques: The eigenfunction expansion approach will be useful in calculating G explicitly if we can solve for the eigenfunctions, and sum the series for G. This might be hard work. Likewise, the direct approach will be useful if we can explicitly find the solutions u and v, as we can in the case of the many "named" equations. The fact that there are two expressions for the same Green's function can be useful, e.g. in finding identities, or sums of series of functions. In fact, in theoretical work, if the resolvent, or sufficient properties of it, can be found by the first technique, then properties of the eigenfunction expansion can be obtained from it (see Stakgold). This might be useful e.g. for finding the weight factor σ(k) that enters the integral over the continuous spectrum. This approach involves complex-variable methods, and is anyway beyond the scope of the course.
IV. COMPLEX ANALYSIS
For reference, see Stone and Goldbart; or S. Lang, Complex Analysis; or Byron and
Fuller; or Arfken.
A. Holomorphic functions
We recall that a complex number z can be written as z = x + iy, where x and y are
real numbers, and i is a symbol obeying i^2 = -1. Rules for complex arithmetic (with
associative and commutative addition and multiplication) follow. We write C for the set of
all complex numbers (not including infinity). We have the operation of complex conjugation
that sends z to \bar{z} = x - iy, and the absolute value |z| defined by |z|^2 = x^2 + y^2. We call
x = Re z = (z + \bar{z})/2 and y = Im z = (z - \bar{z})/(2i) the real and imaginary parts of z.
Definitions for convergence of a sequence to a limit, and of continuity for a function, are
exactly as for real numbers.
It will be useful to define the open disk centered at z_0 with radius R > 0 to be

    D(z_0, R) = \{ z \in C : |z - z_0| < R \},    (163)

and the closed disk

    D_{cl}(z_0, R) = \{ z \in C : |z - z_0| \leq R \}.    (164)

A closed set W is any set \subset C with the property that, for any convergent sequence of
complex numbers that all lie in W, the limit also lies in W (note the definition of convergent
uses the absolute value on the complex numbers). An open set U is the complement of a
closed set W in C, U = C \setminus W. Any open set can be represented as the set-theoretic union
of a (possibly infinite) collection of open disks.
Definition: A function f is (complex) differentiable at z_0 if the limit

    \lim_{h \to 0} \frac{f(z_0 + h) - f(z_0)}{h}    (165)

exists, where h can be any complex number tending to zero. If the limit exists, we call it
f'(z_0) or df/dz. This means the limit exists and takes the same value no matter how z_0 + h
approaches z_0: along a straight line from any direction, along a spiral, whatever.
Definition: If f is differentiable for all z in an open neighborhood of z_0 (e.g. in an open disk
centered at z_0), then we say f is holomorphic at z_0. (This implies that it is also holomorphic
at some nearby points.) If f is differentiable at every z in a domain (a connected open set) U,
then we say it is holomorphic in U.
The derivative of a holomorphic function enjoys the same properties as in real analysis:
the formulas for the derivative of a sum, product or composite function (chain rule) are the
same.
From the definition of differentiable, it follows easily that, if f(z) = u(x, y) + iv(x, y),
where u, v are real, is differentiable at z_0, then

    \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x},    (166)

where the partial derivatives are taken at z_0, holding fixed x or y when we differentiate
with respect to the other (the Cauchy-Riemann equations). Thus these are necessary for
differentiability of f at z_0. Sufficient conditions for f to be (complex) differentiable are (i)
all four partial derivatives exist in a neighborhood of z_0 and are continuous at z_0, and (ii)
the C-R equations hold at z_0.
Alternatively, we can define

    \frac{\partial}{\partial z} = \frac{1}{2}\Big( \frac{\partial}{\partial x} - i \frac{\partial}{\partial y} \Big),    (167)

    \frac{\partial}{\partial \bar{z}} = \frac{1}{2}\Big( \frac{\partial}{\partial x} + i \frac{\partial}{\partial y} \Big).    (168)

The C-R equations are equivalent to the single complex equation

    \frac{\partial f}{\partial \bar{z}} = 0.    (169)

If this holds in some open set, so f is holomorphic there, we can think of it as saying that
the function f(z, \bar{z}) is in fact independent of \bar{z}. If f is differentiable at z_0, then \partial f/\partial z = df/dz
at z_0.
Examples: (i) z^n is holomorphic for all n = 0, 1, 2, . . . . Its derivative is n z^{n-1}. \bar{z} is not
differentiable anywhere (it does not obey C-R). (ii) e^z is holomorphic, and so are cos z, sin z.
(iii) z^\alpha is holomorphic at any z \neq 0, for arbitrary complex \alpha. However, it does not define a
single-valued function over all of C \setminus \{0\} if \alpha is not an integer (more on this later).
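A quick numerical illustration of definition (165) (my own sketch, not from the notes): the difference quotient approaches the same limit from every direction for a holomorphic function such as z^2, but depends on the direction of approach for \bar{z}.

    import cmath

    def quotient(f, z0, h):
        # difference quotient (f(z0 + h) - f(z0)) / h
        return (f(z0 + h) - f(z0)) / h

    z0 = 1.0 + 2.0j
    for theta in (0.0, 0.5, 1.0, 2.0):                 # approach along different rays
        h = 1e-6 * cmath.exp(1j * theta)
        print(theta,
              quotient(lambda z: z * z, z0, h),        # always close to 2*z0
              quotient(lambda z: z.conjugate(), z0, h))  # equals exp(-2i*theta): direction-dependent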
B. Analytic functions
Convergence (resp., absolute, uniform convergence) of a sequence of complex numbers
(resp., functions) is defined just as for functions of a real variable. If

    \sum_{n=0}^{\infty} a_n (z - z_0)^n    (170)

is absolutely convergent in the open disk D(z_0, R) (note R > 0), then it defines (in that
disk) a function f(z), say. Note that the discussion of radius of convergence for power series
is just as for real variables, but now convergence is in a disk rather than an interval.
Conversely, if f(z) can be represented by a convergent power series in some open disk
centered at z_0, that is, if it is equal to such a series in the disk, then we say it is analytic at z_0
(exactly as for real variables). (This terminology differs from some authors at this stage; for
them, analytic means holomorphic. However, we will eventually find that these properties
are equivalent for complex functions.) It follows that it is analytic at some points close to
z_0, to be precise, inside the same disk. Further, if U is an open set, we say that f is analytic
in U if it is analytic at every point of U. These definitions are exactly as for real variables.
If f is a function (possibly, a complex-valued function) of real x that is analytic at x_0 \in R,
so that f(x) = \sum_{n=0}^{\infty} a_n (x - x_0)^n, then it can be extended to an analytic function of complex
z simply by writing

    f(z) = \sum_{n=0}^{\infty} a_n (z - x_0)^n.    (171)
This series will converge with the same radius of convergence as that for x real. Notice that
it agrees with f(x) for x real; that is why it is called an extension.
Given an analytic function f, we can differentiate its power series term by term with
respect to z, and this series for f' converges in the same disk. Hence f is holomorphic
within the disk. That is, f analytic implies f holomorphic in the same disks. We will obtain
the converse statement shortly. By taking more derivatives, we find that

    a_n = \frac{1}{n!} \frac{d^n f}{dz^n}(z_0),    (172)

which is Taylor's Theorem.
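As a small check of (171)-(172) (my own sketch), the real exponential extended to complex argument by its Taylor series agrees with the built-in complex exponential; here the coefficients a_n = 1/n! are those of the real series about x_0 = 0.

    import cmath
    from math import factorial

    def exp_series(z, nmax=60):
        # Taylor series of e^x about 0, evaluated at complex z as in (171)
        return sum(z ** n / factorial(n) for n in range(nmax))

    z = 0.5 + 1.3j
    print(exp_series(z))
    print(cmath.exp(z))    # the two agree to rounding error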
C. Contour integration
We can integrate a function along a curve (or contour), just as for line integrals in real
vector calculus. If C is a curve, we take points on the curve z_i (i = 1, . . . , N) in order, and
write

    \int_C f(z)\, dz = \lim_{N \to \infty} \sum_{i=1}^{N} f\Big( \frac{z_{i+1} + z_i}{2} \Big)\, \Delta z_i    (173)
                     = \int_C (u + iv)(dx + i\, dy)    (174)
                     = \int_C (u\, dx - v\, dy) + i \int_C (u\, dy + v\, dx),    (175)
where \Delta z_i = z_{i+1} - z_i. This definition can be used even if f is not holomorphic, provided
that the line integrals of u, v shown exist as line integrals. For a closed curve C (one that
returns to its starting point), we also write \oint_C f\, dz.
Theorem (Cauchy-Goursat): If f is holomorphic within and on a closed contour C, then

    \oint_C f(z)\, dz = 0.    (176)
Proof: The full proof takes some time, and I skip it. Here is Cauchy's proof, which uses
additional assumptions (which we would prefer to avoid).
Use Green's/Stokes' Theorem:

    \oint_C (P\, dx + Q\, dy) = \iint_R \Big( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \Big)\, dx\, dy    (177)

(where R is the interior of C), provided the total derivatives of P, Q exist and are con-
tinuous inside and on C. Applying this with P = u + iv and Q = i(u + iv) (here we see the
need for the extra assumption that f' is continuous), we find, by using the C-R equations,
that the right-hand side vanishes.
From this it follows that if two paths C_1, C_2 have the same beginning and end points,
and f is holomorphic in the region between them, then \int_{C_1} f\, dz = \int_{C_2} f\, dz, exactly as for the
line integral defining the (e.g. electrostatic) potential for a field of force under the condition
that the force is conservative, in classical mechanics.
Generally, for a contour integral of a holomorphic function, the integral is unchanged if
the contour is deformed (leaving its endpoints, if any, fixed), provided that, as it is being
deformed, the contour does not cross any region where the function is not holomorphic. This
is a very important principle.
We also obtain the Fundamental Theorem of Calculus for contour integration: for f
holomorphic,

    F(z) = \int_{z_0}^{z} f(z')\, dz'    (178)

defines a function of z (and of z_0), because it is independent of the path taken from z_0 to z
(subject to conditions as above). Moreover, F is holomorphic, and

    F'(z) = f(z).    (179)
As an example of a non-holomorphic function, consider f(z) = 1/z, defined for z \neq 0. If
we integrate counterclockwise around a circle C of radius R, we find (using z = R e^{i\theta})

    \oint_C \frac{dz}{z} = \int_0^{2\pi} i\, d\theta    (180)
                        = 2\pi i    (181)
                        \neq 0.    (182)
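A numerical version of this example (my own sketch): parametrize the unit circle by z = e^{i\theta}, so dz = i e^{i\theta} d\theta, and approximate the contour integral by a Riemann sum. The integral of a function holomorphic inside and on the circle comes out (numerically) zero, in line with (176), while the integral of 1/z comes out 2\pi i.

    import numpy as np

    def circle_integral(f, n=20000):
        # Riemann sum for the contour integral of f around the unit circle
        theta = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
        z = np.exp(1j * theta)
        dz = 1j * z * (2 * np.pi / n)
        return np.sum(f(z) * dz)

    print(circle_integral(lambda z: z ** 2))    # holomorphic everywhere: ~ 0
    print(circle_integral(lambda z: 1.0 / z))   # ~ 2*pi*i: the singularity at 0 is enclosed
    print(2j * np.pi)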
D. Cauchy's formula and applications
If f is holomorphic within and on the closed non-self-intersecting contour C which encloses
z, then

    \oint_C \frac{f(z')}{z' - z}\, dz' = 2\pi i\, f(z).    (183)

Proof: f(z')/(z' - z) is holomorphic for z' \neq z, so shrink the contour down to a circle
of radius \epsilon centered at z. f(z') is holomorphic, so is continuous at z' = z, so as \epsilon \to 0
we replace it in this integral by f(z) and take the constant outside. Using the preceding
example, we then obtain the result. (We can be more careful in showing that we can take
f(z) outside if you wish.) This is Cauchy's Formula.
Now take C to be a circle of radius R centered at z_0, and expand in the last integral:

    \frac{1}{z' - z} = \frac{1}{(z' - z_0) - (z - z_0)} = \frac{1}{z' - z_0} + \frac{z - z_0}{(z' - z_0)^2} + \frac{(z - z_0)^2}{(z' - z_0)^3} + \ldots .    (184)
This series converges absolutely and uniformly in the open disk |z - z_0| < |z' - z_0| = R for z'
on the contour. Hence we may integrate term by term (i.e. interchange sum and integral),
and we obtain a series for f(z), which converges in the same disk:

    f(z) = \sum_{n=0}^{\infty} a_n (z - z_0)^n,    (185)
where the coefficients are

    a_n = \frac{1}{n!} \frac{d^n f}{dz^n}(z_0) = \frac{1}{2\pi i} \oint_C \frac{f(z')}{(z' - z_0)^{n+1}}\, dz',    (186)

which can be viewed as a generalization of Cauchy's formula that now gives f^{(n)}(z_0). Because
this exhibits f as a convergent power series, this gives us the very important converse to the
earlier statement, and so:

    f is holomorphic if and only if f is analytic,    (187)

both statements holding in the same open set. It follows that all derivatives of f are
holomorphic, which we did not know before. Notice how remarkable this result is: for
real variables, a function that is once-differentiable everywhere in an interval need not be
differentiable anywhere in that interval even once more! And even if it can be differentiated
arbitrarily many times, always giving a continuous derivative, it need not be analytic (recall
the example of e^{-1/x^2}).
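As an aside (my own numerical sketch, not from the notes), formula (186) can be used directly to compute derivatives of a holomorphic function from its values on a circle; the function and point chosen here are arbitrary.

    import numpy as np
    from math import factorial

    def deriv_via_cauchy(f, z0, n, R=1.0, npts=4000):
        # n-th derivative of f at z0 from the contour integral in (186)
        theta = np.linspace(0.0, 2 * np.pi, npts, endpoint=False)
        zp = z0 + R * np.exp(1j * theta)          # points z' on the circle C_R
        dz = 1j * (zp - z0) * (2 * np.pi / npts)
        integral = np.sum(f(zp) / (zp - z0) ** (n + 1) * dz)
        return factorial(n) * integral / (2j * np.pi)

    # e.g. the third derivative of exp at z0 is again exp(z0)
    z0 = 0.3 + 0.2j
    print(deriv_via_cauchy(np.exp, z0, 3), np.exp(z0))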
We can also obtain another consequence from the power series for f: we can make the
radius R of the circle larger, and the series still converges as long as f is holomorphic within
and on the circle. But clearly, the coefficients a_n do not depend on this radius (because
we can deform the contour of integration, in particular shrink it). Then the true radius
of convergence of the series, that is the smallest value of R such that it does not converge
absolutely for any |z - z_0| > R, is equal to the radius of the largest circle such that f remains
holomorphic within and on the circle. That is, the radius of convergence of the power series
for f about z_0 is equal to the distance from z_0 to the closest point at which f(z) is not
holomorphic. (This is not the case in real variables, because the singularity of an analytic
function might be off the real axis.)
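A simple illustration of this last point (my own example): f(x) = 1/(1 + x^2) is analytic at every real x, yet its Taylor series about 0 has radius of convergence 1, because of the poles at z = \pm i off the real axis.

    def partial_sum(x, N):
        # Taylor series about 0: 1/(1 + x^2) = sum_m (-1)^m x^(2m), valid for |x| < 1
        return sum((-1) ** m * x ** (2 * m) for m in range(N))

    for x in (0.9, 1.1):
        print(x, partial_sum(x, 50), partial_sum(x, 200), 1.0 / (1.0 + x ** 2))
        # the partial sums converge for |x| < 1 and blow up for |x| > 1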
On a more mundane level, let me note that if the curve C intersects itself, then there can
be a point enclosed by it about which one winds twice as one goes along the curve C. Then
in the Cauchy formula, we get twice the answer. In general, we have to count the number
of windings.
Further consequences of the result: By using a circle of radius R, the formula for the
coefficients a_n gives us the bound

    |a_n| = \Big| \frac{1}{2\pi i} \oint_{C_R} \frac{f(z')}{(z' - z_0)^{n+1}}\, dz' \Big|    (188)
          \leq \frac{1}{2\pi} \oint_{C_R} \frac{|f(z')|}{|z' - z_0|^{n+1}}\, |dz'| \leq \frac{\max_{|z'-z_0|=R} |f(z')|}{R^n},    (189)

where C_R is the circle of radius R centered at z_0, and max |f(z')| is found for z' on the circle.
This can be rewritten as

    |f^{(n)}(z_0)| \leq n!\, \frac{\max_{|z'-z_0|=R} |f(z')|}{R^n}.    (190)

The max |f(z')| is a constant (independent of n), but of course dependent on R.
From this bound on a_n we can (i) obtain again that the radius of convergence is R;
(ii) let us suppose f(z) is holomorphic for all z \in C (we say then that f is entire), and that
f is itself bounded, |f| \leq B (B > 0 is a constant) for all z. Then in the series for f about
any z_0, \max_{|z'-z_0|=R} |f(z')| \leq B for all R, so

    |a_n| \leq B / R^n,    (191)

and then taking R large we have a_n = 0 for all n \geq 1. That is, f(z) = a_0, a constant. This
is Liouville's Theorem: any bounded entire function is constant.
Liouville's Theorem has many applications, and is often used in arguments in physics.
Just as an example, we can use it to prove part of the Fundamental Theorem of Algebra,
which states that any polynomial of degree n > 0 in one variable z has n complex roots,
that is, it can be factored: there exist n complex numbers z_i (not necessarily all distinct),
such that for a_n \neq 0,

    a_0 + a_1 z + \ldots + a_n z^n = a_n \prod_{i=1}^{n} (z - z_i),    (192)

so the polynomial vanishes if z = z_i for any i (and not otherwise). We will show that some
such z_i exist. Proof: Let f(z) be the polynomial shown, with a_n \neq 0. If f(z) \neq 0 for all
complex z, then g(z) = 1/f(z) is defined for all z, and is holomorphic for all z \in C. |f(z)|
is asymptotic to |a_n z^n| for |z| large, so g(z) is bounded at large z; it eventually goes to zero
as |z| \to \infty. Thus g(z) is bounded for all z. By Liouville's Theorem, it must be constant,
so f is also constant. This contradiction implies that there is some z, say z = z_1, for which
f(z_1) = 0, which is what we wanted to prove.
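Purely as an illustration of the statement (192), not of the proof (my own sketch): a numerical root finder exhibits the factorization for a sample polynomial with arbitrary coefficients.

    import numpy as np

    coeffs = [2.0, -3.0, 0.0, 1.0, 5.0]     # a_n z^n + ... + a_0, highest power first
    roots = np.roots(coeffs)                 # the n (here 4) complex roots z_i
    z = 0.7 + 0.4j                           # any test point
    poly_value = np.polyval(coeffs, z)
    factored = coeffs[0] * np.prod(z - roots)
    print(poly_value, factored)              # the two agree to rounding error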
Just for fun, let's complete the proof of the Fundamental Theorem, though these details
are not important for us. We need to show that we can factor z - z_1 out of f(z), and
that the result is a polynomial of degree n - 1. If we consider f_1(z) = f(z)/(z - z_1),
then f_1 is holomorphic for z \neq z_1, and its value as z tends to z_1 can be found also (by
l'Hôpital's rule): it is f'(z_1), independent of the direction of approach to z_1. Hence we
can define f_1(z) to be continuous at z_1, as well as elsewhere. We can show that f_1(z) is
differentiable at z = z_1 (directly from the original definition of differentiability), similarly
to what we just did for continuity (try it; it may help to Taylor expand about z_1, f(z) =
a'_1 (z - z_1) + a'_2 (z - z_1)^2 + \ldots + a'_n (z - z_1)^n). Hence f_1 is holomorphic at all z. For large z,
it is asymptotic to a_n z^{n-1}, and using the bound for the coefficients a'_m in its power series
about z = 0 (or elsewhere), we see that a'_m = 0 for m > n - 1. Thus f_1(z) is a polynomial
of degree n - 1. Then repeat the process until we reach degree zero (proof by induction), at
which point the proof is complete.
E. Laurent series and singularities
Suppose a function f is holomorphic throughout an annulus, that is, the region between
two circles C_1, C_2 that are both centered at z_0. Theorem: there is a series

    \sum_{n=-\infty}^{\infty} a_n (z - z_0)^n    (193)

that converges absolutely and uniformly to f in any closed region in the annulus, and the
coefficients are

    a_n = \frac{1}{2\pi i} \oint_C \frac{f(z')}{(z' - z_0)^{n+1}}\, dz'    (194)

for all n, where C is a circle centered at z_0 and lying between C_1 and C_2.
I like to think of this as a Fourier series: Suppose C has radius R, and let z = R e^{i\theta}. Then
the result expresses f(R e^{i\theta}) as a Fourier series, and the R^n a_n are the usual Fourier coefficients.
Note that the change of variable from \theta to z (R fixed) can even be considered for complex \theta
(but with small imaginary part), and is holomorphic. Then f(R e^{i\theta}) is holomorphic (analytic)
in \theta in this strip near the real axis, and also periodic (with period 2\pi). Conversely, a
periodic function f of real \theta that is also analytic at all real \theta can be extended to a periodic
analytic function of \theta in the strip enclosing the real axis, and its Fourier series converges
uniformly and absolutely in this strip to f (and not just in the mean-square sense as we discussed
in general). This illustrates the deep connections between Fourier analysis and complex
analysis, especially analyticity properties. We will see more of this later.
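As a sketch of this Fourier-series viewpoint (my own example): sampling a function on a circle |z| = R inside an annulus where it is holomorphic and taking a discrete Fourier transform recovers the Laurent coefficients, since the Fourier coefficient of e^{i n \theta} is a_n R^n. For e^{1/z}, whose Laurent series appears in (196) below, a_{-n} = 1/n! for n \geq 0 and a_n = 0 for n > 0.

    import numpy as np
    from math import factorial

    R, N = 0.5, 256
    theta = 2 * np.pi * np.arange(N) / N
    z = R * np.exp(1j * theta)
    samples = np.exp(1.0 / z)

    c = np.fft.fft(samples) / N            # c[k] ~ Fourier coefficient of e^{+i k theta}
    for n in range(0, -6, -1):             # a_0, a_{-1}, ..., a_{-5}
        a_n = c[n % N] / R ** n
        print(n, a_n, 1.0 / factorial(-n))  # numerical a_n versus the exact 1/(-n)!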
The function f for which we found a Laurent series does not have to be holomorphic
inside the inner circle C_1: it could fail to be holomorphic there. In general, a point (z
value) for which a function fails to be holomorphic is called a singularity or singular point.
If a function is not holomorphic at z_0 but is holomorphic everywhere in a neighborhood of
z_0 except at z_0 (e.g. everywhere in a punctured disk D(z_0, R) \setminus \{z_0\} for some R), then
z_0 is said to be an isolated singularity. At a true singularity (not a fake or removable
singularity), there exists some path z \to z_0 on which f(z) \to \infty (but not necessarily for all
paths tending to z_0; e.g. consider e^{1/z} as z \to 0 for z real positive or negative).
If the Laurent series of a function f that has an isolated singularity at z_0 has all coefficients
a_n = 0 for n < -m (where m > 0 is an integer), with a_{-m} \neq 0, then z_0 is called a pole of order m.
It means that its Laurent series begins as

    f(z) = \frac{a_{-m}}{(z - z_0)^m} + \frac{a_{1-m}}{(z - z_0)^{m-1}} + \ldots + \frac{a_{-1}}{z - z_0} + a_0 + a_1 (z - z_0) + \ldots .    (195)
Thus asymptotically, f(z) \approx a_{-m} (z - z_0)^{-m} as z \to z_0. A pole of order 1 is called a simple
pole. As we approach a pole, we see that f(z) \to \infty for any direction of approach. An
isolated singularity that is not a pole is called an isolated essential singularity. Example:
e^{1/z} is not analytic (holomorphic) at z = 0. Its Laurent series is

    e^{1/z} = 1 + \frac{1}{z} + \frac{1}{2 z^2} + \ldots,    (196)

which converges if we stay away from z = 0. z = 0 is an essential singularity, not a pole.
A function f that has (isolated) poles as its only singularities in a region U is said to be
meromorphic in U.
Singularities do not have to be isolated. For example, if a function has an infinite sequence
of poles at positions z_1, z_2, . . . that tend to a limit z_0, then f is singular at z_0, and this
singularity is not isolated, as there are poles arbitrarily close to z_0 (by the definition of a
limit of a sequence). Consequently, the singularity at z_0 cannot be a pole. It also is referred
to as an essential singularity (but not isolated). Example: 1/\sin(1/z) has simple poles at
z_n = 1/(n\pi) for all n \in Z, n \neq 0. They tend to 0, so z = 0 is an essential singularity. This function
still has a Laurent series in the annulus

    \frac{1}{(n+1)\pi} < |z| < \frac{1}{n\pi}    (197)

for each n > 0. Perhaps more surprising, a function such as \sin(1/z) that has zeroes on
an infinite sequence that tends to a limit also has an essential singularity at the limit point
(z = 0 in this case), and this singularity is isolated; the function cannot be holomorphic
at the limit point, because if it were, then the behavior of its Taylor series near that point
would show that it cannot have zeroes arbitrarily close, as this function was assumed to have. In this
example, you can find the Laurent series yourself.
Other types of singularity that an otherwise holomorphic function might have are a
branch point, or a branch cut. These are really conceptually different from the singularities
discussed above, and we discuss all this in the next two subsections.
F. Complex logarithm function, branch points, and branch cuts
Consider the integral

    \int_1^z \frac{dz'}{z'}.    (198)

1/z is holomorphic except at z = 0, so as long as the path does not pass through 0, this
defines a function F(z) that is holomorphic at any z, and independent of the path 1 \to z
taken, except that it does depend on whether the path goes around the origin or not. That
is, if we compare the function for the same z, for two paths P_1, P_2 from 1 to z that wind
differently around the origin, then the values differ by 2\pi i times the number of windings of
the combined path P_1 - P_2 (i.e. the closed path 1 \to z along P_1, then z \to 1 along P_2 in
reverse) around the origin. We saw above that if we go round once counterclockwise, then
the integral is 2\pi i, and this generalizes.
So this complex logarithm function is not a genuine function, as it takes more than one
value for each z (it is multivalued), and as z smoothly changes and passes once around
the origin, it changes by 2\pi i. It nonetheless obeys

    F(z_1 z_2) = F(z_1) + F(z_2)    (199)

if we are careful about which values of F(z) are meant for each z. This can be proved by
splitting the integral for the left-hand side into two, and rescaling one part by z_1 to get the
right-hand side. One must be careful about what happens to the path when we change
variable. F(z) is the (multivalued) inverse to the function z \to e^z, which is many-to-one
(not one-to-one) because e^{z + 2\pi i} = e^z, and so has non-unique inverse values.
The different holomorphic functions F(z) for z in the neighborhood of a point z_0 \neq 0
are sometimes called the branches (sheets are a related idea) of the function. z = 0
and z = \infty are the branch points. We can obtain a genuine (single-valued) function, that is
holomorphic on the largest possible set of z, by choosing a branch at each z that keeps the
function continuous as much as possible. This is usually done by choosing a branch cut, a
path connecting the branch points, on which the function (or perhaps only its derivatives of
some order) is discontinuous, and jumps to a different branch (like going too far up a spiral
multistory car-park, and falling off the top level onto the one below). The function then
defined in this way is discontinuous on the cut, but may have no other singular behavior
visible away from the cut; there is nothing to warn of the impending fall to another branch.
For the logarithm function, it is most usual to choose the principal branch, which is cut
on the negative real axis, and then

    \ln z = \ln |z| + i \tan^{-1}(y/x)    (200)
for z not on the cut. The function \tan^{-1} (or arctan) is itself multivalued, and requires a
choice of branch. It is used to find arg z (i.e. arg z = \theta in z = r e^{i\theta}); we can specify

    -\pi < \arg z \leq \pi    (201)

for the principal branch.
Other important examples of functions that require a cut to make them genuine functions
are z^\alpha for \alpha not an integer. We can define

    z^\alpha = e^{\alpha \ln z},    (202)

and then the question of a cut and a branch reduces to that for \ln z. Again, a common
choice is the principal branch, cut on the negative real axis, and agreeing with the usual
choice on the positive real axis. For example, \sqrt{z} = z^{1/2} can be defined to equal \sqrt{x} for z = x > 0,
which by convention means the positive square root (even in real variables, choice of branch
enters!). On the principal branch, this then extends continuously, in fact holomorphically,
to complex z, until, as arg z increases to \pi, a cut will be necessary on the negative real
axis. If we continue \sqrt{z} across the negative real axis using the multivalued function, then
when we return to the positive real axis we obtain -\sqrt{x}, not \sqrt{x}. Making one more circuit
about the origin, we return to +\sqrt{x}.
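A quick numerical illustration of this behaviour (my own sketch): the principal branch, as implemented by cmath.sqrt, jumps across the negative real axis, whereas tracking \sqrt{z} continuously once around the origin turns +1 into -1.

    import cmath, math

    # principal branch: a jump across the negative real axis
    print(cmath.sqrt(-4 + 1e-12j), cmath.sqrt(-4 - 1e-12j))   # roughly +2j versus -2j

    # continuous continuation of sqrt(z) along z = e^{i theta}, theta from 0 to 2 pi
    w = 1.0 + 0.0j                      # sqrt(1) on the starting branch
    dtheta = 2 * math.pi / 1000
    for _ in range(1000):
        # sqrt(e^{i theta}) = e^{i theta / 2} when followed continuously
        w *= cmath.exp(0.5j * dtheta)
    print(w)                            # close to -1: one circuit flips the sign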
There are many more examples. For example, \sqrt{z^2 - 1} has branch points at z = \pm 1. We
can choose the cut to run from -1 to +1 along the shortest path, or from -\infty to -1 along
the negative real axis and from +1 to +\infty along the positive real axis. In general, the cut
can be chosen to suit your purposes in a calculation, unless you are given a function that is
precisely defined already, using a specified cut.
So far, all the examples contained just two branch points. It is sometimes said that
branch points come in pairs. This is false; it is possible to have a function with an odd
number of branch points, all of which are true non-trivial branch points.
G. Analytic continuation
Suppose we have a function f, holomorphic at z_0, for which the radius of convergence of
its Taylor series when expanded in powers of z - z_0 is R_0 (finite). (Perhaps in fact we know
f only in this disk, initially.) Then it is analytic in D_0 = D(z_0, R_0). Let us choose a point z_1
in D_0, maybe near the boundary. Then f can be Taylor expanded about z_1, with non-zero
radius of convergence R_1. Perhaps surprisingly, this second disk D_1 = D(z_1, R_1) may extend
outside the first! That is because the radius R_0 of the first is determined by the distance to
the closest singularity, and depending on our choice for z_1, that singularity and any others
may be further from z_1 than the edge of D_0 is. Moreover, we can then repeat the process,
and we obtain a sequence of disks D_2, D_3, . . . which may extend far outside the original D_0,
and we can view f as now defined on the whole region covered by the disks. This process
is called analytic continuation. As the Taylor series at z_1 is uniquely determined by the
function f(z) or its series around z_0, the extension to D_1 is uniquely determined, and the
same is true for each subsequent disk.
Now the disks can never contain any singularity of the function f that we have now
defined, but there will be at least one singularity on the boundary of each disk. If we go
around a singularity, or a region that contains some singular points, something interesting
may happen: when we return to a region on which f was already defined, we may or may
not obtain the same value at each z in this overlap. That is, what we called the function
f may be multivalued. For example, we can approach the logarithm function this way, by
starting from its series expansion about z = 1,

    \ln z = (z - 1) - \frac{1}{2}(z - 1)^2 + \ldots,    (203)

which has radius of convergence 1, and analytically continuing. By uniqueness, the logarithm
defined this way is exactly the same as the one we obtained before, and is multivalued. Thus
we have another view on functions with a branch point/cut: given the function with the
cut, we can recover a multivalued function by analytically continuing across the branch
cut, and going around the branch point.
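Here is a small sketch of this continuation in practice (my own example): re-expanding the series (203) at a chain of centres that winds once around the branch point at z = 0 and returns to z = 1 yields log 1 = 2\pi i rather than 0; the centres and the truncation order are arbitrary choices.

    import cmath, math

    def log_series(z, centre, log_at_centre, nmax=60):
        # Taylor series of log about 'centre', valid for |z - centre| < |centre|
        w = (z - centre) / centre
        return log_at_centre + sum((-1) ** (n + 1) * w ** n / n for n in range(1, nmax))

    # chain of centres on the unit circle; each lies inside the previous disk of convergence
    centres = [cmath.exp(1j * 2 * math.pi * k / 12) for k in range(13)]   # ends back at z = 1
    val = 0.0 + 0.0j          # log(1) = 0 on the starting branch
    for c_prev, c_next in zip(centres[:-1], centres[1:]):
        val = log_series(c_next, c_prev, val)
    print(val, 2j * math.pi)  # continuation once around 0 gives log(1) = 2*pi*i, not 0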
Analytic continuation can be very useful for defining functions over a large region of the
complex plane, when they are initially defined only on a smaller region. For example, we
can consider the logarithm as just discussed (analytic for z = x > 0), and other interesting
examples include the Gamma function (or the Riemann zeta function), which can be defined
by an integral (resp., series) that is convergent only over a region Re z > 0 (resp., Re z >
1). However, the technique of constructing series in each of a sequence of disks may be too
cumbersome in practice. Instead, it is helpful to find an alternative form, such as a definition
of the function as an indefinite (as for the complex logarithm example) or definite (as for the
Gamma or zeta functions) integral expression, that agrees with the original definition in its
domain of validity, but which also makes sense (and is analytic) on a larger domain (as in
the logarithm example). By uniqueness of analytic continuation, the result on the larger
domain must agree with the overlapping series construction.
Sometimes analytic continuation is not possible. Because it is defined using a sequence
of disks with non-zero radius, in order to continue past some singular points there has to be
a non-zero gap between them through which we can squeeze a sequence of disks. If there
is a region of singularities, then we cannot go through it (but maybe can go around it).
Also, if there is a dense set of singularities on a curve, then we cannot analytically continue
through it (recall that dense means that at any point on the curve, there is a singularity
arbitrarily close to that point). If a region is surrounded by such a natural boundary,
then continuation of the function from inside to outside of that region (or from outside to
inside) is impossible. There are examples that can be analyzed by elementary methods (see
the problem set). In general, we might want to analytically continue a function as far as
possible, until we have identified all singularities and branch points, found the function on
all branches, or reached any natural boundaries.