
Basic Operator Theory

Hassan A. Kingravi
College of Computing, Georgia Institute of Technology

July 23, 2011

Abstract

These notes are divided into two parts. The first part derives from a graduate course in basic operator theory, given at Georgia Tech by Andrzej Swiech. The second part deals with some important applications. The notes for this part are mostly culled from books and papers. Currently, only applications to PDEs are considered, but I will add a chapter on abstract harmonic analysis some time in the future.

Unlike other courses in this area, no knowledge of functional analysis is assumed. Most of the results focus on Hilbert spaces.

These notes have also been extensively proofread by Ali Ahmed (Electrical and Computer Engineering, Georgia Institute of Technology), so if you find any errors, they are completely his fault, and you should have absolutely no hesitation in contacting him.
Contents

1 Preliminaries
  1.1 Topological Spaces
  1.2 Vector Spaces
  1.3 Metric Spaces
  1.4 Normed Spaces
2 Banach Spaces
  2.1 The Baire Category Theorem
  2.2 The Riesz Lemma
  2.3 The Definition of a Banach Space
    2.3.1 $\ell^p$ Spaces
    2.3.2 The Space of Bounded Continuous Functions
    2.3.3 Absolutely Convergent Series
  2.4 Linear Mappings
  2.5 The Closed Graph Theorem
  2.6 Completeness and Fixed Point Theorems
    2.6.1 Isometric Embeddings
    2.6.2 The Banach Fixed Point Theorem
  2.7 The Hahn-Banach Theorems
3 Hilbert Spaces
  3.1 Definition
  3.2 Geometry
  3.3 The Riesz Representation Theorem
    3.3.1 Duality in Banach Spaces
  3.4 Orthogonality and Orthonormal Systems
    3.4.1 Gram-Schmidt
    3.4.2 Bessel's Inequality
  3.5 Separability
4 Operators
  4.1 Examples
  4.2 Banach Algebras
  4.3 Bilinear Functionals and Quadratic Forms
  4.4 Elliptic Operators
    4.4.1 The Lax-Milgram Theorem
5 Adjoint Operators
  5.1 Basics
  5.2 Self-Adjoint Operators
    5.2.1 Invertible Operators
    5.2.2 Normal Operators
    5.2.3 Isometric Operators
    5.2.4 Unitary Operators
    5.2.5 Positive Operators
    5.2.6 Projection Operators
6 Compact Operators
  6.1 Basics
7 Spectral Theory
  7.1 Fundamentals
  7.2 Compact Operators
  7.3 Special Operators
  7.4 Compact Self-Adjoint Operators
    7.4.1 The Fredholm Alternative
8 Applications: Partial Differential Equations
  8.1 Sobolev Spaces
    8.1.1 Weak Derivatives
    8.1.2 Definition and basic properties
    8.1.3 More advanced properties
  8.2 Sobolev Inequalities
    8.2.1 Gagliardo-Nirenberg-Sobolev inequality
    8.2.2 Nash Inequality
    8.2.3 Poincaré Inequality
  8.3 Weak Solutions
    8.3.1 Elliptic PDEs
    8.3.2 The notion of a weak solution
    8.3.3 An existence result
    8.3.4 Further reading
9 Reproducing Kernel Hilbert Spaces
  9.1 Definition
  9.2 An example
  9.3 Properties
    9.3.1 Complexification
    9.3.2 General Theory
    9.3.3 Characterization of Reproducing Kernels
    9.3.4 Relating the C-RKHS with the R-RKHS of a Real-Valued Kernel
  9.4 The Gaussian RBF Kernel
    9.4.1 The Space $\mathcal{H}_{\sigma,\mathbb{C}^d}$
    9.4.2 The Complex ONB
    9.4.3 The Real ONB
A Some Useful Notions
  A.1 Algebra
  A.2 Analysis
    A.2.1 $L^p$ spaces
    A.2.2 Radon measures
B Notation
Chapter 1
Preliminaries
The main objects studied in these notes are linear maps from one space to another. Operators can be thought of as generalizations of matrices to infinite-dimensional spaces (with some qualifications, which we'll go into later). Nonlinear operators exist of course, but we won't discuss them. We will mainly deal with operators on Banach and Hilbert spaces. Some of the topics covered include

- Banach algebras
- Bounded operators on Banach and Hilbert spaces
- Spectral theory

After we cover the basics, we will examine applications of the functional analysis and operator theory we've covered to other areas of analysis. The current version of these notes only applies these ideas to the theory of partial differential equations and reproducing kernel Hilbert spaces, but later versions will also add quantum mechanics and abstract harmonic analysis.

It is assumed the reader knows linear algebra and real analysis on a graduate level. The level of abstraction used here is not very high. We do not approach most of the results presented using general topological vector spaces, sticking instead to the more concrete notion of a normed space. We also do not delve into the more abstract notion of an operator algebra for its own sake; this includes areas like von Neumann or $C^*$ algebras, which are heavily used in representation theory. Ideas vital to these concepts are present in the current notes, but our focus is the simplest and most concrete overview of operators possible.

Since these notes originated from a course, there isn't much in the way of discussion. We present theorems, and then prove them. The reader is encouraged to draw his or her own conclusions from the material presented.

We will now go over most of the definitions and concepts needed to get started.
1.1 Topological Spaces
A topology $\mathcal{T}$ is a collection of subsets of a set $S$ s.t.

1. $\emptyset, S \in \mathcal{T}$
2. $A, B \in \mathcal{T} \Rightarrow A \cap B \in \mathcal{T}$
3. A union of any collection of sets in $\mathcal{T}$ belongs to $\mathcal{T}$.

These sets are primitives, and thus are known as open sets in $\mathcal{T}$, which in turn is called a topology on $S$. The complement of an open set is called closed.

$\bar{E}$ is the closure of $E$, or in other words, the intersection of all closed sets containing $E$. The interior of $E$ is denoted by $E^o$, and is the union of all open sets contained in $E$. Obviously the former is closed, while the latter is open.

A neighborhood of a point $p \in S$ is any open set containing $p$. The original goal of topology was to create a definition of convergence without having a metric defined on the space. This is where neighborhoods are useful.
However, structures this bare are rarely encountered in most places, and can be exceedingly pathological. Therefore, the Hausdorff axiom was created to define a notion of niceness on spaces. A space is Hausdorff if given $p_1, p_2 \in S$, there exist open $S_1, S_2 \in \mathcal{T}$ s.t. $p_1 \in S_1$, $p_2 \in S_2$ and $S_1 \cap S_2 = \emptyset$. Finally, compactness in topological spaces has the same definition as in basic analysis, i.e. every open cover of the set has a finite subcover.

We can now define convergence in a Hausdorff space. A sequence $(x_n)$ in a space $S$ converges to $x \in S$ iff every neighborhood of $x$ contains all but finitely many elements of $(x_n)$. We write $x = \lim_{n \to \infty} x_n$.
1.2 Vector Spaces
A vector space is a nonempty set $E$ endowed with two operations:

- Addition: $(x, y) \mapsto x + y$, $E \times E \to E$
- Multiplication by scalars: $(\alpha, x) \mapsto \alpha x$, $\mathbb{F} \times E \to E$

where $\mathbb{F}$ is some field, like $\mathbb{R}$ or $\mathbb{C}$.

Vector spaces conform to the following rules:

1. $x + y = y + x$
2. $(x + y) + z = x + (y + z)$
3. $\forall x, y \in E$ $\exists z \in E$ s.t. $x + z = y$
4. $\alpha(\beta x) = (\alpha\beta)x$, $\forall \alpha, \beta \in \mathbb{F}$, $x \in E$
5. $(\alpha + \beta)x = \alpha x + \beta x$, $\forall \alpha, \beta \in \mathbb{F}$, $x \in E$
6. $\alpha(x + y) = \alpha x + \alpha y$, $\forall \alpha \in \mathbb{F}$, $x, y \in E$
7. $1 \cdot x = x$

Note that the first three properties imply that $(E, +)$ is a commutative group.
We can use these to immediately prove some very basic consequences:

Consequence 1.2.1. There exists a unique identity element $0 \in E$ s.t. $\forall x \in E$, $x + 0 = x$.

Proof. Let $x + z_x = x$ and $y + z_y = y$. Let $w \in E$ be s.t. $y = x + w$. Then $y + z_x = w + x + z_x = w + x = y$. Therefore, if $x + z_x = x$, then $y + z_x = y$ $\forall y \in E$. Suppose there are two such elements $z_1, z_2$. Then $z_1 + z_2 = z_1$ and $z_2 + z_1 = z_2$, so $z_1 = z_2$. □

Consequence 1.2.2. For all $x, y \in E$ there exists a unique $z \in E$ s.t. $x + z = y$; we write $z = y - x$.

Proof. Let $x + z_1 = y$ and $x + z_2 = y$. There exists $w \in E$ s.t. $x + w = 0$. Then $z_1 = z_1 + (x + w) = y + w = z_2 + (x + w) = z_2$. □

Consequence 1.2.3. $0 \cdot x = 0$, $(-1)x = -x$, $\alpha 0 = 0$.

Proof. $0x + 0x = (0 + 0)x = 0x \Rightarrow 0x = 0$.
$\alpha 0 = \alpha(0 + 0) = \alpha 0 + \alpha 0 \Rightarrow \alpha 0 = 0$.
$0 = (1 + (-1))x = x + (-1)x \Rightarrow (-1)x = -x$. □

Consequence 1.2.4. $\alpha x = 0$ iff either $\alpha = 0$ or $x = 0$.

Proof. If $\alpha \neq 0$, then $0 = \frac{1}{\alpha}(\alpha x) = x$, so $x = 0$. □
Typical examples of vector spaces include the following:

1. $\mathbb{R}^n$, $\mathbb{C}^n$.
2. $\mathcal{F}$, the set of all functions from a set $S$ into a vector space $E$. This is obvious. If $E = \mathbb{R}$ or $\mathbb{C}$, we denote by $\mathcal{B}(S)$ the set of all bounded functions. If $S$ is a topological space, we denote by $\mathcal{C}(S)$ the set of continuous functions. If $S$ is compact, then $\mathcal{C}(S) \subset \mathcal{B}(S)$ as a subspace.
3. If $S$ in 2 is $\mathbb{N}$, then the space $\mathcal{F}$ in 2 is the space of sequences.
4. $\ell^p$ spaces, i.e. spaces of summable sequences.

The last two can be proved as an exercise.
If $E_1, \ldots, E_n$ are vector spaces over $\mathbb{F}$, then $E_1 \times \cdots \times E_n = \{(x_1, \ldots, x_n) : x_i \in E_i, i = 1, \ldots, n\}$, the Cartesian product, is also a vector space.

An infinite set $A$ is linearly independent if every finite collection of vectors in $A$ is linearly independent. For example, $A = \{1, x, x^2, \ldots\}$ is linearly independent in $\mathcal{C}([0, 1])$. Recall finally that a set $A$ is a basis for $E$ if $A$ is linearly independent and the span of $A$ is $E$. The number of elements of a basis of $E$ is the dimension of $E$. For example, $\mathcal{C}([0, 1])$ is infinite-dimensional.

(In the function-space examples above, if the codomain $E$ isn't specified, assume it to be $\mathbb{R}$.)
1.3 Metric Spaces
Given a set $S$, the function $d : S \times S \to [0, \infty)$ is called a metric on $S$ if

- $d(x, y) = 0$ iff $x = y$
- $d(x, y) = d(y, x)$
- $d(x, z) \leq d(x, y) + d(y, z)$ $\forall x, y, z \in S$

$(S, d)$ is called a metric space. We define an open ball centered at $x$ with radius $r$ by $B_r(x) = \{y \in S : d(x, y) < r\}$. A set $U$ is open in $S$ if it's the (possibly empty) union of open balls in $S$. A sequence $(x_n)$ in $S$ is called a Cauchy sequence if $\forall \epsilon > 0$ $\exists N \in \mathbb{N}$ s.t. $d(x_n, x_m) < \epsilon$ if $n, m > N$. A metric space is complete if every Cauchy sequence converges to an element of $S$. This concept is crucial in the theory of infinite-dimensional spaces.

Here are some important facts about metric spaces.

1. A set $B$ in a metric space is closed iff every convergent sequence in $B$ has its limit in $B$.
2. $\bar{B} = \{x \in S : \exists (x_n) \subset B \text{ s.t. } x_n \to x\}$.
3. $B$ is dense in $S$ if $\bar{B} = S$. Then the following are equivalent:
   (a) $B$ is dense in $S$.
   (b) $\forall x \in S$ $\exists (x_n) \subset B$ s.t. $x_n \to x$.
   (c) Every nonempty open subset of $S$ contains an element of $B$.
4. A subset $B$ of a metric space $S$ is compact iff every sequence $(x_n) \subset B$ contains a convergent subsequence whose limit is in $B$.
5. Compact sets are closed and bounded. Note that in $\mathbb{R}^n$ and $\mathbb{C}^n$, a set's being compact is equivalent to it being closed and bounded. This is not generally true in infinite-dimensional spaces.
1.4 Normed Spaces
A function $\|\cdot\| : E \to [0, \infty)$ is called a norm if

- $\|x\| = 0$ iff $x = 0$.
- $\|\alpha x\| = |\alpha| \|x\|$ $\forall \alpha \in \mathbb{F}$, $x \in E$.
- $\|x + y\| \leq \|x\| + \|y\|$ $\forall x, y \in E$.

$E$ equipped with a norm is called a normed space. Typical examples include $\mathbb{R}^n$ and $\mathbb{C}^n$ with the $\|x\|_1$, $\|x\|_2$, and $\|x\|_\infty$ norms. Another common example is that of the $\mathcal{B}(S)$ and $\mathcal{C}(S)$ spaces on a compact set $S$ with the norm $\|f\| = \sup_{x \in S} |f(x)|$.

Also consider the $\ell^p$ spaces. If $0 < p < 1$, then $d(x, y) = \sum_{n=1}^{\infty} |x_n - y_n|^p$ for vectors $x, y \in E$ is a metric. If $p \geq 1$, we can define a norm, i.e.

$$\|x\|_p = \left( \sum_{n=1}^{\infty} |x_n|^p \right)^{1/p}.$$
We note that any normed space is a metric space with the metric $d(x, y) = \|x - y\|$. Therefore we retain the notions of open and closed sets, as well as convergence. For normed spaces, we say that a sequence $(x_n)$ converges to $x$ iff $\|x_n - x\| \to 0$ as $n \to \infty$, and the limit is unique. It can also be easily proved that if $\alpha_n \to \alpha$ and $x_n \to x$, then $\alpha_n x_n \to \alpha x$. Similarly, if $x_n \to x$ and $y_n \to y$, then $x_n + y_n \to x + y$. This implies the continuity of the operations $\cdot$ and $+$.

A natural question to ask is whether different norms give the same topology. We say that $\|\cdot\|_{d_1}$ and $\|\cdot\|_{d_2}$ on $E$ are equivalent if for any sequence $(x_n)$ in $E$ and $x \in E$, $\|x_n - x\|_{d_1} \to 0 \Leftrightarrow \|x_n - x\|_{d_2} \to 0$.

In fact, for finite-dimensional spaces, all norms are equivalent.
Theorem 1.4.1. Let $\|\cdot\|_1$ and $\|\cdot\|_2$ be norms on a vector space $E$. Then $\|\cdot\|_1$ and $\|\cdot\|_2$ are equivalent iff $\exists \alpha, \beta > 0$ s.t. $\alpha\|x\|_1 \leq \|x\|_2 \leq \beta\|x\|_1$ for any $x \in E$.

Proof. The converse is trivial. For the forward direction, assume the norms are equivalent. If the condition isn't satisfied, $\forall n$ $\exists x_n$ s.t. $\|x_n\|_2 < \frac{1}{n}\|x_n\|_1$. Define $y_n = \frac{1}{\sqrt{n}} \frac{x_n}{\|x_n\|_2}$. Then $\|y_n\|_2 = \frac{1}{\sqrt{n}} \to 0$ as $n \to \infty$, and thus $y_n \to 0$ in $\|\cdot\|_2$. But

$$\|y_n\|_1 = \frac{1}{\sqrt{n}} \frac{\|x_n\|_1}{\|x_n\|_2} > \frac{n}{\sqrt{n}} \frac{\|x_n\|_2}{\|x_n\|_2} = \sqrt{n}.$$

Thus $y_n$ doesn't go to zero in $\|\cdot\|_1$, and we have a contradiction. □
Chapter 2
Banach Spaces
In this chapter, we develop some of the basic functional analysis needed to understand linear
operators. We begin with some results for normed spaces.
2.1 The Baire Category Theorem
This is a very important general tool in functional analysis. It's used in the proofs of many other theorems, such as the closed graph and Banach-Steinhaus theorems.
Theorem 2.1.1. Let (S, d) be a complete metric space. Then the intersection of a countable
family of dense, open subsets of S is dense in S.
Proof. Suppose $V_1, V_2, \ldots$ are dense open subsets of $S$. Let $B_0$ be any open subset of $S$. We can prove the theorem if we just show that $B_0 \cap \bigcap_{n=1}^{\infty} V_n \neq \emptyset$.

We use induction. Suppose we've already chosen $B_{n-1}$ open for some $n \geq 1$. Since the intersection of two open sets is open, we can find an open set in the intersection of $B_{n-1}$ with $V_n$. Call this set $B_n$ and choose it s.t. $\bar{B}_n \subset B_{n-1} \cap V_n$ (the base case is obvious now). Without loss of generality, we can choose $B_n$ to be a ball of radius at most $\frac{1}{n}$. It's easy to see that the centers of the $B_n$ form a Cauchy sequence, which converges to a point in $K = \bigcap_{n=1}^{\infty} \bar{B}_n \subset \bigcap_{n=1}^{\infty} V_n$. Since $K \subset B_n \cap V_n$ $\forall n$, $K$ is a subset of $B_0$; thus $K \subset B_0 \cap \bigcap_{n=1}^{\infty} V_n$. By completeness, the limit point stays within the space. □
2.2 The Riesz Lemma
Lemma 2.2.1 (F. Riesz). Suppose $X$ is a normed vector space, and $X_0 = \bar{X}_0$ is a closed proper subspace of $X$. Then $\forall \epsilon > 0$ $\exists x_\epsilon \in X$, $\|x_\epsilon\| = 1$, s.t. $\|x_\epsilon - x_0\| \geq 1 - \epsilon$ $\forall x_0 \in X_0$.
Proof. Let $y_0 \in X \setminus X_0$. Since $X_0$ is closed, $d = \inf_{x_0 \in X_0} \|y_0 - x_0\| > 0$. (Note that if there were a sequence $(x_n)$ in $X_0$ s.t. $\|x_n - y_0\| \to 0$, this would imply that $y_0 \in X_0$, and we can't have that.)

Choose $\alpha$ s.t. $\frac{\alpha}{\alpha + d} < \epsilon$. Let $x_0 \in X_0$ be s.t. $d \leq \|x_0 - y_0\| < d + \alpha$. Set

$$x_\epsilon = \frac{y_0 - x_0}{\|y_0 - x_0\|}.$$

Then $\|x_\epsilon\| = 1$ and, for any $x_0' \in X_0$,

$$\|x_\epsilon - x_0'\| = \left\| \frac{y_0 - x_0}{\|y_0 - x_0\|} - x_0' \right\| = \frac{1}{\|y_0 - x_0\|} \left\| y_0 - \left( x_0 + x_0'\|y_0 - x_0\| \right) \right\| \geq \frac{d}{d + \alpha} = 1 - \frac{\alpha}{d + \alpha} \geq 1 - \epsilon.$$

Note that we can do this because the element $x_0 + x_0'\|y_0 - x_0\|$ lies in $X_0$. □
This lemma is important, because it allows us to demonstrate an interesting property that can't be found in finite-dimensional spaces.

Corollary 2.2.1. If $X$ is a normed vector space s.t. the dimension of $X$ is infinite, the set $\bar{B}_1(0)$ is not compact.

Proof. Let $x_1, x_2, \ldots$ be a sequence of linearly independent vectors. Denote $X_n = \mathrm{span}\{x_1, \ldots, x_n\}$, $n \in \mathbb{N}$. These are increasing spaces, and $X_n = \bar{X}_n$. Each $X_n$ is a proper subspace of $X_{n+1}$.

Let $\tilde{x}_1 \in X_1$, $\|\tilde{x}_1\| = 1$. Since $X_1$ is a strict subset of $X_2$, by Riesz's lemma, $\exists \tilde{x}_2 \in X_2$ s.t. $\|\tilde{x}_2 - \tilde{x}_1\| \geq \frac{1}{2}$. Keep repeating this procedure.

We thus have a sequence $\tilde{x}_1, \tilde{x}_2, \ldots$, with $\tilde{x}_n \in X_n$, s.t. $\|\tilde{x}_i - \tilde{x}_j\| \geq \frac{1}{2}$ for $i \neq j$. In particular, for $n + 1$, $\exists \tilde{x}_{n+1}$ s.t. $\|\tilde{x}_{n+1} - \tilde{x}_i\| \geq \frac{1}{2}$ $\forall i < n + 1$, and $\|\tilde{x}_{n+1}\| = 1$.

Therefore we have a sequence $(\tilde{x}_n)$ of elements all with norm 1, which thus lie in $\bar{B}_1(0)$, s.t. $\|\tilde{x}_n - \tilde{x}_m\| \geq \frac{1}{2}$ $\forall n \neq m$. This implies that this sequence can never have a convergent subsequence, and thus $\bar{B}_1(0)$ cannot be compact. □
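The corollary is easy to see concretely in $\ell^2$ (a numerical sketch, not from the original notes, using the standard unit vectors $e_n$): the $e_n$ all lie on the unit sphere, yet $\|e_n - e_m\| = \sqrt{2}$ for $n \neq m$, so no subsequence can be Cauchy:

```python
# Sketch: in l^2, the standard basis vectors e_n (truncated to a finite
# ambient dimension) are unit vectors with pairwise distance sqrt(2),
# so the closed unit ball contains a sequence with no convergent
# subsequence.
import math

def e(n, dim):
    """Standard basis vector e_n in R^dim (0-indexed position n)."""
    v = [0.0] * dim
    v[n] = 1.0
    return v

def dist(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

dim = 50
basis = [e(n, dim) for n in range(dim)]
for n in range(dim):
    for m in range(n + 1, dim):
        assert abs(dist(basis[n], basis[m]) - math.sqrt(2)) < 1e-12
```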
In infinite-dimensional spaces, compact sets have empty interior.
2.3 The Denition of a Banach Space
We now dene one of the most important classes of spaces studied in functional analysis. A
Banach space is a complete normed space. We give a couple of examples.
2.3.1 $\ell^p$ Spaces
Minkowski's inequality states the following:

$$\left( \sum_{n=1}^{\infty} |x_n + y_n|^p \right)^{1/p} \leq \left( \sum_{n=1}^{\infty} |x_n|^p \right)^{1/p} + \left( \sum_{n=1}^{\infty} |y_n|^p \right)^{1/p}$$
12 Banach Spaces
We can use this to show that the
p
spaces are complete.
Let (x
n
) be a Cauchy sequence in
p
, i.e. x
k
= (x
k
n
) The n stands for the indice of the
vector, and the sequence is indexed by k. So. > 0 k
0
s.t. if l, k > k
0

n=1
[x
k
n
x
l
n
[
p
<
p
(2.1)
This means that x
k
n
, k = 1, . . . is a Cauchy sequence in C; therefore, it has a limit
y
n
= lim
n
x
k
n
. Let y be the limit as a vector.
As l , we have

n=1
[x
l
n
y
n
[
p
<
p
(2.2)
This equation implies that in the limit, each quantity given lies in
p
. Using Minkowskis
inequality, we have
_

n=1
[y
n
[
p
_1
p

n=1
[x
k
n
y
n
[
p
_1
p
+
_

n=1
[x
k
n
[
p
_1
p
<
which implies that y
p
. Moreover, 2.2 implies that for k > k
0
, |x
k
y|
p
< . Thus
x
k
y, and this is a Banach space.
2.3.2 The Space of Bounded Continuous Functions
Let $(S, \rho)$ be a metric space, and let $\mathcal{C}_b(S)$ be the space of continuous bounded functions on $S$, with values in $\mathbb{C}$, and the norm $\|f\|_\infty = \sup_{x \in S} |f(x)|$. Let $(f_n)$ be a Cauchy sequence in $\mathcal{C}_b(S)$, i.e. for $\epsilon > 0$, $\exists n_0$ s.t. if $n, m > n_0$,

$$\sup_{x \in S} |f_n(x) - f_m(x)| < \epsilon \qquad (2.3)$$

Given any $x \in S$, $(f_n(x))$ is Cauchy in $\mathbb{C}$, i.e. it has a limit $f(x)$. Fixing $n > n_0$ and letting $m \to \infty$ in (2.3), we get

$$\sup_{x \in S} |f_n(x) - f(x)| \leq \epsilon$$

Therefore

$$\sup_{x \in S} |f(x)| \leq \sup_{x \in S} |f_n(x) - f(x)| + \sup_{x \in S} |f_n(x)| < \infty$$

which means $f$ is bounded. We now need to show that it's continuous.

Let $x_0 \in S$. Choose $n_1 > n_0$. Let $\delta$ be s.t. if $\rho(x, x_0) < \delta$, then $|f_{n_1}(x) - f_{n_1}(x_0)| < \epsilon$. This means that

$$|f(x) - f(x_0)| \leq |f(x) - f_{n_1}(x)| + |f_{n_1}(x) - f_{n_1}(x_0)| + |f_{n_1}(x_0) - f(x_0)| < 3\epsilon$$

Hence $f$ is continuous, and $\mathcal{C}_b(S)$ is a Banach space.

Note that a closed subspace of a Banach space is a Banach space.
2.3.3 Absolutely Convergent Series
A series $\sum_{n=1}^{\infty} x_n$ in a normed space $E$ converges if there exists $x \in E$ s.t. $\sum_{n=1}^{m} x_n \to x$ as $m \to \infty$. We write this as $\sum_{n=1}^{\infty} x_n = x$. If $\sum_{n=1}^{\infty} \|x_n\|$ converges, the series is absolutely convergent.
Theorem 2.3.1. A normed space $E$ is complete iff every absolutely convergent series in $E$ converges.

Proof. ($\Rightarrow$) Let $E$ be a Banach space. Let $\sum_{n=1}^{\infty} x_n$ be s.t. $\sum_{n=1}^{\infty} \|x_n\|$ converges. Define $s_n = x_1 + \cdots + x_n$. Let $\epsilon > 0$, and choose $n_0$ s.t. $\sum_{n=n_0}^{\infty} \|x_n\| < \epsilon$. Then for $n > m \geq n_0$, we have

$$\|s_n - s_m\| = \left\| \sum_{i=m+1}^{n} x_i \right\| \leq \sum_{i=m+1}^{n} \|x_i\| < \epsilon$$

So the sequence of partial sums is Cauchy in $E$, and thus its limit lies in $E$, meaning the series is convergent.

($\Leftarrow$) Let $(x_n)$ be a Cauchy sequence in $E$. $\forall k \geq 1$, we can find $p_k$ s.t. $\|x_n - x_m\| < \frac{1}{2^k}$ if $n, m \geq p_k$. Obviously, we can assume the sequence $(p_k)$ is strictly increasing. Then $\sum_{k=1}^{\infty} (x_{p_{k+1}} - x_{p_k})$ is absolutely convergent since $\|x_{p_{k+1}} - x_{p_k}\| < \frac{1}{2^k}$. Thus it's convergent, so $x_{p_1} + \sum_{k=1}^{\infty} (x_{p_{k+1}} - x_{p_k})$ converges to some element $x \in E$. But

$$x_{p_1} + \sum_{k=1}^{n} (x_{p_{k+1}} - x_{p_k}) = x_{p_{n+1}},$$

and thus $x_{p_n} \to x$ as $n \to \infty$.

Now we see

$$\|x - x_n\| = \|x - x_{p_n} + x_{p_n} - x_n\| \leq \|x - x_{p_n}\| + \|x_{p_n} - x_n\|$$

The latter two terms go to zero, and thus the converse is proved. □
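A small numerical illustration of the forward direction (a sketch, not part of the original notes): in the complete space $\mathbb{R}$, the absolutely convergent series $\sum_n (-1)^n / n^2$ has partial sums obeying the Cauchy estimate $\|s_n - s_m\| \leq \sum_{i=m+1}^{n} \|x_i\|$ used in the proof:

```python
# Sketch: partial sums of an absolutely convergent series satisfy
# |s_n - s_m| <= sum_{i=m+1}^{n} |x_i|, the bound used in the proof.

terms = [(-1) ** n / n ** 2 for n in range(1, 2001)]

def partial_sum(k):
    """Sum of the first k terms of the series."""
    return sum(terms[:k])

for m, n in [(10, 100), (100, 1000), (500, 2000)]:
    tail = sum(abs(t) for t in terms[m:n])
    assert abs(partial_sum(n) - partial_sum(m)) <= tail + 1e-15
```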
2.4 Linear Mappings
Given two vector spaces $E_1, E_2$, a linear map from $E_1$ into $E_2$ is denoted by $L$. We make some other definitions:

- $\mathcal{D}(L) = \{x \in E_1 : L(x) \text{ is defined}\}$: the domain of $L$.
- $\mathcal{R}(L) = \{y \in E_2 : \exists x \in \mathcal{D}(L) \text{ s.t. } L(x) = y\}$: the range of $L$.
- $\mathcal{G}(L) = \{(x, y) \in E_1 \times E_2 : x \in \mathcal{D}(L), \ y = L(x)\}$: the graph of $L$.

A mapping $L : \mathcal{D}(L) \subset E_1 \to E_2$ is called linear if $\mathcal{D}(L)$ is a subspace of $E_1$ and $L(\alpha x_1 + \beta x_2) = \alpha L(x_1) + \beta L(x_2)$ $\forall x_1, x_2 \in \mathcal{D}(L)$, $\forall \alpha, \beta \in \mathbb{F}$. Since $\mathcal{D}(L)$ is a vector space itself, we can assume, without loss of generality, that $L : E_1 \to E_2$.

A mapping $L : E_1 \to E_2$ is continuous at $x_0 \in E_1$ if whenever $\|x_n - x_0\|_{E_1} \to 0$, then $\|L(x_n) - L(x_0)\|_{E_2} \to 0$.

Exercise 1. Let $L : E_1 \to E_2$. Then the following are equivalent:

1. $L$ is continuous.
2. $L^{-1}(U)$ is open whenever $U$ is open in $E_2$.
3. $L^{-1}(U)$ is closed whenever $U$ is closed in $E_2$.
Theorem 2.4.1. Let $E_1, E_2$ be normed spaces. Then $L : E_1 \to E_2$ is continuous iff $L$ is continuous at a single point.

Proof. Only the converse needs to be proven. Suppose the mapping is continuous at some point $x_0 \in E_1$. Let $x \in E_1$ be s.t. $x \neq x_0$ and let $x_n \to x$. Then $x_n - x + x_0 \to x_0$, and so the map converges there. This implies that $\|L(x_n) - L(x)\| = \|L(x_n - x + x_0) - L(x_0)\| \to 0$. □
Theorem 2.4.2. Let $E_1, E_2$ be normed spaces. Then $L : E_1 \to E_2$ is continuous iff there exists $k > 0$ s.t. $\|L(x)\|_{E_2} \leq k\|x\|_{E_1}$ for any $x \in E_1$.

Proof. The converse is obvious because this implies the map is continuous at the origin, and thus is so everywhere.

For the forward direction, if the condition is not satisfied, then $\forall n \in \mathbb{N}$, $\exists x_n \in E_1$ s.t. $\|L(x_n)\| > n\|x_n\|$. Set $y_n = \frac{x_n}{n\|x_n\|}$. As $n \to \infty$, $y_n \to 0$. Since the map is continuous, $\|L(y_n)\|$ should go to 0 as well. But $\|L(y_n)\| \geq 1$, leading to a contradiction. □

Note that this implies that continuous linear maps are uniformly continuous.
Now let $E_1, E_2$ be two normed spaces. Define $B(E_1, E_2) = \{L : E_1 \to E_2 \text{ s.t. } L \text{ is linear and bounded}\}$. This is a space of maps.

Proposition 2.4.1. $B(E_1, E_2)$ is a normed space with $\|L\| = \sup_{\|x\| \leq 1} \|L(x)\|$. This is the operator norm.
Proof. The first two properties are obvious. The main thing to prove is the triangle inequality.

Let $\|x\| \leq 1$ and $L_1, L_2 \in B(E_1, E_2)$. Then

$$\|(L_1 + L_2)(x)\| = \|L_1(x) + L_2(x)\| \leq \|L_1(x)\| + \|L_2(x)\| \leq \sup_{\|x\| \leq 1} \|L_1(x)\| + \sup_{\|y\| \leq 1} \|L_2(y)\| = \|L_1\| + \|L_2\|$$

This holds for all such $x$, which in turn implies that $\sup_{\|x\| \leq 1} \|(L_1 + L_2)(x)\| \leq \|L_1\| + \|L_2\|$. □

In fact, $\|L\|$ is the smallest constant $k$ s.t. $\|L(x)\| \leq k\|x\|$ $\forall x \in E_1$. This follows because for $x \neq 0$,

$$\|L(x)\| = \left\| L\left( \|x\| \frac{x}{\|x\|} \right) \right\| = \left\| L\left( \frac{x}{\|x\|} \right) \right\| \|x\| \leq \|L\| \|x\|,$$

and no smaller constant can work, since $\|L\|$ is a supremum of values $\|L(x)\|$ over $\|x\| \leq 1$.
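In the finite-dimensional Euclidean case (a sketch, not from the notes: for a matrix acting on $(\mathbb{R}^2, \|\cdot\|_2)$, the operator norm is the largest singular value), we can compute $\|L\|$ in closed form and check that sampling the unit sphere never exceeds it and approaches it:

```python
# Sketch: for a 2x2 matrix L on (R^2, ||.||_2), the operator norm
# sup_{||x|| <= 1} ||Lx|| equals the largest singular value of L,
# i.e. the square root of the largest eigenvalue of L^T L.
import math
import random

L = [[2.0, 1.0],
     [0.0, 3.0]]

def apply(L, x):
    return [L[0][0] * x[0] + L[0][1] * x[1],
            L[1][0] * x[0] + L[1][1] * x[1]]

def norm2(x):
    return math.sqrt(x[0] ** 2 + x[1] ** 2)

# Entries of the symmetric matrix L^T L = [[a, b], [b, c]].
a = L[0][0] ** 2 + L[1][0] ** 2
b = L[0][0] * L[0][1] + L[1][0] * L[1][1]
c = L[0][1] ** 2 + L[1][1] ** 2
lam_max = ((a + c) + math.sqrt((a - c) ** 2 + 4 * b * b)) / 2
op_norm = math.sqrt(lam_max)

# Sampling unit vectors: ||Lx|| <= ||L|| ||x|| always, and the sup is
# approached on the unit sphere.
random.seed(1)
best = 0.0
for _ in range(20000):
    t = random.uniform(0, 2 * math.pi)
    v = norm2(apply(L, [math.cos(t), math.sin(t)]))
    assert v <= op_norm + 1e-9
    best = max(best, v)
assert op_norm - best < 1e-2
```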
Convergence in the operator norm is sometimes called uniform convergence. We say that $L_n$ in $B(E_1, E_2)$ converges strongly to $L \in B(E_1, E_2)$ if $\forall x \in E_1$, $L_n(x) \to L(x)$. In general $B(E_1, E_2)$ is not a Banach space. However:

Theorem 2.4.3. If $E_1$ is a normed space and $E_2$ is a Banach space, then $B(E_1, E_2)$ is a Banach space.
Proof. Let $(L_n)$ be Cauchy in $B(E_1, E_2)$. Then $\forall x \in E_1$,

$$\|L_n(x) - L_m(x)\| = \|(L_n - L_m)(x)\| \leq \|L_n - L_m\| \|x\| \to 0,$$

so $(L_n(x))$ is Cauchy in $E_2$; denote its limit by $L(x)$. $L$ is linear because

$$L(\alpha x + \beta y) = \lim_{n \to \infty} L_n(\alpha x + \beta y) = \alpha \lim_{n \to \infty} L_n(x) + \beta \lim_{n \to \infty} L_n(y) = \alpha L(x) + \beta L(y)$$

Now let $\epsilon > 0$ and let $n_0$ be s.t. $\forall n, m > n_0$ and $\|x\| \leq 1$, $\|L_n(x) - L_m(x)\| < \epsilon$. Letting $m \to \infty$, we get $\|L_n(x) - L(x)\| \leq \epsilon$, which implies $L_n - L \in B(E_1, E_2)$ and $\|L_n - L\| \to 0$. Thus $L$ is bounded, and $B(E_1, E_2)$ is a Banach space. □
Not all linear maps are bounded. Consider $T(x_1, x_2, \ldots) = (x_1, 2^2 x_2, 3^2 x_3, \ldots)$. Here $\|T(0, \ldots, 0, 1, 0, \ldots)\| = n^2$ when the single 1 sits in the $n$th position.
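The unboundedness is easy to see numerically (a sketch for illustration, acting on finitely supported sequences): the unit vectors $e_n$ all have norm 1, yet their images have norm $n^2$, so no constant $k$ can satisfy $\|T(x)\| \leq k\|x\|$:

```python
# Sketch: the diagonal map T(x_1, x_2, ...) = (x_1, 4 x_2, 9 x_3, ...)
# sends the n-th unit vector e_n (norm 1) to a vector of norm n^2,
# so sup_{||x|| <= 1} ||T(x)|| is infinite and T is unbounded.
import math

def T(x):
    return [(n + 1) ** 2 * t for n, t in enumerate(x)]

def norm2(x):
    return math.sqrt(sum(t * t for t in x))

for n in range(1, 6):
    e_n = [0.0] * n
    e_n[n - 1] = 1.0                       # n-th unit vector, norm 1
    assert abs(norm2(T(e_n)) - n ** 2) < 1e-9   # ||T(e_n)|| = n^2
```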
Theorem 2.4.4. If $L : \mathcal{D}(L) \to E_2$ where $\mathcal{D}(L) \subset E_1$ ($E_1$ normed and $E_2$ Banach), and the mapping is bounded and linear, then $L$ has a unique extension to a continuous linear map defined on $\overline{\mathcal{D}(L)}$. In particular if $\overline{\mathcal{D}(L)} = E_1$, then $L$ has a unique extension to a map in $B(E_1, E_2)$.

Proof. If $x \in \overline{\mathcal{D}(L)}$, then $\exists x_n \in \mathcal{D}(L)$ s.t. $x_n \to x$. We thus have

$$\|L(x_n) - L(x_m)\| = \|L(x_n - x_m)\| \leq \|L\| \|x_n - x_m\| \to 0.$$

Thus the sequence $(L(x_n))$ is Cauchy in $E_2$, implying it has a limit in $E_2$. Define the extension of $L$ to $\overline{\mathcal{D}(L)}$ by $\tilde{L}(x) = \lim_{x_n \to x} L(x_n)$. Call this limit $Z$. Obviously $\tilde{L}$ is linear.

To check that $\tilde{L}$ is well defined, suppose $y_n \in \mathcal{D}(L)$ and $y_n \to x$. Then $x_n - y_n \to 0$, and so $\|L(x_n) - L(y_n)\| \leq \|L\| \|x_n - y_n\| \to 0$; thus $\lim_{n \to \infty} L(y_n) = Z$ as well. $\tilde{L}$ is an extension of $L$ since $L$ is continuous on $\mathcal{D}(L)$.

To check the boundedness of $\tilde{L}$, let $x \in \overline{\mathcal{D}(L)}$, $\|x\| \leq 1$. Let $(x_n) \subset \mathcal{D}(L)$ be s.t. $x_n \to x$. Then $\|x_n\| \to \|x\|$, and we have $\|\tilde{L}(x)\| = \|\lim_{n \to \infty} L(x_n)\| \leq \lim_{n \to \infty} \|L\| \|x_n\| \leq \|L\|$. Thus $\|\tilde{L}\| \leq \|L\|$. Obviously, since $\tilde{L}$ is an extension of $L$, $\|L\| \leq \|\tilde{L}\|$. Therefore the two norms are equal. □
A linear map between two normed spaces that is 1-1 and onto, s.t. the map and its inverse are continuous, is called an isomorphism. If $T$ is linear s.t. $\|T(x)\| = \|x\|$ $\forall x \in E_1$, then $T$ is called an isometry. Note that if $E_2 = \mathbb{F}$, a linear map $L : E_1 \to \mathbb{F}$ is called a linear functional.
2.5 The Closed Graph Theorem
The closed graph theorem is one of the fundamental results in functional analysis. The proof is highly nontrivial, and often uses an open mapping argument, although we don't directly employ it here.

The keys in the proof are the creation of a denseness argument, and the Baire Category theorem. We also rely on the completeness of $E_2$.
Theorem 2.5.1 (Closed Graph Theorem). Given two Banach spaces $E_1, E_2$, let $L : E_1 \to E_2$ be a linear map. Then $L \in B(E_1, E_2)$ iff $\mathcal{G}(L)$ is closed.

Proof. ($\Rightarrow$) This is the easy direction. Let $L \in B(E_1, E_2)$, and let $(x_n, L(x_n)) \in \mathcal{G}(L)$ be s.t. $(x_n, L(x_n)) \to (x, y) \in E_1 \times E_2$. Since $L$ is continuous, $L(x_n) \to L(x) = y$, and so $\mathcal{G}(L)$ is closed.

($\Leftarrow$) Denote an open ball in $E_1$ with radius $r$ centered at $x$ by $B_r^{E_1}(x)$. Note that we can write the space $E_1$ as a covering by inverse images of balls in $E_2$. Note also that we have $B_n^{E_1}(0) = nB_1^{E_1}(0)$.

We have $E_1 = \bigcup_{n=1}^{\infty} L^{-1}(nB_1^{E_2}(0))$. Let $V_n = nL^{-1}(B_1^{E_2}(0))$ and define $U_n = E_1 \setminus \bar{V}_n$. This means that the $U_n$'s are open, and that $\bigcap_{n=1}^{\infty} U_n = \emptyset$. Therefore, by the Baire Category theorem, one of the $U_n$'s is not dense in $E_1$. This implies that for some $n_0$, $V_{n_0}$ (i.e. $n_0 L^{-1}(B_1^{E_2}(0))$) is dense in some ball $B_r^{E_1}(x_0)$, for some $r > 0$, $x_0 \in E_1$.

We claim that $L^{-1}(B_{2n_0}^{E_2}(0))$ is dense in $B_r^{E_1}(0)$.

Let $\|x\| \leq r$. Then $x + x_0 \in \bar{B}_r^{E_1}(x_0)$. Consider two sequences $(x_k)$ and $(x_k^0)$ in $n_0 L^{-1}(B_1^{E_2}(0))$ s.t. $\lim_{k \to \infty} x_k = x + x_0$ and $\lim_{k \to \infty} x_k^0 = x_0$. This means that $x = \lim_{k \to \infty} (x_k - x_k^0)$. But $\|L(x_k - x_k^0)\| \leq \|L(x_k)\| + \|L(x_k^0)\| \leq n_0 + n_0 = 2n_0$. By linearity (rescaling), this implies that $\forall \epsilon > 0$, $L^{-1}(B_\epsilon^{E_2}(0))$ is dense in $B_\delta^{E_1}(0)$, where $\delta = \frac{\epsilon r}{2n_0}$.

We want to show that $L$ is bounded.

Fix $\epsilon > 0$ with corresponding $\delta$, and let $x \in E_1$, $\|x\| \leq \delta$. By the denseness property just proved, we can find $x_1$ s.t. $\|L(x_1)\| \leq \epsilon$ and $\|x - x_1\| \leq \frac{\delta}{2}$. Now $L^{-1}(B_{\epsilon/2}^{E_2}(0))$ is dense in $B_{\delta/2}^{E_1}(0)$, so there exists $x_2$ s.t. $\|L(x_2)\| \leq \frac{\epsilon}{2}$ and $\|(x - x_1) - x_2\| \leq \frac{\delta}{2^2}$. By induction, we find a sequence $(x_n)$ s.t. $\|L(x_n)\| \leq \frac{\epsilon}{2^{n-1}}$ and $\|x - (x_1 + \cdots + x_n)\| \leq \frac{\delta}{2^n}$. Therefore, as $n \to \infty$, $\sum_{n=1}^{\infty} x_n = x$ and $\sum_{n=1}^{\infty} \|L(x_n)\| \leq 2\epsilon$. Since $E_2$ is complete, $\sum_{n=1}^{\infty} L(x_n) = y \in E_2$ (since the series is absolutely convergent).

However, $(x_1 + \cdots + x_n, L(x_1) + \cdots + L(x_n)) \in \mathcal{G}(L)$. Since $\mathcal{G}(L)$ is closed, this implies that $y = L(x)$. Obviously, $\|y\| \leq 2\epsilon$. Therefore $\forall x \in E_1$ s.t. $\|x\| \leq \delta$, we get $\|L(x)\| \leq 2\epsilon$. This means that $L$ is bounded. □
Note that we never used $E_1$'s completeness; it was enough that $E_1$ was of the second category (i.e. it wasn't the countable union of nowhere dense sets). Here's another cornerstone theorem in functional analysis, sometimes also known as the uniform boundedness principle.
Theorem 2.5.2 (Banach-Steinhaus). Let $E_1$ be a Banach space and $E_2$ be a normed space. Let $\mathcal{T} \subset B(E_1, E_2)$ be a family of bounded maps s.t. $\forall x \in E_1$, the set $\{L(x)\}_{L \in \mathcal{T}}$ is bounded. Then $\exists M > 0$ s.t. $\|L\| \leq M$ $\forall L \in \mathcal{T}$.
Proof. Let $X_k = \{x \in E_1 : \|L(x)\| \leq k \ \forall L \in \mathcal{T}\}$. Then

1. $X_k = \bar{X}_k$
2. $\bigcup_{k=1}^{\infty} X_k = E_1$

By the previous argument, $\exists k_0$ s.t. $B_r^{E_1}(x_0) \subset X_{k_0}$ for some $x_0 \in E_1$, $r > 0$. Therefore if $\|x - x_0\| \leq r$, then $\|L(x)\| \leq k_0$ $\forall L \in \mathcal{T}$. This means that if $\|y\| \leq r$, then $\|L(y)\| \leq \|L(y + x_0)\| + \|L(x_0)\| \leq 2k_0$ $\forall L \in \mathcal{T}$.

Therefore, if $\|y\| \leq 1$, then $\|L(y)\| = \frac{1}{r}\|L(ry)\| \leq \frac{2k_0}{r}$ $\forall L \in \mathcal{T}$. Therefore $M = \frac{2k_0}{r}$ is the desired constant. □
Corollary 2.5.1. Let $E_1, E_2$ be Banach and normed respectively. Let $T_n \in B(E_1, E_2)$. If $\forall x \in E_1$, $T_n(x) \to T(x)$, then $T \in B(E_1, E_2)$.
18 Banach Spaces
These types of theorems gave the rst rigorous proofs of the convergence of Fourier series.
Note that the above argument fails if the space E
1
isnt complete. To see this, let
E
1
= (
1
([0, 1]; R) with norm |x| = sup
0t1
[x(t)[, and let E
2
be R with the usual norm.
Then, if we dene T
n
(x) =
x(1/n)x(0)
1/n
, this map approaches the derivative at 0, i.e. x
t
(0), x
(
1
([0, 1]; R). But these may not be contained in B(E
1
, E
2
) .
2.6 Completeness and Fixed Point Theorems
2.6.1 Isometric Embeddings
This section answers the following question: can one embed noncomplete spaces in a larger space in an isometric way? The answer is yes for certain classes of spaces.

Given a normed space $(E, \|\cdot\|)$, the space $(\tilde{E}, \|\cdot\|_1)$ is called a completion of $(E, \|\cdot\|)$ if there exists a 1-1 linear mapping $\pi : E \to \tilde{E}$ s.t. $\|\pi(x)\|_1 = \|x\|$ for every $x \in E$, $\overline{\pi(E)} = \tilde{E}$, and $(\tilde{E}, \|\cdot\|_1)$ is complete.
Theorem 2.6.1. If $(E, \|\cdot\|)$ is a normed space, then it has a completion $(\tilde{E}, \|\cdot\|_1)$.

Proof. Let $\tilde{E}$ be the set of equivalence classes of Cauchy sequences in $E$, where $[(x_n)]$ is equivalent to $[(y_n)]$ if $\lim_{n \to \infty} (x_n - y_n) = 0$. One can see that this forms a vector space, i.e.

$$[(x_n)] + [(y_n)] = [(x_n + y_n)], \qquad \alpha[(x_n)] = [(\alpha x_n)]$$

Set $\|[(x_n)]\|_1 = \lim_{n \to \infty} \|x_n\|$. Define $\pi : E \to \tilde{E}$ by $\pi(x) = [(x, x, x, \ldots)]$, i.e. a constant sequence. Obviously, $\|\pi(x)\|_1 = \|x\|$. Let $[(x_n)] \in \tilde{E}$. Then $[(x_n)] = \lim_{i \to \infty} [(x_i, x_i, x_i, \ldots)]$, with each $[(x_i, x_i, \ldots)] \in \pi(E)$. This is because $\|[(x_n)] - [(x_i, x_i, \ldots)]\|_1 = \lim_{n \to \infty} \|x_n - x_i\| \to 0$ as $i \to \infty$, since $(x_n)$ is Cauchy. This means that $\tilde{E} \subset \overline{\pi(E)}$. Since the converse inclusion is obvious, we have that $\tilde{E} = \overline{\pi(E)}$.

Now we just need to show that $(\tilde{E}, \|\cdot\|_1)$ is complete. Let $([(x_n^k)_n])_k$ be a Cauchy sequence in $\tilde{E}$. Then its representatives can be written as the following diagram:

$$\begin{array}{ccccc}
x_1^1, & x_2^1, & x_3^1, & \ldots, & x_k^1, \ldots \\
x_1^2, & x_2^2, & x_3^2, & \ldots, & x_k^2, \ldots \\
\vdots \\
x_1^n, & x_2^n, & x_3^n, & \ldots, & x_k^n, \ldots
\end{array}$$

In this diagram, for every $i$, there exists $L_i$ s.t. if $p, q > L_i$, then $\|x_p^i - x_q^i\| < \frac{1}{i}$, by the Cauchy property of the $i$th row. We can assume that $L_1 < L_2 < L_3$, etc.

Set $[(\tilde{x}_k)] = [(x_{L_k}^k)]$, i.e. take a suitable diagonal. Now it's easy to check that $[(\tilde{x}_k)] = \lim_{n \to \infty} [(x_k^n)_k]$ (left as an exercise).
2.6.2 The Banach Fixed Point Theorem
Suppose you have a map T : E → E. If T(x) = x, x is called a fixed point of T. E is usually a normed space, but it can sometimes be more general.
Given A ⊂ E, f : A → E is called a contraction if ∃α s.t. 0 < α < 1 and ‖f(x) − f(y)‖ ≤ α‖x − y‖ ∀x, y ∈ A.
Theorem 2.6.2. Let f : X → X, where X is a closed subset of a Banach space, and f is a contraction. Then f has a unique fixed point in X.
Proof. Choose any x₀ ∈ X. Suppose ‖f(x) − f(y)‖ ≤ α‖x − y‖ ∀x, y ∈ X. Define x₁ = f(x₀), …, x_{n+1} = f(xₙ), …, and so on. Then
‖x_{n+1} − xₙ‖ = ‖f(xₙ) − f(x_{n−1})‖ ≤ α‖xₙ − x_{n−1}‖ ≤ ⋯ ≤ αⁿ‖x₁ − x₀‖
Let m < n. Then
‖xₙ − xₘ‖ ≤ ‖xₙ − x_{n−1}‖ + ⋯ + ‖x_{m+1} − xₘ‖
≤ α^{n−1}‖x₁ − x₀‖ + ⋯ + αᵐ‖x₁ − x₀‖
= αᵐ‖x₁ − x₀‖(1 + α + ⋯ + α^{n−m−1})
≤ (αᵐ/(1 − α))‖x₁ − x₀‖ → 0 as m → ∞
Thus (xₙ) is Cauchy. Since the ambient space is complete and X is closed, its limit z ∈ X.
To see that this is a fixed point, note that
‖f(z) − z‖ ≤ ‖f(z) − xₙ‖ + ‖xₙ − z‖ = ‖f(z) − f(x_{n−1})‖ + ‖xₙ − z‖ ≤ α‖z − x_{n−1}‖ + ‖xₙ − z‖ → 0 as n → ∞
This point is also unique: if w ∈ X is s.t. f(w) = w, then ‖z − w‖ = ‖f(z) − f(w)‖ ≤ α‖z − w‖. Thus ‖z − w‖ = 0.
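The iteration in the proof is easy to run in practice. A minimal sketch, using the concrete choice f(x) = cos x on the closed set X = [0, 1] (a contraction there, since |f′| ≤ sin 1 < 1):

```python
import math

def fixed_point(f, x0, tol=1e-12, max_iter=10_000):
    """Picard iteration x_{n+1} = f(x_n), as in the proof of Theorem 2.6.2."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("did not converge")

# The unique fixed point of cos on [0, 1] (~0.7390851).
z = fixed_point(math.cos, 0.5)
```

The geometric convergence rate αⁿ derived above is visible numerically: the error roughly shrinks by the factor α = sin 1 ≈ 0.84 per step.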
2.7 The Hahn-Banach Theorems
The Hahn-Banach theorem is not only one of the central results of functional analysis, it is one of the central tools of the area, being applicable to a wide variety of problems. Before we prove it, we need a result from set theory called Zorn's lemma, which is essentially equivalent to the Axiom of Choice. Recall that a partial order on a set is reflexive, antisymmetric and transitive.
Lemma 2.7.1. (Zorn's Lemma) Every partially ordered set, in which every chain (i.e. totally ordered subset) has an upper bound, has a maximal element.
Suppose you have a vector space E over the field of reals. A function p : E → ℝ is called sublinear if
1. p(x + y) ≤ p(x) + p(y) for x, y ∈ E.
2. p(tx) = tp(x) for x ∈ E, t ∈ ℝ₊.
Then we have:
Theorem 2.7.1. (Hahn-Banach). Given E₀, a subspace of E, and f₀, a linear functional on E₀ s.t. f₀(x) ≤ p(x) on E₀, there exists a linear functional f on E s.t. f|_{E₀} = f₀ and f(x) ≤ p(x) on E.
Proof. Let Φ be the family of all extensions of f₀, i.e. Φ = {(g, E_g) : E_g is a subspace of E, E₀ ⊂ E_g, g a linear functional on E_g, g|_{E₀} = f₀, g ≤ p on E_g}. We know that Φ is nonempty because at the very least (f₀, E₀) ∈ Φ.
Introduce a partial order on Φ: we say that (g₁, E_{g₁}) ≼ (g₂, E_{g₂}) if E_{g₁} ⊂ E_{g₂} and g₂|_{E_{g₁}} = g₁.
Let Φ₀ be a linearly ordered subset of Φ. Set Ẽ = ⋃_{(g,E_g)∈Φ₀} E_g and g̃(x) = g(x) if x ∈ E_g. If x ∈ E_{g₁} ∩ E_{g₂}, then without loss of generality E_{g₁} ⊂ E_{g₂} (Φ₀ is linearly ordered), and thus g̃ is well defined. Ẽ is a subspace of E, g̃ is clearly a linear functional on it, g̃|_{E_g} = g for every (g, E_g) ∈ Φ₀, and g̃(x) ≤ p(x) on Ẽ. Moreover, (g, E_g) ≼ (g̃, Ẽ) ∀(g, E_g) ∈ Φ₀. This means that Φ₀ always has an upper bound, and thus by Zorn's lemma, Φ has a maximal element (f, E_f).
In the second part of the proof, we need to show that E_f = E. If we can do this, we'll be done.
If E_f ≠ E, pick x₀ ∈ E ∖ E_f. Define E₁ = {x̃ ∈ E : x̃ = x + λx₀, x ∈ E_f, λ ∈ ℝ}. This is a subspace, and moreover, E₁ ⊋ E_f.
Define f₁(x̃) = f(x) + λc₀, where c₀ will be determined later. So f₁ is a linear functional on E₁, and f₁|_{E_f} = f. Then for x, y ∈ E_f, we have
f(x) + f(y) = f(x + y) ≤ p(x + y) = p((x + x₀) + (y − x₀)) ≤ p(x + x₀) + p(y − x₀)
Therefore f(y) − p(y − x₀) ≤ p(x + x₀) − f(x) ∀x, y ∈ E_f. This implies that
c₀ := sup_{y∈E_f} (f(y) − p(y − x₀)) ≤ inf_{x∈E_f} (p(x + x₀) − f(x))
Now if x ∈ E_f, λ ∈ ℝ, we have:
1. λ > 0:
c₀ ≤ p(x/λ + x₀) − f(x/λ)
This implies that λc₀ ≤ p(x + λx₀) − f(x), and thus we have that f₁(x̃) = f(x) + λc₀ ≤ p(x + λx₀) = p(x̃).
2. λ < 0:
f(−x/λ) − p(−x/λ − x₀) ≤ c₀
Multiplying by −λ > 0 gives f(x) − p(x + λx₀) ≤ −λc₀, i.e. f₁(x̃) = f(x) + λc₀ ≤ p(x + λx₀) = p(x̃).
So f₁ ≤ p on E₁, which means (f₁, E₁) is strictly larger than (f, E_f), contradicting maximality. Therefore E_f = E.
There is a complex extension of this theorem as well.
Theorem 2.7.2. Let E be a complex vector space, p a pseudonorm on E, and E₀ a subspace of E, and let f₀ : E₀ → ℂ be a linear functional s.t. |f₀(x)| ≤ p(x) on E₀. Then there is a linear functional f on E s.t. f|_{E₀} = f₀ and |f(x)| ≤ p(x) on E.
Proof. Note that f₀(x) = Re f₀(x) + i Im f₀(x). Therefore −Im f₀(x) + i Re f₀(x) = if₀(x) = f₀(ix) = Re f₀(ix) + i Im f₀(ix). Thus Im f₀(x) = −Re f₀(ix) ∀x ∈ E₀. This means that we have f₀(x) = Re f₀(x) − i Re f₀(ix).
Note also that if u(x) is a real linear functional, then g(x) = u(x) − iu(ix) is a complex linear functional. To prove this, we want that g((a + ib)x) = (a + ib)g(x) (the additivity is obvious). We have that u((a + ib)x) = au(x) + bu(ix) and u((a + ib)ix) = au(ix) − bu(x). Then
g((a + ib)x) = u((a + ib)x) − iu((a + ib)ix)
= au(x) + bu(ix) − i(au(ix) − bu(x))
= a(u(x) − iu(ix)) + b(u(ix) + iu(x))
= (a + ib)(u(x) − iu(ix))
= (a + ib)g(x)
Therefore f₀(x) = Re f₀(x) − i Re f₀(ix) is a complex linear functional.
Now we know that Re f₀ is a real linear functional on E₀, and Re f₀ ≤ p. So by Hahn-Banach, there is a real linear functional f̃ on E s.t. f̃|_{E₀} = Re f₀ and f̃(x) ≤ p(x) on E. Since −f̃(x) = f̃(−x) ≤ p(−x) = p(x), this implies that |f̃(x)| ≤ p(x) on E.
Set f(x) = f̃(x) − if̃(ix). f is a complex linear functional, and f|_{E₀} = f₀. Suppose that ∃x₀ s.t. |f(x₀)| > p(x₀). Take λ with |λ| = 1 s.t. λf(x₀) = |f(x₀)|. Then f(λx₀) = λf(x₀) = |f(x₀)| is real, so f(λx₀) = Re f(λx₀) = f̃(λx₀) ≤ p(λx₀) = p(x₀). But Re f(λx₀) = |f(x₀)| > p(x₀). But this is a contradiction.
Corollary 2.7.1. Let E be a normed space with subspace E₀. If f₀ ∈ E₀*, then ∃f ∈ E* s.t. f|_{E₀} = f₀ and ‖f‖_{E*} = ‖f₀‖_{E₀*}.
Proof. |f₀(x)| ≤ ‖f₀‖_{E₀*}‖x‖, so apply the previous theorem to p(x) = ‖f₀‖_{E₀*}‖x‖. We get a linear functional f s.t. f|_{E₀} = f₀ and |f(x)| ≤ ‖f₀‖_{E₀*}‖x‖ ∀x ∈ E. This implies that ‖f‖_{E*} ≤ ‖f₀‖_{E₀*}, and thus ‖f‖_{E*} = ‖f₀‖_{E₀*}.
Let's look at some other consequences of the Hahn-Banach theorems. Before we do, we'll need the following:
Exercise 2. Let X be a normed space, and let f : X → ℂ be a linear functional. Then f ∈ X* iff f⁻¹(0) is closed.
Theorem 2.7.3. Let E be a normed space, E₀ a closed subspace and let x₀ ∉ E₀. Then there exists f ∈ E* s.t. f(x₀) = 1, f|_{E₀} = 0 and ‖f‖ = 1/d, where d = inf_{y∈E₀} ‖x₀ − y‖ > 0.
Proof. Set Ẽ = {x + λx₀ : x ∈ E₀, λ ∈ ℂ}. Define f̃ on Ẽ by setting f̃(x̃) = λ if x̃ = x + λx₀. Obviously f̃ is a linear functional on Ẽ. Since f̃⁻¹(0) = E₀ is closed, f̃ ∈ Ẽ* by the exercise. Therefore ∃f ∈ E* s.t. f|_{Ẽ} = f̃ and ‖f‖ = ‖f̃‖. Obviously f(x₀) = f̃(x₀) = 1.
Now let x̃ = x + λx₀, x ∈ E₀, λ ≠ 0. Then ‖x̃‖ = ‖x + λx₀‖ = |λ|‖x/λ + x₀‖ ≥ |λ|d. But |λ| = |f̃(x̃)|, and so |f̃(x̃)| ≤ (1/d)‖x̃‖, which means ‖f̃‖ ≤ 1/d. Now let (xₙ) ⊂ E₀ be s.t. ‖xₙ − x₀‖ → d. Then 1 = f̃(x₀ − xₙ) ≤ ‖f̃‖‖x₀ − xₙ‖. Letting n → ∞, we get 1 ≤ ‖f̃‖d, from which the theorem follows.
Corollary 2.7.2. Let E be a normed space, with x ∈ E, x ≠ 0. Then ∃f_x ∈ E* s.t. f_x(x) = ‖x‖ and ‖f_x‖ = 1.
Proof. Apply the previous theorem to E₀ = {0}. Then dist(x, E₀) = ‖x‖, and we get f̃_x ∈ E* s.t. f̃_x(x) = 1, ‖f̃_x‖ = 1/‖x‖. Now take f_x = ‖x‖f̃_x.
Corollary 2.7.3. If E is a normed space and x ∈ E, then ‖x‖ = sup_{f∈E*, ‖f‖=1} |f(x)|.
Proof. If ‖f‖ = 1, then |f(x)| ≤ ‖f‖‖x‖ = ‖x‖. In the other direction, f_x(x) = ‖x‖ and ‖f_x‖ = 1.
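A concrete sketch of this corollary in the Hilbert space ℝⁿ (a choice made here only for illustration): every norm-one functional is f(x) = ⟨u, x⟩ with ‖u‖ = 1, sampled values of |f(x)| never exceed ‖x‖, and the norming functional f_x = ⟨x/‖x‖, ·⟩ attains the supremum exactly.

```python
import math
import random

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

random.seed(0)
x = [random.gauss(0, 1) for _ in range(5)]

# Sample many unit functionals u; |f(x)| = |<u, x>| never exceeds |x| ...
best = 0.0
for _ in range(2000):
    u = [random.gauss(0, 1) for _ in range(5)]
    u = [c / norm(u) for c in u]
    best = max(best, abs(dot(u, x)))

# ... and the norming functional f_x = <x/|x|, .> attains |x| exactly.
attained = dot([c / norm(x) for c in x], x)
```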
Corollary 2.7.4. Let E be a normed space, and let K* = {f ∈ E* : ‖f‖ ≤ 1}. Then T : E → C(K*) defined by T(x)(f) = f(x) is an isometry.
Proof. The linearity is obvious. Note that ‖T(x)‖_{C(K*)} = sup_{f∈K*} |f(x)| = ‖x‖.
Chapter 3
Hilbert Spaces
The nicest infinite-dimensional spaces are undoubtedly Hilbert spaces. These are a direct extension of Euclidean space, and allow us to retain a surprising number of the same properties.
3.1 Definition
Given a vector space E, a map ⟨·, ·⟩ : E × E → ℂ is called an inner product on E if ∀x, y, z ∈ E, α, β ∈ ℂ,
1. ⟨x, x⟩ ≥ 0 and ⟨x, x⟩ = 0 iff x = 0.
2. ⟨αx + βy, z⟩ = α⟨x, z⟩ + β⟨y, z⟩.
3. ⟨x, y⟩ is the complex conjugate of ⟨y, x⟩.
We call (E, ⟨·, ·⟩) an inner product space. Note that if E is a real vector space, ⟨·, ·⟩ : E × E → ℝ.
Some examples include
• ℂ, with ⟨x, y⟩ = xȳ.
• ℂⁿ, with ⟨(x₁, …, xₙ), (y₁, …, yₙ)⟩ = ∑_{i=1}^{n} xᵢȳᵢ.
• ℓ², with ⟨(xₙ), (yₙ)⟩ = ∑_{i=1}^{∞} xᵢȳᵢ.
• L²(Ω, M, μ), with ⟨f, g⟩_{L²} = ∫_Ω f(x)ḡ(x) dμ(x).
If (E₁, ⟨·, ·⟩₁) and (E₂, ⟨·, ·⟩₂) are inner product spaces, then E₁ × E₂ is an inner product space, with ⟨(x₁, y₁), (x₂, y₂)⟩ = ⟨x₁, x₂⟩₁ + ⟨y₁, y₂⟩₂.
Proposition 3.1.1. An inner product defines a norm on E by ‖x‖ = √⟨x, x⟩.
Proof. We prove each of the properties of a norm.
1. ‖x‖ ≥ 0, ‖0‖ = 0, and ‖x‖ = 0 means x = 0.
2. ‖αx‖ = √⟨αx, αx⟩ = √(αᾱ⟨x, x⟩) = |α|√⟨x, x⟩ = |α|‖x‖.
3. To prove the triangle inequality, we need the Cauchy-Schwarz inequality, i.e.
|⟨x, y⟩| ≤ ‖x‖‖y‖ ∀x, y ∈ E,
with equality iff x, y are linearly dependent. To see this, consider
0 ≤ ⟨x + λy, x + λy⟩ = ‖x‖² + λ⟨y, x⟩ + λ̄⟨x, y⟩ + |λ|²‖y‖²
If y = 0, the claim is obvious. If y ≠ 0, set λ = −⟨x, y⟩/‖y‖². Then
0 ≤ ‖x‖² − 2|⟨x, y⟩|²/‖y‖² + (|⟨x, y⟩|²/‖y‖⁴)‖y‖² = ‖x‖² − |⟨x, y⟩|²/‖y‖²
from which the inequality follows. For the equality case, first note that if x = λy, equality obviously holds. Now let |⟨x, y⟩| = ‖x‖‖y‖, or equivalently |⟨x, y⟩|² = ⟨x, y⟩⟨y, x⟩ = ⟨x, x⟩⟨y, y⟩. Then
⟨⟨y, y⟩x − ⟨x, y⟩y, ⟨y, y⟩x − ⟨x, y⟩y⟩ = 0.
Thus ⟨y, y⟩x − ⟨x, y⟩y = 0, and so x and y are linearly dependent.
Now we can prove the triangle inequality.
‖x + y‖² = ⟨x + y, x + y⟩ = ‖x‖² + ‖y‖² + ⟨x, y⟩ + ⟨y, x⟩
≤ ‖x‖² + ‖y‖² + 2|⟨x, y⟩|
≤ ‖x‖² + ‖y‖² + 2‖x‖‖y‖
= (‖x‖ + ‖y‖)²
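A quick numerical sanity check of the two inequalities just proved, in ℂ⁴ with the inner product ⟨x, y⟩ = ∑ᵢ xᵢȳᵢ (an illustration, not part of the notes):

```python
import math
import random

random.seed(1)

def inner(x, y):
    # <x, y> = sum_i x_i * conj(y_i), as in the C^n example above
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x).real)

for _ in range(100):
    x = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(4)]
    y = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(4)]
    assert abs(inner(x, y)) <= norm(x) * norm(y) + 1e-12       # Cauchy-Schwarz
    s = [a + b for a, b in zip(x, y)]
    assert norm(s) <= norm(x) + norm(y) + 1e-12                # triangle

# Equality case: x, y linearly dependent, y = (2 - i) x.
x = [complex(1, 2), complex(3, -1)]
y = [(2 - 1j) * c for c in x]
gap = abs(inner(x, y)) - norm(x) * norm(y)
```

The `gap` variable is zero up to rounding, matching the equality case of Cauchy-Schwarz for linearly dependent vectors.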
A complete inner product space is called a Hilbert space. Some examples include ℂⁿ, ℓ² and L². Note that C[0, 1] with the inner product inherited from L² is not a Hilbert space, because it isn't complete (it's possible to create a sequence of continuous functions whose L² limit is not continuous).
A very important and interesting Hilbert space is a certain kind of Sobolev space. Let U be an open, bounded subset of ℝⁿ. Then Hᵐ(U) is the closure of Cᵐ(U) with respect to the norm given by the inner product
⟨f, g⟩ = ∑_{|α|≤m} ∫_U Dᵅf Dᵅg dx
where α = (α₁, …, αₙ), αᵢ ∈ ℕ, |α| = α₁ + ⋯ + αₙ, and Dᵅf = ∂^{|α|}f / ∂x₁^{α₁} ⋯ ∂xₙ^{αₙ}. These spaces are important because they're designed to study solutions of a large class of partial differential equations. More general definitions of these spaces are possible (one changes the norm), but these modifications cause the space to end up not being Hilbert.
3.2 Geometry
We discuss the geometry of Hilbert spaces, which is quite elegant and intuitive.
Proposition 3.2.1. (Parallelogram Law). Let E be an inner product space, x, y ∈ E. Then
‖x + y‖² + ‖x − y‖² = 2(‖x‖² + ‖y‖²).
Proof. This follows from
‖x + y‖² = ‖x‖² + ⟨x, y⟩ + ⟨y, x⟩ + ‖y‖²
‖x − y‖² = ‖x‖² − ⟨x, y⟩ − ⟨y, x⟩ + ‖y‖²
We say that x, y ∈ E in an inner product space are orthogonal (x ⊥ y) if ⟨x, y⟩ = 0. Then the following is obvious (and not just because it's been taught in elementary school for thousands of years).
Theorem 3.2.1. (Pythagorean Theorem). Let E be an inner product space, and let x ⊥ y. Then
‖x + y‖² = ‖x‖² + ‖y‖².
Suppose you have S ⊂ H, where H is a Hilbert space. Then we denote the orthogonal complement of S by S⊥ = {y : y ⊥ x ∀x ∈ S} = {y : y ⊥ S}. Note that {0}⊥ = H and H⊥ = {0}. Finally, x ∈ S ∩ S⊥ implies that ⟨x, x⟩ = 0, so the intersection is either {0} or ∅. Obviously, if A and B are subspaces and A ⊥ B, then A ∩ B = {0}.
Proposition 3.2.2. S⊥ is a closed subspace of H.
Proof. To see that it is a subspace, note that if α, β ∈ ℂ, x, y ∈ S⊥ and z ∈ S, then ⟨αx + βy, z⟩ = α⟨x, z⟩ + β⟨y, z⟩ = 0, and thus αx + βy ∈ S⊥.
To see that it's closed, note that if xₙ ∈ S⊥ and xₙ → x ∈ H, then ∀z ∈ S,
⟨x, z⟩ = limₙ ⟨xₙ, z⟩ = 0.
Therefore x ∈ S⊥.
Theorem 3.2.2. (Closest point property). Let S be a closed and convex subset of a Hilbert space H. Then ∀x ∈ H, there exists a unique y ∈ S s.t. ‖x − y‖ = inf_{z∈S} ‖x − z‖.
Proof. Let d = inf_{z∈S} ‖x − z‖ ≥ 0 and let yₙ ∈ S be s.t. ‖yₙ − x‖ → d. Since ½(yₙ + yₘ) ∈ S by convexity, ‖x − ½(yₙ + yₘ)‖ ≥ d ∀n, m ∈ ℕ. Then
‖yₙ − yₘ‖² = ‖(x − yₘ) − (x − yₙ)‖²
= ‖(x − yₙ) + (x − yₘ)‖² + ‖(x − yₙ) − (x − yₘ)‖² − 4‖x − ½(yₙ + yₘ)‖²
= 2(‖x − yₙ‖² + ‖x − yₘ‖²) − 4‖x − ½(yₙ + yₘ)‖² (parallelogram)
≤ 2(‖x − yₙ‖² + ‖x − yₘ‖²) − 4d² → 0 as n, m → ∞,
where we used ‖(x − yₙ) + (x − yₘ)‖² = 4‖x − ½(yₙ + yₘ)‖².
Therefore (yₙ) is a Cauchy sequence, and it has a limit y ∈ S, since S is closed. Obviously, ‖y − x‖ = d.
If there's another y₁ ∈ S s.t. ‖y₁ − x‖ = d, then repeating the previous calculation with yₙ replaced by y and yₘ by y₁ gets us ‖y₁ − y‖² ≤ 2(‖x − y‖² + ‖x − y₁‖²) − 4d² = 0, which implies y₁ = y.
A natural question to ask is whether there's anything such as an intrinsic characterization of the closest point, similar to Euclidean space. The answer is yes, as the following theorem shows.
Theorem 3.2.3. Let S be a closed and convex subset of a real Hilbert space H, and let y ∈ S, x ∈ H. Then TFAE:
1. ‖x − y‖ = inf_{z∈S} ‖z − x‖.
2. ⟨x − y, z − y⟩ ≤ 0 ∀z ∈ S.
Proof. First we prove that 1) implies 2). To see this, let z ∈ S, λ ∈ (0, 1). Then λz + (1 − λ)y ∈ S. Therefore
‖x − y‖ ≤ ‖λz + (1 − λ)y − x‖ = ‖x − y + λ(y − z)‖
and so
‖x − y‖² ≤ ‖x − y‖² + 2λ⟨x − y, y − z⟩ + λ²‖y − z‖²
giving us 2⟨x − y, z − y⟩ ≤ λ‖y − z‖². Letting λ → 0, we get ⟨x − y, z − y⟩ ≤ 0.
To prove that 2) implies 1), let x ∈ H, y ∈ S be s.t. they satisfy 2). Then for z ∈ S,
‖x − y‖² − ‖x − z‖² = ⟨y, y⟩ − 2⟨x, y⟩ + 2⟨x, z⟩ − ⟨z, z⟩
= 2⟨x, z − y⟩ + ⟨y, y⟩ − ⟨z, z⟩
= 2⟨x − y, z − y⟩ + 2⟨y, z − y⟩ + ⟨y, y⟩ − ⟨z, z⟩
= 2⟨x − y, z − y⟩ − ‖y − z‖² ≤ 0
therefore, ‖x − y‖ ≤ ‖x − z‖ ∀z ∈ S.
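The variational inequality in 2) is easy to watch numerically. A minimal sketch in ℝ² (the box S = [0,1]² is a convenient closed convex set whose closest-point map is coordinatewise clipping — an assumption made here for concreteness):

```python
import random

def project_box(x):
    """Closest point of the closed convex box [0,1]^2, by clipping."""
    return tuple(min(1.0, max(0.0, c)) for c in x)

x = (1.7, -0.4)
y = project_box(x)          # closest point: (1.0, 0.0)

# Check <x - y, z - y> <= 0 for sampled z in S, as in Theorem 3.2.3 part 2).
random.seed(2)
max_ip = max(
    sum((xc - yc) * (zc - yc) for xc, yc, zc in zip(x, y, z))
    for z in [(random.random(), random.random()) for _ in range(1000)]
)
```

Geometrically, x − y points "outward" from S at y, so it makes an obtuse angle with every direction z − y into S.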
If ⟨z − y, n⟩ ≤ 0 ∀z ∈ S, then n is called a generalized normal to S at y.
Suppose now you have a Hilbert space H, and a closed subspace H₀ of it.
Theorem 3.2.4. Every x ∈ H can be decomposed uniquely into x = x₀ + x₀⊥, where x₀ ∈ H₀ and x₀⊥ ∈ H₀⊥.
Proof. Since H₀ ∩ H₀⊥ = {0}, if x = x₀ + x₀⊥ and x = x₀′ + x₀′⊥, then x₀ − x₀′ ∈ H₀ and x₀⊥ − x₀′⊥ ∈ H₀⊥, and since the sum of these two vectors is 0, both must be 0: we have uniqueness.
Now let x ∈ H. Let x₀ be the unique point of H₀ with ‖x − x₀‖ = inf_{z∈H₀} ‖x − z‖ =: d, which exists by the closest point property (H₀ is closed and convex). We need to show that x − x₀ ∈ H₀⊥, or in other words, ⟨x − x₀, y⟩ = 0 ∀y ∈ H₀.
Now, ∀y ∈ H₀ and λ ∈ ℂ, x₀ + λy ∈ H₀, so ‖(x − x₀) − λy‖² ≥ d² = ‖x − x₀‖². Therefore
−λ̄⟨x − x₀, y⟩ − λ⟨y, x − x₀⟩ + |λ|²⟨y, y⟩ ≥ 0
Take λ = ⟨x − x₀, y⟩/⟨y, y⟩ (if y = 0, there's nothing to prove). Then
−2|⟨x − x₀, y⟩|²/⟨y, y⟩ + |⟨x − x₀, y⟩|²/⟨y, y⟩ ≥ 0 ⟹ −|⟨x − x₀, y⟩|²/⟨y, y⟩ ≥ 0
which implies that ⟨x − x₀, y⟩ = 0.
We denote the decomposition by H = H₀ ⊕ H₀⊥ (i.e. it's a direct sum). Note that H₀ doesn't have to be compact; closedness is enough. We say that H has an orthogonal decomposition into H₀ and H₀⊥.
Define a projection of H onto H₀ by Px = x₀, where x = x₀ + x₀⊥ is the orthogonal decomposition. P is called the orthogonal projection onto H₀. This is a bounded linear operator, and (provided H₀ ≠ {0}) it has the property ‖P‖ = 1.
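A small sketch of the orthogonal projection in ℝ⁴, onto H₀ = span{u₁, u₂} with {u₁, u₂} orthonormal (a hypothetical concrete choice): then Px = ⟨x, u₁⟩u₁ + ⟨x, u₂⟩u₂, the residual x − Px is orthogonal to H₀, and ‖Px‖ ≤ ‖x‖.

```python
import math

u1 = (1.0, 0.0, 0.0, 0.0)
u2 = (0.0, 1 / math.sqrt(2), 1 / math.sqrt(2), 0.0)   # orthonormal pair

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def P(x):
    # orthogonal projection onto span{u1, u2}
    c1, c2 = dot(x, u1), dot(x, u2)
    return tuple(c1 * a + c2 * b for a, b in zip(u1, u2))

x = (3.0, 1.0, -2.0, 5.0)
px = P(x)
residual = tuple(a - b for a, b in zip(x, px))   # this is x_0^perp
```

The residual being orthogonal to both u₁ and u₂ is exactly the decomposition x = x₀ + x₀⊥ of Theorem 3.2.4.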
Consider the following example. Let S be a closed subset of H. If S is not convex, then given x ∈ H, there may not exist a point y ∈ S s.t. ‖y − x‖ = inf_{z∈S} ‖x − z‖. To see this, let H = ℓ² and let S = {(0, …, 0, 1 + 1/n, 0, …) : n ∈ ℕ}, with 1 + 1/n in the n-th coordinate. Then S is bounded and closed (its points are uniformly separated, so S is a discrete set). If x is the origin, then dist(x, S) = 1, but there exists no element of S that achieves this.
Proposition 3.2.3. Let H be a Hilbert space, and let S be a closed subspace of H. Then (S⊥)⊥ = S.
Proof. Let x ∈ S. Then ⟨x, z⟩ = 0 ∀z ∈ S⊥, so x ∈ (S⊥)⊥, i.e. S ⊂ (S⊥)⊥. Now let x ∈ (S⊥)⊥. Since S is closed, x = y + z, where y ∈ S and z ∈ S⊥. Since S ⊂ (S⊥)⊥, z = x − y ∈ (S⊥)⊥. But z ∈ S⊥ and S⊥ ∩ (S⊥)⊥ = {0}. So z = 0, which implies x = y ∈ S, and thus (S⊥)⊥ ⊂ S.
3.3 The Riesz Representation Theorem
Let H be a Hilbert space, and H* be its dual, i.e. the space of continuous linear functionals defined on H. Define a map T : H → H* by T(x)(y) = ⟨y, x⟩. This is a conjugate linear isometry, i.e.
T(αx + βz)(y) = ⟨y, αx + βz⟩ = ᾱ⟨y, x⟩ + β̄⟨y, z⟩ = ᾱT(x)(y) + β̄T(z)(y)
and the operator T(x) has norm
‖T(x)‖ = sup_{‖y‖≤1} |⟨y, x⟩| = ‖x‖.
One of the most remarkable results in analysis is the following:
Theorem 3.3.1. (Riesz Representation Theorem). T is onto H*.
Proof. Let f ∈ H*. If f ≡ 0, then f = T(0). If f ≢ 0, then N(f) (the null space of f) is closed and a proper subspace of H, so N(f)⊥ ≠ {0}.
Let y ∈ N(f)⊥, y ≠ 0. Then f(y) ≠ 0. Therefore, ∀x ∈ H, x − (f(x)/f(y))y ∈ N(f), so ⟨x − (f(x)/f(y))y, y⟩ = 0. Thus
⟨x, y⟩ = (f(x)/f(y))‖y‖², i.e. f(x) = ⟨x, cy⟩, where c = f(y)‾/‖y‖² (the bar denoting complex conjugation). In other words, f = T(cy).
We can state this in another way: ∀f ∈ H*, there exists a unique x₀ ∈ H s.t. f(x) = ⟨x, x₀⟩. In fact, H* is a Hilbert space with the inner product
⟨f, g⟩* = ⟨T⁻¹(g), T⁻¹(f)⟩
(the arguments are swapped so that ⟨·, ·⟩* is linear in f, since T⁻¹ is conjugate linear).
3.3.1 Duality in Banach Spaces
These comments apply to all Banach spaces. Let X be a Banach space, with dual X* and bidual X**.
We examine the topologies on these spaces. For the space X, we have
1. Norm topology: xₙ → x if ‖xₙ − x‖ → 0.
2. Weak topology: xₙ ⇀ x if ∀f ∈ X*, f(xₙ) → f(x). This is known as convergence in the weak topology, which is basically the weakest topology in which all functionals in X* are still continuous.
For the space X*, we have
1. Norm topology: fₙ → f if ‖fₙ − f‖* → 0, where ‖f‖* = sup_{‖x‖≤1} |f(x)| is the dual norm.
2. Weak topology: fₙ ⇀ f if ∀F ∈ X**, F(fₙ) → F(f).
3. Weak* sequential convergence: fₙ ⇀* f if ∀x ∈ X, fₙ(x) → f(x).¹
¹This is basically pointwise convergence.
We define the embedding J : X → X** by
J(x)(f) = f(x) ∀f ∈ X*.
Since ‖J(x)‖ = sup_{‖f‖≤1} |J(x)(f)| = sup_{‖f‖≤1} |f(x)| = ‖x‖ (by Corollary 2.7.3), this is an isometry.
If J(X) = X**, we say that X is reflexive. If X is reflexive, the weak and weak* topologies on X* are the same.
Theorem 3.3.2. (Banach-Alaoglu). Let X be a Banach space, K* = {f ∈ X* : ‖f‖* ≤ 1}. Then K* is weak* compact.
The proof of this theorem is somewhat involved, and we won't present it here. We will prove this result for Hilbert spaces in the chapter on compact operators, where it's quite useful.
If H is a Hilbert space, we can state something stronger, namely that H is reflexive. Note that this does not follow merely from the fact that H ≅ H*!
By the Riesz Representation Theorem, every f ∈ H* is of the form x ↦ ⟨x, x₀⟩ for some x₀ ∈ H. Then xₙ ⇀ x iff ⟨xₙ, y⟩ → ⟨x, y⟩ ∀y ∈ H.
H* is a Hilbert space with the inner product ⟨f, g⟩* = ⟨T⁻¹(g), T⁻¹(f)⟩, where T : H → H*, T(x)(z) = ⟨z, x⟩, is the (conjugate linear, onto) Riesz map. Applying the Riesz theorem to the Hilbert space H* gives a second conjugate linear, onto map S : H* → H**, defined by S(x′)(y) = ⟨y, x′⟩* for x′, y ∈ H*.
Since T and S are both 1-1 and onto, so is S ∘ T : H → H**. It remains to check that S ∘ T coincides with the canonical embedding J : H → H**, J(x)(y) = y(x). But we have, for x ∈ H and y ∈ H*,
S(T(x))(y) = ⟨y, T(x)⟩*
= ⟨T⁻¹(T(x)), T⁻¹(y)⟩
= ⟨x, T⁻¹(y)⟩
= y(x)
= J(x)(y)
Therefore J(H) = H**, i.e. H is reflexive.
We end this section with some comments on weak convergence in a Hilbert space H.
1. If xₙ → x, then xₙ ⇀ x. The opposite is not true in general. Consider the example in ℓ²:
xₙ = (0, …, 0, 1, 0, …) (with 1 in the n-th coordinate),
y = (y₁, …, yₙ, …).
Since y ∈ ℓ², we know that ∑ᵢ |yᵢ|² < ∞. Therefore
⟨xₙ, y⟩ = yₙ → 0 ∀y ∈ ℓ².
By the Riesz Representation Theorem, we have that xₙ ⇀ 0, but xₙ does not go to 0 in norm, because ‖xₙ‖ = 1 ∀n ∈ ℕ.
2. If (xₙ) is weakly convergent, then it's bounded, i.e. ∃M s.t. ‖xₙ‖ ≤ M ∀n ∈ ℕ.
3. If xₙ ⇀ x and ‖xₙ‖ → ‖x‖, then xₙ → x. In fact, ‖·‖ is weakly lower semicontinuous.
4. Let S ⊂ H be a subset s.t. span(S) is dense in H. If (xₙ) is a bounded sequence s.t. ∀y ∈ S, ⟨xₙ, y⟩ → ⟨x, y⟩, then xₙ ⇀ x.
3.4 Orthogonality and Orthonormal Systems
Let E be an inner product space, S ⊂ E. Then S is called an orthogonal system if ∀x, y ∈ S with x ≠ y, x ⊥ y. If in addition ‖x‖ = 1 ∀x ∈ S, we say that the system is orthonormal. Note that if the orthogonal system S only contains nonzero elements, we can convert it into the orthonormal system S̃ = {x/‖x‖ : x ∈ S}.
Proposition 3.4.1. Orthogonal systems consisting of nonzero elements are linearly independent.
Proof. If α₁x₁ + ⋯ + αₙxₙ = 0 (x₁, …, xₙ ∈ S, α₁, …, αₙ ∈ ℂ), then 0 = ⟨α₁x₁ + ⋯ + αₙxₙ, xᵢ⟩ = αᵢ‖xᵢ‖², so αᵢ = 0 for i = 1, …, n.
Some examples of these systems:
1. ℓ²: {eᵢ : i = 1, 2, …}.
2. L²(−π, π): {(1/√(2π)) e^{int} : n = 0, ±1, ±2, …}.
3. Legendre polynomials.
4. Hermite polynomials.
3.4.1 Gram-Schmidt
One can convert a sequence (vₙ) of linearly independent elements into an orthonormal sequence (uₙ) s.t. span{v₁, …, vₙ} = span{u₁, …, uₙ} ∀n ≥ 1. This is called the Gram-Schmidt procedure.
One can show this inductively. The base case is u₁ = v₁/‖v₁‖. Suppose now that we've found u₁, …, u_{n−1} with the required properties. Then we set
uₙ = (vₙ − ∑_{i=1}^{n−1} ⟨vₙ, uᵢ⟩uᵢ) / ‖vₙ − ∑_{i=1}^{n−1} ⟨vₙ, uᵢ⟩uᵢ‖
(the denominator is nonzero by linear independence), which gives us, for k < n,
⟨uₙ, u_k⟩ = (⟨vₙ, u_k⟩ − ⟨vₙ, u_k⟩) / ‖vₙ − ∑_{i=1}^{n−1} ⟨vₙ, uᵢ⟩uᵢ‖ = 0.
Hence span{v₁, …, vₙ} = span{u₁, …, uₙ}.
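The inductive step above translates directly into code. A minimal sketch for ℝⁿ with the real inner product ⟨x, y⟩ = ∑ᵢ xᵢyᵢ (an assumption made for concreteness):

```python
import math

def gram_schmidt(vs):
    """Orthonormalize linearly independent vectors vs, as in the text."""
    us = []
    for v in vs:
        w = list(v)
        for u in us:
            c = sum(a * b for a, b in zip(v, u))       # <v_n, u_i>
            w = [wi - c * ui for wi, ui in zip(w, u)]  # subtract projection
        n = math.sqrt(sum(c * c for c in w))           # nonzero by independence
        us.append([c / n for c in w])
    return us

us = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
```

By construction the output satisfies ⟨uᵢ, uⱼ⟩ = δᵢⱼ and spans the same flags of subspaces as the input.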
3.4.2 Bessel's Inequality
We now prove an important inequality for Hilbert spaces.
Theorem 3.4.1. (Bessel's Inequality). Suppose {u_α}_{α∈A} is an orthonormal set (which is not necessarily countable). Then ∀x ∈ H,
∑_{α∈A} |⟨x, u_α⟩|² ≤ ‖x‖².
In particular, {α : ⟨x, u_α⟩ ≠ 0} is countable.
Proof. We show that ∑_{α∈F} |⟨x, u_α⟩|² ≤ ‖x‖² for every finite F ⊂ A. We have that
0 ≤ ‖x − ∑_{α∈F} ⟨x, u_α⟩u_α‖²
= ‖x‖² − 2 Re⟨x, ∑_{α∈F} ⟨x, u_α⟩u_α⟩ + ‖∑_{α∈F} ⟨x, u_α⟩u_α‖²
= ‖x‖² − 2 ∑_{α∈F} |⟨x, u_α⟩|² + ∑_{α∈F} |⟨x, u_α⟩|²
= ‖x‖² − ∑_{α∈F} |⟨x, u_α⟩|².
For the second part, we can write the following:
{α : ⟨x, u_α⟩ ≠ 0} = ⋃_{n=1}^{∞} {α : |⟨x, u_α⟩| ≥ 1/n}.
Note that each set on the right hand side is finite, by the inequality just proved. Therefore, the set is at most countable.
We can now prove the following theorem:
Theorem 3.4.2. Let {u_α}_{α∈A} be an orthonormal set. Then the following are equivalent:
1. (Completeness). If ⟨x, u_α⟩ = 0 ∀α ∈ A, then x = 0.
2. (Parseval's Identity). ‖x‖² = ∑_{α∈A} |⟨x, u_α⟩|² ∀x ∈ H.
3. x = ∑_{α∈A} ⟨x, u_α⟩u_α ∀x ∈ H, where the sum has only countably many nonzero terms and converges in norm regardless of how the terms are ordered.
Proof. We first prove that 1) implies 3). Let (αᵢ) be a sequence of all indices s.t. ⟨x, u_{αᵢ}⟩ ≠ 0. We know from the previous proof that this set is countable. By Bessel's inequality, ∑_{i=1}^{∞} |⟨x, u_{αᵢ}⟩|² converges. Then by the Pythagorean theorem, we have
‖∑_{i=n}^{m} ⟨x, u_{αᵢ}⟩u_{αᵢ}‖² = ∑_{i=n}^{m} |⟨x, u_{αᵢ}⟩|² → 0 as n, m → ∞.
Therefore ∑_{i=1}^{∞} ⟨x, u_{αᵢ}⟩u_{αᵢ} is convergent, since H is complete.
Set y = x − ∑_{i=1}^{∞} ⟨x, u_{αᵢ}⟩u_{αᵢ}. Then ⟨y, u_α⟩ = ⟨x, u_α⟩ − ⟨x, u_α⟩ = 0 if α = αᵢ for some i, and ⟨y, u_α⟩ = 0 − 0 = 0 otherwise. Therefore ⟨y, u_α⟩ = 0 ∀α ∈ A, implying y = 0 by 1). Hence x = ∑_{i=1}^{∞} ⟨x, u_{αᵢ}⟩u_{αᵢ}.
We now prove 3) implies 2). By the computation in Bessel's proof,
‖x − ∑_{i=1}^{n} ⟨x, u_{αᵢ}⟩u_{αᵢ}‖² = ‖x‖² − ∑_{i=1}^{n} |⟨x, u_{αᵢ}⟩|² → 0 as n → ∞.
That 2) implies 1) is obvious.
An orthonormal set {u_α}_{α∈A} having any of these properties is called an orthonormal basis. A typical example is {eᵢ}_{i=1}^{∞} ⊂ ℓ².
Theorem 3.4.3. Every Hilbert space H has an orthonormal basis.
Proof. Let 𝒜 be the set of all orthonormal sets in H, partially ordered by inclusion. Let 𝒜₀ be a linearly ordered subset of 𝒜. Then ⋃_{S∈𝒜₀} S is an orthonormal subset of H. This set contains every element of 𝒜₀ and is thus an upper bound for it. Therefore, by Zorn's lemma, 𝒜 has a maximal element S₀. We claim this is an orthonormal basis. To prove this, we must show that S₀⊥ = {0}.
Suppose this isn't the case. Then there exists z ≠ 0, ‖z‖ = 1 s.t. z ⊥ S₀. Then S₀ ∪ {z} is an orthonormal family bigger than S₀, which is a contradiction.
3.5 Separability
Recall that a metric space is called separable if it has a countable dense subset.
Theorem 3.5.1. A Hilbert space H is separable iff it has a countable orthonormal basis, in which case every orthonormal basis is countable.
Proof. (⇒) Let (xₙ) be a countable dense subset of H. Discarding zero elements and using Gram-Schmidt, we can find an orthonormal sequence (uₙ). Suppose that for n ≥ 1, we've found u₁, …, u_k which are orthonormal and s.t.
span{x₁, …, x_{n−1}} = span{u₁, …, u_k}
If xₙ ∈ span{u₁, …, u_k}, discard it and try the next. Otherwise, use Gram-Schmidt to find u_{k+1}. Then the span of these orthonormal vectors contains every xₙ, and hence is dense in H.
(⇐) If {uₙ} is a countable orthonormal basis, then all finite linear combinations of its elements with coefficients in ℚ + iℚ are dense in H. If {v_β}_{β∈B} is another orthonormal basis, then by the completeness of {uₙ}, ∀β ∃n s.t. ⟨v_β, uₙ⟩ ≠ 0. Therefore, if we set Bₙ = {β ∈ B : ⟨v_β, uₙ⟩ ≠ 0} (each of which is countable by Bessel's inequality applied to uₙ), we have that B = ⋃_{n=1}^{∞} Bₙ is countable.
Theorem 3.5.2. If H is a separable Hilbert space, then H is isometrically isomorphic to either ℂⁿ or ℓ².
Proof. Suppose dim(H) = n < ∞. Let x₁, …, xₙ be an orthonormal basis, and let e₁, …, eₙ be the standard basis of ℂⁿ. Let T : H → ℂⁿ be given by T(∑_{i=1}^{n} αᵢxᵢ) = ∑_{i=1}^{n} αᵢeᵢ.
If dim(H) = ∞, and (xᵢ) is a countable orthonormal basis of H, let (eᵢ) be the standard basis of ℓ². Then define T(x) = ∑ᵢ αᵢeᵢ, where αᵢ = ⟨x, xᵢ⟩. T is an isometry (by Parseval), and is onto: if z ∈ ℓ², we can write z = ∑ᵢ cᵢeᵢ with ∑ᵢ |cᵢ|² < ∞. Then ∑ᵢ cᵢxᵢ converges to an element of H, and z = T(∑_{i=1}^{∞} cᵢxᵢ).
Every Hilbert space is isometrically isomorphic to some ℓ²(A), i.e. the space of f : A → ℂ with ∑_{α∈A} |f(α)|² < ∞.
Chapter 4
Operators
We now begin the formal study of linear operators. In this chapter we'll cover examples, talk about a very important formalism, and end with a very useful result.
4.1 Examples
Let X be a Banach space. Then B(X) = {T : X → X, T linear and bounded} is a Banach space with ‖T‖ = sup_{‖x‖≤1} ‖T(x)‖.
1. Linear operators on finite dimensional spaces:
For X = ℂⁿ, A ∈ B(ℂⁿ), every linear operator is given by a matrix. In fact, on a finite dimensional space, every linear operator is bounded.
2. Differentiation operator:
Let X = L²(0, 1), and Tf = f′. This operator is linear, but unbounded. To see this, let f(x) = xⁿ. Then Tf = nx^{n−1}, and
‖nx^{n−1}‖_{L²(0,1)} = (∫₀¹ n²x^{2(n−1)} dx)^{1/2} = n/√(2n − 1)
and
‖xⁿ‖_{L²(0,1)} = (∫₀¹ x^{2n} dx)^{1/2} = 1/√(2n + 1)
Thus ‖f‖ = 1/√(2n + 1) → 0 while ‖Tf‖ = n/√(2n − 1) → ∞ as n → ∞. Therefore the operator is unbounded.
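The growth of the ratio ‖Tf‖/‖f‖ for f(x) = xⁿ can be tabulated directly from the closed-form norms just computed (a quick numerical illustration, not part of the notes):

```python
import math

def ratio(n):
    """||T x^n|| / ||x^n|| = n * sqrt(2n+1) / sqrt(2n-1) in L^2(0,1)."""
    num = n / math.sqrt(2 * n - 1)        # ||n x^(n-1)||
    den = 1 / math.sqrt(2 * n + 1)        # ||x^n||
    return num / den

ratios = [ratio(n) for n in (1, 10, 100, 1000)]
```

The ratio grows roughly like n, so no constant M with ‖Tf‖ ≤ M‖f‖ can exist.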
3. Integral operator:
Let k ∈ L²(Ω × Ω), Ω ⊂ ℝⁿ, and let X = L²(Ω). Define the operator T : L²(Ω) → L²(Ω) by (Tx)(t) = ∫_Ω k(t, s)x(s) ds. This is a bounded operator. To see this, note that
‖Tx‖²_{L²(Ω)} = ∫_Ω (∫_Ω k(t, s)x(s) ds)² dt
≤ ∫_Ω (∫_Ω |k(t, s)|² ds)(∫_Ω |x(s)|² ds) dt (Cauchy-Schwarz)
= (∫_Ω ∫_Ω |k(t, s)|² ds dt)(∫_Ω |x(s)|² ds)
= M‖x‖²_{L²(Ω)}
Therefore T is bounded and ‖T‖ ≤ √M.
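The bound ‖T‖ ≤ √M has a finite-dimensional shadow: discretizing the kernel on a uniform grid of (0,1) (an approximation chosen here for illustration; the kernel e^{−|t−s|} is a hypothetical concrete choice), the spectral norm of the resulting matrix is dominated by its Frobenius norm, which approximates ‖k‖_{L²} = √M.

```python
import math
import random

m = 40
grid = [(i + 0.5) / m for i in range(m)]

def k(t, s):
    return math.exp(-abs(t - s))

# Discretized operator: (Tx)(t_i) ~ sum_j k(t_i, s_j) x(s_j) / m.
K = [[k(t, s) / m for s in grid] for t in grid]

# Frobenius norm of K approximates the L^2 norm of the kernel, sqrt(M).
hs_norm = math.sqrt(sum(K[i][j] ** 2 for i in range(m) for j in range(m)))

# Estimate the spectral norm by power iteration on K^T K.
random.seed(3)
x = [random.random() for _ in range(m)]
for _ in range(100):
    y = [sum(K[i][j] * x[j] for j in range(m)) for i in range(m)]   # K x
    z = [sum(K[i][j] * y[i] for i in range(m)) for j in range(m)]   # K^T y
    n = math.sqrt(sum(c * c for c in z))
    x = [c / n for c in z]
op_norm = math.sqrt(n)   # largest singular value of K
```

The observed `op_norm <= hs_norm` is the matrix analogue of ‖T‖ ≤ √M above.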
4. Multiplication operators:
T : L²(Ω) → L²(Ω). Let h ∈ L^∞(Ω), and define Tf = hf. Then
‖Tf‖_{L²(Ω)} = (∫_Ω |h(t)f(t)|² dt)^{1/2} ≤ ‖h‖_{L^∞(Ω)} (∫_Ω |f(t)|² dt)^{1/2} = ‖h‖_{L^∞(Ω)}‖f‖_{L²(Ω)}
Hence the operator is bounded.
4.2 Banach Algebras
Operator algebras are one of the central concepts in modern operator theory. However, since these are introductory notes, we won't be delving into the concept.
Recall that a commutative ring is a set that is equipped with two operations called addition and multiplication. The ring has to be an abelian group under addition, and commutative (and obviously closed) under multiplication. In our case, the ring is a vector space. A Banach algebra is an associative algebra over a commutative ring¹, which allows the multiplication and addition of vectors in an associative and distributive manner. We have already defined an addition operation. Now we need to come up with a multiplication operation.
Given the set B(X) over a Banach space X, we can define another operation, the composition of two operators. If A, B ∈ B(X), then (A ∘ B)x = A(Bx) = ABx.
Proposition 4.2.1. If A, B ∈ B(X), then AB ∈ B(X).
¹See the appendix for an explanation.
Proof.
sup_{‖x‖≤1} ‖ABx‖ ≤ sup_{‖x‖≤1} ‖A‖‖Bx‖ ≤ sup_{‖x‖≤1} ‖A‖‖B‖‖x‖ = ‖A‖‖B‖
Therefore our multiplication operation is closed.
Define A⁰ := I, A¹ := A, Aⁿ := AA^{n−1}. Note that the product of two operators is not necessarily commutative. A simple example would be the differentiation operator. Note also that we have the inequality
‖AB‖ ≤ ‖A‖‖B‖
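The submultiplicativity ‖AB‖ ≤ ‖A‖‖B‖ is easy to check numerically on matrices acting on ℝ² (a sketch; operator norms are estimated here by sampling unit vectors, so they are slight underestimates of the true norms):

```python
import math
import random

random.seed(4)

def op_norm(M, samples=2000):
    """Estimate sup over unit vectors v of |Mv| by random sampling."""
    best = 0.0
    for _ in range(samples):
        v = [random.gauss(0, 1), random.gauss(0, 1)]
        n = math.hypot(*v)
        v = [v[0] / n, v[1] / n]
        Mv = [M[0][0] * v[0] + M[0][1] * v[1],
              M[1][0] * v[0] + M[1][1] * v[1]]
        best = max(best, math.hypot(*Mv))
    return best

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1.0, 2.0], [0.0, 1.0]]
B = [[0.0, -1.0], [1.0, 0.5]]
lhs = op_norm(matmul(A, B))
rhs = op_norm(A) * op_norm(B)
```

For this pair the inequality is strict, illustrating that ‖AB‖ can be well below ‖A‖‖B‖.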
We can now define an algebra. Let X be a Banach space. Let ∘ : X × X → X be an operation s.t.
1. x ∘ (y ∘ z) = (x ∘ y) ∘ z.
2. (αx) ∘ (βy) = (αβ)(x ∘ y) ∀α, β ∈ ℂ, x, y ∈ X.
3. (x + y) ∘ z = x ∘ z + y ∘ z and x ∘ (y + z) = x ∘ y + x ∘ z ∀x, y, z ∈ X.
4. ∘ is continuous w.r.t. each variable. This statement is equivalent to requiring that ‖x ∘ y‖ ≤ M‖x‖‖y‖ (see below)².
5. X has a unit element e s.t. x ∘ e = e ∘ x = x ∀x ∈ X.
Some people don't require the last statement. For us, a Banach space that meets the above requirements is a Banach algebra.
We now drop the ∘ notation; i.e. x ∘ y = xy.
Proposition 4.2.2. Suppose a unit element does not exist. If so, define X̃ = X ⊕ ℂe = {(x, α) : x ∈ X, α ∈ ℂ} with (x, α)(y, β) = (xy + αy + βx, αβ) and ‖(x, α)‖ = ‖x‖ + |α|. This is a Banach algebra.
Proof. The first three properties are trivial to verify. For property 4, note that
‖(x, α)(y, β)‖ = ‖xy + αy + βx‖ + |αβ|
≤ ‖xy‖ + |α|‖y‖ + |β|‖x‖ + |α||β|
≤ ‖x‖‖y‖ + |α|‖y‖ + |β|‖x‖ + |α||β|
= (‖x‖ + |α|)(‖y‖ + |β|)
= ‖(x, α)‖‖(y, β)‖
Finally, (0, 1) is the unit element, because (x, α)(0, 1) = (x, α).
²If ‖x ∘ y‖ ≤ M‖x‖‖y‖, then after renormalization, we can assume that ‖x ∘ y‖ ≤ ‖x‖‖y‖.
A Banach algebra X is commutative if xy = yx ∀x, y ∈ X.
Proposition 4.2.3. Let H be a separable Hilbert space, and A ∈ B(H). Then A can be represented by an infinite matrix.
Proof. Let {eᵢ}_{i=1}^{∞} be an orthonormal basis of H. Define α_{ji} = ⟨Aeⱼ, eᵢ⟩. Let x = ∑_{j=1}^{∞} ⟨x, eⱼ⟩eⱼ. Then
⟨Ax, eᵢ⟩ = limₙ ⟨A(∑_{j=1}^{n} ⟨x, eⱼ⟩eⱼ), eᵢ⟩
= limₙ ∑_{j=1}^{n} ⟨x, eⱼ⟩⟨Aeⱼ, eᵢ⟩
= limₙ ∑_{j=1}^{n} α_{ji}⟨x, eⱼ⟩
= ∑_{j=1}^{∞} α_{ji}⟨x, eⱼ⟩
Therefore ⟨Ax, eᵢ⟩ = ∑_{j=1}^{∞} ⟨x, eⱼ⟩⟨Aeⱼ, eᵢ⟩ = ∑_{j=1}^{∞} α_{ji}⟨x, eⱼ⟩.
Thus Ax = ∑_{i=1}^{∞} (∑_{j=1}^{∞} α_{ji}⟨x, eⱼ⟩) eᵢ.
4.3 Bilinear Functionals and Quadratic Forms
Let E be a vector space. Then φ : E × E → ℂ is a bilinear functional if
1. φ(αx₁ + βx₂, y) = αφ(x₁, y) + βφ(x₂, y).
2. φ(x, αy₁ + βy₂) = ᾱφ(x, y₁) + β̄φ(x, y₂).
Some examples if E is a Hilbert space include the inner product, and, if A, B ∈ B(H), then other examples include ⟨Ax, y⟩, ⟨x, By⟩ and ⟨Ax, By⟩.
A bilinear functional φ : E × E → ℂ is called
1. symmetric if φ(x, y) is the complex conjugate of φ(y, x).
2. positive if φ(x, x) ≥ 0 ∀x ∈ E.
3. strictly positive if φ(x, x) > 0 ∀x ≠ 0.
4. bounded, if E is a normed space and ∃M ≥ 0 s.t. |φ(x, y)| ≤ M‖x‖‖y‖ ∀x, y ∈ E.
We can define the norm for these functionals as ‖φ‖ = sup_{‖x‖≤1,‖y‖≤1} |φ(x, y)|. One can show that ‖φ‖ is the smallest M such that the last property above is satisfied.
Now let E be a vector space, φ : E × E → ℂ a bilinear functional. Then the quadratic form associated with this functional is Φ : E → ℂ, where Φ(x) = φ(x, x).
If E is a normed space, we say that Φ is bounded if ∃k > 0 s.t. |Φ(x)| ≤ k‖x‖². The norm on this form is defined by ‖Φ‖ = sup_{‖x‖≤1} |Φ(x)|. It's clear that ‖Φ‖ is the smallest k possible for the bound.
The following theorem and its corollary are easily verifiable.
Theorem 4.3.1. (Polarization Identity). Let φ be a bilinear functional on a vector space E, with associated quadratic form Φ. Then
φ(x, y) = ¼[Φ(x + y) − Φ(x − y) + iΦ(x + iy) − iΦ(x − iy)].
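The identity is "easily verifiable" numerically as well. A sketch with the concrete choice φ(x, y) = ⟨x, y⟩ on ℂ² (one bounded bilinear functional among many):

```python
def inner(x, y):
    # phi(x, y) = <x, y>, linear in x, conjugate linear in y
    return sum(a * b.conjugate() for a, b in zip(x, y))

def Phi(x):
    # the associated quadratic form
    return inner(x, x)

x = [complex(1, 2), complex(-1, 0.5)]
y = [complex(0, 1), complex(2, -1)]

add = lambda u, v: [a + b for a, b in zip(u, v)]
sub = lambda u, v: [a - b for a, b in zip(u, v)]
iy = [1j * c for c in y]

# Right-hand side of the polarization identity:
recovered = (Phi(add(x, y)) - Phi(sub(x, y))
             + 1j * Phi(add(x, iy)) - 1j * Phi(sub(x, iy))) / 4
direct = inner(x, y)
```

The four evaluations of Φ recover φ(x, y) exactly, which is what makes the corollary below work: a bilinear functional is determined by its quadratic form.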
Corollary 4.3.1. Let φ₁, φ₂ be bilinear functionals on E. If φ₁(x, x) = φ₂(x, x) ∀x ∈ E, then φ₁ = φ₂. In particular, if A and B are two linear operators on a Hilbert space H and ⟨Ax, x⟩ = ⟨Bx, x⟩ ∀x ∈ H, then A = B.
Theorem 4.3.2. A bilinear functional φ on E is symmetric iff the quadratic form Φ is real.
Proof. (⇒) If φ(x, y) is the conjugate of φ(y, x), then Φ(x) = φ(x, x) equals its own conjugate, so Φ is real.
(⇐) If the quadratic form is real, define ψ(x, y) to be the conjugate of φ(y, x), which is a bilinear functional. Then its associated quadratic form is Ψ(x) = ψ(x, x) = Φ(x) ∀x ∈ E. Therefore, by the corollary, ψ = φ, and thus φ is symmetric.
Theorem 4.3.3. Let E be a normed space, and φ : E × E → ℂ a bilinear functional. Then φ is bounded iff Φ is bounded. Moreover, we have ‖Φ‖ ≤ ‖φ‖ ≤ 4‖Φ‖.
Proof. We have that ‖Φ‖ = sup_{‖x‖≤1} |Φ(x)| = sup_{‖x‖≤1} |φ(x, x)| ≤ ‖φ‖.
Now using polarization, we have
|φ(x, y)| = ¼|Φ(x + y) − Φ(x − y) + iΦ(x + iy) − iΦ(x − iy)|
≤ ¼‖Φ‖(‖x + y‖² + ‖x − y‖² + ‖x + iy‖² + ‖x − iy‖²)
≤ ¼‖Φ‖ · 4(‖x‖ + ‖y‖)²
Therefore ‖φ‖ ≤ sup_{‖x‖≤1,‖y‖≤1} ‖Φ‖(‖x‖ + ‖y‖)² = 4‖Φ‖. Thus the norms are equivalent.
Theorem 4.3.4. If a bilinear functional φ on an inner product space E is symmetric and bounded, then ‖φ‖ = ‖Φ‖.
Proof. We just need to prove that ‖φ‖ ≤ ‖Φ‖. Since φ is symmetric, Φ is real, and so, in the polarization identity
φ(x, y) = ¼[Φ(x + y) − Φ(x − y) + iΦ(x + iy) − iΦ(x − iy)]
the first two terms are real and the last two are imaginary. Then Re φ(x, y) = ¼(Φ(x + y) − Φ(x − y)). Using the parallelogram law, we have
|Re φ(x, y)| ≤ ¼‖Φ‖(‖x + y‖² + ‖x − y‖²) = ½‖Φ‖(‖x‖² + ‖y‖²)
Now let ‖x‖, ‖y‖ ≤ 1, and let λ ∈ ℂ be s.t. |λ| = 1 and |φ(x, y)| = λφ(x, y). Then
|φ(x, y)| = λφ(x, y)
= φ(λx, y)
= Re φ(λx, y)
≤ ½‖Φ‖(‖λx‖² + ‖y‖²)
= ½‖Φ‖(‖x‖² + ‖y‖²)
Therefore ‖φ‖ ≤ sup_{‖x‖≤1,‖y‖≤1} ½‖Φ‖(‖x‖² + ‖y‖²) = ‖Φ‖.
Quadratic forms are used extensively to study linear operators. We now give some examples of this.

Theorem 4.3.5. Let $A$ be a bounded linear operator on an inner product space $E$. Then the bilinear functional $\varphi(x, y) = \langle x, Ay\rangle$ is bounded and $\|A\| = \|\varphi\|$.

Proof. ($\le$) We have
$$|\varphi(x, y)| = |\langle x, Ay\rangle| \le \|x\|\|A\|\|y\|,$$
and so $\|\varphi\| \le \|A\|$.
($\ge$) Given
$$\|Ax\|^2 = |\langle Ax, Ax\rangle| = |\varphi(Ax, x)| \le \|\varphi\|\|Ax\|\|x\|.$$
Therefore $\|Ax\| \le \|\varphi\|\|x\|$ $\forall x \in E$, which means that $\|A\| \le \|\varphi\|$.
These functionals are quite special, as is shown by the next theorem.
Theorem 4.3.6. Let $\varphi$ be a bounded bilinear functional on a Hilbert space $\mathcal{H}$. Then there exists a unique bounded operator $A$ on $\mathcal{H}$ s.t. $\varphi(x, y) = \langle x, Ay\rangle$ $\forall x, y \in \mathcal{H}$.

Proof. Let $y \in \mathcal{H}$ be fixed. Then $\varphi(\cdot, y)$ is a bounded linear functional on $\mathcal{H}$. By the Riesz representation theorem, there exists a unique element $Ay \in \mathcal{H}$ s.t. $\varphi(x, y) = \langle x, Ay\rangle$ $\forall x \in \mathcal{H}$. Also,

1. $A$ is linear. Let $x, y_1, y_2 \in \mathcal{H}$, $\alpha \in \mathbb{C}$. Then
$$\langle x, A(y_1 + \alpha y_2)\rangle = \varphi(x, y_1 + \alpha y_2) = \varphi(x, y_1) + \bar\alpha\varphi(x, y_2) = \langle x, Ay_1\rangle + \bar\alpha\langle x, Ay_2\rangle = \langle x, Ay_1 + \alpha Ay_2\rangle.$$
Therefore $A(y_1 + \alpha y_2) = Ay_1 + \alpha Ay_2$.

2. $A$ is bounded.
$$|\langle x, Ay\rangle| = |\varphi(x, y)| \le \|\varphi\|\|x\|\|y\|.$$
So since $\|Ay\|^2 = \langle Ay, Ay\rangle \le \|\varphi\|\|Ay\|\|y\|$, this implies that $\|Ay\| \le \|\varphi\|\|y\|$ $\forall y \in \mathcal{H}$.

3. $A$ is unique. Let $B$ be another bounded operator s.t. $\langle x, Ay\rangle = \langle x, By\rangle$ $\forall x, y \in \mathcal{H}$. Then $\langle x, Ay - By\rangle = 0$ $\forall x, y \in \mathcal{H}$. Thus $Ay - By = 0$.
We make some final remarks.
We know that if $\mathcal{H}$ is a separable Hilbert space, then a bounded linear operator on $\mathcal{H}$ is given by an infinite matrix, whose entries are $\alpha_{ij} = \langle Ae_j, e_i\rangle$, where $\{e_j\}_{j=1}^\infty$ is an orthonormal basis. So in order to define a bounded operator, we only need to specify $Ae_j$.
Take a set of vectors $\{x_j\}_{j=1}^\infty$ and set $Ae_j = x_j$. If $\{x_j\}$ is not bounded, then clearly $A$ is unbounded. However, what's surprising is that even if $\{x_j\}$ is bounded, you can have an unbounded $A$. To see this, consider $\ell^2$ and set $Ae_j = e_1$. Then $\sup_j \|Ae_j\| = 1$. Now take
$$A\left(1, \tfrac{1}{2}, \ldots, \tfrac{1}{n}, 0, \ldots\right) = e_1 + \tfrac{1}{2}e_1 + \cdots + \tfrac{1}{n}e_1 = \left(1 + \tfrac{1}{2} + \cdots + \tfrac{1}{n}\right)e_1.$$
Therefore the norm of the above quantity goes to infinity as $n$ does. However,
$$\left\|\left(1, \tfrac{1}{2}, \ldots, \tfrac{1}{n}, 0, \ldots\right)\right\|^2 = \sum_{j=1}^n \frac{1}{j^2} \le \sum_{j=1}^\infty \frac{1}{j^2} < \infty.$$
This fact is true, however, in $\ell^1$.
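The divergence above is easy to watch numerically. A minimal sketch (my own illustration, not from the notes), truncating $\ell^2$ to finitely many coordinates: the columns of $A$ all equal $e_1$ and are bounded, yet $\|Ax_n\|/\|x_n\|$ grows like the harmonic sum while $\|x_n\|$ stays below $\pi/\sqrt{6}$.

```python
import numpy as np

def apply_A(x):
    # A e_j = e_1 for every j, so (Ax) = (sum_j x_j) * e_1
    out = np.zeros_like(x)
    out[0] = x.sum()
    return out

ratios = []
for n in [10, 100, 1000]:
    x = np.array([1.0 / j for j in range(1, n + 1)])   # x_n = (1, 1/2, ..., 1/n)
    ratios.append(np.linalg.norm(apply_A(x)) / np.linalg.norm(x))

# ||A x_n|| is the harmonic sum H_n (divergent), ||x_n||^2 <= pi^2/6,
# so the ratios grow without bound
print(ratios)
```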
4.4 Elliptic Operators

Given a normed space $E$, a bilinear functional $\varphi : E \times E \to \mathbb{C}$ is called elliptic if $\exists c > 0$ s.t. $\varphi(x, x) \ge c\|x\|^2$. Some examples are

1. If $E = \mathbb{C}^n$, then every strictly positive bilinear functional is elliptic. This is not true in infinite dimensional spaces. Consider $\varphi : \ell^2 \times \ell^2 \to \mathbb{C}$ s.t. $\varphi(x, y) = \langle x, Ay\rangle$, where $Ae_n = \frac{1}{n}e_n$. Then $\varphi(e_n, e_n) = \frac{1}{n} \to 0$ as $n \to \infty$.

2. Let $E = L^2(\Omega)$, $z \in L^\infty(\Omega)$ be s.t. $z \ge c > 0$. Then if
$$\varphi(x, y) = \int_\Omega x(t)\overline{y(t)}z(t)\,dt,$$
we have that
$$\varphi(x, x) = \int_\Omega x(t)\overline{x(t)}z(t)\,dt = \int_\Omega |x(t)|^2 z(t)\,dt \ge c\|x\|^2_{L^2(\Omega)}.$$
Theorem 4.4.1. Let $A$ be a bounded operator on a Hilbert space $\mathcal{H}$. Suppose that $\exists c > 0$ s.t. $c\|x\|^2 \le \langle x, Ax\rangle$ $\forall x \in \mathcal{H}$. Then $A^{-1}$ exists and is a bounded operator s.t. $\|A^{-1}\| \le \frac{1}{c}$.

Proof. We need to prove four things.

1. $A$ is 1-1: Note that $c\|x\|^2 \le \langle x, Ax\rangle \le \|x\|\|Ax\|$. So $c\|x\| \le \|Ax\|$ implies that $\|x\| \le \frac{\|Ax\|}{c}$ $\forall x \in \mathcal{H}$, and thus the kernel is trivial, and $A$ is 1-1.

2. $\mathcal{R}(A)$ is closed: Let $x_n \in \mathcal{H}$ and $Ax_n \to y$. Then
$$\|x_n - x_m\| \le \frac{\|A(x_n - x_m)\|}{c} = \frac{\|Ax_n - Ax_m\|}{c} \to 0 \text{ as } n, m \to \infty.$$
Thus $(x_n)$ is Cauchy, and thus it converges to some $x \in \mathcal{H}$. Since $A$ is continuous (i.e. it's a bounded linear operator), this implies that $Ax = y$, and thus $y$ lies in the range of $A$.

3. $\mathcal{R}(A) = \mathcal{H}$: Suppose this isn't true. Then $\mathcal{R}(A)^\perp \ne \{0\}$. Let $0 \ne y \in \mathcal{R}(A)^\perp$. Then $\langle y, Ax\rangle = 0$ $\forall x \in \mathcal{H}$. In particular, $\langle y, Ay\rangle = 0$. But $\langle y, Ay\rangle \ge c\|y\|^2$, which implies that $y = 0$, giving a contradiction. Thus $A^{-1}$ exists and is linear.

4. $A^{-1}$ is bounded: Recall that $\|x\| \le \frac{\|Ax\|}{c}$. Set $x = A^{-1}y$. Then $\|A^{-1}y\| \le \frac{\|AA^{-1}y\|}{c} = \frac{\|y\|}{c}$.
4.4.1 The Lax-Milgram Theorem

This is a very important result in operator theory. Its primary use is in the modern theory of partial differential equations. The version we prove is not as powerful as some other extensions.

Theorem 4.4.2. (Lax-Milgram). Let $\varphi$ be a bounded, elliptic bilinear functional on a Hilbert space $H$. Then for every bounded linear functional $f$ on $H$, there exists a unique $x_f \in H$ s.t. $f(x) = \varphi(x, x_f)$ $\forall x \in H$.

Proof. We know that there exists a unique bounded linear operator $A$ s.t. $\varphi(x, y) = \langle x, Ay\rangle$. By the previous theorem, $A^{-1}$ is bounded. By the Riesz representation theorem, there exists a unique $x_0 \in H$ s.t. $f(x) = \langle x, x_0\rangle$. Set $x_f = A^{-1}x_0$. Then $f(x) = \langle x, x_0\rangle = \langle x, AA^{-1}x_0\rangle = \langle x, Ax_f\rangle = \varphi(x, x_f)$. Uniqueness follows from the uniqueness of $x_0$.

Later, we will give an example application to PDEs.
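In finite dimensions the proof is literally an algorithm: represent $\varphi$ by a matrix, represent $f$ by a vector via Riesz, and solve a linear system. A hedged sketch (the particular matrices are my own choices; real scalars for simplicity):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)       # phi(x, y) = x^T A y is bounded and elliptic (A >= I)
f = rng.standard_normal(n)    # f(x) = <x, f> by the Riesz representation

# Lax-Milgram: the unique x_f with f(x) = phi(x, x_f) for all x is x_f = A^{-1} f
x_f = np.linalg.solve(A, f)

# check f(x) = phi(x, x_f) on random test vectors
for _ in range(10):
    x = rng.standard_normal(n)
    assert np.isclose(x @ f, x @ (A @ x_f))
```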
Chapter 5
Adjoint Operators
The adjoint of a linear operator is a generalization of the conjugate transpose of square matrices to infinite-dimensional cases.
5.1 Basics

Let $X, Y$ be normed spaces, and let $T : X \to Y$ be a bounded linear operator.

Theorem 5.1.1. Under the above assumptions, there exists a unique operator $T^* \in B(Y^*, X^*)$ s.t.
$$\langle Tx, y^*\rangle_{Y, Y^*} = \langle x, T^*y^*\rangle_{X, X^*} \quad \forall x \in X,\ y^* \in Y^*.$$
Here, the inner product refers to a duality pairing, i.e.
$$\langle Tx, y^*\rangle_{Y, Y^*} = y^*(Tx), \qquad \langle x, T^*y^*\rangle_{X, X^*} = T^*y^*(x).$$
We will drop the extravagant notation here, noting that the duality pairings will become obvious from the context. The operator $T^*$ is called the adjoint of $T$.

Proof. The formula above defines $T^*$ uniquely. We have that $T^*y^* = y^* \circ T$, which is bounded as the composition of two linear and continuous maps. Also, $\langle x, T^*y^*\rangle = y^* \circ T(x) = \langle Tx, y^*\rangle$.
For linearity, we have
$$\langle x, T^*(y_1^* + y_2^*)\rangle = \langle Tx, y_1^* + y_2^*\rangle = \langle Tx, y_1^*\rangle + \langle Tx, y_2^*\rangle = \langle x, T^*y_1^*\rangle + \langle x, T^*y_2^*\rangle = \langle x, T^*y_1^* + T^*y_2^*\rangle \quad \forall x \in X.$$
Therefore $T^*(y_1^* + y_2^*) = T^*y_1^* + T^*y_2^*$. In the same way, we can show that $T^*(\alpha y^*) = \alpha T^*(y^*)$.
Recall that the norm of any vector $x$ can also be given via the pairing: $\|x\| = \sup_{\|z\|\le 1} |\langle z, x\rangle|$. Thus,
$$\|T\| = \sup_{\|x\|\le 1} \|Tx\| = \sup_{\|x\|\le 1,\ \|y^*\|\le 1} |\langle Tx, y^*\rangle| = \sup_{\|x\|\le 1,\ \|y^*\|\le 1} |\langle x, T^*y^*\rangle| = \sup_{\|y^*\|\le 1} \|T^*y^*\| = \|T^*\|.$$
In the general case, adjoints aren't very interesting. In the Hilbert space case, we can escape from the duality pairing concept, which is awkward in general. Basically, suppose one is given a Hilbert space $\mathcal{H}$, and a map $T : \mathcal{H} \to \mathcal{H}$ s.t. $T \in B(\mathcal{H}, \mathcal{H})$ (or simply $B(\mathcal{H})$). By the Riesz representation theorem, we can identify $\mathcal{H}$ with $\mathcal{H}^*$, which means the duality pairing simply becomes a regular inner product. Applying the previous theorem to this case, we get that $\forall T \in B(\mathcal{H})$, there exists a unique $T^* \in B(\mathcal{H})$ s.t. $\langle Tx, y\rangle = \langle x, T^*y\rangle$ $\forall x, y \in \mathcal{H}$, where this is the regular inner product. Moreover, $\|T\| = \|T^*\|$.
Here are the basic properties of these operators.

Proposition 5.1.1. Suppose $A, B \in B(\mathcal{H})$. Then

1. $(A + B)^* = A^* + B^*$.
2. $(\alpha A)^* = \bar\alpha A^*$.
3. $(A^*)^* = A$.
4. $I^* = I$.
5. $(AB)^* = B^*A^*$.
6. $\|A^*A\| = \|A\|^2$.
7. $\mathcal{N}(A) = \mathcal{R}(A^*)^\perp$, $\mathcal{N}(A^*) = \mathcal{R}(A)^\perp$.

Proof. We prove each in turn.

1. This follows from
$$\langle (A + B)x, y\rangle = \langle Ax, y\rangle + \langle Bx, y\rangle = \langle x, A^*y\rangle + \langle x, B^*y\rangle = \langle x, (A^* + B^*)y\rangle \quad \forall x, y \in \mathcal{H}.$$

2. $\langle \alpha Ax, y\rangle = \alpha\langle Ax, y\rangle = \alpha\langle x, A^*y\rangle = \langle x, \bar\alpha A^*y\rangle$.
3. We have
$$\langle x, (A^*)^*y\rangle = \langle A^*x, y\rangle = \overline{\langle y, A^*x\rangle} = \overline{\langle Ay, x\rangle} = \langle x, Ay\rangle \quad \forall x, y \in \mathcal{H}.$$

4. Obvious.

5. $\langle x, (AB)^*y\rangle = \langle (AB)x, y\rangle = \langle Bx, A^*y\rangle = \langle x, B^*A^*y\rangle$.

6. For the first part, note that $\|A^*A\| \le \|A^*\|\|A\| = \|A\|^2$. For the second, note that $\|Ax\|^2 = \langle Ax, Ax\rangle = \langle x, A^*Ax\rangle \le \|x\|\|A^*Ax\| \le \|x\|^2\|A^*A\|$. Now take the supremum. Then we get
$$\|A\|^2 = \sup_{\|x\|\le 1} \|Ax\|^2 \le \sup_{\|x\|\le 1} \|A^*A\|\|x\|^2 = \|A^*A\|.$$

7. $y \in \mathcal{N}(A^*)$ implies $A^*y = 0$, meaning $\langle x, A^*y\rangle = 0$ $\forall x \in \mathcal{H}$. This in turn gives us that $\langle Ax, y\rangle = 0$ $\forall x \in \mathcal{H}$, and thus $y \in \mathcal{R}(A)^\perp$. The second equality can be proven similarly.
5.2 Self-Adjoint Operators

If $A = A^*$, we say that $A$ is self-adjoint. Here are some examples.

1. Matrices:
Let $\mathcal{H} = \mathbb{R}^n$ or $\mathbb{C}^n$. Then every $A \in B(\mathcal{H})$ is given by a matrix. One can show that $(\alpha^*_{ij}) = (\bar\alpha_{ji})$, i.e. the adjoint is the conjugate transpose.

2. Integral Operators:
Let $T : L^2(\Omega) \to L^2(\Omega)$, $k \in L^2(\Omega \times \Omega)$. Then if $Tx(t) = \int_\Omega k(t, s)x(s)\,ds$, we have
$$\langle Tx, y\rangle = \int_\Omega \left(\int_\Omega k(t, s)x(s)\,ds\right)\overline{y(t)}\,dt = \int_\Omega\int_\Omega k(t, s)x(s)\overline{y(t)}\,ds\,dt = \int_\Omega x(s)\,\overline{\left(\int_\Omega \overline{k(t, s)}\,y(t)\,dt\right)}\,ds = \langle x, T^*y\rangle.$$
And thus $T^*y(s) = \int_\Omega \overline{k(t, s)}\,y(t)\,dt$.
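Discretizing the integral operator on a grid turns the computation above into matrix algebra: the kernel of $T^*$ is the conjugate transpose of the kernel of $T$. A small numerical check (my own sketch, complex kernel sampled on a uniform grid on $[0,1]$):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 200
h = 1.0 / m                                   # uniform grid spacing
K = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))  # samples of k(t, s)

x = rng.standard_normal(m) + 1j * rng.standard_normal(m)
y = rng.standard_normal(m) + 1j * rng.standard_normal(m)

Tx = h * K @ x                                # (Tx)(t) = int k(t, s) x(s) ds
Tstar_y = h * K.conj().T @ y                  # (T*y)(s) = int conj(k(t, s)) y(t) dt

inner = lambda u, v: h * np.vdot(v, u)        # <u, v> = int u(t) conj(v(t)) dt
assert np.isclose(inner(Tx, y), inner(x, Tstar_y))   # <Tx, y> = <x, T*y>
```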
3. Multiplication Operator:
$T : L^2(\Omega) \to L^2(\Omega)$, $z \in L^\infty(\Omega)$. If $Tx = zx$, then
$$\langle Tx, y\rangle = \int_\Omega z(t)x(t)\overline{y(t)}\,dt = \int_\Omega x(t)\overline{\bar z(t)y(t)}\,dt = \langle x, T^*y\rangle,$$
where $T^*y = \bar z y$. Therefore $T = T^*$ when $z = \bar z$, i.e. when $z$ is real-valued.
Proposition 5.2.1. Let $A$ be a bounded linear operator on a Hilbert space $\mathcal{H}$. Define $\varphi(x, y) = \langle x, Ay\rangle$. Then $\varphi$ is symmetric iff $A = A^*$.

Proof. We just need to show both directions.
($\Leftarrow$) $\varphi(x, y) = \langle x, Ay\rangle = \langle Ax, y\rangle = \overline{\langle y, Ax\rangle} = \overline{\varphi(y, x)}$.
($\Rightarrow$) $\langle Ax, y\rangle = \overline{\langle y, Ax\rangle} = \overline{\varphi(y, x)} = \varphi(x, y) = \langle x, Ay\rangle$.

Theorem 5.2.1. Given a Hilbert space $\mathcal{H}$, let $A, B \in B(\mathcal{H})$. Then

1. $A^*A$ and $A + A^*$ are self-adjoint.
2. If $A = A^*$, $B = B^*$, then $AB = (AB)^*$ iff $AB = BA$.

Proof. 1. We have that $(A^*A)^* = A^*(A^*)^* = A^*A$, and $(A + A^*)^* = A^* + (A^*)^* = A^* + A$.
2. ($\Leftarrow$) Let $AB = BA$. Then $\langle ABx, y\rangle = \langle Bx, Ay\rangle = \langle x, BAy\rangle = \langle x, ABy\rangle$, so $(AB)^* = AB$.
($\Rightarrow$) We have that $\langle x, BAy\rangle = \langle x, B^*A^*y\rangle = \langle x, (AB)^*y\rangle = \langle x, ABy\rangle$, so $BA = AB$.

Corollary 5.2.1. If $A = A^*$, then every operator $a_0I + a_1A + \cdots + a_nA^n$ is self-adjoint if $a_0, a_1, \ldots, a_n \in \mathbb{R}$.
Given a Hilbert space $\mathcal{H}$, let $A : \mathcal{D}(A) \subset \mathcal{H} \to \mathcal{H}$, where $\mathcal{D}(A)$ is a subspace. In this case, we say that a linear operator $B : \mathcal{D}(B) \subset \mathcal{H} \to \mathcal{H}$ is the adjoint of $A$ if $\langle Ax, y\rangle = \langle x, By\rangle$ $\forall x \in \mathcal{D}(A), y \in \mathcal{D}(B)$.
Consider this example: let $\mathcal{H} = L^2(-\infty, \infty)$, and $D = \frac{d}{dt}$, $Dx = \frac{dx}{dt}$, $\mathcal{D}(D) = C^1_0(\mathbb{R})$. Then, integrating by parts,
$$\langle Dx, y\rangle = \int_{-\infty}^\infty x'(t)\overline{y(t)}\,dt = -\int_{-\infty}^\infty x(t)\overline{y'(t)}\,dt = \langle x, D^*y\rangle,$$
where $D^*y = -\frac{dy}{dt}$, $\mathcal{D}(D^*) = C^1_0(\mathbb{R})$ (i.e. $D^* = -D$).
This is another class of operators: if $A \in B(\mathcal{H})$ and $A^* = -A$, it is called anti-Hermitian, or skew-adjoint.
Theorem 5.2.2. For every $T \in B(\mathcal{H})$, with $\mathcal{H}$ being a Hilbert space, there exist unique bounded linear self-adjoint operators $A, B$ s.t.
$$T = A + iB, \qquad T^* = A - iB.$$

Proof. Set $A = \frac{1}{2}(T + T^*)$, $B = \frac{1}{2i}(T - T^*)$. It's clear that $A$ and $B$ are self-adjoint.
For the uniqueness, suppose that there are other operators $A_1, B_1$ with these properties. Then $0 = (A - A_1) + i(B - B_1)$, and so $0 = \langle (A - A_1)x, x\rangle + i\langle (B - B_1)x, x\rangle$ $\forall x \in \mathcal{H}$. Both these quantities are real, and so both are $0$ $\forall x \in \mathcal{H}$. But this gives us
$$\langle x, (A - A_1)y\rangle = 0, \qquad \langle x, (B - B_1)y\rangle = 0 \quad \forall x, y \in \mathcal{H}.$$
Thus $A - A_1 = 0$ and $B - B_1 = 0$.
Theorem 5.2.3. Let $\mathcal{H}$ be a Hilbert space, and $T \in B(\mathcal{H})$ s.t. $T = T^*$. Then $\|T\| = \sup_{\|x\|\le 1} |\langle Tx, x\rangle| =: M$.

Proof. ($\ge$) $M \le \sup_{\|x\|\le 1} \|Tx\|\|x\| \le \sup_{\|x\|\le 1} \|T\|\|x\|^2 \le \|T\|$.
($\le$) Let $x \in \mathcal{H}$ with $Tx \ne 0$. Set $\alpha = \left(\frac{\|Tx\|}{\|x\|}\right)^{1/2}$ and $z = \frac{Tx}{\alpha}$. Then
$$\begin{aligned}
\|Tx\|^2 &= \langle Tx, Tx\rangle = \langle T(\alpha x), z\rangle \\
&= \frac{1}{4}\left[\langle T(\alpha x + z), \alpha x + z\rangle - \langle T(\alpha x - z), \alpha x - z\rangle\right] \\
&\le \frac{1}{4}M\left(\|\alpha x + z\|^2 + \|\alpha x - z\|^2\right) \\
&= \frac{1}{2}M\left(\|\alpha x\|^2 + \|z\|^2\right) \quad \text{(parallelogram law)} \\
&= \frac{1}{2}M\left(\alpha^2\|x\|^2 + \frac{1}{\alpha^2}\|Tx\|^2\right) \\
&= \frac{1}{2}M\left[\|Tx\|\|x\| + \|Tx\|\|x\|\right] = M\|Tx\|\|x\|.
\end{aligned}$$
Therefore $\|Tx\| \le M\|x\|$, and $\|T\| \le M$.
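For a real symmetric matrix this theorem says the operator norm equals the largest Rayleigh quotient in absolute value, i.e. $\max_i|\lambda_i|$, and self-adjointness is essential. A quick numerical check (my own sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
S = rng.standard_normal((6, 6))
T = (S + S.T) / 2                      # self-adjoint

op_norm = np.linalg.norm(T, 2)         # ||T||
M = np.abs(np.linalg.eigvalsh(T)).max()  # sup_{||x||<=1} |<Tx, x>| for symmetric T
assert np.isclose(op_norm, M)

# Without self-adjointness the equality can fail, e.g. the nilpotent shift:
N = np.array([[0.0, 1.0], [0.0, 0.0]])
# here sup |<Nx, x>| = 1/2 over the unit sphere, while ||N|| = 1
```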
5.2.1 Invertible Operators

Let $E$ be a vector space, and let there be a map $A : \mathcal{D}(A) \to E$. Then $B : \mathcal{R}(A) \to E$ is called the inverse of $A$ if $ABx = x$ $\forall x \in \mathcal{R}(A)$ and $BAx = x$ $\forall x \in \mathcal{D}(A)$. We denote the inverse of $A$ by $A^{-1}$.
Obviously $\mathcal{D}(A^{-1}) = \mathcal{R}(A)$, $\mathcal{R}(A^{-1}) = \mathcal{D}(A)$. It is trivial to verify that the inverse is unique.
Proposition 5.2.2.
1. $A^{-1}$ is a linear operator.
2. $A$ is invertible iff $Ax = 0$ implies that $x = 0$.
3. If $A$ is invertible and $x_1, \ldots, x_n$ are linearly independent, then $Ax_1, \ldots, Ax_n$ are also linearly independent.
4. If $A, B$ are invertible, so is $AB$, and $(AB)^{-1} = B^{-1}A^{-1}$. Here $\mathcal{D}(AB) = \{x : x \in \mathcal{D}(B), Bx \in \mathcal{D}(A)\}$.

Proof. We prove the last two parts.
For 3), note that $0 = \alpha_1Ax_1 + \cdots + \alpha_nAx_n = A(\alpha_1x_1 + \cdots + \alpha_nx_n)$. By 2), $\alpha_1x_1 + \cdots + \alpha_nx_n = 0$, and thus $\alpha_1 = \cdots = \alpha_n = 0$.
For 4), let $ABx = 0$. Then by 2), $Bx = 0$, so $x = 0$, and thus $AB$ is invertible.
On $\mathcal{D}(AB)$, we have $B^{-1}A^{-1}AB = B^{-1}(A^{-1}A)B = I$.
On $\mathcal{R}(AB)$, we have $ABB^{-1}A^{-1} = A(BB^{-1})A^{-1} = I$.
If $\dim(E) = n$ and $A : E \to E$ is invertible, then $\mathcal{R}(A) = E$. This is not true if $\dim(E) = \infty$. Consider:

1. Let $E = \ell^2$, and let $T$ be the right shift operator. This is 1-1, the map is invertible, but the range isn't all of $E$.
2. Let $E = \ell^2$, and let $T(x_1, x_2, \ldots) = \left(x_1, \frac{x_2}{2}, \ldots, \frac{x_n}{n}, \ldots\right)$. Then $\|T\| = 1$, and $T^{-1}$ exists, and is given by $T^{-1}(x_1, x_2, \ldots) = (x_1, 2x_2, \ldots, nx_n, \ldots)$. Thus $T^{-1}$ is an unbounded operator, and $\mathcal{D}(T^{-1}) \ne E$.
5.2.2 Normal Operators

Let $\mathcal{H}$ be a Hilbert space. $A \in B(\mathcal{H})$ is called normal if $AA^* = A^*A$.

Theorem 5.2.4. $T \in B(\mathcal{H})$ is normal iff $\|T^*x\| = \|Tx\|$ $\forall x \in \mathcal{H}$.

Proof. We have $\|T^*x\|^2 = \langle T^*x, T^*x\rangle = \langle TT^*x, x\rangle$ (from the quadratic form). Similarly, $\|Tx\|^2 = \langle Tx, Tx\rangle = \langle T^*Tx, x\rangle$. Since two operators agree iff their quadratic forms agree, $\|T^*x\| = \|Tx\|$ $\forall x \in \mathcal{H}$ iff $T$ is normal.

This may not be true in real spaces. For example, $\langle Sx, x\rangle = 0$ may not imply $S = 0$:
$$\left\langle \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}, \begin{pmatrix} x \\ y \end{pmatrix}\right\rangle = \left\langle \begin{pmatrix} y \\ -x \end{pmatrix}, \begin{pmatrix} x \\ y \end{pmatrix}\right\rangle = yx - xy = 0.$$
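The criterion of Theorem 5.2.4 is easy to test in coordinates: for a normal matrix the two norms agree for every vector, and for the nilpotent shift they do not. A small sketch (my own examples):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(2)

U = np.array([[0.0, 1.0], [-1.0, 0.0]])   # rotation: normal (in fact unitary)
N = np.array([[0.0, 1.0], [0.0, 0.0]])    # shift: not normal

assert np.allclose(U @ U.T, U.T @ U)      # U U* = U* U
assert np.isclose(np.linalg.norm(U @ x), np.linalg.norm(U.T @ x))

assert not np.allclose(N @ N.T, N.T @ N)
e1 = np.array([1.0, 0.0])                 # N e_1 = 0 but N* e_1 = e_2
assert np.linalg.norm(N @ e1) == 0.0 and np.linalg.norm(N.T @ e1) == 1.0
```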
5.2.3 Isometric Operators

$A \in B(\mathcal{H})$ is an isometry if $\|Ax\| = \|x\|$ $\forall x \in \mathcal{H}$.

Theorem 5.2.5. $T \in B(\mathcal{H})$ is an isometry iff $T^*T = I$.

Proof. ($\Rightarrow$) If $\|Tx\| = \|x\|$ $\forall x \in \mathcal{H}$, then $\|Tx\|^2 = \langle Tx, Tx\rangle = \langle T^*Tx, x\rangle$. But $\|x\|^2 = \langle x, x\rangle$, so equality of the two quadratic forms implies that $T^*T = I$.
($\Leftarrow$) If $T^*T = I$, then $\|x\|^2 = \langle x, x\rangle = \langle T^*Tx, x\rangle = \|Tx\|^2$.

For isometries in Hilbert space, we have $\langle Tx, Ty\rangle = \langle x, y\rangle$, which means that $x \perp y$ iff $Tx \perp Ty$.
5.2.4 Unitary Operators

$A \in B(\mathcal{H})$ is called unitary if $A^*A = AA^* = I$. Note that if $A$ is unitary, then $A^{-1} \in B(\mathcal{H})$ and $A^{-1} = A^*$.

Proposition 5.2.3. Let $A$ be a unitary operator. Then
1. $A$ is normal.
2. $A$ is an isometry.
3. $A^{-1}$ and $A^*$ are unitary.

Theorem 5.2.6. $A \in B(\mathcal{H})$ is a unitary operator iff $A$ is an isometry and $\mathcal{R}(A) = \mathcal{H}$.

Proof. ($\Rightarrow$) If $A$ is unitary, then $A$ is an isometry. Moreover, since $A^{-1} \in B(\mathcal{H})$, we must have $\mathcal{R}(A) = \mathcal{H}$.
($\Leftarrow$) If $A$ is an isometry, then $A^*A = I$. Since $A$ is 1-1 and $\mathcal{R}(A) = \mathcal{H}$, we know that $A^{-1}$ exists. But then $(A^* - A^{-1})(Ax) = 0$ $\forall x \in \mathcal{H}$, i.e. $(A^* - A^{-1})(y) = 0$ $\forall y \in \mathcal{H}$. This means that $A^* = A^{-1}$, and thus we must have $A^*A = AA^* = I$.
Some examples and nonexamples:

1. $\ell^2(\mathbb{R})$:
Let $x = (x_1, x_2, \ldots)$ and let $T_R(x) = (0, x_1, x_2, \ldots)$ be the right shift operator. This is an isometry. We have
$$\langle T_Rx, y\rangle = \langle (0, x_1, \ldots), (y_1, y_2, \ldots)\rangle = \langle (x_1, x_2, \ldots), (y_2, y_3, \ldots)\rangle = \langle x, T_L(y)\rangle.$$
Therefore $T_R^* = T_L$. Since $T_R^*T_R \ne T_RT_R^*$, it's not normal, and thus not unitary. Also, $T_L$ is not an isometry, because it obviously kills the first component of the vector.

2. Hilbert space with orthonormal basis $e_1, e_2, \ldots$: Let $\sigma$ be any permutation of $\mathbb{N}$. Then, given $x = \sum_{i=1}^\infty \alpha_ie_i$, we define $T\left(\sum_{i=1}^\infty \alpha_ie_i\right) = \sum_{i=1}^\infty \alpha_ie_{\sigma(i)}$. Since $\|Tx\| = \|x\|$, it's an isometry. Since it's also onto, it's unitary.
3. $\ell^2(\mathbb{Z})$:
Given $x = (\ldots, x_{-1}, x_0, x_1, \ldots)$ s.t. $\sum_{i=-\infty}^\infty |x_i|^2 < +\infty$ and $\langle x, y\rangle = \sum_{i=-\infty}^\infty x_i\bar y_i$, define $T : \ell^2(\mathbb{Z}) \to \ell^2(\mathbb{Z})$ s.t. $T(x)_i = x_{i-1}$, making it easy to see that this two-sided shift is an isometry and onto; therefore it's a unitary operator.

4. $L^2([0, 1])$:
Define $T : L^2([0, 1]) \to L^2([0, 1])$ s.t. $T(x)(t) = x(1 - t)$. Then $\|Tx\|_{L^2} = \|x\|_{L^2}$, and this is an isometry, and onto. Thus it's unitary. We can directly check that $T^* = T$ as well.
5.2.5 Positive Operators

An operator $A : \mathcal{H} \to \mathcal{H}$ is called positive if $\langle Ax, x\rangle \ge 0$ $\forall x \in \mathcal{H}$. We define a partial order $A \ge B$ if $A - B \ge 0$. We have the properties

1. $A \ge B$ and $B \ge C$ $\Rightarrow$ $A \ge C$.
2. $A \ge B$, $C \ge D$ $\Rightarrow$ $A + C \ge B + D$.
3. $A \ge 0$, $\alpha \ge 0$ $\Rightarrow$ $\alpha A \ge 0$.

Theorem 5.2.7. The following are true:
1. For every $A \in B(\mathcal{H})$, $AA^*$ and $A^*A$ are positive.
2. If $A$ is an invertible positive operator, so is its inverse.

Proof. We can show this via:
1. $\langle A^*Ax, x\rangle = \langle Ax, Ax\rangle \ge 0$.
2. Writing $y = Ax$, $\langle A^{-1}y, y\rangle = \langle A^{-1}Ax, Ax\rangle = \langle x, Ax\rangle \ge 0$.

Theorem 5.2.8. Let $A \in B(\mathcal{H})$ be self-adjoint. Then $A \le \|A\|I$.

Proof. $\langle Ax, x\rangle \le \|Ax\|\|x\| \le \|A\|\|x\|^2 = \|A\|\langle x, x\rangle = \langle \|A\|Ix, x\rangle$.

An example: $A : L^2(a, b) \to L^2(a, b)$ s.t. $Ax = zx$, $z \in L^\infty(a, b)$. Then
$$\langle Ax, x\rangle = \int_a^b z(t)x(t)\overline{x(t)}\,dt = \int_a^b z(t)|x(t)|^2\,dt \ge 0 \text{ if } z \ge 0.$$
Theorem 5.2.9. Let $A, B \in B(\mathcal{H})$ be positive and self-adjoint. If $AB = BA$, then $AB$ is positive.

Proof. Define a sequence of operators s.t. $A_1 = \frac{A}{\|A\|}$, and $A_{n+1} = A_n - A_n^2$. If $A_n$ is self-adjoint, $A_n^2$ is self-adjoint, so this whole sequence is self-adjoint, and mutually commutes.
We claim that $0 \le A_n \le I$. We prove this by induction. This is true for $n = 1$, so suppose it's true for $n$, i.e. $0 \le A_n \le I$. Then
$$\langle A_n^2(I - A_n)x, x\rangle = \langle (I - A_n)A_nx, A_nx\rangle \ge 0,$$
because $A_n \le I$. Similarly, we have that
$$\langle A_n(I - A_n)^2x, x\rangle = \langle A_n(I - A_n)x, (I - A_n)x\rangle \ge 0,$$
implying that $A_n^2(I - A_n) \ge 0$ and $A_n(I - A_n)^2 \ge 0$. So
$$A_{n+1} = A_n - A_n^2 = A_n^2(I - A_n) + A_n(I - A_n)^2 \ge 0.$$
This also means that $I - A_{n+1} = I - A_n + A_n^2 \ge 0$.
Now $A_1 = A_1^2 + A_2 = \cdots = \sum_{k=1}^n A_k^2 + A_{n+1}$. Therefore $\sum_{k=1}^n A_k^2 = A_1 - A_{n+1} \le A_1$.
Note that
$$\sum_{k=1}^n \|A_kx\|^2 = \sum_{k=1}^n \langle A_k^2x, x\rangle \le \langle A_1x, x\rangle.$$
Thus $\sum_{k=1}^\infty \|A_kx\|^2$ is convergent, and $\|A_{n+1}x\| \to 0$ as $n \to \infty$. Therefore
$$\left(\sum_{k=1}^n A_k^2\right)x = A_1x - A_{n+1}x \to A_1x \text{ as } n \to \infty.$$
Thus $\sum_{k=1}^\infty A_k^2x = A_1x$.
Since $B$ commutes with $A$, it must commute with the $A_n$'s, because these are polynomials of $A$. This finally gives us
$$\langle ABx, x\rangle = \|A\|\langle BA_1x, x\rangle = \|A\|\sum_{n=1}^\infty \langle BA_n^2x, x\rangle = \|A\|\sum_{n=1}^\infty \langle BA_nx, A_nx\rangle \ge 0.$$
Note that if $\mathcal{H}$ is complex, then $A \ge 0$ implies $A = A^*$: indeed $0 \le \langle Ax, x\rangle = \overline{\langle Ax, x\rangle} = \langle x, Ax\rangle$, and also $\langle Ax, x\rangle = \langle x, A^*x\rangle$, meaning the quadratic forms for $A$ and $A^*$ are the same, i.e. $A = A^*$.
If $\mathcal{H}$ is real, then we have to assume that $A$ is self-adjoint, since the real polarization identity
$$\langle Ax, y\rangle = \frac{1}{4}\left[\langle A(x + y), x + y\rangle - \langle A(x - y), x - y\rangle\right]$$
holds only if $A = A^*$.

Corollary 5.2.2. If $A = A^*$, $B = B^*$, $A \le B$, then $AC \le BC$ for every positive operator $C = C^*$ which commutes with $A$ and $B$.
Proof. $B - A \ge 0$, $C \ge 0$, and they commute, therefore $(B - A)C \ge 0$.

Lemma 5.2.1. Let $A_1 \le A_2 \le \cdots$ be self-adjoint operators on $\mathcal{H}$ s.t. $A_nA_m = A_mA_n$ $\forall n, m \ge 1$. If $B = B^*$ is s.t. $BA_n = A_nB$ and $A_n \le B$ $\forall n \ge 1$, then there exists a self-adjoint operator $A$ s.t.
$$\lim_{n\to\infty} A_nx = Ax \quad \forall x \in \mathcal{H},$$
and $A_n \le A \le B$.

Proof. Define $C_n = B - A_n$. Then the $C_n$ are mutually commuting and $C_1 \ge C_2 \ge \cdots \ge 0$. By the previous result, this implies that $(C_m - C_n)C_m$ and $C_n(C_m - C_n)$ are positive if $n > m$. Therefore $\forall x \in \mathcal{H}$,
$$\langle C_m^2x, x\rangle \ge \langle C_nC_mx, x\rangle \ge \langle C_n^2x, x\rangle,$$
thus $\langle C_n^2x, x\rangle$ is a nonincreasing sequence of nonnegative numbers, and so it converges. This means that
$$\lim_{n,m\to\infty} \langle C_nC_mx, x\rangle = \lim_{n\to\infty} \langle C_n^2x, x\rangle,$$
and so
$$\|C_mx - C_nx\|^2 = \langle (C_m - C_n)^2x, x\rangle = \langle C_m^2x, x\rangle + \langle C_n^2x, x\rangle - 2\langle C_nC_mx, x\rangle \to 0 \text{ as } n, m \to \infty.$$
Hence $(C_nx)$ is a Cauchy sequence, and thus converges $\forall x \in \mathcal{H}$. Therefore $(A_nx)$ converges $\forall x \in \mathcal{H}$. Define $Ax = \lim_{n\to\infty} A_nx$. Since $A_n = A_n^*$, $A = A^*$. To see this, note
$$\langle Ax, y\rangle = \lim_{n\to\infty} \langle A_nx, y\rangle = \lim_{n\to\infty} \langle x, A_ny\rangle = \langle x, Ay\rangle.$$
Next, we need to show that $A_n \le A \le B$. $A \le B$ is obvious, since $\langle A_nx, x\rangle \le \langle Bx, x\rangle$ $\forall x \in \mathcal{H}$, and thus $\langle Ax, x\rangle \le \langle Bx, x\rangle$ $\forall x \in \mathcal{H}$, and thus $A \le B$.
Now suppose $\langle A_nx, x\rangle > \langle Ax, x\rangle + \epsilon$ for some $x \in \mathcal{H}$, $\epsilon > 0$, $n \ge 1$. Choose $n_0 > n$ s.t.
$$\langle A_{n_0}x, x\rangle < \langle Ax, x\rangle + \frac{\epsilon}{2}.$$
Then
$$\langle A_nx, x\rangle \le \langle A_{n_0}x, x\rangle < \langle Ax, x\rangle + \frac{\epsilon}{2},$$
a contradiction. Therefore $A_n \le A$ $\forall n \ge 1$.
With this lemma, we can get the following important result.
Let $A = A^* \ge 0$. Then a self-adjoint operator $B$ s.t. $B^2 = A$ is called the square root of $A$.

Theorem 5.2.10. Let $A = A^* \ge 0$. There then exists a unique positive square root $B$ of $A$. Moreover, $B$ commutes with every operator that commutes with $A$.
Proof. Let $\alpha > 0$ s.t. $\alpha^2A \le I$. Define $T_0 = 0$, and $T_{n+1} = T_n + \frac{1}{2}(\alpha^2A - T_n^2)$, $n \ge 0$. The $T_n$ are self-adjoint, because they're polynomials of $A$. Also, $T_nT_m = T_mT_n$ $\forall n, m$.
Our first claim is that $T_n \le I$ $\forall n \ge 0$. Of course $T_0 \le I$. Suppose $T_n \le I$. Then
$$I - T_{n+1} = \frac{1}{2}(I - T_n)^2 + \frac{1}{2}(I - \alpha^2A) \ge 0.$$
Our second claim is that $T_0 \le T_1 \le T_2 \le \cdots$. To see this, note that
$$T_{n+1} - T_n = (T_n - T_{n-1}) - \frac{1}{2}(T_n + T_{n-1})(T_n - T_{n-1}) = \frac{1}{2}\left[(I - T_{n-1}) + (I - T_n)\right](T_n - T_{n-1}).$$
Obviously $T_1 - T_0 = \frac{1}{2}\alpha^2A \ge 0$. Now, if $T_n - T_{n-1} \ge 0$, then the above (a product of commuting positive operators) shows that $T_{n+1} - T_n \ge 0$, so we have this claim as well.
Therefore, by Lemma 5.2.1 (with $B = I$), $T_n$ converges strongly to a positive, self-adjoint operator $T$. Letting $n \to \infty$ in
$$T_{n+1} = T_n + \frac{1}{2}(\alpha^2A - T_n^2)$$
gives $T = T + \frac{1}{2}(\alpha^2A - T^2)$. So $\alpha^2A - T^2 = 0$, i.e. $A = \left(\frac{T}{\alpha}\right)^2 =: B^2$. Since $T_n$ commutes with every operator commuting with $A$, so does $T$ and thus $B$.
We now prove uniqueness. Suppose there's another $C = C^* \ge 0$ s.t. $C^2 = A$. Since $C$ commutes with $A$ ($CA = CC^2 = C^2C = AC$), $C$ commutes with $B$. Let $x \in \mathcal{H}$, and let $y = (B - C)x$. Then
$$\langle By, y\rangle + \langle Cy, y\rangle = \langle (B + C)y, y\rangle = \langle (B + C)(B - C)x, y\rangle = \langle (B^2 - C^2)x, y\rangle = 0.$$
Thus $\langle By, y\rangle = \langle Cy, y\rangle = 0$.
Let $D$ be the positive square root of $B$, $D^2 = B$. Then $\|Dy\|^2 = \langle D^2y, y\rangle = \langle By, y\rangle = 0$, so $Dy = 0$. So $By = D(Dy) = 0$. Similarly, one can show that $Cy = 0$. Thus
$$\|Bx - Cx\|^2 = \langle (B - C)^2x, x\rangle = \langle (B - C)y, x\rangle = 0.$$
Therefore $Bx = Cx$ $\forall x \in \mathcal{H}$.
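The proof of Theorem 5.2.10 is constructive, and the iteration converges for matrices exactly as written: scale $A$ so that $\alpha^2A \le I$, iterate $T_{n+1} = T_n + \frac{1}{2}(\alpha^2A - T_n^2)$, and divide by $\alpha$. A sketch (my own implementation of the scheme from the proof; the $0.1\,I$ shift is an assumption I add to keep eigenvalues away from $0$ so the iteration converges quickly):

```python
import numpy as np

rng = np.random.default_rng(5)
S = rng.standard_normal((4, 4))
A = S @ S.T + 0.1 * np.eye(4)                  # positive definite, self-adjoint

alpha = 1.0 / np.sqrt(np.linalg.norm(A, 2))    # ensures alpha^2 A <= I
T = np.zeros_like(A)
for _ in range(500):                           # T_{n+1} = T_n + (alpha^2 A - T_n^2)/2
    T = T + 0.5 * (alpha**2 * A - T @ T)

B = T / alpha                                  # the positive square root of A
assert np.allclose(B, B.T)                     # self-adjoint
assert np.allclose(B @ B, A, atol=1e-6)        # B^2 = A
```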
5.2.6 Projection Operators

If $\langle Ax, x\rangle > 0$ $\forall x \in \mathcal{H}$, $x \ne 0$, we call $A$ strictly positive.
Suppose $S = \bar S \subset \mathcal{H}$ is a closed subspace of a Hilbert space $\mathcal{H}$. We know that any $x \in \mathcal{H}$ is uniquely decomposed s.t. $x = y + z$, $y \in S$, $z \in S^\perp$.
Define $Px = y$. Then $P : \mathcal{H} \to S$ is called the orthogonal projection onto $S$. We note that

1. $P$ is linear.
2. $P \in B(\mathcal{H})$ with $\|P\| \le 1$, and $\|P\| = 1$ if $S \ne \{0\}$ (since $\|Px\|^2 = \|y\|^2 = \|x\|^2 - \|z\|^2 \le \|x\|^2$).
3. $P^2 = P$.
4. $P(\mathcal{H}) \perp (I - P)(\mathcal{H})$.

If $T^2 = T$, $T$ is called an idempotent operator.

Theorem 5.2.11. $P \in B(\mathcal{H})$ is an orthogonal projection iff $P = P^*$ and $P^2 = P$.

Proof. We prove both directions.
($\Rightarrow$) We already know that $P^2 = P$. Let $Px_1 = y_1$ and $Px_2 = y_2$. Then $\langle Px_1, x_2\rangle = \langle y_1, x_2\rangle = \langle y_1, y_2\rangle = \langle x_1, y_2\rangle = \langle x_1, Px_2\rangle$. So $P^* = P$.
($\Leftarrow$) Let $P^* = P$, $P^2 = P$. Denote $S = \{x : Px = x\}$. Obviously $S = \bar S$. Let $x \in \mathcal{H}$. Then $Px = P^2x = P(Px)$, so $Px \in S$.
We need to show that $(I - P)x \in S^\perp$. Let $z \in S$. Then $\langle x - Px, z\rangle = \langle x, z\rangle - \langle Px, z\rangle = \langle x, z\rangle - \langle x, Pz\rangle = \langle x, z\rangle - \langle x, z\rangle = 0$.

Corollary 5.2.3. If $P$ is an orthogonal projection, then $\langle Px, x\rangle = \|Px\|^2$.

Proof. $\langle Px, x\rangle = \langle P^2x, x\rangle = \langle Px, Px\rangle = \|Px\|^2$.

If $S = \bar S$ and $P$ is the orthogonal projection onto $S$, then $I - P$ is the orthogonal projection onto $S^\perp$. We will denote this by $P^\perp$ and call it the complementary projection.
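Theorem 5.2.11 gives a purely algebraic test for orthogonal projections that is easy to verify in coordinates. A hedged sketch (my own construction of $P$ via the normal equations, onto the column span of a random matrix):

```python
import numpy as np

rng = np.random.default_rng(4)
V = rng.standard_normal((7, 3))               # columns span the subspace S
P = V @ np.linalg.solve(V.T @ V, V.T)         # orthogonal projection onto S

assert np.allclose(P, P.T)                    # P = P*
assert np.allclose(P @ P, P)                  # P^2 = P

x = rng.standard_normal(7)
assert np.allclose(V.T @ (x - P @ x), 0.0)    # (I - P)x is orthogonal to S
# Corollary 5.2.3: <Px, x> = ||Px||^2
assert np.isclose((P @ x) @ x, np.linalg.norm(P @ x) ** 2)
```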


Chapter 6

Compact Operators

We mentioned that linear operators on Hilbert spaces are a sort of generalization of matrices. This is not exactly true, since we can have an unbounded operator, but not an unbounded matrix. In this sense, then, the statement that compact operators are generalizations of matrices is closer to the mark, because they are the closure of finite rank operators in the uniform operator topology. Due to this fact, many techniques that work for matrices work for compact operators as well.
6.1 Basics

Let $\mathcal{H}$ be a Hilbert space. Then a linear operator $T : \mathcal{H} \to \mathcal{H}$ is called compact if for every bounded sequence $(x_n)$, the sequence $(Tx_n)$ has a convergent subsequence¹. Equivalently, this means that $\overline{T(B_1(0))}$ is compact (i.e. $T(B_1(0))$ is precompact²). Note that compact operators are bounded, since $T(B_1(0))$ is bounded.

A couple of examples and nonexamples:

1. $I : \mathcal{H} \to \mathcal{H}$ is not compact if $\dim\mathcal{H} = \infty$.
2. Let $S = \bar S \subset \mathcal{H}$ be a finite dimensional subspace. Then $P : \mathcal{H} \to S$, the orthogonal projection, is compact.

An operator is called finite dimensional, or of finite rank, if its range is finite dimensional.
Theorem 6.1.1. If $T : \mathcal{H} \to \mathcal{H}$ is bounded and finite dimensional, then $T$ is compact.

Proof. Let $e_1, \ldots, e_m$ be an orthonormal basis of the range. Then $Tx = \sum_{i=1}^m \langle Tx, e_i\rangle e_i$. Let $(x_n)$ be a bounded sequence, and let $(x_{n_k})$ be a subsequence s.t. $\langle Tx_{n_k}, e_i\rangle \to \alpha_i$ as $k \to \infty$ for some $\alpha_i$. Then $Tx_{n_k} \to \sum_{i=1}^m \alpha_ie_i$ as $k \to \infty$.

¹Note that this definition is in some sense the easiest to comprehend, and that it has an equivalent formulation for any topological space: i.e. a topological space $X$ is compact iff every net on $X$ has a convergent subnet (nets are generalizations of sequences).
²Note that the Arzelà-Ascoli theorem states that if $S$ is a precompact set, every sequence within it contains a Cauchy subsequence.
If you remove the boundedness assumption, this theorem stops being true. Take an example we've already presented before, i.e. $T : \ell^2 \to \ell^2$ s.t. $Te_i = e_1$. Then $T\left(1, \frac{1}{2}, \ldots, \frac{1}{n}, 0, \ldots\right) = \left(\sum_{i=1}^n \frac{1}{i}\right)e_1 \to \infty$ as $n \to \infty$. This is a rank-one unbounded operator. Thus finite-rank linear operators don't have to be bounded or compact.

Proposition 6.1.1. The set of compact operators is a subspace of $B(\mathcal{H})$.

In fact, we can say more; this set is, in fact, a Banach algebra.

Theorem 6.1.2. If $A : \mathcal{H} \to \mathcal{H}$ is compact and $B : \mathcal{H} \to \mathcal{H}$ is bounded, then $AB$ and $BA$ are compact. In other words, the space of compact operators forms a 2-sided ideal in $B(\mathcal{H})$.

Proof. $AB$: Let $(x_n)$ be bounded. Then $(Bx_n)$ is a bounded sequence, and thus $(ABx_n)$ has a convergent subsequence.
$BA$: If $(x_n)$ is bounded, then $(Ax_n)$ has a convergent subsequence $(Ax_{n_k})$. But then since $B$ is continuous, $BAx_{n_k} = B(Ax_{n_k})$ is convergent.
Theorem 6.1.3. Let $(T_n)$ be a sequence of compact operators on $\mathcal{H}$ s.t. $\|T_n - T\| \to 0$ as $n \to \infty$ for some operator $T$. Then $T$ is compact.

Proof. Let $(x_n)$ be bounded. Let $(x_n^1)$ be a subsequence of $(x_n)$ s.t. $T_1x_n^1$ is convergent. We proceed by induction: choose $(x_n^k)$ to be a subsequence of $(x_n^{k-1})$ s.t. $T_kx_n^k$ is convergent, $\forall k \in \mathbb{N}$. We set $\tilde x_n = x_n^n$ (the diagonal sequence), which is a subsequence of $(x_n)$ and, from the $k$-th term on, a subsequence of each $(x_n^k)$.
Let $\epsilon > 0$ and let $k \in \mathbb{N}$ be s.t. $\|T_k - T\| \le \frac{\epsilon}{3M}$ where $M := \sup_n \|\tilde x_n\|$. We want to make use of the fact that
$$\|T\tilde x_n - T\tilde x_m\| \le \|T\tilde x_n - T_k\tilde x_n\| + \|T_k\tilde x_n - T_k\tilde x_m\| + \|T_k\tilde x_m - T\tilde x_m\|,$$
because we can bound the three quantities on the RHS. If we can achieve the right bound, we can show that $(T\tilde x_n)$ is a Cauchy sequence. This means that this is a convergent subsequence for $T$, and thus $T$ is compact.
Let $n_0$ be s.t. $\|T_k\tilde x_n - T_k\tilde x_m\| < \frac{\epsilon}{3}$ for $n, m > n_0$. This is possible because $(T_k\tilde x_n)$ is eventually a subsequence of $(T_kx_n^k)$, which is itself convergent. So for $n, m > n_0$, we have
$$\|T\tilde x_n - T\tilde x_m\| \le \frac{\epsilon}{3M}M + \frac{\epsilon}{3} + \frac{\epsilon}{3M}M = \epsilon.$$
Thus the sequence $(T\tilde x_n)$ is Cauchy, and the conclusion follows.
Corollary 6.1.1. The limit of a convergent sequence (in operator norm) of finite dimensional operators is compact.

Let us look at a significant example of this, called Hilbert-Schmidt kernels/operators. Let $\Omega \subset \mathbb{R}^n$. Take $k \in L^2(\Omega \times \Omega)$, and let $T : L^2(\Omega) \to L^2(\Omega)$ be s.t.
$$Tx(t) = \int_\Omega k(t, s)x(s)\,ds.$$
We know that $T \in B(L^2(\Omega))$ and $\|T\| \le \|k\|_{L^2(\Omega\times\Omega)}$. Let $e_1, \ldots, e_n, \ldots$ be an orthonormal basis of $L^2(\Omega)$. Then one can derive from Fubini's theorem that $\{e_i \otimes e_j\}_{i,j=1}^\infty$ is an orthonormal basis of $L^2(\Omega \times \Omega)$.
Let
$$k_N(t, s) = \sum_{i,j=1}^N \langle k, e_i \otimes e_j\rangle\, e_i(t)e_j(s) = \sum_{i,j=1}^N \left(\int_\Omega\int_\Omega k(\tau, \sigma)\overline{e_i(\tau)e_j(\sigma)}\,d\tau\,d\sigma\right)e_i(t)e_j(s).$$
We have that $\|k - k_N\|_{L^2(\Omega\times\Omega)} \to 0$ as $N \to \infty$.
Define $T_Nx(t) = \int_\Omega k_N(t, s)x(s)\,ds$. $T_N$ is finite dimensional and bounded:
$$T_Nx(t) = \sum_{i=1}^N \left(\sum_{j=1}^N \langle k, e_i \otimes e_j\rangle \int_\Omega e_j(s)x(s)\,ds\right)e_i(t),$$
so $T_Nx \in \operatorname{span}\{e_1, \ldots, e_N\}$. Also, $\|k_N\|_{L^2} \le \|k\|_{L^2}$. Therefore $T_N$ is compact. But
$$(T - T_N)x(t) = \int_\Omega (k - k_N)(t, s)x(s)\,ds,$$
so $\|T - T_N\| \le \|k - k_N\|_{L^2} \to 0$ as $N \to \infty$. Therefore $T$ is compact.
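The argument can be imitated on a grid with an SVD: truncating the discretized kernel to its top $N$ singular components gives finite-rank operators $T_N$, and the operator-norm error is controlled by the $L^2$ (Frobenius) error of the kernel. A hedged sketch (my own discretization, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(6)
m = 100
K = rng.standard_normal((m, m))            # discretized kernel k(t_i, s_j)

U, s, Vt = np.linalg.svd(K)
errs = []
for N in [1, 5, 20, m]:
    K_N = (U[:, :N] * s[:N]) @ Vt[:N]      # rank-N truncation (a finite-rank T_N)
    op_err = np.linalg.norm(K - K_N, 2)    # ||T - T_N|| (spectral norm)
    l2_err = np.linalg.norm(K - K_N)       # ||k - k_N||_{L^2} (Frobenius norm)
    assert op_err <= l2_err + 1e-10        # operator norm bounded by the HS norm
    errs.append(op_err)

assert errs[-1] < 1e-10                                  # full rank recovers T
assert all(a >= b for a, b in zip(errs, errs[1:]))       # errors nonincreasing
```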
Proposition 6.1.2. Let $T : \mathcal{H} \to \mathcal{H}$ be compact. Then $T^*$ is compact.

Proof. Let $(x_n)$ be a bounded sequence. Then $TT^*$ is compact. Let $TT^*x_{n_k}$ be a convergent subsequence. Then
$$\|T^*x_{n_k} - T^*x_{n_m}\|^2 = \langle T^*(x_{n_k} - x_{n_m}), T^*(x_{n_k} - x_{n_m})\rangle = \langle TT^*(x_{n_k} - x_{n_m}), x_{n_k} - x_{n_m}\rangle \le \|TT^*(x_{n_k} - x_{n_m})\|\|x_{n_k} - x_{n_m}\|.$$
Now note that $\|x_{n_k} - x_{n_m}\|$ is bounded, and that $\|TT^*(x_{n_k} - x_{n_m})\| \to 0$, and we have that $(T^*x_{n_k})$ is Cauchy.
Proposition 6.1.3. Let $S \subset \mathcal{H}$ be s.t. $\operatorname{span}(S)$ is dense in $\mathcal{H}$. Let $(x_n)$ be a bounded sequence s.t. $\langle x_n, y\rangle \to \langle x, y\rangle$ for every $y \in S$. Then $x_n \rightharpoonup x$.

Proof. By linearity, we have that $\langle x_n, y\rangle \to \langle x, y\rangle$ $\forall y \in \operatorname{span}(S)$. Let $z \in \mathcal{H}$. Let $\epsilon > 0$ and let $z_0 \in \operatorname{span}(S)$ s.t. $\|z - z_0\| < \frac{\epsilon}{3M}$ where $\|x_n\|, \|x\| \le M$ for every $n$. Let $n_0$ be s.t. for every $n > n_0$, $|\langle x_n, z_0\rangle - \langle x, z_0\rangle| \le \frac{\epsilon}{3}$. Then for $n > n_0$,
$$|\langle x_n, z\rangle - \langle x, z\rangle| \le |\langle x_n, z\rangle - \langle x_n, z_0\rangle| + |\langle x_n, z_0\rangle - \langle x, z_0\rangle| + |\langle x, z_0\rangle - \langle x, z\rangle| \le \frac{\epsilon}{3M}M + \frac{\epsilon}{3} + \frac{\epsilon}{3M}M = \epsilon.$$
So $\langle x_n, z\rangle \to \langle x, z\rangle$.
Theorem 6.1.4. Let an operator $T$ be compact on $\mathcal{H}$. Then there exist bounded finite dimensional operators $T_n$ s.t. $\|T - T_n\| \to 0$ as $n \to \infty$.

Proof. Since $T$ is compact, $\overline{\mathcal{R}(T)}$ is separable. Let $e_1, e_2, \ldots$ be an orthonormal basis of $\overline{\mathcal{R}(T)}$. Define $T_nx = \sum_{i=1}^n \langle Tx, e_i\rangle e_i$. In fact, $T_n = P_nT$, where $P_n$ is the orthogonal projection onto $\operatorname{span}\{e_1, \ldots, e_n\}$. Then it's easy to see that $\sup_{\|x\|\le 1} \|T_nx - Tx\| \to 0$ since $\overline{T(B_1(0))}$ is compact.

This theorem can be generalized to Banach spaces with an approximation property. A Banach space $X$ has the approximation property if for every compact set $K$ and every $\epsilon > 0$, there exists a finite rank operator $Q_\epsilon$ s.t. $\sup_{x\in K} \|Q_\epsilon x - x\| \le \epsilon$.
We can then set $T_n = Q_{1/n}T$, where $Q_{1/n}$ is chosen for the compact set $\overline{T(B_1(0))}$. Then
$$\|T - T_n\| = \sup_{\|x\|\le 1} \|Tx - Q_{1/n}Tx\| = \sup_{y\in T(B_1(0))} \|y - Q_{1/n}y\| \le \frac{1}{n}.$$
For example, $\ell^p$ has the approximation property.
We say that a set $v_1, v_2, \ldots$ is a Schauder basis for a Banach space $X$ if $\forall v \in X$, there exists a unique sequence $(\alpha_n(v))$ s.t. $v = \sum_{i=1}^\infty \alpha_i(v)v_i$³. Obviously spaces with Schauder bases are separable, so $X$ here must be separable. It can be shown that $\alpha_i \in X^*$ $\forall i \in \mathbb{N}$ and that the partial sum operators $Q_nv = \sum_{i=1}^n \alpha_i(v)v_i$ satisfy $\sup_n \|Q_n\| \le C$ for some $C$.
Now if $T$ is compact ($T : X \to X$) and we define $T_n = Q_nT$, then
$$\|T - T_n\| = \sup_{x\in \overline{T(B_1(0))}} \|x - Q_nx\| \to 0 \text{ as } n \to \infty.$$
To see this, suppose there exists a sequence $(x_n) \subset \overline{T(B_1(0))}$ and $\epsilon > 0$ s.t. $\|x_n - Q_nx_n\| \ge \epsilon$ for $n \in \mathbb{N}$. Let $x_{n_k} \to x_0$ as $k \to \infty$ for some $x_0 \in \overline{T(B_1(0))}$. Then
$$\|x_{n_k} - Q_{n_k}x_{n_k}\| \le \|x_{n_k} - x_0\| + \|x_0 - Q_{n_k}x_0\| + \|Q_{n_k}x_0 - Q_{n_k}x_{n_k}\| \le \|x_{n_k} - x_0\| + \|x_0 - Q_{n_k}x_0\| + C\|x_0 - x_{n_k}\|.$$
Since every term on the right goes to $0$ as $k \to \infty$, we get a contradiction⁴.
We mentioned that every Banach space that admits a Schauder basis is separable. The converse, however, is not true. In 1973, Per Enflo gave a counterexample, namely a separable Banach space without the approximation property (and hence without a Schauder basis). Later, other counterexamples appeared.
Theorem 6.1.5. An operator $T : \mathcal{H} \to \mathcal{H}$ is compact iff $x_n \rightharpoonup x$ implies that $Tx_n \to Tx$.

Proof. ($\Rightarrow$) Let $T$ be compact and $x_n \rightharpoonup x$. If $Tx_n \not\to Tx$, then there exists an $\epsilon > 0$ and a subsequence $x_{n_k}$ s.t. $\|Tx_{n_k} - Tx\| \ge \epsilon$. Since $(x_{n_k})$ is bounded, $Tx_{n_k}$ has a convergent subsequence. Denote it by $Tx'_{n_k}$. Now, $\forall y \in \mathcal{H}$, $\langle Tx_n, y\rangle = \langle x_n, T^*y\rangle \to \langle x, T^*y\rangle = \langle Tx, y\rangle$. Therefore $Tx_n \rightharpoonup Tx$, and thus $Tx'_{n_k} \to Tx$. This is a contradiction, since $\|Tx'_{n_k} - Tx\| \ge \epsilon$.
($\Leftarrow$) Let $(z_n)$ be a bounded sequence. We need to show that $(Tz_n)$ has a convergent subsequence. By the next theorem, $(z_n)$ has a weakly convergent subsequence $z_{n_k} \rightharpoonup z$, and by assumption $Tz_{n_k} \to Tz$.
³This is convergence in norm.
⁴Note that the important step in this proof is that the last term is bounded and shrinks: the partial-sum projections $Q_n$ converge pointwise and their operator norms are uniformly bounded.
Theorem 6.1.6. (Banach-Alaoglu for Hilbert spaces). The closed unit ball is weakly compact⁵ in a Hilbert space.

Proof. Let $(z_n)$ be a sequence s.t. $\|z_n\| \le 1$. We can assume that the space $\mathcal{H}$ is separable here. Otherwise, we restrict to $\mathcal{H}_1 = \overline{\operatorname{span}}\{z_1, z_2, \ldots\}$. If $z_n \rightharpoonup z$ in $\mathcal{H}_1$, then $z_n \rightharpoonup z$ in $\mathcal{H}$: since every $y \in \mathcal{H}$ is decomposed into $y = y_1 + y_2 \in \mathcal{H}_1 \oplus \mathcal{H}_1^\perp$, we have
$$\langle z_n, y\rangle = \langle z_n, y_1\rangle + \langle z_n, y_2\rangle = \langle z_n, y_1\rangle \to \langle z, y_1\rangle = \langle z, y_1 + y_2\rangle = \langle z, y\rangle.$$
Let $e_1, e_2, \ldots$ be an orthonormal basis for $\mathcal{H}_1$. Let $(z_{1,n})$ be a subsequence of $(z_n)$ s.t. $(\langle z_{1,n}, e_1\rangle)$ is convergent. Inductively, we choose $(z_{m,n})$ to be a subsequence of $(z_{m-1,n})$ s.t. $(\langle z_{m,n}, e_m\rangle)$ is convergent. Now set $x_n = z_{n,n}$, the diagonal sequence. By construction,
$$\lim_{n\to\infty} \langle x_n, e_k\rangle =: \alpha_k \quad \forall k \in \mathbb{N}, \text{ for some } \alpha_k.$$
We claim that $(x_n)$ converges weakly. By Bessel's inequality, $\forall l \ge 1$,
$$\sum_{k=1}^l |\langle x_n, e_k\rangle|^2 \le \|x_n\|^2 \le 1.$$
Therefore, letting $n \to \infty$ in the above, $\sum_{k=1}^l |\alpha_k|^2 \le 1$. Now let $l \to \infty$. We get $\sum_{k=1}^\infty |\alpha_k|^2 \le 1$. Define $x = \sum_{k=1}^\infty \alpha_ke_k \in \mathcal{H}$. Now
$$\langle x_n, e_m\rangle - \langle x, e_m\rangle = \langle x_n, e_m\rangle - \left\langle \sum_{k=1}^\infty \alpha_ke_k, e_m\right\rangle = \langle x_n, e_m\rangle - \alpha_m\langle e_m, e_m\rangle = \langle x_n, e_m\rangle - \alpha_m \to 0 \text{ as } n \to \infty.$$
Therefore $\langle x_n, e_m\rangle \to \langle x, e_m\rangle$ $\forall m \in \mathbb{N}$. Since $\operatorname{span}\{e_1, e_2, \ldots\}$ is dense in $\mathcal{H}_1$, this implies that $x_n \rightharpoonup x$⁶.

We use these results extensively in the rest of the text.

⁵This means that the set is compact in a weakly sequential sense, as introduced in section 3.3.1.
⁶In Hilbert spaces, we don't have to worry about the difference between weak and weak* convergence.
Chapter 7

Spectral Theory

We now study the spectral theory of bounded linear operators. Modern spectral theory largely arose out of the needs of physicists. It was first rigorously defined by David Hilbert, and was later reformulated more abstractly by John von Neumann, who applied it to the study of quantum mechanics. In the 1940s, Israel Gelfand started publishing a pathblazing series of papers on operator algebras that defined the subject as it is studied today [6].
In addition to being based on in-class lectures, this chapter also relies heavily on [3] and [1]. The notes start from very general results, and end up being more and more specific by the end of the chapter.
7.1 Fundamentals

The space of all bounded operators equipped with the operator norm is a Banach space. It's also a Banach algebra, since we have composition as a product. We consider Banach algebras with unit elements, which are not necessarily commutative.
Suppose $X$ is a Banach algebra. Denote by $X^{-1} = \{x \in X : x^{-1} \text{ exists}\}$, i.e. the set of all invertible elements. Recall that $x$ is invertible if $\exists x^{-1} \in X$ s.t. $xx^{-1} = x^{-1}x = I$ where $I$ is the identity element. Note that the inverse, if it exists, is unique, because if $y, z$ are 2 inverses of $x$, we have $y = yI = y(xz) = (yx)z = Iz = z$. Note that $X^{-1}$ is a group, and this is sometimes called the general linear group of the unital¹ Banach algebra $X$.
Theorem 7.1.1. If $x$ is an element of $X$ s.t. $\|x\| < 1$, then $I - x$ is invertible, and its inverse is given by the absolutely convergent Neumann series $(I - x)^{-1} = I + x + x^2 + \cdots$. Moreover, we have the following estimates:
$$\|(I - x)^{-1}\| \le \frac{1}{1 - \|x\|}, \qquad (7.1)$$
$$\|I - (I - x)^{-1}\| \le \frac{\|x\|}{1 - \|x\|}. \qquad (7.2)$$

¹Having an identity element.
Proof. Since $\|x^n\| \le \|x\|^n$ $\forall n \in \mathbb{N}$, we can define an element $z \in X$ as the sum of the absolutely convergent series $z = \sum_{n=0}^{\infty} x^n$. We have
\[ z(I - x) = (I - x)z = \lim_{N\to\infty} (I - x)\sum_{k=0}^{N} x^k = \lim_{N\to\infty} (I - x^{N+1}) = I, \]
which means that $I - x$ is invertible, and the inverse is $z$. The inequality (7.1) follows from
\[ \|z\| \le \sum_{n=0}^{\infty} \|x^n\| \le \sum_{n=0}^{\infty} \|x\|^n = \frac{1}{1 - \|x\|}. \]
Since
\[ I - z = -\sum_{n=1}^{\infty} x^n = -xz, \]
we have $\|I - z\| \le \|x\|\,\|z\|$, and thus (7.2) follows from (7.1).
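As a quick numerical illustration of Theorem 7.1.1, we can check the Neumann series and the bound (7.1) with NumPy. This is only a sketch: the random $5 \times 5$ matrix and the truncation length are arbitrary choices, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random matrix x with operator norm 1/2 < 1, so that I - x is
# invertible and the Neumann series converges.
x = rng.standard_normal((5, 5))
x *= 0.5 / np.linalg.norm(x, 2)          # now ||x|| = 0.5
I = np.eye(5)

# Partial sums of the Neumann series I + x + x^2 + ...
z, term = np.zeros_like(x), I.copy()
for _ in range(200):
    z += term
    term = term @ x

direct = np.linalg.inv(I - x)
series_error = np.linalg.norm(z - direct, 2)

# The bound (7.1): ||(I - x)^{-1}|| <= 1 / (1 - ||x||) = 2 here.
bound = 1.0 / (1.0 - np.linalg.norm(x, 2))
```

Since $\|x\| = 1/2$, the truncation error of the partial sum decays like $2^{-N}$, so 200 terms agree with the direct inverse to machine precision.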
Corollary 7.1.1. $X^{-1}$ is an open set in $X$, and $x \mapsto x^{-1}$ is a continuous map of $X^{-1}$ to itself.
Proof. To see that $X^{-1}$ is open, choose an invertible element $x_0$ and an arbitrary element $h \in X$. We have $x_0 + h = x_0(I + x_0^{-1}h)$. So, if $\|x_0^{-1}h\| < 1$, then $x_0 + h$ is invertible. In particular, if $\|h\| < \|x_0^{-1}\|^{-1}$, then this condition is satisfied, proving that $x_0 + h$ is invertible when $\|h\|$ is sufficiently small.
Supposing that $h$ has been so chosen, we can write
\[ (x_0 + h)^{-1} - x_0^{-1} = \left(x_0(I + x_0^{-1}h)\right)^{-1} - x_0^{-1} = \left[(I + x_0^{-1}h)^{-1} - I\right]x_0^{-1}. \]
Thus for $\|h\| < \|x_0^{-1}\|^{-1}$ we have
\[ \|(x_0 + h)^{-1} - x_0^{-1}\| \le \|(I + x_0^{-1}h)^{-1} - I\|\,\|x_0^{-1}\| \le \frac{\|x_0^{-1}h\|}{1 - \|x_0^{-1}h\|}\,\|x_0^{-1}\|, \]
and the last term goes to $0$ as $\|h\| \to 0$.
The spectrum of $x \in X$ is $\sigma(x) := \{\lambda \in \mathbb{C} : (x - \lambda I) \notin X^{-1}\}$. We now examine some of the basic properties of the spectrum.
Proposition 7.1.1. For every $x \in X$, $\sigma(x)$ is a closed subset of the disk $\{z \in \mathbb{C} : |z| \le \|x\|\}$.
Proof. The complement of the spectrum is given by
\[ \mathbb{C} \setminus \sigma(x) = \{\lambda \in \mathbb{C} : x - \lambda I \in X^{-1}\}. \]
Since $X^{-1}$ is open, and the map $\lambda \in \mathbb{C} \mapsto x - \lambda I \in X$ is continuous, the complement of $\sigma(x)$ must be open.
To prove the second assertion, we need to show that no complex number $\lambda$ with $|\lambda| > \|x\|$ can belong to $\sigma(x)$. Indeed, for such a $\lambda$ the formula
\[ x - \lambda I = (-\lambda)(I - \lambda^{-1}x), \]
together with the fact that $\|\lambda^{-1}x\| < 1$, implies that $x - \lambda I$ is invertible.
We now prove a fundamental result due to Gelfand. To do this, we need to use Liouville's theorem, which states that a bounded function $f : \mathbb{C} \to \mathbb{C}$ which is holomorphic (i.e. has a complex derivative) on the entire complex plane (called an entire function) is always constant. To proceed, we also need to extend the notion of holomorphicity to functions $f : \mathbb{C} \to X$, where $X$ is a complex Banach space. In this case, we say that $f$ has a complex derivative at $\lambda_0 \in \mathbb{C}$ if the following limit exists:
\[ f'(\lambda_0) := \lim_{\lambda \to \lambda_0} \frac{f(\lambda) - f(\lambda_0)}{\lambda - \lambda_0}. \]
Theorem 7.1.2. $\sigma(x) \ne \emptyset$ for every $x \in X$.
Proof. The idea is to show that if $\sigma(x) = \emptyset$, the $X$-valued function $f(\lambda) = (x - \lambda I)^{-1}$ is a bounded entire function that tends to zero as $\lambda \to \infty$. Then an appeal to Liouville's theorem yields the desired result.
For every $\lambda_0 \notin \sigma(x)$, $(x - \lambda I)^{-1}$ is defined for all $\lambda$ sufficiently close to $\lambda_0$, because $\sigma(x)$ is closed, and we claim that
\[ \lim_{\lambda \to \lambda_0} \frac{1}{\lambda - \lambda_0}\left[(x - \lambda I)^{-1} - (x - \lambda_0 I)^{-1}\right] = (x - \lambda_0 I)^{-2} \tag{7.3} \]
in the norm topology of $X$. Indeed, we can write
\[ (x - \lambda I)^{-1} - (x - \lambda_0 I)^{-1} = (x - \lambda I)^{-1}\left[(x - \lambda_0 I) - (x - \lambda I)\right](x - \lambda_0 I)^{-1} = (\lambda - \lambda_0)(x - \lambda I)^{-1}(x - \lambda_0 I)^{-1}. \]
Divide by $\lambda - \lambda_0$, and use the fact that $(x - \lambda I)^{-1} \to (x - \lambda_0 I)^{-1}$ as $\lambda \to \lambda_0$ to obtain (7.3).
Assume that $\sigma(x)$ is empty, and choose an arbitrary bounded linear functional $\rho$ on $X$. The scalar-valued function
\[ f(\lambda) = \rho\left((x - \lambda I)^{-1}\right) \]
is defined everywhere in $\mathbb{C}$, and it is clear from (7.3) that $f$ has a complex derivative everywhere, with $f'(\lambda) = \rho\left((x - \lambda I)^{-2}\right)$. Thus $f$ is an entire function.
Notice that $f$ is bounded. To see this, we need to estimate $\|(x - \lambda I)^{-1}\|$ for large $\lambda$. If $|\lambda| > \|x\|$, then
\[ \|(x - \lambda I)^{-1}\| = \frac{1}{|\lambda|}\,\|(I - \lambda^{-1}x)^{-1}\|. \]
The estimates of Theorem 7.1.1 therefore imply that
\[ \|(x - \lambda I)^{-1}\| \le \frac{1}{|\lambda|\left(1 - \|x\|/|\lambda|\right)} = \frac{1}{|\lambda| - \|x\|}, \]
and the right side goes to zero as $|\lambda| \to \infty$. Thus the function $\|(x - \lambda I)^{-1}\|$ vanishes at infinity. It follows that $f$ is a bounded entire function, which, by Liouville's theorem, must be constant. The constant value is $0$ because $f$ vanishes at infinity.
We conclude that $\rho\left((x - \lambda I)^{-1}\right) = 0$ for every $\lambda \in \mathbb{C}$ and every bounded linear functional $\rho$. The Hahn-Banach theorem implies that $(x - \lambda I)^{-1} = 0$ for every $\lambda \in \mathbb{C}$. But this can't be possible, because $(x - \lambda I)^{-1}$ is invertible, and $I \ne 0$ in $X$.
Let's look at a concrete example of the spectrum of an operator. Suppose $A : L^2(0, 2\pi) \to L^2(0, 2\pi)$ is given by $Ax(t) = \int_0^{2\pi} \cos(t - s)\,x(s)\,ds$. From a basic identity, we have
\[ Ax(t) = \cos(t)\int_0^{2\pi} \cos(s)\,x(s)\,ds + \sin(t)\int_0^{2\pi} \sin(s)\,x(s)\,ds. \]
Clearly, $A$ is bounded and has finite-dimensional range, since the range is spanned by $\cos(t)$ and $\sin(t)$. Hence, it's compact. The eigenvalues are of course given by the equation $Ax(t) = \lambda x(t)$. If $x(t)$ is an eigenvector for a nonzero eigenvalue, we must have $x(t) = a\cos(t) + b\sin(t)$. There are two eigenvalues for this equation: the eigenvalue $\lambda = \pi$, whose eigenvectors are of the form $x(t) = a\cos(t) + b\sin(t)$; this eigenspace has dimension 2. And then we have the eigenvalue $\lambda = 0$, which has as eigenfunctions all functions orthogonal to both $\cos$ and $\sin$; this eigenspace is infinite dimensional.
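The eigenvalues of this integral operator can be checked numerically by discretizing the kernel on a uniform grid. This is only an illustrative sketch; the grid size is an arbitrary choice.

```python
import numpy as np

# Discretize (Ax)(t) = \int_0^{2 pi} cos(t - s) x(s) ds with a uniform grid
# and a Riemann-sum quadrature rule; the resulting symmetric matrix
# approximates the operator.
N = 400
t = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
h = 2 * np.pi / N
A = np.cos(t[:, None] - t[None, :]) * h

eigvals = np.sort(np.linalg.eigvalsh(A))[::-1]
# Expected: eigenvalue pi with a 2-dimensional eigenspace (span of cos and
# sin), and eigenvalue 0 on the orthogonal complement.
top_two, rest = eigvals[:2], eigvals[2:]
```

On this uniform grid the quadrature sums $\sum_j \cos^2(t_j)\,h$ and $\sum_j \sin^2(t_j)\,h$ equal $\pi$ exactly, so the two nonzero discrete eigenvalues match $\pi$ to machine precision.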
7.2 Compact Operators
We would like to derive the structure of the spectrum of a general compact operator on a complex Banach space $X$. To this end, we make a few definitions.
For any operator $T$ on $X$, the resolvent set $\rho(T)$ consists of those $\lambda \in \mathbb{C}$ s.t. $T - \lambda I$ is invertible, and the spectrum $\sigma(T)$ is its complement. If $\lambda \in \sigma(T)$, then $T - \lambda I$ may fail to be invertible in several ways:
1. $\mathcal{N}(T - \lambda I) \ne \{0\}$, meaning that $\lambda$ is an eigenvalue of $T$. In this case, we say $\lambda \in \sigma_p(T)$, the point spectrum of $T$.
2. $T - \lambda I$ may be 1-1, but its range may be dense, yet not closed, in $X$. In this case, we say $\lambda \in \sigma_c(T)$, the continuous spectrum of $T$.
3. $T - \lambda I$ may be 1-1, but its range may not even be dense in $X$. In this case, we say $\lambda \in \sigma_r(T)$, the residual spectrum of $T$.
Clearly, for a given operator $T$, we can decompose the complex plane $\mathbb{C}$ into the disjoint sets $\sigma_p(T)$, $\sigma_c(T)$, $\sigma_r(T)$ and $\rho(T)$. As an example of the continuous spectrum, consider the operator $T$ on a separable Hilbert space s.t. $Te_n = \lambda_n e_n$, where $(e_n)$ is an orthonormal basis of the space, and the $\lambda_n$ form a positive sequence that tends to $0$. Then $0 \in \sigma_c(T)$. If instead $Te_n = \lambda_n e_{n+1}$, then $0 \in \sigma_r(T)$.
We immediately see that for a compact operator $T$ on an infinite-dimensional space, $0$ lies in the spectrum. The reason for this is that if $T$ were invertible, the image of the unit ball would contain an open set, and thus would fail to be precompact. From what we've seen, $0$ may lie in any of the parts of the spectrum we've defined. However, we will shortly show that all other elements of the spectrum of a compact operator are eigenvalues (i.e. $\sigma(T) \subseteq \{0\} \cup \sigma_p(T)$), a useful fact that need not be true for other types of operators. To prove these facts, we need two lemmas; the first is algebraic, and the second topological.
Consider a linear operator $T$ from a vector space $X$ to itself, and consider the chain of subspaces
\[ \{0\} = \mathcal{N}(I) \subseteq \mathcal{N}(T) \subseteq \mathcal{N}(T^2) \subseteq \mathcal{N}(T^3) \subseteq \cdots. \]
Either this chain is strictly increasing throughout, or $\exists n \ge 0$ s.t. $\mathcal{N}(T^n) = \mathcal{N}(T^{n+1})$, in which case only the first $n$ spaces are distinct and all the others equal the $n$th one. If this is the case, we say the kernel chain stabilizes at $n$. Therefore a kernel chain stabilizes at $0$ iff $T$ is 1-1.
Similarly, consider the range chain
\[ X = \mathcal{R}(I) \supseteq \mathcal{R}(T) \supseteq \mathcal{R}(T^2) \supseteq \mathcal{R}(T^3) \supseteq \cdots, \]
and define analogously what it means for this chain to stabilize. The range chain stabilizes at $0$ iff $T$ is onto. It's possible that neither, or only one, of these chains stabilizes. But we have:
Lemma 7.2.1. Let $T$ be a linear operator from a vector space $X$ to itself. If the kernel chain stabilizes at $n$ and the range chain stabilizes at $m$, then $n = m$ and $X = \mathcal{N}(T^n) \oplus \mathcal{R}(T^n)$.
Proof. Suppose $m < n$. Since the kernel chain is strictly increasing up to $n$, $\exists x \in \mathcal{N}(T^n)$ with $T^{n-1}x \ne 0$. Since the range chain stabilizes at $m \le n-1$, $\mathcal{R}(T^{n-1}) = \mathcal{R}(T^n)$, so $\exists y$ s.t. $T^n y = T^{n-1}x$. Then $T^{n+1}y = T^n x = 0$, so $y \in \mathcal{N}(T^{n+1}) = \mathcal{N}(T^n)$, and hence $T^{n-1}x = T^n y = 0$, a contradiction. Thus $m \ge n$. A similar argument establishes the reverse inequality.
For the second part of the lemma, note that if $T^n x \in \mathcal{N}(T^n)$, then $T^{2n}x = 0$, which means $T^n x = 0$, since $\mathcal{N}(T^{2n}) = \mathcal{N}(T^n)$. Thus $\mathcal{N}(T^n) \cap \mathcal{R}(T^n) = \{0\}$. Given $x$, since $\mathcal{R}(T^{2n}) = \mathcal{R}(T^n)$ we may choose $y$ with $T^{2n}y = T^n x$, so $x$ decomposes as $T^n y \in \mathcal{R}(T^n)$ plus $x - T^n y \in \mathcal{N}(T^n)$.
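The lemma can be seen concretely in finite dimensions. Below is a sketch (the particular matrix is a hypothetical example) where $T = N \oplus A$ with $N$ nilpotent of index 2 and $A$ invertible, so both chains stabilize at $n = 2$ and $\mathbb{R}^4 = \mathcal{N}(T^2) \oplus \mathcal{R}(T^2)$.

```python
import numpy as np
from scipy.linalg import null_space, orth

# T = N (+) A: N is nilpotent of index 2, A is invertible.
N = np.array([[0.0, 1.0], [0.0, 0.0]])
A = np.array([[2.0, 0.0], [0.0, 3.0]])
T = np.block([[N, np.zeros((2, 2))], [np.zeros((2, 2)), A]])

powers = [np.linalg.matrix_power(T, k) for k in range(1, 5)]
kernel_dims = [4 - np.linalg.matrix_rank(P) for P in powers]   # dims of N(T^k)
range_dims = [np.linalg.matrix_rank(P) for P in powers]        # dims of R(T^k)

# A basis of N(T^2) together with a basis of R(T^2) should span R^4.
K, R = null_space(powers[1]), orth(powers[1])
direct_sum_rank = np.linalg.matrix_rank(np.hstack([K, R]))
```

Here `kernel_dims` comes out `[1, 2, 2, 2]` and `range_dims` comes out `[3, 2, 2, 2]`: both chains stabilize at index 2, as the lemma predicts, and the concatenated bases have full rank.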
In the second lemma, we bring in the topology of compact operators.
Lemma 7.2.2. Let $T : X \to X$ be a compact operator on a Banach space, and $(\lambda_n)$ a sequence of complex numbers with $\inf |\lambda_n| > 0$. Then there cannot exist a strictly increasing chain of closed subspaces $S_1 \subsetneq S_2 \subsetneq \cdots$ s.t. $(\lambda_n I - T)S_n \subseteq S_{n-1}$ $\forall n \in \mathbb{N}$.
Proof. Suppose such a chain exists. Note that $TS_n \subseteq S_n$ $\forall n \in \mathbb{N}$. Since $S_n \setminus S_{n-1}$ contains elements at positive distance from $S_{n-1}$, we may choose $y_n \in S_n$ with $\|y_n\| \le 2$ and $\operatorname{dist}(y_n, S_{n-1}) = 1$. If $m < n$, then
\[ z := \frac{Ty_m + (\lambda_n I - T)y_n}{\lambda_n} \in S_{n-1}, \]
and
\[ \|Ty_m - Ty_n\| = |\lambda_n|\,\|z - y_n\| \ge |\lambda_n| \ge \inf_k |\lambda_k| > 0. \]
Since the $(y_n)$ are bounded, this implies that the sequence $(Ty_n)$ has no Cauchy subsequence, a contradiction of the compactness of $T$.
This leads us to the following:
Theorem 7.2.1. Let $T$ be a compact operator on a Banach space $X$. Then any nonzero element of the spectrum of $T$ is an eigenvalue. Moreover, $\sigma(T)$ is either finite or a sequence approaching $0$.
Proof. Fix $\lambda \ne 0$ and consider the subspace chains $\mathcal{N}[(\lambda I - T)^n]$ and $\mathcal{R}[(\lambda I - T)^n]$, which are closed by a previous result. Clearly $\lambda I - T$ maps $\mathcal{N}[(\lambda I - T)^n]$ into $\mathcal{N}[(\lambda I - T)^{n-1}]$, so the previous lemma implies that the kernel chain stabilizes at some $n$. Now $\mathcal{R}[(\lambda I - T)^n] = \mathcal{N}[((\lambda I - T)^*)^n]^{\perp}$ since the range is closed, and since the latter kernel chain stabilizes, the range chain stabilizes as well. This gives us the fact that $X = \mathcal{N}[(\lambda I - T)^n] \oplus \mathcal{R}[(\lambda I - T)^n]$. Thus
\[ \mathcal{R}(\lambda I - T) \ne X \iff \mathcal{R}[(\lambda I - T)^n] \ne X \iff \mathcal{N}[(\lambda I - T)^n] \ne \{0\} \iff \mathcal{N}(\lambda I - T) \ne \{0\}. \]
This means that $\sigma(T) \setminus \{0\} \subseteq \sigma_p(T)$.
To prove the last statement, note that if it were false, we could find a sequence of distinct eigenvalues $(\lambda_n)$ s.t. $\inf |\lambda_n| > 0$. Let $x_1, x_2, \ldots$ be corresponding nonzero eigenvectors and set $S_n = \operatorname{span}\{x_1, \ldots, x_n\}$. These form a strictly increasing chain of subspaces, and $(\lambda_n I - T)S_n \subseteq S_{n-1}$, which contradicts the lemma.
We can use the above reasoning to get the Fredholm alternative.
Theorem 7.2.2. Let $T$ be a compact operator on a Banach space $X$ and $\lambda$ a nonzero complex number. Then either $\lambda I - T$ is an isomorphism, or it is neither 1-1 nor onto.
Proof. Since the kernel chain and range chain for $S = \lambda I - T$ stabilize at the same index, either they both stabilize at $0$, in which case $S$ is 1-1 and onto, or neither does, in which case it is neither.
7.3 Special Operators
Proposition 7.3.1. Let $E$ be a vector space, $A$ an operator on $E$, and $T$ an invertible operator on $E$. Then $A$ and $TAT^{-1}$ have the same eigenvalues.
Proof. ($\Rightarrow$) Let $Ax = \lambda x$, $x \ne 0$. Then $TAT^{-1}(Tx) = TAx = T(\lambda x) = \lambda Tx$, so $\lambda$ is an eigenvalue of $TAT^{-1}$ with eigenvector $Tx$.
($\Leftarrow$) Let $TAT^{-1}x = \lambda x$, $x \ne 0$. Take $y = T^{-1}x$. Then $Ay = AT^{-1}x = T^{-1}(TAT^{-1}x) = T^{-1}(\lambda x) = \lambda y$.
Recall that if $T \in B(\mathcal{H})$, then $\mathcal{N}(T^*) = \mathcal{R}(T)^{\perp}$ and $\mathcal{N}(T) = \mathcal{R}(T^*)^{\perp}$.
Proposition 7.3.2. Let $T \in B(\mathcal{H})$ be normal. Then
1. $\mathcal{N}(T) = \mathcal{N}(T^*)$.
2. $\mathcal{R}(T)$ is dense iff $T$ is 1-1.
3. $T$ is invertible iff $\exists \epsilon > 0$ s.t. $\|Tx\| \ge \epsilon\|x\|$ $\forall x \in \mathcal{H}$.
Proof. 1. $\|Tx\|^2 = \langle Tx, Tx\rangle = \langle T^*Tx, x\rangle = \langle TT^*x, x\rangle = \langle T^*x, T^*x\rangle = \|T^*x\|^2$.
2. Follows from the fact recalled above and the first item: $\mathcal{R}(T)$ is dense iff $\mathcal{N}(T^*) = \{0\}$ iff $\mathcal{N}(T) = \{0\}$.
3. The forward direction is obvious. For the converse, note that if $\|Tx\| \ge \epsilon\|x\|$ with $\epsilon > 0$, then $T$ is 1-1, which means $\mathcal{R}(T)$ is dense. But $\|Tx\| \ge \epsilon\|x\|$ also implies that $\mathcal{R}(T)$ is closed, which means $\mathcal{R}(T) = \mathcal{H}$. Then we have $\|T^{-1}x\| \le \frac{1}{\epsilon}\|x\|$, so $T^{-1}$ is bounded.
Let's go back to spectral theory, and look at self-adjoint operators.
Theorem 7.3.1. Let $T = T^* \in B(\mathcal{H})$. Then $\sigma(T) \subseteq [a, b]$, where $a = \inf_{\|x\|=1}\langle Tx, x\rangle$ and $b = \sup_{\|x\|=1}\langle Tx, x\rangle$.
Proof. Recall that these values are real for self-adjoint operators. Let $\lambda \in \sigma(T)$. We will show that $\lambda \ge a$. Suppose $\lambda < a$. Then for $x \ne 0$,
\[ \langle Tx - \lambda x, x\rangle = \langle Tx, x\rangle - \lambda\|x\|^2 = \|x\|^2\left(\left\langle T\frac{x}{\|x\|}, \frac{x}{\|x\|}\right\rangle - \lambda\right) \ge (a - \lambda)\|x\|^2. \]
Therefore $(a - \lambda)\|x\|^2 \le \langle Tx - \lambda x, x\rangle \le \|(T - \lambda I)x\|\,\|x\|$. Thus $(a - \lambda)\|x\| \le \|(T - \lambda I)x\|$ $\forall x \in \mathcal{H}$. But $T - \lambda I$ is normal, so by the third item of the previous proposition, $T - \lambda I$ is invertible, which is a contradiction. In the same way, we can show that if $\lambda \in \sigma(T)$, then $\lambda \le b$.
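In finite dimensions, $\langle Tx, x\rangle$ for $\|x\| = 1$ is the Rayleigh quotient, and Theorem 7.3.1 says it pins down the spectrum. A quick sketch (the random symmetric matrix and the sampling are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
T = (M + M.T) / 2                      # self-adjoint (real symmetric)

# Sample the Rayleigh quotient <Tx, x> over random unit vectors x.
xs = rng.standard_normal((6, 10000))
xs /= np.linalg.norm(xs, axis=0)
rayleigh = np.einsum('ij,ij->j', xs, T @ xs)

# The extreme values a, b are the smallest and largest eigenvalues.
eigs = np.linalg.eigvalsh(T)
a, b = eigs.min(), eigs.max()
```

Every sampled quotient lands inside $[a, b]$, and the whole spectrum (here: all eigenvalues) sits in that interval as well.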
Corollary 7.3.1. If $T \in B(\mathcal{H})$ is unitary, then $\sigma(T) \subseteq \{\lambda \in \mathbb{C} : |\lambda| = 1\}$.
Proof. We have $\|T\| = 1$, which means $\sigma(T) \subseteq \{|\lambda| \le 1\}$. Let $|\lambda| < 1$. Then $\|\lambda T^*\| < 1$, which means $I - \lambda T^*$ is invertible. But $T - \lambda I = T(I - \lambda T^*)$, which implies that $\lambda \notin \sigma(T)$.
We now look at a couple of results for self-adjoint and unitary operators on a Hilbert space.
Theorem 7.3.2. If $\mathcal{H}$ is a Hilbert space, the eigenvectors corresponding to different eigenvalues of a self-adjoint or unitary operator on $\mathcal{H}$ are orthogonal.
Proof. 1. Suppose $T = T^*$. Let $\lambda_1 \ne \lambda_2$, $Tx_1 = \lambda_1 x_1$, $Tx_2 = \lambda_2 x_2$, $x_1, x_2 \ne 0$. Then
\[ \lambda_1\langle x_1, x_2\rangle = \langle \lambda_1 x_1, x_2\rangle = \langle Tx_1, x_2\rangle = \langle x_1, Tx_2\rangle = \langle x_1, \lambda_2 x_2\rangle = \bar{\lambda}_2\langle x_1, x_2\rangle. \]
But the eigenvalues of a self-adjoint operator are real, so $\bar{\lambda}_2 = \lambda_2 \ne \lambda_1$. Thus $\langle x_1, x_2\rangle = 0$.
2. Suppose $TT^* = T^*T = I$. Then
\[ \langle x_1, x_2\rangle = \langle T^*Tx_1, x_2\rangle = \langle Tx_1, Tx_2\rangle = \langle \lambda_1 x_1, \lambda_2 x_2\rangle = \lambda_1\bar{\lambda}_2\langle x_1, x_2\rangle. \]
Therefore $\lambda_1\bar{\lambda}_2 = 1$ or $\langle x_1, x_2\rangle = 0$. For unitary operators, recall that $\sigma(T) \subseteq \{|\lambda| = 1\}$, so $\lambda_2\bar{\lambda}_2 = 1$, i.e. $\bar{\lambda}_2 = 1/\lambda_2$. Hence $\lambda_1\bar{\lambda}_2 = 1$ would force $\lambda_1 = \lambda_2$, a contradiction. Thus $\langle x_1, x_2\rangle = 0$.
Theorem 7.3.3. Let $T \in B(\mathcal{H})$ be compact and $T = T^*$. Then either $\|T\|$ or $-\|T\|$ is an eigenvalue of $T$. Therefore $\|T\| = \max(|a|, |b|)$.
Proof. Assume $T \ne 0$. Let $(x_n)$ be s.t. $\|x_n\| = 1$ and $\|Tx_n\| \to \|T\|$. Then
\begin{align*} \left\|T^2x_n - \|Tx_n\|^2 x_n\right\|^2 &= \left\langle T^2x_n - \|Tx_n\|^2 x_n,\; T^2x_n - \|Tx_n\|^2 x_n\right\rangle \\ &= \|T^2x_n\|^2 - 2\|Tx_n\|^2\langle T^2x_n, x_n\rangle + \|Tx_n\|^4\|x_n\|^2 \\ &= \|T^2x_n\|^2 - \|Tx_n\|^4 \le \|T\|^2\|Tx_n\|^2 - \|Tx_n\|^4 \to 0 \text{ as } n \to \infty, \end{align*}
where we used $\langle T^2x_n, x_n\rangle = \langle Tx_n, Tx_n\rangle = \|Tx_n\|^2$.
Since $T^2$ is compact, there is a subsequence $(x_{n_k})$ s.t. $T^2x_{n_k}$ converges. Write the limit as $\|T\|^2 v$; then $v \ne 0$, since $\|T^2x_{n_k}\| \to \|T\|^2 \ne 0$. The estimate above implies that $\|Tx_{n_k}\|^2 x_{n_k} \to \|T\|^2 v$, and hence $x_{n_k} \to v$. Therefore $T^2x_{n_k} \to T^2v$, and also $T^2x_{n_k} \to \|T\|^2 v$, so $T^2v = \|T\|^2 v$, which is equivalent to
\[ (T^2 - \|T\|^2 I)v = 0. \]
Thus $\|T\|^2$ is an eigenvalue of $T^2$.
Rewrite this as $(T - \|T\|I)(T + \|T\|I)v = 0$. If $w := (T + \|T\|I)v \ne 0$, then $(T - \|T\|I)w = 0$, so $\|T\|$ is an eigenvalue; otherwise $(T + \|T\|I)v = 0$ with $v \ne 0$, so $-\|T\|$ is an eigenvalue.
So either $\|T\|$ or $-\|T\|$ is an eigenvalue.
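For a Hermitian matrix (the finite-dimensional compact self-adjoint case), Theorem 7.3.3 says the operator norm equals the largest eigenvalue modulus. A quick check with an arbitrary random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
T = (M + M.T) / 2                       # finite rank, hence compact; self-adjoint

eigs = np.linalg.eigvalsh(T)
op_norm = np.linalg.norm(T, 2)          # operator (spectral) norm ||T||

# Theorem 7.3.3: ||T|| or -||T|| is an eigenvalue, i.e. max |lambda| = ||T||.
largest_abs = np.max(np.abs(eigs))
```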
7.4 Compact Self-Adjoint Operators
This is, in a sense, the nicest possible spectral result you can get for these operators. Notice that it is exactly analogous to the elementary spectral theorem for Hermitian matrices.
Theorem 7.4.1. Let $T \in B(\mathcal{H})$ be compact and $T = T^*$. Then there exists an orthonormal basis of $\mathcal{H}$ composed of eigenvectors of $T$. Moreover, $Tx = \sum_{i=1}^{\infty} \lambda_i \langle x, v_i\rangle v_i$, $|\lambda_i| > 0$, where $v_i$ is the element of the orthonormal basis which is an eigenvector for $\lambda_i$.
Proof. Let $|\lambda_1| \ge |\lambda_2| \ge \cdots$, with $|\lambda_n| \to 0$ as $n \to \infty$, enumerate the nonzero eigenvalues. Let $\mathcal{H}_{\lambda_n}$ be the eigenspace for $\lambda_n$, and choose an orthonormal basis $v^n_1, \ldots, v^n_{m_n}$ of $\mathcal{H}_{\lambda_n}$. Since eigenspaces for distinct eigenvalues are orthogonal, the set
\[ \{v^1_1, \ldots, v^1_{m_1}, \ldots, v^n_1, \ldots, v^n_{m_n}, \ldots\} \]
is orthonormal. Define
\[ \mathcal{H}_s := \overline{\operatorname{span}}\{v^1_1, \ldots, v^1_{m_1}, \ldots, v^n_1, \ldots, v^n_{m_n}, \ldots\} \]
and $\mathcal{H}_0 := \mathcal{H}_s^{\perp}$. If $\mathcal{H}_0 = \{0\}$, the set above is already an orthonormal basis of $\mathcal{H}$. If $\mathcal{H}_0 \ne \{0\}$, take an orthonormal basis $\{v_\beta\}$ of $\mathcal{H}_0$.
We first claim that $\forall w \in \mathcal{H}_0$, $Tw = 0$. To prove this we need to define a sequence of spaces. Define
\[ \mathcal{H}_n := \{v^1_1, \ldots, v^1_{m_1}, \ldots, v^n_1, \ldots, v^n_{m_n}\}^{\perp}. \]
Each of these is closed (being an orthogonal complement) and thus is a Hilbert space. Let $T_n$ be $T$ restricted to $\mathcal{H}_n$. Then $T_n : \mathcal{H}_n \to \mathcal{H}_n$. To see this, let $x \in \mathcal{H}_n$ and $v \in \operatorname{span}\{v^1_1, \ldots, v^n_{m_n}\}$. Then
\[ \langle T_n x, v\rangle = \langle Tx, v\rangle = \langle x, Tv\rangle = 0, \]
since $Tv \in \operatorname{span}\{v^1_1, \ldots, v^n_{m_n}\}$. So $T_n x \perp v$ for every such $v$, and therefore $T_n x \in \mathcal{H}_n$.
If $\mu$ is a nonzero eigenvalue of $T_n$, then it's an eigenvalue of $T$ with eigenvectors in $\mathcal{H}_n$, so $|\mu| \le |\lambda_{n+1}|$. $T_n$ is also compact and self-adjoint. Therefore (since either $\|T_n\|$ or $-\|T_n\|$ is an eigenvalue of $T_n$), we get that
\[ \|T_n\| \le |\lambda_{n+1}| \to 0 \text{ as } n \to \infty. \]
If $w \in \mathcal{H}_0$, then $w \in \mathcal{H}_n$ $\forall n \ge 1$. Therefore
\[ \|Tw\| = \|T_n w\| \le |\lambda_{n+1}|\,\|w\| \to 0 \text{ as } n \to \infty, \]
so $Tw = 0$.
Thus each $v_\beta$ is an eigenvector (for the eigenvalue $0$), and the set
\[ \{v^1_1, \ldots, v^1_{m_1}, \ldots, v^n_1, \ldots, v^n_{m_n}, \ldots\} \cup \{v_\beta\} \]
is an orthonormal basis of eigenvectors.
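The finite-dimensional analogue of Theorem 7.4.1 is the eigendecomposition of a symmetric matrix; the expansion $Tx = \sum_i \lambda_i \langle x, v_i\rangle v_i$ can be checked directly. The matrix below is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((6, 6))
T = (M + M.T) / 2

# Orthonormal eigenbasis: columns of V are eigenvectors, lam the eigenvalues.
lam, V = np.linalg.eigh(T)

x = rng.standard_normal(6)
# Reconstruct Tx = sum_i lam_i <x, v_i> v_i in the eigenbasis.
Tx_spectral = sum(lam[i] * (x @ V[:, i]) * V[:, i] for i in range(6))

recon_error = np.linalg.norm(T @ x - Tx_spectral)
orthonormality_error = np.linalg.norm(V.T @ V - np.eye(6))
```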
7.4.1 The Fredholm Alternative
Let's look at an application relevant to PDEs.
Theorem 7.4.2. Let $\mathcal{H}$ be a Hilbert space. If $A = A^*$ is compact, then
\[ f = Af + \varphi \tag{7.4} \]
has a unique solution $f \in \mathcal{H}$ iff
\[ g = Ag \tag{7.5} \]
has only the trivial solution, i.e. $g = 0$. Moreover, if (7.4) has a solution, then $\langle \varphi, g\rangle = 0$ for every $g$ satisfying (7.5).
Proof. The forward direction is obvious. We prove the converse.
Let $\{v_n\}$ be an orthonormal basis of $\mathcal{H}$ composed of eigenvectors of $A$, with $\lambda_n$ being the corresponding eigenvalues. Expand $\varphi = \sum_{n=1}^{\infty} c_n v_n$.
Since $g = Ag$ has only the trivial solution, $\lambda_n \ne 1$ $\forall n \in \mathbb{N}$. If $f$ solves (7.4), expand $f = \sum_{n=1}^{\infty} a_n v_n$ in the basis. Then (7.4) reads
\[ \sum_{n=1}^{\infty} a_n v_n = \sum_{n=1}^{\infty} \lambda_n a_n v_n + \sum_{n=1}^{\infty} c_n v_n, \]
and comparing coefficients, we must have
\[ a_n = \lambda_n a_n + c_n \implies a_n = \frac{c_n}{1 - \lambda_n}. \]
Recall that $\sum_{n=1}^{\infty} |c_n|^2 < +\infty$, so we will have a solution iff $\sum_{n=1}^{\infty} |a_n|^2 < +\infty$. But we have
\[ \sum_{n=1}^{\infty} |a_n|^2 = \sum_{n=1}^{\infty} \frac{|c_n|^2}{|1 - \lambda_n|^2} \le M\sum_{n=1}^{\infty} |c_n|^2 < +\infty \]
for some $M$, since $|1 - \lambda_n| \to 1$ as $n \to \infty$ (the eigenvalues of a compact operator tend to $0$). Therefore $f = \sum_{n=1}^{\infty} \frac{c_n}{1 - \lambda_n} v_n \in \mathcal{H}$ and solves (7.4).
If there were two solutions $f_1 \ne f_2$, then $w = f_1 - f_2 \ne 0$ would solve (7.5), a contradiction.
Now, if $f$ solves (7.4) and $g$ solves (7.5), then
\begin{align*} \langle f, g\rangle &= \langle Af, g\rangle + \langle \varphi, g\rangle \\ &= \langle f, Ag\rangle + \langle \varphi, g\rangle \\ &= \langle f, g\rangle + \langle \varphi, g\rangle, \end{align*}
and so $\langle \varphi, g\rangle = 0$.
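The coefficient formula $a_n = c_n/(1 - \lambda_n)$ in the proof is directly computable in finite dimensions. A sketch with an arbitrary symmetric matrix scaled so that no eigenvalue equals 1:

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((6, 6))
A = (M + M.T) / 2
A /= (2 * np.linalg.norm(A, 2))          # eigenvalues now lie in [-1/2, 1/2]

lam, V = np.linalg.eigh(A)
phi = rng.standard_normal(6)
c = V.T @ phi                             # coefficients of phi in the eigenbasis

# Solve f = Af + phi via the coefficient formula a_n = c_n / (1 - lam_n).
f = V @ (c / (1.0 - lam))
residual = np.linalg.norm(f - (A @ f + phi))
```

Since no $\lambda_n$ equals 1, the division is well defined and `f` satisfies the equation to machine precision, matching the theorem.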
Chapter 8
Applications: Partial Differential Equations
In this chapter, we apply some of what we've learned to the theory of partial differential equations. Here, a great many of the proofs are omitted; we aim to give a sort of advertisement for the subject rather than a lengthy exposition. In light of this, no pretension towards completeness is made, and the tone is slightly less formal than the rest of the text. Most of the material presented derives from Evans' definitive textbook [4].
Recall that a PDE for a given function $u : \mathbb{R}^n \to \mathbb{R}$ is an equation composed of the partial derivatives of the function. If the equation is linear in these derivatives, it's known as a linear PDE. There is an enormous difference between the approaches used to work with linear and nonlinear equations; as one might guess, the latter are harder to handle. What's less obvious is the fact that, unlike ODEs, PDEs may not have any classical solutions at all. What we mean by classical here is simply a function which possesses all the derivatives present in the equation, so that you can simply plug it in and check it. A weak solution, in contrast, is a function that may not possess all the derivatives the equation requires, and yet still satisfies the PDE in some precisely defined sense. What this statement actually means is something we'll have to define more carefully below. Sobolev inequalities are tools that help us prove the existence of these solutions, as well as certain types of embeddings.
Since we use real analysis and operator theory, we're mainly going to focus on the linear case; nonlinear PDEs require a lot more work, and don't have nearly as simple and neat a formulation of solutions. Indeed, as [4] points out, one can say that nonlinear PDE are fairly poorly understood. This is in no way an indication of their lack of importance, however. To give an example, consider the Millennium Problems, a list of 7 of the most important open problems in mathematics. So far, only the Poincaré conjecture has been resolved, and one of the major roadblocks in the solution was understanding the behavior of a certain nonlinear PDE (Ricci flow). In addition, Navier-Stokes existence and smoothness is another Millennium prize problem. For insight into why this is so difficult to solve, see Tao's comments in [13].
We use multi-index notation, which is standard in this area. We define orders of differentiation by
\[ D^\alpha u(x) := \frac{\partial^{|\alpha|} u(x)}{\partial x_1^{\alpha_1} \cdots \partial x_n^{\alpha_n}} = \partial_{x_1}^{\alpha_1} \cdots \partial_{x_n}^{\alpha_n} u(x), \]
where $\alpha = (\alpha_1, \ldots, \alpha_n)$ and $|\alpha| = \alpha_1 + \cdots + \alpha_n$. Slightly abusing notation, we use $Du$ to denote the gradient vector, and $D^2u$ to denote the Hessian matrix. $\Delta u$ denotes the Laplacian of $u$.
The boundary and closure of a set $U$ are denoted by $\partial U$ and $\bar{U}$ respectively. The class of $L^p$ functions on a set $U$ is denoted by $L^p(U)$, and the class of locally $p$-integrable functions is given by $L^p_{loc}(U)$.
8.1 Sobolev Spaces
Since the solutions to PDEs are functions, it makes sense to use tools from functional analysis. Let's try to analyze what is happening using the language of operators. Simply put, we can write our PDE as $A : X \to Y$, where $A$ is a linear operator that encodes the structure of the PDE, and $X$ and $Y$ represent vector spaces of functions. Note that this, if done properly, describes everything we need to know about the PDE. The major challenge is to recast a PDE into this form. What does this entail? Well, first of all, we need to work out what the spaces $X$ and $Y$ are. Secondly, we need to devise the correct operator $A$. Once all this is done, we can apply said tools, enabling us to prove interesting things about PDEs that are otherwise difficult, if not impossible. In this section, we work out what form the spaces themselves take.
8.1.1 Weak Derivatives
The first step towards defining a weak solution to a PDE is to define a weak derivative. Consider the space $C_c^\infty(U)$ of infinitely differentiable functions on the set $U$ with compact support. Functions belonging to this space are called test functions.
Consider the following relation for two functions $u \in C^1(U)$ and $\varphi \in C_c^\infty(U)$, which clearly holds due to integration by parts:
\[ \int_U u\,\varphi_{x_i}\,dx = -\int_U u_{x_i}\,\varphi\,dx, \]
where $i = 1, 2, \ldots, n$. The boundary terms vanish due to compact support. Now consider the following relation:
\[ \int_U u\,D^\alpha\varphi\,dx = (-1)^{|\alpha|}\int_U D^\alpha u\,\varphi\,dx. \tag{8.1} \]
This holds for $u \in C^k(U)$ and $|\alpha| \le k$, because we can apply the first formula $|\alpha|$ times. Now the left hand side of the second equation makes sense as long as $u$ is locally integrable. But can we have a situation where the function $u$ does not lie in $C^k$, and the above relation still holds? In other words, do there exist functions $u, v \in L^1_{loc}(U)$ s.t.
\[ \int_U u\,D^\alpha\varphi\,dx = (-1)^{|\alpha|}\int_U v\,\varphi\,dx \]
$\forall \varphi \in C_c^\infty(U)$? If so, $v$ is called the weak derivative of $u$ (i.e. $v = D^\alpha u$ in this sense).
There's one thing that we can clearly see here, which is that if a weak derivative exists, it's unique a.e. This is because if we have two weak derivatives $v_1, v_2$ for $u$, by (8.1) we have
\[ \int_U (v_1 - v_2)\,\varphi\,dx = 0 \quad \forall \varphi \in C_c^\infty(U), \]
which forces $v_1 = v_2$ a.e.
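The standard first example is that $v(x) = \operatorname{sign}(x)$ is the weak derivative of $u(x) = |x|$ on $(-1, 1)$, even though $u$ is not differentiable at $0$. The integration-by-parts identity can be checked by quadrature; the test function below is an arbitrary choice that vanishes to high order at the endpoints.

```python
import numpy as np
from scipy.integrate import quad

# Check that \int u phi' dx = -\int v phi dx for u = |x|, v = sign(x).
def phi(x):
    return (1 - x**2) ** 3 * np.cos(2 * x)

def dphi(x):
    return -6 * x * (1 - x**2) ** 2 * np.cos(2 * x) \
           - 2 * (1 - x**2) ** 3 * np.sin(2 * x)

# Split the integrals at the kink of |x| to help the adaptive quadrature.
lhs = quad(lambda x: np.abs(x) * dphi(x), -1, 0)[0] \
    + quad(lambda x: np.abs(x) * dphi(x), 0, 1)[0]
rhs = quad(lambda x: -np.sign(x) * phi(x), -1, 0)[0] \
    + quad(lambda x: -np.sign(x) * phi(x), 0, 1)[0]
```

The two sides agree to quadrature accuracy, exactly as the weak-derivative definition requires.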
8.1.2 Definition and basic properties
We are now ready to give the definition of a Sobolev space, our model for the spaces in $A : X \to Y$. Fix $1 \le p \le \infty$ and $k \in \mathbb{N}$.
Definition 8.1.1. The Sobolev space $W^{k,p}(U)$ is the space consisting of all locally integrable functions $u : U \to \mathbb{R}$ s.t. for any multi-index $|\alpha| \le k$, $D^\alpha u$ exists in the weak sense and belongs to $L^p(U)$.
If $p = 2$, we have the special notation $W^{k,2}(U) = H^k(U)$. This space is denoted by $H$ because it's a Hilbert space. For our purposes, we'll ignore the case $p = \infty$. The closure of $C_c^\infty(U)$ in a Sobolev space is denoted by $W_0^{k,p}(U)$; this is a subspace of the Sobolev space. The norm on these spaces is given by
\[ \|u\|_{W^{k,p}(U)} := \left(\sum_{|\alpha| \le k}\int_U |D^\alpha u|^p\,dx\right)^{1/p}. \]
This looks pretty messy, so let's pause here and think about what's going on. The creation of the space itself makes sense because we want to be working in a space which contains all the weak derivatives we're interested in.
What about the norm? What exactly is it measuring? Essentially, the induced distance is small precisely when two functions, together with all their weak derivatives up to order $k$, are close in $L^p$. Since these spaces are designed for finding solutions of PDEs, this is a very natural formulation of the norm.
In general, functions belonging to Sobolev spaces can be discontinuous and unbounded. In fact, it's possible to create quite pathological examples, such as a function that's unbounded on every open subset of $U$. Therefore, even though these spaces are nicer than many, the functions that occupy them can have extremely undesirable properties.
Theorem 8.1.1. (Properties of weak derivatives). Assume $u, v \in W^{k,p}(U)$ and $|\alpha| \le k$. Then
1. $D^\alpha u \in W^{k-|\alpha|,p}(U)$ and $D^\beta(D^\alpha u) = D^{\alpha+\beta}u$ for all multi-indices $\alpha, \beta$ with $|\alpha| + |\beta| \le k$.
2. For each $\lambda, \mu \in \mathbb{R}$, $\lambda u + \mu v \in W^{k,p}(U)$ and $D^\alpha(\lambda u + \mu v) = \lambda D^\alpha u + \mu D^\alpha v$, $|\alpha| \le k$.
3. If $V$ is an open subset of $U$, then $u \in W^{k,p}(V)$.
Proof. We prove the first part. Fix $\varphi \in C_c^\infty(U)$. Then $D^\beta\varphi \in C_c^\infty(U)$, and
\begin{align*} \int_U D^\alpha u\,D^\beta\varphi\,dx &= (-1)^{|\alpha|}\int_U u\,D^{\alpha+\beta}\varphi\,dx \\ &= (-1)^{|\alpha|}(-1)^{|\alpha+\beta|}\int_U D^{\alpha+\beta}u\,\varphi\,dx \\ &= (-1)^{|\beta|}\int_U D^{\alpha+\beta}u\,\varphi\,dx, \end{align*}
implying $D^\beta(D^\alpha u) = D^{\alpha+\beta}u$ in the weak sense.
Theorem 8.1.2. For any $k \in \mathbb{N}$, Sobolev spaces are Banach spaces.
Proof. First we check that the norm really is a norm. The first two properties are easy and are left to the reader. To check the triangle inequality, for $1 \le p < \infty$, we can use Minkowski's inequality to get
\begin{align*} \|u + v\|_{W^{k,p}(U)} &= \left(\sum_{|\alpha| \le k}\|D^\alpha u + D^\alpha v\|^p_{L^p(U)}\right)^{1/p} \\ &\le \left(\sum_{|\alpha| \le k}\left(\|D^\alpha u\|_{L^p(U)} + \|D^\alpha v\|_{L^p(U)}\right)^p\right)^{1/p} \\ &\le \left(\sum_{|\alpha| \le k}\|D^\alpha u\|^p_{L^p(U)}\right)^{1/p} + \left(\sum_{|\alpha| \le k}\|D^\alpha v\|^p_{L^p(U)}\right)^{1/p} \\ &= \|u\|_{W^{k,p}(U)} + \|v\|_{W^{k,p}(U)}. \end{align*}
To show $W^{k,p}(U)$ is complete, consider a Cauchy sequence $(u_m) \subset W^{k,p}(U)$. Then clearly for each $|\alpha| \le k$, $(D^\alpha u_m)$ is Cauchy in $L^p(U)$. Since this space is complete, $\exists u_\alpha \in L^p(U)$ s.t. $D^\alpha u_m \to u_\alpha$ in $L^p(U)$, $|\alpha| \le k$. In particular $u_m \to u := u_{(0,\ldots,0)}$. We want to show that $u \in W^{k,p}(U)$ with $D^\alpha u = u_\alpha$.
Fix $\varphi \in C_c^\infty(U)$. Then
\begin{align*} \int_U u\,D^\alpha\varphi\,dx &= \lim_{m\to\infty}\int_U u_m\,D^\alpha\varphi\,dx \\ &= \lim_{m\to\infty}(-1)^{|\alpha|}\int_U D^\alpha u_m\,\varphi\,dx \\ &= (-1)^{|\alpha|}\int_U u_\alpha\,\varphi\,dx. \end{align*}
This proves the assertion: $D^\alpha u = u_\alpha$ weakly, so $u_m \to u$ in $W^{k,p}(U)$.
8.1.3 More advanced properties
We mention some more advanced properties of Sobolev spaces. All of these are highly nontrivial and take some work to prove, although for our purposes it's enough to state the main theorems.
The first notion is that of approximation. Weak derivatives are somewhat difficult to work with, so a natural question to ask is whether there's any way of approximating functions in Sobolev spaces by simpler ones. It turns out that we can do this with sequences of smooth functions. We call this a global approximation by smooth functions up to the boundary.
Theorem 8.1.3. (Approximation Theorem). Assume $U$ is bounded, and $\partial U$ is $C^1$. Suppose $u \in W^{k,p}(U)$ for some $1 \le p < \infty$. Then there exist functions $u_m \in C^\infty(\bar{U})$ s.t. $u_m \to u$ in $W^{k,p}(U)$.
The second goal is that of extension. There's a long history in analysis and topology of extension theorems, which ask whether it's possible to extend a function defined on a subset or subspace to a larger domain. Two famous examples that come to mind are the Tietze extension theorem and the Hahn-Banach theorems. For Sobolev spaces, we have the following result:
Theorem 8.1.4. (Extension Theorem). Assume $U$ is bounded and $\partial U$ is $C^1$. Select a bounded open set $V$ with $U \subset\subset V$. Then there exists a bounded linear operator $E : W^{1,p}(U) \to W^{1,p}(\mathbb{R}^n)$ s.t. for each $u \in W^{1,p}(U)$:
1. $Eu = u$ a.e. in $U$.
2. $Eu$ has support within $V$.
3. $\|Eu\|_{W^{1,p}(\mathbb{R}^n)} \le C\|u\|_{W^{1,p}(U)}$, where the constant $C$ depends only on $p$, $U$ and $V$.
The argument generally used to prove this assertion can be extended to $W^{2,p}(U)$, but in general, it's quite difficult to prove such results for higher derivatives.
Finally, we mention the trace operator. It allows us to assign boundary values along $\partial U$ to functions $u \in W^{1,p}(U)$, given that $\partial U$ is $C^1$. Note that this isn't quite as simple as it looks: the boundary has measure zero, so if we speak of "$u$ restricted to $\partial U$", we face the obvious problem that $u$ is only defined a.e. However, we know that boundary conditions are important in PDEs, and thus must be accounted for in some way. The trace operator solves this problem. Given $1 \le p < \infty$:
Theorem 8.1.5. (Trace Theorem). Assume $U$ is bounded and $\partial U$ is $C^1$. Then there exists a bounded linear operator $T : W^{1,p}(U) \to L^p(\partial U)$ s.t.
1. $Tu = u|_{\partial U}$ if $u \in W^{1,p}(U) \cap C(\bar{U})$.
2. $\|Tu\|_{L^p(\partial U)} \le C\|u\|_{W^{1,p}(U)}$ for each $u \in W^{1,p}(U)$, with $C$ depending only on $p$ and $U$.
$Tu$ is called the trace of $u$ on $\partial U$.
This looks fairly abstract, and it's somewhat hard to see what exactly is going on here. However, we can use this theorem to prove the following useful characterization:
Theorem 8.1.6. (Trace-zero functions in $W^{1,p}$). Assume $U$ is bounded, $\partial U$ is $C^1$ and $u \in W^{1,p}(U)$. Then $u \in W_0^{1,p}(U)$ iff $Tu = 0$ on $\partial U$.
Therefore, this gives us a characterization of $W_0^{1,p}(U)$ defined solely via an operator.
8.2 Sobolev Inequalities
These are arguably amongst the most powerful tools developed in the theory of Sobolev spaces. They help us discover embeddings of various Sobolev spaces into others. Their use is ubiquitous not only in PDE theory, but also in areas of differential geometry and probability theory.
Why would we want to embed one space into another? Well, as mentioned earlier, Sobolev spaces can contain functions with pretty bad properties. However, some Sobolev spaces are worse than others: for example, $W^{1,p}(U)$ is worse than $W^{2,p}(U)$ because it only requires one weak derivative. Depending on the problem, an embedding can be used to prove how smooth a given class of functions is.
We will go through some of the proofs in this section, because they're instructive. Each such inequality bounds a norm of a function by a norm of its gradient, a relationship that is certainly not immediately obvious.
8.2.1 Gagliardo-Nirenberg-Sobolev inequality
We define the Sobolev conjugate of a number $1 \le p < n$ by $p^* := \frac{np}{n-p}$.
Theorem 8.2.1. (Gagliardo-Nirenberg-Sobolev inequality). Assume $1 \le p < n$. Then there exists a constant $C$, depending only on $p$ and $n$, s.t.
\[ \|u\|_{L^{p^*}(\mathbb{R}^n)} \le C\|Du\|_{L^p(\mathbb{R}^n)} \quad \forall u \in C_c^1(\mathbb{R}^n). \]
Proof. First, assume $p = 1$. Since $u$ has compact support, for each $i = 1, \ldots, n$ and $x \in \mathbb{R}^n$ we have
\[ u(x) = \int_{-\infty}^{x_i} u_{x_i}(x_1, \ldots, x_{i-1}, y_i, x_{i+1}, \ldots, x_n)\,dy_i, \]
which implies
\[ |u(x)| \le \int_{-\infty}^{\infty} |Du(x_1, \ldots, y_i, \ldots, x_n)|\,dy_i \quad (i = 1, \ldots, n), \]
giving
\[ |u(x)|^{\frac{n}{n-1}} \le \prod_{i=1}^{n}\left(\int_{-\infty}^{\infty} |Du(x_1, \ldots, y_i, \ldots, x_n)|\,dy_i\right)^{\frac{1}{n-1}}. \]
If we integrate this inequality with respect to $x_1$, we get
\begin{align*} \int_{-\infty}^{\infty} |u|^{\frac{n}{n-1}}\,dx_1 &\le \int_{-\infty}^{\infty}\prod_{i=1}^{n}\left(\int_{-\infty}^{\infty} |Du|\,dy_i\right)^{\frac{1}{n-1}} dx_1 \\ &= \left(\int_{-\infty}^{\infty} |Du|\,dy_1\right)^{\frac{1}{n-1}}\int_{-\infty}^{\infty}\prod_{i=2}^{n}\left(\int_{-\infty}^{\infty} |Du|\,dy_i\right)^{\frac{1}{n-1}} dx_1 \\ &\le \left(\int_{-\infty}^{\infty} |Du|\,dy_1\right)^{\frac{1}{n-1}}\left(\prod_{i=2}^{n}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |Du|\,dx_1\,dy_i\right)^{\frac{1}{n-1}}, \end{align*}
the last line following from the generalized Hölder inequality.
Integrating the resulting inequality with respect to $x_2$, and applying Hölder as above, yields a similar estimate. We continue to integrate out the other variables, ending up with the following:
\[ \int_{\mathbb{R}^n} |u|^{\frac{n}{n-1}}\,dx \le \prod_{i=1}^{n}\left(\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} |Du|\,dx_1\cdots dy_i\cdots dx_n\right)^{\frac{1}{n-1}} = \left(\int_{\mathbb{R}^n} |Du|\,dx\right)^{\frac{n}{n-1}}. \tag{8.2} \]
This gives us the proof for $p = 1$. Now consider the case $1 < p < n$. We apply the estimate (8.2) to $v := |u|^\gamma$, where $\gamma > 1$ is to be chosen. Then we have
\begin{align*} \left(\int_{\mathbb{R}^n} |u|^{\frac{\gamma n}{n-1}}\,dx\right)^{\frac{n-1}{n}} &\le \int_{\mathbb{R}^n} \left|D|u|^\gamma\right|\,dx = \gamma\int_{\mathbb{R}^n} |u|^{\gamma-1}|Du|\,dx \\ &\le \gamma\left(\int_{\mathbb{R}^n} |u|^{(\gamma-1)\frac{p}{p-1}}\,dx\right)^{\frac{p-1}{p}}\left(\int_{\mathbb{R}^n} |Du|^p\,dx\right)^{\frac{1}{p}}. \end{align*}
Choose $\gamma$ s.t. $\frac{\gamma n}{n-1} = (\gamma - 1)\frac{p}{p-1}$, giving us $\gamma = \frac{p(n-1)}{n-p} > 1$, which in turn implies $\frac{\gamma n}{n-1} = \frac{np}{n-p} = p^*$. This gives us
\[ \left(\int_{\mathbb{R}^n} |u|^{p^*}\,dx\right)^{\frac{1}{p^*}} \le C\left(\int_{\mathbb{R}^n} |Du|^p\,dx\right)^{\frac{1}{p}}, \]
from which the theorem follows.
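The $p = 1$ case can be sanity-checked numerically. In $n = 2$ with $p = 1$, $p^* = 2$, and the proof above gives the inequality with constant $C = 1$. The check below uses the Gaussian $u(x) = e^{-|x|^2}$ in polar coordinates; this is only a sketch ($u$ is not compactly supported, but the inequality extends to it by density).

```python
import numpy as np
from scipy.integrate import quad

# ||u||_{L^2(R^2)} <= ||Du||_{L^1(R^2)} for u(x) = exp(-|x|^2).
# In polar coordinates: u = exp(-r^2), |Du| = 2 r exp(-r^2).
u2_integral, _ = quad(lambda r: np.exp(-2 * r**2) * 2 * np.pi * r, 0, np.inf)
du1_integral, _ = quad(lambda r: 2 * r * np.exp(-r**2) * 2 * np.pi * r, 0, np.inf)

lhs = np.sqrt(u2_integral)        # ||u||_{L^2}, analytically sqrt(pi/2)
rhs = du1_integral                # ||Du||_{L^1}, analytically pi^{3/2}
```

Here $\|u\|_{L^2} = \sqrt{\pi/2} \approx 1.25$ while $\|Du\|_{L^1} = \pi^{3/2} \approx 5.57$, so the inequality holds with plenty of room.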
Theorem 8.2.2. Assume $U$ is a bounded open subset of $\mathbb{R}^n$, and $u \in W_0^{1,p}(U)$ for some $1 \le p < n$. Then
\[ \|u\|_{L^q(U)} \le C\|Du\|_{L^p(U)} \]
for each $q \in [1, p^*]$, the constant $C$ depending only on $p$, $q$, $n$ and $U$.
Proof. Since $u \in W_0^{1,p}(U)$, $\exists (u_m) \subset C_c^\infty(U)$ s.t. $u_m \to u$ in $W^{1,p}(U)$. Extend each $u_m$ to be $0$ on $\mathbb{R}^n \setminus \bar{U}$ and apply the previous inequality to discover $\|u\|_{L^{p^*}(U)} \le C\|Du\|_{L^p(U)}$. As $|U| < \infty$, Hölder's inequality gives $\|u\|_{L^q(U)} \le C\|u\|_{L^{p^*}(U)}$ for $1 \le q \le p^*$, and the claim follows.
Proceeding along these lines, one ends up with the embedding theorem, which can be stated as:
Theorem 8.2.3. (Sobolev Embedding Theorem). Let $k \ge l$ and $1 \le p < q < \infty$. If $\frac{1}{q} = \frac{1}{p} - \frac{k-l}{n}$, then $W^{k,p}(\mathbb{R}^n) \subseteq W^{l,q}(\mathbb{R}^n)$.
Most of the material presented in this section comes from [4]. For a more concise and extremely rigorous introduction to the theory of distributions and Sobolev spaces from a measure-theoretic point of view, see [5], which uses Radon measures and Fréchet spaces to prove results on distributions, and then applies some of them to Sobolev spaces.
8.2.2 Nash Inequality
In [8], Nash proved the following:
Theorem 8.2.4. (Nash Inequality). There exists $C > 0$ s.t. $\forall u \in L^1(\mathbb{R}^n) \cap W^{1,2}(\mathbb{R}^n)$,
\[ \|u\|^{1+2/n}_{L^2(\mathbb{R}^n)} \le C\,\|u\|^{2/n}_{L^1(\mathbb{R}^n)}\,\|Du\|_{L^2(\mathbb{R}^n)}. \]
To show this, we use a basic result from harmonic analysis called Parseval's theorem, which states that $\|u\|_{L^2} = \|\hat{u}\|_{L^2}$, where $\hat{u}$ is the Fourier transform of $u$. We can now prove the inequality.
Proof. The Fourier transform of a function $u$ can be written as
\[ \hat{u}(y) = (2\pi)^{-\frac{n}{2}}\int_{\mathbb{R}^n} u(x)e^{-ix\cdot y}\,dx. \]
The transform of $\frac{\partial u}{\partial x_k}$ is $iy_k\hat{u}$. This gives us the relation
\[ \int_{\mathbb{R}^n} \left|\frac{\partial u}{\partial x_k}\right|^2 dx = \int_{\mathbb{R}^n} y_k^2\,|\hat{u}|^2\,dy. \]
Since $|Du|^2 = \sum_k \left(\frac{\partial u}{\partial x_k}\right)^2$, this implies that
\[ \int_{\mathbb{R}^n} |Du|^2\,dx = \int_{\mathbb{R}^n} |y|^2\,|\hat{u}|^2\,dy. \]
Note also that
\[ |\hat{u}(y)| \le (2\pi)^{-\frac{n}{2}}\int_{\mathbb{R}^n} |e^{-ix\cdot y}|\,|u|\,dx = (2\pi)^{-\frac{n}{2}}\int_{\mathbb{R}^n} |u|\,dx = (2\pi)^{-\frac{n}{2}}\|u\|_{L^1(\mathbb{R}^n)}. \]
Now pick any radius $\rho > 0$. We then have
\[ \int_{|y|\le\rho} |\hat{u}(y)|^2\,dy \le \rho^n\,\omega_n\,(2\pi)^{-n}\,\|u\|^2_{L^1}, \]
where $\omega_n$ is the volume of the unit $n$-ball. On the other hand,
\[ \int_{|y|>\rho} |\hat{u}(y)|^2\,dy \le \int_{|y|>\rho} \frac{|y|^2}{\rho^2}\,|\hat{u}(y)|^2\,dy \le \frac{1}{\rho^2}\int_{\mathbb{R}^n} |Du|^2\,dx. \]
By Parseval's theorem, $\|u\|^2_{L^2}$ is the sum of the two left-hand integrals. Choose $\rho$ to minimize the sum of the two bounds. After some manipulation, this gives us
\[ \int_{\mathbb{R}^n} |Du|^2\,dx \ge c_n\left(\int_{\mathbb{R}^n} |u|^2\,dx\right)^{1+\frac{2}{n}}\left(\int_{\mathbb{R}^n} |u|\,dx\right)^{-\frac{4}{n}} \]
for an explicit dimensional constant $c_n > 0$, which is the Nash inequality.
The proof of this inequality is interesting because it is quite elementary. Nash originally
introduced it to prove the continuity of solutions of certain nonlinear PDEs.
8.2.3 Poincaré Inequality
Theorem 8.2.5. Assume $1 \le p \le \infty$, and that $U$ is a bounded open subset of $\mathbb{R}^n$ with a $C^1$ boundary $\partial U$. Then there exists $C > 0$, depending only on $U$, $n$ and $p$, s.t. for any $u \in W^{1,p}(U)$,
$$\|u - u_U\|_{L^p(U)} \le C \|Du\|_{L^p(U)},$$
where $u_U = \frac{1}{|U|} \int_U u(x)\, dx$.

Note that the main difference between this inequality and the previous ones is that both norms in the relation carry the same exponent $p$. Proving this requires a compactness result that we won't go into here.
A different version of this inequality can be stated as follows:

Theorem 8.2.6. Assume $U$ is a bounded open subset of $\mathbb{R}^n$. Then there exists $C > 0$ s.t.
$$\|u\|_{L^2(U)} \le C\, \|Du\|_{L^2(U)} \qquad \forall u \in H^1_0(U).$$
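As a numerical sanity check (not a proof) of Theorem 8.2.6, take $U = (0,1)$ and $u(x) = x(1-x) \in H^1_0(0,1)$. On this interval the optimal constant is $C = 1/\pi$ — an assumed fact here, coming from the spectrum of $-d^2/dx^2$ with Dirichlet boundary conditions:

```python
import numpy as np

# Numerical check of the second Poincare inequality on U = (0,1) for the
# H^1_0 function u(x) = x(1-x). The constant C = 1/pi is an assumed fact
# (spectrum of -d^2/dx^2 with Dirichlet boundary conditions).
x = np.linspace(0.0, 1.0, 100_001)
h = x[1] - x[0]
u = x * (1 - x)            # u(0) = u(1) = 0, so u lies in H^1_0(0,1)
du = 1 - 2 * x             # classical derivative of u

def l2_norm(f):
    # trapezoid rule approximation of (int_0^1 f^2 dx)^(1/2)
    return np.sqrt(h * (np.sum(f**2) - 0.5 * f[0]**2 - 0.5 * f[-1]**2))

lhs = l2_norm(u)           # ||u||_{L^2} = sqrt(1/30) exactly
rhs = l2_norm(du) / np.pi  # C ||Du||_{L^2} with C = 1/pi
print(lhs <= rhs)          # the inequality holds, and is nearly tight here
```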
8.3 Weak Solutions
We now go over the concept of a weak solution for a particular kind of PDE. We don't delve too much into the details, and don't give a completely general treatment, focusing only on a certain Sobolev space, i.e. $H^k(U)$, mainly because these are Hilbert spaces, and we can apply our earlier functional analysis results.
8.3.1 Elliptic PDEs
A second order elliptic PDE can be written in divergence form as
$$\begin{cases} Lu = f & \text{in } U \subset \mathbb{R}^n; \\ u = 0 & \text{on } \partial U. \end{cases} \tag{8.4}$$
To give a concrete example, the Dirichlet boundary value problem can be written with
$$Lu = -\sum_{i,j=1}^n \big( a^{ij}(x)\, u_{x_i} \big)_{x_j} + \sum_{i=1}^n b^i(x)\, u_{x_i} + c(x)\, u.$$
Here, the assumption is that all the functions $a^{ij}, b^i, c \in L^\infty(U)$ for $i, j = 1, \dots, n$, $f \in L^2(U)$, and each value $a^{ij}(x) = a^{ji}(x)$, $x \in U$.

As mentioned earlier, we say that the operator $L$ is elliptic in $U$ if $\exists\, \theta > 0$ s.t.
$$\sum_{i,j=1}^n a^{ij}(x)\, \xi_i \xi_j \ge \theta |\xi|^2 \qquad \forall\, \xi = (\xi_1, \dots, \xi_n) \in \mathbb{R}^n.$$
This is equivalent to saying that the matrix $(a^{ij}) \ge \theta I$, or equivalently, that all the eigenvalues of the matrix are greater than or equal to $\theta$.
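For constant coefficients, the ellipticity condition is just a statement about the smallest eigenvalue of the symmetric matrix $(a^{ij})$. A quick numerical illustration (the example matrix is our own choice, not from the text):

```python
import numpy as np

# Ellipticity check for a constant-coefficient example: the quadratic
# form sum a_ij xi_i xi_j >= theta |xi|^2 holds iff the smallest
# eigenvalue of the symmetric matrix A = (a_ij) is at least theta.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])            # example coefficients (our choice)
theta = np.linalg.eigvalsh(A).min()   # best possible ellipticity constant
print(theta)                          # eigenvalues of A are 1 and 3

# Spot-check the quadratic form on random directions xi.
rng = np.random.default_rng(0)
xi = rng.standard_normal((1000, 2))
forms = np.einsum('ni,ij,nj->n', xi, A, xi)
ok = bool(np.all(forms >= theta * np.sum(xi**2, axis=1) - 1e-9))
print(ok)
```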
8.3.2 The notion of a weak solution
Suppose $u$ is a classical solution of $Lu = f$ (for our purposes, let's say $u \in C^2(U)$ and the equation is satisfied everywhere). Thus, we have
$$-\sum_{i,j=1}^n \big( a^{ij}(x)\, u_{x_i} \big)_{x_j} + \sum_{i=1}^n b^i(x)\, u_{x_i} + c(x)\, u = f(x) \qquad \forall x \in U.$$
Take $\phi \in C^\infty_0(U)$. Then, multiplying by $\phi$ and integrating over $U$,
$$\int_U \Big( -\sum_{i,j=1}^n \big( a^{ij}(x)\, u_{x_i} \big)_{x_j} \phi + \sum_{i=1}^n b^i(x)\, u_{x_i} \phi + c(x)\, u \phi \Big)\, dx = \int_U f \phi\, dx,$$
and integrating the first term by parts (the boundary term vanishes since $\phi$ has compact support),
$$\int_U \Big( \sum_{i,j=1}^n a^{ij}(x)\, u_{x_i} \phi_{x_j} + \sum_{i=1}^n b^i(x)\, u_{x_i} \phi + c(x)\, u \phi \Big)\, dx = \int_U f \phi\, dx. \tag{8.5}$$
Recalling the definition of $H^1_0(U)$, we can see clearly that the above equation also holds for every $\phi \in H^1_0(U)$.

We now define the bilinear functional associated with the operator $L$, namely
$$B[u, v] = \int_U \Big( \sum_{i,j=1}^n a^{ij}(x)\, u_{x_i} v_{x_j} + \sum_{i=1}^n b^i(x)\, u_{x_i} v + c(x)\, u v \Big)\, dx.$$
We have that $B : H^1_0(U) \times H^1_0(U) \to \mathbb{R}$. Here $u_{x_i}$ and $v_{x_j}$ are derivatives in the weak sense.

We say that $u \in H^1_0(U)$ is a weak solution of the PDE (8.4) if
$$B[u, v] = \langle f, v \rangle_{L^2(U)} \qquad \forall v \in H^1_0(U).$$
Notice how this ties in neatly with equation (8.5). We can define more general weak solutions, but for these notes, this is enough.
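To make the weak formulation concrete, here is a minimal Galerkin sketch in one dimension: take $L = -d^2/dx^2$ (i.e. $a^{11} = 1$, $b = 0$, $c = 0$) on $U = (0,1)$ with $f \equiv 1$, replace $H^1_0$ by the span of piecewise-linear hat functions, and solve $B[u, v] = \langle f, v \rangle$ on that finite-dimensional subspace. All discretization choices below are ours, not the text's:

```python
import numpy as np

# Galerkin approximation of the weak problem B[u, v] = <f, v> for
# -u'' = 1 on (0,1), u(0) = u(1) = 0, with hat functions on a uniform
# grid. Here B[u, v] = int u' v' dx, giving a tridiagonal stiffness matrix.
N = 9                       # number of interior nodes (assumed grid size)
h = 1.0 / (N + 1)
nodes = np.linspace(h, 1 - h, N)

# Stiffness matrix: B[phi_i, phi_j] = (1/h) * (2 on diag, -1 off diag).
K = (np.diag(2 * np.ones(N)) - np.diag(np.ones(N - 1), 1)
     - np.diag(np.ones(N - 1), -1)) / h
b = h * np.ones(N)          # load vector <f, phi_i> for f = 1

u_h = np.linalg.solve(K, b)
u_exact = nodes * (1 - nodes) / 2   # classical solution of -u'' = 1
print(np.max(np.abs(u_h - u_exact)))  # linear elements are nodally exact here
```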
8.3.3 An existence result
Theorem 8.3.1. Assume that $b^i \equiv 0$ and $c(x) \ge 0$ $\forall x \in U$. Then there exists a unique weak solution of the PDE (8.4).

Proof. The first step is to prove the boundedness of the bilinear form $B$. We have that
$$|B[u, v]| = \Big| \int_U \Big( \sum_{i,j=1}^n a^{ij}(x)\, u_{x_i} v_{x_j} + c(x)\, u v \Big)\, dx \Big| \le C_1 \int_U \sum_{i,j=1}^n |u_{x_i}| |v_{x_j}|\, dx + C_2 \int_U |u| |v|\, dx$$
$$\le C_1 \|Du\|_{L^2(U)} \|Dv\|_{L^2(U)} + C_2 \|u\|_{L^2(U)} \|v\|_{L^2(U)} \le C \|u\|_{H^1(U)} \|v\|_{H^1(U)}.$$
The second step is to prove the ellipticity (coercivity) of $B$. Here, we have
$$B[u, u] = \int_U \sum_{i,j=1}^n a^{ij}(x)\, u_{x_i} u_{x_j}\, dx + \int_U c(x)\, u^2\, dx \ge \theta \int_U |Du|^2\, dx.$$
Note that
$$\|u\|^2_{H^1(U)} = \int_U \sum_i u_{x_i}^2\, dx + \int_U u^2\, dx = \int_U |Du|^2\, dx + \int_U u^2\, dx.$$
From the second Poincaré inequality, we have that
$$\int_U |Du|^2\, dx \ge C \int_U u^2\, dx.$$
Therefore, after some rearranging, we end up with $B[u, u] \ge \beta \|u\|^2_{H^1(U)}$ for some constant $\beta > 0$.

For the given $f \in L^2(U)$, set $\hat f(v) := \langle f, v \rangle_{L^2(U)}$. This is a bounded linear functional on $H^1_0(U)$. But this means that the conditions of the Lax-Milgram theorem are satisfied, meaning that there exists some unique $u \in H^1_0(U)$ s.t. $B[u, v] = \hat f(v) = \langle f, v \rangle$ $\forall v \in H^1_0(U)$. Since this is our definition of a weak solution, we are done.
8.3.4 Further reading
We have given just the tip of the iceberg as far as existence theorems go.

Regularity results in PDEs (i.e. trying to determine the smoothness of a solution) are notoriously difficult, but are a major source of current research.

Notice something interesting. We have defined PDEs as operators between spaces, i.e. $A : X \to Y$. However, each Sobolev space is built on locally integrable functions. What this means is that we have freed ourselves completely from just relying on Euclidean space! There is nothing stopping us from applying this theory to more complicated spaces, such as manifolds, and indeed, this is exactly where PDEs become crucial in differential geometry, which is the study of smooth functions on manifolds — spaces that are locally Euclidean, but more complicated overall.

The rigorous theory of PDEs is outlined in countless books, and has had an impact on many areas of mathematics. It's beyond the scope of these notes to do a literature survey. However, the most common text for beginning graduate students in mathematics interested in questions of existence and smoothness is probably [4].
Chapter 9
Reproducing Kernel Hilbert Spaces
We outline some of the basics of the theory of Reproducing Kernel Hilbert Spaces (RKHSs),
and then give an explicit representation of the basis for Gaussian RKHSs. A familiarity with basic complex analysis is assumed in this section.
9.1 Definition
Suppose you are given a set $X$, which is usually taken to be $\mathbb{R}^d$, $\mathbb{C}^d$, or a subset of either. Consider the set of functions
$$\mathcal{F}(X, \mathbb{F}) = \{ f \mid f : X \to \mathbb{F} \},$$
where $\mathbb{F}$ is a general field. If we equip these functions with the operations of addition and scalar multiplication, we end up with a vector space over this field.

Definition 9.1.1. Given a set $X$, we have that $\mathcal{H}$ is a reproducing kernel Hilbert space (RKHS) on $X$ over $\mathbb{F}$ if
1. $\mathcal{H}$ is a vector subspace of $\mathcal{F}(X, \mathbb{F})$.
2. $\mathcal{H}$ has an inner product $\langle \cdot, \cdot \rangle$.
3. Given $x \in X$, the linear functional $E_x : \mathcal{H} \to \mathbb{F}$ defined by $E_x[f] = f(x)$ is bounded, and thus continuous.

It's easy to show the completeness of this space, and since we have a definition of an inner product, we have that $\mathcal{H}$ is a Hilbert space. Further, from basic functional analysis, we know that $\mathcal{H}$ is self-dual, and thus the set of functionals $E_x$ can be identified with elements of $\mathcal{H}$. By the Riesz representation theorem, for any $x \in X$ there exists a unique vector $k_x \in \mathcal{H}$ s.t. $\forall f \in \mathcal{H}$, $f(x) = \langle f, k_x \rangle$. This function $k_x$ is called the reproducing kernel for the point $x$, and the function $K(y, x) = k_x(y)$ is called the reproducing kernel for the space $\mathcal{H}$.

Note that we have the properties
$$K(y, x) = k_x(y) = \langle k_x, k_y \rangle$$
and
$$\|E_x\|^2 = \langle k_x, k_x \rangle = K(x, x).$$
Due to the nature of the inner product, the function $K$ is clearly Hermitian.
9.2 An example
One thing one immediately suspects is that this Hilbert space is, in a sense, even nicer than the usual examples of Hilbert spaces that one encounters. Consider the set of continuous functions on $[0,1]$, denoted by $C[0,1]$. Define the 2-norm on this space, $\|f\|_2 = \big( \int_0^1 |f(t)|^2\, dt \big)^{1/2}$, and complete it to get the Hilbert space $L^2[0,1]$. From standard measure theory, we know that one cannot really talk about functions in $L^2[0,1]$ as having values at single points (or even on sets of measure zero), because in a sense, these are unimportant from the measure's point of view. Therefore this space cannot be an RKHS¹.
However, let's consider a simple Sobolev space on $[0,1]$. Let $\mathcal{H}$ be the space of all functions $f : [0,1] \to \mathbb{R}$ s.t. $f$ is absolutely continuous (a.c.), $f(0) = f(1) = 0$ and $f' \in L^2[0,1]$ (a.c. functions are differentiable almost everywhere and are equal to the integral of their derivative). Endow $\mathcal{H}$ with the form $\langle f, g \rangle = \int_0^1 f'(t) g'(t)\, dt$. Since $f$ is a.c. and $f(0) = 0$, for any $x$ in this interval we have $f(x) = \int_0^x f'(t)\, dt = \int_0^1 f'(t)\, \chi_{[0,x]}(t)\, dt$. Using Cauchy-Schwarz, we have
$$|f(x)| \le \Big( \int_0^1 f'(t)^2\, dt \Big)^{1/2} \Big( \int_0^1 \chi_{[0,x]}(t)\, dt \Big)^{1/2} = \|f\| \sqrt{x}.$$
This means that $\|f\| = 0$ iff $f = 0$, implying that the form $\langle \cdot, \cdot \rangle$ is an inner product. Further, we have that $|f(x)| = |E_x[f]| \le \|f\| \sqrt{x}$. Taking the sup over $\|f\| \le 1$, we have that $\|E_x\| \le \sqrt{x}$, and $E_x$ is thus bounded. Therefore, the last thing we need to show is that the space is complete.
Consider a Cauchy sequence $(f_n)$. If this sequence is Cauchy in this norm, we have that $(f'_n)$ is Cauchy in $L^2[0,1]$, and thus there exists $g \in L^2[0,1]$ that the latter sequence converges to. Using the above inequality, we see that $(f_n)$ is pointwise Cauchy, and thus we may define a function by setting $f(x) = \lim_n f_n(x)$. Since $f(x) = \lim_n f_n(x) = \lim_n \int_0^x f'_n(t)\, dt = \int_0^x g(t)\, dt$, $f$ is a.c. and $f' \in L^2[0,1]$. Also, $f(0) = \lim_n f_n(0) = 0 = \lim_n f_n(1) = f(1)$, and thus $f$ lies in $\mathcal{H}$, implying the space is complete.

One can explicitly find a specific reproducing kernel for this space using Green's functions, although we don't do so here.
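The text stops short of computing the kernel; for illustration, a standard candidate (an assumption here, via the Green's function of $-d^2/dx^2$ with Dirichlet boundary conditions) is $K(x, y) = x(1-y)$ for $x \le y$ and $y(1-x)$ otherwise. One can check the reproducing property $\langle f, k_y \rangle = \int_0^1 f'(s)\, k_y'(s)\, ds = f(y)$ numerically:

```python
import numpy as np

# Check <f, k_y> = f(y) in H = {f a.c., f(0)=f(1)=0, f' in L^2} with
# <f, g> = int f' g'. Assumed kernel (Dirichlet Green's function of -d^2/dx^2):
# k_y(s) = s(1-y) for s <= y, y(1-s) for s > y, so k_y'(s) = (1-y) or -y.
y = 0.3
s = np.linspace(0.0, 1.0, 30_001)          # grid containing y exactly
f = np.sin(np.pi * s)                       # f(0) = f(1) = 0, f' in L^2
fprime = np.pi * np.cos(np.pi * s)
kprime = np.where(s < y, 1.0 - y, -y)       # derivative of k_y (jump at s = y)

h = s[1] - s[0]
g = fprime * kprime
inner = h * (np.sum(g) - 0.5 * g[0] - 0.5 * g[-1])  # trapezoid rule
print(inner, np.sin(np.pi * y))             # both approx 0.80902
```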
Other prominent examples of explicit RKHSs include the Hardy space of the unit disk and Bergman spaces on complex domains. These spaces are heavily used in complex analysis, and as we will see shortly, RKHSs allow for the introduction of many concepts from this area — a viewpoint that's not really emphasized in the modern study of these spaces in connection with learning. Nachman Aronszajn's original paper [2] used this viewpoint, influenced by Stefan Bergman's notion of kernels of classes of analytic functions.
¹ Also, it's possible to construct unbounded linear Dirac functionals on $L^2$.
9.3 Properties
9.3.1 Complexification
Suppose $\mathcal{H}$ is an RKHS of real-valued functions on a set $X$ with reproducing kernel $K(x, y)$. Create a space $\mathcal{W} = \{ f_1 + i f_2 : f_1, f_2 \in \mathcal{H} \}$, which is a vector space of complex-valued functions on $X$. Set the following inner product:
$$\langle f_1 + i f_2,\, g_1 + i g_2 \rangle_{\mathcal{W}} = \langle f_1, g_1 \rangle_{\mathcal{H}} + i \langle f_2, g_1 \rangle_{\mathcal{H}} - i \langle f_1, g_2 \rangle_{\mathcal{H}} + \langle f_2, g_2 \rangle_{\mathcal{H}},$$
with the norm
$$\| f_1 + i f_2 \|^2_{\mathcal{W}} = \| f_1 \|^2_{\mathcal{H}} + \| f_2 \|^2_{\mathcal{H}}.$$
Thus $\mathcal{W}$ is a Hilbert space, and since $f_1(y) + i f_2(y) = \langle f_1 + i f_2, k_y \rangle_{\mathcal{W}}$, $\mathcal{W}$ equipped with this inner product is an RKHS with reproducing kernel $K(x, y)$. $\mathcal{W}$ is called the complexification of $\mathcal{H}$, and since every real-valued RKHS can be complexified, for the general theory we only consider RKHSs over the complex field.
9.3.2 General Theory
Proposition 9.3.1. Let $\mathcal{H}$ be an RKHS on the set $X$ with kernel $K$. Then the linear span of the functions $\{ k_x : x \in X \}$ is dense in $\mathcal{H}$.

Proof. A function $f \in \mathcal{H}$ is orthogonal to the span of $\{ k_x : x \in X \}$ iff $\langle f, k_x \rangle = f(x) = 0$ for every $x \in X$, meaning $f = 0$.
Proposition 9.3.2. Let $\mathcal{H}$ be an RKHS on $X$ and let $(f_n) \subset \mathcal{H}$. If $\lim_n \|f_n - f\| = 0$, then $f(x) = \lim_n f_n(x)$ for every $x \in X$.

Proof. We have that $|f_n(x) - f(x)| = |\langle f_n - f, k_x \rangle| \le \|f_n - f\| \|k_x\| \to 0$.

This is an unusual and important property that's not true in most spaces.
Proposition 9.3.3. Let $\mathcal{H}_i$, $i = 1, 2$ be RKHSs on $X$ with kernels $K_i(x, y)$, $i = 1, 2$. If $K_1(x, y) = K_2(x, y)$ $\forall x, y \in X$, then $\mathcal{H}_1 = \mathcal{H}_2$ and $\|f\|_1 = \|f\|_2$.

Proof. Let $K(x, y) = K_1(x, y) = K_2(x, y)$ and define $\mathcal{W}_i = \operatorname{span}\{ k_x \} \subseteq \mathcal{H}_i$. By the above result, $\mathcal{W}_i$ is dense in $\mathcal{H}_i$. If $f \in \mathcal{W}_i$, we have that $f(x) = \sum_j \alpha_j k_{x_j}(x)$. Therefore, its values as a function are independent of whether it's in $\mathcal{W}_1$ or $\mathcal{W}_2$. Also,
$$\|f\|_1^2 = \sum_{i,j} \alpha_i \bar\alpha_j \langle k_{x_i}, k_{x_j} \rangle = \sum_{i,j} \alpha_i \bar\alpha_j K(x_j, x_i) = \|f\|_2^2.$$
So $f \in \mathcal{W}_1 = \mathcal{W}_2$.

Now suppose $f \in \mathcal{H}_1$. Then there exists a sequence of functions $(f_n) \subset \mathcal{W}_1$ with $\|f - f_n\|_1 \to 0$. Since $(f_n)$ is Cauchy in $\mathcal{W}_1$, it's also Cauchy in $\mathcal{W}_2$, implying there exists $g \in \mathcal{H}_2$ s.t. $\|g - f_n\|_2 \to 0$. Therefore $f(x) = \lim_n f_n(x) = g(x)$, which in turn implies that if $f \in \mathcal{H}_1$, it's also in $\mathcal{H}_2$, and vice versa. Hence $\mathcal{H}_1 = \mathcal{H}_2$. Finally, since the norms of functions are equal on a dense subset, they're also equal for every $f$.
Recall the Parseval identities: given $\{ e_s : s \in S \}$, an orthonormal basis (ONB) for a Hilbert space $\mathcal{H}$, for any $h \in \mathcal{H}$ we have $\|h\|^2 = \sum_{s \in S} |\langle h, e_s \rangle|^2$ and $h = \sum_{s \in S} \langle h, e_s \rangle e_s$.
Theorem 9.3.1. Let $\mathcal{H}$ be an RKHS on $X$ with reproducing kernel $K(x, y)$. If $\{ e_s : s \in S \}$ is an orthonormal basis for $\mathcal{H}$, then $K(x, y) = \sum_{s \in S} \overline{e_s(y)}\, e_s(x)$, where this series converges pointwise.

Proof. For any $y \in X$, we have that $\langle k_y, e_s \rangle = \overline{\langle e_s, k_y \rangle} = \overline{e_s(y)}$. Hence $k_y = \sum_{s \in S} \overline{e_s(y)}\, e_s$, where these sums converge in the norm of $\mathcal{H}$. But this means they converge pointwise, giving us
$$K(x, y) = k_y(x) = \sum_{s \in S} \overline{e_s(y)}\, e_s(x).$$
9.3.3 Characterization of Reproducing Kernels
We would like to obtain necessary and sufficient conditions for a function $K(x, y)$ to be the reproducing kernel of some RKHS.

Definition 9.3.1. Let $X$ be a set and let $K : X \times X \to \mathbb{C}$ be a function of two variables. We call $K$ a kernel function if given any set of $n$ points $x_1, \dots, x_n \in X$ and any scalars $\alpha_1, \dots, \alpha_n \in \mathbb{C}$, we have
$$\sum_{i,j=1}^n \bar\alpha_i \alpha_j K(x_i, x_j) \ge 0,$$
i.e. the matrix $(K(x_i, x_j))_{i,j}$ is positive semidefinite.

We want to show that a function is a kernel function iff there is an RKHS for which it's the reproducing kernel.

Proposition 9.3.4. Let $X$ be a set and let $\mathcal{H}$ be an RKHS on $X$ with reproducing kernel $K$. Then $K$ is a kernel function.

Proof. Fix $x_1, \dots, x_n \in X$ and $\alpha_i \in \mathbb{C}$. Then we have that
$$\sum_{i,j} \bar\alpha_i \alpha_j K(x_i, x_j) = \Big\langle \sum_j \alpha_j k_{x_j},\, \sum_i \alpha_i k_{x_i} \Big\rangle = \Big\| \sum_j \alpha_j k_{x_j} \Big\|^2 \ge 0.$$
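Proposition 9.3.4 says the Gram matrix $(K(x_i, x_j))$ of any reproducing kernel is positive semidefinite. For a kernel with a known closed form, such as the Gaussian kernel of Section 9.4, this is easy to test numerically (the sample points and bandwidth are arbitrary choices of ours):

```python
import numpy as np

# Positive semidefiniteness of the Gram matrix (K(x_i, x_j)) for the
# real Gaussian kernel K(x, x') = exp(-sigma^2 |x - x'|^2).
rng = np.random.default_rng(1)
x = rng.standard_normal(50)                   # arbitrary sample points
sigma = 0.7
G = np.exp(-sigma**2 * (x[:, None] - x[None, :])**2)

min_eig = np.linalg.eigvalsh(G).min()         # G is real symmetric
print(min_eig)                                # nonnegative up to roundoff

# Equivalently, the quadratic form sum_ij a_i a_j K(x_i, x_j) >= 0.
a = rng.standard_normal(50)
quad = a @ G @ a
print(quad >= -1e-9)
```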
The converse takes a lot more work to prove, although the underlying principle is quite simple.

Theorem 9.3.2. (Moore). Let $X$ be a set and let $K : X \times X \to \mathbb{C}$ be a function. If $K$ is a kernel function, then there exists an RKHS $\mathcal{H}$ of functions on $X$ s.t. $K$ is the reproducing kernel of $\mathcal{H}$.

Proof. For each $y \in X$, set $k_y(x) = K(x, y)$ and let $W \subseteq \mathcal{F}(X)$ be the space spanned by the set $\{ k_y : y \in X \}$ of these functions. We claim there's a well-defined map $B : W \times W \to \mathbb{C}$ given by
$$B\Big( \sum_j \alpha_j k_{y_j},\, \sum_i \beta_i k_{y_i} \Big) = \sum_{i,j} \alpha_j \bar\beta_i\, K(y_i, y_j),$$
where the $\alpha_j$ and $\beta_i$ are scalars.

To see whether $B$ is well defined on $W$, we must show that if $f(x) = \sum_j \alpha_j k_{y_j}(x)$ is identically zero as a function on $X$, then $B(f, w) = B(w, f) = 0$ for any $w \in W$. Since $W$ is spanned by the functions $k_y$, it's enough to show that $B(f, k_y) = B(k_y, f) = 0$ for every $y$. From the definition, we have $B(f, k_y) = \sum_j \alpha_j K(y, y_j) = f(y) = 0$. Conversely, if $B(f, w) = 0$ for every $w \in W$, then (taking $w = k_y$) we see that $f(y) = 0$ for every $y$. Thus $B(f, w) = 0$ $\forall w \in W$ iff $f$ is identically zero as a function on $X$. Thus $B$ is well defined, and obviously sesquilinear. Moreover, for any $f \in W$ we have that $f(x) = B(f, k_x)$. Since $K$ is a kernel function, for any $f = \sum_j \alpha_j k_{y_j}$ we have that $B(f, f) = \sum_{i,j} \bar\alpha_i \alpha_j K(y_i, y_j) \ge 0$. Therefore $B$ defines a semidefinite inner product on $W$, and by a proof similar to that of Cauchy-Schwarz, we see that $B(f, f) = 0$ implies $B(w, f) = B(f, w) = 0$ $\forall w \in W$. Therefore $B(f, f) = 0$ iff $f$ is identically $0$. Thus $B$ is an inner product on $W$.
Given any inner product on a vector space, we can complete the space by taking equivalence classes of Cauchy sequences of functions from $W$ to obtain a Hilbert space $\mathcal{H}$. We need to show that every element of $\mathcal{H}$ is actually a function on $X$ itself. Let $h \in \mathcal{H}$, and let $(f_n) \subset W$ be a Cauchy sequence that converges to $h$. By Cauchy-Schwarz,
$$|f_n(x) - f_m(x)| = |B(f_n - f_m, k_x)| \le \|f_n - f_m\| \sqrt{K(x, x)}.$$
Thus the sequence is pointwise Cauchy and we may define $h(x) = \lim_n f_n(x)$. This value is independent of the particular Cauchy sequence chosen.

Finally, if we let $\langle \cdot, \cdot \rangle$ denote the inner product on $\mathcal{H}$, then for $h$ we have $\langle h, k_y \rangle = \lim_n \langle f_n, k_y \rangle = \lim_n B(f_n, k_y) = \lim_n f_n(y) = h(y)$. Thus $\mathcal{H}$ is an RKHS on $X$, and since $k_y$ is the reproducing kernel for the point $y$, we have that $K(x, y) = k_y(x)$ is the reproducing kernel for $\mathcal{H}$.
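Moore's construction builds $\mathcal{H}$ out of finite combinations $\sum_j \alpha_j k_{y_j}$. Computationally this is exactly kernel interpolation: given target values $v_i$ at points $y_i$, solving the Gram system $G\alpha = v$ produces the element $f = \sum_j \alpha_j k_{y_j}$ of $W$ whose evaluations reproduce the data. A sketch with the Gaussian kernel (all the concrete choices below are ours):

```python
import numpy as np

# Moore's construction in finite form: f = sum_j alpha_j k_{y_j} with
# the alphas chosen so that f matches prescribed values at the y_i.
# Since f(y_i) = sum_j alpha_j K(y_i, y_j), this is the Gram system.
kernel = lambda a, b: np.exp(-(a[:, None] - b[None, :])**2)  # Gaussian, sigma = 1

y = np.array([-1.0, -0.3, 0.2, 0.8, 1.5])   # points y_j (our choice)
v = np.array([0.0, 1.0, 0.5, -1.0, 2.0])    # target values f(y_j)

G = kernel(y, y)                 # Gram matrix K(y_i, y_j)
alpha = np.linalg.solve(G, v)    # coefficients of f in span{k_{y_j}}

f = lambda t: kernel(np.atleast_1d(t), y) @ alpha
print(f(y))                      # reproduces v up to roundoff
```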
9.3.4 Relating the C-RKHS with the R-RKHS of a Real-Valued
Kernel
We have the following:

Proposition 9.3.5. Let $K : X \times X \to \mathbb{C}$ be a kernel and $\mathcal{H}$ its corresponding $\mathbb{C}$-RKHS. If we have $K(x, x') \in \mathbb{R}$ for all $x, x' \in X$, then
$$\mathcal{H}_{\mathbb{R}} := \{ f : X \to \mathbb{R} \mid \exists\, g \in \mathcal{H} \text{ with } \operatorname{Re} g = f \},$$
equipped with the norm
$$\|f\|_{\mathcal{H}_{\mathbb{R}}} := \inf\{ \|g\|_{\mathcal{H}} : g \in \mathcal{H} \text{ with } \operatorname{Re} g = f \}$$
for $f \in \mathcal{H}_{\mathbb{R}}$, is the $\mathbb{R}$-RKHS of the $\mathbb{R}$-valued kernel $K$.

The proof of this is somewhat technical and unilluminating, and so we omit it from these notes.
9.4 The Gaussian RBF Kernel
We now focus on the construction of an explicit ONB for the RKHS generated by the Gaussian RBF kernel². Here we denote the $j$-th component of a vector $z \in \mathbb{C}^d$ by $z_j$. The complex Gaussian kernel is given by
$$K_{\sigma, \mathbb{C}^d}(z, z') := \exp\Big( -\sigma^2 \sum_{j=1}^d (z_j - \bar z'_j)^2 \Big)$$
for $d \in \mathbb{N}$, $\sigma > 0$, and $z, z' \in \mathbb{C}^d$, meaning that $K_{\sigma, \mathbb{C}^d}$ is a complex-valued kernel on $\mathbb{C}^d$. Of course, the real Gaussian kernel is given by
$$K_\sigma(x, x') = \exp( -\sigma^2 \|x - x'\|_2^2 )$$
for all $x, x' \in \mathbb{R}^d$.

² All the results in this section are taken from [11].
One thing that an astute reader may wonder about is why we're bothering to consider the complex-valued case, given that it's of almost no use in learning. The reason for this is that the complex version of the kernel is relatively easy to handle via basic theorems from complex analysis. After this is done, the real version falls out as a special case.
9.4.1 The Space $\mathcal{H}_{\sigma, \mathbb{C}^d}$
Our goal is to find an appropriate Hilbert space and to prove that the given space is the RKHS of the Gaussian kernel. This section introduces that space.

Let $\sigma > 0$ and $d \in \mathbb{N}$. For a given holomorphic function $f : \mathbb{C}^d \to \mathbb{C}$, we define
$$\|f\|_{\sigma, \mathbb{C}^d} := \Big( \frac{2^d \sigma^{2d}}{\pi^d} \int_{\mathbb{C}^d} |f(z)|^2\, e^{\sigma^2 \sum_{j=1}^d (z_j - \bar z_j)^2}\, dz \Big)^{1/2},$$
where $dz$ stands for the complex Lebesgue measure on $\mathbb{C}^d$ (note that the exponent is a nonpositive real number, since $(z_j - \bar z_j)^2 = -4 (\operatorname{Im} z_j)^2$). The space is then given by
$$\mathcal{H}_{\sigma, \mathbb{C}^d} := \{ f : \mathbb{C}^d \to \mathbb{C} \mid f \text{ holomorphic and } \|f\|_{\sigma, \mathbb{C}^d} < \infty \}.$$
This is a complex function space with pre-Hilbert norm $\| \cdot \|_{\sigma, \mathbb{C}^d}$. We now need to show that $\mathcal{H}_{\sigma, \mathbb{C}^d}$ is an RKHS. For this, we need the following lemma:
Lemma 9.4.1. For all $d \in \mathbb{N}$, all holomorphic functions $f : \mathbb{C}^d \to \mathbb{C}$, all $r_1, \dots, r_d > 0$, and all $z \in \mathbb{C}^d$, we have
$$|f(z)|^2 \le \frac{1}{(2\pi)^d} \int_0^{2\pi} \cdots \int_0^{2\pi} |f(z_1 + r_1 e^{i\theta_1}, \dots, z_d + r_d e^{i\theta_d})|^2\, d\theta_1 \cdots d\theta_d. \tag{9.1}$$
Proof. We prove this by induction over $d$. For $d = 1$, apply Hardy's convexity theorem, which states that the function $r \mapsto \frac{1}{2\pi} \int_0^{2\pi} |f(z + r e^{i\theta})|^2\, d\theta$ is nondecreasing on $[0, \infty)$.

In the inductive step, suppose the statement holds up to $d$. Let $f : \mathbb{C}^{d+1} \to \mathbb{C}$ be a holomorphic function and choose $r_1, \dots, r_{d+1}$. Since for fixed $(z_1, \dots, z_d) \in \mathbb{C}^d$ the function $z_{d+1} \mapsto f(z_1, \dots, z_d, z_{d+1})$ is holomorphic, by the $d = 1$ case we obtain
$$|f(z_1, \dots, z_{d+1})|^2 \le \frac{1}{2\pi} \int_0^{2\pi} |f(z_1, \dots, z_d, z_{d+1} + r_{d+1} e^{i\theta_{d+1}})|^2\, d\theta_{d+1}.$$
Applying the inductive hypothesis to the holomorphic function $(z_1, \dots, z_d) \mapsto f(z_1, \dots, z_d, z_{d+1} + r_{d+1} e^{i\theta_{d+1}})$ on $\mathbb{C}^d$ proves the statement for $d + 1$.
We can now prove the following:

Lemma 9.4.2. For all $\sigma > 0$ and all compact subsets $C \subset \mathbb{C}^d$ there exists a constant $c_{C,\sigma} > 0$ s.t. for all $z \in C$ and all $f \in \mathcal{H}_{\sigma, \mathbb{C}^d}$ we have that $|f(z)| \le c_{C,\sigma} \|f\|_{\sigma, \mathbb{C}^d}$.

Proof. Define $c := \max\{ e^{-\sigma^2 \sum_{j=1}^d (z_j - \bar z_j)^2} : (z_1, \dots, z_d) \in C + (B_{\mathbb{C}})^d \}$, where $B_{\mathbb{C}}$ is the closed unit ball of $\mathbb{C}$. By the previous lemma, we have
$$(2\pi)^d\, r_1 \cdots r_d\, |f(z)|^2 \le r_1 \cdots r_d \int_0^{2\pi} \cdots \int_0^{2\pi} |f(z_1 + r_1 e^{i\theta_1}, \dots, z_d + r_d e^{i\theta_d})|^2\, d\theta_1 \cdots d\theta_d.$$
Integrating this w.r.t. $r = (r_1, \dots, r_d)$ over $[0,1]^d$ gives us
$$|f(z)|^2 \le \pi^{-d} \int_{z + (B_{\mathbb{C}})^d} |f(z')|^2\, dz' \le c\, \pi^{-d} \int_{z + (B_{\mathbb{C}})^d} |f(z')|^2\, e^{\sigma^2 \sum_{j=1}^d (z'_j - \bar z'_j)^2}\, dz' \le \frac{c}{(2\sigma^2)^d}\, \|f\|^2_{\sigma, \mathbb{C}^d}.$$
This lemma shows that convergence in the $\| \cdot \|_{\sigma, \mathbb{C}^d}$ norm implies compact convergence, i.e. uniform convergence on every compact subset. From complex analysis, we know that a compactly convergent sequence of holomorphic functions has a holomorphic limit. Thus the space $\mathcal{H}_{\sigma, \mathbb{C}^d}$ equipped with this norm is an RKHS for every $\sigma > 0$.
9.4.2 The Complex ONB
The reproducing kernel of an RKHS is determined by an arbitrary ONB of the RKHS. Therefore, we first need to find an ONB of $\mathcal{H}_{\sigma, \mathbb{C}^d}$. To do this, we need some notation.

The tensor product $f \otimes g : X \times X \to \mathbb{F}$ of two functions $f$ and $g$ is defined by $f \otimes g(x, x') := f(x) g(x')$ for $x, x' \in X$. A $d$-fold tensor product is defined in a similar way. Also, we denote the set $\{0\} \cup \mathbb{N}$ by $\mathbb{N}_0$.
To prove the main result of this section, we need the following technical lemma:

Lemma 9.4.3. For all $n, m \in \mathbb{N}_0$ and all $\sigma > 0$ we have
$$\int_{\mathbb{C}} z^n (\bar z)^m\, e^{-2\sigma^2 z \bar z}\, dz = \begin{cases} \dfrac{\pi\, n!}{(2\sigma^2)^{n+1}} & \text{if } n = m, \\ 0 & \text{otherwise.} \end{cases}$$
Proof. Suppose $n = m$. Then, passing to polar coordinates,
$$\int_{\mathbb{C}} z^n (\bar z)^n\, e^{-2\sigma^2 z \bar z}\, dz = \int_0^\infty \int_0^{2\pi} r^{2n}\, e^{-2\sigma^2 r^2}\, r\, d\theta\, dr = 2\pi \int_0^\infty r^{2n+1}\, e^{-2\sigma^2 r^2}\, dr = \frac{\pi}{(2\sigma^2)^{n+1}} \int_0^\infty t^n e^{-t}\, dt = \frac{\pi\, n!}{(2\sigma^2)^{n+1}}.$$
Now let $n \neq m$. Then we have
$$\int_{\mathbb{C}} z^n (\bar z)^m\, e^{-2\sigma^2 z \bar z}\, dz = \int_0^\infty r^{n+m+1} \Big( \int_0^{2\pi} e^{i(n-m)\theta}\, d\theta \Big)\, e^{-2\sigma^2 r^2}\, dr = 0.$$
We can now show:

Theorem 9.4.1. For $\sigma > 0$ and $n \in \mathbb{N}_0$, define the function $e_n : \mathbb{C} \to \mathbb{C}$ by
$$e_n(z) := \sqrt{\frac{(2\sigma^2)^n}{n!}}\, z^n\, e^{-\sigma^2 z^2}$$
for all $z \in \mathbb{C}$. Then the system $(e_{n_1} \otimes \cdots \otimes e_{n_d})_{n_1, \dots, n_d \ge 0}$ is an ONB of $\mathcal{H}_{\sigma, \mathbb{C}^d}$.

Proof. Since this proof is long and technical, let's first consider the case $d = 1$. We need to show that $(e_n)_{n \ge 0}$ is an orthonormal system. For $n, m \in \mathbb{N}_0$ and $z \in \mathbb{C}$ we see
$$e_n(z)\, \overline{e_m(z)}\, e^{\sigma^2 (z - \bar z)^2} = \sqrt{\frac{(2\sigma^2)^{n+m}}{n!\, m!}}\, z^n (\bar z)^m\, e^{-\sigma^2 z^2 - \sigma^2 \bar z^2}\, e^{\sigma^2 (z - \bar z)^2} = \sqrt{\frac{(2\sigma^2)^{n+m}}{n!\, m!}}\, z^n (\bar z)^m\, e^{-2\sigma^2 z \bar z}.$$
So for $n, m \ge 0$, we get
$$\langle e_n, e_m \rangle = \frac{2\sigma^2}{\pi} \int_{\mathbb{C}} e_n(z)\, \overline{e_m(z)}\, e^{\sigma^2 (z - \bar z)^2}\, dz = \frac{2\sigma^2}{\pi} \sqrt{\frac{(2\sigma^2)^{n+m}}{n!\, m!}} \int_{\mathbb{C}} z^n (\bar z)^m\, e^{-2\sigma^2 z \bar z}\, dz = \begin{cases} 1 & \text{if } n = m, \\ 0 & \text{otherwise,} \end{cases}$$
by the previous lemma. Thus this is an orthonormal system.
To show completeness, let $f \in \mathcal{H}_{\sigma, \mathbb{C}}$. Then $z \mapsto e^{\sigma^2 z^2} f(z)$ is an entire function (i.e. holomorphic over the entire complex plane), and thus there exists a sequence $(a_n) \subset \mathbb{C}$ s.t.
$$f(z) = \sum_{n=0}^\infty a_n\, z^n\, e^{-\sigma^2 z^2} = \sum_{n=0}^\infty a_n \sqrt{\frac{n!}{(2\sigma^2)^n}}\, e_n(z) \tag{9.2}$$
for all $z \in \mathbb{C}$. We just need to show that the above convergence holds with respect to $\| \cdot \|_{\sigma, \mathbb{C}}$. To prove this, recall from complex analysis that the series in (9.2) converges absolutely and compactly. Thus for $n \ge 0$, by the previous lemma,
$$\langle f, e_n \rangle = \frac{2\sigma^2}{\pi} \int_{\mathbb{C}} f(z)\, \overline{e_n(z)}\, e^{\sigma^2 (z - \bar z)^2}\, dz = \frac{2\sigma^2}{\pi} \sum_{m=0}^\infty a_m \int_{\mathbb{C}} z^m e^{-\sigma^2 z^2}\, \overline{e_n(z)}\, e^{\sigma^2 (z - \bar z)^2}\, dz$$
$$= \frac{2\sigma^2}{\pi} \sqrt{\frac{(2\sigma^2)^n}{n!}} \sum_{m=0}^\infty a_m \int_{\mathbb{C}} z^m (\bar z)^n\, e^{-2\sigma^2 z \bar z}\, dz = a_n \sqrt{\frac{n!}{(2\sigma^2)^n}}. \tag{9.3}$$
Now, since $(e_n)$ is an orthonormal system, we have that $(\langle f, e_n \rangle) \in \ell^2$ by Bessel's inequality. Since $(e_n)$ is orthonormal in $\mathcal{H}_{\sigma, \mathbb{C}}$, we can find a function $g \in \mathcal{H}_{\sigma, \mathbb{C}}$ with $g = \sum_{n=0}^\infty \langle f, e_n \rangle e_n$, where the convergence takes place in $\mathcal{H}_{\sigma, \mathbb{C}}$. Now using (9.2), (9.3) and the fact that norm convergence in RKHSs implies pointwise convergence, we get that $g = f$, meaning the series in (9.2) converges w.r.t. $\| \cdot \|_{\sigma, \mathbb{C}}$.
For the general $d$-dimensional case, note that
$$\langle e_{n_1} \otimes \cdots \otimes e_{n_d},\, e_{m_1} \otimes \cdots \otimes e_{m_d} \rangle = \prod_{j=1}^d \langle e_{n_j}, e_{m_j} \rangle_{\mathcal{H}_{\sigma, \mathbb{C}}},$$
which implies that the system is orthonormal. To check completeness, fix an $f \in \mathcal{H}_{\sigma, \mathbb{C}^d}$. Then $z \mapsto f(z) \exp(\sigma^2 \sum_{i=1}^d z_i^2)$ is an entire function. Then from a theorem in complex analysis ([10]), we have that there exist $a_{n_1, \dots, n_d} \in \mathbb{C}$, $(n_1, \dots, n_d) \in \mathbb{N}_0^d$, s.t.
$$f(z) = \sum_{(n_1, \dots, n_d) \in \mathbb{N}_0^d} a_{n_1, \dots, n_d} \prod_{i=1}^d z_i^{n_i}\, e^{-\sigma^2 z_i^2} = \sum_{(n_1, \dots, n_d) \in \mathbb{N}_0^d} a_{n_1, \dots, n_d} \prod_{i=1}^d \sqrt{\frac{n_i!}{(2\sigma^2)^{n_i}}}\, e_{n_i}(z_i)$$
for all $z = (z_1, \dots, z_d) \in \mathbb{C}^d$. From this, one can easily derive
$$\langle f,\, e_{n_1} \otimes \cdots \otimes e_{n_d} \rangle = a_{n_1, \dots, n_d} \prod_{i=1}^d \sqrt{\frac{n_i!}{(2\sigma^2)^{n_i}}},$$
and thus obtain completeness.
We can now prove the main result of this section.

Theorem 9.4.2. Let $\sigma > 0$ and $d \in \mathbb{N}$. Then the complex Gaussian RBF kernel $K_{\sigma, \mathbb{C}^d}$ is the reproducing kernel of $\mathcal{H}_{\sigma, \mathbb{C}^d}$.

Proof. Let $K$ be the reproducing kernel of $\mathcal{H}_{\sigma, \mathbb{C}^d}$. Then using the ONB of the last theorem and the Taylor series expansion of the exponential function, we have
$$K(z, z') = \sum_{n_1, \dots, n_d = 0}^\infty e_{n_1} \otimes \cdots \otimes e_{n_d}(z)\, \overline{e_{n_1} \otimes \cdots \otimes e_{n_d}(z')} = \sum_{n_1, \dots, n_d = 0}^\infty \prod_{j=1}^d \frac{(2\sigma^2)^{n_j}}{n_j!} (z_j \bar z'_j)^{n_j}\, e^{-\sigma^2 z_j^2 - \sigma^2 (\bar z'_j)^2}$$
$$= \prod_{j=1}^d \sum_{n=0}^\infty \frac{(2\sigma^2)^n}{n!} (z_j \bar z'_j)^n\, e^{-\sigma^2 z_j^2 - \sigma^2 (\bar z'_j)^2} = \prod_{j=1}^d e^{-\sigma^2 z_j^2 - \sigma^2 (\bar z'_j)^2 + 2\sigma^2 z_j \bar z'_j} = e^{-\sigma^2 \sum_{j=1}^d (z_j - \bar z'_j)^2},$$
which proves the statement.
9.4.3 The Real ONB
With the use of the last theorem, we can now get some information on the RKHSs of the real Gaussian RBF kernels $K_\sigma$.

Theorem 9.4.3. For $X \subseteq \mathbb{R}^d$ and $\sigma > 0$, the RKHS of the real-valued Gaussian RBF kernel $K_\sigma$ on $X$ is
$$\mathcal{H}_\sigma := \{ f : X \to \mathbb{R} \mid \exists\, g \in \mathcal{H}_{\sigma, \mathbb{C}^d} \text{ with } \operatorname{Re} g|_X = f \},$$
and for $f \in \mathcal{H}_\sigma$ the norm is given by
$$\|f\|_\sigma := \inf\{ \|g\|_{\sigma, \mathbb{C}^d} : g \in \mathcal{H}_{\sigma, \mathbb{C}^d} \text{ with } \operatorname{Re} g|_X = f \}.$$
Proof. This follows from the last theorem and Proposition 9.3.5.

Given $X \subseteq \mathbb{R}$ and $n \in \mathbb{N}_0$, we define $e^X_n : X \to \mathbb{R}$ by
$$e^X_n(x) := \sqrt{\frac{(2\sigma^2)^n}{n!}}\, x^n\, e^{-\sigma^2 x^2}$$
for $x \in X$. The higher dimensional version of the basis is defined similarly to the complex extension. It's not hard to see that this is what the basis should be for the real case.
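As a numerical sanity check of Theorems 9.3.1 and 9.4.1 in the real one-dimensional case, the partial sums $\sum_{n \le N} e^X_n(x)\, e^X_n(y)$ should converge to $K_\sigma(x, y) = e^{-\sigma^2 (x-y)^2}$ — pointwise this is just the Taylor series of $e^{2\sigma^2 x y}$ multiplied by the two Gaussian factors. The particular $\sigma$, $x$, $y$ below are arbitrary:

```python
import numpy as np
from math import factorial

# Partial sums of sum_n e_n(x) e_n(y) for the real 1-d basis
# e_n(x) = sqrt((2 sigma^2)^n / n!) x^n exp(-sigma^2 x^2),
# which should converge to the Gaussian kernel exp(-sigma^2 (x - y)^2).
sigma, x, y = 1.0, 0.5, -0.3

def e(n, t):
    return np.sqrt((2 * sigma**2)**n / factorial(n)) * t**n * np.exp(-sigma**2 * t**2)

partial = sum(e(n, x) * e(n, y) for n in range(40))
kernel = np.exp(-sigma**2 * (x - y)**2)
print(partial, kernel)   # agree to near machine precision
```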
Bibliography
[1] Douglas Arnold. Functional Analysis, 1997.
[2] N. Aronszajn. Theory of Reproducing Kernels. Transactions of the American Mathematical Society 68 (3): 337-404, 1950.
[3] William Arveson. A Short Course on Spectral Theory, Springer, 2002.
[4] Lawrence Evans. Partial Differential Equations, AMS, 1998.
[5] Gerald Folland. Real Analysis: Modern Techniques and their Applications, Wiley-Interscience, 1999.
[6] Evans M. Harrell II. A Short History of Operator Theory, 2004.
[7] Lynn Loomis. An Introduction to Abstract Harmonic Analysis, D. Van Nostrand Company, 1953.
[8] John Nash. Continuity of Solutions of Parabolic and Elliptic Equations, Amer. J. Math. 80: 931-954, 1958.
[9] V. I. Paulsen. An Introduction to the Theory of Reproducing Kernel Hilbert Spaces. Course notes, available from the author's web page, February 2006.
[10] R. M. Range. Holomorphic Functions and Integral Representations in Several Complex Variables. Springer, 1986.
[11] I. Steinwart, D. Hush, and C. Scovel. An Explicit Description of the Reproducing Kernel Hilbert Spaces of Gaussian RBF Kernels. IEEE Transactions on Information Theory, 52: 4635-4643, 2006.
[12] Walter Rudin. Functional Analysis, McGraw-Hill, 1991.
[13] Terry Tao. Why Global Regularity for Navier-Stokes is Hard, 2007.
Appendix A
Some Useful Notions
This section lists some notions from other areas of mathematics that are useful for interpreting and applying results from operator theory.

A.1 Algebra
Modern operator theory plays a crucial role in representation theory, one of the most fundamental areas of mathematics. In representation theory, one associates the objects of abstract algebra with linear transformations. This has applications in all types of fields, including abstract harmonic analysis, geometry, and even number theory.

In addition, operator algebras such as Banach algebras can be studied from the reference point of abstract algebra. We will give the definitions of some of these algebraic concepts in this section. We really can't do much else, because each of these topics can be studied endlessly.
A nonempty set of elements $G$ forms a group if there exists a binary operation on these elements called the product (denoted by $\cdot$), and the following hold true:
1. $\forall a, b \in G$, $a \cdot b \in G$ (closure)
2. $\forall a, b, c \in G$, $a \cdot (b \cdot c) = (a \cdot b) \cdot c$ (associativity)
3. $\exists\, e \in G$ s.t. $a \cdot e = e \cdot a = a$ $\forall a \in G$ (identity)
4. $\forall a \in G$, $\exists\, a^{-1} \in G$ s.t. $a \cdot a^{-1} = a^{-1} \cdot a = e$ (inverse)
A group $G$ is said to be abelian (or commutative) if $\forall a, b \in G$, $a \cdot b = b \cdot a$. A subgroup $S$ of a group $G$ is a subset of elements from the original set that is closed under the binary operation and under taking inverses.

A set that has all the properties of a group except inverses is called a monoid. An object that also lacks an identity element is called a semigroup.
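A minimal computational illustration: the integers mod $n$ under addition satisfy all four axioms (and commutativity). The exhaustive check below is our own example, not from the text:

```python
from itertools import product

# Check the group axioms for Z_6 = {0, ..., 5} under addition mod 6.
n = 6
G = range(n)
op = lambda a, b: (a + b) % n

closure = all(op(a, b) in G for a, b in product(G, G))
assoc = all(op(a, op(b, c)) == op(op(a, b), c) for a, b, c in product(G, G, G))
identity = all(op(a, 0) == op(0, a) == a for a in G)     # e = 0
inverses = all(any(op(a, b) == 0 for b in G) for a in G)
abelian = all(op(a, b) == op(b, a) for a, b in product(G, G))

print(closure, assoc, identity, inverses, abelian)  # all True
```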
These simple definitions create an astonishingly rich theory that is still the focus of research today. Groups, in particular, are one of the most important objects in modern mathematics.
A ring $R$ is a more complicated object, and is basically a nonempty set with two binary operations called addition ($+$) and multiplication ($\cdot$) s.t. the set forms an abelian group under the former, and a monoid under the latter. Both operations are also required to obey the distributive laws, i.e.,
$$a \cdot (b + c) = a \cdot b + a \cdot c \qquad \forall a, b, c \in R,$$
$$(a + b) \cdot c = a \cdot c + b \cdot c \qquad \forall a, b, c \in R.$$
A ring in which the multiplication operation is commutative is called a commutative ring.
A right ideal of an arbitrary ring $R$ is a subset $A$ which is a subgroup of $R$ under the addition operation, as well as having the property that $x \cdot r \in A$ $\forall x \in A, r \in R$. A left ideal is similar, except that the latter property becomes $r \cdot x \in A$ $\forall x \in A, r \in R$. A subset that is both a left and right ideal is called simply an ideal.
A module over a ring is a generalization of a vector space. Let $R$ be a ring, and let $(M, +)$ be an abelian group. Then the left $R$-module is these two sets, together with an additional operation (usually called scalar multiplication) $R \times M \to M$ s.t. $\forall r, s \in R$ and $x, y \in M$, we have
1. $r \cdot (x + y) = r \cdot x + r \cdot y$
2. $(r + s) \cdot x = r \cdot x + s \cdot x$
3. $(r s) \cdot x = r \cdot (s \cdot x)$
4. $1_R \cdot x = x$
where $1_R$ is the identity element of the ring.
Finally, let $R$ be a fixed commutative ring. Then an associative algebra $A$ is an additive abelian group with the structure of both a ring and an $R$-module s.t.
$$r \cdot (x y) = (r \cdot x) y = x (r \cdot y) \qquad \forall r \in R,\ x, y \in A.$$
Banach algebras are associative algebras, for example.
A.2 Analysis
A.2.1 $L^p$ spaces
A.2.2 Radon measures
Appendix B
Notation
Here is some of the notation used in the text.
Number Field Notation
$\mathbb{N}$  The natural numbers
$\mathbb{Z}$  The integers
$\mathbb{Q}$  The rational field
$\mathbb{R}$  The real field
$\mathbb{C}$  The complex field
$\mathbb{F}$  A general field
$\operatorname{Re}$  The real part of a complex number
$\operatorname{Im}$  The imaginary part of a complex number

Set Theoretic Notation
$\emptyset$  The empty set
$E^o$  The interior of $E$
$\bar{E}$  The closure of $E$
$B_r(x)$  An open ball of radius $r$ centered at $x$
$\bar{B}_r(x)$  A closed ball of radius $r$ centered at $x$

Spaces
$\ell^p$  Space of summable sequences
$L^p$  Lebesgue spaces
$\mathcal{H}$  Hilbert space
$\mathcal{H}^*$  The dual space of $\mathcal{H}$
$\mathcal{L}(E_1, E_2)$  All maps from the space $E_1$ to $E_2$
$\mathcal{C}(E_1, E_2)$  Continuous maps from the space $E_1$ to $E_2$
$\mathcal{C}(U; F)$  Continuous maps on $U$ to the field $F$
$\mathcal{C}(U)$  Continuous maps on $U$ to the field $\mathbb{R}$
$\mathcal{C}_b(U)$  Bounded continuous maps on $U$
$\mathcal{C}^k(U)$  Continuous functions with $k$ derivatives defined on $U$
$\mathcal{C}^k_0(U)$  Continuous functions with $k$ derivatives and compact support on $U$
$B(E_1, E_2)$  Bounded, continuous, linear maps from the space $E_1$ to $E_2$
$H^k(U)$  Sobolev space equipped with the $L^2$ norm on $U$

Operations on Spaces
$\oplus$  Direct sum
$\otimes$  Tensor product
$\mathcal{D}(A)$  The domain of a map $A$
$\mathcal{R}(A)$  The range of a map $A$
$\mathcal{G}(A)$  The graph of a map $A$
$\mathcal{N}(A)$  The nullspace of a map $A$
Index
absolutely convergent, 10
approximation property, 46
Baire category theorem, 8
Banach
algebra, 29
space, 9
Banach-Alaoglu theorem
for Hilbert spaces, 47
general, 22
Banach-Steinhaus theorem, 14
Bessel's inequality, 24
bilinear functional, 30
Cauchy-Schwarz inequality, 18
closed graph theorem, 13
Closest point property, 20
commutative ring, 28
complementary projection, 43
completion (of a space), 14
composition
of a linear operator, 28
conjugate, 60
contraction (map), 15
decomposition theorem, 21
dual space, 22
embedding theorem, 61
fixed point theorem, 15
Fredholm alternative, 54
functional, 13
generalized normal, 21
Gram-Schmidt, 24
Hahn-Banach theorem
complex, 16
real, 15
Hausdor space, 5
Hilbert space, 19
Hilbert-Schmidt kernel, 45
inner product space, 18
inverse
of an operator, 38
isometric embedding, 14
isometry, 13
isomorphism, 13
Lax-Milgram theorem, 33
linear independence, 6
metric space, 6
Minkowski's inequality, 9
normed space, 7
operator
compact, 44
elliptic, 32
finite dimensional, 44
idempotent, 43
isometric, 38
norm, 11
normal, 38
positive, 39
projection, 42
self-adjoint, 36
square root of, 41
unitary, 39
orthogonal
system, 23
orthogonal decomposition, 21
orthogonal projection, 21
orthogonality, 19
orthonormal system, 23
Parallelogram Law, 19
Parseval's identity, 25
partial order, 15
polarization identity, 30
Pythagorean theorem, 20
quadratic form, 30
reflexive space, 22
Riesz lemma, 8
Riesz representation theorem, 21
Schauder basis, 46
separable space, 25
skew-adjoint, 37
Sobolev space, 19
spectrum, 49
sublinear function, 15
topology
norm, 22
weak, 22
weak*, 22
trace operator, 59
vector space, 5
weak
derivative, 57
solution, 62
weak convergence, 23
Zorn's lemma, 15