
VIENNA GRADUATE SCHOOL OF FINANCE (VGSF)

LECTURE NOTES
Introduction to Probability Theory and
Stochastic Processes (STATS)
Helmut Strasser
Department of Statistics and Mathematics
Vienna University of Economics and Business
Administration
Helmut.Strasser@wu-wien.ac.at
http://helmut.strasserweb.net/public
October 19, 2006
Copyright © 2006 by Helmut Strasser
All rights reserved. No part of this text may be reproduced, stored in a retrieval
system, or transmitted, in any form or by any means, electronic, mechanical,
photocopying, recording, or otherwise, without prior written permission of the author.
Contents

Preliminaries
0.1 Introduction
0.2 Literature

I Measure and Integration

1 Measure and probability
1.1 Sigma-fields
1.2 Measures
1.3 Measures on the real line
1.4 Probability distributions

2 Measurable functions and random variables
2.1 The idea of measurability
2.2 The basic abstract assertions
2.3 The structure of real-valued measurable functions

3 Integral and expectation
3.1 The integral of simple functions
3.2 The extension process
3.3 Convergence of integrals
    The theorem of monotone convergence
    The infinite series theorem
    The dominated convergence theorem
3.4 Stieltjes integration
3.5 Proofs of the main theorems

4 Selected topics
4.1 Image measures and distributions
4.2 Measures with densities
4.3 Product measures and Fubini's theorem
4.4 Spaces of integrable functions
4.5 Fourier transforms

II Probability theory

5 Beyond measure theory
5.1 Independence
5.2 Convergence and limit theorems
5.3 The causality theorem

6 Random walks
6.1 The ruin problem
6.2 Optional stopping
6.3 Wald's equation
6.4 Gambling systems

7 Conditioning
7.1 Conditional expectation
7.2 Martingales
7.3 Some theorems on martingales

8 Stochastic processes
8.1 Basic concepts
8.2 The Poisson process
8.3 Point processes
8.4 Levy processes
8.5 The Wiener Process

9 Martingales
9.1 From independent increments to martingales
9.2 A technical issue: Augmentation
9.3 Stopping times
    Hitting times
    The optional stopping theorem
9.4 Application: First passage times of the Wiener process
    One-sided boundaries
    Two-sided boundaries
    The reflection principle
9.5 The Markov property

III Stochastic calculus

10 The stochastic integral
10.1 Integrals along stochastic paths
10.2 The integral of simple processes
10.3 Semimartingales
10.4 Extending the stochastic integral
10.5 The Wiener integral

11 Calculus for the stochastic integral
11.1 The associativity rule
11.2 Quadratic variation and the integration-by-parts formula
11.3 Ito's formula

12 Applications to financial markets
12.1 Financial markets
12.2 Trading strategies
12.3 The Black-Scholes equation
12.4 The general case
12.5 Change of numeraire

13 Stochastic differential equations
13.1 Introduction
13.2 The abstract linear equation
13.3 Wiener driven models

14 Martingale properties of stochastic integrals
14.1 Locally square integrable martingales
14.2 Square integrable martingales
14.3 Levy's theorem
14.4 Martingale representation

15 Exponential martingale and Girsanov's theorem
15.1 The exponential martingale
15.2 Likelihood processes
15.3 Change of probability measures

16 Martingales in financial markets
16.1 Pricing in financial markets
16.2 Pricing in Black-Scholes markets
16.3 Pricing in diffusion market models

IV Appendix

17 Foundations of modern analysis
17.1 Basic notions on set theory
    Set operations
    Cartesian products
    Uncountable sets
17.2 Sets and functions
17.3 The set of real numbers
17.4 Real-valued functions
    Basic definitions
    Continuous functions
    Regulated functions
    The variation of functions
17.5 Banach spaces
17.6 Hilbert spaces
Preliminaries
0.1 Introduction
The goal of this course is to give an introduction to some mathematical concepts and
tools which are indispensable for understanding the modern mathematical theory of
finance. Let us give an overview of the historic origins of some of the mathematical tools.
The central topic will be those probabilistic concepts and results which play an
important role in mathematical finance. Therefore we have to deal with mathematical
probability theory. Mathematical probability theory is formulated in a language that
comes from measure theory and integration. This language differs considerably from
the language of classical analysis, known under the label of calculus. Therefore, our
first step will be to get an impression of basic measure theory and integration.
We will not go into the advanced problems of measure theory where this theory
becomes exciting. Such topics would be closely related to advanced set theory and
topology, which differ fundamentally from mere set-theoretic language and the
topologically driven slang which is convenient for talking about mathematics but nothing more.
Similarly, our usage of measure theory and integration is a sort of convenient language
which on this level is of little interest in itself. For us its worth arises from its power to
give insight into exciting applications like probability and mathematical finance.
Therefore, our presentation of measure theory and integration will be an overview
rather than a specialized training program. We will become more and more familiar
with the language and its typical kind of reasoning as we go into those applications
for which we are highly motivated. These will be probability theory and stochastic
calculus.
In the field of probability theory we are interested in probability models having a
dynamic structure, i.e. a time evolution governed by endogenous correlation properties.
Such probability models are called stochastic processes.
Probability theory is a young theory compared with the classical cornerstones of
mathematics. It is illuminating to have a look at the evolution of some fundamental
ideas for defining a dynamic structure of stochastic processes.
One important line of thought looks at stationarity. Models which are themselves
stationary, or are cumulatives of stationary models, have dominated the econometric
literature for decades. For Gaussian models one need not distinguish between
strict and weak (covariance) stationarity. As for weak stationarity it turns out that
typical processes follow difference or differential equations driven by some noise process.
The concept of a noise process is motivated by the idea that it does not transport any
information.
From the beginning of serious investigation of stochastic processes (about 1900)
another idea was leading in the scientific literature, namely the Markov property. This
is not the place to go into details of the overwhelming progress in Markov chains
and processes achieved in the first half of the 20th century. However, for a long time
this theory failed to describe the dynamic behaviour of continuous-time Markov
processes in terms of equations between single states at different times. Such equations
have been the common tools for deterministic dynamics (ordinary difference and
differential equations) and for discrete-time stationary stochastic sequences. In contrast,
continuous-time Markov processes were defined in terms of the dynamic behaviour of
their distributions rather than of their states, using partial difference and differential
equations.
The situation changed dramatically about the middle of the 20th century. There
were two ingenious concepts at the beginning of this disruption. The first is the
concept of a martingale, introduced by Doob. The martingale turned out to be the final
mathematical fixation of the idea of noise. The notion of a martingale is located
between a process with uncorrelated increments and a process with independent
increments, both of which were the competing noise concepts up to that time. The second
concept is that of a stochastic integral, due to K. Ito. This notion makes it possible to
apply differential reasoning to stochastic dynamics.
At the beginning of the stochastic part of this lecture we will present an
introduction to the ideas of martingales and stopping times at hand of stochastic sequences
(discrete-time processes). The main subject of the second half of the lecture
will be continuous-time processes with a strong focus on the Wiener process. However,
the notions of martingales, semimartingales and stochastic integrals are introduced in
a way which lays the foundation for the study of more general process theory. The
choice of examples is governed by the needs of financial applications (covering the
notion of gambling, of course).
0.2 Literature
Let us give some comments on the bibliography.
The popular monograph by Bauer, [1], has for a long time been the standard
textbook in Germany on measure-theoretic probability. However, probability theory has
many different faces. The book by Shiryaev, [21], is much closer to those modern
concepts we are heading for. Both texts are mathematically oriented, i.e. they aim at
giving complete and general proofs of fundamental facts, preferably in abstract terms.
A modern introduction to probability models containing plenty of fascinating
phenomena is given by Bremaud, [6] and [7]. The older monograph by Bremaud, [5], is
not located at the focus of this lecture but contains as an appendix an excellent primer on
probability theory.
Our topic in stochastic processes will be the Wiener process and the stochastic
analysis of Wiener driven systems. A standard monograph on this subject is Karatzas
and Shreve, [15]. The Wiener systems part of the probability primer by Bremaud
gives a very compact overview of the main facts. Today, Wiener driven systems are
a very special framework for modelling financial markets. In the meanwhile, general
stochastic analysis is in a more or less final state, called semimartingale theory. Present
and future research applies this theory in order to obtain a much more flexible modelling
of financial markets. Our introduction to semimartingale theory follows the outline by
Protter, [20] (see also [19]).
Let us mention some basic literature on mathematical finance.
There is a standard source by Hull, [11]. Although this book tries hard to present
itself as undemanding, the contrary is true. The reason is that the combination
of financial intuition and the apparently informal utilization of advanced
mathematical tools requires on the reader's side a lot of mathematical knowledge in
order to catch the intrinsics. Paul Wilmott, [22] and [23], tries to cover all topics in
financial mathematics together with the corresponding intuition, and to make the
analytical framework a bit more explicit and detailed than Hull does. I consider these
books by Hull and Wilmott a must for any beginner in mathematical finance.
The books by Hull and Wilmott do not pretend to talk about mathematics. Let us
mention some references which have a similar goal as this lecture, i.e. to present the
mathematical theory of stochastic analysis aiming at applications in finance.
A very popular book which may serve as a bridge from mathematical probability
to financial mathematics is by Björk, [4]. Another book, giving an introduction both
to the mathematical theory and to financial mathematics, is by Hunt and Kennedy, [12].
Standard monographs on mathematical finance which could be considered as
cornerstones marking the state of the art at the time of their publication are Karatzas and
Shreve, [16], Musiela and Rutkowski, [17], and Bielecki and Rutkowski, [3]. The
present lecture should lay some foundations for reading books of that type.
Part I
Measure and Integration
Chapter 1
Measure and probability
1.1 Sigma-fields
Let Ω be a (non-empty) set. We are interested in systems of subsets of Ω which are
closed under set operations.
1.1 Example. In general, a system of subsets need not be closed under set operations.
Let Ω = {1, 2, 3}. Consider the system of subsets A = {{1}, {2}, {3}}. This system
is not closed under union, intersection or complementation. E.g. the complement
of {1} is not in A.
It is clear that the power set is closed under any set operations. However, there are
smaller systems of sets which are closed under set operations, too.
Let Ω = {1, 2, 3}. Consider the system of subsets B = {∅, Ω, {1}, {2, 3}}. It
is easy to see that this system is closed under union, intersection and complementation.
Moreover, it follows that these set operations can be repeated in arbitrary order,
resulting always in sets contained in B.
1.2 Definition. A (non-empty) system F of subsets of Ω is called a σ-field if it is
closed under union, intersection and complementation as well as under building limits
of monotone sequences. The pair (Ω, F) is called a measurable space.
There are some obvious necessary properties of a σ-field.
1.3 Problem.
(1) Show that every σ-field on Ω contains ∅ and Ω.
(2) What is the smallest possible σ-field on Ω?
If we want to check whether a given system of sets is actually a σ-field then it is
sufficient to verify only a minimal set of conditions. The following assertion states
such a minimal set of conditions.
1.4 Proposition. A (non-empty) system F of subsets of Ω is a σ-field iff it satisfies
the following conditions:
(1) Ω ∈ F,
(2) A ∈ F implies Aᶜ ∈ F,
(3) if A_1, A_2, ... ∈ F then ⋃_{i=1}^∞ A_i ∈ F.
1.5 Problem. Prove 1.4.
Let us discuss a number of examples.
When one starts to construct a σ-field one usually starts with a family C of sets
which in any case should be contained in the σ-field. If this starting family C does not
fulfil all conditions of a σ-field then a simple idea could be to add further sets until
the family fulfils all required conditions. Actually, this procedure works if the starting
family C is a finite system.
1.6 Definition. Let C be any system of subsets of Ω. The σ-field generated by C is
the smallest σ-field F which contains C. It is denoted by σ(C).
1.7 Problem. Assume that C = {A}. Find σ(C).
1.8 Problem. Assume that C = {A, B}. Find σ(C).
1.9 Problem. Show by giving an example that the union of two σ-fields need not be a
σ-field.
If the system C is any finite system then σ(C) consists of all sets which can be
obtained by finitely many unions, intersections and complementations of sets in C.
Although the resulting system σ(C) is still finite, a systematic overview over all its sets
could be rather complicated.
Things are much easier if the generating system is a finite partition of Ω.
1.10 Proposition. Assume that C is a finite partition of Ω. Then σ(C) consists of ∅
and of all unions of sets in C.
1.11 Problem. Prove 1.10.
1.12 Problem. Let Ω be a finite set. Find the σ-field which is generated by the
one-point sets.
It is a remarkable fact that every finite σ-field is generated by a partition.
1.13 Problem. Show that every finite σ-field F is generated by a partition of Ω.
Hint: Call a nonempty set A ∈ F an atom if it contains no nonempty proper subset in
F. Show that the collection of atoms is a partition of Ω and that every set in F is a
union of atoms.
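For a finite Ω the generation procedure described above can be carried out mechanically: close the starting family under complements and pairwise unions until nothing new appears. The following Python sketch (an illustration added here, not part of the notes; all names are ad hoc) does exactly that.

```python
def generated_sigma_field(omega, generators):
    """Smallest system containing the generators that is closed under
    complement and union; for a finite omega this is sigma(C).
    Intersections come for free via De Morgan's laws."""
    omega = frozenset(omega)
    fam = {frozenset(), omega} | {frozenset(g) for g in generators}
    while True:
        new = set(fam)
        new |= {omega - a for a in fam}               # complements
        new |= {a | b for a in fam for b in fam}      # pairwise unions
        if new == fam:                                # nothing new: done
            return fam
        fam = new

# The partition {{1}, {2}, {3}} of omega = {1, 2, 3} generates the power set:
print(len(generated_sigma_field({1, 2, 3}, [{1}, {2}, {3}])))  # 8
# The single set {1} generates the four-element sigma-field B of Example 1.1:
print(sorted(map(sorted, generated_sigma_field({1, 2, 3}, [{1}]))))
```

Running it on a finite partition confirms Proposition 1.10: the result consists precisely of ∅ and all unions of partition cells.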
Information sets
In probability theory a model of a random experiment consists of a pair (Ω, F) where
Ω is a non-empty set and F is a σ-field on Ω.
The set Ω serves as sample space. It is interpreted as the set of possible outcomes
of the experiment. Note that it is not necessarily the case that single outcomes are
actually observable.
The σ-field F is interpreted as the field of observable events. Observability of a
set A means that after having performed the random experiment it can be decided
whether A has been realized or not. In this sense the σ-field contains the information
which is obtained after having performed the random experiment. Therefore F is also
called the information set of the random experiment.
A simple random variable X is a simple function whose basic partition is observable,
i.e. (X = a) ∈ F for every value a of X. The information set of X is the σ-field
which is generated by the basic partition of X. It is denoted by σ(X).
1.14 Example. Consider the random experiment of throwing a coin n times. Denote
the sides of the coin by 0 and 1. Then the sample space is Ω = {0, 1}^n. Assume that
the outcomes of each throw are observable. If X_i denotes the outcome of the i-th throw
then this means that (X_i = 0) and (X_i = 1) are observable.
1.15 Problem. Let Ω = {0, 1}^3 and define S_k := X_1 + ... + X_k.
(1) Find σ(S_1), σ(S_2), σ(S_3).
(2) Find σ(X_1), σ(X_2), σ(X_3).
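Since Ω is finite, the information set of any function can be computed by forming all unions of its level sets (Proposition 1.10). A hypothetical Python sketch (mine, not from the notes) that counts the events observable through S_k:

```python
from itertools import product

def sigma_of(omega, f):
    """Information set of a function on a finite sample space:
    all unions of its level sets (f = a), cf. Proposition 1.10."""
    level_sets = {}
    for w in omega:
        level_sets.setdefault(f(w), set()).add(w)
    atoms = [frozenset(a) for a in level_sets.values()]
    field = set()
    for mask in range(2 ** len(atoms)):               # every union of atoms
        members = [a for i, a in enumerate(atoms) if mask >> i & 1]
        field.add(frozenset().union(*members))
    return field

omega = list(product((0, 1), repeat=3))               # the sample space {0, 1}^3
for k in (1, 2, 3):
    s_k = lambda w, k=k: sum(w[:k])                   # S_k = X_1 + ... + X_k
    print(k, len(sigma_of(omega, s_k)))               # sizes 4, 8, 16
```

The sizes reflect the number of atoms: S_1 has two values, S_2 three, S_3 four, so the generated σ-fields contain 2², 2³ and 2⁴ sets, respectively.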
1.16 Problem. Let Ω be the sample space of throwing a die twice. Denote the
outcomes of the throws by X and Y, respectively. Find σ(X), σ(Y), σ(X+Y), σ(XY).
Borel sigma-fields
Let us discuss σ-fields on R.
Clearly, the power set of R is a σ-field. However, the power set is too large. Let
us be more modest and start with a system of simple sets and then try to extend the
system to a σ-field.
The following example shows that such a procedure does not work if we start with
one-point sets.
1.17 Problem. Let F be the collection of all subsets of R which are countable or are
the complement of a countable set.
(1) Show that F is a σ-field.
(2) Show that F is the smallest σ-field which contains all one-point sets.
(3) Does F contain intervals?
A reasonable σ-field on R should at least contain all intervals.
1.18 Definition. The smallest σ-field on R which contains all intervals is called the
Borel σ-field. It is denoted by B and its elements are called Borel sets.
Unfortunately, there is no way of describing all sets in B in a simple manner. All
we can say is that any set which can be obtained from intervals by countably many set
operations is a Borel set. E.g., every set which is a countable union of intervals is
a Borel set. But there are even much more complicated sets in B. On the other hand,
however, there are subsets of R which are not in B.
The concept of Borel sets is easily extended to R^n.
1.19 Definition. The σ-field on R^n which is generated by all rectangles

{I_1 × I_2 × ... × I_n : I_k being any interval}

is called the Borel σ-field on R^n and is denoted by B^n.
All open and all closed sets in R^n are Borel sets, since open sets can be represented
as a countable union of rectangles and closed sets are the complements of open sets.
Random variables
Let (Ω, F) be a model of a random experiment. What is a random variable?
The idea of a random variable is that of a function X : Ω → R such that assertions
about X are observable events, i.e. are contained in F. But what are assertions on X?
In the case of a simple function we considered assertions of the form (X = a).
But for functions taking an uncountable number of values we have to consider also
assertions of the form (X ∈ I) where I is an interval.
1.20 Definition. A random variable is a function X : Ω → R such that (X ∈ I) ∈ F
for every interval I.
1.21 Problem. Show that every function satisfying (X ≤ x) ∈ F for every x ∈ R is
a random variable.
Let us turn to the question of the information set of a general random variable.
Conceptually, the information set σ(X) is the σ-field that is generated by all events
which can be observed through X.
Obviously, the system C consisting of the sets (X ∈ I), I being an interval, is not
a σ-field. However, using the Borel σ-field we can describe the information set of
a random variable X in a quasi-explicit way.
1.22 Theorem. The information set σ(X) is the system of sets (X ∈ B) where B is
an arbitrary Borel set. In particular, for a random variable X we have (X ∈ B) ∈ F
for every Borel set B.
1.2 Measures
Measures are set functions. Let us consider some examples.
1.23 Example. Let Ω be an arbitrary set and for any subset A ⊆ Ω define

μ(A) = |A| := k if A contains k elements, and ∞ if A contains infinitely many elements.

This set function is called a counting measure. It is defined for all subsets of Ω.
Obviously, it is additive, i.e.

A ∩ B = ∅ implies μ(A ∪ B) = μ(A) + μ(B).
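For finite sets the additivity of the counting measure can be checked directly in Python (a toy illustration, not part of the notes):

```python
def counting_measure(A):
    """mu(A) = |A| for a finite set A; the notes also allow the value infinity."""
    return len(A)

A, B = {1, 2}, {5, 6, 7}
assert A & B == set()                                  # A and B are disjoint
assert counting_measure(A | B) == counting_measure(A) + counting_measure(B)
print(counting_measure(A | B))  # 5
```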
Measures are set functions which intuitively should be related to the notion of
volume. Therefore measures should be nonnegative and additive. In order to apply
additivity they should be defined on systems of subsets which are closed under the
usual set operations. This leads to the requirement that measures should be defined on
σ-fields. Finally, if the underlying σ-field contains infinitely many sets there should be
some rule how to handle limits of infinite sequences of sets.
Thus, we are ready for the definition of a measure.
1.24 Definition. Let Ω be a non-empty set. A measure μ on Ω is a set function which
satisfies the following conditions:
(1) μ is defined on a σ-field F on Ω.
(2) μ is nonnegative, i.e. μ(A) ≥ 0 for A ∈ F, and μ(∅) = 0.
(3) μ is σ-additive, i.e. for every pairwise disjoint sequence A_1, A_2, ... ∈ F

μ(⋃_{i=1}^∞ A_i) = Σ_{i=1}^∞ μ(A_i).

A measure μ is called finite if μ(Ω) < ∞. A measure P is called a probability
measure if P(Ω) = 1. If μ|F is a measure then (Ω, F, μ) is a measure space. If P|F
is a probability measure then (Ω, F, P) is called a probability space.
There are some obvious consequences of the preceding definition.
1.25 Problem.
Show that every measure is additive.
1.26 Problem. Let μ|F be a measure.
(1) Show that A_1 ⊆ A_2 implies μ(A_1) ≤ μ(A_2).
(2) Show the inclusion-exclusion law:

μ(A_1) + μ(A_2) = μ(A_1 ∪ A_2) + μ(A_1 ∩ A_2).

(3) The preceding problem gives a formula for μ(A_1 ∪ A_2) provided that all sets have
finite measure. Extend this formula to the union of three sets.
The property of being σ-additive both guarantees additivity and implies easy rules
for handling infinite sequences of sets.
1.27 Problem. Let μ|F be a measure.
(1) If A_i ↑ A then μ(A_i) ↑ μ(A).
(2) If A_i ↓ A and μ(A_1) < ∞ then μ(A_i) ↓ μ(A).
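A numerical illustration of part (1), and of why the finiteness assumption in part (2) cannot be dropped (my own example, not from the notes):

```python
# Continuity from below (Problem 1.27(1)) for Lebesgue measure, checked
# numerically: the intervals A_i = (0, 1 - 1/i] increase to A = (0, 1).
lam = lambda a, b: b - a                      # Lebesgue measure of (a, b]
values = [lam(0, 1 - 1 / i) for i in (1, 10, 100, 1000)]
print(values)                                 # approaches lam(0, 1) = 1

# The finiteness condition in (2) matters: for the counting measure on
# {1, 2, ...} the sets A_i = {i, i+1, ...} decrease to the empty set,
# yet every A_i has infinite measure.
```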
1.28 Problem.
(1) Any nonnegative linear combination of measures is a measure.
(2) Every infinite sum of measures is a measure.
1.29 Problem. Explain the construction of measures on a finite σ-field.
Hint: Measures have to be defined for atoms only.
1.3 Measures on the real line
The most simple example of a measure is a point measure.
1.30 Definition. The set function defined by

δ_a(A) = 1_A(a), A ⊆ R,

is called the point measure at a ∈ R.
Take a moment's reflection on whether this definition actually satisfies the properties
of a measure. Note that any point measure can be defined for all subsets of R, i.e.
it is defined on the largest possible σ-field 2^R.
Taking linear combinations of point measures gives a lot of further examples of
measures.
1.31 Problem.
(1) Let μ = δ_0 + 2δ_1 + 0.5δ_{−1}. Calculate μ([0, 1)), μ([−1, 1)), μ((−1, 1]).
(2) Describe in words the values of μ = δ_{a_1} + ... + δ_{a_k}.
(3) Let x ∈ R^n be a list of data and let μ(I) be the percentage of data contained in I.
Show that μ is a measure by writing it as a linear combination of point measures.
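Part (1) can be checked mechanically. The sketch below (added for illustration; it takes the measure of part (1) to be μ = δ_0 + 2δ_1 + 0.5δ_{−1}) represents a finite linear combination of point measures as a dict of weights:

```python
def discrete_measure(weights):
    """Finite linear combination of point measures, given as {point: weight}.
    Returns a function computing the measure of an interval."""
    def mu(lo, hi, lo_open=True, hi_closed=True):
        total = 0.0
        for a, w in weights.items():
            left_ok = a > lo if lo_open else a >= lo
            right_ok = a <= hi if hi_closed else a < hi
            if left_ok and right_ok:
                total += w
        return total
    return mu

mu = discrete_measure({0: 1.0, 1: 2.0, -1: 0.5})
print(mu(0, 1, lo_open=False, hi_closed=False))   # mu([0, 1))  = 1.0
print(mu(-1, 1, lo_open=False, hi_closed=False))  # mu([-1, 1)) = 1.5
print(mu(-1, 1))                                  # mu((-1, 1]) = 3.0
```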
Let Ω = R and for every interval I ⊆ R define

λ(I) := length of I.

E.g. λ((a, b]) = b − a. This set function is called the Lebesgue content of intervals.
At the moment it is defined only on the family of all intervals.
The Lebesgue content is also additive in the following sense: if I_1 and I_2 are two
intervals such that the union I_1 ∪ I_2 = I_3 is an interval, too, then

I_1 ∩ I_2 = ∅ implies λ(I_1 ∪ I_2) = λ(I_1) + λ(I_2).
However, the family of intervals is not a σ-field. In order to obtain a measure we have
to extend the Lebesgue content to a σ-field which contains the intervals. The smallest
σ-field with this property is the Borel σ-field.
1.32 Theorem. (Measure extension theorem)
There exists a uniquely determined measure λ|B such that λ((a, b]) = b − a, a < b.
This measure is called the Lebesgue measure.
Knowing that λ|B is a measure we may calculate its values for simple Borel sets
which are not intervals.
1.33 Problem. Find the Lebesgue measure of Q.
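The standard covering argument behind this problem: enumerate the rationals as q_1, q_2, ... and cover q_i by an interval of length ε/2^i; the total length of the cover is at most ε for every ε > 0, hence λ(Q) = 0 by σ-additivity and monotonicity. The arithmetic of the cover can be sanity-checked in Python (illustration only, not part of the notes):

```python
# Total length of covering intervals of lengths eps/2, eps/4, eps/8, ...
eps = 1e-3
total = sum(eps / 2 ** i for i in range(1, 40))
print(total < eps)  # True: the cover of the rationals has total length < eps
```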
Now, let us turn to the problem of how to get an overview of all measures μ|B. We
restrict our interest to measures which give finite values to bounded intervals.
Let μ|B be a measure such that μ((a, b]) < ∞ for a < b. Define

α(x) := μ((0, x]) if x > 0, and α(x) := −μ((x, 0]) if x ≤ 0,

and note that for any a < b we have

μ((a, b]) = α(b) − α(a) =
    μ((0, b]) − μ((0, a])   if 0 ≤ a < b,
    μ((0, b]) + μ((a, 0])   if a < 0 < b,
    −μ((b, 0]) + μ((a, 0])  if a < b ≤ 0.

This means: for every such measure there is a function α : R → R which defines
the measure at least for all intervals. This function is called the measure-defining
function of μ.
Note that our definition of the measure-defining function is such that α(0) = 0.
However, any function which differs from α by an additive constant only defines the
same measure.
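A small consistency check of the identity μ((a, b]) = α(b) − α(a) in all three sign cases, using a discrete measure as in Problem 1.31 (a Python sketch added for illustration; names are ad hoc):

```python
def mu_interval(a, b):
    """mu((a, b]) for mu = delta_0 + 2*delta_1 + 0.5*delta_{-1}."""
    points = {0: 1.0, 1: 2.0, -1: 0.5}
    return sum(w for p, w in points.items() if a < p <= b)

def alpha(x):
    """Measure-defining function with alpha(0) = 0, as defined above."""
    return mu_interval(0, x) if x > 0 else -mu_interval(x, 0)

# mu((a, b]) = alpha(b) - alpha(a) for a < b <= 0, a < 0 < b, and 0 <= a < b:
for a, b in [(-2, -1), (-1, 0), (-0.5, 1.5), (0.5, 3)]:
    assert mu_interval(a, b) == alpha(b) - alpha(a)
print("identity verified")
```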
1.34 Problem. Calculate the measure-defining function of the following measures:
(1) a point measure: δ_2, δ_0, δ_3;
(2) a linear combination of point measures: δ_2 + 2δ_0 + 0.5δ_3;
(3) the Lebesgue measure λ.
1.35 Problem. Let μ|B be finite on bounded intervals. Explain the fundamental
properties of the measure-defining function α:
(1) α is increasing.
(2) α is right-continuous.
The following is an existence theorem which establishes a one-to-one relation
between functions and measures.
1.36 Theorem. (Measure extension theorem)
For every function α : R → R satisfying properties (1) and (2) of 1.35 there exists a
uniquely determined measure μ_α such that

μ_α((a, b]) = α(b) − α(a).
If the measure-defining function α is continuous and piecewise differentiable then
its derivative α′ is called the density of the measure μ_α (with respect to the Lebesgue
measure λ). This name comes from

α′(x) = lim_{h→0} [α(x + h) − α(x − h)] / (2h) = lim_{h→0} μ_α((x − h, x + h]) / λ((x − h, x + h]).

In such a situation we have

μ_α((a, b]) = ∫_a^b α′(x) dx.
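A numerical sanity check of the displayed relation, with α chosen as α(x) = 1 − e^{−x} for x > 0 and α(x) = 0 otherwise (my choice of example, not from the notes); its derivative e^{−x} on (0, ∞) is the exponential density:

```python
import math

def alpha(x):
    return 1 - math.exp(-x) if x > 0 else 0.0          # measure-defining function

def density(x):
    return math.exp(-x) if x > 0 else 0.0              # its derivative alpha'(x)

def integral(f, a, b, n=20_000):
    """Midpoint rule; plenty accurate for this smooth integrand."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

a, b = 0.3, 2.0
lhs = alpha(b) - alpha(a)                              # mu_alpha((a, b])
rhs = integral(density, a, b)                          # integral of the density
print(abs(lhs - rhs) < 1e-6)  # True
```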
A measure μ|B is discrete if it is a finite or infinite linear combination of point
measures. A counting measure is a discrete measure where all point measures with
positive weight have weight one.
1.37 Problem. Explain the characteristic properties of the measure-defining function
of a discrete measure and of a counting measure.
1.38 Problem. Let α be the measure-defining function of μ_α.
(1) Show that μ_α({a}) = α(a) − α(a−).
(2) For which measures μ_α is α continuous?
(3) For which measures μ_α is α a step-function?
1.4 Probability distributions
A probability model consists of a sample space Ω, a σ-field F and a probability
measure P|F. Such a triple (Ω, F, P) is called a probability space.
For practical applications it is important to specify the particular probability
measure under consideration. This can be done either if the σ-field F has a simple
structure, e.g. if it is finite (confer problem 1.29), or if the σ-field is the information set of
a random variable X.
Let us consider the second case. Let X be a random variable. The information set
of X is the σ-field σ(X) consisting of all events (X ∈ B) where B is a Borel set.
1.39 Definition. The set function

P_X : B ↦ P(X ∈ B), B a Borel set,

is called the distribution of X (under P).
1.40 Problem. Show that P_X is a probability measure on (R, B).
Since P_X is a measure on (R, B) it can be represented by its measure-defining
function α. For probability measures it is, however, simpler to use the distribution
function

F(x) = P(X ≤ x) = P_X((−∞, x]) = P_X((−∞, 0]) + α(x),

which differs from α only by an additive constant. Thus we have

P(a < X ≤ b) = P_X((a, b]) = F(b) − F(a) = α(b) − α(a).

1.41 Proposition. Let X be a random variable with distribution function F. Then
P_X = μ_F.
Many examples illustrating the relation between random variables and their
distribution functions have been considered in the introductory course.
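As a minimal worked example (mine, not from the notes): for a fair die the distribution function is a step function, and interval probabilities are increments of F:

```python
from fractions import Fraction

def F(x):
    """Distribution function F(x) = P(X <= x) for a fair die X."""
    return Fraction(sum(1 for k in range(1, 7) if k <= x), 6)

print(F(4) - F(2))   # P(2 < X <= 4) = 1/3
print(F(6) - F(0))   # P(0 < X <= 6) = 1
```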
Chapter 2
Measurable functions and random variables
2.1 The idea of measurability
Recall the concept of a random variable. This is a function X : (Ω, F, P) → R defined
on a probability space such that the sets (X ∈ B) are in F for all Borel sets B ∈ B.
The notion of a random variable is a special case of the notion of a measurable
function.
2.1 Definition. A function f : (Ω, F) → R defined on a measurable space is called
measurable if the sets (f ∈ B) are in F for all Borel sets B ∈ B.
The notion of measurability is not restricted to real-valued functions.
Let (Ω, A) and (Y, B) be measurable spaces. Moreover, let f : Ω → Y be a
function. Recall that (f ∈ B) is the inverse image of B under f, usually denoted by
f⁻¹(B).
2.2 Denition. A function f : (, /) (Y, B) is called (/, B)-measurable if
f
1
(B) / for all B B.
Let us agree upon some terminology.
(1) When we consider real-valued functions then we always use the Borel--eld in
the range of f. If f : (, T) (R, B) then we simply say that f is T-measurable if
we mean that it is (T, B)-measurable.
(2) When we consider functions f : R R then (B, B)-measurability is called Borel
measurability. The term Borel is thus concerned with the -eld in the domain of
f.
To get an idea what measurability means let us consider some simple examples.
2.3 Problem. Let (, T, ) be a measure space and let f = 1
A
where A . Show
that f is T-measurable iff A T.
13
14 CHAPTER 2. MEASURABLE FUNCTIONS AND RANDOM VARIABLES
It follows that very complicated functions are Borel-measurable, e.g. f = 1
Q
.
2.4 Problem. Let , T, ) be a measure space and let f : R be a simple
function. Show that f is T-measurable iff all sets of the canonical representation are
in T.
2.2 The basic abstract assertions

There are two fundamental principles for dealing with measurability. The first principle says that measurability is a property which is preserved under composition of functions.

2.5 Theorem. Let $f : (\Omega, \mathcal{A}) \to (Y, \mathcal{B})$ be $(\mathcal{A}, \mathcal{B})$-measurable, and let $g : (Y, \mathcal{B}) \to (Z, \mathcal{C})$ be $(\mathcal{B}, \mathcal{C})$-measurable. Then $g \circ f$ is $(\mathcal{A}, \mathcal{C})$-measurable.

2.6 Problem. Prove 2.5.

The second principle is concerned with checking measurability. For checking measurability of $f$ it is sufficient to consider the sets in a generating system of the $\sigma$-field in the range of $f$.

2.7 Theorem. Let $f : (\Omega, \mathcal{A}) \to (Y, \mathcal{B})$ and let $\mathcal{C}$ be a generating system of $\mathcal{B}$, i.e. $\mathcal{B} = \sigma(\mathcal{C})$. Then $f$ is $(\mathcal{A}, \mathcal{B})$-measurable iff $f^{-1}(C) \in \mathcal{A}$ for all $C \in \mathcal{C}$.

Proof: Let $\mathcal{D} := \{D \subseteq Y : f^{-1}(D) \in \mathcal{A}\}$. It can be shown that $\mathcal{D}$ is a $\sigma$-field. If $f^{-1}(C) \in \mathcal{A}$ for all $C \in \mathcal{C}$ then $\mathcal{C} \subseteq \mathcal{D}$. This implies $\sigma(\mathcal{C}) \subseteq \mathcal{D}$. □

2.8 Problem. Fill in the details of the proof of 2.7.
2.3 The structure of real-valued measurable functions

Let $(\Omega, \mathcal{F})$ be a measurable space. Let $L(\mathcal{F})$ denote the set of all $\mathcal{F}$-measurable real-valued functions. We start with the most common and most simple criterion for checking measurability of a real-valued function.

2.9 Problem. Show that a function $f : \Omega \to \mathbb{R}$ is $\mathcal{F}$-measurable iff $(f \le \alpha) \in \mathcal{F}$ for every $\alpha \in \mathbb{R}$.
Hint: Apply 2.7.

This provides us with a lot of examples of Borel-measurable functions.

2.10 Problem.
(a) Show that every monotone function $f : \mathbb{R} \to \mathbb{R}$ is Borel-measurable.
(b) Show that every continuous function $f : \mathbb{R}^n \to \mathbb{R}$ is $\mathcal{B}^n$-measurable.
Hint: Note that $(f \le \alpha)$ is a closed set.
(c) Let $f : (\Omega, \mathcal{F}) \to \mathbb{R}$ be $\mathcal{F}$-measurable. Show that $f^+$, $f^-$, $|f|$, and every polynomial $a_0 + a_1 f + \cdots + a_n f^n$ are $\mathcal{F}$-measurable.
The next exercise is a first step towards the measurability of expressions involving several measurable functions.

2.11 Problem. Let $f_1, f_2, \ldots, f_n$ be measurable functions. Then
$$f = (f_1, f_2, \ldots, f_n) : \Omega \to \mathbb{R}^n$$
is $(\mathcal{F}, \mathcal{B}^n)$-measurable.

2.12 Corollary. Let $f_1, f_2, \ldots, f_n$ be measurable functions. Then for every continuous function $\phi : \mathbb{R}^n \to \mathbb{R}$ the composition $\phi(f_1, f_2, \ldots, f_n)$ is measurable.

Proof: Apply 2.5. □

2.13 Corollary. Let $f_1, f_2$ be measurable functions. Then $f_1 + f_2$, $f_1 - f_2$, $f_1 \cdot f_2$, $f_1 \vee f_2$ are measurable functions.

2.14 Problem. Prove 2.13.

As a result we see that $L(\mathcal{F})$ is a space of functions where we may perform any algebraic operations without leaving the space. Thus it is a very convenient space for formal manipulations. The next assertion shows that we may even perform all of those operations involving a countable set (e.g. a sequence) of measurable functions!
2.15 Theorem. Let $(f_n)_{n \in \mathbb{N}}$ be a sequence of measurable functions. Then $\sup_n f_n$, $\inf_n f_n$ are measurable functions. Let $A := (\lim_n f_n \text{ exists})$. Then $A \in \mathcal{F}$ and $\lim_n f_n 1_A$ is measurable.

Proof: Since
$$(\sup_n f_n \le \alpha) = \bigcap_n (f_n \le \alpha)$$
it follows from 2.9 that $\sup_n f_n$ and $\inf_n f_n = -\sup_n (-f_n)$ are measurable. We have
$$A := (\lim_n f_n \text{ exists}) = \Big( \sup_k \inf_{n \ge k} f_n = \inf_k \sup_{n \ge k} f_n \Big).$$
This implies $A \in \mathcal{F}$. The last statement follows from
$$\lim_n f_n = \sup_k \inf_{n \ge k} f_n \quad \text{on } A. \qquad \square$$
Note that the preceding corollaries are only very special examples of the power of Theorem 2.5. Roughly speaking, any function which can be written as an expression involving countably many operations with countably many measurable functions is measurable. Therefore it is rather difficult to construct non-measurable functions.

Let us denote the set of all $\mathcal{F}$-measurable simple functions by $S(\mathcal{F})$. Clearly, all limits of simple measurable functions are measurable. The remarkable fact, fundamental for almost everything in integration theory, is the converse of this statement.

2.16 Theorem.
(a) Every measurable function $f$ is the limit of some sequence of simple measurable functions.
(b) If $f \ge 0$ then the approximating sequence can be chosen to be increasing.

Proof: The fundamental statement is (b).
Let $f \ge 0$. For every $n \in \mathbb{N}$ define
$$f_n := \begin{cases} (k-1)/2^n & \text{whenever } (k-1)/2^n \le f < k/2^n, \; k = 1, 2, \ldots, n 2^n, \\ n & \text{whenever } f \ge n. \end{cases}$$
Then $f_n \uparrow f$. If $f$ is bounded then $(f_n)$ converges uniformly to $f$. Part (a) follows from $f = f^+ - f^-$. □

2.17 Problem. Draw a diagram illustrating the construction of the proof of 2.16.

2.18 Problem. Show: If $f$ is bounded then the approximating sequence can be chosen to be uniformly convergent.
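The staircase construction in the proof of 2.16 is easy to reproduce numerically. The following sketch (plain Python; the sample function $f(x) = x^2$ is an arbitrary choice, not taken from the text) evaluates the dyadic approximations $f_n$ pointwise:

```python
import math

def dyadic_approx(f, n):
    """n-th dyadic approximation f_n of a nonnegative function f, following
    the construction in the proof of Theorem 2.16:
    f_n = (k-1)/2^n on ((k-1)/2^n <= f < k/2^n), k = 1, ..., n*2^n,
    and f_n = n on (f >= n)."""
    def fn(x):
        v = f(x)
        if v >= n:
            return float(n)
        # largest dyadic level (k-1)/2^n lying below v
        return math.floor(v * 2 ** n) / 2 ** n
    return fn

f = lambda x: x * x          # an arbitrary nonnegative sample function
f3 = dyadic_approx(f, 3)
f4 = dyadic_approx(f, 4)

# the approximations increase with n and stay below f
print(f3(1.3), f4(1.3), f(1.3))
```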
Chapter 3

Integral and expectation

3.1 The integral of simple functions

Let $(\Omega, \mathcal{F}, \mu)$ be a measure space. We start with defining the $\mu$-integral of a measurable simple function.

3.1 Definition. Let $f = \sum_{i=1}^n a_i 1_{F_i}$ be a nonnegative simple $\mathcal{F}$-measurable function with its canonical representation. Then
$$\int f \, d\mu := \sum_{i=1}^n a_i \mu(F_i)$$
is called the $\mu$-integral of $f$.

We had to restrict the preceding definition to nonnegative functions since we admit the case $\mu(F) = \infty$. If we were dealing with a finite measure the definition would work for all $\mathcal{F}$-measurable simple functions.
3.2 Example. Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $X = \sum_{i=1}^n a_i 1_{F_i}$ be a simple random variable. Then we have $E(X) = \int X \, dP$.

3.3 Problem. What is the integral with respect to a linear combination of point measures? Which functions can be integrated?

3.4 Problem. Give a geometric interpretation of the integral of a step function with respect to a Borel measure.

3.5 Theorem. The $\mu$-integral on $S(\mathcal{F})^+$ has the following properties:
(1) $\int 1_F \, d\mu = \mu(F)$,
(2) $\int (sf + tg) \, d\mu = s \int f \, d\mu + t \int g \, d\mu$ if $s, t \in \mathbb{R}^+$ and $f, g \in S(\mathcal{F})^+$,
(3) $\int f \, d\mu \le \int g \, d\mu$ if $f \le g$ and $f, g \in S(\mathcal{F})^+$.

Proof: The only nontrivial part is to prove that $\int (f + g) \, d\mu = \int f \, d\mu + \int g \, d\mu$. □

3.6 Problem. Show that $\int (f + g) \, d\mu = \int f \, d\mu + \int g \, d\mu$ for $f, g \in S(\mathcal{F})^+$.
Hint: Try to find the canonical representation of $f + g$ in terms of the canonical representations of $f$ and $g$.

It follows that the defining formula of the $\mu$-integral can be applied to any (nonnegative) linear combination of indicators, not only to canonical representations!
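For a measure that is a linear combination of point measures (the situation of problem 3.3), the defining formula reduces to a finite weighted sum. A minimal sketch, with an assumed toy measure $\mu = 2\delta_0 + 3\delta_1$:

```python
def integral_point_measure(f, atoms):
    """Integral of f w.r.t. mu = sum_j c_j * delta_{x_j}, where 'atoms' is a
    list of (x_j, c_j) pairs with c_j >= 0. Applying Definition 3.1 to the
    canonical representation, the integral is sum_j c_j * f(x_j)."""
    return sum(c * f(x) for x, c in atoms)

# assumed toy measure: mu = 2*delta_0 + 3*delta_1
mu = [(0.0, 2.0), (1.0, 3.0)]
f = lambda x: 1.0 if x >= 0.5 else 0.0   # indicator of [0.5, infinity)

# integrating the indicator measures the set {x >= 0.5}
print(integral_point_measure(f, mu))
```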
3.2 The extension process

We know that every nonnegative measurable function $f \in L(\mathcal{F})^+$ is the limit of an increasing sequence $(f_n) \subseteq S(\mathcal{F})^+$ of measurable simple functions: $f_n \uparrow f$. It is a natural idea to think of the integral of $f$ as something like
$$\int f \, d\mu := \lim_n \int f_n \, d\mu \qquad (1)$$
This is actually the way we will proceed. But there are some points to worry about.

First of all, we should ask whether the limit on the right hand side exists. This is always the case. Indeed, the integrals $\int f_n \, d\mu$ form an increasing sequence in $[0, \infty]$. This sequence either has a finite limit or it increases to $\infty$. Both cases are covered by our definition.

The second and far more subtle question is whether the definition is compatible with the definition of the integral on $S(\mathcal{F})$. This is the only nontrivial part of the extension process of the integral and it is the point where $\sigma$-additivity of $\mu$ comes in. This is proved in Theorem 3.51.

The third question is whether the value of the limit is independent of the approximating sequence. This is also the case and is proved in Theorem 3.52.

Thus, (1) is a valid definition of the integral of $f \in L(\mathcal{F})^+$.

3.7 Definition. Let $(\Omega, \mathcal{F}, \mu)$ be a measure space. The $\mu$-integral of a function $f \in L(\mathcal{F})^+$ is defined by equation (1) where $(f_n) \subseteq S(\mathcal{F})^+$ is any increasing sequence of measurable simple functions such that $f_n \uparrow f$.
It is now straightforward that the basic properties of the integral of simple functions stated in Theorem 3.5 carry over to $L(\mathcal{F})^+$.

3.8 Theorem. The $\mu$-integral on $L(\mathcal{F})^+$ has the following properties:
(1) $\int 1_F \, d\mu = \mu(F)$,
(2) $\int (sf + tg) \, d\mu = s \int f \, d\mu + t \int g \, d\mu$ if $s, t \in \mathbb{R}^+$ and $f, g \in L(\mathcal{F})^+$,
(3) $\int f \, d\mu \le \int g \, d\mu$ if $f \le g$ and $f, g \in L(\mathcal{F})^+$.

The following problems establish some easy properties of the integral developed so far.

3.9 Problem. Let $f \in L(\mathcal{F})^+$. Prove Markov's inequality:
$$\mu(f > a) \le \frac{1}{a} \int f \, d\mu, \quad a > 0.$$

3.10 Problem. Let $f \in L(\mathcal{F})^+$. Show that $\int f \, d\mu = 0$ implies $\mu(f \ne 0) = 0$.
Hint: Show that $\mu(f > 1/n) = 0$ for every $n \in \mathbb{N}$.

An assertion $A$ about a measurable function $f$ is said to hold $\mu$-almost everywhere ($\mu$-a.e.) if $\mu(A^c) = 0$. Using this terminology the assertion of the preceding exercise can be phrased as:
$$\int f \, d\mu = 0, \; f \ge 0 \;\Rightarrow\; f = 0 \;\mu\text{-a.e.}$$
If we are talking about probability measures and random variables the phrase "almost everywhere" is sometimes replaced by "almost surely".

3.11 Problem. Let $f \in L(\mathcal{F})^+$. Show that $\int f \, d\mu < \infty$ implies $\mu(f > a) < \infty$ for every $a > 0$.

Now the integral is defined for every nonnegative measurable function. The value of the integral may be $\infty$. In order to define the integral for measurable functions which may take both positive and negative values we have to exclude infinite integrals.

3.12 Definition. A measurable function $f$ is $\mu$-integrable if $\int f^+ \, d\mu < \infty$ and $\int f^- \, d\mu < \infty$. If $f$ is $\mu$-integrable then
$$\int f \, d\mu := \int f^+ \, d\mu - \int f^- \, d\mu.$$
The set of all $\mu$-integrable functions is denoted by $L^1(\mu) = L^1(\Omega, \mathcal{F}, \mu)$.

Proving the basic properties of the integral of integrable functions is an easy matter. We collect these facts in a couple of problems.

3.13 Problem. Show that $f \in L(\mathcal{F})$ is $\mu$-integrable iff $\int |f| \, d\mu < \infty$.

3.14 Problem. The set $L^1(\mu)$ is a linear space and the $\mu$-integral is a linear functional on $L^1(\mu)$.

3.15 Problem. The $\mu$-integral is an isotonic functional on $L^1(\mu)$.

3.16 Problem. Let $f \in L^1(\mu)$. Show that $\big| \int f \, d\mu \big| \le \int |f| \, d\mu$.

3.17 Problem. Let $f$ be a measurable function and assume that there is an integrable function $g$ such that $|f| \le g$ (say: $f$ is dominated). Then $f$ is integrable.
3.18 Problem.
(a) Discuss the question whether bounded measurable functions are integrable.
(b) Characterize those measurable simple functions which are integrable.

Many assertions in measure theory concerning measurable functions are stable under linear combinations and under convergence. Assertions of such a type need only be proved for indicators. The procedure of proving (understanding) an assertion for indicators and extending it to nonnegative and to integrable functions is called measure theoretic induction.

3.19 Problem. Show that integrals are linear with respect to the integrating measure.

Let us finish this section with some notational remarks.

For convenience we denote
$$\int_A f \, d\mu := \int 1_A f \, d\mu, \quad A \in \mathcal{F}.$$

3.20 Problem.
(a) Let $f$ be an integrable function. Then $\int_A f \, d\mu = 0$ for all $A \in \mathcal{F}$ implies $f = 0$ $\mu$-a.e.
(b) Let $f$ and $g$ be integrable functions. Then $\int_A f \, d\mu = \int_A g \, d\mu$ for all $A \in \mathcal{F}$ implies $f = g$ $\mu$-a.e.

If $(\Omega, \mathcal{F}, P)$ is a probability space and $X \ge 0$ is a random variable then
$$E(X) := \int X \, dP$$
is called the expectation of $X$. Thus, expectations are integrals of random variables w.r.t. the underlying probability measures.

3.21 Problem. Let $X$ be a $P$-integrable random variable. Prove Chebyshev's inequality.

If we are dealing with Borel measure spaces $(\mathbb{R}, \mathcal{B}, \mu_\phi)$ where the measure is defined by some increasing right-continuous function $\phi$, then we write
$$\int f \, d\phi := \int f \, d\mu_\phi = \int f(x) \, d\phi(x).$$
A special case is the Lebesgue integral $\int f \, d\lambda = \int f(x) \, dx$. Moreover, integral limits are defined by
$$\int_a^b f \, d\phi := \int_{(a,b]} f \, d\phi.$$
Note that the lower integral limit is not included, but the upper limit is included!

3.22 Problem. What is the difference between $\int_a^b f \, d\phi$ and $\int_{(a,b)} f \, d\phi$?
3.3 Convergence of integrals

One of the reasons for the great success of abstract integration theory is its convergence theorems for integrals. The problem is the following. Assume that $(f_n)$ is a sequence of functions converging to some function $f$. When can we conclude that
$$\lim_n \int f_n \, d\mu = \int f \, d\mu \; ?$$
There are (at least) three basic assertions of this kind which could be viewed as the three basic principles of integral convergence. We will present these principles together with typical applications.

The theorem of monotone convergence

The first principle says that for increasing sequences of nonnegative functions the limit and the integral may be interchanged.

3.23 Theorem. (Theorem of Beppo Levi)
Let $(f_n) \subseteq L(\mathcal{F})^+$. Then
$$f_n \uparrow f \;\Rightarrow\; \lim_n \int f_n \, d\mu = \int f \, d\mu.$$
The theorem is proved in section 3.5. Note that there is no assumption on integrability. If the sequence is decreasing instead of increasing the corresponding assertion is only valid if the sequence is integrable.

3.24 Problem.
(a) Let $(f_n) \subseteq L^1(\mathcal{F})^+$. Then
$$f_n \downarrow f \;\Rightarrow\; \lim_n \int f_n \, d\mu = \int f \, d\mu.$$
(b) Show by example that the integrability assumption cannot be omitted without compensation.

The first application looks harmless.

3.25 Problem.
(a) Let $f$ be a measurable function such that $f = 0$ $\mu$-a.e. Then $f$ is integrable and $\int f \, d\mu = 0$.
Hint: Consider $f^+$ and $f^-$ separately.
(b) Let $f$ and $g$ be measurable functions such that $f = g$ $\mu$-a.e. Then $f$ is integrable iff $g$ is integrable.

Our next application is the starting point of a couple of problems which are concerned with advanced calculus. They serve as a warming up for stochastic calculus which will be the subject of part III of this text.

Let $\phi : [a, b] \to \mathbb{R}$ be increasing and right-continuous. For any bounded measurable function $f : [a, b] \to \mathbb{R}$ let
$$(f \bullet \phi)(t) := \int_a^t f \, d\phi.$$

3.26 Problem.
(a) Show that $f \bullet \phi$ is right-continuous with left limits.
(b) Show that $\Delta(f \bullet \phi)(t) = f(t) \Delta\phi(t)$.
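The monotone convergence theorem can be observed numerically for a discrete measure. The sketch below is an illustration only: the geometric-weight measure and the helper `integrate` are assumptions made for the example, and the truncations $f_n = \min(f, n)$ increase to $f$ pointwise.

```python
def integrate(f, atoms):
    """mu-integral of f for a discrete measure mu = sum_j c_j * delta_{x_j},
    given as a list of (x_j, c_j) pairs (a hypothetical helper)."""
    return sum(c * f(x) for x, c in atoms)

# an assumed finite measure with geometric weights on {1, ..., 40}
mu = [(k, 2.0 ** (-k)) for k in range(1, 41)]

f = lambda x: float(x)
f_n = lambda n: (lambda x: min(f(x), float(n)))   # f_n increases to f

vals = [integrate(f_n(n), mu) for n in range(1, 11)]
limit = integrate(f, mu)        # sum of k * 2^-k, which is close to 2

# the integrals increase and approach the integral of the limit function
print(vals[-1], limit)
```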
The infinite series theorem

The second principle says that for nonnegative measurable functions integrals and infinite sums may be interchanged. It is an easy consequence of the monotone convergence theorem (see section 3.5).

3.27 Theorem. For every sequence $(f_n)$ of nonnegative measurable functions we have
$$\int \Big( \sum_{n=1}^\infty f_n \Big) \, d\mu = \sum_{n=1}^\infty \int f_n \, d\mu.$$

3.28 Problem. Let $(\Omega, \mathcal{F}, \mu)$ be a measure space and $f \ge 0$ a measurable function. Show that $\nu : A \mapsto \int_A f \, d\mu$ is a measure.

3.29 Problem. Let $(a_{mn})$ be a double sequence of nonnegative numbers. Show that
$$\sum_m \sum_n a_{mn} = \sum_n \sum_m a_{mn}.$$
Hint: Define $f_n(x) := a_{mn}$ if $x \in (m-1, m]$.

3.30 Problem.
(a) Let $\Omega = \mathbb{N}$ and $\mathcal{F} = 2^{\mathbb{N}}$. Show that for every sequence $a_n \ge 0$ there is a uniquely determined measure $\mu|\mathcal{F}$ such that $\mu(\{n\}) = a_n$.
(b) Find $\int f \, d\mu$ for $f \ge 0$.
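For the measure of problem 3.30 the answer to (b) is the series $\int f \, d\mu = \sum_{n \ge 1} a_n f(n)$, by the infinite series theorem applied to $f = \sum_n f(n) 1_{\{n\}}$. A truncated numerical sketch (the weights $a_n = 1/n^2$ are an arbitrary choice for the example):

```python
import math

def seq_measure_integral(f, a, N=100000):
    """Truncated series for the integral w.r.t. the measure on (N, 2^N)
    with mu({n}) = a(n): by the infinite series theorem,
    integral of f dmu = sum_{n>=1} a(n) * f(n) for f >= 0."""
    return sum(a(n) * f(n) for n in range(1, N + 1))

a = lambda n: 1.0 / n ** 2      # mu({n}) = 1/n^2 (an assumed example)
one = lambda n: 1.0             # f = 1 integrates to mu(N)

total = seq_measure_integral(one, a)
print(total)                    # close to pi^2 / 6
```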
3.31 Problem. Let $f \ge 0$ be a measurable function and $\phi$ right-continuous and increasing. If $f = 0$ except at countably many points $(x_i)_{i \in \mathbb{N}}$ then
$$\int f \, d\phi = \sum_{i=1}^\infty f(x_i) \Delta\phi(x_i).$$

3.32 Problem. Let $f$ and $\phi$ be increasing right-continuous functions. Show that
$$\int_a^b \Delta f(t) \, d\phi(t) = \sum_{a < s \le b} \Delta f(s) \Delta\phi(s).$$
The dominated convergence theorem

The most popular result concerning this issue is Lebesgue's theorem on dominated convergence. Find the proof in section 3.5.

3.33 Theorem. (Dominated convergence theorem)
Let $(f_n)$ be a sequence of measurable functions which is dominated by an integrable function $g$, i.e. $|f_n| \le g$, $n \in \mathbb{N}$. If $f_n \to f$ $\mu$-a.e. then $f \in L^1(\mu)$ and
$$\lim_n \int f_n \, d\mu = \int f \, d\mu.$$

3.34 Problem. Show that under the assumptions of the dominated convergence theorem we even have
$$\lim_n \int |f_n - f| \, d\mu = 0.$$
(This type of convergence is called mean convergence.)

3.35 Problem. Discuss the question whether a uniformly bounded sequence of measurable functions is dominated in the sense of the dominated convergence theorem.

There are plenty of applications of the dominated convergence theorem. Let us present those consequences which show the superiority of general measure theory compared with previous approaches to integration.

Recall the notion of a Riemannian sequence of subdivisions of an interval $[a, b]$.

3.36 Problem. Let $f : [a, b] \to \mathbb{R}$ be a regulated function and let $\phi$ be increasing and right-continuous. Show that for every Riemannian sequence of subdivisions of $[a, b]$:
(a) $\displaystyle \lim_n \sum_{i=1}^{k_n} f(t_{i-1}) (\phi(t_i) - \phi(t_{i-1})) = \int_a^b f_- \, d\phi$
(b) $\displaystyle \lim_n \sum_{i=1}^{k_n} f(t_i) (\phi(t_i) - \phi(t_{i-1})) = \int_a^b f_+ \, d\phi$
Here $f_-$ and $f_+$ denote the left- and right-continuous versions of $f$.
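The left-point Riemann-Stieltjes sums of part (a) can be checked numerically. The sketch below uses an assumed integrator $\phi(t) = t + 1_{[1/2,\infty)}(t)$ and a continuous $f$ (so $f_- = f$); the limit then consists of the Lebesgue part plus the jump contribution $f(1/2) \Delta\phi(1/2)$:

```python
def stieltjes_left_sum(f, phi, a, b, n):
    """Riemann-Stieltjes sum sum_i f(t_{i-1}) * (phi(t_i) - phi(t_{i-1}))
    over the uniform subdivision of (a, b] into n parts (problem 3.36(a))."""
    h = (b - a) / n
    s = 0.0
    for i in range(1, n + 1):
        t0, t1 = a + (i - 1) * h, a + i * h
        s += f(t0) * (phi(t1) - phi(t0))
    return s

c = 0.5
phi = lambda t: t + (1.0 if t >= c else 0.0)   # right-continuous, unit jump at c
f = lambda t: t * t                            # continuous integrand

# exact value over (0, 1]: int_0^1 t^2 dt plus the jump term f(c) * 1
exact = 1.0 / 3.0 + f(c)
approx = stieltjes_left_sum(f, phi, 0.0, 1.0, 100000)
print(approx, exact)
```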
The preceding convergence statements for Riemannian sums are the key to important mathematical theorems.

3.37 Problem. Let $f$ and $g$ be increasing right-continuous functions. Show the following versions of the integration by parts formula ($a < b$):
(a) $\displaystyle f(b) g(b) = f(a) g(a) + \int_a^b f_- \, dg + \int_a^b g \, df$
(b) $\displaystyle f(b) g(b) = f(a) g(a) + \int_a^b f_- \, dg + \int_a^b g_- \, df + \sum_{a < s \le b} \Delta f(s) \Delta g(s)$

3.38 Problem. Let $f$ be increasing and right-continuous. Show that for every Riemannian sequence of subdivisions of $[a, b]$
$$\lim_n \sum_{i=1}^{k_n} (f(t_i) - f(t_{i-1}))^2 = \sum_{a < s \le b} (\Delta f(s))^2.$$
(Remark: The limit is called the quadratic variation of $f$ on $[a, b]$. The assertion shows that it is zero if $f$ is continuous.)
3.4 Stieltjes integration

Let $g_1$ and $g_2$ be right-continuous functions. If these functions are increasing then they define measures $\mu_{g_1}$ and $\mu_{g_2}$, and we have $\mu_{g_1 + g_2} = \mu_{g_1} + \mu_{g_2}$ as well as
$$\int_a^b f \, d(g_1 + g_2) = \int_a^b f \, dg_1 + \int_a^b f \, dg_2$$
for bounded measurable functions $f : [a, b] \to \mathbb{R}$.

It seems to be a natural goal to extend the notion of the integral to functions $g$ which are not increasing. However, this is not possible with arbitrary right-continuous functions $g$. The family of functions where an extension is easily available is the family of functions of bounded variation. The reason is that these are exactly the functions which can be written as a linear combination (actually as a difference) of increasing functions.

3.39 Definition. Let $g : [a, b] \to \mathbb{R}$ be a right-continuous function of bounded variation. Then the Stieltjes integral of a bounded measurable function $f : [a, b] \to \mathbb{R}$ is
$$\int_a^b f \, dg := \int_a^b f \, dg_1 - \int_a^b f \, dg_2$$
where $g = g_1 - g_2$ is such that $g_1$ and $g_2$ are right-continuous increasing functions.

3.40 Problem. Show that the definition of the Stieltjes integral is independent of the decomposition $g = g_1 - g_2$.

In the following we denote by the letter $g : [a, b] \to \mathbb{R}$ right-continuous functions of bounded variation. The letter $f : [a, b] \to \mathbb{R}$ denotes a bounded measurable function. We define
$$f \bullet g : t \mapsto \int_a^t f \, dg.$$
From problem 3.26 we know that $f \bullet g$ is a right-continuous function.
3.41 Problem. Explain why $f \bullet g$ is of bounded variation.

The Stieltjes integral has many properties which can be used for calculation purposes. Moreover, the Stieltjes integral is a special case of the general stochastic integral which is an indispensable tool in the theory of stochastic processes and their applications.

If $h : [a, b] \to \mathbb{R}$ is bounded measurable then the integral $\int_a^b f \, d(h \bullet g)$ is well-defined. How can we express this integral in terms of an integral with respect to $g$?

3.42 Theorem. (Associativity rule)
Let $f$ and $h$ be bounded measurable functions. Then
$$\int_a^b f \, d(h \bullet g) = \int_a^b f h \, dg.$$
Proof: The assertion is obvious for $f = 1_{(a,t]}$, $a \le t \le b$. The general case follows by measure theoretic induction. □

Since for rules of this kind the function $f$ is only a dummy function it is convenient to state the rule in a more compact way as
$$d(h \bullet g) = h \, dg,$$
which is called differential notation. It should be kept in mind that such formulas always have to be interpreted as assertions about integrals.

3.43 Problem. Let $g$ be differentiable with continuous derivative $g'$. Show that $dg = g' \, dt$.
If $g$ has jumps then we have $\Delta(h \bullet g)(t) = h(t) \Delta g(t)$. This follows by the same argument as for problem 3.26. Hence, $h \bullet g$ is continuous whenever $g$ is continuous.

Note that the assertions on the convergence of Riemannian sums, shown in problem 3.36, are also true for right-continuous functions $g$ of bounded variation. Hence, we have the integration by parts formula.

3.44 Theorem. (Integration by parts)
Let both $f$ and $g$ be right-continuous and of bounded variation. Then
$$f(b) g(b) = f(a) g(a) + \int_a^b f_- \, dg + \int_a^b g_- \, df + \sum_{a < s \le b} \Delta f(s) \Delta g(s).$$

3.45 Problem. State the integration by parts formula for continuous and for continuously differentiable functions. How do we write the integration by parts formula in differential form?
3.46 Problem. What is the quadratic variation of a function of bounded variation? Distinguish between the continuous and the non-continuous case.

There is a third calculation rule for Stieltjes integrals called the transformation formula. Since this rule is the classical prototype of the famous Itô formula of stochastic integration let us try to explain it very carefully.

3.47 Theorem. (Transformation formula)
Let $\phi : \mathbb{R} \to \mathbb{R}$ be a continuous function with a continuous derivative. Let $f : [a, b] \to \mathbb{R}$ be a function of bounded variation.
(1) If $f$ is continuous then
$$\phi \circ f(b) = \phi \circ f(a) + \int_a^b \phi' \circ f \, df.$$
(2) If $f$ is right-continuous then
$$\phi \circ f(b) = \phi \circ f(a) + \int_a^b \phi' \circ f_- \, df + \sum_{a < s \le b} \big( \Delta(\phi \circ f) - \phi' \circ f_- \, \Delta f \big)(s).$$

The first part of the assertion can easily be understood. For this let us consider the case of a differentiable function $f$. Then we know that
$$\phi \circ f(b) = \phi \circ f(a) + \int_a^b \phi'(f(s)) f'(s) \, ds,$$
which by $df(s) = f'(s) \, ds$ gives part (1) of the theorem. Therefore, part (1) is the extension of the fundamental theorem of calculus to continuous functions $f$ which are of bounded variation.

The second part of the assertion can be understood intuitively, too. It simply subtracts the contributions of the integral at the jumps of $f$ and replaces them by the jump heights of $\phi \circ f$.

3.48 Problem. Assume that $f$ has finitely many jumps on $[a, b]$. Show that part (2) of Theorem 3.47 follows from part (1).
Hint: Apply part (1) to each interval where $f$ is continuous.

Theorem 3.47 can be proved as follows. The following two problems imply that the theorem holds if $\phi(x)$ is any polynomial. Since a continuously differentiable function can be approximated by polynomials in such a way that both the function and its derivative are approximated uniformly on compact intervals, the assertion follows.

3.49 Problem. Use the integration by parts formula to show that Theorem 3.47 is true for $\phi(x) = x^2$.

3.50 Problem. Assume that Theorem 3.47 holds for the function $\psi(x)$. Use the integration by parts formula to show that Theorem 3.47 is true also for $\phi(x) = \psi(x) x$.
3.5. PROOFS OF THE MAIN THEOREMS 27
3.5 Proofs of the main theorems
3.51 Theorem. Let f o(T)
+
and (f
n
) o(T)
+
. Then
f
n
f lim
n
_
f
n
d =
_
f d
Proof: Note that is clear. For an arbitrary > 0 let B
n
:= (f f
n
(1 + )). It
is clear that
_
1
B
n
f d
_
1
B
n
f
n
(1 +) d
_
f
n
d (1 +)
From B
n
it follows that A B
n
A and (A B
n
) (A) by -additivity. We
get
_
f d =
n

j=1

j
(A
j
) = lim
n
n

j=1

j
(A
j
B
n
) = lim
n
_
1
B
n
f d
which implies
_
f d lim
n
_
f
n
d (1 + )
Since is arbitrarily small the assertion follows. 2
3.52 Theorem. Let $(f_n)$ and $(g_n)$ be increasing sequences of nonnegative measurable simple functions. Then
$$\lim_n f_n = \lim_n g_n \;\Rightarrow\; \lim_n \int f_n \, d\mu = \lim_n \int g_n \, d\mu.$$

Proof: It is sufficient to prove the assertion with $\le$ replacing $=$. Since
$$\lim_k f_n \wedge g_k = f_n \wedge \lim_k g_k = f_n$$
we obtain by 3.51
$$\int f_n \, d\mu = \lim_k \int f_n \wedge g_k \, d\mu \le \lim_k \int g_k \, d\mu. \qquad \square$$

3.53 Theorem. (Theorem of Beppo Levi)
Let $f \in L(\mathcal{F})^+$ and $(f_n) \subseteq L(\mathcal{F})^+$. Then
$$f_n \uparrow f \;\Rightarrow\; \lim_n \int f_n \, d\mu = \int f \, d\mu.$$

Proof: We have to show $\ge$.
For every $n \in \mathbb{N}$ let $(f_{nk})_{k \in \mathbb{N}}$ be an increasing sequence in $S(\mathcal{F})^+$ such that $\lim_k f_{nk} = f_n$. Define
$$g_k := f_{1k} \vee f_{2k} \vee \ldots \vee f_{kk}.$$
Then
$$f_{nk} \le g_k \le f_k \le f \quad \text{whenever } n \le k.$$
It follows that $g_k \uparrow f$ and
$$\int f \, d\mu = \lim_k \int g_k \, d\mu \le \lim_k \int f_k \, d\mu. \qquad \square$$

3.54 Problem. Prove Fatou's lemma: For every sequence $(f_n)$ of nonnegative measurable functions
$$\liminf_n \int f_n \, d\mu \ge \int \liminf_n f_n \, d\mu.$$
Hint: Recall that $\liminf_n x_n = \lim_k \inf_{n \ge k} x_n$. Consider $g_k := \inf_{n \ge k} f_n$ and apply Levi's theorem to $(g_k)$.
3.55 Theorem. (Dominated convergence theorem)
Let $(f_n)$ be a sequence of measurable functions which is dominated by an integrable function $g$, i.e. $|f_n| \le g$, $n \in \mathbb{N}$. Then
$$f_n \to f \;\mu\text{-a.e.} \;\Rightarrow\; f \in L^1(\mu) \text{ and } \lim_n \int f_n \, d\mu = \int f \, d\mu.$$

Proof: Integrability of $f$ is obvious since $f$ is dominated by $g$, too. Moreover, the sequences $g - f_n$ and $g + f_n$ consist of nonnegative measurable functions. Therefore we may apply Fatou's lemma:
$$\int (g - f) \, d\mu \le \liminf_n \int (g - f_n) \, d\mu = \int g \, d\mu - \limsup_n \int f_n \, d\mu$$
and
$$\int (g + f) \, d\mu \le \liminf_n \int (g + f_n) \, d\mu = \int g \, d\mu + \liminf_n \int f_n \, d\mu.$$
This implies
$$\int f \, d\mu \le \liminf_n \int f_n \, d\mu \le \limsup_n \int f_n \, d\mu \le \int f \, d\mu. \qquad \square$$

Now it is easy to prove several important facts concerning the integral. We state these as problems.
Chapter 4

Selected topics

4.1 Image measures and distributions

Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and let $(Y, \mathcal{B})$ be a measurable space. Moreover, let $f : \Omega \to Y$ be a function. We are going to consider the problem of mapping the measure $\mu$ to the set $Y$ by means of the function $f$.

The concept of the distribution of a random variable is an important special case of mapping a measure from one set to another (confer definition 1.39).

4.1 Definition. Let $f : (\Omega, \mathcal{A}, \mu) \to (Y, \mathcal{B})$ be $(\mathcal{A}, \mathcal{B})$-measurable. Then
$$\mu_f(B) := \mu(f \in B) = \mu(f^{-1}(B)), \quad B \in \mathcal{B},$$
is the image of $\mu$ under $f$ or the distribution of $f$ under $\mu$.

4.2 Problem. Show that $\mu_f$ is indeed a measure on $\mathcal{B}$.

4.3 Problem. Let $(\Omega, \mathcal{F}, \mu)$ be a measure space and let $f = 1_A$ where $A \subseteq \Omega$. Find $\mu_f$.

4.4 Problem. Let $(\Omega, \mathcal{F}, \mu)$ be a measure space and let $f : \Omega \to \mathbb{R}$ be a simple function. Find $\mu_f$.

4.5 Problem. Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $X$ be a random variable with distribution function $F$. Show that $P_X = \mu_F$.

An important point is how integrals behave under measure mappings.

4.6 Theorem. (Transformation formula)
Let $(\Omega, \mathcal{F}, \mu)$ be a measure space and let $g \in L(\mathcal{F})$. Then for every $f \in L^+(\mathcal{B})$
$$\int f \circ g \, d\mu = \int f \, d\mu_g.$$

4.7 Problem. Prove 4.6 by measure theoretic induction.

4.8 Problem. Let $(\Omega, \mathcal{F}, \mu)$ be a measure space and let $g \in L(\mathcal{F})$. Show that $f \circ g$ is $\mu$-integrable iff $f$ is $\mu_g$-integrable. In case of integrability the transformation formula holds.

4.9 Problem. Let $(\Omega, \mathcal{F}, P)$ be a probability space and $X$ a random variable with distribution function $F$. Explain the formula
$$E(f \circ X) = \int f \, d\mu_F.$$
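The formula of problem 4.9 can be checked by simulation: averaging $f(X)$ over a large sample (left side) against the integral with respect to the distribution (right side). The discrete law below is an assumed toy example, not taken from the text:

```python
import random

random.seed(0)

# assumed toy law: X takes -1, 0, 2 with probabilities 0.2, 0.5, 0.3
values = [-1.0, 0.0, 2.0]
probs = [0.2, 0.5, 0.3]
f = lambda x: x * x

# right-hand side: integral of f w.r.t. the distribution P_X
rhs = sum(p * f(v) for v, p in zip(values, probs))

# left-hand side: Monte Carlo estimate of E(f o X)
n = 200000
sample = random.choices(values, weights=probs, k=n)
lhs = sum(f(x) for x in sample) / n

print(lhs, rhs)
```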
4.2 Measures with densities

Let $(\Omega, \mathcal{F}, \mu)$ be a measure space and let $f \in L^+(\mathcal{F})$.

4.10 Problem. Show that $\nu : A \mapsto \int_A f \, d\mu$, $A \in \mathcal{F}$, is a measure.

We would like to say that $f$ is the density of $\nu$ with respect to $\mu$, but for doing so we have to be sure that $f$ is uniquely determined by $\nu$. But this is not true, in general.

4.11 Problem. Show that the density is uniquely determined if the measure $\mu$ is finite.

4.12 Example. Let $\mu|\mathcal{B}$ be a measure such that all countable sets $B \in \mathcal{B}$ have measure zero and all uncountable sets have measure $\mu(B) = \infty$. A moment's reflection shows that this is actually a measure. Now for every positive constant function $f \equiv c > 0$ we have
$$\int_B f \, d\mu = \mu(B), \quad B \in \mathcal{B}.$$

In the light of the preceding example we see that we have to exclude unreasonable measures in order to obtain uniqueness of densities. The following lemma shows the direction we have to go.

4.13 Lemma. Let $f, g \in L^+(\mathcal{F})$. Then
$$\int_A f \, d\mu = \int_A g \, d\mu \;\; \forall A \in \mathcal{F} \;\Rightarrow\; \mu((f \ne g) \cap A) = 0 \text{ whenever } \mu(A) < \infty.$$
In other words: $f = g$ $\mu$-a.e. on every set of finite $\mu$-measure.
Proof: Let $\mu(M) < \infty$ and define $M_n := M \cap (f \le n) \cap (g \le n)$. Since $f 1_{M_n}$ and $g 1_{M_n}$ are $\mu$-integrable it follows that $f 1_{M_n} = g 1_{M_n}$ $\mu$-a.e. For $n \to \infty$ we have $M_n \uparrow M$ which implies $f 1_M = g 1_M$ $\mu$-a.e. □

Since densities are uniquely determined on sets of finite measure we have uniqueness of densities for finite measures and also for measures which can be decomposed into finite measures.

4.14 Definition. A measure $\mu|\mathcal{F}$ is called $\sigma$-finite if there is a sequence of sets $(F_n)_{n \in \mathbb{N}} \subseteq \mathcal{F}$ such that $F_n \uparrow \Omega$ and $\mu(F_n) < \infty$ for all $n \in \mathbb{N}$.

Note that Borel measures $\mu_\phi$ are $\sigma$-finite. For $\sigma$-finite measures densities are uniquely determined.

4.15 Lemma. If $\mu$ is finite or $\sigma$-finite then
$$\int_A f \, d\mu = \int_A g \, d\mu \;\; \forall A \in \mathcal{F} \;\Rightarrow\; f = g \;\mu\text{-a.e.}$$

4.16 Definition. Let $\mu$ be $\sigma$-finite and define a measure $\nu = f\mu$ by
$$\nu : A \mapsto \int_A f \, d\mu, \quad A \in \mathcal{F}.$$
Then $f =: \dfrac{d\nu}{d\mu}$ is called the density or the Radon-Nikodym derivative of $\nu$ with respect to $\mu$.

A density w.r.t. the Lebesgue measure is called a Lebesgue density.

4.17 Problem. Let $\phi : \mathbb{R} \to \mathbb{R}$ be an increasing function which is supposed to be differentiable on $\mathbb{R}$. Show that $\mu_\phi = \phi' \lambda$.

4.18 Problem. Let $(\Omega, \mathcal{F}, P)$ be a probability space and $X$ a random variable with differentiable distribution function $F$. Explain the formulas
$$P(X \in B) = \int_B F'(t) \, dt \quad \text{and} \quad E(g \circ X) = \int g(t) F'(t) \, dt.$$

Which measures have densities w.r.t. other measures?

4.19 Problem. Let $\nu = f\mu$. Show that $\mu(A) = 0$ implies $\nu(A) = 0$, $A \in \mathcal{F}$.
4.20 Definition. Let $\mu|\mathcal{F}$ and $\nu|\mathcal{F}$ be measures. The measure $\nu$ is said to be absolutely continuous w.r.t. the measure $\mu|\mathcal{F}$ ($\nu \ll \mu$) if
$$\mu(A) = 0 \;\Rightarrow\; \nu(A) = 0, \quad A \in \mathcal{F}.$$

We saw that absolute continuity is necessary for having a density. It is even sufficient.

4.21 Theorem. (Radon-Nikodym theorem)
Assume that $\mu$ is $\sigma$-finite. Then $\nu \ll \mu$ iff $\nu = f\mu$ for some $f \in L^+(\mathcal{F})$.
Proof: See Bauer, [2]. □

4.22 Problem. Let $P$ and $Q$ be probability measures on a finite field $\mathcal{F}$.
(1) State $Q \ll P$ in terms of the generating partition of $\mathcal{F}$.
(2) If $Q \ll P$ find $dQ/dP$.
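For a finite field the Radon-Nikodym derivative of problem 4.22 can be computed directly on the atoms of the generating partition. A sketch with assumed toy measures (here the partition consists of singletons):

```python
# Sketch for Problem 4.22: Omega = {1,...,5}, F = 2^Omega, so the generating
# partition consists of the singletons. Q << P means Q({w}) = 0 whenever
# P({w}) = 0, and then dQ/dP(w) = Q({w}) / P({w}) on atoms with P({w}) > 0.
P = {1: 0.1, 2: 0.4, 3: 0.0, 4: 0.5, 5: 0.0}
Q = {1: 0.2, 2: 0.2, 3: 0.0, 4: 0.6, 5: 0.0}

assert all(Q[w] == 0.0 for w in P if P[w] == 0.0)   # check Q << P

dQdP = {w: (Q[w] / P[w] if P[w] > 0 else 0.0) for w in P}

# verify Q(A) = integral over A of dQ/dP dP for an arbitrary event A
A = {1, 2, 4}
print(sum(Q[w] for w in A), sum(dQdP[w] * P[w] for w in A))
```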
An important question is how $\nu$-integrals can be transformed into $\mu$-integrals.

4.23 Problem. Let $\nu = f\mu$. Discuss the validity of
$$\int g \, d\nu = \int g \, \frac{d\nu}{d\mu} \, d\mu.$$
Hint: Prove it for $g \in S^+(\mathcal{F})$ and extend it by measure theoretic induction.

The following prepares for chapter 15.

4.24 Definition. The probability measures $P|\mathcal{F}$ and $Q|\mathcal{F}$ are said to be equivalent if they are mutually absolutely continuous ($P \sim Q$), i.e.
$$P(F) = 0 \;\Leftrightarrow\; Q(F) = 0 \quad \text{whenever } F \in \mathcal{F}.$$

Obviously, we have $P \sim Q$ iff $Q \ll P$ and $P \ll Q$. Therefore there exist the Radon-Nikodym derivatives $\dfrac{dQ}{dP}$ and $\dfrac{dP}{dQ}$. The following two problems contain general properties of Radon-Nikodym derivatives.

4.25 Problem. Let $P \sim Q$. Show that
$$\frac{dP}{dQ} = 1 \Big/ \frac{dQ}{dP}.$$
Hint: Show that for all $F \in \mathcal{F}$
$$\int_F \Big( \frac{dP}{dQ} \cdot \frac{dQ}{dP} - 1 \Big) \, dP = 0.$$
4.26 Problem. Let $Q \ll P$. Show that $P \sim Q$ iff $\dfrac{dQ}{dP} > 0$ $P$-a.s.
Hint: For proving $\Leftarrow$, show that $Q(F) = 0$ implies $1_F \dfrac{dQ}{dP} = 0$ $P$-a.s.

4.27 Problem. Let $Q \ll P$ and $(A_n) \subseteq \mathcal{F}$. Then $P(A_n) \to 0$ implies $Q(A_n) \to 0$.
Hint: Let $\varepsilon > 0$ and choose $M$ such that
$$\int_{\{dQ/dP > M\}} \frac{dQ}{dP} \, dP < \varepsilon.$$
(Why is this possible?) Let $B = \big\{ \frac{dQ}{dP} > M \big\}$ and split $A_n = (A_n \cap B) \cup (A_n \cap B^c)$.
4.3 Product measures and Fubini's theorem

Let $(\Omega_1, \mathcal{F})$ and $(\Omega_2, \mathcal{G})$ be measurable spaces. We want to discuss measure and integration on $\Omega_1 \times \Omega_2$.

To begin with we have to define a $\sigma$-field on $\Omega_1 \times \Omega_2$. This $\sigma$-field should be large enough to contain at least the rectangles (diagram!) $F \times G$ where $F \in \mathcal{F}$ and $G \in \mathcal{G}$.

4.28 Definition. The $\sigma$-field on $\Omega_1 \times \Omega_2$ which is generated by the family of measurable rectangles
$$\mathcal{R} = \{F \times G : F \in \mathcal{F}, \; G \in \mathcal{G}\}$$
is called the product of $\mathcal{F}$ and $\mathcal{G}$ and is denoted by $\mathcal{F} \otimes \mathcal{G}$.

A special case of a product $\sigma$-field is the Borel $\sigma$-field $\mathcal{B}^2$.

Having established a $\sigma$-field we turn to measurable functions. Recall that any continuous function $f : \mathbb{R}^2 \to \mathbb{R}$ is $\mathcal{B}^2$-measurable.

4.29 Problem.
(1) Let $f : \Omega_1 \to \mathbb{R}$ be $\mathcal{F}$-measurable. Show that $(x, y) \mapsto f(x)$ is $\mathcal{F} \otimes \mathcal{G}$-measurable.
(2) Let $f : \Omega_1 \to \mathbb{R}$ be $\mathcal{F}$-measurable, $g : \Omega_2 \to \mathbb{R}$ be $\mathcal{G}$-measurable, and let $\phi : \mathbb{R}^2 \to \mathbb{R}$ be continuous. Show that $(x, y) \mapsto \phi(f(x), g(y))$ is $\mathcal{F} \otimes \mathcal{G}$-measurable.

The preceding problem shows that functions of several variables which are set up as compositions of measurable functions of one variable are usually measurable with respect to the product $\sigma$-field (confer corollaries 2.12 and 2.13).

The next point is to talk about measures. Basically, there are measures on product spaces having a very complicated structure. But there is a special class of measures on product spaces which are constructed from measures on the components in a simple way.

The starting idea is the geometric content of rectangles in $\mathbb{R}^2$. If $I_1$ and $I_2$ are intervals then the geometric content (area) of the rectangle $I_1 \times I_2$ is the product of the contents (lengths) of the constituting intervals. The extension of this idea to general measures leads to product measures.
4.30 Theorem. Let (Ω_1, ℱ, µ) and (Ω_2, 𝒢, ν) be measure spaces. Then there exists a uniquely determined measure µ ⊗ ν | ℱ ⊗ 𝒢 satisfying

    (µ ⊗ ν)(F × G) = µ(F)ν(G), F × G ∈ ℛ.

The measure µ ⊗ ν is called the product measure of µ and ν.
Proof: See Bauer, [2]. □

As a consequence it follows that there is a uniquely determined measure on (ℝ², ℬ²) which measures rectangles by their geometric area. In terms of product measures this is λ ⊗ λ = λ², and is called the Lebesgue measure on ℝ².
Let us turn to integration. Integration for general measures on product spaces can
be a rather delicate matter. Things are much simpler when we are dealing with product
measures. The main point is that multiple integration (i.e. integration w.r.t. product
measures) can be reduced to iterated integration (i.e. evaluating integrals over single
components).
Let us proceed step by step.
The most simple case is the integration of the indicator of a rectangle. Let F × G ∈ ℛ. Then we have

    ∫ 1_{F×G} d(µ ⊗ ν) = (µ ⊗ ν)(F × G) = µ(F)ν(G) = ∫ 1_F dµ · ∫ 1_G dν

In general, a set A ∈ ℱ ⊗ 𝒢 need not be a rectangle. How can we extend the formula above to general sets? The answer is the section theorem (Cavalieri's principle).
For any set A ⊂ Ω_1 × Ω_2 we call

    A_y := {x ∈ Ω_1 : (x, y) ∈ A}, y ∈ Ω_2,

the y-section of A (diagram!). Similarly the x-section A_x, x ∈ Ω_1, is defined. Note that for rectangles the sections are particularly simple.

4.31 Problem. Find the sections of a rectangle.
The section theorem says that the volume of a set is the sum of the volumes of its sections.

4.32 Theorem. Let A ∈ ℱ ⊗ 𝒢. Then all sections of A are measurable, i.e. A_y ∈ ℱ, y ∈ Ω_2, and y ↦ µ(A_y) is a 𝒢-measurable function. Moreover, we have

    (µ ⊗ ν)(A) = ∫ µ(A_y) ν(dy)
Proof: The measurability parts of the section theorem are a matter of measure-theoretic routine arguments. Much more interesting is the integral formula.
In order to understand the integral formula we write it as an iterated integral:

    (µ ⊗ ν)(A) = ∫ ( ∫ 1_A(x, y) µ(dx) ) ν(dy)

It is easy to see that the inner integral evaluates to µ(A_y). Why is this formula valid? First of all, it is valid for rectangles A = F × G ∈ ℛ. This follows immediately from the definition of the product measure. Moreover, both sides of the equation define measures on the σ-field ℱ ⊗ 𝒢. Since these two measures are equal on rectangles they necessarily are equal on the generated σ-field. □
Let us illustrate how the section theorem works.

4.33 Problem. Find the area of the unit circle by means of the section theorem.
Outline: Let A be the unit circle with center at the origin. Then we have

    λ²(A) = ∫ λ(A_y) dy = 2 ∫_{−1}^{1} √(1 − y²) dy

Substitute y = sin t and apply (sin t cos t)′ = 2 cos² t − 1.

4.34 Problem. Find the area of a right-angled triangle by means of the section theorem.
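As a quick sanity check of the outline in problem 4.33, the section integral can be evaluated numerically. This sketch is ours, not part of the text; the function name is an assumption.

```python
import math

# Numerically evaluate the section integral for the unit circle:
# lambda^2(A) = integral over [-1, 1] of lambda(A_y) dy, where lambda(A_y) = 2*sqrt(1 - y^2).
def circle_area_by_sections(n=100_000):
    h = 2.0 / n                        # step width on the y-axis
    total = 0.0
    for i in range(n):
        y = -1.0 + (i + 0.5) * h       # midpoint rule
        total += 2.0 * math.sqrt(1.0 - y * y) * h
    return total

area = circle_area_by_sections()
print(area)  # close to pi
```

The midpoint sum converges to π, in agreement with the closed-form evaluation via the substitution y = sin t.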
Our last topic in this section is to extend the section theorem to integrals. The resulting general assertion is Fubini's theorem.

4.35 Theorem. (Fubini's theorem) Let f : Ω_1 × Ω_2 → ℝ be a nonnegative ℱ ⊗ 𝒢-measurable function. Then

    x ↦ f(x, y) and y ↦ ∫ f(x, y) µ(dx)

are measurable functions and

    ∫ f d(µ ⊗ ν) = ∫ ( ∫ f(x, y) µ(dx) ) ν(dy)

Proof: Fubini's theorem follows from the section theorem in a straightforward way by measure-theoretic induction. □

4.36 Problem. Find a version of Fubini's theorem for integrable functions.

4.37 Problem. Explain when it is possible to interchange the order of integration for an iterated integral.

4.38 Problem. Deduce from Fubini's theorem assertions for interchanging the order of summation for double series of numbers.
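Problem 4.37 can be explored numerically. The following sketch (our own illustration; the helper name and test function are assumptions) approximates both iterated integrals of a nonnegative function on the unit square and shows they agree, as Fubini's theorem predicts for nonnegative integrands.

```python
# For a nonnegative function on [0,1]^2, both iterated midpoint sums
# approximate the same product-measure integral (Fubini's theorem).
def iterated_sum(f, n=200, outer_first='x'):
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        inner = 0.0
        for j in range(n):
            v = (j + 0.5) * h
            # outer_first='x': u plays the role of x; otherwise of y
            inner += (f(u, v) if outer_first == 'x' else f(v, u)) * h
        total += inner * h
    return total

f = lambda x, y: x * y * y                 # nonnegative on the unit square
I_xy = iterated_sum(f, outer_first='x')    # integrate over y inside, x outside
I_yx = iterated_sum(f, outer_first='y')    # reversed order of integration
print(I_xy, I_yx)                          # both close to (1/2)*(1/3) = 1/6
```

For an integrable (not necessarily nonnegative) f the same interchange is legitimate once ∫ |f| d(µ ⊗ ν) < ∞, which is the content of problem 4.36.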
4.4 Spaces of integrable functions

We know that the space 𝓛¹ = 𝓛¹(Ω, ℱ, µ) is a vector space. We would like to define a norm on 𝓛¹.
A natural idea is to define

    ||f||_1 := ∫ |f| dµ, f ∈ 𝓛¹.

It is easy to see that this definition has the following properties:
(1) ||f||_1 ≥ 0, and f = 0 implies ||f||_1 = 0,
(2) ||f + g||_1 ≤ ||f||_1 + ||g||_1, f, g ∈ 𝓛¹,
(3) ||αf||_1 = |α| ||f||_1, α ∈ ℝ, f ∈ 𝓛¹.
However, we have

    ||f||_1 = 0 ⟺ f = 0 µ-a.e.

A function with zero norm need not be identically zero! Therefore, ||·||_1 is not a norm on 𝓛¹ but only a pseudo-norm.
In order to get a normed space one has to change the space 𝓛¹ in such a way that all functions f = g µ-a.e. are considered as equal. Then f = 0 µ-a.e. can be considered as the null element of the vector space. The space of integrable functions modified in this way is denoted by L¹ = L¹(Ω, ℱ, µ).

4.39 Discussion. For those readers who like to have hard facts instead of soft wellness we provide some details.
For any f ∈ 𝓛(ℱ) let

    f̃ = {g ∈ 𝓛(ℱ) : f = g µ-a.e.}

denote the equivalence class of f. Then integrability is a class property and the space

    L¹ := {f̃ : f ∈ 𝓛¹}

is a vector space. The value of the integral depends only on the class and therefore it defines a linear function on L¹ having the usual properties. In particular, ||f̃||_1 := ||f||_1 defines a norm on L¹.
It is common practice to work with L¹ instead of 𝓛¹ but to write f instead of f̃. This is a typical example of what mathematicians call abuse of language.
4.40 Theorem. The space L¹(Ω, ℱ, µ) is a Banach space.

Proof: Let (f_n) be a Cauchy sequence in L¹, i.e.

    for every ε > 0 there is N(ε) such that ∫ |f_n − f_m| dµ < ε whenever n, m ≥ N(ε).

Let n_i := N(1/2^i). Then

    ∫ |f_{n_{i+1}} − f_{n_i}| dµ < 1/2^i

It follows that for all k ∈ ℕ

    ∫ ( |f_{n_1}| + |f_{n_2} − f_{n_1}| + ··· + |f_{n_{k+1}} − f_{n_k}| ) dµ ≤ C < ∞

Hence the corresponding infinite series converges, which implies that

    |f_{n_1}| + Σ_{i=1}^∞ |f_{n_{i+1}} − f_{n_i}| < ∞ µ-a.e.

Since absolute convergence of series in ℝ implies convergence (here the completeness of ℝ comes in), the partial sums

    f_{n_1} + (f_{n_2} − f_{n_1}) + ··· + (f_{n_k} − f_{n_{k−1}}) = f_{n_k}

converge to some limit f µ-a.e. Mean convergence of (f_n) follows from Fatou's lemma by

    ∫ |f_n − f| dµ = ∫ lim_{k→∞} |f_n − f_{n_k}| dµ ≤ liminf_{k→∞} ∫ |f_n − f_{n_k}| dµ ≤ ε whenever n ≥ N(ε).

In other words, ||f_n − f||_1 → 0. □
Our next result establishes a dense subset of L¹. First of all, it says that simple functions are dense in L¹. But even more is true. We can even restrict the class of sets where the canonical decomposition of the simple functions comes from. It is sufficient to consider simple functions made up of indicators of sets in a system ℛ which generates ℱ and is a field. This means that it is closed under unions, intersections and complementation. E.g. if we are dealing with (ℝ, ℬ) we may consider step functions, i.e. simple functions based on intervals. The reason is that finite unions of intervals form a field which generates ℬ.

4.41 Theorem. Let ℛ be a field which generates ℱ. Then the set of ℛ-measurable simple functions is dense in L¹(Ω, ℱ, µ).

Proof: The assertion is proved in two parts. Let ε > 0. First we note that for every f ∈ L¹(Ω, ℱ, µ) there exists an ℱ-measurable simple function g such that ||f − g||_1 < ε. This can easily be shown for the positive and the negative parts separately. Second we have to show that for every ℱ-measurable simple function g there exists an ℛ-measurable simple function h such that ||g − h||_1 < ε. This follows from the measure extension theorem. We do not go into details but refer to Bauer, [2]. □
Let

    𝓛² = 𝓛²(Ω, ℱ, µ) = {f ∈ 𝓛(ℱ) : ∫ f² dµ < ∞}

This is another important space of integrable functions.

4.42 Problem.
(a) Show that 𝓛² is a vector space.
(b) Show that ∫ f² dµ < ∞ is a property of the µ-equivalence class of f ∈ 𝓛(ℱ).

By L² = L²(Ω, ℱ, µ) we again denote the corresponding space of equivalence classes. On this space there is an inner product

    ⟨f, g⟩ := ∫ fg dµ, f, g ∈ L².

The corresponding norm is

    ||f||_2 = ⟨f, f⟩^{1/2} = ( ∫ f² dµ )^{1/2}

The following facts can be proved in a way similar to the L¹ case.

4.43 Theorem. The space L²(Ω, ℱ, µ) is a Hilbert space.

4.44 Theorem. Let ℛ be a field which generates ℱ. Then the set of ℛ-measurable simple functions is dense in L²(Ω, ℱ, µ).
4.5 Fourier transforms

In order to represent and treat measures and probability measures in a mathematically convenient way measure transforms play a predominant role. The most simple measure transform is the moment generating function.

4.45 Definition. Let µ|ℬ be a finite measure. Then the function

    m(t) = ∫ e^{tx} µ(dx), t ∈ ℝ,

is called the Laplace transform or moment generating function of µ.

The moment generating function shares important useful properties with other measure transforms but it has a serious drawback. The exponential function x ↦ e^{tx} is unbounded and therefore need not be integrable for some values of t and measures µ. The application of moment generating functions is only possible in such cases where the exponential function is integrable at least for all values of t in an interval of positive length.

This kind of complication vanishes if we replace the real-valued exponential function x ↦ e^{tx} by its complex version x ↦ e^{itx}. The corresponding measure transform is called the Fourier transform.
4.46 Discussion. Let us recall some basic facts on complex numbers.
The complex number field

    ℂ = {z = u + iv : u, v ∈ ℝ}

is an extension of the real numbers ℝ in such a way that a number i (the imaginary number) is introduced which satisfies i² = i·i = −1. All other rules of calculation carry over from ℝ to ℂ.
Complex numbers are not ordered but have an absolute value, defined by |z| = √(u² + v²) if z = u + iv. For every complex number z ∈ ℂ there is a conjugate number z̄ := u − iv. The operation of conjugation satisfies (z_1 z_2)¯ = z̄_1 z̄_2. Moreover, we have z z̄ = |z|².
Several functions defined on ℝ can be extended to ℂ. For our purposes only the exponential function is of importance. It is defined by

    e^{u+iv} := e^u (cos(v) + i sin(v)), u, v ∈ ℝ.
This definition satisfies e^{z_1 + z_2} = e^{z_1} e^{z_2}, z_1, z_2 ∈ ℂ. For the notion of the Fourier transform it is important to note that |e^{iv}| = 1, v ∈ ℝ. This is a consequence of familiar properties of trigonometric functions.
Differentiation and integration of complex-valued functions of a real variable are simply performed for the real and the imaginary parts separately. Be sure to note that we are not dealing with functions of a complex variable! This would be a much more advanced topic called complex analysis.

4.47 Problem. Find the derivative of x ↦ e^{ax}, x ∈ ℝ, where a ∈ ℂ.

4.48 Problem. Show that the basic derivation rules (summation rule, product rule and chain rule) are valid for complex-valued functions.
4.49 Problem. Let f be a complex-valued measurable function (both the real and the imaginary part are measurable). Show that |f| is µ-integrable iff both the real and the imaginary part of f are µ-integrable.

4.50 Problem. Show that the µ-integral of complex-valued functions on ℝ is a linear functional.

4.51 Problem. Let f be a complex-valued µ-integrable function. Show that

    | ∫ f dµ | ≤ ∫ |f| dµ.

The next problem shows that the usual integration calculus (substitution, integration by parts) carries over from real-valued functions to complex-valued functions.

4.52 Problem. Show that indefinite integrals of complex-valued functions on ℝ are primitives of their integrands.

4.53 Problem. Find ∫_c^d e^{ax} dx, where c, d ∈ ℝ, a ∈ ℂ.
With these preparations we are in a position to proceed with Fourier transforms.

4.54 Definition. Let µ|ℬ be a finite measure. Then the function

    µ̂(t) = ∫ e^{itx} µ(dx), t ∈ ℝ,

is called the Fourier transform of µ.

Note that the Fourier transform is well-defined and finite for every t ∈ ℝ.

4.55 Problem. Find the Fourier transform of a point measure.

4.56 Problem. Find the Fourier transform of an exponential distribution.

4.57 Problem. Find the Fourier transform of a Poisson distribution.
Hint: The series expansion of the exponential function carries over to the complex-valued case.

4.58 Problem. Find the Fourier transform of a Gaussian distribution.
Hint: Derive a differential equation for the Fourier transform.
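For the Poisson case of problem 4.57, the series expansion of the hint yields the closed form exp(λ(e^{it} − 1)). A numeric comparison (our sketch; helper name and parameter values are assumptions):

```python
import cmath, math

# Compare the defining series of the Fourier transform of a Poisson(lam)
# distribution, sum_k e^{itk} e^{-lam} lam^k / k!, with exp(lam*(e^{it} - 1)).
def poisson_ft_series(t, lam, terms=80):
    return sum(cmath.exp(1j * t * k) * math.exp(-lam) * lam**k / math.factorial(k)
               for k in range(terms))

lam, t = 2.5, 0.7
series = poisson_ft_series(t, lam)
closed = cmath.exp(lam * (cmath.exp(1j * t) - 1))
print(abs(series - closed))  # essentially zero
```

Truncating the series at 80 terms is harmless here since the Poisson tail decays factorially.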
The Fourier transform can be used to find the moments of a measure.

4.59 Theorem. Let µ|ℬ be a finite measure. If ∫ |x|^k µ(dx) < ∞ then µ̂ is k times differentiable and

    (d^k/dt^k) µ̂(t) |_{t=0} = i^k ∫ x^k µ(dx)

4.60 Problem. Prove 4.59.

The fundamental fact on Fourier transforms is the uniqueness theorem.

4.61 Theorem. Let µ_1|ℬ and µ_2|ℬ be finite measures. Then

    µ_1 = µ_2 ⟺ µ̂_1 = µ̂_2.

We don't prove this theorem here since it is a reformulation of the fundamental Stone-Weierstrass approximation theorem of mathematical analysis. We refer to Bauer, [2].
The notion of the Fourier transform can be extended to measures on (ℝⁿ, ℬⁿ).

4.62 Definition. Let µ|ℬⁿ be a finite measure on ℝⁿ. Then the function

    µ̂(t) = ∫ e^{i t·x} µ(dx), t ∈ ℝⁿ,

is called the Fourier transform of µ.

The uniqueness theorem is true also for the n-dimensional case.
Part II
Probability theory

Chapter 5
Beyond measure theory

5.1 Independence
The notion of independence marks the point where probability theory goes beyond measure theory.
Recall that two events A, B ∈ ℱ are independent if the product formula P(A ∩ B) = P(A)P(B) is true. This is easily extended to families of events.

5.1 Definition. Let 𝒞 and 𝒟 be subfamilies of ℱ. The families 𝒞 and 𝒟 are said to be independent (with respect to P) if P(A ∩ B) = P(A)P(B) for every choice A ∈ 𝒞 and B ∈ 𝒟.

It is natural to call random variables X and Y independent if the corresponding information sets are independent.

5.2 Definition. Two random variables X and Y are independent if σ(X) and σ(Y) are independent.

The preceding definition can be stated as follows: Two random variables X and Y are independent if

    P(X ∈ B_1, Y ∈ B_2) = P(X ∈ B_1) P(Y ∈ B_2), B_1, B_2 ∈ ℬ.

This is equivalent to saying that the joint distribution P_{X,Y} of X and Y is the product of P_X and P_Y.
How can we check independence of random variables? Is it sufficient to check the independence of generators of the information sets? This is not true in general, but with a minor modification it is.

5.3 Theorem. Let X and Y be random variables and let 𝒞 and 𝒟 be generators of the corresponding information sets. If 𝒞 and 𝒟 are independent and closed under intersection then X and Y are independent.

5.4 Problem. Let F(x, y) be the joint distribution function of (X, Y). Show that X and Y are independent iff F(x, y) = h(x)k(y) for some functions h and k.
For independent random variables there is a product formula for expectations.

5.5 Theorem. (1) Let X ≥ 0 and Y ≥ 0 be independent random variables. Then

    E(XY) = E(X)E(Y)

(2) Let X ∈ L¹ and Y ∈ L¹ be independent random variables. Then XY ∈ L¹ and

    E(XY) = E(X)E(Y)

Proof: Apply measure-theoretic induction to obtain (1). Part (2) follows from (1). □

5.6 Problem. Let X and Y be random variables on a common probability space. Show that X and Y are independent iff

    E(e^{i(sX+tY)}) = E(e^{isX}) E(e^{itY}), s, t ∈ ℝ.

Recall that square integrable random variables X and Y are called uncorrelated if E(XY) = E(X)E(Y). This is a weaker notion than independence.

5.7 Problem. Show that uncorrelated random variables need not be independent.

5.8 Problem. Find the variance of the sample mean of independent random variables.

5.9 Problem. Show that X and Y are independent iff f(X) and g(Y) are uncorrelated for all bounded measurable functions f and g.

The notion of independence (as well as the notion of uncorrelated random variables) can be extended to more than two random variables. We will state the appropriate facts when we need them.
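One standard counterexample for problem 5.7 (our choice, not the text's): take X uniform on {−1, 0, 1} and Y = X². Exact rational arithmetic confirms that the pair is uncorrelated but not independent.

```python
from fractions import Fraction

# X uniform on {-1, 0, 1}, Y = X^2. Then E(XY) = E(X)E(Y) = 0 (uncorrelated),
# but P(X = 0, Y = 0) = 1/3 while P(X = 0) P(Y = 0) = 1/9 (not independent).
outcomes = [(-1, Fraction(1, 3)), (0, Fraction(1, 3)), (1, Fraction(1, 3))]

E_X  = sum(p * x for x, p in outcomes)
E_Y  = sum(p * x * x for x, p in outcomes)          # Y = X^2
E_XY = sum(p * x * (x * x) for x, p in outcomes)    # XY = X^3

uncorrelated = (E_XY == E_X * E_Y)
p_joint = Fraction(1, 3)                    # P(X = 0, Y = 0) = P(X = 0), since Y = 0 iff X = 0
p_prod  = Fraction(1, 3) * Fraction(1, 3)   # P(X = 0) * P(Y = 0)
independent_on_this_event = (p_joint == p_prod)
print(uncorrelated, independent_on_this_event)  # True False
```

Note that Y is even a (deterministic) function of X, the strongest possible form of dependence.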
5.2 Convergence and limit theorems

For probability theory other kinds of convergence play a predominant role than those we are accustomed to so far.

5.10 Definition. Let (Ω, ℱ, P) be a probability space and let (X_n) be a sequence of random variables. The sequence (X_n) is said to converge to a random variable X P-almost surely if

    lim_{n→∞} X_n(ω) = X(ω) for P-almost all ω ∈ Ω

This kind of convergence is also considered in measure theory and we know that under certain additional conditions almost sure convergence implies convergence of the expectations of the random variables.

However, the probabilistic meaning of almost sure convergence is limited. The reason is that the idea of approximating a random variable X by another random variable Y in a probabilistic sense does not require that the random variables are similar for all ω ∈ Ω. It is sufficient that the probability of being near to each other is large.

5.11 Definition. Let (Ω, ℱ, P) be a probability space and let (X_n) be a sequence of random variables. The sequence (X_n) is said to converge to a random variable X in P-probability (X_n →^P X) if

    lim_{n→∞} P(|X_n − X| > ε) = 0 for every ε > 0
5.12 Problem.
(1) Prove Chebyshev's inequality.
(2) Apply Chebyshev's inequality to prove the weak law of large numbers (WLLN): For a sequence (X_n) of independent and identically distributed square integrable random variables the corresponding sequence of sample means converges to the expectation in probability.

Convergence in probability is actually a weaker concept than almost sure convergence.

5.13 Problem.
(1) Show that almost sure convergence implies convergence in probability.
(2) Show by an example that convergence in probability does not imply almost sure convergence.

5.14 Problem. Show that convergence in the mean and convergence in the quadratic mean imply convergence in probability.

The power of convergence in probability is also due to the fact that the dominated convergence theorem remains valid if almost sure convergence is replaced by convergence in probability.
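A simulation sketch of the WLLN from problem 5.12 (the function name and parameter values are our assumptions): for i.i.d. ±1 variables with mean 0, the probability that the sample mean deviates from 0 by more than ε shrinks as n grows.

```python
import random

# Estimate P(|sample mean of n i.i.d. +-1 steps| > eps) by simulation.
def deviation_prob(n, eps=0.1, reps=2000, rng=random.Random(0)):
    count = 0
    for _ in range(reps):
        s = sum(rng.choice((-1, 1)) for _ in range(n))
        if abs(s / n) > eps:
            count += 1
    return count / reps

p_small, p_large = deviation_prob(50), deviation_prob(1000)
print(p_small, p_large)   # the second value should be much smaller
```

This is exactly convergence in probability of the sample means to the expectation 0, as the WLLN asserts.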
Many assertions which are obviously valid for almost sure convergence are also valid for convergence in probability. Let us state some of the most important assertions.

5.15 Theorem.
(1) The limit of a sequence of random variables which is convergent in P-probability is uniquely determined P-almost everywhere.
(2) Convergence in probability is inherited under algebraic operations.
(3) Convergence in probability is inherited under composition with continuous functions.

We do not go into the details of the proof of the preceding assertions but refer to the literature.

There is another concept of convergence which is important for probability theory. This is the notion of weak convergence. Let us comment on this by some motivational remarks.
In many applications one is interested in the approximation of distributions of random variables rather than of the random variables themselves. E.g. if we consider so-called asymptotic normality of random variables (X_n) we think of approximating probabilities P(X_n ∈ I) by Q(I) where Q is some normal distribution.
The most famous special case of asymptotic normality is the central limit theorem (CLT) which in its simplest form runs as follows.

5.16 Theorem. (Central limit theorem)
Let (X_n) be a sequence of independent, identically distributed, square integrable random variables. Let (Z_n) be the corresponding sequence of standardized sample means. Then

    lim_{n→∞} P(Z_n ∈ I) = Q(I)

for every interval I ⊂ ℝ, where Q denotes the standard normal distribution.

The kind of convergence considered in the central limit theorem is a special case of weak convergence of distributions. If the limit is a continuous distribution then weak convergence is concerned with arbitrary intervals I. In general, things are slightly more complicated.
The proof of the CLT is carried out by Fourier transforms since pointwise convergence of Fourier transforms is equivalent to weak convergence.
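A simulation sketch of theorem 5.16 (our own illustration; function name and parameters are assumptions): for i.i.d. ±1 steps the standardized sample mean is S_n/√n, and the frequency of hits of the interval [−1, 1] should approach Φ(1) − Φ(−1) ≈ 0.6827 as n grows (for moderate n the discreteness of the walk still shows).

```python
import random, math

# Frequency with which the standardized sample mean of n i.i.d. +-1 steps
# lands in [-1, 1]; the CLT predicts roughly Phi(1) - Phi(-1) ~ 0.6827.
def standardized_mean_in_interval(n=400, reps=5000, rng=random.Random(1)):
    hits = 0
    for _ in range(reps):
        s = sum(rng.choice((-1, 1)) for _ in range(n))
        z = s / math.sqrt(n)          # each step has mean 0 and variance 1
        if -1.0 <= z <= 1.0:
            hits += 1
    return hits / reps

freq = standardized_mean_in_interval()
print(freq)   # in the vicinity of 0.68
```

The remaining discrepancy for n = 400 is a discreteness (continuity-correction) effect, not a failure of the CLT.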
5.3 The causality theorem

When we are dealing with random variables which are not independent then we would like to express the kind of dependence in an appropriate way. In this section we consider the strongest kind of dependence.

Let X and Y be random variables such that Y = f ∘ X where f is some Borel-measurable function. Since (Y ∈ B) = (X ∈ f^{−1}(B)) it follows that σ(Y) ⊂ σ(X). In other words: If Y is a function of X (causally dependent on X) then the information set of Y is contained in the information set of X. This is intuitively very plausible: Any assertion about Y can be stated as an assertion about X.
It is a remarkable fact that even the converse is true.

5.17 Theorem. (Causality theorem)
Let X and Y be random variables such that σ(Y) ⊂ σ(X). Then there exists a measurable function f such that Y = f ∘ X.

Proof: By measure-theoretic induction it is sufficient to prove the assertion for Y = 1_A, A ∈ ℱ.
Recall that σ(Y) = {∅, Ω, A, A^c}. From σ(Y) ⊂ σ(X) it follows that A ∈ σ(X), i.e. A = (X ∈ B) for some B ∈ ℬ. This means 1_A = 1_B ∘ X. □

5.18 Problem. State and prove a causality theorem for σ(X_1, X_2, …, X_k)-measurable random variables.
Hint: Let 𝒞 be the generating system of σ(X_1, X_2, …, X_k) and let 𝒟 be the family of sets A such that 1_A is a function of (X_1, X_2, …, X_k). Show that 𝒟 is a σ-field and that 𝒞 ⊂ 𝒟. This implies that any indicator of a set in σ(X_1, X_2, …, X_k) is a function of (X_1, X_2, …, X_k). Extend this result by measure-theoretic induction.
Chapter 6
Random walks

6.1 The ruin problem

One player

Let us start with a very simple gambling system.
A gambler bets a stake of one unit at subsequent games. The games are independent and p denotes the probability of winning. In case of winning the gambler's return is the double stake, otherwise the stake is lost.
A stochastic model of such a gambling system consists of a probability space (Ω, ℱ, P) and a sequence of random variables (X_i)_{i≥1}. The random variables are independent with values +1 and −1 representing the gambler's gain or loss at time i ≥ 1. Thus, we have P(X_i = 1) = p. The sequence of partial sums, i.e. the accumulated gains,

    S_n = X_1 + X_2 + ··· + X_n

is called a random walk on ℤ starting at zero. If p = 1/2 then it is a symmetric random walk.
Assume that the gambler starts at i = 0 with capital V_0 = a. Then her wealth after n games is

    V_n = a + X_1 + X_2 + ··· + X_n = a + S_n

The sequence (V_n)_{n≥0} of partial sums is a random walk starting at a.
We assume that the gambler plans to continue gambling until her wealth attains a given level c > a or 0. Let

    T_x := min{n : V_n = x}
6.1 Problem. Explain why T_x is a random variable.

The conditional probability q_0(a) := P(T_0 < T_c | V_0 = a) is called the probability of ruin. Similarly, q_c(a) := P(T_c < T_0 | V_0 = a) is the probability of winning.
How can we evaluate the probability of ruin? This probability can be obtained by studying the dynamic behaviour of the gambling situation. Thus, this is a basic example of a situation which is typical for stochastic analysis: Probabilities are not obtained by combinatorial methods but by a dynamic argument resulting in a difference or differential equation.
The starting point is the following assertion.

6.2 Lemma. The ruin probabilities satisfy the difference equation

    q_c(a) = p·q_c(a + 1) + (1 − p)·q_c(a − 1) whenever 0 < a < c

with boundary conditions q_c(0) = 0 and q_c(c) = 1.
It is illuminating to understand the assertion with the help of a heuristic argument: If the random walk starts at V_0 = a, 0 < a < c, then we have V_1 = a + 1 with probability p and V_1 = a − 1 with probability 1 − p. This gives

    P(T_c < T_0 | V_0 = a) = p·P(T_c < T_0 | V_1 = a + 1) + (1 − p)·P(T_c < T_0 | V_1 = a − 1)

However, the random walk starting at time i = 1 has the same ruin probabilities as the random walk starting at i = 0. This proves the assertion. In this argument we utilized the intuitively obvious fact that the starting time of the random walk does not affect its ruin probabilities.
In order to calculate the ruin probabilities we have to solve the difference equation.

6.3 Discussion. The difference equation

    x_a = p·x_{a+1} + (1 − p)·x_{a−1} whenever a = 1, …, c − 1

has the general solution

    x_a = A + B·((1 − p)/p)^a   if p ≠ 1/2,
    x_a = A + B·a               if p = 1/2.

(Hint: Try x_a = λ^a, which gives two special solutions for λ. The general solution is a linear combination of the special solutions.) The constants A and B are determined by the boundary conditions. This gives

    q_c(a) = ( ((1 − p)/p)^a − 1 ) / ( ((1 − p)/p)^c − 1 )   if p ≠ 1/2,
    q_c(a) = a/c                                              if p = 1/2.
In order to calculate q_0(a) we note that q_0(a) = q̃_c(c − a) where q̃ denotes the ruin probabilities of a random walk with interchanged transition probabilities. This implies

    q_0(a) = ( (p/(1 − p))^{c−a} − 1 ) / ( (p/(1 − p))^c − 1 )   if p ≠ 1/2,
    q_0(a) = (c − a)/c                                            if p = 1/2.

Easy calculations show that

    q_c(a) + q_0(a) = 1

which means that gambling ends with probability 1.

6.4 Problem.
(a) Fill in the details of solving the difference equation of the ruin problem.
(b) Show that the random walk hits the boundaries almost surely (with probability one).
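The closed-form ruin probabilities of discussion 6.3 can be checked against a direct simulation of the gambling process. This sketch is ours (function names and the chosen parameters are assumptions, not part of the text):

```python
import random

# Probability q_c(a) of reaching c before 0, starting from wealth a,
# with winning probability p per game (discussion 6.3).
def q_c_formula(a, c, p):
    if p == 0.5:
        return a / c
    r = (1 - p) / p
    return (r**a - 1) / (r**c - 1)

# Monte Carlo estimate of the same quantity.
def q_c_simulated(a, c, p, reps=20000, rng=random.Random(2)):
    wins = 0
    for _ in range(reps):
        v = a
        while 0 < v < c:
            v += 1 if rng.random() < p else -1
        wins += (v == c)
    return wins / reps

a, c, p = 3, 10, 0.45
qf, qs = q_c_formula(a, c, p), q_c_simulated(a, c, p)
print(qf, qs)   # both close to 0.128
```

The simulation also illustrates problem 6.4(b): every simulated path does hit one of the boundaries.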
Two players

Now we assume that two players with initial capitals a and b are playing against each other. The stake of each player is 1 at each game. The game ends when one player is ruined.
This is obviously equivalent to the situation of the preceding section, leading to

    P(player 1 wins) = q_{a+b}(a)
    P(player 2 wins) = q_0(a).

We know that the game ends with probability one.
Let us turn to the situation where player 1 has unlimited initial capital. Then the game can only end by the ruin of player 2, i.e. if

    sup_n S_n ≥ b

where S_n denotes the accumulated gain of player 1.
6.5 Theorem. Let (S_n) be a random walk on ℤ. Then

    P(sup_n S_n ≥ b) = 1                  whenever p ≥ 1/2,
    P(sup_n S_n ≥ b) = (p/(1 − p))^b      whenever p < 1/2.

6.6 Problem. Prove 6.5.
Hint: Show that P(sup_n S_n ≥ b) = lim_{a→∞} q_{a+b}(a).

Note that P(sup_n S_n ≥ 1) is the probability that a gambler with unlimited initial capital gains 1 at some time. If p ≥ 1/2 this happens with probability 1 if we wait sufficiently long. Later we will see that in a fair game (p = 1/2) the expected waiting time is infinite.
6.2 Optional stopping

Let us consider the question whether gambling chances can be improved by a gambling system.
We start with a particularly simple gambling system, called an optional stopping system. The idea is as follows: The gambler waits up to a random time σ and then starts gambling. (The game at period σ + 1 is the first game to play.) Gambling is continued until a further random time τ and then stops. (The game at period τ is the last game to play.) Random times are random variables σ, τ : Ω → ℕ_0.
Now it is an important point that the choice of the random times σ and τ depends only on the information available up to those times, since the gambler does not know the future. A random time which satisfies this condition is called a stopping time. In the following we will turn this intuitive notion into a precise definition.
Let X_1, X_2, …, X_k, … be a sequence of random variables representing the outcomes of a game at times k = 1, 2, ….

6.7 Definition. The σ-field ℱ_k := σ(X_1, X_2, …, X_k) which is generated by the events (X_1 ∈ B_1, X_2 ∈ B_2, …, X_k ∈ B_k), B_i ∈ ℬ, is called the past of the sequence (X_i)_{i≥1} at time k.
The past ℱ_k at time k is the information set of the beginning (X_1, X_2, …, X_k) of the sequence (X_i)_{i≥1}. The history of the game is the family of σ-fields (ℱ_k)_{k≥0} where ℱ_0 = {∅, Ω}. The history is an increasing sequence of σ-fields representing the increasing information in the course of time.

6.8 Definition. Any increasing sequence of σ-fields is called a filtration.

6.9 Definition. A sequence (X_k)_{k≥1} of random variables is adapted to a filtration (ℱ_k)_{k≥0} if X_k is ℱ_k-measurable for every k ≥ 1.

Clearly, every sequence of random variables is adapted to its own history.
Now we are in a position to give a formal definition of a stopping time.
6.10 Definition. Let (ℱ_k)_{k≥0} be a filtration. A random variable τ : Ω → ℕ_0 ∪ {∞} is a stopping time (relative to the filtration (ℱ_k)) if

    (τ = k) ∈ ℱ_k for every k ∈ ℕ.

In view of the causality theorem (5.17) the realisation of the events (τ = k) is determined by the values of the random variables X_1, X_2, …, X_k, i.e.

    1_{(τ=k)} = f_k(X_1, X_2, …, X_k)

where the f_k are suitable functions. In terms of gambling this means that the decisions on starting or stopping the game depend only on the known past of the game.

6.11 Problem. Let (ℱ_k)_{k≥0} be a filtration and let τ : Ω → ℕ_0 ∪ {∞} be a random variable. Show that the following assertions are equivalent:
(a) (τ = k) ∈ ℱ_k for every k ∈ ℕ
(b) (τ ≤ k) ∈ ℱ_k for every k ∈ ℕ
(c) (τ < k) ∈ ℱ_{k−1} for every k ∈ ℕ
(d) (τ ≥ k) ∈ ℱ_{k−1} for every k ∈ ℕ
(e) (τ > k) ∈ ℱ_k for every k ∈ ℕ

The most important examples of stopping times are first passage times.

6.12 Problem. Let (X_n)_{n≥0} be adapted. Show that the hitting time or first passage time

    τ = min{k ≥ 0 : X_k ∈ B}

is a stopping time for any B ∈ ℬ. (Note that τ = ∞ if X_k ∉ B for all k ∈ ℕ.)
6.3 Wald's equation

If our gambler applies a stopping system (σ, τ) with finite stopping times then her gain is S_τ − S_σ.

6.13 Problem. Let (X_k) be a sequence adapted to (ℱ_k) and let τ be a finite stopping time. Then X_τ is a random variable.

Does the stopping system improve the gambler's chances? For answering this question we require some preparations.

6.14 Problem. Let Z be a random variable with values in ℕ_0. Show that E(Z) = Σ_{k=1}^∞ P(Z ≥ k).

6.15 Theorem. (Wald's equation)
Let (X_k)_{k≥1} be an independent sequence of integrable random variables with a common expectation E(X_k) = µ. If τ is an integrable stopping time then S_τ is integrable and

    E(S_τ) = µ·E(τ)
Proof: It is sufficient to show that the equation is true both for the positive parts and the negative parts of X_k. Let X_k ≥ 0. Then

    E(S_τ) = Σ_{k=0}^∞ ∫_{(τ=k)} S_k dP = Σ_{i=1}^∞ ∫_{(τ≥i)} X_i dP
           = Σ_{i=1}^∞ E(X_i)·P(τ ≥ i) = µ·E(τ)

(Note that the second equality holds since all terms are ≥ 0.) □
The following assertion answers our question about improving chances by stopping strategies. It shows that unfavourable games cannot be turned into fair games and fair games cannot be turned into favourable games. The result is a consequence of Wald's equation. Note that µ is the average gain for a single game and S_τ − S_σ is the accumulated gain for the gambling strategy starting at σ and ending at τ.

6.16 Corollary. Let (X_k) be an independent sequence of integrable random variables with a common expectation E(X_k) = µ. Let σ ≤ τ be integrable stopping times.
Then:
(a) µ < 0 ⟹ E(S_τ − S_σ) < 0.
(b) µ = 0 ⟹ E(S_τ − S_σ) = 0.
(c) µ > 0 ⟹ E(S_τ − S_σ) > 0.
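Wald's equation can be checked by simulation. The sketch below (our own; names and parameters are assumptions) uses the first exit time τ of a random walk from a two-sided boundary {−a, b} with p ≠ 1/2, for which τ is integrable, and compares E(S_τ) with µ·E(τ).

```python
import random

# Estimate E(S_tau) and E(tau) for tau = first exit time of the random walk
# from (-a, b), and compare with Wald's equation E(S_tau) = mu * E(tau).
def simulate_exit(a, b, p, reps=20000, rng=random.Random(3)):
    sum_S, sum_T = 0, 0
    for _ in range(reps):
        s, t = 0, 0
        while -a < s < b:
            s += 1 if rng.random() < p else -1
            t += 1
        sum_S += s
        sum_T += t
    return sum_S / reps, sum_T / reps

p = 0.45
mu = 2 * p - 1                     # E(X_k) = (+1)*p + (-1)*(1 - p)
ES, ET = simulate_exit(5, 5, p)
print(ES, mu * ET)                 # approximately equal, as Wald's equation asserts
```

With µ < 0 the walk drifts downward, so both estimated quantities come out negative, consistent with part (a) of the corollary (taking σ = 0).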
The strategy of waiting until the accumulated gain passes a given level, say 1 (which happens with probability one for fair games), and then stopping seems to contradict the preceding assertion. Let us have a closer look at this question.
The problem is concerned with one-sided boundaries. In this case first passage times may have expectation ∞ such that Wald's equation does not apply. In fact, for random walks with p < 1/2 there is a positive probability of never passing a positive boundary. Hence, the corresponding first passage times have the value ∞ with positive probability and therefore E(τ) = ∞. In the symmetric case, however, we know that each horizontal boundary is passed with probability one. Surprisingly, the first passage time has infinite expectation, too.

6.17 Problem. Let (S_k) be a symmetric random walk and let τ := min{k ≥ 0 : S_k = 1}. Show that P(τ < ∞) = 1 and E(τ) = ∞.
Hint: Assume E(τ) < ∞ and derive a contradiction.
Hint: Assume E() < and derive a contradiction.
Finally, we will apply Walds equation to rst passage times of two-sided bound-
aries. Let (S
n
)
n0
be a random walk (starting at S
0
= 0) with discrete steps +1 and
1. Let

:= min(k 0 : S
k
= a or S
k
= b).
6.3. WALDS EQUATION 57
It should be noted that τ is finite but not bounded! The duration of a gambling system based on such a stopping time is thus finite but not bounded. The random variable S_τ involves infinitely many periods n ∈ ℕ.
If τ_a and τ_{−b} denote the hitting times of the one-sided boundaries a resp. −b then we have (S_τ = a) = (τ_a < τ_{−b}) and (S_τ = −b) = (τ_{−b} < τ_a). Therefore the probabilities P(S_τ = a) and P(S_τ = −b) can be obtained immediately from (6.3).
For this, we have only to note that (using the notation of (6.3))

    P(S_τ = a) = P(τ_a < τ_{−b}) = P(T_{a+b} < T_0 | V_0 = b) = q_{a+b}(b)

and

    P(S_τ = −b) = P(τ_{−b} < τ_a) = P(T_0 < T_{a+b} | V_0 = b) = q_0(b)

This gives us the distribution of S_τ since from (6.3) we also know that τ is finite a.s. and therefore S_τ has only two values a and −b. In this way we can calculate

    E(S_τ) = a·P(S_τ = a) − b·P(S_τ = −b) = . . .
Next let us turn to the problem of calculating E(σ). Our idea is to apply Wald's equation E(S_σ) = μ E(σ). However, we proved Wald's equation only for integrable stopping times. Up to now we don't know whether σ is integrable. Thus, in order to apply our version of Wald's equation we have to use some tricks.
6.18 Problem.
(1) Show that the hitting time σ satisfies E(S_σ) = μ E(σ).
(2) Find E(σ) if μ ≠ 0.
Hint (for (1)): The basic trick is to approximate σ by bounded stopping times. Define

    σ ∧ n := min(σ, n) = { σ whenever σ ≤ n,
                           n whenever σ > n.

It is easy to see that σ ∧ n is a stopping time. Since it is bounded we may apply Wald's equation and obtain

    E(S_{σ∧n}) = μ E(σ ∧ n).

It is clear that σ ∧ n ↑ σ as n → ∞ and similarly S_{σ∧n} → S_σ. In order to obtain the validity of Wald's equation for σ we have to think about the question whether the corresponding expectations converge. The answer is that the left hand side converges by Lebesgue's dominated convergence theorem and the right hand side converges by Levi's theorem.
Unfortunately, this method only works if μ ≠ 0, which is the case if p ≠ 1/2. It does not work for the symmetric random walk. We will come back to this problem later (problem 7.23).
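The two-sided boundary discussion can be checked by simulation. The following sketch (function names and parameters are our own choices, not from the text) estimates E(S_σ) and E(σ) for the boundary {−a, b} and compares them via Wald's equation E(S_σ) = μ E(σ):

```python
import random

# Monte Carlo check of Wald's equation for the two-sided first passage time
# sigma = min{k : S_k = -a or S_k = b}; steps are +1 w.p. p, -1 w.p. 1-p.
def hit_two_sided(p, a, b, rng):
    s, n = 0, 0
    while -a < s < b:
        s += 1 if rng.random() < p else -1
        n += 1
    return s, n                         # (S_sigma, sigma)

rng = random.Random(42)
p, a, b = 0.4, 3, 3
mu = 2 * p - 1                          # E(X_k) = p - (1 - p)
runs = 20000
es = esigma = 0.0
for _ in range(runs):
    s, n = hit_two_sided(p, a, b, rng)
    es += s
    esigma += n
es, esigma = es / runs, esigma / runs
print(es, mu * esigma)                  # the two estimates should nearly agree
```

For p = 1/2 the same experiment still terminates (σ is finite a.s.), but both sides of Wald's equation are then 0 · E(σ), which is why a different argument is needed in the symmetric case.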
6.4 Gambling systems
Now we generalize our gambling system. We are going to admit that the gambler may vary the stakes. The stopping system is a special case where only 0 and 1 are admitted as stakes.
The stake for game n is denoted by H_n and has to be nonnegative. It is fixed before period n and therefore must be ℱ_{n−1}-measurable, since it is determined by the outcomes at times k = 1, 2, ..., n − 1. The sequence of stakes (H_n) is thus not only adapted but even predictable.
The gain at game k is H_k X_k = H_k(S_k − S_{k−1}) = H_k ΔS_k. For the wealth of the gambler after n games we obtain

    V_n = V_0 + Σ_{k=1}^n H_k(S_k − S_{k−1}) = V_0 + Σ_{k=1}^n H_k ΔS_k    (2)

If the stakes are integrable then we have

    E(V_n) = E(V_{n−1}) + E(H_n)E(X_n).

In particular, if p = 1/2 (and V_0 = 0) we have E(V_n) = 0 for all n ∈ ℕ.
It follows that variable stakes cannot change the fairness of a game. This is even true if we combine variable stakes with stopping strategies. The following assertion goes beyond Wald's equation since the wealth sequence (V_n)_{n≥0} need not be a random walk. Later we will use such sequences as prototypes of martingales.
6.19 Theorem.
Let (X_k)_{k≥1} be an independent sequence of integrable random variables with a common expectation E(X_k) = μ. Let (V_n)_{n≥0} be the sequence of wealths generated by a gambling system with integrable stakes and V_0 = 0. If τ is a bounded stopping time then
(a) μ < 0 ⇒ E(V_τ) ≤ 0,
(b) μ = 0 ⇒ E(V_τ) = 0,
(c) μ > 0 ⇒ E(V_τ) ≥ 0.
Proof: Let N := max τ. Since

    V_τ = Σ_{k=1}^N H_k X_k 1_{(τ≥k)}

and since (τ ≥ k) ∈ ℱ_{k−1} is independent of X_k, it follows that

    E(V_τ) = μ Σ_{k=1}^N E(H_k 1_{(τ≥k)}). □
There is a difference from the situation with constant stakes, where we admitted unbounded but integrable stopping times. In the present case of variable stakes this is no longer possible, as is shown by the famous doubling strategy.

6.20 Example. Let τ be the waiting time for the first success, i.e.

    τ = min{k ≥ 1 : X_k = 1}

and define stakes by

    H_n := 2^{n−1} 1_{(τ≥n)}.

Obviously, the stakes are integrable. However, we have

    P(V_τ = 1) = 1

for any p ∈ (0, 1). Therefore, a fair game can be transformed into a favourable game by such a strategy. And this is true although the stopping time is integrable, actually E(τ) = 1/p!

6.21 Problem. Fill in the details of 6.20.

6.22 Problem. Let p = 1/2. Show that for the doubling strategy we have E(V_n) = 0.

6.23 Problem. Explain for the doubling strategy why Lebesgue's theorem on dominated convergence does not imply E(V_n) → E(V_τ), although V_n → V_τ as n → ∞.
Hint: Show that the sequence (V_n) is not dominated from below by an integrable random variable.
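A short simulation makes example 6.20 concrete (a sketch with our own naming, assuming p = 1/2 and V_0 = 0): the doubling strategy ends with V_τ = 1 on every single run, while E(τ) = 1/p = 2.

```python
import random

# Doubling strategy: stake 2^(n-1) on game n, stop at the first win.
# Losing k-1 games costs 1 + 2 + ... + 2^(k-2) = 2^(k-1) - 1; the k-th
# game then wins 2^(k-1), so the final wealth is always exactly 1.
def play_doubling(rng, p=0.5):
    wealth, stake, n = 0, 1, 0
    while True:
        n += 1
        if rng.random() < p:            # win: collect the stake and stop
            return wealth + stake, n
        wealth -= stake                 # loss: double the stake and go on
        stake *= 2

rng = random.Random(1)
results = [play_doubling(rng) for _ in range(1000)]
assert all(w == 1 for w, _ in results)  # P(V_tau = 1) = 1
mean_tau = sum(n for _, n in results) / 1000
print(mean_tau)                         # near E(tau) = 1/p = 2
```

Note that the intermediate wealths V_n are unbounded below (a long losing streak costs 2^n − 1), which is exactly the failure of domination asked for in problem 6.23.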
Chapter 7
Conditioning
7.1 Conditional expectation
Now we are going to explore the most important and most successful probabilistic notion for describing dependence. The starting point is the relation between a random variable and a σ-field.
Let (Ω, ℱ, P) be a probability space and let 𝒜 ⊆ ℱ be a sub-σ-field.
If a random variable X is 𝒜-measurable, i.e. σ(X) ⊆ 𝒜, then the information available in 𝒜 tells us everything about X. In fact, every assertion (X ∈ B), B ∈ ℬ, about X is contained in 𝒜, and if we know which events in 𝒜 have been realized we know everything about X. A special case is the causality theorem, according to which 𝒜 = σ(Y) implies that X = f(Y).
Now, how can we describe the relation between X and 𝒜 if X is not 𝒜-measurable? If the random variable X is not 𝒜-measurable we could be interested in finding an optimal 𝒜-measurable approximation of X. This idea leads to the concept of conditional expectation. Let us explain the kind of approximation we have in mind.
A successful way consists in decomposing X into a sum X = Y + R where Y is 𝒜-measurable and R is uncorrelated to 𝒜. A minimal requirement on Y is that E(X) = E(Y), which implies E(R) = 0. Then the condition on R of being uncorrelated to 𝒜 means

    ∫_A R dP = 0 for all A ∈ 𝒜.

In other words, the approximating variable Y should satisfy the condition

    ∫_A X dP = ∫_A Y dP for all A ∈ 𝒜.

For these integrals to be defined we need nonnegative or integrable random variables.

7.1 Theorem. Let X ≥ 0 or integrable and let 𝒜 ⊆ ℱ be a σ-field. Then there exists a P-a.s. uniquely determined 𝒜-measurable random variable Y satisfying

    ∫_A X dP = ∫_A Y dP for all A ∈ 𝒜.

If X ≥ 0 then Y ≥ 0 P-a.e. If X is integrable then Y is integrable, too.
Proof: This is a consequence of the Radon-Nikodym theorem. If X ≥ 0 then ν(A) := ∫_A X dP defines a measure on 𝒜 such that ν ≪ P. Define Y := dν/dP. Then this random variable fulfils the asserted equation, is nonnegative and is uniquely determined P-a.e.
If X is integrable apply the preceding to X⁺ and X⁻. □
7.2 Definition. Let (Ω, ℱ, P) be a probability space and let 𝒜 ⊆ ℱ be a sub-σ-field. Let X be a nonnegative or integrable random variable. The conditional expectation E(X|𝒜) of X given 𝒜 is a nonnegative resp. integrable, 𝒜-measurable random variable satisfying

    ∫_A X dP = ∫_A E(X|𝒜) dP for all A ∈ 𝒜.
Exploring conditional expectations we have to discuss both the evaluation and the mathematical properties of conditional expectations. Let us state the basic facts on evaluation in terms of problems.

7.3 Problem. Find the conditional expectation given a finite σ-field.
Hint: Find the values of the conditional expectation on the sets of the generating partition.
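For a finite σ-field generated by a partition, the conditional expectation is constant on each partition set, equal to the average of X there: E(X 1_A)/P(A) on each generating set A. A small numeric sketch (a toy example of our own, with the uniform probability on 12 points) checks the defining equation:

```python
# Conditional expectation given a finite sigma-field: on each set of the
# generating partition, E(X|A) equals the average of X over that set,
# E(X 1_A) / P(A). Toy example with a uniform probability on 12 points.
omega = list(range(12))
X = {w: float(w * w) for w in omega}
partition = [set(range(0, 4)), set(range(4, 8)), set(range(8, 12))]

def cond_exp(X, partition):
    Y = {}
    for A in partition:
        avg = sum(X[w] for w in A) / len(A)     # uniform P: E(X 1_A)/P(A)
        for w in A:
            Y[w] = avg
    return Y

Y = cond_exp(X, partition)
# defining property: integrals of X and Y agree over every partition set
for A in partition:
    assert abs(sum(X[w] for w in A) - sum(Y[w] for w in A)) < 1e-9
print(Y[0], Y[4], Y[8])                 # the three constant values of E(X|A)
```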
Let X and Y be random variables where X is integrable. Then the conditional expectation E(X|σ(Y)) is a σ(Y)-measurable random variable and thus can be written as E(X|σ(Y)) = f ∘ Y. The function f is called the regression function of X w.r.t. Y. Usually, notation is simplified by E(X|σ(Y)) =: E(X|Y) and f(y) =: E(X|Y = y).

7.4 Problem. Let X and Y be random variables where X is integrable. Assume that Y is a simple random variable. Calculate E(X|Y = y).

7.5 Problem. Let X and Y be random variables where X is integrable. Assume that the joint distribution of (X, Y) has the Lebesgue density p(x, y). Show that the regression function of X w.r.t. Y is

    f(y) = E(X|Y = y) = ∫ x p(x, y) dx / ∫ p(x, y) dx.

Hint: Show that ∫_{(Y∈B)} X dP = ∫_{(Y∈B)} f(Y) dP, B ∈ ℬ.
Let us turn to the mathematical properties of the conditional expectation.
The following properties are easy consequences of the definition of conditional expectations. Moreover, these properties are intuitively plausible in the sense that any reasonable notion of a conditional expectation should have these properties.

7.6 Problem. Assume that X is an integrable random variable. Then:
(a) E(E(X|𝒜)) = E(X).
(b) If X is 𝒜-measurable then E(X|𝒜) = X.
(c) If X is independent of 𝒜 then E(X|𝒜) = E(X).
7.7 Theorem. (Linearity of conditional expectations)
(1) Assume that X and Y are nonnegative random variables. Then
E(αX + βY|𝒜) = αE(X|𝒜) + βE(Y|𝒜) whenever α, β ≥ 0.
(2) Assume that X and Y are integrable random variables. Then
E(αX + βY|𝒜) = αE(X|𝒜) + βE(Y|𝒜) whenever α, β ∈ ℝ.
Proof: The proof follows a scheme which is always the same for similar assertions concerning formulas for conditional expectations: if we want to show that E(Z_1|𝒜) = Z_2, then we have to show that the equations

    ∫_A Z_1 dP = ∫_A Z_2 dP, A ∈ 𝒜,

are true. The verification of these equations is in most cases completely straightforward, using standard properties of integrals. □
From linearity it follows immediately that conditional expectations preserve the order structure (isotonicity) and consequently fulfil several inequalities which depend on linearity and isotonicity.

7.8 Problem.
(1) If X ≤ Y are nonnegative or integrable random variables then E(X|𝒜) ≤ E(Y|𝒜).
(2) Let X ∈ L¹. Show that |E(X|𝒜)| ≤ E(|X| | 𝒜).
(3) Let X ∈ L¹. Show that E(X|𝒜)² ≤ E(X²|𝒜).
Hint: Start with E((X − E(X|𝒜))² | 𝒜) ≥ 0.
(4) Show that X ∈ L² implies E(X|𝒜) ∈ L².
After having established linearity we have to isolate those properties which are
characteristic for conditional expectations.
7.9 Theorem. (Iterated conditioning)
Let 𝒜 ⊆ ℬ be sub-σ-fields of ℱ. Then for nonnegative or integrable X

    E(E(X|𝒜)|ℬ) = E(E(X|ℬ)|𝒜) = E(X|𝒜).

(The smaller σ-field succeeds.)

7.10 Problem. Prove theorem (7.9).
The following is located at the core of the concept of conditional expectations. It says that conditional expectations are not only homogeneous with respect to constant factors but also with respect to 𝒜-measurable factors if 𝒜 is the conditioning σ-field.

7.11 Theorem. (Redundant conditioning)
Let X and Y be square integrable. If X is 𝒜-measurable then

    E(XY|𝒜) = X E(Y|𝒜).

7.12 Problem. Prove theorem (7.11).
Hint: Use measure theoretic induction.

7.13 Problem. Let X ∈ L². Show that E(X|𝒜) minimizes E((X − Y)²) among all 𝒜-measurable variables Y ∈ L².

7.14 Problem. Let X and Y be square integrable. If X is 𝒜-measurable and Y is independent of 𝒜 then

    E(XY|𝒜) = X E(Y).
Let us make some comments on the assertion of the preceding problem.

7.15 Remark. If 𝒜 = σ(X) and Y is independent of X then problem 7.14 implies

    E(XY|X = x) = E(xY|X = x) = x E(Y).

This is intuitively plausible since conditioning on X = x determines the value of X. This is even true in more general cases:

    E(f(X, Y)|X = x) = E(f(x, Y))

(provided that f is sufficiently integrable).
If 𝒜 is any σ-field such that X is 𝒜-measurable and Y is independent of 𝒜, then the corresponding equations have to be written as follows:

    E(XY|𝒜) = g(X) where g(x) = x E(Y)

and

    E(f(X, Y)|𝒜) = h(X) where h(x) = E(f(x, Y)).

This can be proved by measure theoretic induction.

7.16 Problem.
(1) Let X be a random variable with moment generating function finite on an interval I and let 𝒜 be a σ-field. If E(e^{tX}|𝒜) is constant for every t ∈ I then X and 𝒜 are independent.
(2) Find a similar assertion involving the Fourier transform.
7.2 Martingales
In gamblers' speech, gambling systems are called martingales. This might be the historical reason for the mathematical terminology: a (mathematical) martingale is the mathematical notion of the value process of a gambling strategy in a fair game.
From Theorem 6.19 we know that the expected gambler's wealth in a fair game cannot be changed by stopping times. This property can equivalently be stated without using the concept of stopping times. The basis is the following lemma.
7.17 Lemma. Let (X_n)_{n≥0} be adapted to (ℱ_n)_{n≥0} and integrable. If σ ≤ τ ≤ n are bounded stopping times then for any A ∈ ℱ_σ

    ∫_A (X_τ − X_σ) dP = Σ_{j=1}^n ∫_{A∩(σ<j≤τ)} (E(X_j|ℱ_{j−1}) − X_{j−1}) dP.

Proof: It is obvious that

    X_τ − X_σ = Σ_{σ<j≤τ} (X_j − X_{j−1}) = Σ_{j=1}^n 1_{(σ<j≤τ)} (X_j − X_{j−1}).

This gives

    ∫_A (X_τ − X_σ) dP = Σ_{j=1}^n ∫_{A∩(σ<j≤τ)} (X_j − X_{j−1}) dP.

Since A ∩ (σ < j ≤ τ) ∈ ℱ_{j−1}, we may replace X_j by E(X_j|ℱ_{j−1}). □
7.18 Problem. Fill in the details of the proof of 7.17.
7.19 Theorem. Let (X_n)_{n≥0} be adapted to (ℱ_n)_{n≥0} and integrable. Then the following assertions are equivalent:
(1) E(X_σ) = E(X_τ) for all bounded stopping times σ ≤ τ.
(2) E(X_j|ℱ_{j−1}) = X_{j−1}, j ≥ 1.
Proof: (2) ⇒ (1) is clear from 7.17. Let us show that (1) ⇒ (2).
Let F ∈ ℱ_{j−1} and define

    τ := { j − 1 whenever ω ∈ F,
           j     whenever ω ∉ F.

Then τ is a stopping time. From E(X_j) = E(X_τ) the assertion follows. □
7.20 Definition. Let (ℱ_n)_{n≥0} be a filtration and let (X_n)_{n≥0} be an adapted sequence of integrable random variables. The sequence (X_n) is called a martingale if either of the conditions (1) or (2) of Theorem 7.19 is satisfied.

7.21 Problem. Let S_n = X_1 + X_2 + ··· + X_n where (X_i) are independent identically distributed (i.i.d.) and integrable random variables with E(X_i) = μ.
(a) Show that M_n := S_n − nμ is a martingale.
(b) Derive Wald's equation for bounded stopping times from the martingale property.
7.22 Problem. Let S_n = X_1 + X_2 + ··· + X_n where (X_i) are independent identically distributed (i.i.d.) and square integrable random variables with E(X_i) = 0 and V(X_i) = σ².
(a) Show that M_n := S_n² − σ²n is a martingale.
Hint: Note that S_n² − S_{n−1}² = X_n² + 2S_{n−1}X_n.
(b) Show that E(S_τ²) = E(τ)σ² for bounded stopping times τ.
7.23 Problem. Let τ be the first passage time of a symmetric random walk with a two-sided boundary.
(a) Show that E(S_τ²) = E(τ)σ² (with σ² = 1 for steps ±1).
(b) Find E(τ).
7.24 Problem. Let S_n = X_1 + X_2 + ··· + X_n where (X_i) are independent identically distributed (i.i.d.) and integrable random variables with E(X_i) = 0. Let (H_k) be a predictable (with respect to the history of (S_n)) sequence of integrable random variables.
Show that

    V_n := V_0 + Σ_{k=1}^n H_k(S_k − S_{k−1})

is a martingale.
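The martingale property of such wealth sequences can be checked numerically: whatever predictable rule determines the stakes, the expected stopped wealth stays at V_0. The stake rule and stopping rule below are arbitrary illustrations of our own.

```python
import random

# Martingale transform check (cf. problem 7.24): with fair +-1 steps and a
# predictable stake rule H_k (here a rule that looks only at S_1,...,S_{k-1}),
# the stopped wealth satisfies E(V) = V_0 = 0.
rng = random.Random(7)
runs, N = 40000, 30
total = 0.0
for _ in range(runs):
    s, v = 0, 0.0
    for k in range(1, N + 1):
        h = 1.0 if s <= 0 else 0.5      # predictable: depends on the past only
        x = 1 if rng.random() < 0.5 else -1
        v += h * x
        s += x
        if s == 3:                      # bounded stopping time: tau ∧ N
            break
    total += v
print(total / runs)                     # close to V_0 = 0
```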
7.25 Problem. Show that a predictable martingale is constant.
Equation (1) of Theorem 7.19 extends easily to stopping times after having defined the past of a stopping time.
7.26 Problem. Let τ be a stopping time. Show that

    ℱ_τ := {F ∈ ℱ : F ∩ (τ ≤ j) ∈ ℱ_j for all j ≥ 0}

is a σ-field (the past of the stopping time τ).
7.27 Theorem. (Optional stopping theorem)
Let (X_n)_{n≥0} be a martingale. Then for any pair of bounded stopping times σ ≤ τ we have

    E(X_τ|ℱ_σ) = X_σ.

Proof: Applying 7.17 to A ∈ ℱ_σ proves the assertion. □
The importance of the martingale concept becomes even more clear by the following theorem, which is the elementary version of the celebrated Doob-Meyer decomposition. It is the final mathematical formulation of the old idea that time series can be decomposed into a noise component and a trend component. The notion of a martingale turns out to be the right formalization of the idea of noise.
7.28 Theorem. (Doob-Meyer decomposition)
Each adapted sequence (X_n)_{n≥0} of integrable random variables can be written as

    X_n = M_n + A_n, n ≥ 0,

where (M_n) is a martingale and (A_n) is a predictable sequence, i.e. A_n is ℱ_{n−1}-measurable for every n ≥ 1.
The decomposition is unique up to constants.
The sequence (X_n)_{n≥0} is a martingale iff (A_n) is constant.
Proof: Let

    M_n = X_0 + Σ_{j=1}^n (X_j − E(X_j|ℱ_{j−1}))

and

    A_n = Σ_{j=1}^n (E(X_j|ℱ_{j−1}) − X_{j−1}).

This proves existence of the decomposition. Uniqueness follows from the fact that a predictable martingale is constant. The rest is obvious. □
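For a random walk with drift the Doob-Meyer decomposition is explicit: E(X_n|ℱ_{n−1}) = X_{n−1} + μ, so the predictable part is A_n = nμ and the martingale part is M_n = X_n − nμ. A quick sketch (parameters of our own) checks that the martingale part keeps constant expectation:

```python
import random

# Doob-Meyer decomposition of a random walk with drift: X_n has
# E(X_n | F_{n-1}) = X_{n-1} + mu, hence A_n = n*mu (predictable) and
# M_n = X_n - n*mu (martingale), so E(M_n) = M_0 = 0.
rng = random.Random(3)
p = 0.7
mu = 2 * p - 1                          # step mean: p - (1 - p)
n_steps, runs = 20, 30000
m_end = 0.0
for _ in range(runs):
    x = sum(1 if rng.random() < p else -1 for _ in range(n_steps))
    m_end += x - n_steps * mu           # M_n = X_n - A_n
m_end /= runs
print(m_end)                            # close to 0
```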
7.3 Some theorems on martingales
A major problem of probability theory is how to deal with the variance of sums of
random variables. This question is important both for the LLN and for the CLT.
If the random variables are independent then things are easy since the variance
of the sum of independent random variables equals the sum of the variances of the
variables themselves. We know that sums of independent random variables with mean
zero are the simplest examples of martingales.
It turns out that the variance of a martingale is the sum of the variances of its
differences. This is an extension of the familiar formula for the variance of a sum of
independent random variables.
7.29 Problem. Let (M_k) be a square integrable martingale. Then

    E(M_n²) = E(M_0²) + Σ_{k=1}^n E((M_k − M_{k−1})²).
The study of the sequence (M_n²), where (M_n) is a martingale, is basic for many results in the theory of stochastic processes. It is convenient to isolate the essential property of such sequences.
7.30 Definition. Let (ℱ_n)_{n≥0} be a filtration and let (X_n)_{n≥0} be an adapted sequence of integrable random variables.
The sequence (X_n) is called a submartingale if E(X_σ) ≤ E(X_τ) for all bounded stopping times σ ≤ τ.
The sequence (X_n) is called a supermartingale if E(X_σ) ≥ E(X_τ) for all bounded stopping times σ ≤ τ.
7.31 Problem. Extend theorem 7.19 to submartingales and supermartingales.

7.32 Problem. Describe the predictable components of submartingales and supermartingales in the Doob-Meyer decomposition.

7.33 Problem. Let (M_n) be a square integrable martingale. Show that (|M_n|) and (M_n²) are submartingales.
Apart from its conceptual importance, the concept of martingales also has a great impact on proving mathematical theorems in probability theory. There are relations of martingale theory to convergence of random variables and to maximal inequalities.
At the present stage of this text we only consider maximal inequalities. A remarkable property of submartingales is the fact that the maximum of a sequence can be bounded from above by the last component.
7.34 Theorem. (Maximal inequality)
Let (X_k) be a nonnegative submartingale. Then for every λ > 0

    P(max_{j≤n} X_j ≥ λ) ≤ (1/λ) ∫_{(max_{j≤n} X_j ≥ λ)} X_n dP ≤ E(X_n)/λ.

Proof: Let τ = min{k : X_k ≥ λ} and put τ = n if max_{k≤n} X_k < λ. This is a stopping time. Denote M := max_{k≤n} X_k. Then

    E(X_n) ≥ E(X_τ)
          = ∫_{(M≥λ)} X_τ dP + ∫_{(M<λ)} X_τ dP
          ≥ λ P(M ≥ λ) + ∫_{(M<λ)} X_n dP.

Rearranging gives λ P(M ≥ λ) ≤ E(X_n) − ∫_{(M<λ)} X_n dP = ∫_{(M≥λ)} X_n dP. □
7.35 Problem. Let X ≥ 0. Show that E(X²) = 2 ∫_0^∞ t P(X ≥ t) dt.
Hint: Note that X(ω)² = 2 ∫_0^{X(ω)} t dt and apply Fubini's theorem.
7.36 Theorem. (Kolmogorov's inequality for martingales)
Let (X_k) be a nonnegative submartingale. Then

    E(max_{k≤n} X_k²) ≤ 4 E(X_n²).

Proof: Let X* = max_{j≤n} X_j and Y = X_n. Then we know from the maximal inequality that

    P(X* ≥ t) ≤ (1/t) ∫_{(X*≥t)} Y dP.

It follows that

    E(X*²) = 2 ∫_0^∞ t P(X* ≥ t) dt
           ≤ 2 ∫_0^∞ ∫_{(X*≥t)} Y dP dt
           = 2 ∫ X* Y dP ≤ 2 √(E(X*²) E(Y²)),

where the last step is the Cauchy-Schwarz inequality. Dividing by √E(X*²) and squaring yields the assertion. □
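Both inequalities are easy to sanity-check by simulation for the nonnegative submartingale X_k = |S_k| of a symmetric random walk (a sketch with arbitrary parameters of our own):

```python
import random

# Empirical check of Kolmogorov's inequality: for the nonnegative
# submartingale X_k = |S_k| of a symmetric random walk,
# E(max_{k<=n} X_k^2) <= 4 E(X_n^2).
rng = random.Random(5)
n, runs = 50, 20000
max_sq = last_sq = 0.0
for _ in range(runs):
    s, m = 0, 0
    for _ in range(n):
        s += 1 if rng.random() < 0.5 else -1
        m = max(m, abs(s))
    max_sq += m * m
    last_sq += s * s
max_sq /= runs
last_sq /= runs
print(max_sq, 4 * last_sq)              # the first value stays below the second
```

Here E(S_n²) = n = 50, so the right hand side is about 200, while the empirical maximum term is considerably smaller; the factor 4 is a worst-case bound.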
Chapter 8
Stochastic processes
8.1 Basic concepts
A stochastic process (random process) on a probability space (Ω, ℱ, P) is a family (X_t)_{t≥0} of random variables. The parameter t is usually interpreted as time. Therefore, the intuitive notion of a stochastic process is that of a random system whose state at time t is X_t.
There are some notions related to a stochastic process (X_t)_{t≥0} which are important from the very beginning: the starting value X_0, the increments X_t − X_s for s < t, and the paths t ↦ X_t(ω), ω ∈ Ω.
The most important prototypes of stochastic processes are defined in terms of the properties of the increments and their path properties.
8.2 The Poisson process
Let us start with the Poisson process. This process has the advantage of admitting a constructive approach.
Assume that incidental signals appear in the course of time. The waiting times between two subsequent signals follow an exponential distribution with a fixed parameter λ > 0. Different waiting times are independent of each other. The Poisson process (N_t)_{t≥0} is the counting process of the signals, i.e. N_t is the number of signals in [0, t].

8.1 Problem. Let τ be a nonnegative random variable. Show that τ follows an exponential distribution iff it has the absence of memory property

    P(τ > t + s | τ > t) = P(τ > s), s > 0, t > 0.

Hint: For the sufficiency part show that g(t) := P(τ > t) satisfies g(s + t) = g(s)g(t).

Let us put the idea of the Poisson process into mathematical terms.
Let τ_1, τ_2, ..., τ_n, ... be a sequence of independent random variables, each distributed according to G(1, λ), i.e. an exponential distribution with parameter λ. These random variables stand for the waiting times between subsequent signals. Let

    T_k = τ_1 + τ_2 + ··· + τ_k

be the waiting time for the k-th signal. We know that T_k follows the Gamma distribution G(k, λ).
The Poisson process (N_t)_{t≥0} is now defined by

    N_t = n :⇔ T_n ≤ t < T_{n+1}, n = 0, 1, 2, ...    (3)

This means: N_t is the number of signals during the time interval [0, t].
Basically, this is the full definition of the Poisson process. There are several other equivalent definitions. But for the moment we are interested in the basic properties of the Poisson process which follow from our construction.
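Construction (3) translates directly into a simulation: accumulate exponential waiting times until they exceed t and count the signals. As a check we use E(N_t) = λt, which follows from the Poisson distribution of N_t stated in theorem 8.5 below (function names are our own):

```python
import random

# Poisson process via construction (3): N_t = n iff T_n <= t < T_{n+1},
# where T_n is a sum of i.i.d. exponential waiting times with rate lam.
def poisson_count(lam, t, rng):
    total, n = rng.expovariate(lam), 0
    while total <= t:
        n += 1
        total += rng.expovariate(lam)
    return n

rng = random.Random(11)
lam, t, runs = 2.0, 3.0, 20000
mean = sum(poisson_count(lam, t, rng) for _ in range(runs)) / runs
print(mean)                             # near lam * t = 6
```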
8.2 Problem. Explain why the starting value of a Poisson process is N_0 = 0.

8.3 Problem. Show that P(N_t < ∞) = 1, t > 0. (The Poisson process has no explosions.)
Hint: Show that lim_{n→∞} T_n = ∞ P-a.s.
Let us turn to the path properties of the Poisson process. Any single path t ↦ N_t(ω) of the Poisson process starts at N_0 = 0 and jumps to N_t = 1 as soon as the first signal occurs. Any further change of the value is a jump of height +1. Between two jumps the paths are constant. By definition the intervals where the paths are constant are left closed and right open, which implies that the paths of the Poisson process are continuous from the right and have limits from the left (cadlag).

8.4 Definition. A stochastic process is a counting process if it starts at 0, has cadlag paths which are constant except at jumps of height +1, and has no explosions.
It should be clear that there are as many counting processes as there are sequences
of nonnegative random variables (
n
)
n1
such that

n
i=1

i
. The Poisson pro-
cess is a very special counting process which is generated by an i.i.d sequence of
exponentially distributed random variables.
Let us turn to the distributional properties of the Poisson process. The key result is
as follows.
8.5 Theorem.
(1) A Poisson process has independent increments, i.e. if 0 < t_1 < t_2 < ... < t_n then the increments

    N_{t_1}, N_{t_2} − N_{t_1}, ..., N_{t_n} − N_{t_{n−1}}

are independent random variables.
(2) The increments N_t − N_s, 0 ≤ s < t, of a Poisson process follow a Poisson distribution with parameter λ(t − s).

The first part of the assertion says that the Poisson process has independent increments. The second part implies that the distribution of the increments depends only on the length of the interval but not on its position. This means that the Poisson process has stationary increments.
Both parts of the preceding theorem are consequences of the fact that the waiting times between subsequent jumps have a common exponential distribution.
8.6 Problem. Prove part (2) of Theorem 8.5.
Hint: Apply redundant conditioning to show that

    P(T_n ≤ t < T_{n+1}) = ∫_{(T_n≤t)} P(τ_{n+1} > t − T_n | T_n) dP = ∫_0^t P(τ_{n+1} > t − y) P^{T_n}(dy).
A formal proof of part (1) of Theorem 8.5 involves the substitution formula for multiple integration. The proof is based on a property of the Poisson process which is of interest in itself.

8.7 Problem. Let (N_t)_{t≥0} be a Poisson process. Show that 8.5 implies that for 0 < s < t and k + l = n, k, l ∈ ℕ,

    P(N_s = k, N_t − N_s = l | N_t = n) = (n! / (k! l!)) (s/t)^k ((t − s)/t)^l.

In order to prove 8.5 one has to establish the assertion of the preceding exercise as a consequence of exponentially distributed waiting times. The assertion itself says that, given the number of signals in an interval, the positions of the signals are distributed like independent and uniformly distributed random variables. This fact can be used for efficient simulation of Poisson processes.
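The simulation shortcut just mentioned can be sketched as follows: draw the number of signals N_t first, then place the signal times as sorted uniform points on [0, t]. Names and parameters are our own; for simplicity N_t is drawn via exponential waiting times as in construction (3).

```python
import random

# Simulation of the jump times of a Poisson process on [0, t] using the
# conditional uniformity from problem 8.7: given N_t = n, the n signal
# positions are distributed like n sorted i.i.d. uniform points on [0, t].
def poisson_jump_times(lam, t, rng):
    # draw N_t via exponential waiting times (construction (3))
    n, total = 0, rng.expovariate(lam)
    while total <= t:
        n += 1
        total += rng.expovariate(lam)
    # given N_t = n, place the signals as sorted uniforms on [0, t]
    return sorted(rng.uniform(0, t) for _ in range(n))

rng = random.Random(13)
times = poisson_jump_times(3.0, 10.0, rng)
assert all(0.0 <= s <= 10.0 for s in times)
assert times == sorted(times)
print(len(times))                       # roughly lam * t = 30 signals
```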
We saw that the Poisson process is a counting process with stationary and independent increments. It can be shown that these properties already determine the Poisson process. The exponential distribution of the waiting times (or equivalently the Poisson distribution of the increments) is a necessary consequence.

8.8 Problem. Show that a Poisson process is continuous in probability, i.e. lim_{s→t} N_s = N_t (P).

Note that continuity in probability does not contradict the existence of jumps. In particular, it does not imply that paths are continuous! If a process with cadlag paths is continuous in probability then the only assertion we can make is that the paths don't have fixed jumps like calendar effects.
8.3 Point processes
The Poisson process is in many respects the most basic and simple prototype of a stochastic process. Let us give a brief survey of classes of stochastic processes which arise by generalizing the Poisson process.
Let us start with counting processes. We have already mentioned that any sequence of nonnegative random variables (τ_n)_{n≥1} such that Σ_{i=1}^n τ_i ↑ ∞ defines a counting process (N_t)_{t≥0} by formula (3). If these random variables are independent and identically distributed then the counting process is a renewal process. We will not go into further details of renewal processes.
A counting process can be viewed as a system which produces the value +1 at certain random times T_n = τ_1 + τ_2 + ··· + τ_n. A point process is a system which produces more general values (Y_n) at random times T_n. Thus, a point process is defined by a sequence (T_n, Y_n)_{n∈ℕ} of pairs of random variables.
After n jumps and before the (n+1)-st jump, i.e. if T_n ≤ t < T_{n+1}, the value X_t of the point process is S_n = Y_1 + Y_2 + ··· + Y_n. This means X_t = S_{N_t}, t ≥ 0.
It should be clear that point processes have cadlag paths which are constant except at jumps. Jump heights can have any size and may be random.
A simple but important special case of a point process is a compound Poisson process. In such a case (Y_n) is a sequence of independent and identically distributed random variables and (N_t) is a Poisson process which is independent of (Y_n). In other words we have

    X_t = Σ_{i=1}^{N_t} Y_i, t ≥ 0.
8.9 Problem. Let Y_1, Y_2, ..., Y_n, ... be independent random variables with common distribution Q and let N be a Poisson random variable with parameter λ, independent of (Y_n). The distribution of S = Y_1 + ··· + Y_N is called the compound Poisson distribution with parameters Q and λ.
(1) Find the Fourier transform of a compound Poisson distribution.
(2) Show that for every compound Poisson distribution there exists a uniquely determined pair (Q, λ) such that Q({0}) = 0.

8.10 Problem. Find expectations and variances of a compound Poisson process.

8.11 Problem. Show that a compound Poisson process has independent and stationary increments.
8.12 Problem. Show that a compound Poisson process is continuous in probability.
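Problem 8.10 can be checked by simulation. A standard computation (conditioning on N_t) gives E(X_t) = λt E(Y) and V(X_t) = λt E(Y²); the sketch below, with a jump distribution Q = N(1, 4) of our own choosing, confirms both numerically.

```python
import random

# Compound Poisson process X_t = Y_1 + ... + Y_{N_t} with jump distribution
# Q = N(1, 4) and rate lam. Conditioning on N_t gives the standard answers
# E(X_t) = lam*t*E(Y) and V(X_t) = lam*t*E(Y^2).
def compound_poisson(lam, t, rng):
    x, total = 0.0, rng.expovariate(lam)
    while total <= t:
        x += rng.gauss(1.0, 2.0)        # one jump per Poisson signal
        total += rng.expovariate(lam)
    return x

rng = random.Random(17)
lam, t, runs = 2.0, 1.5, 20000
samples = [compound_poisson(lam, t, rng) for _ in range(runs)]
mean = sum(samples) / runs
var = sum((x - mean) ** 2 for x in samples) / runs
print(mean, var)            # near lam*t*E(Y) = 3 and lam*t*E(Y^2) = 15
```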
For every stochastic process (X_t) with cadlag paths the jump at time t is given by ΔX_t := X_t − X_{t−}. We may count the number of jumps with particular jump heights. E.g.

    Σ_{s≤t} 1_{(|ΔX_s|≥1)}

denotes the number of jumps until time t with absolute jump height ≥ 1. Since a cadlag function on a compact interval can have only finitely many jumps of height ≥ 1, the sum is finite. In general, for B ∈ ℬ let

    N_t(B) := Σ_{s≤t} 1_{(ΔX_s ∈ B∖{0})}.

For every B ∈ ℬ which is bounded away from zero this is a counting process.

8.13 Definition. Let (X_t) be a stochastic process with cadlag paths. Then ν_t(B) := E(N_t(B)), B ∈ ℬ, is called the jump measure of the process.
8.14 Problem. Find the jump measure of a compound Poisson process.
8.4 Lévy processes
If we concentrate on the incremental properties of the Poisson process then we arrive at another important class of processes.

8.15 Definition. A stochastic process is called a Lévy process if it has cadlag paths, independent and stationary increments, and is continuous in probability.

Applying this terminology we note that Poisson processes and compound Poisson processes are Lévy processes.
Compound Poisson processes have paths which are piecewise constant (constant on intervals which are separated by isolated jumps). General Lévy processes need not have piecewise constant paths. The path structure of a Lévy process may be of a very complicated nature. There may be continuous parts and parts with accumulation of infinitely many very small jumps.
Note that for stochastic sequences the property of having independent and stationary increments is typical for random walks. Thus, Lévy processes are the natural extension of random walks to continuous time.

8.16 Problem. Assume that (X_t) is a square integrable Lévy process. Show that expectations and variances of X_t are proportional to t.

8.17 Problem. Let (X_t) be a Lévy process.
(1) Show that E(e^{iuX_t}) = e^{tψ(u)} for some function ψ.
(2) Show that the function ψ determines the distribution of the increments of (X_t).
(3) Find the function ψ for Poisson processes and compound Poisson processes.
At this point we are far from a systematic discussion of the theory of Lévy processes. But we are in a position to get an idea of the richness of this concept.
The distributions of the random variables of a Lévy process have a remarkable property. In order to describe this property we need the notion of convolution. The probabilistic version of convolution is as follows: let X and Y be independent random variables. Then the distribution of X + Y is called the convolution of P^X and P^Y and is denoted by P^X ∗ P^Y.

8.18 Problem. Let X and Y be independent random variables with distributions μ_1 and μ_2, respectively. Let m be the distribution of X + Y. Show that

    ∫ f dm = ∫∫ f(x + y) μ_1(dx) μ_2(dy)    (4)

for every bounded measurable function f.

Equation (4) serves as definition of the convolution m = μ_1 ∗ μ_2 of arbitrary σ-finite measures. In analytical terms a convolution is most easily described by Fourier transforms.

8.19 Problem. Show that m = μ_1 ∗ μ_2 iff m̂ = μ̂_1 μ̂_2.
Now we are in a position to state the announced remarkable property of the distributions of a Lévy process.

8.20 Theorem. Let (X_t)_{t≥0} be a Lévy process and denote μ_t := P^{X_t}, t ≥ 0. Then μ_s ∗ μ_t = μ_{s+t}, i.e. the distribution of X_{s+t} is the convolution of the distributions of X_s and X_t, s, t ≥ 0.

8.21 Problem. Prove Theorem 8.20.
Now, there is an important converse of Theorem 8.20. It says that for every family of distributions (μ_t) satisfying the convolution property of 8.20 (together with some continuity condition) there exists a corresponding Lévy process.
The examples of Lévy processes we know so far are the Poisson process and the compound Poisson process. But there are plenty of further examples of Lévy processes which can be described by the corresponding family (μ_t).

8.22 Problem. Show that the family of normal distributions μ_t := N(0, t), t ≥ 0, has the convolution property.

8.23 Problem. Show that the family of Gamma distributions μ_t := G(t, 1), t ≥ 0, has the convolution property.

The second example corresponds to a Lévy process with a very complicated path structure. The paths are increasing and not constant on any interval, but do not contain any continuous parts. The process is a pure jump process driven by infinitely many jumps which are dense in every time interval.
The first example corresponds to the Wiener process. This is the only Lévy process having continuous paths.
8.5 The Wiener Process
8.24 Definition. A stochastic process (W_t)_{t≥0} is called a Wiener process if
(1) the starting value is W_0 = 0,
(2) the increments W_t − W_s are N(0, t − s)-distributed and mutually independent for non-overlapping intervals,
(3) the paths are continuous for P-almost all ω ∈ Ω.

The Wiener process is thus a Lévy process with continuous paths. Later (problem 14.24) it will be shown that the Wiener process is the only Lévy process with continuous paths (up to a scaling factor). The distributional properties of the increments are a necessary consequence.

8.25 Problem. Let (W_t) be a Wiener process and define X_t = x_0 + μt + σW_t (generalized Wiener process). Discuss the properties of (X_t).
As it is the case with many probability models one has to ask whether there exists
a probability space (, T, P) and a family of random variables (W
t
)
t0
satisfying
the properties of Denition 8.24. The mathematical construction of such models is a
complicated matter and is one of great achievements of probability theory in the rst
half of the 20th century. Accepting the existence of the Wiener process as a valid
mathematical model we may forget the details of the construction (there are several of
them) and start with the axioms stated in 8.24.
It is, however, easy to set up discrete time random walks which approximately
share the properties of a Wiener process.
8.26 Discussion. Let X_1, X_2, . . . be independent replications of a bounded random
variable X such that E(X) = 0 and V(X) = σ² < ∞. (E.g. P(X_i = u) = d/(u + d),
P(X_i = −d) = u/(u + d), where u, d > 0.) Then for every n ∈ ℕ

S^n_t = (1/√n) Σ_{i=1}^{[nt]} X_i,  t ≥ 0,

is a random walk with jumps at every point t = k/n, k ∈ ℕ, and being constant on
intervals of length 1/n.
This random walk approximately shares the properties of a Wiener process if n is
large. Indeed, the paths are almost continuous since the jump heights are uniformly
small, being determined by the distribution of X/√n. The increments are
independent, at least on the discrete time scale. And from the CLT it follows that the
distribution of S^n_t as well as of the increments S^n_t − S^n_s tends to the normal
distributions N(0, σ²t) and N(0, σ²(t − s)), respectively.
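The construction of 8.26 is easy to simulate. The following sketch (function name and simulation parameters are ours) builds the scaled random walk S^n with the two-point distribution from the discussion; for u = d = 1 we have σ² = ud = 1, so S^n_1 should be approximately N(0, 1) for large n.

```python
import math
import random

def scaled_random_walk(n, t_max, u=1.0, d=1.0, seed=0):
    """Simulate S^n_t = n^(-1/2) * sum_{i <= [n t]} X_i on the grid k/n,
    with P(X_i = u) = d/(u+d) and P(X_i = -d) = u/(u+d), so E(X_i) = 0."""
    rng = random.Random(seed)
    p_up = d / (u + d)
    s, path = 0.0, [0.0]
    for _ in range(int(n * t_max)):
        s += u if rng.random() < p_up else -d
        path.append(s / math.sqrt(n))
    return path  # path[k] is the value of S^n on [k/n, (k+1)/n)

# empirical check of the CLT effect: mean ~ 0, variance ~ sigma^2 = u*d = 1
sample = [scaled_random_walk(100, 1.0, seed=k)[-1] for k in range(2000)]
mean = sum(sample) / len(sample)
var = sum(x * x for x in sample) / len(sample) - mean ** 2
```

The sample variance of the 2000 replications of S^n_1 comes out close to 1, in line with N(0, σ²t) at t = 1.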
8.27 Problem. Let (W_t)_{t≥0} be a Wiener process. Show that X_t := −W_t, t ≥ 0, is a
Wiener process, too.
8.28 Problem. Show that W_t/t →^P 0 as t → ∞.
For beginners the most surprising properties are the path properties of a Wiener
process.
The paths of a Wiener process are continuous (which is part of our definition). In
this respect the paths do not seem complicated since they have no jumps or other
singularities. It will turn out, however, that in spite of their continuity the paths of a
Wiener process are of a very peculiar nature.
8.29 Theorem. Let (W_t)_{t≥0} be a Wiener process. For every t > 0 and every
Riemannian sequence of subdivisions 0 = t^n_0 < t^n_1 < . . . < t^n_n = t we have

Σ_{i=1}^n |W(t^n_i) − W(t^n_{i−1})|² →^P t.
8.30 Problem. Prove 8.29.
Hint: Let Q_n := Σ_{i=1}^n |W(t^n_i) − W(t^n_{i−1})|² for a particular Riemannian
sequence of subdivisions. Show that E(Q_n) = t and V(Q_n) → 0. Then the assertion
follows from Chebyshev's inequality.
Theorem 8.29 shows that (almost) all paths of a Wiener process have nonvanishing
quadratic variation. This implies that the paths don't have bounded variation on any
interval (see problem 3.46). Actually, it can even be shown that they are nowhere
differentiable.
The assertion of 8.29 can be improved to P-almost sure convergence, which implies
that the quadratic variation on [0, t] of almost all paths is actually t. It is remarkable
that the quadratic variation of the Wiener process is a deterministic function of a very
simple nature in that it is a linear function.
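Theorem 8.29 can be seen numerically: summing the squared increments of a simulated Wiener path over a fine subdivision gives a value close to t. A sketch (simulation details are ours; increments are sampled as independent N(0, t/n) variables):

```python
import math
import random

def wiener_increments(n, t, seed=0):
    """n independent N(0, t/n) increments of a Wiener path on [0, t]."""
    rng = random.Random(seed)
    dt = t / n
    return [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)]

def quadratic_variation(incs):
    """Sum of squared increments over the subdivision."""
    return sum(dw * dw for dw in incs)

t = 2.0
qv = quadratic_variation(wiener_increments(20000, t, seed=1))
# qv should be close to t = 2; by the hint of 8.30 its standard deviation
# over the randomness of the path is sqrt(2 t^2 / n), which is small here
```

With n = 20000 subintervals the computed quadratic variation differs from t = 2 only in the second decimal place.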
Chapter 9
Martingales
9.1 From independent increments to martingales
The topic of this section is martingales. Our main examples will be general Lévy
processes. Therefore the assertions of this section cover Wiener processes, Poisson
processes and compound Poisson processes.
We start with the concept of the past of a process.
9.1 Definition. Let (X_t)_{t≥0} be a stochastic process. The past of the process
(X_t)_{t≥0} at time t is the σ-field F^X_t = σ(X_s : s ≤ t) of events generated by the
variables X_s of the process prior to t, i.e. s ≤ t. The internal history of (X_t)_{t≥0}
is the family (F^X_t)_{t≥0} of pasts of the process.
The intuitive idea behind the concept of past is the following: F^X_t consists of all
events which are observable if one observes the process up to time t. It represents the
information about the process available at time t. It is obvious that t_1 < t_2 implies
F^X_{t_1} ⊆ F^X_{t_2}. If X_0 is a constant then F^X_0 = {∅, Ω}.
9.2 Definition. Any increasing family of σ-fields (F_t)_{t≥0} is called a filtration. A
process (Y_t)_{t≥0} is adapted to the filtration (F_t)_{t≥0} if Y_t is F_t-measurable for
every t ≥ 0.
The internal history (F^X_t)_{t≥0} of a process (X_t)_{t≥0} is a filtration and the
process (X_t)_{t≥0} is adapted to its internal history. But also Y_t := φ(X_t) for any
measurable function φ is adapted to the internal history of (X_t)_{t≥0}. Adaptation
simply means that the past of the process (Y_t)_{t≥0} at time t is contained in F_t.
Having the information contained in F_t we know everything about the process up to
time t.
9.3 Definition. A martingale relative to the filtration (F_t)_{t≥0} is an adapted and
integrable stochastic process (X_t)_{t≥0} such that

E(X_t | F_s) = X_s whenever s < t.

It is a square integrable martingale if E(X_t²) < ∞, t ≥ 0.
9.4 Problem. Show that the martingale property remains valid if the filtration is
replaced by another filtration consisting of smaller σ-fields, provided that the process
is still adapted.
For establishing the martingale property, the property of having independent increments
plays an important role. Therefore Lévy processes are natural candidates for
martingales.
9.5 Lemma. Let (X_t)_{t≥0} be a Lévy process. Then the increments X_t − X_s of the
Lévy process (X_t)_{t≥0} are independent of the past F^X_s.
Proof: Let s_1 ≤ s_2 ≤ . . . ≤ s_n ≤ s < t. Then the random variables

X_{s_1}, X_{s_2} − X_{s_1}, . . . , X_{s_n} − X_{s_{n−1}}, X_t − X_s

are independent. By calculating partial sums it follows that even the random variables
X_{s_1}, X_{s_2}, . . . , X_{s_n} are independent of X_t − X_s. Since this is valid for any
choice of time points s_i ≤ s the independence assertion carries over to the whole past
F^X_s. □
9.6 Theorem. Let (X_t)_{t≥0} be an integrable Lévy process such that E(X_1) = μ. Then
M_t = X_t − μt is a martingale with respect to its internal history.
Proof: Since M_t − M_s is independent of F_s it follows that E(M_t − M_s | F_s) =
E(M_t − M_s) = 0. Hence E(M_t | F_s) = E(M_s | F_s) = M_s. □
It follows that any Wiener process (W_t) is a martingale. If (N_t) is a Poisson process
with parameter λ then N_t − λt is a martingale.
9.7 Problem. Apply Theorem 9.6 to a compound Poisson process.
A nonlinear function of a martingale typically is not a martingale. But the next
theorem is a first special case of a very general fact: It is sometimes possible to correct
a process by a bounded variation process in such a way that the result is a martingale.
9.8 Theorem. Let (X_t)_{t≥0} be a square integrable Lévy process such that E(X_1) = μ
and V(X_1) = σ², and let M_t = X_t − μt. Then the process M_t² − σ²t is a martingale
with respect to the internal history of the driving Lévy process (X_t)_{t≥0}.
Proof: Note that

M_t² − M_s² = (M_t − M_s)² + 2M_s(M_t − M_s).

This gives

E(M_t² − M_s² | F_s) = E((M_t − M_s)² | F_s) + 2E(M_s(M_t − M_s) | F_s) = σ²(t − s). □
9.9 Problem. Apply Theorem 9.8 to the Wiener process, to the Poisson process and
to compound Poisson processes.
9.10 Theorem. Let (W_t) be a Wiener process. The process exp(aW_t − a²t/2) is a
martingale with respect to the internal history of the driving Wiener process (W_t)_{t≥0}.
Proof: Use e^{aW_t} = e^{a(W_t − W_s)} e^{aW_s} to obtain

E(e^{aW_t} | F_s) = E(e^{a(W_t − W_s)}) e^{aW_s} = e^{a²(t−s)/2} e^{aW_s}. □
The process

ℰ(W)_t := exp(W_t − t/2)

is called the exponential martingale of (W_t)_{t≥0}.
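A quick numerical sanity check of 9.10 (the Monte Carlo setup is ours, not part of the text): since W_t ~ N(0, t), the expectation of exp(aW_t − a²t/2) should equal 1 for any a, which is the fixed-time content of the martingale property.

```python
import math
import random

def exp_martingale_mean(a, t, n=20000, seed=2):
    """Monte Carlo estimate of E(exp(a*W_t - a^2*t/2)) with W_t ~ N(0, t)."""
    rng = random.Random(seed)
    s = 0.0
    for _ in range(n):
        w_t = rng.gauss(0.0, math.sqrt(t))
        s += math.exp(a * w_t - a * a * t / 2)
    return s / n
```

For a = 0.5 and t = 1 the estimate lies within a few tenths of a percent of 1, the deviation being pure Monte Carlo noise.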
9.2 A technical issue: Augmentation
For technical reasons which will become clear later, the internal history of the Wiener
process is sometimes slightly too small. It is convenient to enlarge the σ-fields of the
internal history in a way that does not destroy the basic properties of the underlying
process. This procedure is called augmentation.
Roughly speaking, the augmentation consists of two steps. First, the filtration
is slightly enlarged such that events are added to F_t which occur immediately
after t. This makes the filtration right-continuous. Secondly, all negligible sets of
F_∞ are added. This is a purely technical act ensuring that the equivalence classes
of stochastic processes (containing all negligible modifications) contain sufficiently
regular versions.
9.11 Definition. Let F_{t+} := ∩_{s>t} F_s and define

F̃_t := {F ∈ F_∞ : P(F △ G) = 0 for some G ∈ F_{t+}}.

Then (F̃_t)_{t≥0} is the augmented filtration.
9.12 Problem. Show that the augmented filtration is really a filtration.
9.13 Lemma. Let (F_t)_{t≥0} be a filtration. Then the augmented filtration is
right-continuous, i.e.

F̃_t = ∩_{s>t} F̃_s.

Proof: It is clear that ⊆ holds. In order to prove ⊇ let F ∈ ∩_{s>t} F̃_s. We have to
show that F ∈ F̃_t.
For every n ∈ ℕ there is G_n ∈ F_{t+1/n} such that P(F △ G_n) = 0. Define

G := ∩_{m=1}^∞ ∪_{n=m}^∞ G_n = ∩_{m=K}^∞ ∪_{n=m}^∞ G_n ∈ F_{t+1/K} for all K ∈ ℕ.

Then G ∈ F_{t+} and P(G △ F) = 0. □
One says that a filtration satisfies the usual conditions if it is right-continuous
and contains all negligible sets of F_∞. The internal history of the Wiener process does
not satisfy the usual conditions. However, every augmented filtration satisfies the usual
conditions.
It is important that the augmentation process does not destroy the basic structural
properties. We demonstrate this for Wiener processes. A similar argument applies
to general Lévy processes. It should be noted, however, that when we are dealing with
point processes only, augmentation is not necessary.
9.14 Theorem. Let (W_t)_{t≥0} be a Wiener process. Then the increments W_t − W_s are
independent of F̃^W_s, s ≥ 0.
Proof: (Outline) It is easy to see that

E(e^{a(W_t − W_s)} | F^W_{s+}) = e^{a²(t−s)/2}.

From problem 7.16 it follows that W_t − W_s is independent of F^W_{s+}. It is clear that
this carries over to F̃^W_s. □
Thus, 9.14 shows that every Wiener process has independent increments with respect
to a filtration that satisfies the usual conditions. When we are dealing with a
Wiener process we may suppose that the underlying filtration satisfies the usual
conditions.
9.15 Problem. Show that the assertions of 9.6, 9.8 and 9.10 are valid for the
augmented internal history of the Wiener process.
Let us illustrate the convenience of filtrations satisfying the usual conditions by a
further result. For some results on stochastic integrals it will be an important point that
martingales are cadlag. A general martingale need not be cadlag. We will show that a
martingale has a cadlag modification if the filtration satisfies the usual conditions.
9.16 Theorem. Let (X_t)_{t≥0} be a martingale w.r.t. a filtration satisfying the usual
conditions. Then there is a cadlag modification of (X_t)_{t≥0}.
Proof: (Outline. Further reading: Karatzas-Shreve, [15], Chapter 1, Theorem 3.13.)
We begin with path properties which are readily at hand: There is a set A ∈ F_∞
satisfying P(A) = 1 and such that the process (X_t)_{t∈ℚ} (restricted to a rational time
scale) has paths with right and left limits for every ω ∈ A. This is a consequence
of the basic convergence theorem for martingales. For further details see Karatzas-Shreve,
[15], Chapter 1, Proposition 3.14, (i).
It is now our goal to modify the martingale in such a way that it becomes cadlag.
The idea is to define

X⁺_t := lim_{s↓t, s∈ℚ} X_s, t ≥ 0,

on A and X⁺_t := 0 elsewhere. It is easy to see that the paths of (X⁺_t)_{t≥0} are
cadlag. Since (F_t)_{t≥0} satisfies the usual conditions it follows that (X⁺_t)_{t≥0} is
adapted. We have to show that (X⁺_t)_{t≥0} is a modification of (X_t)_{t≥0}, i.e.
X_t = X⁺_t P-a.e. for all t ≥ 0.
Let s_n ↓ t, (s_n) ⊆ ℚ. Then X_{s_n} = E(X_{s_1} | F_{s_n}) is uniformly integrable,
which implies X_{s_n} → X⁺_t in L¹. From X_t = E(X_{s_n} | F_t) we obtain
X_t = E(X⁺_t | F_t) = X⁺_t P-a.e. □
As a result we can say: Under the usual conditions every martingale is cadlag.
9.3 Stopping times
Let (X_t)_{t≥0} be a cadlag adapted process such that X_0 = 0 and for some a > 0 let

τ = inf{t ≥ 0 : X_t ≥ a}.

The random variable τ is called a first passage time: It is the time when the process
hits the level a for the first time. By right continuity of the paths we have

τ ≤ t ⟺ max_{s≤t} X_s ≥ a.  (5)

Thus, we have (τ ≤ t) ∈ F_t for all t ≥ 0.
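For a discrete path the equivalence (5) can be checked directly. A small illustration (function name and example path are ours):

```python
def first_passage(path, a):
    """Index of the first entry >= a, or None if the level is never reached."""
    for k, x in enumerate(path):
        if x >= a:
            return k
    return None

path = [0.0, 0.5, 0.3, 1.2, 0.9, 2.0]
tau = first_passage(path, 1.0)
# (tau <= t) holds exactly when the running maximum up to t reaches the level,
# which is the discrete analogue of (5)
for t in range(len(path)):
    hit_by_t = tau is not None and tau <= t
    assert hit_by_t == (max(path[: t + 1]) >= 1.0)
```

The loop verifies (5) for every time index of the example path; the event (τ ≤ t) is determined by the path up to t only, which is the adaptedness statement.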
9.17 Problem. Prove (5).
9.18 Definition. A random variable τ : Ω → [0, ∞] is called a stopping time if
(τ ≤ t) ∈ F_t for all t ≥ 0.
Let τ be a stopping time. The intuitive meaning of (τ ≤ t) ∈ F_t is as follows: At
every time t it can be decided whether τ ≤ t or not.
9.19 Problem. Show that τ is a stopping time iff (τ < t) ∈ F_t for every t ≥ 0.
9.20 Problem. Let σ, τ and τ_n be stopping times.
(a) Then σ ∧ τ, σ ∨ τ and σ + τ are stopping times.
(b) τ + c for c ≥ 0 and cτ for c ≥ 1 are stopping times.
(c) sup_n τ_n and inf_n τ_n are stopping times.
9.21 Problem. Show that every bounded stopping time τ is the limit of a decreasing
sequence of bounded stopping times each of which has only finitely many values.
Hint: Let T = max τ. Define τ_n = k/2^n whenever (k − 1)/2^n < τ ≤ k/2^n,
k = 0, . . . , T·2^n.
Let (M_t)_{t≥0} be a martingale. In the discrete time case we were able to describe
martingales in terms of expectations of the stopped process. This carries over to the
continuous case.
9.22 Theorem. Let (M_t)_{t≥0} be an integrable process with right-continuous paths.
Then the following assertions are equivalent:
(1) (M_t)_{t≥0} is a martingale.
(2) For all bounded stopping times τ the variable M_τ is measurable and integrable
and we have E(M_τ) = E(M_0).
Proof: (1) ⇒ (2): Assume that τ ≤ T. Let τ_n ↓ τ where the τ_n are bounded
stopping times with finitely many values. Then it follows from the discrete version of
the optional stopping theorem that E(M_{τ_n}) = E(M_0). Clearly we have M_{τ_n} → M_τ.
Since M_{τ_n} = E(M_T | F_{τ_n}) it follows that the sequence (M_{τ_n}) is even
mean-convergent.
(2) ⇒ (1): For s < t and F ∈ F_s define

τ := s whenever ω ∈ F, τ := t whenever ω ∉ F.

Then τ is a stopping time. From E(M_t) = E(M_τ) the assertion follows. □
This characterization of martingales has important and interesting consequences
for many applications. We indicate some of these in the next section.
The interplay between stopping times and adapted processes is at the core of
stochastic analysis. In the rest of this section we try to provide some information for
reasons of later reference. Throughout the section we assume tacitly that the filtration
satisfies the usual conditions.
Hitting times
Let (X_t)_{t≥0} be a process adapted to a filtration (F_t)_{t≥0} and let A ⊆ ℝ. Define

τ_A = inf{t : X_t ∈ A}.

Then τ_A is called the hitting time of the set A. First passage times are hitting times
of intervals.
For which sets A are hitting times stopping times?
9.23 Remark. The question, for which sets A a hitting time τ_A is a stopping time, is
completely solved. The solution is as follows.
We may assume that P|F_∞ is complete, i.e. that all subsets of negligible sets are
added to F_∞. The whole theory developed so far is not affected by such a completion.
We could assume from the beginning that our probability space is complete. The
reason why we did not mention this issue is simple: We did not need completeness so
far.
However, the most general solution of the hitting time problem needs completeness.
The following is true: If P|F_∞ is complete and if the filtration satisfies the
usual conditions then the hitting time of every Borel set is a stopping time. For further
comments see Jacod-Shiryaev, [14], Chapter I, 1.27 ff.
For particular cases the stopping time property of hitting times is easy to prove.
9.24 Theorem. Assume that (X_t)_{t≥0} has right-continuous paths and is adapted to a
filtration which satisfies the usual conditions.
(a) Then τ_A is a stopping time for every open set A.
(b) If (X_t)_{t≥0} has continuous paths then τ_A is a stopping time for every closed
set A.
Proof: (a) Note that

(τ_A < t) = (X_s ∈ A for some s < t).

Since A is open and (X_t)_{t≥0} has right-continuous paths it follows that

(τ_A < t) = (X_s ∈ A for some s < t, s ∈ ℚ).

(b) Let A be closed and let (A_n) be open neighbourhoods of A such that A_n ↓ A.
Define τ := lim_n τ_{A_n}, which exists since the sequence (τ_{A_n}) is increasing. We
will show that τ = τ_A.
Since τ_{A_n} ≤ τ_A we have τ ≤ τ_A. By continuity of paths we have X_{τ_{A_n}} → X_τ
whenever τ < ∞. Since X_{τ_{A_n}} ∈ Ā_n it follows that X_τ ∈ A whenever τ < ∞.
This implies τ ≥ τ_A. □
The optional stopping theorem
We need a notion of the past of a stopping time.
9.25 Problem. A stochastic interval is an interval whose boundaries are stopping
times.
(a) Show that the indicators of stochastic intervals are adapted processes.
Hint: Consider 1_{(τ,∞)} and 1_{[τ,∞)}.
(b) Let τ be a stopping time and let F ⊆ Ω. Show that the process 1_F 1_{[0,τ)} is
adapted iff F ∩ (τ > t) ∈ F_t for all t ≥ 0.
(c) Let F_τ := {F : F ∩ (τ ≤ t) ∈ F_t, t ≥ 0}. Show that F_τ is a σ-field.
9.26 Definition. Let τ be a stopping time. The σ-field F_τ is called the past of τ.
The intuitive meaning of the past of a stopping time is as follows: An event F is in
the past of τ if at every time t the occurrence of F can be decided provided that τ ≤ t.
Many of the subsequent assertions can be understood intuitively if this interpretation
is kept in mind.
9.27 Problem. Let σ and τ be stopping times.
(a) If σ ≤ τ then F_σ ⊆ F_τ.
(b) F_{σ∧τ} = F_σ ∩ F_τ.
(c) The sets (σ < τ), (σ ≤ τ) and (σ = τ) are in F_{σ∧τ}.
Hint: Start with proving (σ < τ) ∈ F_{σ∧τ} and (σ ≤ τ) ∈ F_{σ∧τ}.
(d) Show that every stopping time τ is F_τ-measurable.
(e) Let τ_n ↓ τ. Show that F_τ = ∩_{n=1}^∞ F_{τ_n}.
There is a fundamental rule for iterated conditional expectations with respect to
pasts of stopping times.
9.28 Theorem. Let Z be an integrable or nonnegative random variable and let σ and
τ be stopping times. Then

E(E(Z | F_σ) | F_τ) = E(Z | F_{σ∧τ}).

Proof: The proof is a bit tedious and therefore many textbooks pose it as an exercise
(see Karatzas-Shreve, [15], Chapter 1, 2.17). Let us give more detailed hints.
We have to start with showing that

F ∩ (σ < τ) ∈ F_{σ∧τ} and F ∩ (σ ≤ τ) ∈ F_{σ∧τ} whenever F ∈ F_σ.

Note that the nontrivial part is to show membership in F_τ. The trick is to observe that
on (σ ≤ τ) we have (τ ≤ t) = (σ ≤ t) ∩ (τ ≤ t).
The second step is based on the first step and consists in showing that

1_{(σ≤τ)} E(Z | F_σ) = 1_{(σ≤τ)} E(Z | F_{σ∧τ}).  (6)

Finally, we prove the assertion separately on (σ ≤ τ) and (τ ≤ σ). For case 1
we apply (6) to the inner conditional expectation. For case 2 we apply (6) to the outer
conditional expectation (interchanging the roles of σ and τ). □
We arrive at the most important result on stopping times and martingales. A
preliminary technical problem is whether an adapted process stopped at τ is
F_τ-measurable. Intuitively, this should be true.
9.29 Discussion. (Measurability of stopped processes)
Let (X_t)_{t≥0} be an adapted process and τ a stopping time. We ask whether
X_τ 1_{(τ<∞)} is F_τ-measurable.
It is easy to prove the assertion for right-continuous processes with the help of
9.21. This would be sufficient for the optional stopping theorem below. However,
for stochastic integration we want to be sure that the assertion is also valid for
left-continuous processes. This can be shown in the following way.
Define

X^n_t := n ∫_0^t X_s e^{n(s−t)} ds.

Then the (X^n_t)_{t≥0} are continuous adapted processes such that X^n_t → X_t provided
that (X_t)_{t≥0} has left-continuous paths. Since the assertion is true for (X^n_t) it
carries over to (X_t).
9.30 Theorem. (Optional stopping theorem)
Let (M_t)_{t≥0} be a right-continuous martingale. If τ is a bounded stopping time and
σ is any stopping time then

E(M_τ | F_σ) = M_{σ∧τ}.
Proof: The proof is based on the following auxiliary assertion: Let τ be a bounded
stopping time and let M_t := E(Z | F_t) for some integrable random variable Z. Then
M_τ = E(Z | F_τ).
Let τ be a stopping time with finitely many values t_1 < t_2 < . . . < t_n. Then

M_{t_n} − M_τ = Σ_{k=1}^n (M_{t_k} − M_{t_{k−1}}) 1_{(τ ≤ t_{k−1})}.

(Prove it on (τ = t_{j−1}).) It follows that E((M_{t_n} − M_τ) 1_F) = 0 for every
F ∈ F_τ. This proves the auxiliary assertion for stopping times with finitely many
values. The extension to arbitrary bounded stopping times is done by 9.21.
Let T = sup τ. The assertion of the theorem follows from

E(M_τ | F_σ) = E(E(M_T | F_τ) | F_σ) = E(M_T | F_{σ∧τ}) = M_{σ∧τ}. □
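The stopped-expectation characterization can be probed numerically for a simple discrete martingale (the setup is ours, not part of the text): for a symmetric random walk M with M_0 = 0 and the bounded stopping time τ = min(first exit time of (−2, 3), 50), the expectation E(M_τ) should equal E(M_0) = 0.

```python
import random

def stopped_value(rng, lower=-2, upper=3, horizon=50):
    """Run a symmetric random walk from 0 and stop at the first exit
    from (lower, upper), but no later than the fixed horizon."""
    m = 0
    for _ in range(horizon):
        if m <= lower or m >= upper:
            break
        m += 1 if rng.random() < 0.5 else -1
    return m

rng = random.Random(3)
n = 5000
mean_stopped = sum(stopped_value(rng) for _ in range(n)) / n
# optional stopping: E(M_tau) = E(M_0) = 0, so mean_stopped should be near 0
```

The empirical mean is close to zero up to Monte Carlo noise, even though the individual stopped values are heavily concentrated on the boundary points −2 and 3.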
We finish this section with two consequences of the optional stopping theorem
which are fundamental for stochastic integration.
9.31 Corollary. Let τ be any stopping time. If (M_t)_{t≥0} is a martingale then
(M_{τ∧t})_{t≥0} is a martingale, too.
9.32 Problem. Prove 9.31.
9.33 Corollary. Let (M_t)_{t≥0} be a martingale. Let σ ≤ τ be stopping times and let Z
be F_σ-measurable and bounded. Then Z(M_{τ∧t} − M_{σ∧t}) is a martingale, too.
9.34 Problem. Prove 9.33.
Hint: Apply 9.22.
9.4 Application: First passage times of the Wiener process
As an application of the optional stopping theorem we derive the distribution of first
passage times of the Wiener process.
One-sided boundaries
9.35 Theorem. Let (W_t)_{t≥0} be a Wiener process and for a > 0 and b ∈ ℝ define

τ_{a,b} := inf{t : W_t ≥ a + bt}.

Then we have

E(e^{−λτ_{a,b}} 1_{(τ_{a,b}<∞)}) = e^{−a(b+√(b²+2λ))}, λ ≥ 0.
Proof: Applying the optional stopping theorem to the exponential martingale of the
Wiener process we get

E(e^{αW_τ − α²τ/2}) = 1

for every α ∈ ℝ and every bounded stopping time τ. Therefore this equation is true for
τ_n := τ_{a,b} ∧ n for every n ∈ ℕ. We note that (use 8.28)

e^{αW_{τ_n} − α²τ_n/2} →^P e^{αW_{τ_{a,b}} − α²τ_{a,b}/2} 1_{(τ_{a,b}<∞)}.

Applying the dominated convergence theorem it follows (at least for sufficiently large
α) that

E(e^{αW_{τ_{a,b}} − α²τ_{a,b}/2} 1_{(τ_{a,b}<∞)}) = 1.

The rest are easy computations. Since W_{τ_{a,b}} = a + bτ_{a,b} we get

E(e^{−(α²/2 − αb)τ_{a,b}} 1_{(τ_{a,b}<∞)}) = e^{−αa}.

Putting λ := α²/2 − αb, i.e. α = b + √(b²+2λ), proves the assertion. □
9.36 Problem. Fill in the details of the proof of 9.35.
9.37 Problem. In the following problems treat the cases b > 0, b = 0 and b < 0
separately.
(a) Find P(τ_{a,b} < ∞).
(b) Find E(τ_{a,b}).
9.38 Problem. (a) Does the assertion of the optional sampling theorem hold for the
martingale (W_t)_{t≥0} and τ_{a,b}?
(b) Does the assertion of the optional sampling theorem hold for the martingale
W_t² − t and τ_{a,b}?
9.39 Problem. (a) Show that P(τ_{0,b} = 0) = 1 for every b > 0. (Consider
E(e^{−λτ_{a_n,b}}) for a_n ↓ 0.) Give a verbal interpretation of this result.
(b) Show that P(max_t W_t = ∞, min_t W_t = −∞) = 1.
(c) Conclude from (a) that almost all paths of (W_t)_{t≥0} infinitely often cross every
horizontal line.
From 9.35 we obtain the distribution of the first passage times.
9.40 Corollary. Let τ_{a,b} be defined as in 9.35. Then

P(τ_{a,b} ≤ t) = 1 − Φ((a + bt)/√t) + e^{−2ab} Φ((−a + bt)/√t), t ≥ 0.
Proof: Let G(t) := P(τ_{a,b} ≤ t) and let F_{a,b}(t) denote the right hand side of the
asserted equation. We want to show that F_{a,b}(t) = G(t), t ≥ 0. For this we will apply
the uniqueness of the Laplace transform. Note that 9.35 says that

∫_0^∞ e^{−λt} dG(t) = e^{−a(b+√(b²+2λ))}, λ ≥ 0.

Therefore, we have to show that

∫_0^∞ e^{−λt} dF_{a,b}(t) = e^{−a(b+√(b²+2λ))}, λ ≥ 0.

This is done by the following simple calculations. First, it is shown that

F_{a,b}(t) = (1/√(2π)) ∫_0^t (a/s^{3/2}) exp(−a²/(2s) − b²s/2 − ab) ds.

(This is done by calculating the derivatives of both sides.) Then it follows that

e^{ab} ∫_0^t e^{−λs} dF_{a,b}(s) = e^{a√(b²+2λ)} F_{a,√(b²+2λ)}(t).

Putting t = ∞ the assertion follows. □
9.41 Problem. Fill in the details of the proof of 9.40.
9.42 Problem. Find the distribution of max_{s≤t} W_s.
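The distribution function of Corollary 9.40 is easy to evaluate with the error function. As a consistency check (the code and tolerances are ours): for b > 0 the formula should increase in t and tend to P(τ_{a,b} < ∞) = e^{−2ab} as t → ∞, cf. Problem 9.37, and for b = 0 it reduces to 2(1 − Φ(a/√t)).

```python
import math

def std_normal_cdf(x):
    """Phi(x) via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def passage_cdf(a, b, t):
    """P(tau_{a,b} <= t) from Corollary 9.40."""
    if t <= 0:
        return 0.0
    s = math.sqrt(t)
    return (1.0 - std_normal_cdf((a + b * t) / s)
            + math.exp(-2.0 * a * b) * std_normal_cdf((-a + b * t) / s))
```

For a = 1, b = 0.5 the values increase with t towards e^{−2ab} = e^{−1} ≈ 0.368, reflecting the fact that with positive drift of the boundary the process may never reach it.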
Two-sided boundaries
The following problems are concerned with first passage times for two horizontal
boundaries. Let c, d > 0 and define

τ_{c,d} = inf{t : W_t ∉ (−c, d)}.
9.43 Problem. (a) Show that τ_{c,d} is a stopping time.
(b) Show that P(τ_{c,d} < ∞) = 1.
For τ_{c,d} the application of the optional sampling theorem is straightforward since
|W_t| ≤ max{c, d} for t ≤ τ_{c,d}.
9.44 Problem. Find the distribution of W_{τ_{c,d}}.
Hint: Note that E(W_{τ_{c,d}}) = 0 (why?) and remember that W_{τ_{c,d}} has only two
different values.
Solution: P(W_{τ_{c,d}} = −c) = d/(c+d), P(W_{τ_{c,d}} = d) = c/(c+d).
9.45 Problem. Find E(τ_{c,d}).
Hint: Note that E(W²_{τ_{c,d}}) = E(τ_{c,d}).
Solution: E(τ_{c,d}) = cd.
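For the symmetric simple random walk, the discrete analogue of W, the same identities hold exactly (gambler's ruin): the walk exits (−c, d) at −c with probability d/(c+d), and the expected exit time is cd. This permits a quick Monte Carlo check of the two solutions above (the simulation setup is ours):

```python
import random

def exit_from_interval(rng, c=2, d=3):
    """Run a symmetric random walk from 0 until it leaves (-c, d).
    Returns (exit site, number of steps)."""
    m, steps = 0, 0
    while -c < m < d:
        m += 1 if rng.random() < 0.5 else -1
        steps += 1
    return m, steps

rng = random.Random(4)
trials = [exit_from_interval(rng) for _ in range(4000)]
p_lower = sum(1 for m, _ in trials if m == -2) / len(trials)
mean_time = sum(steps for _, steps in trials) / len(trials)
# theory: p_lower ~ d/(c+d) = 0.6 and mean_time ~ c*d = 6
```

With c = 2, d = 3 the empirical exit probability and mean exit time come out near 0.6 and 6, matching d/(c+d) and cd.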
9.46 Discussion. Distribution of τ_{c,d}
The distribution of the stopping time τ_{c,d} is a more complicated story. It is easy
to obtain the Laplace transforms. Obtaining probabilistic information requires much
more analytical effort.
For reasons of symmetry we have

A := ∫_{(W_{τ_{c,d}} = −c)} e^{−λ²τ_{c,d}/2} dP = ∫_{(W_{τ_{d,c}} = c)} e^{−λ²τ_{d,c}/2} dP

and

B := ∫_{(W_{τ_{c,d}} = d)} e^{−λ²τ_{c,d}/2} dP = ∫_{(W_{τ_{d,c}} = −d)} e^{−λ²τ_{d,c}/2} dP.

From

1 = E(e^{λW_{τ_{c,d}} − λ²τ_{c,d}/2}) and 1 = E(e^{λW_{τ_{d,c}} − λ²τ_{d,c}/2})

we obtain a system of equations for A and B leading to

A = (e^{λd} − e^{−λd})/(e^{λ(c+d)} − e^{−λ(c+d)}) and B = (e^{λc} − e^{−λc})/(e^{λ(c+d)} − e^{−λ(c+d)}).

This implies

E(e^{−λτ_{c,d}}) = (e^{−c√(2λ)} + e^{−d√(2λ)})/(1 + e^{−(c+d)√(2λ)}).

Expanding this into an infinite geometric series and applying

∫_0^∞ e^{−λt} dF_{a,0}(t) = e^{−a√(2λ)}, λ ≥ 0,

we could obtain an infinite series expansion of the distribution of τ_{c,d}.
(Further reading: Karatzas-Shreve [15], section 2.8.)
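The Laplace transform above encodes the moments of τ_{c,d}; in particular, its negative derivative at λ = 0 recovers E(τ_{c,d}) = cd from Problem 9.45. A numerical sketch (the finite-difference step is our choice):

```python
import math

def laplace_tau(lam, c, d):
    """E(exp(-lam * tau_{c,d})) from the formula above."""
    r = math.sqrt(2.0 * lam)
    return (math.exp(-c * r) + math.exp(-d * r)) / (1.0 + math.exp(-(c + d) * r))

c, d = 2.0, 3.0
h = 1e-6
mean_tau = (laplace_tau(0.0, c, d) - laplace_tau(h, c, d)) / h
# mean_tau should be close to E(tau_{c,d}) = c*d = 6
```

Note that the √(2λ) terms of numerator and denominator cancel to first order, so the transform is differentiable at 0 and the one-sided difference quotient converges to cd.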
The reflection principle
Let (W_t)_{t≥0} be a Wiener process and let (F_t)_{t≥0} be its internal history.
Let s > 0 and consider the process X_t := W_{s+t} − W_s, t ≥ 0. Since the Wiener
process has independent increments, the process (X_t)_{t≥0} is independent of F_s.
Moreover, it is easy to see that (X_t)_{t≥0} is a Wiener process. Let us give an intuitive
interpretation of these facts.
Assume that we observe the Wiener process up to time s. Then we know the past
F_s and the value W_s at time s. What about the future? How will the process behave
for t > s? The future variation of the process after time s is given by (X_t)_{t≥0}. From
the remarks above it follows that the future variation is that of a Wiener process which
is independent of the past. The common formulation of this fact is: At every time s > 0
the Wiener process starts afresh.
9.47 Problem. Show that the process X_t := W_{s+t} − W_s, t ≥ 0, is a Wiener process
for every s ≥ 0.
There is a simple consequence of the property of starting afresh at every time s.
Note that

W_t = W_t whenever t ≤ s, and W_t = W_s + (W_t − W_s) whenever t > s.

Define the corresponding process reflected at time s by

W̃_t = W_t whenever t ≤ s, and W̃_t = W_s − (W_t − W_s) whenever t > s.

Then it is clear that (W_t)_{t≥0} and (W̃_t)_{t≥0} have the same distribution. This
assertion looks rather harmless and self-evident. However, it becomes a powerful tool
when it is extended to stopping times.
9.48 Theorem. (Reflection principle)
Let τ be any stopping time and define

W̃_t = W_t whenever t ≤ τ, and W̃_t = W_τ − (W_t − W_τ) whenever t > τ.

Then the distributions of (W_t)_{t≥0} and (W̃_t)_{t≥0} are equal.
Proof: Let us show that the single random variables W_t and W̃_t have equal
distributions. Equality of the finite dimensional marginal distributions is shown in a
similar manner.
We have to show that for any bounded continuous function f we have E(f(W_t)) =
E(f(W̃_t)). For obvious reasons we need only show

∫_{(τ<t)} f(W_t) dP = ∫_{(τ<t)} f(W̃_t) dP,

which is equivalent to

∫_{(τ<t)} f(W_τ + (W_t − W_τ)) dP = ∫_{(τ<t)} f(W_τ − (W_t − W_τ)) dP.

The last equation is easily shown for stopping times with finitely many values. The
common approximation argument then proves the assertion. □
9.49 Problem. To get an idea of how the full proof of the reflection principle works,
show E(f(W_{t_1}, W_{t_2})) = E(f(W̃_{t_1}, W̃_{t_2})) for t_1 < t_2 and bounded
continuous f.
Hint: Distinguish between τ < t_1, t_1 ≤ τ < t_2 and τ ≥ t_2.
The reflection principle offers an easy way of obtaining information on first passage
times.
9.50 Theorem. Let M_t := max_{s≤t} W_s. Then

P(M_t ≥ y, W_t < y − x) = P(W_t > y + x), t > 0, y > 0, x ≥ 0.

Proof: Let τ := inf{t : W_t ≥ y} and τ̃ := inf{t : W̃_t ≥ y}. Then

P(M_t ≥ y, W_t < y − x) = P(τ ≤ t, W_t < y − x)
= P(τ̃ ≤ t, W̃_t < y − x)
= P(τ ≤ t, W_t > y + x)
= P(W_t > y + x). □
9.51 Problem. Use 9.50 to find the distribution of M_t.
9.52 Problem. Find P(W_t < z, M_t < y) when z < y, y > 0.
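From 9.50 with x = 0 one obtains P(M_t ≥ y) = 2 P(W_t > y), which answers Problem 9.51. A Monte Carlo check on a discrete grid (setup and tolerances ours; sampling the path on a grid slightly underestimates the true maximum, so the simulated probability sits a little below the exact value):

```python
import math
import random

def running_max(n, t, rng):
    """Maximum of a Wiener path sampled on an n-point grid over [0, t]."""
    dt = t / n
    w, m = 0.0, 0.0
    for _ in range(n):
        w += rng.gauss(0.0, math.sqrt(dt))
        m = max(m, w)
    return m

rng = random.Random(5)
t, y, n_paths = 1.0, 1.0, 2000
p_hat = sum(1 for _ in range(n_paths) if running_max(400, t, rng) >= y) / n_paths
p_exact = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(y / math.sqrt(2.0 * t))))
# p_exact = 2 * P(W_1 > 1), approximately 0.317
```

The simulated frequency agrees with 2 P(W_1 > 1) up to the grid bias and Monte Carlo noise.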
9.5 The Markov property
We explain and discuss the Markov property for the Wiener process. Similar
assertions are valid for general Lévy processes.
When we calculate conditional expectations given the past F_s of a stochastic process
(X_t)_{t≥0}, then from the general point of view the conditional expectations
E(X_t | F_s) are F_s-measurable, i.e. they may depend on any X_u, u ≤ s. But when we
were dealing with special conditional expectations given the past of a Wiener process
we got formulas of the type

E(W_t | F_s) = W_s, E(W_t² | F_s) = W_s² + (t − s), E(e^{aW_t} | F_s) = e^{aW_s + a²(t−s)/2}.

These conditional expectations do not use the whole information available in F_s but
only the value W_s of the Wiener process at time s.
9.53 Theorem. Let (W_t)_{t≥0} be a Wiener process and (F_t)_{t≥0} its internal
history. Then for every P-integrable function Z which is σ(W_u : u ≥ s)-measurable we
have

E(Z | F_s) = φ(W_s)

where φ is some measurable function.
Proof: For the proof we only have to note that the system of functions

e^{a_1 W_{s+h_1} + a_2 W_{s+h_2} + · · · + a_n W_{s+h_n}}, h_i ≥ 0, n ∈ ℕ,

is total in L²(σ(W_u : u ≥ s)). □
9.54 Problem. Under the assumptions of 9.53 show that E(Z | F_s) = E(Z | W_s).
9.53 is the simplest and most basic formulation of the Markov property. It is,
however, illuminating to discuss more sophisticated versions of the Markov property.
Let us calculate E(f(W_{s+t}) | F_s) where f is bounded and measurable. We have

E(f(W_{s+t}) | F_s) = E(f(W_s + (W_{s+t} − W_s)) | F_s).

Since W_s is F_s-measurable and W_{s+t} − W_s is independent of F_s we have (see remark
INSERT)

E(f(W_{s+t}) | F_s) = φ(W_s) where φ(ξ) = E(f(ξ + (W_{s+t} − W_s))).  (7)

Roughly speaking, conditional expectations simply are expectations depending on a
parameter slot where the present value of the process has to be plugged in.
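Formula (7) can be made concrete: to evaluate the conditional expectation, average f over the increment distribution N(0, t) and plug in the present value. For f(x) = x² this gives φ(ξ) = ξ² + t, matching the formula E(W²_{s+t} | F_s) = W_s² + t listed above. A Monte Carlo sketch (function name and sample size ours):

```python
import math
import random

def phi(f, xi, t, n=20000, seed=6):
    """Monte Carlo version of phi(xi) = E(f(xi + (W_{s+t} - W_s)))
    with the increment distributed as N(0, t)."""
    rng = random.Random(seed)
    return sum(f(xi + rng.gauss(0.0, math.sqrt(t))) for _ in range(n)) / n

xi, t = 1.5, 2.0
approx = phi(lambda x: x * x, xi, t)
exact = xi * xi + t   # E((xi + Z)^2) = xi^2 + t for Z ~ N(0, t)
```

The estimate agrees with ξ² + t up to Monte Carlo noise, illustrating the "parameter slot" reading of (7).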
9.55 Theorem. Let (W_t)_{t≥0} be a Wiener process and (F_t)_{t≥0} its internal
history. Then the conditional distribution of (W_{s+t})_{t≥0} given F_s is the same as
the distribution of a process ξ + W̃_t where ξ = W_s and (W̃_t)_{t≥0} is any (other)
Wiener process.
Proof: Extend (7) to functions of several variables. □
9.55 contains the formulation which is known as the ordinary Markov property of
the Wiener process. It says that at every time point s the Wiener process starts afresh
at the state ξ = W_s as a new Wiener process, forgetting everything that happened
before time s.
It is a remarkable fact with far reaching consequences that the Markov property
still holds if time s is replaced by a stopping time. The essential preliminary step is the
following.
9.56 Theorem. Let τ be any stopping time and define Q(F) = P(F | τ < ∞),
F ∈ F_∞. Then the process

X_t := W_{τ+t} − W_τ, t ≥ 0,

is a Wiener process under Q which is independent of F_τ.
Proof: (Outline) Let us show that

∫_F f(W_{τ+t} − W_τ) dP = P(F) E(f(W_t))

when F ⊆ (τ < ∞), F ∈ F_τ and f is any bounded continuous function. But this is
certainly true for stopping times with finitely many values. The common approximation
argument proves the equation. Noting that the equation holds for τ + s, s > 0,
replacing τ, proves the assertion. □
9.57 Problem. Fill in the details of the proof of 9.56.
9.58 Theorem. (Strong Markov property)
Let (W_t)_{t≥0} be a Wiener process and (F_t)_{t≥0} its internal history. Let τ be any
stopping time. Then on (τ < ∞) the conditional distribution of (W_{τ+t})_{t≥0} given
F_τ is the same as the distribution of a process ξ + W̃_t where ξ = W_τ and
(W̃_t)_{t≥0} is some (other) Wiener process.
Further reading: Karatzas-Shreve [15], sections 2.5 and 2.6.
Part III
Stochastic calculus
Chapter 10
The stochastic integral
10.1 Integrals along stochastic paths
Let (X_t)_{t≥0} be any cadlag (right-continuous with left limits) adapted process.
10.1 Definition. The process (X_t)_{t≥0} is a process of finite variation on [0, T] if
P-almost all paths are of bounded variation on [0, T].
Every process with increasing paths is a finite variation process. Therefore the
Poisson process is a finite variation process. The Wiener process is not a finite
variation process.
10.2 Problem. Show that every compound Poisson process is a finite variation
process.
If (X_t)_{t≥0} is a finite variation process then we may use the paths of the process
for defining Stieltjes integrals.
10.3 Definition. Let (X_t)_{t≥0} be a process of finite variation on [0, T] and
(H_t)_{t≥0} be a caglad (left continuous with right limits) adapted process. Then the
random variable

∫_0^T H dX : ω ↦ ∫_0^T H_s(ω) dX_s(ω)

is called the stochastic integral of (H_t)_{t≥0} with respect to (X_t)_{t≥0}.
For the stochastic integral we have the following basic approximation result which
follows from 3.36.
10.4 Theorem. Let (X_t)_{t≥0} be a process of finite variation on [0, T] and
(H_t)_{t≥0} be a caglad (left continuous with right limits) adapted process. Then for
every Riemannian sequence of subdivisions of [0, T]

Σ_{i=1}^{k_n} H_{s_{i−1}} (X_{s_i} − X_{s_{i−1}}) →^P ∫_0^T H_s dX_s.
Definition 10.3 only works for stochastic processes whose paths are of bounded
variation. It is important to extend the stochastic integral to a larger class of processes
which contains e.g. the Wiener process, too. Since most properties of the Stieltjes
integral are consequences of approximations by Riemannian sequences, the extension
should be such that Theorem 10.4 remains true.
The following sections are devoted to this extension of the stochastic integral.
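For a finite variation path that is a pure step function (e.g. a compound Poisson path), the Stieltjes integral of Definition 10.3 reduces to a sum over the jumps, and the left-point Riemann sums of Theorem 10.4 converge to it. A sketch with a deterministic path chosen by us:

```python
def step_path(jumps):
    """Cadlag step function X_t = sum of the sizes of jumps at times <= t."""
    def x(t):
        return sum(size for time, size in jumps if time <= t)
    return x

jumps = [(0.3, 1.0), (0.7, -2.0), (1.5, 0.5)]   # (jump time, jump size)
x = step_path(jumps)
h = lambda t: t * t   # a continuous (hence caglad) integrand

# Stieltjes integral over [0, 2]: H evaluated at the jump times, weighted by sizes
exact = sum(h(time) * size for time, size in jumps)

def riemann_sum(h, x, t_max, n):
    """Left-point Riemann-Stieltjes sum over n equal subintervals of [0, t_max]."""
    grid = [t_max * i / n for i in range(n + 1)]
    return sum(h(grid[i - 1]) * (x(grid[i]) - x(grid[i - 1]))
               for i in range(1, n + 1))
```

With n = 2000 subintervals the Riemann sum agrees with the jump-sum value up to the left-point evaluation error, which shrinks with the mesh as Theorem 10.4 predicts.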
Let us make some historical remarks. The first extension of the stochastic integral
to processes which are not of finite variation was concerned with the Wiener process.
For non-stochastic integrands this integral was constructed by Norbert Wiener already
in the first half of the 20th century. The starting point of the general concept was
the integral of K. Itô about the middle of the 20th century. A key concept was the
restriction of the integrands to non-anticipating adapted functions and the application
of the, at that time new, martingale concept of Doob. In the following decades a
general theory of stochastic integration was established under the leadership of the
French school of probability theory. The most prominent member of that school was
P. A. Meyer. The theory culminated in the notion of so-called semimartingales, the
most general class of processes that can be used for defining a stochastic integral.
Originally, semimartingales were defined as processes which can be decomposed
into a sum of a martingale and a finite variation process. This definition reflects the
fact that for both types of processes a stochastic integral can be defined, either as a
Stieltjes integral or by the martingale construction due to Itô. Nevertheless, it seemed
necessary to unify both the semimartingale concept and the construction of the
stochastic integral. Based on ideas of P. A. Meyer such a new approach has been
presented by Protter, [19] and [20].
Our presentation follows the outline by Protter.
10.2 The integral of simple processes
In this section we collect some basic notation and facts which are easy to obtain.
Let $0 \le t_1 < t_2 \le T$. The most simple caglad process is of the form
$$H_s(\omega) = a(\omega)\,1_{(t_1,t_2]}(s) = \begin{cases} 0 & \text{whenever } s \le t_1 \\ a(\omega) & \text{whenever } t_1 < s \le t_2 \\ 0 & \text{whenever } t_2 < s \end{cases}$$
In order to be adapted the process must satisfy $H_s \in \mathcal{F}_s$ for all $s \ge 0$. This only matters for $t_1 < s \le t_2$, where $a$ should be $\mathcal{F}_s$-measurable for all $s > t_1$. Imposing the usual conditions this means that $a$ must be $\mathcal{F}_{t_1}$-measurable.

For such a simple process we define
$$\int_0^T H_s\,dX_s = \int_0^T H\,dX := a\,(X_{t_2} - X_{t_1})$$
This definition is identical to the definition of the Stieltjes integral for each $\omega$, i.e. for each path of the underlying processes separately. Note that the stochastic integral is a random variable, i.e. it depends on $\omega$.

If $0 \le t \le T$ we define
$$\int_0^t H\,dX = \int_0^T 1_{(0,t]}\,H\,dX$$
Since it is easy to see that
$$1_{(0,t]}\,1_{(t_1,t_2]} = 1_{(t_1\wedge t,\,t_2\wedge t]}$$
we have
$$\int_0^t H\,dX = a\,(X_{t_2\wedge t} - X_{t_1\wedge t})$$
Now the stochastic integral can be considered as a function of both $\omega$ and $t$, i.e. as a stochastic process. Since $X$ is assumed to be cadlag and adapted, the stochastic integral is cadlag and adapted, too.

The next step is to consider sums.
10.5 Definition. Let $\mathcal{E}_0$ be the set of simple processes defined on a deterministic subdivision, i.e. processes of the form
$$H_t(\omega) = \sum_{j=1}^{n} a_{j-1}(\omega)\,1_{(s_{j-1},s_j]}(t)$$
where $0 = s_0 < s_1 < \ldots < s_n = T$ is a subdivision and $a_{j-1}$ is $\mathcal{F}_{s_{j-1}}$-measurable for every $j$.
It is clear that every process $(H_t)_{t\ge 0} \in \mathcal{E}_0$ is caglad and adapted. Again we may define the integral pathwise by
$$\int_0^t H\,dX := \sum_{j=1}^{n} a_{j-1}\,(X_{s_j\wedge t} - X_{s_{j-1}\wedge t})$$
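The pathwise formula above is elementary enough to code directly. A minimal sketch (the numbers and the path $X_t = t^2$ are hypothetical illustrations, not from the notes):

```python
# Pathwise integral of a simple process (Definition 10.5):
# for H = sum_j a_{j-1} 1_{(s_{j-1}, s_j]} the integral up to time t is
# sum_j a_{j-1} (X_{s_j ^ t} - X_{s_{j-1} ^ t}).
def simple_integral(a, s, X, t):
    """a[j] is the value held on (s[j], s[j+1]]; s runs from 0 to T."""
    total = 0.0
    for j in range(len(a)):
        total += a[j] * (X(min(s[j + 1], t)) - X(min(s[j], t)))
    return total

X = lambda u: u**2          # any path evaluated as a function of time
a = [1.0, -2.0, 0.5]        # values held on (0,1], (1,2], (2,3]
s = [0.0, 1.0, 2.0, 3.0]
value = simple_integral(a, s, X, 2.5)
print(value)   # 1*(1-0) - 2*(4-1) + 0.5*(6.25-4) = -3.875
```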
Next let us adopt a more general view. For applications to financial markets we have to admit subdivisions based on random times rather than on deterministic interval limits. This leads us to the set of simple processes.
10.6 Definition. Let $\mathcal{E}$ be the set of simple processes, i.e. processes of the form
$$H_t(\omega) = \sum_{j=1}^{n} a_{j-1}(\omega)\,1_{(\tau_{j-1},\tau_j]}(t) \qquad (8)$$
where $0 = \tau_0 < \tau_1 < \ldots < \tau_n = T$ is a subdivision of stopping times and $a_{j-1}$ is $\mathcal{F}_{\tau_{j-1}}$-measurable for every $j$.
Again it is obvious that the paths are caglad, and from 9.25(b) we know that the processes in $\mathcal{E}$ are adapted.

For functions in $\mathcal{E}$ we may define the integral again pathwise, i.e. separately for each $\omega$ as a Stieltjes integral. This leads to the following definition:
$$\int_0^t H\,dX := \int_0^T 1_{(0,t]}\,H\,dX = \sum_{j=1}^{n} a_{j-1}\,(X_{\tau_j\wedge t} - X_{\tau_{j-1}\wedge t})$$
if $H$ is defined by (8). Since for each single path this is an ordinary Stieltjes integral we immediately have the properties
$$\int_0^t (H_1 + H_2)\,dX = \int_0^t H_1\,dX + \int_0^t H_2\,dX \qquad (9)$$
$$\int_0^t H\,d(X_1 + X_2) = \int_0^t H\,dX_1 + \int_0^t H\,dX_2 \qquad (10)$$
For notational convenience we denote the process defined by the stochastic integral by $H\cdot X : t \mapsto \int_0^t H\,dX$. The preceding discussion shows that $H\cdot X$ is cadlag and adapted if $X$ is cadlag and adapted. Moreover, if $X$ has continuous paths then $H\cdot X$ has continuous paths, too. A third property is stated as a theorem in view of its fundamental importance. Note that the boundedness assumption is required for integrability. The martingale aspect of stochastic integration is pursued in chapter 14.
10.7 Theorem. Let $(M_t)_{t\ge 0}$ be a martingale and let $H \in \mathcal{E}$ be bounded. Then $H\cdot M$ is a martingale.

Proof: Apply 9.33. $\Box$
10.3 Semimartingales

Our next step is to extend the stochastic integral from simple processes $H \in \mathcal{E}$ to more general integrands.

10.8 Definition. Let $\mathcal{L}_0$ be the set of all adapted processes with caglad paths.

We want to construct the stochastic integral in such a way that the integral is well-defined for every process $H \in \mathcal{L}_0$.

Let us talk about the difficulties arising with a naive approach to this problem. For every process $H \in \mathcal{L}_0$ we have
$$\lim_{n\to\infty} \sum_{i=1}^{k_n} H_{t_{i-1}}(\omega)\,1_{(t_{i-1},t_i]}(s) = H_s(\omega), \quad s \in [0,T],\ \omega \in \Omega, \qquad (11)$$
for every Riemannian sequence of subdivisions of $[0,T]$. Therefore it is tempting to try to define the stochastic integral as
$$\int_0^T H\,dX := \lim_{n\to\infty} \sum_{i=1}^{k_n} H_{t_{i-1}}\,(X_{t_i} - X_{t_{i-1}}) \qquad (12)$$
But such a definition only works if the limits on the right hand side exist and are independent of the underlying sequence of Riemannian subdivisions.

We know that this is the case if the process $(X_t)_{t\ge 0}$ is an (adapted cadlag) finite variation process. However, as counterexamples show, this is not the case for completely general adapted cadlag processes. Therefore the question arises how to restrict the class of processes $(X_t)_{t\ge 0}$.

The answer to this question is the notion of a semimartingale.
10.9 Definition. A cadlag adapted process $(X_t)_{t\ge 0}$ is a semimartingale if for every sequence $(H^n)$ of simple processes the following condition holds:
$$\sup_{s\le T} |H^n_s| \to 0 \ \Rightarrow\ \sup_{t\le T}\Big|\int_0^t H^n_s\,dX_s\Big| \xrightarrow{P} 0$$
The set of all semimartingales is denoted by $\mathcal{S}$.

Thus, semimartingales are defined by a continuity property: if a sequence of simple processes converges to zero uniformly on compact time intervals, then the stochastic integrals of that sequence converge to zero, too.

As a matter of fact, for deterministic processes (not depending on $\omega$) this continuity property is equivalent to bounded variation. However, for a stochastic process the continuity property does not imply that $(X_t)$ is a finite variation process.
It will turn out that a reasonable extension process of the stochastic integral can be carried out for integrator processes which are semimartingales. It is therefore important to get an overview of typical processes that are semimartingales. From (??) it follows that the concept of semimartingales covers adapted cadlag processes with paths of bounded variation. The following result opens the door to stochastic processes like the Wiener process.

10.10 Theorem. Every square integrable cadlag martingale $(M_t)_{t\ge 0}$ is a semimartingale.
Proof: Let $(H^n)$ be a sequence in $\mathcal{E}$ such that $\|H^n\|_u \to 0$. Since $\int_0^t H^n\,dM$ is a martingale, we have by the maximal inequality
$$P\Big(\sup_{s\le t}\Big|\int_0^s H^n\,dM\Big| > a\Big) \le \frac{1}{a^2}\,E\Big(\Big(\int_0^t H^n\,dM\Big)^2\Big)$$
For convenience let $M_j := M_{\tau^n_j\wedge t}$. We have
$$E\Big(\Big(\int_0^t H^n\,dM\Big)^2\Big) = E\Big(\Big(\sum_{j=1}^{n} a_{j-1}\,(M_j - M_{j-1})\Big)^2\Big) = E\Big(\sum_{j=1}^{n} a_{j-1}^2\,(M_j - M_{j-1})^2\Big)$$
$$\le \|H^n\|_u^2\,E\Big(\sum_{j=1}^{n} (M_j - M_{j-1})^2\Big) = \|H^n\|_u^2\,E\Big(\sum_{j=1}^{n} (M_j^2 - M_{j-1}^2)\Big) \le \|H^n\|_u^2\,E(M_t^2) \qquad \Box$$
It follows that the Wiener process is a semimartingale.
The set of semimartingales is a very convenient set to work with.
10.11 Theorem.
(a) The set of semimartingales is a vector space.
(b) If $X \in \mathcal{S}$ then for every stopping time $\tau$ the stopped process $X^\tau := (X_{\tau\wedge t})_{t\ge 0}$ is a semimartingale.
(c) Let $\tau_n \uparrow \infty$ be a sequence of stopping times such that $X^{\tau_n} \in \mathcal{S}$ for every $n \in \mathbb{N}$. Then $X \in \mathcal{S}$.
10.12 Problem. Prove Theorem 10.11.
Hint for part (c): Note that $(X_t \ne X_{\tau_n\wedge t}) \subseteq (\tau_n < t)$.

10.13 Problem. Show that $(W_t^2)_{t\ge 0}$ is a semimartingale.

10.14 Problem. Show that every cadlag martingale $(M_t)_{t\ge 0}$ with continuous paths is a semimartingale.
Hint: Let $\tau_n = \inf\{t : |M_t| \ge n\}$ and show that $M^{\tau_n}$ is a square integrable martingale for every $n \in \mathbb{N}$.

Summing up, we have shown that every cadlag process which is a sum of a continuous martingale and an adapted process with paths of bounded variation is a semimartingale.

Actually every cadlag martingale is a semimartingale. See Jacod-Shiryaev, [14], Chapter I, 4.17.
10.4 Extending the stochastic integral

The extension of the stochastic integral from $\mathcal{E}$ to $\mathcal{L}_0$ is based on the fact that every process in $\mathcal{L}_0$ can be approximated by processes in $\mathcal{E}$ converging uniformly on compact time intervals.

In short, the procedure is as follows. Let $X$ be a semimartingale and let $H \in \mathcal{L}_0$. Consider some sequence $(H^n)$ in $\mathcal{E}$ such that $H^n \to H$ uniformly on compact time intervals, and define
$$\int_0^T H\,dX := \lim_{n\to\infty} \int_0^T H^n\,dX \qquad (13)$$
However, in order to make sure that such a definition makes sense one has to consider several mathematical issues. For the interested reader let us collect some of the details.

10.15 Discussion. The main points of definition (13) are existence and uniqueness of the limit. Let $X \in \mathcal{S}$ and $H \in \mathcal{L}_0$. We follow Protter, [20], chapter II, section 4.

(1) One can always find a sequence $(H^n) \subseteq \mathcal{E}$ such that
$$\sup_{s\le T} |H^n_s - H_s| \xrightarrow{P} 0$$
Note that such an approximation is in general not available with deterministic Riemannian sequences of subdivisions. However, it is available with Riemannian sequences of subdivisions based on stopping times. Moreover, the construction requires right continuous filtrations. Therefore, in general, we require augmented filtrations.
(2) Semimartingales satisfy
$$(H^n) \subseteq \mathcal{E},\ \sup_{s\le T} |H^n_s| \xrightarrow{P} 0 \ \Rightarrow\ \sup_{t\le T}\Big|\int_0^t H^n\,dX\Big| \xrightarrow{P} 0.$$
(This is slightly stronger than the defining property of semimartingales.)

(3) From (2) it follows that for every sequence $(H^n) \subseteq \mathcal{E}$ satisfying (1) the corresponding sequence of stochastic integrals $\int_0^T H^n\,dX$ is a Cauchy sequence with respect to convergence in probability, uniformly on $[0,T]$. Therefore there exists a process $Y$ such that
$$\sup_{t\le T}\Big|\int_0^t H^n\,dX - Y_t\Big| \xrightarrow{P} 0.$$
(4) From (2) it follows that the limiting process $Y$ does not depend on the sequence $(H^n)$.

(5) The type of convergence which is used for the extension procedure implies that the processes $H\cdot X$ are cadlag and adapted.

The preceding discussion shows that there is a well-defined stochastic integral process $(H\cdot X)_t = \int_0^t H\,dX$ whenever $H \in \mathcal{L}_0$ and $X \in \mathcal{S}$. This process is adapted and cadlag. The question of continuity is answered by the following exercise.
10.16 Problem.
(1) Show that $\Delta(H\cdot X)_t = H_t\,\Delta X_t$.
Hint: Explain that the assertion is true for $H \in \mathcal{E}$.
(2) Let $H \in \mathcal{L}_0$ and $X \in \mathcal{S}$. If $X$ is continuous then $H\cdot X$ is continuous, too.

For deriving (understanding) the basic properties or rules of this stochastic integral we need not go through all details of the mathematical construction. The reason is that at the end of the mathematical construction it is proved that stochastic integrals can be approximated by arbitrary Riemannian sequences, even with deterministic interval limits. The underlying processes $H \in \mathcal{L}_0$ do not (necessarily) converge uniformly on compact time intervals but only pointwise; but as long as we are dealing with semimartingales we have convergence of the integrals.
10.17 Theorem. Let $X$ be a semimartingale and $H \in \mathcal{L}_0$. Assume that $0 = t^n_0 < t^n_1 < \ldots < t^n_{k_n} = t$ is any Riemannian sequence of subdivisions of $[0,t]$. Then
$$\Big|\sum_{j=1}^{k_n} H_{t_{j-1}}\,(X_{t_j} - X_{t_{j-1}}) - \int_0^t H\,dX\Big| \xrightarrow{P} 0$$
Proof: Protter, [20], Chapter II, Theorem 21. $\Box$
Let us apply 10.17 for a first evaluation of a stochastic integral.

10.18 Theorem. Let $(W_t)_{t\ge 0}$ be a Wiener process. Then
$$\int_0^t W_s\,dW_s = \frac{1}{2}(W_t^2 - t) \qquad (14)$$
Proof: Let $0 = t_0 \le t_1 \le \ldots \le t_n = t$ be an interval partition such that $\max_j |t_j - t_{j-1}| \to 0$ as $n \to \infty$. This implies by 10.17 that
$$\sum_{j=1}^{n} W_{t_{j-1}}\,(W_{t_j} - W_{t_{j-1}}) \xrightarrow{P} \int_0^t W_s\,dW_s$$
On the other hand we have
$$W_t^2 = \sum_{j=1}^{n} (W_{t_j}^2 - W_{t_{j-1}}^2) = \sum_{j=1}^{n} (W_{t_j} - W_{t_{j-1}})^2 + 2\sum_{j=1}^{n} W_{t_{j-1}}\,(W_{t_j} - W_{t_{j-1}})$$
We know that
$$\sum_{j=1}^{n} (W_{t_j} - W_{t_{j-1}})^2 \xrightarrow{P} t$$
This proves the assertion. $\Box$
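Formula (14) can be checked by simulation. The sketch below (an illustration, with an arbitrarily chosen seed) builds a Wiener path on a fine grid and compares the left-endpoint sums with $(W_t^2 - t)/2$; note that the limit differs from the ordinary calculus value $W_t^2/2$ by the quadratic variation term.

```python
import numpy as np

# Left-endpoint Riemann sums of W dW converge to (W_t^2 - t)/2, not W_t^2/2.
rng = np.random.default_rng(0)
n, t = 100_000, 1.0
dW = rng.normal(0.0, np.sqrt(t / n), size=n)    # Wiener increments
W = np.concatenate(([0.0], np.cumsum(dW)))      # Wiener path on the grid
ito_sum = np.sum(W[:-1] * np.diff(W))           # sum W_{t_{j-1}} (W_{t_j} - W_{t_{j-1}})
error = abs(ito_sum - 0.5 * (W[-1] ** 2 - t))
print(error)   # small; it equals |t - sum (dW)^2| / 2 on this path
```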
It is clear that the linearity properties (9) and (10) remain valid for the stochastic integral with $H \in \mathcal{L}_0$. Define
$$\int_s^t H\,dX := \int_0^t 1_{(s,\infty)}\,H\,dX$$
It is clear that the concatenation property
$$\int_s^t H\,dX = \int_s^u H\,dX + \int_u^t H\,dX$$
holds if $s < u < t$.

An important extension of ordinary linearity is homogeneity with respect to random factors.

10.19 Problem. Let $H \in \mathcal{L}_0$ and $X \in \mathcal{S}$. Show that
$$\int_s^t ZH\,dX = Z\int_s^t H\,dX \quad\text{whenever } Z \in \mathcal{F}_s.$$
Hint: Consider $H \in \mathcal{E}_0$ and represent $1_{(s,t]}H$ by a subdivision of $(s,t]$.
10.5 The Wiener integral

Any stochastic integral with respect to the Wiener process is called an Ito-integral. In the special case when the integrand is not random it is called a Wiener integral.

Let $(W_t)_{t\ge 0}$ be a Wiener process.

10.20 Problem. Let $f = \sum_{j=1}^{n} a_j\,1_{(t_{j-1},t_j]}$.
(a) Show that the Wiener integral $\int_0^t f\,dW$ has a normal distribution with mean $0$ and variance $\int_0^t f^2(s)\,ds$.
(b) Show that $f\cdot W$ is a continuous process with independent increments.
In order to extend these properties to arbitrary non-random $f \in \mathcal{L}_0$ we need the following lemma.

10.21 Lemma. Let $f \in \mathcal{L}_0$ be defined on $[0,t]$ and non-random. Let $(f_n)$ be a sequence of functions of the form $\sum_{j=1}^{n} f(t_{j-1})\,1_{(t_{j-1},t_j]}$ based on a Riemannian sequence of subdivisions. Then
$$\int_0^t (f_n(s) - f(s))^2\,ds \to 0 \quad\text{and}\quad V\Big(\int_0^t f_n\,dW - \int_0^t f\,dW\Big) \to 0$$
Proof: From left-continuity of $f$ it follows that $f_n \to f$ pointwise. Since $f$ is bounded on $[0,t]$, the first convergence follows from Lebesgue's dominated convergence theorem.

Thus, $(f_n)$ is a Cauchy sequence in $L^2([0,t])$. Since
$$V\Big(\int_0^t f_n\,dW - \int_0^t f_m\,dW\Big) = \int_0^t (f_n(s) - f_m(s))^2\,ds$$
we obtain that $\int_0^t f_n\,dW$ is a Cauchy sequence in $L^2(P)$ and therefore has a limit $Z$ with respect to $L^2$-convergence. However, we know that $\int_0^t f_n\,dW$ has a limit in probability which equals $\int_0^t f\,dW$. Since (by Chebyshev's inequality) the $L^2$-limit is also the limit in probability, the second convergence follows. $\Box$
10.22 Theorem. Let $f \in \mathcal{L}_0$ be non-random. Then the process $f\cdot W$ has the following properties:
(1) $\int_0^t f\,dW$ has a normal distribution with mean $0$ and variance $\int_0^t f(s)^2\,ds$.
(2) The process has independent increments.
(3) The process has continuous paths.

Proof: Property (3) is a consequence of general assertions on stochastic integrals. Properties (1) and (2) carry over from step functions since the Fourier transforms of the joint distributions of increments converge by 10.21. $\Box$
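Property (1) can be checked by simulation. The following sketch (illustrative choices: $f(s) = s$ on $[0,1]$, so the variance is $\int_0^1 s^2\,ds = 1/3$) approximates the Wiener integral on many independent paths and inspects the sample mean and variance.

```python
import numpy as np

# Monte Carlo check: int_0^1 f dW is N(0, int_0^1 f(s)^2 ds) for f(s) = s.
rng = np.random.default_rng(1)
paths, n, t = 10_000, 400, 1.0
s = np.linspace(0.0, t, n + 1)[:-1]                  # left endpoints f(s_{j-1})
dW = rng.normal(0.0, np.sqrt(t / n), size=(paths, n))
integrals = dW @ s                                   # one Wiener integral per path
mean, var = integrals.mean(), integrals.var()
print(mean, var)                                     # close to 0 and 1/3
```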
10.23 Problem. Let $f \in \mathcal{L}_0$ be non-random and let $X_t = \int_0^t f\,dW$. Show that $(X_t)_{t\ge 0}$ is a square integrable martingale.

10.24 Problem. Let $f \in \mathcal{L}_0$ be non-random and let $X_t = \int_0^t f\,dW$. Show that
$$\Big(X_t^2 - \int_0^t f(s)^2\,ds\Big)_{t\ge 0}$$
is a square integrable martingale.

10.25 Problem. Let $f \in \mathcal{L}_0$ be non-random and let $X_t = \int_0^t f\,dW$. Show that
$$\Big(e^{X_t - \int_0^t f(s)^2\,ds/2}\Big)_{t\ge 0}$$
is a square integrable martingale.
Chapter 11

Calculus for the stochastic integral

There are three fundamental rules for calculations with the stochastic integral which correspond to the three rules considered for Stieltjes integration:
(1) the associativity rule,
(2) the integration-by-parts formula,
(3) the chain rule (Ito's formula).

11.1 The associativity rule

The associativity rule can be formulated briefly as follows. Let $H, G \in \mathcal{L}_0$ and $X \in \mathcal{S}$. Then
$$H\cdot(G\cdot X) = (HG)\cdot X, \quad\text{in short: } d(G\cdot X) = G\,dX \qquad (15)$$
To state it a bit more explicitly:
$$\int_0^t H\,d(G\cdot X) = \int_0^t HG\,dX, \quad H \in \mathcal{L}_0.$$
And to say it in words: a stochastic integral whose integrator is itself a stochastic integral $G\cdot X$ can be written as a stochastic integral with integrator $X$ by multiplying the integrand by $G$.
11.1 Theorem.
(1) Let $X \in \mathcal{S}$ and $G \in \mathcal{L}_0$. Then $G\cdot X$ is in $\mathcal{S}$.
(2) Let $H \in \mathcal{L}_0$. Then $\int_0^t H\,d(G\cdot X) = \int_0^t HG\,dX$.

Proof: It is easy to see that for $H^n \in \mathcal{E}_0$ we have
$$\int_0^t H^n\,d(G\cdot X) = \int_0^t H^n G\,dX$$
If $H^n \to 0$ in an appropriate sense this implies the semimartingale property of $G\cdot X$. If $H^n \to H$ in an appropriate sense the asserted equation follows. $\Box$
11.2 Problem. Show the associativity rule for simple processes in $\mathcal{E}_0$.

There is an important consequence of rule (15) which should be isolated.

11.3 Theorem. (Truncation rule)
Let $H \in \mathcal{L}_0$ and $X \in \mathcal{S}$. Then for any stopping time $\tau$
$$\int_0^t 1_{(0,\tau]}\,H\,dX = \int_0^{\tau\wedge t} H\,dX = \int_0^t H\,dX^\tau$$
Note that the second expression means $(H\cdot X)_{\tau\wedge t}$. Let us prove the truncation rule step by step.

11.4 Problem. Let $X \in \mathcal{S}$, $H \in \mathcal{L}_0$ and let $\tau$ be a stopping time.
(a) Show that $1_{(0,\tau]}\cdot X = X^\tau$.
Hint: Apply the definition of the stochastic integral.
(b) Show that $\int_0^T 1_{(0,\tau]}\,H\,dX = \int_0^{\tau\wedge T} H\,dX$.
Hint: Apply 11.1 and part (a) with $X$ replaced by $H\cdot X$.
(c) Show that $\int_0^T 1_{(0,\tau]}\,H\,dX = \int_0^T H\,dX^\tau$.
Hint: Apply 11.1 and part (a) with $X$ replaced by $1_{(0,\tau]}\cdot X$.
The next exercise is a typical example of a non-trivial application of the truncation lemma.

11.5 Problem. Let $X$ and $Y$ be continuous semimartingales and $\tau$ a stopping time. Show that
$$\int_0^t X^\tau\,dY = \int_0^{\tau\wedge t} X\,dY + X_{\tau\wedge t}\,(Y_t - Y_{\tau\wedge t})$$
Hint: Split $1_{(0,t]} = 1_{(0,\tau\wedge t]} + 1_{(\tau\wedge t,\,t]}$. Show that for $s \in (0,\tau\wedge t]$ we have $X^\tau_s = X_s$. Show that for $s \in (\tau\wedge t,\,t]$ we have $X^\tau_s = X^\tau_t$.
11.2 Quadratic variation and the integration-by-parts formula

Recall the deterministic integration-by-parts formula for cadlag BV-functions:
$$f(t)g(t) - f(0)g(0) = \int_0^t f_-\,dg + \int_0^t g_-\,df + \sum_{0<s\le t} \Delta f(s)\,\Delta g(s)$$
There is a similar formula for arbitrary semimartingales. Note that a non-continuous semimartingale $(X_t)$ can only be used as an integrand of our stochastic integral if it is replaced by its left-continuous version $X_- := (X_{t-})_{t\ge 0}$.

11.6 Definition. Let $X$ and $Y$ be semimartingales. Define
$$[X,Y]_t := X_t Y_t - X_0 Y_0 - \int_0^t X_-\,dY - \int_0^t Y_-\,dX, \quad t \ge 0.$$
This process is called the quadratic covariation of $X$ and $Y$.
It is clear that $[X,Y]$ is well-defined and is a cadlag adapted process. If we knew that $[X,Y]$ is even a semimartingale, then by the associativity rule we could write
$$\int_0^t H\,d(XY) = \int_0^t HX_-\,dY + \int_0^t HY_-\,dX + \int_0^t H\,d[X,Y], \quad H \in \mathcal{L}_0,$$
or in short
$$d(XY) = X_-\,dY + Y_-\,dX + d[X,Y]$$
However, this only makes sense if $[X,Y]$ is actually a semimartingale. So let us have a closer look at $[X,Y]$.

11.7 Problem. Show that $[X,Y]$ is linear in both arguments.

11.8 Problem. Show that $\Delta[X,X]_t = (\Delta X_t)^2$.
11.9 Theorem. Let $X$ and $Y$ be semimartingales. For every Riemannian sequence of subdivisions of $[0,t]$
$$\sum_{j=1}^{n} (X_{t_j} - X_{t_{j-1}})(Y_{t_j} - Y_{t_{j-1}}) \xrightarrow{P} [X,Y]_t, \quad t \ge 0.$$
Proof: Note that for $s < t$
$$X_t Y_t - X_s Y_s = (X_t - X_s)(Y_t - Y_s) + X_s(Y_t - Y_s) + Y_s(X_t - X_s)$$
and apply it to
$$X_t Y_t = X_0 Y_0 + \sum_{j=1}^{n} (X_{t_j} Y_{t_j} - X_{t_{j-1}} Y_{t_{j-1}}) \qquad \Box$$

11.10 Problem. Fill in the details of the proof of 11.9.
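Theorem 11.9 can be illustrated numerically. The sketch below (illustrative seed and grid) computes the covariation sums for a Wiener process paired with itself, where $[W,W]_t = t$, and paired with an independent Wiener process, where the covariation vanishes.

```python
import numpy as np

# Covariation sums: sum (X_{t_j}-X_{t_{j-1}})(Y_{t_j}-Y_{t_{j-1}}) on a fine grid.
rng = np.random.default_rng(2)
n, t = 200_000, 1.0
dW = rng.normal(0.0, np.sqrt(t / n), size=n)
dV = rng.normal(0.0, np.sqrt(t / n), size=n)   # increments independent of dW
qv = np.sum(dW * dW)    # approximates [W, W]_1 = 1
cov = np.sum(dW * dV)   # approximates [W, V]_1 = 0 (independent processes)
print(qv, cov)
```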
From 11.9 it follows that $[X,X] =: [X]$ is the quadratic variation of $X$. This is an increasing process, hence an FV-process and a semimartingale. Moreover, since
$$[X,Y] = \frac{1}{4}\big([X+Y] - [X-Y]\big)$$
the quadratic covariation is also an FV-process and a semimartingale.

11.11 Problem. If $X$ and $Y$ are semimartingales then $XY$ is a semimartingale, too.

11.12 Problem. Let $X$ be an FV-process. Show that $[X]_t = \sum_{0<s\le t} (\Delta X_s)^2$.

11.13 Problem. Let $X$ be a continuous FV-process and $Y, Z$ any semimartingales. Show that:
(a) $[X] = 0$,
(b) $[X,Y] = 0$,
(c) $[X+Y, Z] = [Y,Z]$.
Hint: For proving (b) apply the Cauchy-Schwarz inequality.
Next we ask for the quadratic variation process of a stochastic integral. The basic result on this topic is prepared by a preliminary assertion. The intuitive meaning of this formula is clear: if the process $X$ is stopped at $\tau$ then it is constant after $\tau$, and therefore after $\tau$ there is no further contribution to the quadratic variation.

11.14 Problem. Let $X$ and $Y$ be semimartingales and $\tau$ a stopping time. Show that $[X^\tau, Y] = [X,Y]^\tau$.
Hint: This can be shown by combining the truncation lemma with the definition of covariation. Apply 11.5.

11.15 Theorem. Let $X$ and $Y$ be semimartingales and $H \in \mathcal{L}_0$. Then
$$[H\cdot X, Y] = H\cdot[X,Y]$$
Proof: For details see Protter, [20], chapter II, Theorem 29. The idea is as follows. From 11.14 we get for stopping times $\sigma \le \tau$
$$[X^\tau - X^\sigma, Y] = [X,Y]^\tau - [X,Y]^\sigma$$
In other words, the assertion is true for $H = 1_{(\sigma,\tau]}$. Let $Z$ be a random variable which is $\mathcal{F}_\sigma$-measurable. Then using the explicit formula of 11.9 one can see that the assertion is even true for $H = Z1_{(\sigma,\tau]}$, and thus for all $H \in \mathcal{E}$.

In order to apply the common induction argument for passing to $\mathcal{L}_0$ we need some information on the behaviour of $[X,Y]$ under convergence of semimartingales. This is obtained from 11.6 and 10.17. $\Box$
The most important consequence of 11.15 is the quadratic variation of a stochastic integral:
$$[H\cdot X]_t = \int_0^t H^2\,d[X]$$
This means that the quadratic variation of a stochastic integral $\int_0^t H\,dX$ can be written as a Stieltjes integral with respect to the quadratic variation of $X$.

11.16 Problem. Let $X \in \mathcal{S}$. Use integration by parts for finding a formula for $dX^3$.

11.17 Problem. Let $X \in \mathcal{S}$ be a continuous semimartingale. Show by induction that for $k \ge 2$
$$dX^k = kX^{k-1}\,dX + \frac{k(k-1)}{2}X^{k-2}\,d[X]$$

11.18 Problem. Extend the preceding problem to arbitrary semimartingales.

11.19 Problem. Let $(W_t)$ be a Wiener process and $H \in \mathcal{L}_0$. Calculate $[H\cdot W]$.

11.20 Problem. Let $(W_t)$ be a Wiener process and $H \in \mathcal{L}_0$. Show that $H\cdot W$ is an FV-process iff $H = 0$.
11.21 Definition. Let $(W_t)_{t\ge 0}$ be a Wiener process and let $a$ and $b$ be processes in $\mathcal{L}_0$. Then a process of the form
$$X_t = x_0 + \int_0^t a_s\,ds + \int_0^t b_s\,dW_s$$
is called an Ito-process.
Ito-processes are an important class of processes for many kinds of applications.

11.22 Problem. Let
$$X_t = x_0 + \int_0^t a_s\,ds + \int_0^t b_s\,dW_s$$
be an Ito-process.
(a) Explain why $X$ is a continuous semimartingale.
(b) Show that $X$ determines the processes $a$ and $b$ uniquely.
(c) Calculate the quadratic variation of $X$.
(d) Evaluate $\int_0^t H\,dX$ for $H \in \mathcal{L}_0$.
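For part (c), the expected answer is $[X]_t = \int_0^t b_s^2\,ds$ (the drift part has finite variation and contributes nothing). A numerical sketch with the illustrative non-random choices $a_s = 1$, $b_s = s$, so that $[X]_1 = \int_0^1 s^2\,ds = 1/3$:

```python
import numpy as np

# Quadratic variation of an Ito-process via increment sums:
# dX = a dt + b dW with a = 1, b_s = s; then sum (dX)^2 -> int_0^1 s^2 ds.
rng = np.random.default_rng(3)
n, t = 200_000, 1.0
s = np.linspace(0.0, t, n + 1)[:-1]                       # left endpoints
dX = 1.0 * (t / n) + s * rng.normal(0.0, np.sqrt(t / n), size=n)
qv_X = np.sum(dX**2)
print(qv_X)   # close to 1/3; the drift contribution is of order 1/n
```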
11.3 Ito's formula

Now we turn to the most important and most powerful rule of stochastic analysis. It is the extension of the transformation formula for Stieltjes integrals to stochastic integration.
11.23 Theorem. (Ito's formula)
Let $X \in \mathcal{S}$ and let $\phi: \mathbb{R} \to \mathbb{R}$ be twice differentiable with continuous derivatives.
(1) If $X$ is a continuous semimartingale then
$$\phi(X_t) = \phi(X_0) + \int_0^t \phi'(X_s)\,dX_s + \frac{1}{2}\int_0^t \phi''(X_s)\,d[X]_s$$
(2) If $X$ is any semimartingale then
$$\phi(X_t) = \phi(X_0) + \int_0^t \phi'(X_{s-})\,dX_s + \frac{1}{2}\int_0^t \phi''(X_{s-})\,d[X]^c_s + \sum_{0<s\le t} \big(\phi(X_s) - \phi(X_{s-}) - \phi'(X_{s-})\,\Delta X_s\big)$$
(here $[X]^c$ denotes the continuous part of the quadratic variation).
Note that for an FV-process $(X_t)$ Ito's formula equals the transformation formula for Stieltjes integrals.

Our main applications will be concerned with continuous semimartingales. In this case Ito's formula implies for $H \in \mathcal{L}_0$
$$\int_0^t H\,d(\phi\circ X) = \int_0^t H\,\phi'(X)\,dX + \frac{1}{2}\int_0^t H\,\phi''(X)\,d[X]$$
Thus, in differential notation Ito's formula for continuous semimartingales can be written as
$$d(\phi\circ X) = \phi'(X)\,dX + \frac{1}{2}\phi''(X)\,d[X]$$
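The differential rule can be checked on a simulated path. Taking $\phi(x) = x^3$ and $X = W$ (so that $d[W]_s = ds$), Ito's formula gives $W_t^3 = 3\int_0^t W_s^2\,dW_s + 3\int_0^t W_s\,ds$; the sketch below (illustrative seed) compares both sides using left-endpoint sums.

```python
import numpy as np

# Ito's formula for phi(x) = x^3 on a Wiener path:
# W_t^3 = 3 int W^2 dW + 3 int W ds (since phi'' / 2 * d[W] = 3 W ds).
rng = np.random.default_rng(4)
n, t = 200_000, 1.0
dW = rng.normal(0.0, np.sqrt(t / n), size=n)
W = np.concatenate(([0.0], np.cumsum(dW)))
rhs = 3 * np.sum(W[:-1] ** 2 * dW) + 3 * np.sum(W[:-1]) * (t / n)
ito_error = abs(W[-1] ** 3 - rhs)
print(ito_error)   # small discretization error
```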
Proof: Let us indicate the proof for continuous semimartingales. With integration by parts it can be shown that for $k \ge 2$
$$dX^k = kX^{k-1}\,dX + \frac{k(k-1)}{2}X^{k-2}\,d[X]$$
(use an induction argument). This formula is identical to Ito's formula for $\phi(x) = x^k$. Thus, Ito's formula is true for powers and hence also for polynomials. Since smooth functions can be approximated uniformly by polynomials in such a way that also the corresponding derivatives are approximated, the assertion follows. $\Box$

11.24 Problem. Prove part (2) of Ito's formula for $\phi(x) = x^k$, $k \ge 2$.
11.25 Problem. Let $(W_t)_{t\ge 0}$ be a Wiener process. Calculate $dW_t^k$, $k \in \mathbb{N}$.

11.26 Problem. Let $(W_t)_{t\ge 0}$ be a Wiener process. Calculate $de^{\lambda W_t}$, $\lambda \in \mathbb{R}$.

11.27 Problem. Let $X$ be an Ito-process. Calculate $dX_t^k$, $k \in \mathbb{N}$.

11.28 Problem. Let $X$ be an Ito-process. Calculate $de^{\lambda X_t}$, $\lambda \in \mathbb{R}$.
11.29 Definition. Let $X \in \mathcal{S}$ be continuous. Then
$$\mathcal{E}(X) = e^{X - [X]/2}$$
is called the stochastic exponential of $X$.

11.30 Problem. Let $X \in \mathcal{S}$ be continuous and $Y := \mathcal{E}(X)$. Show that
$$Y_t = Y_0 + \int_0^t Y_s\,dX_s, \quad\text{in short: } dY = Y\,dX$$
Hint: Let $Z = X - [X]/2$ and expand $e^Z$ by Ito's formula.
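The identity of Problem 11.30 can be observed numerically. With the illustrative choice $X = \sigma W$ (so $[X]_t = \sigma^2 t$) the stochastic exponential is $Y_t = e^{\sigma W_t - \sigma^2 t/2}$, and $Y_t$ should agree with $Y_0 + \sum Y_{t_{j-1}}(X_{t_j} - X_{t_{j-1}})$ up to discretization error:

```python
import numpy as np

# dY = Y dX for the stochastic exponential of X = sigma * W.
rng = np.random.default_rng(5)
n, t, sigma = 200_000, 1.0, 0.4
dW = rng.normal(0.0, np.sqrt(t / n), size=n)
W = np.concatenate(([0.0], np.cumsum(dW)))
grid = np.linspace(0.0, t, n + 1)
Y = np.exp(sigma * W - 0.5 * sigma**2 * grid)   # stochastic exponential path
integral = 1.0 + np.sum(Y[:-1] * sigma * dW)    # Y_0 + left-endpoint sums of Y dX
exp_error = abs(Y[-1] - integral)
print(exp_error)   # small discretization error
```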
There is a subtle point to discuss. Consider some positive continuous semimartingale $X$ and a function like $\phi(x) = \log(x)$ or $\phi(x) = 1/x$. Then we may consider $\phi(X)$ since it is well-defined and real-valued. But Ito's formula cannot be applied in the version we have proved. The reason for this difficulty is that the range of $X$ need not be contained in a compact interval where $\phi$ can be approximated uniformly by polynomials.

11.31 Problem. Let $X$ be a positive continuous semimartingale.
(a) Show that Ito's formula holds for $\phi(x) = \log(x)$ and for $\phi(x) = 1/x$.
Hint: Let $\tau_n = \min\{t \ge 0 : X_t \le 1/n\}$. Apply Ito's formula to $X^{\tau_n}$ and let $n \to \infty$.
(b) Show that $\phi(X)$ is a semimartingale.

11.32 Problem. Let $X$ be a continuous positive semimartingale. Find $\int_0^t 1/X_s^k\,dX_s$, $k \in \mathbb{N}$.
Hint: Find $dX^{1-k}$.

11.33 Problem. Show that every positive continuous semimartingale $X$ can be written as a stochastic exponential $\mathcal{E}(L)$.
Hint: Expand $\log X_t$ by Ito's formula.
11.34 Theorem. (Ito's formula, 2-dimensional case)
Let $X, Y \in \mathcal{S}$ be continuous and let $\phi: \mathbb{R}^2 \to \mathbb{R}$ be twice differentiable with continuous derivatives. Then
$$\begin{aligned} \phi(X_t, Y_t) = {}& \phi(X_0, Y_0) + \int_0^t \phi_1(X_s,Y_s)\,dX_s + \int_0^t \phi_2(X_s,Y_s)\,dY_s \\ & + \frac{1}{2}\int_0^t \phi_{11}(X_s,Y_s)\,d[X]_s + \int_0^t \phi_{12}(X_s,Y_s)\,d[X,Y]_s + \frac{1}{2}\int_0^t \phi_{22}(X_s,Y_s)\,d[Y]_s \end{aligned}$$
Proof: The assertion is true for polynomials (use integration by parts). Since smooth functions can be approximated uniformly by polynomials in such a way that also the corresponding derivatives are approximated, the assertion follows. $\Box$

11.35 Problem. State and explain Ito's formula for $\phi(x,t)$.
Hint: Apply 11.34 to $Y_t = t$.
Chapter 12

Applications to financial markets

12.1 Financial markets

A mathematical model of a financial market is a collection of semimartingales. As we shall see, the restriction to stochastic processes which are semimartingales is necessary in order to describe the process of trading in terms of stochastic integrals.

In the following we consider market models with a finite time horizon $T < \infty$.

The most famous example of a financial market is the Black-Scholes model. The Black-Scholes model consists of two assets $(B_t, S_t)_{t\in[0,T]}$. The first asset $(B_t)$ is a bank account with a fixed interest rate $r$, i.e.
$$B_t = B_0 e^{rt}, \qquad dB_t = rB_t\,dt \qquad (16)$$
Denoting the return process by $R_t := rt$, the bank account follows the differential equation
$$dB_t = B_t\,dR_t$$
Stochastic models for financial assets are often based on a stochastic model for the return process $(R_t)$. In the Black-Scholes model the second asset is also defined in terms of the return process.

Assume that $R_t = \mu t + \sigma W_t$ where $(W_t)$ is a Wiener process. If this (generalized Wiener process) is a model of the return of the asset $(S_t)$ then it follows that
$$dS_t = S_t\,dR_t = \mu S_t\,dt + \sigma S_t\,dW_t \qquad (17)$$
This is a stochastic differential equation. The number $\sigma > 0$ is called the volatility of the asset.

12.1 Problem. Show that $S_t = S_0\,e^{(\mu-\sigma^2/2)t + \sigma W_t}$ is a solution of (17).
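The closed form of Problem 12.1 can be compared with a direct discretization of (17). The sketch below (illustrative parameter values and seed) runs an Euler scheme for $dS = \mu S\,dt + \sigma S\,dW$ on one path and checks that it is close to $S_0\,e^{(\mu-\sigma^2/2)t + \sigma W_t}$:

```python
import numpy as np

# Euler scheme for geometric Brownian motion versus the closed-form solution.
rng = np.random.default_rng(6)
n, t, mu, sigma, S0 = 100_000, 1.0, 0.05, 0.2, 100.0
dW = rng.normal(0.0, np.sqrt(t / n), size=n)
S_euler = S0 * np.prod(1.0 + mu * (t / n) + sigma * dW)        # Euler steps
S_exact = S0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * np.sum(dW))
rel_error = abs(S_euler - S_exact) / S_exact
print(rel_error)   # small relative error on a fine grid
```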
12.2 Definition. A Black-Scholes model is a market model which is generated by two assets $(B_t, S_t)$ following equations (16) and (17).
An obvious generalization of the Black-Scholes model is a diffusion model. In such a model the return process of the asset $(S_t)$ follows a diffusion equation
$$dR_t = \mu(t, R_t)\,dt + \sigma(t, R_t)\,dW_t$$
This is a stochastic differential equation. In the next chapter we will consider and solve simple cases of such stochastic differential equations.
12.2 Trading strategies

Let $S = (S^0, S^1)$ be a financial market. A trading strategy is a process which determines for every time point $t$ how many units of each asset are held at time $t$.

To begin with, let us study simple trading strategies, i.e. trading strategies where trading is performed only at finitely many stopping times. Between subsequent trading times the trading strategies are constant:
$$H^k_t = \sum_{j=1}^{n} a^k_{j-1}\,1_{(\tau_{j-1},\tau_j]}(t), \quad k = 0, 1,$$
where $0 = \tau_0 \le \tau_1 \le \ldots \le \tau_n = T$ are stopping times and $a^k_{j-1}$ is $\mathcal{F}_{\tau_{j-1}}$-measurable.

Thus, the processes $H^k_t$ are adapted and left-continuous processes. Left-continuity is essential because trading strategies must be predictable.
The market value of the portfolio $(H^k_t)_k$ at time $t$ is given by
$$V_t = H^0_t S^0_t + H^1_t S^1_t$$
which means
$$V_{\tau_j} = a^0_{j-1} S^0_{\tau_j} + a^1_{j-1} S^1_{\tau_j} \quad\text{and}\quad V_{\tau_j+0} = a^0_j S^0_{\tau_j} + a^1_j S^1_{\tau_j}$$
for every trading time $\tau_j$. The process $(V_t)$ is the wealth process corresponding to the trading strategy.
A fundamental concept of mathematical finance is the notion of a self-financing trading strategy. The trading strategy is called self-financing if the changes in the portfolio at time $\tau_j$ are financed by nothing else than the value of the portfolio. In order to understand the implications of such a condition let us have a look at the change of the market value during a particular time interval $(\tau_j, \tau_{j+1}]$. We have
$$V_{\tau_{j+1}} - V_{\tau_j} = (V_{\tau_j+0} - V_{\tau_j}) + (V_{\tau_{j+1}} - V_{\tau_j+0}) \qquad (18)$$
$$= \sum_{k=0}^{1} (a^k_j - a^k_{j-1})\,S^k_{\tau_j} + \sum_{k=0}^{1} a^k_j\,(S^k_{\tau_{j+1}} - S^k_{\tau_j}) \qquad (19)$$
This decomposition describes very clearly how the value change comes about: first by trading at time $\tau_j$, and then by the change of the asset prices during the time interval $(\tau_j, \tau_{j+1}]$. For a self-financing trading strategy the trading process must not influence the value of the portfolio, which implies that the first term in the sum is zero and we have
$$V_{\tau_{j+1}} - V_{\tau_j} = a^0_j\,(S^0_{\tau_{j+1}} - S^0_{\tau_j}) + a^1_j\,(S^1_{\tau_{j+1}} - S^1_{\tau_j}) \qquad (20)$$
Thus we obtain the following fundamental result.

12.3 Theorem. A simple trading strategy $(H^0_t, H^1_t)$ is self-financing iff
$$V_t = V_0 + \sum_j a^0_{j-1}\,(S^0_{\tau_j\wedge t} - S^0_{\tau_{j-1}\wedge t}) + \sum_j a^1_{j-1}\,(S^1_{\tau_j\wedge t} - S^1_{\tau_{j-1}\wedge t}) \qquad (21)$$
$$= V_0 + \int_0^t H^0\,dS^0 + \int_0^t H^1\,dS^1 \qquad (22)$$
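The self-financing identity of Theorem 12.3 can be worked through in a two-period toy example (all prices and holdings below are hypothetical numbers chosen for illustration). A rebalancing is self-financed when the post-trade portfolio has the same value as the pre-trade one; then the terminal value equals $V_0$ plus the trading gains (21).

```python
# Two trading times: hold (h0, h1) on the first period, rebalance at time 1.
B = {0: 1.00, 1: 1.01, 2: 1.02}       # bank account at times 0, 1, 2
S = {0: 100.0, 1: 104.0, 2: 99.0}     # stock price at times 0, 1, 2

h0, h1 = 50.0, 0.5                    # initial holdings (bank units, shares)
V1 = h0 * B[1] + h1 * S[1]            # pre-trade portfolio value at time 1
g1 = 0.3                              # new stock position chosen at time 1
g0 = (V1 - g1 * S[1]) / B[1]          # bank position financed by the portfolio itself

V0 = h0 * B[0] + h1 * S[0]
V2 = g0 * B[2] + g1 * S[2]            # terminal portfolio value
gains = (h0 * (B[1] - B[0]) + h1 * (S[1] - S[0])
         + g0 * (B[2] - B[1]) + g1 * (S[2] - S[1]))
print(abs(V2 - (V0 + gains)))         # zero up to rounding: formula (21)/(22)
```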
Now we are ready to turn to general trading strategies.

12.4 Definition. A trading strategy $(H^0_t, H^1_t)$ (consisting of left-continuous adapted processes) is self-financing if the value process $V_t := H^0_t S^0_t + H^1_t S^1_t$ satisfies
$$V_t = V_0 + \int_0^t H^0\,dS^0 + \int_0^t H^1\,dS^1$$
or in differential notation $dV = H^0\,dS^0 + H^1\,dS^1$.
The property of being self-financing is a very strong property which narrows the set of available wealth processes considerably. Let us illustrate this fact for continuous models.

Assume that the market model and the trading strategy are continuous. Then from the integration by parts formula we obtain
$$d(H^0 S^0) + d(H^1 S^1) = H^0\,dS^0 + H^1\,dS^1 + S^0\,dH^0 + S^1\,dH^1 + d[S^0, H^0] + d[S^1, H^1]$$
If the trading strategy is self-financing then the sum of the last four terms vanishes. This shows that the components $H^0_t$ and $H^1_t$ must be adjusted with respect to each other in a very special way in order to obtain a self-financing trading strategy.
12.3 The Black-Scholes equation

An important question is how to characterize the value processes of self-financing trading strategies. This can sometimes be done by partial differential equations. The most famous partial differential equation of this type is the Black-Scholes equation.

Let $(B_t, S_t)$ be a Black-Scholes market model. A trading strategy is Markovian if the value process $(V_t)$ is of the form
$$V_t = f(t, B_t, S_t).$$
This is the case e.g. if
$$H^0_t = \alpha(t, S_t), \quad H^1_t = \beta(t, S_t),$$
where $\alpha$ and $\beta$ are continuous functions.

Since we have $B_t = B_0 e^{rt}$ we may write
$$V_t = g(t, S_t)$$
If we assume that $g$ is a sufficiently differentiable function then the Black-Scholes equation gives a criterion for the self-financing property.
12.5 Theorem. (Black-Scholes equation)
Assume that $g(t,x)$ is a smooth function. Then $V_t = g(t, S_t)$ is the market value of a self-financing portfolio in a Black-Scholes model iff
$$rg(t,x) = g_t(t,x) + rg_x(t,x)\,x + \frac{1}{2}g_{xx}(t,x)\,\sigma^2 x^2, \quad t \in [0,T],\ x \in [0,\infty). \qquad (23)$$
Proof: Note that by Ito's formula we have
$$\begin{aligned} dV_t &= g_t(t,S_t)\,dt + g_x(t,S_t)\,dS_t + \frac{1}{2}g_{xx}(t,S_t)\,d[S]_t \\ &= \Big(g_t(t,S_t) + g_x(t,S_t)\,\mu S_t + \frac{1}{2}g_{xx}(t,S_t)\,\sigma^2 S_t^2\Big)\,dt + g_x(t,S_t)\,\sigma S_t\,dW_t \end{aligned} \qquad (24)$$
Assume first that the portfolio is self-financing, i.e.
$$V_t = \alpha(t,S_t)\,B_t + \beta(t,S_t)\,S_t \quad\text{and}\quad dV_t = \alpha(t,S_t)\,dB_t + \beta(t,S_t)\,dS_t$$
This implies that
$$dV_t = \big(\alpha(t,S_t)\,rB_t + \beta(t,S_t)\,\mu S_t\big)\,dt + \beta(t,S_t)\,\sigma S_t\,dW_t \qquad (25)$$
Since the components of an Itô expansion are uniquely determined it follows that
ψ(t, S_t) = g_x(t, S_t) (26)
From the definition of the portfolio it follows that
φ(t, S_t) = (V_t − g_x(t, S_t)S_t)/B_t (27)
Plugging these expressions into (25) and comparing the bounded variation terms of (25)
and (24) gives equation (23) for all pairs (t, x) which are in the range of (t, S_t).
Now assume conversely that equation (23) is true. Defining the trading strategy by
(26) and (27) gives a self-financing portfolio such that V_t = g(t, S_t). □
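Equation (23) can be checked numerically on a concrete candidate for g. A standard example (not derived in these notes, so treat it as an outside input) is the Black-Scholes price of a European call, g(t, x) = xΦ(d_1) − Ke^{−r(T−t)}Φ(d_2). The sketch below verifies by central finite differences that it satisfies (23); all parameter values are arbitrary choices.

```python
from math import exp, log, sqrt, erf

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def call_price(t, x, K=100.0, T=1.0, r=0.05, sigma=0.2):
    """Black-Scholes value g(t, x) of a European call (strike K, horizon T)."""
    tau = T - t
    d1 = (log(x / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return x * Phi(d1) - K * exp(-r * tau) * Phi(d2)

def bs_residual(t, x, r=0.05, sigma=0.2, h=1e-3):
    """r*g - g_t - r*x*g_x - (1/2)*sigma^2*x^2*g_xx via central differences;
    this is close to zero when g satisfies equation (23)."""
    g = call_price
    g_t = (g(t + h, x) - g(t - h, x)) / (2 * h)
    g_x = (g(t, x + h) - g(t, x - h)) / (2 * h)
    g_xx = (g(t, x + h) - 2 * g(t, x) + g(t, x - h)) / h ** 2
    return r * g(t, x) - g_t - r * x * g_x - 0.5 * sigma ** 2 * x ** 2 * g_xx
```

For instance, bs_residual(0.5, 100.0) is of the order of the discretization error rather than of the size of the individual terms of (23).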
12.6 Problem. Let V_t = g(t, S_t) be a positive self-financing value process in a
Black-Scholes model. Define the logarithmic drift μ^V and volatility σ^V of (V_t) by
dV_t = μ^V_t V_t dt + σ^V_t V_t dW_t
Show that
(μ^V_t − r)/σ^V_t = (μ − r)/σ
(the market price of risk is constant).
Hint: Expand dV_t by Itô's formula, apply the Black-Scholes equation and compare
both Itô representations of (V_t).
A remarkable consequence of the proof of the Black-Scholes equation is the following assertion.
12.7 Corollary. Let (B_t, S_t) be a Black-Scholes market model. If (φ, ψ) is a self-financing
trading strategy with value process V_t = g(t, S_t) then ψ = g_x.
There is a popular misunderstanding of the preceding assertion which even found
its way into famous textbooks.
12.8 Discussion. In several textbooks one can find the following argument for proving
the Black-Scholes equation.
Let (B_t, S_t) be a Black-Scholes model. Let V_t = g(t, S_t) be a tradable value
process (i.e. defined by a self-financing trading strategy). Consider the value process
Π_t := V_t − g_x(t, S_t)S_t
of a portfolio consisting of one unit of V_t and −g_x(t, S_t) units of S_t.
At this point authors apply the following argument: This portfolio is self-financing.
As we shall see later, this argument is not true in general. But let us have a look at how
the argumentation proceeds.
Assume that the portfolio is actually self-financing. Then it follows that
dΠ_t = dV_t − g_x(t, S_t) dS_t
which by Itô's formula implies that the portfolio is of bounded variation. By some
no-arbitrage argument (which is actually correct but needs to be proved for the Black-Scholes
model) each tradable value process of bounded variation must be proportional
to (B_t). In other words, (Π_t) has constant rate of return r, i.e.
dΠ_t = rΠ_t dt
Now, plugging in the previous expressions for Π_t and dΠ_t and expanding dV_t according
to Itô's formula gives the Black-Scholes equation.
Unfortunately, the portfolio (Π_t) need not be self-financing. This can easily be
seen as follows.
From the proof of Theorem 12.5 it follows that
dV_t − g_x(t, S_t) dS_t = r(g(t, S_t) − g_x(t, S_t)S_t) dt,
V_t − g_x(t, S_t)S_t = g(t, S_t) − g_x(t, S_t)S_t.
In other words, we have
dV_t − g_x(t, S_t) dS_t = (g(t, S_t) − g_x(t, S_t)S_t)e^{−rt} dB_t,
V_t − g_x(t, S_t)S_t = (g(t, S_t) − g_x(t, S_t)S_t)e^{−rt} B_t.
Assuming that the portfolio (Π_t) is self-financing we obtain as a necessary consequence
that
(g(t, S_t) − g_x(t, S_t)S_t)e^{−rt}
is constant! It is easy to see that this is only possible if V_t = aS_t + bB_t, which is a
portfolio without any trading.
12.9 Problem. Show: If g(t, S_t) = g_x(t, S_t)S_t + bB_t is a self-financing portfolio in
a Black-Scholes model then g(t, S_t) = aS_t + bB_t.
Hint: Solving the differential equation for every fixed t it follows that g(t, S_t) =
a(t)S_t + bB_t. Then the self-financing property implies a(t) ≡ a.
12.10 Problem. Let (V_t) be a tradable value process in a Black-Scholes model. Find
a self-financing trading strategy (φ, ψ) such that φV + ψS = B.
12.4 The general case
Let ℳ = (X^1, X^2, . . . , X^n) be a financial market model consisting of Itô processes
dX^i_t = μ_{i,t} dt + σ_{i,t} dW_t.
This is a so-called one-factor model since only one Wiener process is responsible for
random fluctuations. We assume that the processes (σ_{i,t}) are positive.
Let V be a wealth process generated by some trading strategy. The wealth process
is called Markovian if there exists a function f(x, t), x ∈ R^n, t ≥ 0, such that V_t =
f(X_t, t) where X_t = (X^1_t, X^2_t, . . . , X^n_t).
There are two questions which arise in such a situation:
(1) Which wealth processes can be obtained by self-financing strategies?
(2) How to obtain the trading strategies from the wealth process?
We will show that if the function f(x, t) is smooth then partial differential equations
are necessary and sufficient for the self-financing property. In special cases these
partial differential equations are called Black-Scholes equations.
To begin with we note that the self-financing property implies the existence of a
trading strategy (φ^1, φ^2, . . . , φ^n) such that
f(X_t, t) = f(X_0, 0) + Σ_{i=1}^n ∫_0^t φ^i_s dX^i_s = Σ_{i=1}^n φ^i_t X^i_t
On the other hand the Itô formula gives
f(X_t, t) = f(X_0, 0) + Σ_{i=1}^n ∫_0^t f_{x_i}(X_s, s) dX^i_s + ∫_0^t f_t(X_s, s) ds + (1/2)Σ_{i,j} ∫_0^t f_{x_i x_j}(X_s, s)σ_{i,s}σ_{j,s} ds
In order to guarantee the self-financing property it is sufficient to require that both
the dX_t-parts and the dt-parts coincide. The equality of the dX_t-parts is true if
φ^i_t = f_{x_i}(X_t, t) and this gives the first partial differential equation:
f(X_t, t) = Σ_{i=1}^n f_{x_i}(X_t, t)X^i_t
Comparing the dt-parts gives the second partial differential equation:
f_t(X_t, t) + (1/2)Σ_{i,j} f_{x_i x_j}(X_t, t)σ_{i,t}σ_{j,t} = 0
These equations are sufficient conditions for the self-financing property of the wealth
process. Moreover, from the first equation we obtain the explicit form of the trading
strategy.
12.5 Change of numeraire
In financial calculations it is often convenient to change the unit of money. The
simplest example is discounting by a fixed interest rate. Let us start by considering a
Black-Scholes model.
Let (B_t, S_t)_{t≤T} be a Black-Scholes market model with B_t = e^{rt} and
dS_t = μS_t dt + σS_t dW_t
where (W_t) denotes a Wiener process. The discounted market model is (B̄_t, S̄_t)
where
B̄_t = B_t e^{−rt} = 1, S̄_t = S_t e^{−rt}
12.11 Problem. Find the stochastic differential equation for (S̄_t).
Let (V_t) be the value process of some self-financing trading strategy, i.e. assume
that
V_t = H^0_t B_t + H^1_t S_t, dV_t = H^0_t dB_t + H^1_t dS_t
For the discounted value process V̄_t = V_t e^{−rt} we have
V̄_t = H^0_t B̄_t + H^1_t S̄_t, dV̄_t = H^0_t dB̄_t + H^1_t dS̄_t
Here, the first equation is clear, but the second is only intuitively clear. The intuitive
reason is that changing the unit of money must not disturb the self-financing property
of a trading strategy. A few lines below we shall prove this assertion rigorously.
But let us draw conclusions from those equations. Since B̄_t = 1 we have dB̄_t = 0,
which implies
dV̄_t = H^1_t dS̄_t
or in other words
V̄_t = V̄_0 + ∫_0^t H^1_s dS̄_s
This means that in a discounted Black-Scholes market every stochastic integral with
respect to the asset is a tradable value process.
12.12 Problem. Find (H^0_t) such that for a given process (H^1_t) the pair (H^0_t, H^1_t) is a
self-financing trading strategy in the Black-Scholes model.
There are also important applications where the numeraire is not a bank account
with a fixed interest rate but some other stochastic process. It seems intuitively
clear that such a change of numeraire should have no influence on the trading strategy
and should not destroy the self-financing property. The following theorem shows that
this is actually true in full generality.
12.13 Theorem. Assume that the market model (X, Y) is continuous and let Z be a
continuous semimartingale. If V := H^X X + H^Y Y satisfies
dV = H^X dX + H^Y dY
then V Z = H^X XZ + H^Y Y Z satisfies
d(V Z) = H^X d(XZ) + H^Y d(Y Z)
Proof: The first equality is obvious. The second follows from
d(V Z) = Z dV + V dZ + d[Z, V]
= ZH^X dX + ZH^Y dY + H^X X dZ + H^Y Y dZ + d[Z, V]
= H^X(Z dX + X dZ) + H^Y(Z dY + Y dZ) + d[Z, V]
= H^X(d(XZ) − d[X, Z]) + H^Y(d(Y Z) − d[Y, Z]) + d[Z, V]
= H^X d(XZ) + H^Y d(Y Z) + d[Z, V] − H^X d[X, Z] − H^Y d[Y, Z]
Since dV = H^X dX + H^Y dY we have d[Z, V] = H^X d[Z, X] + H^Y d[Z, Y], so the
last three terms cancel. □
Let (X, Y) be a continuous market model. Assume now that X is a positive continuous
semimartingale. Then it can be used as a numeraire. If V_t = H^X_t X_t + H^Y_t Y_t
then we obtain
V_t/X_t = H^X_t + H^Y_t (Y_t/X_t)
and
V_t/X_t = V_0/X_0 + ∫_0^t H^Y d(Y/X)
This can be interpreted in the following way: In the normalized market (1, Y/X) the
wealth process of a self-financing strategy depends only on the number H^Y of assets of
the second kind.
An important consequence is the fact that for constructing a self-financing strategy
it is sufficient to choose H^Y. Then the wealth process in the normalized market is
fixed. The process H^X is then obtained from the first equation.
Chapter 13
Stochastic differential equations
13.1 Introduction
A (Wiener driven) stochastic differential equation is an equation of the form
dX_t = b(t, X_t) dt + σ(t, X_t) dW_t
where (W_t)_{t≥0} is a Wiener process and b(t, x) and σ(t, x) are given functions. The
problem is to find a process (X_t)_{t≥0} that satisfies the equation. Such a process is then
called a solution of the differential equation.
Note that the differential notation is only an abbreviation for the integral equation
X_t = x_0 + ∫_0^t b(s, X_s) ds + ∫_0^t σ(s, X_s) dW_s
There are three issues to be discussed for differential equations:
(1) Theoretical answers for existence and uniqueness of solutions.
(2) Finding analytical expressions for solutions.
(3) Calculating solutions by numerical methods.
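For point (3), the simplest scheme is the Euler-Maruyama method, which advances the integral equation over a small step Δt by X_{k+1} = X_k + b(t_k, X_k)Δt + σ(t_k, X_k)ΔW_k with ΔW_k ∼ N(0, Δt). A minimal sketch; the coefficient functions in the example are arbitrary illustrations:

```python
import numpy as np

def euler_maruyama(b, sigma, x0, T, n, rng):
    """One path of dX_t = b(t, X_t) dt + sigma(t, X_t) dW_t on [0, T] with n steps."""
    dt = T / n
    t = np.linspace(0.0, T, n + 1)
    x = np.empty(n + 1)
    x[0] = x0
    for k in range(n):
        dW = rng.normal(0.0, np.sqrt(dt))          # Wiener increment
        x[k + 1] = x[k] + b(t[k], x[k]) * dt + sigma(t[k], x[k]) * dW
    return t, x

# Example: the Black-Scholes equation dX_t = mu*X_t dt + sigma*X_t dW_t
rng = np.random.default_rng(0)
t, x = euler_maruyama(b=lambda t, x: 0.05 * x,
                      sigma=lambda t, x: 0.2 * x,
                      x0=1.0, T=1.0, n=1000, rng=rng)
```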
We will focus on analytical expressions for important but easy special cases. However,
let us indicate some issues which are important from the theoretical point of view.
For stochastic differential equations even the concept of a solution is a subtle question.
We have to distinguish between weak and strong solutions, and even between weak
and strong uniqueness. It is not within the scope of this text to give precise definitions
of these notions. But the idea can be described in an intuitive way.
A strong solution is a solution where the driving Wiener process (and the underlying
probability space) is fixed in advance and the solution (X_t)_{t≥0} is a function of this
given driving Wiener process. A weak solution is an answer to the question: Does
there exist a probability space where a process (X_t)_{t≥0} and a Wiener process (W_t)_{t≥0}
exist such that the differential equation holds?
When we derive analytical expressions for solutions we will derive strong solutions.
In particular, for linear differential equations (to be defined below) complete
formulas for strong solutions are available.
There is a general theory giving sufficient conditions for existence and uniqueness
of non-exploding strong solutions. Both the proofs and the assertions of this theory
are quite similar to the classical theory of ordinary differential equations. We refer to
Hunt-Kennedy [12] and Karatzas-Shreve [15].
Let us introduce some terminology.
A stochastic differential equation is time homogeneous if b(t, x) = b(x) and
σ(t, x) = σ(x).
A linear differential equation is of the form
dX_t = (a_0(t) + a_1(t)X_t) dt + (σ_0(t) + σ_1(t)X_t) dW_t
It is a homogeneous linear differential equation if a_0(t) = σ_0(t) = 0.
The simplest homogeneous case is
dX_t = μX_t dt + σX_t dW_t
which corresponds to the Black-Scholes model. The constant σ is called the volatility
of the model.
There are plenty of linear differential equations used in the theory of stochastic interest
rates. If (B_t) denotes a process that is a model for a bank account with stochastic
interest rate then
r_t := B′_t/B_t, i.e. B_t = B_0 e^{∫_0^t r_s ds},
is called the short rate. Popular short rate models are the Vasicek model
dr_t = a(b − r_t) dt + σ dW_t
and the Hull-White model
dr_t = (θ(t) − a(t)r_t) dt + σ(t) dW_t
13.2 The abstract linear equation
Let Y and Z be any continuous semimartingales. The abstract homogeneous linear
equation is
dX_t = X_t dY_t
and its solution is known to us as
X_t = x_0 e^{Y_t − [Y]_t/2} = x_0 ℰ(Y_t)
This is the recipe to solve any homogeneous linear stochastic differential equation.
There is nothing more to say about it at the moment.
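For instance, with Y_t = μt + σW_t (the Black-Scholes case) we have [Y]_t = σ^2 t, and the recipe yields X_t = x_0 e^{(μ − σ^2/2)t + σW_t}. A quick sanity check, with arbitrary parameter values, is to compare this closed form with an Euler discretization of dX_t = X_t dY_t driven by the same Wiener increments:

```python
import numpy as np

mu, sigma, x0, T, n = 0.05, 0.2, 1.0, 1.0, 100_000
dt = T / n
rng = np.random.default_rng(3)
dW = rng.normal(0.0, np.sqrt(dt), size=n)          # Wiener increments

# Euler scheme for dX_t = X_t (mu dt + sigma dW_t)
euler = x0 * np.cumprod(1.0 + mu * dt + sigma * dW)

# Closed-form solution x0 * exp((mu - sigma^2/2) t + sigma W_t)
t = dt * np.arange(1, n + 1)
W = np.cumsum(dW)
exact = x0 * np.exp((mu - 0.5 * sigma ** 2) * t + sigma * W)

err = np.max(np.abs(euler - exact))                # shrinks as the grid is refined
```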
13.1 Problem. Solve dX_t = a(t)X_t dt + σ(t)X_t dW_t.
Things become more interesting when we turn to the general inhomogeneous equation
dX_t = X_t dY_t + dZ_t
There is an explicit expression for the solution, but it is much more illuminating to
remember the approach by which it is obtained.
The idea is to write the equation as
dX_t − X_t dY_t = dZ_t
and to find an integrating factor that transforms the left-hand side into a total differential.
Let dA_t = A_t dY_t and multiply the equation by 1/A_t, giving
(1/A_t) dX_t − (X_t/A_t) dY_t = (1/A_t) dZ_t (28)
Note that
d(1/A_t) = −(1/A_t) dY_t + (1/A_t) d[Y]_t
Then
d((1/A_t)X_t) = (1/A_t) dX_t + X_t d(1/A_t) + d[1/A, X]_t
= (1/A_t) dX_t − (X_t/A_t) dY_t + (X_t/A_t) d[Y]_t − (1/A_t) d[Y, X]_t
= (1/A_t) dX_t − (X_t/A_t) dY_t − (1/A_t) d[Y, Z]_t
Thus, the left-hand side of (28) differs from a total differential by a known BV-function.
We obtain
d((1/A_t)X_t) = (1/A_t) dZ_t − (1/A_t) d[Y, Z]_t
leading to
X_t = A_t(x_0 − ∫_0^t (1/A_s) d[Y, Z]_s + ∫_0^t (1/A_s) dZ_s) (29)
Note that the solution is particularly simple if either Y or Z is a BV-process.
13.2 Problem. Fill in and explain all details of the derivation of (29).
13.3 Wiener driven models
The Vasicek model is
dX_t = α(β − X_t) dt + σ dW_t
For β = 0 the solution is called the Ornstein-Uhlenbeck process.
The Vasicek model is a special case of the inhomogeneous linear equation, which
is clear by setting
dY_t = −α dt and dZ_t = αβ dt + σ dW_t
Therefore the integrating factor is A_t = e^{−αt} and the solution is obtained as in the case
of an ordinary linear differential equation.
13.3 Problem. Show that the solution of the Vasicek equation is
X_t = e^{−αt}x_0 + β(1 − e^{−αt}) + σ∫_0^t e^{−α(t−s)} dW_s
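Writing the Vasicek equation as dX_t = α(β − X_t) dt + σ dW_t, the solution formula of Problem 13.3 can be checked by Monte Carlo: simulate the equation with the Euler scheme and compare the sample mean of X_T with e^{−αT}x_0 + β(1 − e^{−αT}), since the dW-integral has expectation zero. All parameter values below are arbitrary:

```python
import numpy as np

alpha, beta, sigma = 1.0, 1.0, 0.3
x0, T, n, paths = 2.0, 1.0, 1000, 20_000
dt = T / n

rng = np.random.default_rng(42)
x = np.full(paths, x0)
for _ in range(n):
    dW = rng.normal(0.0, np.sqrt(dt), size=paths)   # one Wiener increment per path
    x += alpha * (beta - x) * dt + sigma * dW       # Euler step for the Vasicek SDE

mean_mc = x.mean()                                   # Monte Carlo estimate of E(X_T)
mean_exact = np.exp(-alpha * T) * x0 + beta * (1.0 - np.exp(-alpha * T))
```

The two means agree up to Monte Carlo and discretization error.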
13.4 Problem. Derive the following properties of the Vasicek model:
(a) The process (X_t)_{t≥0} is a Gaussian process (i.e. all joint distributions are normal
distributions).
(b) Find E(X_t) and lim_{t→∞} E(X_t).
(c) Find V(X_t) and lim_{t→∞} V(X_t).
(d) Find Cov(X_t, X_{t+h}) and lim_{t→∞} Cov(X_t, X_{t+h}).
13.5 Problem. Let X_0 ∼ N(β, σ^2/(2α)). Explore the mean and covariance structure of a
Vasicek model starting with X_0.
Let us turn to models that are not time homogeneous.
13.6 Problem. The Brownian bridge:
(a) Find the solution of
dX_t = −(1/(1 − t))X_t dt + dW_t, 0 ≤ t < 1.
(b) Show that (X_t)_{0≤t<1} is a Gaussian process. Find the mean and the covariance
structure.
(c) Show that X_t → 0 if t → 1.
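The pinning behaviour claimed in part (c) can be observed by simulation: integrating the equation from (a) with the Euler scheme, paths started at 0 are pulled back toward 0 so strongly near t = 1 that the terminal values collapse. The sketch stops shortly before t = 1 because the drift coefficient explodes there; all numerical choices are arbitrary.

```python
import numpy as np

def bridge_endpoints(paths=5000, n=2000, t_end=0.999, seed=1):
    """Euler scheme for dX_t = -X_t/(1-t) dt + dW_t, X_0 = 0, up to t_end < 1."""
    rng = np.random.default_rng(seed)
    dt = t_end / n
    x = np.zeros(paths)
    t = 0.0
    for _ in range(n):
        dW = rng.normal(0.0, np.sqrt(dt), size=paths)
        x += -x / (1.0 - t) * dt + dW      # mean reversion strengthens as t -> 1
        t += dt
    return x

x_late = bridge_endpoints()                 # terminal values: concentrated near 0
```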
13.7 Problem. Find the solution of the Hull-White model:
dX_t = (θ(t) − a(t)X_t) dt + σ(t) dW_t
Finally, let us consider a nonlinear model.
13.8 Problem. Let Z_t = ℰ(t + W_t).
(a) For a > 0 find the differential equation of
X_t := Z_t/(1 + a∫_0^t Z_s ds)
(b) What about a < 0?
Chapter 14
Martingale properties of stochastic
integrals
Let (M_t)_{t≥0} be a martingale. We would like to know for which H ∈ L_0 the process
H·M : t ↦ ∫_0^t H dM
is a martingale, too. This is a reasonable question for several reasons. First of all, when
we were dealing with stochastic sequences we observed that gambling strategies
do not change the martingale properties of the underlying stochastic sequence. The
values of such gambling strategies are of the same structure as stochastic integrals
with simple integrands. This observation carries over to the continuous case, as is shown in
Theorem 10.7.
A general version of this assertion is a delicate matter. It turns out that when dealing with
martingale properties of stochastic integrals the notion of an ordinary martingale is not
the right concept.
14.1 Locally square integrable martingales
We start with a preliminary assertion concerning the martingale property of stochastic
integrals with respect to square integrable martingales. This assertion extends theorem
10.7 and is the basis of many further results. The proof is given at the end of the
section.
14.1 Lemma. Let (M_t)_{t≥0} be a square integrable martingale. Then H·M is a square
integrable martingale for every bounded H ∈ L_0.
How can we extend this assertion to more general integrands H ∈ L_0? The idea
is to truncate the process H ∈ L_0 by stopping times. Such a procedure works for
left-continuous processes, thus in particular for continuous processes.
14.2 Problem. Show that for every caglad process (X_t)_{t≤T} there is a sequence of
stopping times τ_n such that |X^{τ_n}_t| ≤ n and τ_n ↑ ∞.
Hint: Let τ_n = inf{t : |X_t| ≥ n}.
14.3 Problem. Discuss the question whether such a truncation procedure can also be
applied to cadlag processes with jumps.
Let (M_t) be a square integrable martingale and let H ∈ L_0. Then for any stopping
time τ we have (H·M)^τ = H^τ·M. If we choose τ in such a way that H^τ is bounded
then it follows that (H·M)^τ is a square integrable martingale.
As a result we see that even if H·M is not a martingale it can be stopped in
such a way that it becomes a martingale, and this stopping procedure can be performed
arbitrarily late. This leads to the concept of local martingales.
14.4 Definition.
A process (X_t) is a local martingale if there exists a sequence of stopping times
τ_n ↑ ∞ such that X^{τ_n} is a martingale for every n ∈ N.
A process (X_t) is a locally square integrable martingale if there exists a sequence
of stopping times τ_n ↑ ∞ such that X^{τ_n} is a square integrable martingale for every
n ∈ N.
The term locally is used in a very general and extensive way. Whenever a process
is such that it can be stopped by a sequence τ_n ↑ ∞ of stopping times in such a way
that the stopped processes have a certain property then it is said that the process has
this property locally.
14.5 Problem.
(1) Every caglad process is locally bounded.
(2) Every continuous martingale is a locally square integrable martingale.
(3) Every continuous local martingale is a locally square integrable martingale.
Applying what we know so far leads us to the following assertion.
14.6 Problem. If M is a square integrable martingale and if H ∈ L_0 then H·M is
a locally square integrable martingale.
It follows that stochastic integrals with respect to the Wiener process or the Poisson
process are locally square integrable martingales.
But stochastic integration is associative, i.e. we may consider stochastic integrals
with respect to processes which themselves are stochastic integrals. These, in general,
are not necessarily square integrable martingales.
Fortunately, even more is true. The next theorem shows that being a locally square
integrable martingale is a property which is stable under stochastic integration. Since
it applies to continuous local martingales and to martingales with uniformly bounded
jumps we have arrived at a very useful result.
14.7 Theorem. Let M be a locally square integrable martingale. Then each stochastic
integral H·M with H ∈ L_0 is again a locally square integrable martingale.
Proof: Let τ_n ↑ ∞ be a sequence of stopping times such that the M^{τ_n} are square
integrable martingales and let σ_n ↑ ∞ be a sequence of stopping times such that the H^{σ_n}
are bounded. Then the (H·M)^{σ_n∧τ_n} are square integrable martingales. □
14.8 Problem. Let M be a locally square integrable martingale. Show that M^2 − [M]
is a locally square integrable martingale, too.
Hint: Apply integration by parts.
14.9 Problem. Let M be a continuous local martingale. Then each stochastic integral
H·M with H ∈ L_0 is again a continuous local martingale.
It can be shown that the assertion of Theorem 14.7 remains valid if continuity or
square integrability is removed. But this result is beyond our scope.
We finish this section by proving Lemma 14.1.
Proof: We show that E((H·M)^2_t) < ∞ and E(∫_0^t H dM) = 0.
Let 0 = t_0 < t_1 < . . . < t_n = t be the n-th element of a Riemannian sequence
of subdivisions and define
H^n = Σ_{j=1}^n H_{t_{j−1}} 1_{(t_{j−1}, t_j]}
Then E(∫_0^t H^n dM) = 0 and ∫_0^t H^n dM → ∫_0^t H dM in probability. It remains to show
that E((∫_0^t H^n dM)^2) is bounded since this implies ∫_0^t H^n dM → ∫_0^t H dM in L^1.
For this, note that
E((∫_0^t H^n dM)^2) = E((Σ_{j=1}^n H_{t_{j−1}}(M_{t_j} − M_{t_{j−1}}))^2)
= Σ_{j=1}^n E(H^2_{t_{j−1}}(M_{t_j} − M_{t_{j−1}})^2)
≤ C Σ_{j=1}^n E((M_{t_j} − M_{t_{j−1}})^2) = C(E(M^2_t) − E(M^2_0))
□
14.10 Problem. Why is it sufficient in the proof of 14.1 to show E(∫_0^t H dM) = 0?
14.11 Problem. Let X_n → X in probability and E(X^2_n) ≤ C, n ∈ N. Show that
||X_n − X||_1 → 0.
14.12 Problem. Explain why in the proof of 14.1
E((Σ_{j=1}^n H_{t_{j−1}}(M_{t_j} − M_{t_{j−1}}))^2) = Σ_{j=1}^n E(H^2_{t_{j−1}}(M_{t_j} − M_{t_{j−1}})^2)
14.2 Square integrable martingales
Although being a locally square integrable martingale is a sufficiently strong martingale
property for many purposes, it is also important to know when the martingale
property holds without localization. In particular, we would like to know under what
circumstances Itô integrals are martingales.
First we note that Kolmogoroff's inequality for martingales carries over from the
discrete time case to the continuous time case.
14.13 Lemma. Let M be a square integrable martingale. Then
E(sup_{t≤T} M^2_t) ≤ 4E(M^2_T) < ∞
This gives a more or less obvious criterion for the martingale property.
14.14 Corollary. A locally square integrable martingale M is a square integrable
martingale iff E(sup_{t≤T} M^2_t) < ∞.
Proof: We have to show that E(M_t) = E(M_0). This is certainly true for the
processes stopped at a suitable localization. We can get rid of the stopping times by
applying the dominated convergence theorem. □
14.15 Problem. Every bounded local martingale is a martingale.
The preceding criterion is inconvenient to apply. It is much better to have a criterion
in terms of analytically tractable expressions like the quadratic variation. The
following theorem contains a fundamental property of square integrable martingales
which is a first step in that direction.
14.16 Theorem. For any square integrable martingale (M_t)_{t≥0} the process M^2_t − [M]_t
is a martingale. In particular, we have E(M^2_t) = E([M]_t), t ≤ T.
Proof: It is sufficient to prove the second part of the assertion. Applying the second
part to stopped processes gives the martingale property.
In view of Problem 14.8 there is a sequence of stopping times τ_n ↑ ∞ such
that the M^2_{t∧τ_n} − [M]_{t∧τ_n} are martingales, which implies E(M^2_{t∧τ_n}) = E([M]_{t∧τ_n}). By
Kolmogoroff's maximal inequality we obtain E(M^2_t) = E([M]_t). □
Note that 14.16 is known to us for the Wiener process. Thus, it is a generalization
of a familiar structure. Let us turn to some consequences of 14.16.
14.17 Problem. Show that every continuous locally square integrable martingale of
finite variation is necessarily constant.
14.18 Problem. Let (M_t)_{t≥0} be a locally square integrable martingale. If (A_t) is a
continuous adapted process of bounded variation such that M^2_t − A_t is a martingale,
then A_t = [M]_t.
Now we are in a position to give a sufficiently general criterion for the martingale
property of a local martingale.
14.19 Theorem. A locally square integrable martingale M is a square integrable
martingale iff E([M]_T) < ∞.
Proof: Necessity follows from 14.16.
To prove sufficiency let (τ_n) be a localizing sequence and note that Kolmogoroff's
maximal inequality implies
E(sup_{t≤T} M^2_{t∧τ_n}) ≤ 4E(M^2_{T∧τ_n}) = 4E([M]_{T∧τ_n}) ≤ 4E([M]_T).
Let n → ∞, apply Levi's theorem and Corollary 14.14. □
14.20 Corollary. Let (M_t)_{t≥0} be a locally square integrable martingale and H ∈ L_0.
Then H·M is a square integrable martingale iff
E(∫_0^T H^2_s d[M]_s) < ∞.
14.21 Problem. Explain how 14.20 follows from the preceding assertions.
14.22 Problem. Discuss the martingale properties of Itô integrals.
14.3 Lévy's theorem
Lévy's theorem is a far-reaching characterization of the Wiener process. The remarkable
fact is that the Wiener process is a local martingale which is uniquely determined
by its quadratic variation.
14.23 Theorem. (Lévy)
Let (M_t) be a continuous local martingale. If [M]_t = t then (M_t) is a Wiener
process.
Proof: Let
Z_t = e^{iaM_t + a^2 t/2}
Note that (Z_t)_{t≤T} is bounded. Moreover, as can be shown by a complex version of
Itô's formula, (Z_t) satisfies the linear differential equation
Z_t = 1 + ia∫_0^t Z dM
Hence, (Z_t) is a local martingale and, since bounded, even a square integrable martingale.
E(Z
t
[T
s
) = Z
s
which means
E
_
e
ia(M
t
M
s

T
s
_
= e

a
2
2
(ts)
Thus, the increments are independent and N(0, t s)-distributed. 2
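The hypothesis [M]_t = t can be illustrated numerically: for a simulated Wiener path the discrete quadratic variation, i.e. the sum of squared increments over [0, T], concentrates at T as the mesh goes to 0 (its standard deviation is sqrt(2T^2/n) for n increments). A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(7)
T, n = 1.0, 100_000
dW = rng.normal(0.0, np.sqrt(T / n), size=n)   # Wiener increments on a grid of mesh T/n
qv = float(np.sum(dW ** 2))                    # discrete quadratic variation on [0, T]
```

Here qv differs from T by a few multiples of sqrt(2/n)·T ≈ 0.0045.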
14.24 Problem. Let (X_t) be a continuous process with independent increments,
E(X_t) = 0 and V(X_t) = t, t ≥ 0. Show that (X_t) is a Wiener process.
14.25 Problem. Let (X_t) be a continuous process with independent increments,
E(X_t) = 0, and assume that g(t) := V(X_t) is continuous and strictly increasing.
Show that there exists a Wiener process (W_t) such that X_t = W_{g(t)}.
14.26 Problem. Let (X_t) be a continuous local martingale such that
[X]_t = ∫_0^t σ^2(s) ds
where σ^2(t) > 0 is continuous. Show that there exists a Wiener process (W_t) such that
X_t = X_0 + ∫_0^t σ(s) dW_s
14.4 Martingale representation
Let (W_t)_{t≥0} be a Wiener process and let (F_t) be the augmented internal history. We know
that
∫_0^t H_s dW_s, 0 ≤ t ≤ T,
is a square integrable martingale iff
E(∫_0^T H^2_s ds) < ∞.
Now, in this special case there is a remarkable converse: Each square integrable martingale
with respect to (F_t) arises in this way!
The case of caglad processes
Actually, we have to be a bit more modest: If we confine ourselves (as we have done
so far) to H ∈ L_0 (caglad adapted processes) then square integrable martingales can
only be approximated with arbitrary precision by stochastic integrals.
The martingale representation fact is a consequence of the following seemingly
simpler assertion: Each random variable C ∈ L^2(F_T) (each claim) can be (approximately)
written as a constant plus a stochastic integral (hedged by a self-financing
strategy).
Let us introduce some simplifying terminology.
14.27 Definition. A set 𝒞 of random variables in L^2(F_T) is called dense if for every
C ∈ L^2(F_T) there is a sequence C_n ∈ 𝒞 such that E((C_n − C)^2) → 0.
A set 𝒞 of random variables in L^2(F_T) is called total if the linear hull of 𝒞 is
dense.
Thus, we want to prove
14.28 Theorem. The set of all integrals a + ∫_0^T H dW with a ∈ R, H ∈ L_0 and
E(∫_0^T H^2_s ds) < ∞ is dense in L^2(F_T).
Proof: The starting point is that F_T is generated by (W_s)_{s≤T} and therefore also
by (e^{W_s})_{s≤T}. Therefore an obvious dense set consists of the functions
φ(e^{W_{s_1}}, e^{W_{s_2}}, . . . , e^{W_{s_n}}),
where φ is some continuous function with compact support and s_1, s_2, . . . , s_n is some
finite subset of [0, T]. Every continuous function can be approximated uniformly by
polynomials (Weierstrass' theorem) and polynomials are linear combinations of powers.
Thus, we arrive at a total set consisting of
exp(Σ_{j=1}^n k_j W_{s_j})
which after reshuffling can be written as
exp(Σ_{j=1}^n a_{j−1}(W_{s_j} − W_{s_{j−1}})) = exp(∫_0^T f(s) dW_s) (30)
for some bounded left-continuous (step) function f : [0, T] → R. It follows that the set
of functions (differing from (30) by constant factors)
G_T = exp(∫_0^T f(s) dW_s − (1/2)∫_0^T f^2(s) ds)
is total when f varies in the set of all bounded left-continuous (step) functions f :
[0, T] → R.
Recall that (G_t)_{t≤T} is a square integrable martingale and satisfies
G_t = 1 + ∫_0^t G d(f·W) = 1 + ∫_0^t G_s f(s) dW_s
From 14.16 it follows that
E(∫_0^T G^2_s f^2(s) ds) < ∞.
Therefore, the set of integrals
1 + ∫_0^t H_s dW_s where H ∈ L_0 and E(∫_0^t H^2_s ds) < ∞
is total and by linearity of the integral the assertion follows. □
How can we apply this result to martingale representation?
Note that for the proof of the preceding theorem we did not make use of the usual
conditions. So at first, let (W_t) be a Wiener process with inner history (F_t). Assume
that (M_t) is a square integrable martingale w.r.t. (F_t). If there is a representation
M_T = M_0 + ∫_0^T H_s dW_s,
then it follows from the martingale property of the stochastic integral that for every
t ≤ T
M_t = E(M_T | F_t) = M_0 + E(∫_0^T H_s dW_s | F_t) = M_0 + ∫_0^t H_s dW_s P-a.s.
However, this does not imply that the paths of the processes are equal P-almost surely.
This is the reason why we have to turn to the augmented filtration. Then we may
assume that (M_t) is cadlag and that the stochastic integral is continuous. In this way
it follows that the martingale (M_t) has continuous paths, too, and that both processes
are indistinguishable.
Predictable processes
So far we have defined stochastic integrals with integrands that are adapted and left-continuous
(with right limits). This class of processes is sufficiently large for applications
but not for theoretical purposes. A typical example where the restriction to
caglad processes is annoying is the martingale representation theorem.
The reason why we can only approximate martingales by stochastic integrals is
that the class of caglad processes that we are presently using as integrands is too small.
We have to enlarge the space of integrands.
Now we will indicate how to enlarge the space of integrands. We will consider only
the case of the Wiener process. The approach is the same for every square integrable
martingale.
Consider a time interval [0, T] with finite horizon T. Stochastic processes (H_s) are
functions
H : Ω × [0, T] → R : (ω, s) ↦ H_s(ω)
The simplest example of an adapted caglad process is
H_s(ω) := 1_F(ω)1_{(t_1,t_2]}(s) = 1_{F×(t_1,t_2]}(ω, s), F ∈ F_{t_1}.
This elementary process can be written as the indicator function of a predictable rectangle
F × (t_1, t_2] ⊆ Ω × [0, T], F ∈ F_{t_1}.
14.29 Definition. The σ-field 𝒫_T on Ω × [0, T] which is generated by the predictable
rectangles F × (t_1, t_2], F ∈ F_{t_1} (including the sets F × {0}, F ∈ F_0), is the predictable σ-field. A
𝒫_T-measurable function H : Ω × [0, T] → R is called a predictable process.
14.30 Theorem. All processes in L_0 (caglad and adapted) are predictable.
Proof: Let H_s(ω) = a(ω)1_{(t_1,t_2]}(s) where a is F_{t_1}-measurable. Since a is the
limit of linear combinations of indicators of sets in F_{t_1}, the process H is the limit of linear
combinations of indicators of predictable rectangles and thus predictable. It follows
that all simple processes are predictable and thus all processes in L_0. □
Our stochastic integral was defined for integrands H ∈ L_0, which are predictable
processes. It is our goal to extend the notion of the stochastic integral to predictable
integrands. This will be done by an approximation process in some suitably defined
L^2-space.
We start by defining the measure of the underlying measure space. On the
predictable σ-field 𝒫_T we define the measure
μ(A) = E(∫_0^T 1_A(·, s) ds), A ∈ 𝒫_T. (31)
By measure theoretic induction this measure satisfies
∫ H^2 dμ = E(∫_0^T H^2_s ds), H ∈ L^2(Ω × [0, T], 𝒫_T, μ). (32)
From 4.44 it follows that the set of all linear combinations of indicators of predictable rectangles
is dense in L^2(Ω × [0, T], 𝒫_T, μ). Since L_0 contains this set it follows that the set
L_0 ∩ L^2(Ω × [0, T], 𝒫_T, μ) is a dense subspace of L^2(Ω × [0, T], 𝒫_T, μ). Thus, every
square integrable predictable process can be approximated by caglad processes.
Now it is clear how the extension of the stochastic integral from L_0 to predictable
processes will be carried out. For each H ∈ L^2(Ω × [0, T], 𝒫_T, μ) we may find a
sequence H^n ∈ L_0 ∩ L^2(Ω × [0, T], 𝒫_T, μ) such that
lim_{n→∞} ∫ (H^n − H)^2 dμ = 0
Since we have
E((∫_0^T (H^m_s − H^n_s) dW_s)^2) = E(∫_0^T (H^m_s − H^n_s)^2 ds) = ∫ (H^m − H^n)^2 dμ
the stochastic integrals ∫_0^T H^n dW converge in L^2(F_T, P). Thus, we may define
∫_0^T H dW := lim_{n→∞} ∫_0^T H^n dW
Essentially all properties of the stochastic integral on L
0
carry over to the stochastic
integral for predictable integrands.
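The isometry (32) that drives this extension can be illustrated numerically. The following sketch (an illustration, not part of the original text; the deterministic integrand H_s = s is an arbitrary choice) approximates ∫_0^T H dW by left-endpoint Riemann sums over simulated Brownian increments and compares the Monte Carlo estimate of E[(∫_0^T H dW)²] with E[∫_0^T H_s² ds] = ∫_0^T s² ds = T³/3.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_steps, n_paths = 1.0, 1000, 20000
dt = T / n_steps
t = np.linspace(0.0, T, n_steps + 1)

# Brownian increments for all paths
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))

# Left-endpoint (predictable) evaluation of the integrand H_s = s
H = t[:-1]                          # shape (n_steps,)
stoch_int = (H * dW).sum(axis=1)    # Monte Carlo samples of the integral

lhs = (stoch_int ** 2).mean()       # estimate of E[(int H dW)^2]
rhs = T ** 3 / 3                    # E[int_0^T s^2 ds] = T^3/3
print(lhs, rhs)
```

The two numbers agree up to Monte Carlo error, which is exactly what makes the L²-limit definition of the integral well posed.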
In particular, the martingale representation theorem can now be stated in a satisfactory way.

14.31 Theorem. Let (W_t) be a Wiener process and (ℱ_t) its augmented inner history. For every C ∈ L²(ℱ_T) there exists a predictable process H ∈ L²(Ω × [0, T], 𝒫_T, μ) such that

C = E(C) + ∫_0^T H dW.

14.32 Corollary. (Martingale representation theorem)
Let (W_t) be a Wiener process and (ℱ_t) its augmented inner history. For every square integrable martingale (M_t) there exists a predictable process H ∈ L²(Ω × [0, T], 𝒫_T, μ) such that

M_t = M_0 + ∫_0^t H dW,  t ≤ T.

14.33 Problem. Let (W_t) be a Wiener process and (ℱ_t) its inner history.
Show that for every continuous local martingale (M_t) there exists a predictable process H such that

∫_0^T H_s² ds < ∞ P-a.e. and M_t = M_0 + ∫_0^t H dW,  t ≤ T.
Chapter 15
Exponential martingale and
Girsanov's theorem
15.1 The exponential martingale
Let (L_t) be a continuous positive semimartingale on [0, T]. We know that L can be written as a stochastic exponential:

dM_t = (1/L_t) dL_t,  dL_t = L_t dM_t,  L_t = L_0 e^{M_t − [M]_t/2}.

The following assertion is basic.

15.1 Theorem. Let L and M be continuous semimartingales such that

L_t = e^{M_t − [M]_t/2}

(i.e. L_0 = 1). Then:
(1) L is a local martingale iff M is a local martingale.
(2) If L is a local martingale then it is a supermartingale and satisfies E(L_T) ≤ 1.
(3) If L is a local martingale then it is a martingale iff E(L_T) = 1.
Proof: Part (1) is true since stochastic integrals of continuous local martingales are continuous local martingales.

As for part (2), first we show that a local martingale which is bounded from below is a supermartingale, i.e. E(L_t | ℱ_s) ≤ L_s, s < t ≤ T. Let (τ_n) be a localizing sequence. Then we have E(L_{t∧τ_n} | ℱ_s) = L_{s∧τ_n}, s < t ≤ T. Now, by Fatou's lemma it follows that

E(L_t | ℱ_s) = E(lim_n L_{t∧τ_n} | ℱ_s) ≤ lim_n E(L_{t∧τ_n} | ℱ_s) = lim_n L_{s∧τ_n} = L_s.

This is the supermartingale property.

It is now easy to see (next problem) that the martingale property is equivalent to E(L_T) = L_0. □

15.2 Problem. Let (L_t)_{t≤T} be a supermartingale which is bounded from below.
Show that it is a martingale iff E(L_T) = L_0.
15.2 Likelihood processes
Recall that we are working with a fixed probability space (Ω, ℱ, P) and a filtration (ℱ_t)_{0≤t≤T}. We assume that ℱ = ℱ_T since ℱ_T is the largest σ-field which is needed to study processes adapted to (ℱ_t)_{t≤T}.

We have a basic probability measure P|ℱ_T. Now, we consider a second probability measure Q|ℱ_T.
15.3 Definition. Let P ∼ Q be two equivalent probability measures on ℱ_T. Then

L_t := E( dQ/dP | ℱ_t ),  0 ≤ t ≤ T,

is called the likelihood process of Q w.r.t. P.

Note that L_T = dQ/dP since P and Q are considered as probability measures on ℱ_T, which implies that dQ/dP is ℱ_T-measurable. The random variables L_t of the likelihood process have the property of being the Radon-Nikodym derivatives of Q w.r.t. P restricted to ℱ_t.
15.4 Problem. Let P|ℱ_T ∼ Q|ℱ_T.
(a) Show that L_t = d(Q|ℱ_t)/d(P|ℱ_t), t ≤ T.
(b) Show that the likelihood process (L_t)_{t≤T} is a positive martingale.
Hint: For proving positivity note that P|ℱ_t ∼ Q|ℱ_t, t ≤ T.
Let Q|ℱ_T ∼ P|ℱ_T and let (L_t)_{t≤T} be the likelihood process. Since the likelihood process is a positive semimartingale it can be written as

L_t = ℰ(M)_t = e^{M_t − [M]_t/2},  where dM_t = (1/L_t) dL_t.

Since the likelihood process is even a martingale it follows that E(L_T) = 1 and that (M_t)_{t≤T} is a local martingale.

But also the converse is true: positive martingales can be used to define equivalent probability measures.
15.5 Problem. Let (M_t)_{t≤T} be a continuous local martingale and let L_t = ℰ(M)_t.
Assume that E(L_T) = 1 and define

Q := L_T · P,  i.e. Q(F) = ∫_F L_T dP,  F ∈ ℱ_T.

Show that Q is a probability measure and that (L_t)_{t≤T} is the likelihood process of Q w.r.t. P.
For calculation purposes we need a formula for the relation between conditional expectations w.r.t. Q and w.r.t. P. This is the so-called Bayes formula.

15.6 Problem. Let (Ω, ℱ, P) be a probability space and let (ℱ_t)_{t≥0} be a filtration. Let Q|ℱ_T ∼ P|ℱ_T be equivalent probability measures and let (L_t)_{t≤T} be the likelihood process.

Prove the Bayes formula:

E_Q(X | ℱ_s) = E_P(X L_T | ℱ_s) / E_P(L_T | ℱ_s)

whenever X is ℱ_T-measurable and X ≥ 0 or X ∈ L¹(Q).

Note that if X is ℱ_t-measurable, t ≤ T, then the Bayes formula holds with L_T replaced by L_t.
15.3 Change of probability measures
In financial mathematics an important and common method of pricing claims is to calculate expectations under some probability measure Q which is different from P. Therefore we have to discuss some substantial features of such a change of measure. The first problem is a warm-up.
15.7 Problem. Let (W_t)_{t≥0} be a Wiener process on a probability space (Ω, ℱ, P) and (ℱ_t)_{t≥0} its internal history. Define

L_t := e^{aW_t − a²t/2},  t ≤ T,

and let Q := L_T · P.
(a) Show that Q|ℱ_T is equivalent to P|ℱ_T.
(b) Show that W̃_t := W_t − at, t ≤ T, is a Wiener process under Q.
Hint: Prove that for s < t

E_Q( e^{λ(W̃_t − W̃_s)} | ℱ_s ) = e^{λ²(t−s)/2}.
Now we turn to theory. The first assertion deals with the inheritance of the semimartingale property. Since the semimartingale property is concerned with convergence in probability, the assertion of 15.8 follows from Problem 4.27.

15.8 Theorem. Let Q|ℱ_T ∼ P|ℱ_T. If (X_t)_{t≤T} is a semimartingale under P then it is a semimartingale under Q.
Which other properties of a stochastic process do not change under a measure change? It is clear that continuity of paths is left unchanged. A remarkable but plausible fact is that the quadratic variation is invariant under a measure change.

15.9 Theorem. Let Q|ℱ_T ∼ P|ℱ_T and let (X_t)_{t≤T} be a continuous semimartingale. Then the quadratic variation under P coincides with the quadratic variation under Q.

15.10 Problem. Explain how 15.8 and 15.9 follow from 4.27.
Next we consider the question how martingale properties are influenced by a change of probability measures. The basic result in this direction is Girsanov's theorem.
We begin with preliminary assertions.

15.11 Lemma. Let Q ∼ P and let (L_t) be the likelihood process. Then a process (X_t) is a local Q-martingale iff (L_t X_t) is a local P-martingale.

Proof: It is sufficient to prove the assertion: a process (X_t) is a Q-martingale iff (L_t X_t) is a P-martingale.
The Bayes formula implies

E_Q(X_t | ℱ_s) = E_P(X_t L_t | ℱ_s) / E_P(L_t | ℱ_s) = E_P(X_t L_t | ℱ_s) / L_s

which is equivalent to

E_Q(X_t | ℱ_s) L_s = E_P(X_t L_t | ℱ_s).

Hence

E_Q(X_t | ℱ_s) = X_s  ⟺  E_P(X_t L_t | ℱ_s) = X_s L_s. □
15.12 Theorem. (Girsanov)
Let Q|ℱ_T ∼ P|ℱ_T with continuous likelihood process L_t = ℰ(Z)_t, t ≤ T. If (X_t)_{t≤T} is a continuous local P-martingale then X_t − [X, Z]_t, t ≤ T, is a local Q-martingale.

Proof: In order to show that X_t − [X, Z]_t is a local Q-martingale we have to show that L_t(X_t − [X, Z]_t) is a local P-martingale. This is done by integration by parts. □

15.13 Problem. Complete the proof of Girsanov's theorem.
Actually, Girsanov's theorem provides an assertion on compensators. The compensator of a continuous semimartingale (X_t) is a continuous FV-process (A_t) such that X_t − A_t is a local martingale. It can be shown that a continuous compensator is uniquely determined.
If (X_t)_{t≤T} is a continuous local P-martingale then its compensator under P is zero. On the other hand, we have

X_t = (X_t − [X, Z]_t) + [X, Z]_t,  t ≤ T.

Girsanov's theorem tells us that X_t − [X, Z]_t is a local Q-martingale. Therefore ([X, Z]_t)_{t≤T} is the compensator of (X_t)_{t≤T} under Q.

Girsanov's theorem is of great practical importance. It provides a formula for the compensator of a semimartingale after a change of measure.
15.14 Problem. Let (W_t)_{t≤T} be a Wiener process under P. Let Q be the probability measure with likelihood process ℰ(λW)_t, λ > 0.
(a) Find the compensator (A_t)_{t≤T} of (W_t)_{t≤T} under Q.
(b) Explain why the compensated process (W_t − A_t)_{t≤T} is a Wiener process under Q.
15.15 Problem. Let (W_t)_{t≤T} be a Wiener process and define X_t = at + W_t, t ≤ T. Find a martingale measure, i.e. a probability measure Q such that (X_t)_{t≤T} is a Q-martingale.
15.16 Problem. Let (W_t)_{t≤T} be a Wiener process under P and let dX_t = a(t) dW_t where a(t) ≠ 0 is continuous on [0, T]. Let Q be the probability measure with likelihood process ℰ(φ·W)_t where φ(t) > 0 is continuous on [0, T].
(a) Find the compensator (A_t)_{t≤T} of (X_t)_{t≤T} under Q.
(b) Find the distribution of (X_t − A_t)_{t≤T} under Q.
15.17 Problem. Let (W_t) be a Wiener process under P and let dX_t = a(t) dt + σ(t) dW_t where a(t) and σ(t) > 0 are continuous on [0, T]. Find a martingale measure, i.e. a probability measure Q such that (X_t) is a Q-martingale.
The following problem shows that sometimes a change of measure can make the drift term of a stochastic differential equation vanish.

15.18 Problem. Let (W_t)_{t≤T} be a Wiener process under P and let (X_t)_{t≤T} be a solution of the stochastic differential equation

dX_t = b(X_t, t) dt + σ(X_t, t) dW_t,  σ(x, t) > 0.

Define

dZ_t = − ( b(X_t, t) / σ(X_t, t) ) dW_t

and let Q := L_T · P with L_t = ℰ(Z)_t.
Assume that E_P(L_T) = 1. (This depends on the properties of b(x, t) and σ(x, t).)
Show that there is a Q-Wiener process (W̃_t) such that

dX_t = σ(X_t, t) dW̃_t.

Hint: Show that (X_t)_{t≤T} is a local Q-martingale and that (1/σ(X_t, t)) dX_t defines a Q-Wiener process.
Chapter 16
Martingales in financial markets
16.1 Pricing in financial markets
Let ℳ = (S⁰, S¹, . . . , Sᵐ) be a market model consisting of m + 1 semimartingales on a filtered probability space (Ω, (ℱ_t)_{t≤T}, P).

A claim at time t = T is any ℱ_T-measurable random variable C. The fundamental problem of mathematical finance is to find a reasonable price x_0 at time t = 0 for the claim C.

There are two methods to find a price x_0 for the claim. The insurance method is to define x_0 as the expectation under P of the discounted claim. The risk of this kind of pricing is controlled by selling a large number of claims. Then by the law of large numbers the average cost of that set of claims equals x_0. But this works only if the claims are independent. That might be true for insurance but not for financial markets.

The more recent and most important method of pricing is risk neutral pricing using hedge strategies.

A claim C has a hedge in the market (is attainable) if there is a wealth process V of some self-financing trading strategy satisfying V_T = C. In this case it is reasonable to define x_0 := V_0. This is called risk neutral pricing.

It should be noted that the preceding description is of a heuristic nature. Several questions have to be answered before one can be sure whether the definition is reasonable. This is related to the notion of no-arbitrage, which is not a subject of this text.
16.2 Pricing in Black-Scholes markets
Let (B_t, S_t) be a Black-Scholes market model with B_t = e^{rt} and

dS_t = μ S_t dt + σ S_t dW_t
where (W_t) denotes a Wiener process. The discounted market model is (B̃_t, S̃_t) where

B̃_t = B_t e^{−rt} = 1,  S̃_t = S_t e^{−rt}.

Let C ∈ L(ℱ_T) be an attainable claim. Then there exists a trading strategy (H⁰_t, H¹_t) such that

C = V_T = V_0 + ∫_0^T H⁰_s dB_s + ∫_0^T H¹_s dS_s.

The risk neutral pricing principle defines V_0 to be the correct price of C at time t = 0.
How can we obtain the price V_0?
One possibility for obtaining the price is to apply the Black-Scholes equation. There are powerful methods for solving terminal value problems for partial differential equations.
However, much simpler is the following probabilistic method, which is based on an equivalent martingale measure. Let us describe this approach step by step.
First, we have to turn to the discounted market model (1, S̃_t). We have

C e^{−rT} = Ṽ_T = V_0 + ∫_0^T H¹_s dS̃_s.

Secondly, we have to find a measure Q which is equivalent to P and is such that (S̃_t) is a Q-martingale. Such a measure is called an equivalent martingale measure.

16.1 Problem. Show that Q = exp( −θ W_T − θ² T/2 ) · P with θ = (μ − r)/σ is an equivalent martingale measure for the Black-Scholes market model.
If the claim C is square integrable w.r.t. Q then the martingale representation theorem implies that we may assume that (H¹_t) is such that (Ṽ_t) is a square integrable Q-martingale. It follows that

V_0 = Ṽ_0 = E_Q(C e^{−rT}) = E_Q(C) e^{−rT}.

Thus, we have proved the following result.

16.2 Theorem. Let (B_t, S_t) be a Black-Scholes model and let Q = exp( −θ W_T − θ² T/2 ) · P, θ = (μ − r)/σ. Then the following assertions are true:
(1) Every claim C ∈ L²(ℱ_T, Q) is attainable by some trading strategy (H⁰_t, H¹_t) satisfying

E( ∫_0^T (H¹_t)² dt ) < ∞.

(2) Every trading strategy of this kind has a value process (V_t) satisfying V_0 = E_Q(C) e^{−rT}.
16.3 Problem. Fill in the details of the proof of Theorem 16.2.

16.4 Problem. Let (B_t, S_t) be a Black-Scholes market model.
(1) Find the risk neutral price for the claim C = (S_T − K)⁺.
(2) Find a self-financing trading strategy such that the value process satisfies V_T = C.
16.3 Pricing in diffusion market models
Assume that a market (B_t, S_t) consists of a bond B_t = e^{rt} and a stock

dS_t = b(S_t, t) S_t dt + σ(t, S_t) S_t dW_t

where (W_t) is a Wiener process under P. Let (ℱ_t) be the inner history of (S_t). We want to find a risk-neutral price for a claim C ∈ L(ℱ_T).

A risk-neutral price is given by the initial wealth of a self-financing trading strategy which replicates the claim C. So we have to find such a self-financing trading strategy and determine its initial wealth.
Let S̃_t := S_t e^{−rt}. Then

dS̃_t = (b(S̃_t e^{rt}, t) − r) S̃_t dt + σ(t, S̃_t e^{rt}) S̃_t dW_t.

Let Q be a martingale measure for S̃, which can be found by Girsanov's theorem. There is a Q-Wiener process (W̃_t) such that

dS̃_t = σ(t, S̃_t e^{rt}) S̃_t dW̃_t.
Any wealth process coming from a self-financing trading strategy has the form

Ṽ_t = V_t e^{−rt} = V_0 + ∫_0^t H¹_s dS̃_s = V_0 + ∫_0^t H¹_s σ(s, S̃_s e^{rs}) S̃_s dW̃_s,  t ≤ T.

The trading strategy replicates the claim C if C = V_T. Does such a trading strategy exist?
Assume that the claim C is square integrable w.r.t. Q. It can be shown that the inner history of (S_t) coincides with the inner history of (W_t) and with the inner history of (W̃_t). Then by the martingale representation theorem applied to L²(Ω, ℱ_T, Q) there exists a process (G_t) such that

C e^{−rT} = E_Q(C e^{−rT}) + ∫_0^T G_s dW̃_s  and  E( ∫_0^T G_s² ds ) < ∞.

Therefore, if we define

H¹_t = G_t / ( σ(t, S̃_t e^{rt}) S̃_t )
the corresponding trading strategy replicates C. This proves the existence part of our problem. Note that this was not a constructive argument.

The price of the claim is the initial wealth of the replicating trading strategy:

V_0 = e^{−rT} E_Q(C).

Thus, all we need for pricing a claim is a martingale measure such that the claim is square integrable.
Let us have a look at the wealth process. Since (G_t) gives rise to a square integrable martingale we have

Ṽ_t = V_0 + ∫_0^t G_s dW̃_s = E_Q(Ṽ_T | ℱ_t).

How can we obtain the trading strategy in an explicit way?

Since (ℱ_t) is the inner history of a Markov process we have Ṽ_t = f(S̃_t, t) for some function f. If this function is sufficiently smooth we may apply Ito's formula and obtain

Ṽ_t = V_0 + ∫_0^t f_x(S̃_s, s) dS̃_s + FV-processes.

Hence, by uniqueness of the compensator and the martingale part of (Ṽ_t) it follows that

H¹_t = f_x(S̃_t, t)

and that the FV-processes in Ito's formula vanish. The first assertion is the basis of delta-hedging, and the second assertion is a partial differential equation.
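The delta-hedging identity H¹_t = f_x(S̃_t, t) described above can be tested by simulation. The sketch below (an illustration; Black-Scholes dynamics are assumed, so f_x is the usual call delta N(d_1), and all parameters are made up) rebalances a discrete self-financing hedge along one simulated risk-neutral path and checks that the terminal hedging error is small when rebalancing is frequent.

```python
import numpy as np
from math import log, sqrt, exp, erf

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def call_delta(S, K, r, sigma, tau):
    d1 = (log(S / K) + (r + sigma ** 2 / 2) * tau) / (sigma * sqrt(tau))
    return norm_cdf(d1)

def call_price(S, K, r, sigma, tau):
    d1 = (log(S / K) + (r + sigma ** 2 / 2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return S * norm_cdf(d1) - K * exp(-r * tau) * norm_cdf(d2)

S0, K, r, sigma, T, n = 100.0, 100.0, 0.05, 0.2, 1.0, 5000
dt = T / n
rng = np.random.default_rng(4)

S = S0
delta = call_delta(S0, K, r, sigma, T)
cash = call_price(S0, K, r, sigma, T) - delta * S0   # start with the option premium
for i in range(1, n):
    S *= exp((r - sigma ** 2 / 2) * dt + sigma * sqrt(dt) * rng.normal())
    cash *= exp(r * dt)                               # cash earns the riskless rate
    new_delta = call_delta(S, K, r, sigma, T - i * dt)
    cash -= (new_delta - delta) * S                   # self-financing rebalancing
    delta = new_delta
S *= exp((r - sigma ** 2 / 2) * dt + sigma * sqrt(dt) * rng.normal())
cash *= exp(r * dt)

hedge_error = delta * S + cash - max(S - K, 0.0)
print(hedge_error)
```

The error does not vanish for finite n (discrete rebalancing leaves a residual driven by the gamma of the option), but it shrinks as n grows, which is the practical content of the replication argument.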
Part IV
Appendix
Chapter 17
Foundations of modern analysis
Further reading: Dieudonné [8].
17.1 Basic notions on set theory
Set operations
Let Ω be a basic set and let A, B, C, . . . be subsets. Remember the basic set operations A ∩ B (intersection), A ∪ B (union), Aᶜ (complementation) and their rules.

17.1 Problem. Describe in words de Morgan's laws:

(A ∩ B)ᶜ = Aᶜ ∪ Bᶜ,  (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ

Let us denote the difference of sets by A ∖ B := A ∩ Bᶜ.
17.2 Problem.
(1) Show that

A ∖ (B ∪ C) = (A ∖ B) ∩ (A ∖ C)

(2) Expand A ∖ (B ∩ C).

Set operations can also be applied to infinite families of sets, e.g. to a sequence (A_i)_{i=1}^∞ of sets.
17.3 Problem. Describe in words:
(1) Infinite unions and intersections:

⋃_{i=1}^∞ A_i,  ⋂_{i=1}^∞ A_i

(2) De Morgan's laws:

( ⋃_{i=1}^∞ A_i )ᶜ = ⋂_{i=1}^∞ A_iᶜ,  ( ⋂_{i=1}^∞ A_i )ᶜ = ⋃_{i=1}^∞ A_iᶜ

(3) Describe the elements of the sets

liminf_i A_i := ⋃_{k=1}^∞ ⋂_{i=k}^∞ A_i,  limsup_i A_i := ⋂_{k=1}^∞ ⋃_{i=k}^∞ A_i

by the properties: ω is contained in at most finitely many A_i, ω is contained in infinitely many A_i, ω is contained in all but finitely many A_i.
(4) Establish the subset relations between the sets mentioned in (1) and (3).
(4) Establish the subset relations between the sets mentioned of (1) and (3).
A sequence (A
i
)

i=1
of sets is increasing (A
i
) if A
1
A
2
A
3
. . . and it is
decreasing (A
i
) if A
1
A
2
A
3
. . . A sequence of sets is a monotone sequence if
it is either increasing or decreasing.
17.4 Problem.
(1) Find the union and the intersection of monotone sequences of sets.
(2) Find liminf and limsup of monotone sequences of sets.
The preceding problems explain why the union of an increasing sequence is called
its limit. Similarly the intersection of a decreasing sequence is called its limit.
17.5 Problem.
(1) Let a < b. Find the limits of

(a, b + 1/n], (a, b − 1/n], (a, b + 1/n), (a, b − 1/n)
[a + 1/n, b), [a − 1/n, b), (a + 1/n, b), (a − 1/n, b]

(2) Find the limits of

{x : |x| < 1/n}, {x : |x| ≤ 1/n}, {x : |x| > 1/n}, {x : |x| ≥ 1/n}
{x : |x| < 1 − 1/n}, {x : |x| < 1 + 1/n}, {x : |x| ≤ 1 − 1/n}, {x : |x| ≤ 1 + 1/n}
17.6 Problem. Let (A_i)_{i=1}^∞ be any sequence of sets. Determine the limits of

B_n := ⋃_{i=1}^n A_i,  C_n := ⋂_{i=1}^n A_i

for n → ∞.
The set of all subsets of a set A is the power set of A.
17.7 Problem. Let A be a set with N elements. Explain why the power set contains 2^N elements.

The preceding exercise explains the name of the power set and why the power set of a set A is denoted by 2^A.
Cartesian products
Let A and B be sets. Then the (Cartesian) product AB is the set of all ordered pairs
(a, b) where a A and b B. This notion is extended in an obvious way to products
of any nite or innite collection of sets. We write A
2
:= A A, A
3
:= A A A
etc.
The elements of a product A
n
are lists (vectors) a = (a
1
, a
2
, . . . , a
n
) whose el-
ements a
i
are called components. For every product of sets there are coordinate
functions
X
i
: A
n
A : a = (a
1
, a
2
, . . . , a
n
) a
i
In this way subsets of A
n
can be described by
(X
i
= b) = a A
n
: a
i
= b, (X
i
= b
1
, X
j
= b
2
) = a A
n
: a
i
= b
1
, a
j
= b
2

17.8 Problem.
(1) Let Ω = {0, 1}³. Find (X_1 = 0), (X_1 + X_3 = 1).
(2) Let Ω = {ω_1, . . . , ω_N}. Find the number of elements of Ωⁿ.
(3) Let Ω = {0, 1}ⁿ. Find the number of elements of (max X_i = 1) and (min X_i = 1).
(4) Let Ω = {0, 1}ⁿ. Find the number of elements of (X_1 + · · · + X_n = k).

17.9 Problem.
(1) Let Ω = {0, 1}³. Find 2^Ω.
(2) Let Ω = {0, 1}ⁿ. Find the number of elements in Ω and in 2^Ω.
The symbol A^ℕ denotes the set of all infinite sequences consisting of elements of A.

17.10 Problem. Let Ω = {0, 1}^ℕ. Describe by formula:
(1) The set of all sequences in Ω containing no components = 1.
(2) The set of all sequences in Ω containing at least one component = 1.
(3) The set of all sequences in Ω containing at most finitely many components = 1.
(4) The set of all sequences in Ω containing infinitely many components = 1.
(5) The set of all sequences in Ω where all but at most finitely many components are = 1.
(6) The set of all sequences in Ω where all components are = 1.
Uncountable sets
An infinite set is countable if its elements can be arranged as a sequence. Otherwise it is called uncountable. Two sets A and B are called equivalent (have equal cardinality) if there is a one-to-one correspondence between the elements of A and of B. It is clear that equivalent infinite sets are either both countable or both uncountable.

17.11 Problem.
(1) Explain why Ω = {0, 1}^ℕ is uncountable.
(2) Explain why ℝ is equivalent to Ω = {0, 1}^ℕ and thus uncountable.
(3) Explain why the power set of a countable set is equivalent to Ω = {0, 1}^ℕ and thus uncountable.
17.2 Sets and functions
Let X and Y be non-empty sets.

A function f : X → Y is a set of pairs (x, f(x)) ∈ X × Y such that for every x ∈ X there is exactly one f(x) ∈ Y. X is the domain of f and Y is the range of f.

A function f : X → Y is injective if f(x_1) = f(x_2) implies x_1 = x_2. It is surjective if for every y ∈ Y there is x ∈ X such that f(x) = y. If a function is injective and surjective then it is bijective.

If A ⊂ X then f(A) := {f(x) : x ∈ A} is the image of A under f. If B ⊂ Y then f⁻¹(B) := {x : f(x) ∈ B} is the inverse image of B under f.
17.12 Problem. Show that:
(a) f⁻¹(B_1 ∩ B_2) = f⁻¹(B_1) ∩ f⁻¹(B_2).
(b) f⁻¹(B_1 ∪ B_2) = f⁻¹(B_1) ∪ f⁻¹(B_2).
(c) f⁻¹(Bᶜ) = (f⁻¹(B))ᶜ.
(d) Extend (a) and (b) to families of sets.

17.13 Problem. Show that:
(a) f(A_1 ∪ A_2) = f(A_1) ∪ f(A_2).
(b) f(A_1 ∩ A_2) ⊂ f(A_1) ∩ f(A_2).
(c) Give an example where strict inclusion holds in (b).
(d) Show that for injective functions equality holds in (b).
(e) Extend (a) and (b) to families of sets.

17.14 Problem. Show that:
(a) f(f⁻¹(B)) = f(X) ∩ B.
(b) f⁻¹(f(A)) ⊃ A.
Let f : X → Y and g : Y → Z. Then the composition g ∘ f is the function from X to Z such that (g ∘ f)(x) = g(f(x)).

17.15 Problem. Let f : X → Y and g : Y → Z. Show that (g ∘ f)⁻¹(C) = f⁻¹(g⁻¹(C)), C ⊂ Z.
17.3 The set of real numbers
The set ℝ of real numbers is well-known, at least regarding its basic algebraic operations. Let us talk about topological properties of ℝ.

The following is not intended to be an introduction to the subject, but a checklist which should be well understood; otherwise an introductory textbook has to be consulted.

A subset M ⊂ ℝ is bounded from above if there is an upper bound of M. It is bounded from below if there is a lower bound. It is bounded if it is bounded both from above and from below.

The simplest subsets of ℝ are intervals. There are open intervals (a, b) where the boundary points a and b are not included, closed intervals [a, b] where the boundary points are included, half-open intervals [a, b) or (a, b], and so on. Intervals which are bounded and closed are called compact. Unbounded intervals are written as (a, ∞), (−∞, b], and so on.

If a set M is bounded from above then there is always a uniquely determined least upper bound sup M which is called the supremum of M. This is not a theorem but the completeness axiom. It is an advanced mathematical construction to show that ℝ exists, i.e. a set having the familiar properties of real numbers including completeness.

Any set M which has a maximal element max M is bounded from above since the maximum is an upper bound. The maximum is then also the least upper bound. A set M need not have a maximum. The existence of a maximum is equivalent to sup M ∈ M.

If M is bounded from below then there is a greatest lower bound inf M called the infimum of M.
An (open and connected) neighborhood of x ∈ ℝ is an open interval (a, b) which contains x. Note that neighborhoods can be very small, i.e. can have any length ε > 0.

An (infinite) sequence is a function from ℕ to ℝ, denoted by n ↦ x_n, for short (x_n), where n = 1, 2, . . . When we say that an assertion holds for almost all x_n then we mean that it is true for all x_n beginning with some index N, i.e. for x_n with n ≥ N for some N.

A number x ∈ ℝ is called a limit of (x_n) if every neighborhood of x contains almost all x_n. In other words: the sequence (x_n) converges to x: lim_{n→∞} x_n = x or x_n → x. A sequence can have at most one limit since two different limits could be put into disjoint neighborhoods.

A fundamental property of ℝ is the fact that any bounded increasing sequence has a limit, which implies that every bounded monotone sequence has a limit.

An increasing sequence (x_n) which is not bounded is said to diverge to ∞ (x_n ↑ ∞), i.e. for any a we have x_n > a for almost all x_n. Thus, we can summarize:
An increasing sequence either converges to some real number (iff it is bounded) or diverges to ∞ (iff it is unbounded). A similar assertion holds for decreasing sequences.

A simple fact which is an elementary consequence of the order structure says that every sequence has a monotone subsequence.

Putting terms together we arrive at a very important assertion: every bounded sequence (x_n) has a convergent subsequence. The limit of a subsequence is called an accumulation point of the original sequence (x_n). In other words: every bounded sequence has at least one accumulation point. An accumulation point x can also be explained in the following way: every neighborhood of x contains infinitely many x_n, but not necessarily almost all x_n. A sequence can have many accumulation points, and it need not be bounded to have accumulation points. A sequence has a limit iff it is bounded and has only one accumulation point, which then is necessarily the limit.

If a sequence is bounded from above then the set of accumulation points is also bounded from above. It is a remarkable fact that in this case there is even a maximal accumulation point limsup_{n→∞} x_n called the limit superior. Similarly a sequence bounded from below has a minimal accumulation point liminf_{n→∞} x_n called the limit inferior. A sequence has a limit iff both limit inferior and limit superior exist and are equal.

There is a popular criterion for convergence of a sequence which is related to the assertion just stated. Call a sequence (x_n) a Cauchy sequence if there exist arbitrarily small intervals containing almost all x_n. Clearly every convergent sequence is a Cauchy sequence. But also the converse is true in view of completeness. Indeed, every Cauchy sequence is bounded and can have at most one accumulation point. By completeness it has at least one accumulation point, and is therefore convergent.
The set ℝ̄ = [−∞, ∞] is called the extended real line. If a sequence (x_n) ⊂ ℝ diverges to ∞ then we say that lim_{n→∞} x_n = ∞. If it has a subsequence which diverges to ∞ then we say that limsup_{n→∞} x_n = ∞. In both cases we have sup x_n = ∞.

There is an interesting convergence criterion which is important for martingale theory.

17.16 Theorem. A sequence (x_n) ⊂ ℝ is convergent in ℝ̄ iff it crosses every interval (a, b) at most a finite number of times.

Proof: Note that we always have liminf x_n ≤ limsup x_n, where equality holds iff the sequence is convergent in ℝ̄. Thus, the sequence is not convergent in ℝ̄ iff liminf x_n < limsup x_n. The last inequality means that for any a < b such that liminf x_n < a < b < limsup x_n the interval (a, b) is crossed infinitely often. □
17.4 Real-valued functions
In this section we give an overview of basic facts on real-valued functions as far as these are required for understanding the ideas of integration theory.

Basic definitions

Let Ω ≠ ∅ be any set. For a subset A ⊂ Ω the indicator function of A is defined to be

1_A(x) = 1 if x ∈ A, 0 if x ∉ A.

A function is a simple function if it has only finitely many different values. Every linear combination of indicator functions is a simple function. A linear combination of indicator functions is canonical if the sets supporting the indicators are a partition of Ω and the coefficients are pairwise different.

17.17 Problem. Show that every simple function has a uniquely determined canonical representation.

17.18 Problem. Let f and g be simple functions. Express the canonical representation of f + g in terms of the canonical representations of f and g.

17.19 Problem. Show that the set of all simple functions is a vector space (closed under linear combinations).
Let I ⊂ ℝ be an interval. A simple function f : I → ℝ is a step function if it is a linear combination of indicators of intervals.

17.20 Problem. Show that every step function f : [a, b] → ℝ can be written as

f = Σ_{i=0}^m a_i 1_{{t_i}} + Σ_{i=1}^m b_i 1_{(t_{i−1}, t_i)}

where a = t_0 < t_1 < . . . < t_m = b is a subdivision of [a, b].
A function f : [a, b) → ℝ has a limit from the right at x ∈ [a, b) if for every sequence x_n ↓ x the function values (f(x_n)) converge to a common limit

f(x+) := lim_{n→∞} f(x_n) ∈ ℝ.

A function f : (a, b] → ℝ has a limit from the left at x ∈ (a, b] if for every sequence x_n ↑ x the function values (f(x_n)) converge to a common limit

f(x−) := lim_{n→∞} f(x_n) ∈ ℝ.

A function is regulated on [a, b] if it has limits from the right on [a, b) and from the left on (a, b].

17.21 Problem. Show that step functions are regulated.

Note that function limits need not coincide with function values.

A function is continuous from the right at x ∈ [a, b) if f(x+) = f(x). It is cadlag (continuous from the right with limits from the left) if it is regulated and continuous from the right on [a, b).

A function is continuous from the left at x ∈ (a, b] if f(x−) = f(x). It is caglad (continuous from the left with limits from the right) if it is regulated and continuous from the left on (a, b].

17.22 Problem. Which step functions are cadlag, which are caglad?

A function is continuous at x ∈ (a, b) if f(x+) = f(x) = f(x−).

17.23 Problem. Let f be continuous at x ∈ (a, b). Show that x_n → x implies f(x_n) → f(x).
Many facts of integration theory rely on approximation arguments where complicated functions are approximated by simple functions. There are a lot of different kinds of approximation.

17.24 Definition. A sequence of functions f_n : Ω → ℝ is pointwise convergent to f : Ω → ℝ if

lim_{n→∞} f_n(x) = f(x) for every x ∈ Ω.

A sequence of functions f_n : Ω → ℝ is uniformly convergent to f : Ω → ℝ if

lim_{n→∞} sup_{x∈Ω} |f_n(x) − f(x)| = 0.

It is convenient to define

||f||_u = sup_{x∈Ω} |f(x)|.

This is called the uniform norm (or the norm of uniform convergence).
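The gap between the two modes of convergence in Definition 17.24 shows up already for f_n(x) = xⁿ on [0, 1): the sequence converges pointwise to 0, but ||f_n||_u stays bounded away from 0. A numeric sketch (grid-based, so the supremum is only approximated):

```python
# f_n(x) = x**n on [0, 1): pointwise limit is 0, but convergence is not uniform
xs = [i / 1000 for i in range(1000)]      # grid on [0, 1)

for n in (1, 10, 100, 1000):
    sup_norm = max(x ** n for x in xs)    # approximates ||f_n||_u on the grid
    print(n, sup_norm)

# pointwise: at each fixed x < 1 the values really do go to 0
x = 0.999
print(x ** 10000)                          # tiny
```

No matter how large n is, points just below 1 keep f_n close to 1, so the supremum does not decay even though every single point converges.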
Continuous functions

A function f : [a, b] → ℝ is continuous on [a, b] if it is continuous at every point of [a, b]. Let C([a, b]) be the set of all continuous functions on [a, b].

17.25 Problem. Show that C([a, b]) is a vector space (is closed under linear combinations).

The following assertion is basic. Note that a function is called bounded if the set of its function values is bounded.

17.26 Theorem. A continuous function f : [a, b] → ℝ (defined on a compact interval) is bounded. Moreover, there are points x_max and x_min such that

f(x_max) = max_{x∈[a,b]} f(x) and f(x_min) = min_{x∈[a,b]} f(x).

17.27 Problem. Give an example of an unbounded continuous function defined on a bounded interval.

17.28 Theorem. Every continuous function f : [a, b] → ℝ (defined on a compact interval) is even uniformly continuous, i.e.

lim_{n→∞} sup_{|s−t|<1/n} |f(s) − f(t)| = 0.
Under pointwise convergence continuity properties of the converging sequence are not inherited by the limit.

17.29 Problem. Give an example of a sequence of continuous functions which converges pointwise to a non-continuous limit.

Under uniform convergence continuity properties of the converging sequence are inherited by the limit.

17.30 Theorem. If a sequence of continuous functions (f_n) is uniformly convergent to a limit f then the limit is continuous, too.

Thus we can say that the set C([a, b]) is closed under uniform convergence. Moreover, there are several simple subsets of C([a, b]) which are dense in C([a, b]), i.e. they can approximate every element in C([a, b]) via uniform convergence.

17.31 Theorem. (Weierstrass approximation theorem)
Every continuous function f : [a, b] → ℝ is the uniform limit of some sequence of polynomials.
For integration theory it is more important to know how to approximate continuous
functions by stepfunctions. Let f : [a, b] → R be any function. The basic idea is to
use subdivisions a = t_0 < t_1 < ... < t_k = b and to define linear combinations of the
form

g = Σ_{i=1}^k f(ξ_i) 1_{I_i}

where the intervals I_i form an interval partition of [a, b] with separating points t_i,
and ξ_i ∈ [t_{i−1}, t_i]. Let us call such a stepfunction a Riemannian approximator of f. Of
special importance are left adjusted approximators

g = Σ_{i=1}^k f(t_{i−1}) 1_{I_i}

and right adjusted approximators

g = Σ_{i=1}^k f(t_i) 1_{I_i}

Note that we leave it completely open which kind of intervals I_i (open, closed, half-
open) are used.

A sequence of subdivisions a = t_0 < t_1 < ... < t_{k_n} = b is called a Riemannian
sequence of subdivisions if k_n → ∞ and max_i |t_i − t_{i−1}| → 0.
17.32 Theorem. Let f : [a, b] → R be continuous. Then for every Riemannian
sequence of subdivisions any sequence of Riemannian approximators converges uni-
formly to f.
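A numerical sanity check of this theorem (Python, illustrative only): left adjusted approximators on uniform subdivisions of [0, 2], applied to f = sin; the sup-error shrinks as the mesh does.

```python
import math

def left_step(f, a, b, k):
    # left adjusted Riemannian approximator on the uniform subdivision
    # t_i = a + i*(b - a)/k, using half-open intervals [t_i, t_{i+1})
    def g(x):
        i = min(int((x - a) / (b - a) * k), k - 1)  # index with t_i <= x
        return f(a + i * (b - a) / k)               # left endpoint value
    return g

f = math.sin
xs = [j / 500 * 2.0 for j in range(501)]            # test grid on [0, 2]

def sup_err(k):
    g = left_step(f, 0.0, 2.0, k)
    return max(abs(g(x) - f(x)) for x in xs)

e_coarse, e_fine = sup_err(10), sup_err(1000)       # error falls with the mesh
```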
Regulated functions
17.33 Problem. Show that the set of regulated functions on [a, b] is a vector space.
Things are a bit more complicated with regulated functions than with continuous
functions. The good news is that regulated functions can always be approximated by
stepfunctions with respect to uniform convergence. The less good news is that there is
no universal approximation by Riemannian approximators as there is for continuous
functions.
Let us consider the details.
17.34 Theorem. Let f : [a, b] → R be a regulated function. Then for every ε > 0
there exists a subdivision a = t_0 < t_1 < ... < t_k = b such that

g = Σ_{i=0}^k f(t_i) 1_{{t_i}} + Σ_{i=1}^k f(ξ_i) 1_{(t_{i−1}, t_i)}

with ξ_i ∈ (t_{i−1}, t_i) satisfies ||f − g||_u < ε.
The idea of the preceding assertion is to choose the points of the subdivision in
such a way that between those points the function variation is small.
There are several important consequences.
17.35 Corollary. Every regulated function on a compact interval is bounded.
17.36 Problem. Discuss the question whether a regulated function has a maximum
or minimum.
17.37 Corollary. Every regulated function is the limit of some uniformly convergent
sequence of Riemannian approximators.
17.38 Theorem. Let f : [a, b] → R be any regulated function. Then every se-
quence of left adjusted Riemannian approximators defined on a Riemannian sequence
of subdivisions converges pointwise to the caglad function f_− : t ↦ f(t−).

Note that the limit is f itself if f is continuous from the left.
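The following Python sketch (illustrative, not from the notes) shows this effect for the jump function f = 1_{[1/2, 1]}: at the jump point the left adjusted approximators pick up the left limit f(1/2−) = 0 rather than the value f(1/2) = 1.

```python
def f(x):
    # a regulated function with a jump: not left-continuous at 0.5
    return 1.0 if x >= 0.5 else 0.0

def left_approx(k, x):
    # left adjusted approximator on the uniform subdivision t_i = i/k of [0, 1]
    i = min(int(x * k), k - 1)   # interval index with t_i <= x < t_{i+1}
    return f(i / k)              # left endpoint value f(t_i)

# take k odd so that 0.5 is never a subdivision point; the approximators
# at t = 0.5 then use values f(t_i) with t_i < 0.5, i.e. the left limit 0
vals = [left_approx(k, 0.5) for k in (11, 101, 1001)]
```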
The variation of functions
Let f : [a, b] → R be any function.

17.39 Definition. The variation of f on the interval [s, t] ⊆ [a, b] is

V_s^t(f) = sup Σ_{j=1}^n |f(t_j) − f(t_{j−1})|

where the supremum is taken over all subdivisions s = t_0 < t_1 < ... < t_n = t and all
n ∈ N.

A function f is of bounded variation on [a, b] if V_a^b(f) < ∞. The set of all
functions of bounded variation is denoted by BV([a, b]).
17.40 Problem. Let f be differentiable on [a, b] with continuous derivative. Then
f ∈ BV([a, b]) and

V_a^b(f) = ∫_a^b |f′(u)| du

Hint: Apply the mean value theorem.
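The definition of the variation can be explored numerically. The Python sketch below (illustrative only, not part of the notes) evaluates variation sums of f = sin over uniform subdivisions of [0, 2π]; by the formula of Problem 17.40 the supremum is ∫_0^{2π} |cos u| du = 4.

```python
import math

def var_sum(f, a, b, n):
    # variation sum for the uniform subdivision of [a, b] into n pieces;
    # by Definition 17.39, V_a^b(f) is the supremum of such sums
    ts = [a + i * (b - a) / n for i in range(n + 1)]
    return sum(abs(f(ts[j]) - f(ts[j - 1])) for j in range(1, n + 1))

# refining the (nested) subdivision increases the sum towards V = 4
v_coarse = var_sum(math.sin, 0.0, 2 * math.pi, 10)
v_fine = var_sum(math.sin, 0.0, 2 * math.pi, 2000)
```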
17.41 Problem. Show that BV ([a, b]) is a vector space.
17.42 Problem. Show that monotone functions belong to BV([a, b]) and calculate
their variation.
17.43 Problem. Show that a BV-function f ∈ BV([a, b]) has at most countably
many jumps.
17.44 Problem. Show that any function f ∈ BV([a, b]) can be written as f = g − h
where g, h are increasing and satisfy V_a^t(f) = g(t) + h(t).
Hint: Let g(t) := (V_a^t(f) + f(t))/2 and h(t) := (V_a^t(f) − f(t))/2.
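The hint can be checked numerically. In the Python sketch below (illustrative; V_0^t is only approximated by a variation sum on a fine uniform subdivision) we take f = cos on [0, 6] and verify that g and h are increasing and that f = g − h.

```python
import math

f = math.cos          # our BV-function on [0, 6]

def V(t, n=4000):
    # approximates V_0^t(f) by a variation sum on a fine uniform subdivision
    ts = [i * t / n for i in range(n + 1)]
    return sum(abs(f(ts[j]) - f(ts[j - 1])) for j in range(1, n + 1))

g = lambda t: (V(t) + f(t)) / 2     # increasing part from the hint
h = lambda t: (V(t) - f(t)) / 2     # increasing part from the hint

sample = [0.6 * j for j in range(11)]                  # sample points in [0, 6]
decomposition_holds = all(abs(f(t) - (g(t) - h(t))) < 1e-12 for t in sample)
g_increasing = all(g(s) <= g(t) + 1e-5 for s, t in zip(sample, sample[1:]))
h_increasing = all(h(s) <= h(t) + 1e-5 for s, t in zip(sample, sample[1:]))
```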
17.45 Problem. Which BV-functions are Borel-measurable?
There are continuous functions on compact intervals which are not of bounded
variation. The construction of such functions is complicated.
17.5 Banach spaces
Let V be a vector space.
17.46 Definition. A norm on V is a function v ↦ ||v||, v ∈ V, satisfying the
following conditions:
(1) ||v|| ≥ 0, and ||v|| = 0 ⇔ v = o,
(2) ||v + w|| ≤ ||v|| + ||w||, v, w ∈ V,
(3) ||λv|| = |λ| ||v||, λ ∈ R, v ∈ V.
A pair (V, ||.||) consisting of a vector space V and a norm ||.|| is a normed space.
17.47 Example. (1) V = R is a normed space with ||v|| = |v|.
(2) V = R^d is a normed space under several norms. E.g.

||v||_1 = Σ_{i=1}^d |v_i|, ||v||_2 = (Σ_{i=1}^d v_i^2)^{1/2} (Euclidean norm), ||v||_∞ = max_{1≤i≤d} |v_i|

(3) Let V = C([0, 1]) be the set of all continuous functions f : [0, 1] → R. This is
a vector space. Popular norms on this vector space are

||f||_∞ = max_{0≤s≤1} |f(s)|

and

||f||_1 = ∫_0^1 |f(s)| ds
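For a concrete vector the three norms on R^d can be computed directly; the Python sketch below (illustrative only) also exhibits the chain ||v||_∞ ≤ ||v||_2 ≤ ||v||_1.

```python
v = [3.0, -4.0, 0.0]

norm_1 = sum(abs(x) for x in v)          # ||v||_1 = 3 + 4 + 0
norm_2 = sum(x * x for x in v) ** 0.5    # ||v||_2 (Euclidean norm)
norm_inf = max(abs(x) for x in v)        # ||v||_infinity
```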
The distance of two elements of V is defined to be

d(v, w) := ||v − w||

This function has the usual properties of a distance, in particular it satisfies the triangle
inequality. A set of the form

B(v, r) := {w ∈ V : ||w − v|| < r}

is called an open ball around v with radius r. A sequence (v_n) ⊆ V is convergent
with limit v if ||v_n − v|| → 0.
A sequence (v_n) is a Cauchy sequence if there exist arbitrarily small balls contain-
ing almost all members of the sequence, i.e.

∀ ε > 0 ∃ N(ε) ∈ N such that ||v_n − v_m|| < ε whenever n, m ≥ N(ε)
17.48 Definition. A normed space is a Banach space if it is complete, i.e. if every
Cauchy sequence is convergent.
It is clear that R and R^d are complete under the usual norms. Actually, they are
complete under any norm. The situation is completely different with infinite dimen-
sional normed spaces.
17.49 Problem. Show that C([0, 1]) is complete under ||.||_∞.

17.50 Problem. Show that C([0, 1]) is not complete under ||.||_1.
The latter fact is one of the reasons for extending the notion and the range of the
elementary integral.
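Problem 17.50 can be made concrete: the continuous ramps f_n below form a ||.||_1-Cauchy sequence whose pointwise limit is a discontinuous indicator, which indicates why no continuous ||.||_1-limit can exist. A Python sketch (illustrative only; the integrals are approximated by Riemann sums on a grid):

```python
def ramp(n, x):
    # continuous: 0 on [0, 1/2], linear up to 1 on [1/2, 1/2 + 1/n], then 1
    return min(max(n * (x - 0.5), 0.0), 1.0)

grid = [i / 10000 for i in range(10001)]

def dist_1(n, m):
    # Riemann-sum approximation of ||f_n - f_m||_1
    return sum(abs(ramp(n, x) - ramp(m, x)) for x in grid) / 10000

def dist_sup(n, m):
    return max(abs(ramp(n, x) - ramp(m, x)) for x in grid)

d1 = dist_1(100, 200)      # small: (f_n) is ||.||_1-Cauchy
dsup = dist_sup(100, 200)  # stays near 1/2: no uniform limit in C([0, 1])
```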
17.6 Hilbert spaces
A special class of normed spaces are inner product spaces. Let V be a vector space.
17.51 Definition. An inner product on V is a function (v, w) ↦ < v, w >, v, w ∈ V,
satisfying the following conditions:
(1) (v, w) ↦ < v, w > is symmetric and linear in both variables,
(2) < v, v > ≥ 0, and < v, v > = 0 ⇔ v = o.
A pair (V, < ., . >) consisting of a vector space V and an inner product < ., . > is
an inner product space.
An inner product gives rise to a norm according to

||v|| := (< v, v >)^{1/2}, v ∈ V.

17.52 Problem. Show that ||v|| := (< v, v >)^{1/2} is a norm.
17.53 Example. (1) V = R is an inner product space with < v, w > = vw. The
corresponding norm is ||v|| = |v|.
(2) V = R^d is an inner product space with

< v, w > = Σ_{i=1}^d v_i w_i

The corresponding norm is ||v||_2.
(3) Let V = C([0, 1]) be the set of all continuous functions f : [0, 1] → R. This is
an inner product space with

< f, g > = ∫_0^1 f(s)g(s) ds

The corresponding norm is

||f||_2 = (∫_0^1 f(s)^2 ds)^{1/2}
17.54 Definition. An inner product space is a Hilbert space if it is complete under
the norm defined by the inner product.

17.55 Problem. Show that C([0, 1]) is not complete under ||.||_2.
Inner product spaces have a geometric structure which is very similar to that of R^d
endowed with the Euclidean inner product. In particular, the notions of orthogonality
and of projections are available on inner product spaces. The existence of orthogonal
projections depends on completeness, and therefore requires Hilbert spaces.
17.56 Problem. Let C be a closed convex subset of a Hilbert space (V, < ., . >)
and let v ∉ C. Show that there exists v_0 ∈ C such that

||v − v_0|| = min{||v − w|| : w ∈ C}

Hint: Let δ := inf{||v − w|| : w ∈ C} and choose a sequence (w_n) ⊆ C such that
||v − w_n|| → δ. Apply the parallelogram equality to show that (w_n) is a Cauchy
sequence.
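Both ingredients of this problem can be illustrated in the Hilbert space R^3 (Python sketch, not part of the notes): the parallelogram equality ||v + w||^2 + ||v − w||^2 = 2||v||^2 + 2||w||^2 used in the hint, and the explicit minimizer v_0 = v/||v|| when C is the closed unit ball.

```python
def dot(v, w):
    return sum(x * y for x, y in zip(v, w))

def norm(v):
    return dot(v, v) ** 0.5

v = [2.0, 1.0, -2.0]       # ||v|| = 3, so v lies outside the unit ball
w = [0.5, -1.0, 0.25]

# parallelogram equality: ||v + w||^2 + ||v - w||^2 = 2||v||^2 + 2||w||^2
lhs = (norm([a + b for a, b in zip(v, w)]) ** 2
       + norm([a - b for a, b in zip(v, w)]) ** 2)
rhs = 2 * norm(v) ** 2 + 2 * norm(w) ** 2

# projection onto the closed unit ball C: v_0 = v / ||v||, at distance ||v|| - 1
v0 = [x / norm(v) for x in v]
dist = norm([a - b for a, b in zip(v, v0)])
```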