Probability and Statistics

Joe Ó hÓgáin
E-mail: johog@maths.tcd.ie

Main Text: Kreyszig, Advanced Engineering Mathematics
Other Texts: Schaum Series, Robert B. Ash, Hayter
Online Notes:
Probability Function

Definition: The sample space S of an experiment is the set of all possible
outcomes.

Definition: An event is a subset of S.

Examples:

Let P(S) be the set of all events in S i.e. the collection of all
subsets of S.

Definition: A probability function on S assigns to each event A a number
P(A) ≥ 0 such that P(S) = 1 and P(A ∪ B) = P(A) + P(B) whenever
A ∩ B = ∅.

Theorem 1: P(∅) = 0.
Proof: P(S) = P(S ∪ ∅) = P(S) + P(∅), so P(∅) = 0.

Theorem 2: P(A^c) = 1 − P(A).
Proof: 1 = P(S) = P(A ∪ A^c) = P(A) + P(A^c).

Theorem 3: P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Proof: A ∪ B = A ∪ (B − A) and B = (A ∩ B) ∪ (B − A), both disjoint
unions, so P(A ∪ B) = P(A) + P(B − A) = P(A) + P(B) − P(A ∩ B).

Theorem 4: A ⊆ B implies P(A) ≤ P(B).
Proof: P(B) = P(A) + P(B − A) and P(B − A) ≥ 0.

If all outcomes in S are equally likely, then P(A) = |A|/|S|.
Example: Draw a card from a pack of 52. Let A be the event that it is a
heart and B the event that it is a picture card. Then P(A) = 13/52,
P(B) = 12/52 and P(A ∩ B) = 3/52, so
P(A ∪ B) = 13/52 + 12/52 − 3/52 = 22/52.
If S = {x1, x2, ..., xn} is a finite sample space in which the outcome
xi has probability pi, given for example by a distribution table, then
for any event A
P(A) = Σ_{xi ∈ A} pi, where Σ_{i=1}^{n} pi = 1.
Example: Toss a coin until the first head H appears. Then
S = {H, TH, TTH, ...} and P(first H on the nth throw) = 1/2^n, e.g.
1/2 + 1/4 + 1/8 = 7/8 = the probability of a head within the first
three throws. The probability of H on an even throw is
1/2² + 1/2^4 + 1/2^6 + ... = (1/2²)/(1 − 1/2²) = 1/3,
and the probability of H on an odd throw is
1/2 + 1/2³ + 1/2^5 + ... = (1/2)/(1 − 1/2²) = 2/3.
Conditional Probability

Let A, E be events in S with P(E) ≠ 0.

Definition: The conditional probability of A given E is defined by
P(A|E) = P(A ∩ E)/P(E).

If all outcomes are equally likely this becomes
P(A|E) = (|A ∩ E|/|S|)/(|E|/|S|) = |A ∩ E|/|E|.

Example: Pair of dice. S = {(1, 1), (1, 2), ..., (6, 6)}. Find
the probability that one die shows 2 given that the sum is 6.
Let A be the event that one die shows 2 and E the event that the sum
is 6, i.e. E = {(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)}. Then
A ∩ E = {(2, 4), (4, 2)}, so P(A|E) = |A ∩ E|/|E| = 2/5.

Note that, in general, P(A ∩ E) = P(E)P(A|E).
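The dice computation above is easy to verify by enumerating all 36 equally
likely outcomes; the following small Python sketch (not part of the original
notes) counts them directly:

```python
from fractions import Fraction
from itertools import product

# Sample space for a pair of dice: all 36 ordered pairs.
S = list(product(range(1, 7), repeat=2))

E = [s for s in S if sum(s) == 6]    # event: the sum is 6
A_and_E = [s for s in E if 2 in s]   # outcomes of E in which one die shows 2

# P(A|E) = |A n E| / |E| for equally likely outcomes.
p = Fraction(len(A_and_E), len(E))
print(p)  # 2/5
```
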
Example: Let P(A) = 0.6, P(B) = 0.3 and P(A ∩ B) = 0.2. Then
P(A|B) = 0.2/0.3 = 2/3, P(B|A) = 0.2/0.6 = 1/3,
P(A ∪ B) = 0.6 + 0.3 − 0.2 = 0.7, P(A^c) = 1 − 0.6 = 0.4,
P(B^c) = 1 − 0.3 = 0.7, P(A^c ∩ B^c) = P((A ∪ B)^c) = 1 − 0.7 = 0.3,
P(A^c|B^c) = 0.3/0.7 = 3/7 and P(B^c|A^c) = 0.3/0.4 = 3/4.
Example: An experiment records whether each of two events D and A occurs.
The sample space may be taken as {(D, A), (D, A^c), (D^c, A), (D^c, A^c)}.
Let B = {(D, A), (D, A^c)} (D occurs) and C = {(D, A), (D^c, A)} (A occurs),
with P(B) = 0.8, P(C) = 0.9 and P(C ∩ B) = 0.78.

(i) P(C|B) = P(C ∩ B)/P(B) = 0.78/0.8 = 0.975.

(ii) P(C^c|B^c) = P(C^c ∩ B^c)/P(B^c) = P((C ∪ B)^c)/P(B^c)
= (1 − P(C ∪ B))/(1 − P(B))
= (1 − (0.8 + 0.9 − 0.78))/(1 − 0.8) = 0.08/0.2 = 0.4.
Let A1, A2, ..., An be mutually exclusive events with
S = A1 ∪ A2 ∪ ... ∪ An, and let E be any event. Then
P(E) = Σ_{i=1}^{n} P(E ∩ Ai) = Σ_{i=1}^{n} P(Ai)P(E|Ai)
(the law of total probability), and for each j
P(Aj|E) = P(Aj ∩ E)/P(E) = P(Aj)P(E|Aj)/P(E)
        = P(Aj)P(E|Aj) / Σ_{i=1}^{n} P(Ai)P(E|Ai)
(Bayes' theorem).
(ii) P(A1|D) = P(A1)P(D|A1)/P(D) = (0.5)(0.03)/0.037 ≈ 0.405 = 40.5%.
Example: In a population of 300, P(A) = 138/300 and P(A^c) = 162/300.
Also P(B|A) = 27/138, P(B^c|A) = 111/138, P(B|A^c) = 21/162 and
P(B^c|A^c) = 141/162. Therefore
P(B) = P(A)P(B|A) + P(A^c)P(B|A^c)
     = (138/300)(27/138) + (162/300)(21/162) = 48/300,
and
P(A|B) = P(A)P(B|A)/P(B) = (138/300)(27/138)/(48/300) = 27/48,
which is 56.25%.
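The total-probability and Bayes steps above can be verified with exact
fractions; a minimal Python sketch (the variable names and the reading of A
and B as two attributes of the 300 people are mine, not the notes'):

```python
from fractions import Fraction

# Numbers from the example: 138 of 300 have attribute A, 27 of those
# also have B; of the remaining 162, 21 have B.
P_A    = Fraction(138, 300)
P_Ac   = Fraction(162, 300)
P_B_A  = Fraction(27, 138)   # P(B|A)
P_B_Ac = Fraction(21, 162)   # P(B|A^c)

# Total probability, then Bayes' theorem.
P_B   = P_A * P_B_A + P_Ac * P_B_Ac
P_A_B = P_A * P_B_A / P_B
print(P_B, P_A_B)  # 4/25 9/16
```

Fraction reduces automatically, so 48/300 prints as 4/25 and 27/48 as 9/16.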
Exercise:
Independent Events

Definition: A and B are independent if P(A ∩ B) = P(A)P(B).

If A and B are independent then
P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A),
i.e. knowing that B has occurred does not change the probability of A.

Example: For the card example above, P(A) = 13/52, P(B) = 12/52 and
P(A ∩ B) = 3/52. Hence P(A ∩ B) = P(A)P(B), since
(13/52)(12/52) = 3/52, so A and B are independent.

Example: Suppose P(A) = 4/8, P(B) = 4/8, P(C) = 2/8, P(A ∩ B) = 2/8,
P(A ∩ C) = 1/8 and P(B ∩ C) = 2/8. Then A and B are independent and
A and C are independent, but B and C are not, since
P(B)P(C) = 1/8 ≠ 2/8 = P(B ∩ C).
(ii) Obvious by addition: for example, (1/6)(1/2) = 1/12, etc. This gives
the same probability function as assuming that all triples are
equiprobable, as before. Hence we can consider the problem in either of
the two ways.
Counting Techniques

Suppose we have n objects. How many permutations of size r,
1 ≤ r < n, can be made? Using the Fundamental Principle of Counting
the answer is n!/(n − r)!, which is written nPr.

How many combinations of size r, 1 ≤ r < n, can be made?
Let the answer be nCr. Each of these combinations gives r!
permutations, so nCr · r! = nPr. Hence nCr = nPr/r! = n!/((n − r)! r!).

Example: The probability that n people all have different birthdays is
365Pn/(365)^n.
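The birthday formula above leads to the classic question of how large a
group must be before a shared birthday is more likely than not. A short
Python sketch (assuming Python 3.8+ for math.perm):

```python
from math import perm

# P(n people all have different birthdays) = 365Pn / 365^n.
def p_all_different(n: int) -> float:
    return perm(365, n) / 365 ** n

# Smallest n for which a shared birthday is more likely than not.
n = 1
while p_all_different(n) > 0.5:
    n += 1
print(n)  # 23
```
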
Random Variables

Definition: A random variable X on a sample space S is a function
X : S → R. If X takes the values x1, x2, ..., xn, its probability
function is f(xk) = P(X = xk), which must satisfy

(i) f(xk) ≥ 0 for all k,

(ii) Σ_{k=1}^{n} f(xk) = 1.

If we let f(xk) = fk / Σ_{i=1}^{n} fi, then any frequency distribution
with frequencies fk can be treated as a probability distribution.

Note:

Exercise:
Definition: Let X be a finite random variable with probability
function f. The mean or expectation of X is
E(X) = μ = Σ_{k=1}^{n} xk f(xk).

This is the same definition as the mean in the case of a frequency
distribution.

Example: Throw a pair of dice. Let X be the larger of the two numbers
shown and Y their sum. Then
E(X) = 1·(1/36) + 2·(3/36) + ... + 6·(11/36) = 4.47,
E(Y) = 2·(1/36) + 3·(2/36) + ... + 12·(1/36) = 7.

Example: A game has six equally likely outcomes, with winnings of 2, 3
and 5 and losses of 1, 4 and 6. Then
E(X) = 2·(1/6) + 3·(1/6) + 5·(1/6) − 1·(1/6) − 4·(1/6) − 6·(1/6) = −1/6.
Don't play!
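Both dice expectations can be computed exactly by summing over the 36
equally likely outcomes; a Python sketch with exact fractions:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely pairs
p = Fraction(1, 36)

def expect(g):
    """E(g) = sum of g(a, b) * P((a, b)) over all outcomes."""
    return sum(p * g(a, b) for a, b in outcomes)

E_X = expect(lambda a, b: max(a, b))  # larger of the two numbers
E_Y = expect(lambda a, b: a + b)     # sum of the two numbers
print(E_X, E_Y)  # 161/36 7
```

161/36 ≈ 4.47, matching the rounded value in the notes.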
Definition: X a finite R.V. The variance of X is defined by
var(X) = E((X − μ)²) = (x1 − μ)²f(x1) + (x2 − μ)²f(x2) + ... + (xn − μ)²f(xn)
       = Σ_{k=1}^{n} (xk − μ)² f(xk).

The standard deviation of X is σ = √var(X), so var(X) = σ²; when the
context is clear we just write σ.

Note:
var(X) = E((X − μ)²) = Σ_{k=1}^{n} (xk − μ)² f(xk)
       = Σ_{k=1}^{n} xk² f(xk) − 2μ Σ_{k=1}^{n} xk f(xk) + μ² Σ_{k=1}^{n} f(xk)
       = E(X²) − 2μ² + μ²
       = E(X²) − μ².

Example: For the pair of dice above,
E(X²) = Σ_{k=1}^{6} xk² f(xk) = 1²·(1/36) + 2²·(3/36) + ... + 6²·(11/36),
E(Y²) = Σ_{k=1}^{11} yk² f(yk) = 2²·(1/36) + 3²·(2/36) + ... + 12²·(1/36) = 54.8.
Theorem: (i) E(aX) = aE(X), (ii) E(X + b) = E(X) + b.

Proof: (i)
E(aX) = Σ_{k=1}^{n} a xk f(xk) = a Σ_{k=1}^{n} xk f(xk) = aE(X).

(ii)
E(X + b) = Σ_{k=1}^{n} (xk + b) f(xk)
         = Σ_{k=1}^{n} xk f(xk) + b Σ_{k=1}^{n} f(xk) = E(X) + b.

Hence E(aX + b) = E(aX) + b = aE(X) + b.
Theorem: (i) var(aX) = a² var(X), (ii) var(X + b) = var(X).

Proof: (i) Since E(aX) = aμ,
var(aX) = E((aX − aμ)²) = a² E((X − μ)²) = a² var(X).

(ii) Since E(X + b) = μ + b,
var(X + b) = E((X + b − (μ + b))²) = E((X − μ)²) = var(X).

Definition: The standardized random variable corresponding to X is
Z = (X − μ)/σ.
Then E(Z) = E((X − μ)/σ) = (1/σ)(E(X) − μ) = 0 and
var(Z) = (1/σ²) var(X) = 1.
Definition: The distribution function of X is
F(x) = P(X ≤ x) = Σ_{xk ≤ x} f(xk),
where the sum is over all values xk of X with xk ≤ x.

Example: Suppose X takes the values −2, 1, 2, 4 with probabilities
1/4, 1/8, 1/2, 1/8 respectively. Then F(−2) = 1/4, F(1) = 3/8,
F(2) = 7/8, F(4) = 1. F(x) is obvious for all other x.
Example: A fair coin is tossed six times. Find the probability of
(i) exactly four heads, (ii) at least four heads, (iii) at least one
head.

(i) f(4) = 6C4 (1/2)^4 (1/2)² = 15/64.

(ii) f(4) + f(5) + f(6)
= 6C4 (1/2)^4 (1/2)² + 6C5 (1/2)^5 (1/2)¹ + 6C6 (1/2)^6 = 22/64.

(iii) 1 − f(0) = 1 − (1/2)^6 = 63/64.

Example: If the probability of success in each trial is 1/3, then the
probability of at least one success in seven trials is
1 − (2/3)^7 = 0.94.
Example: Find the least number of throws of a fair die needed to have
a better than even chance of at least one six.
Here p = 1/6, q = 5/6. Let n be the required number. Then
1 − (5/6)^n > 1/2, so (5/6)^n < 1/2,
or n ln(5/6) < ln(1/2), i.e. n > ln(1/2)/ln(5/6) = 3.8 (the inequality
reverses because ln(5/6) < 0). Hence n = 4.
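The manipulation above, including the sign flip when dividing by
ln(5/6) < 0, can be sketched in Python:

```python
from math import ceil, log

# Need 1 - (5/6)**n > 1/2, i.e. (5/6)**n < 1/2. Taking logs,
# n*ln(5/6) < ln(1/2); since ln(5/6) < 0, dividing flips the
# inequality: n > ln(1/2)/ln(5/6).
bound = log(1 / 2) / log(5 / 6)
n = ceil(bound)
print(round(bound, 2), n)  # 3.8 4
```
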
Example: Each item produced is defective with probability 0.2. In a
sample of 10 items, find the probability of (i) none, (ii) one,
(iii) two defectives.

(i) f(0) = (0.8)^10,

(ii) f(1) = 10C1 (0.2)¹(0.8)^9,

(iii) f(2) = 10C2 (0.2)²(0.8)^8.
If X is a continuous random variable and there is a function f with
P(a ≤ X ≤ b) = ∫_a^b f(x) dx for all a < b, where a, b ∈ S, then f is
called the probability distribution function (p.d.f.) or density
function for X.

f must satisfy f(x) ≥ 0 for all x and ∫_{−∞}^{∞} f(x) dx = 1. Note
that P(X = a) = P(a ≤ X ≤ a) = ∫_a^a f(x) dx = 0. We
define E(X) = ∫_{−∞}^{∞} x f(x) dx, and as before
var(X) = E((X − μ)²) = E(X²) − μ².

Example: Let
f(x) = x/2 for 0 ≤ x ≤ 2, and f(x) = 0 for x < 0 or 2 < x.
Then
P(1 ≤ X ≤ 1.5) = ∫_1^{1.5} (x/2) dx = 5/16,
E(X) = ∫_0^2 x f(x) dx = ∫_0^2 (x²/2) dx = 4/3,
E(X²) = ∫_0^2 (x³/2) dx = 2,
so var(X) = 2 − (4/3)² = 2/9 and σ = √2/3.

The distribution function is F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt.
If x < 0, then F(x) = 0.
If 0 ≤ x ≤ 2, then F(x) = ∫_0^x (t/2) dt = x²/4.
If 2 < x, then F(x) = ∫_0^2 (t/2) dt = 1, as we would
expect!
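The integrals in this example are simple enough to check numerically; a
Python sketch using the midpoint rule (the helper names are mine):

```python
# Numerical check of the density f(x) = x/2 on [0, 2].
def f(x):
    return x / 2 if 0 <= x <= 2 else 0.0

def integrate(g, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

total = integrate(f, 0, 2)                  # should be 1
prob = integrate(f, 1, 1.5)                 # P(1 <= X <= 1.5) = 5/16
mean = integrate(lambda x: x * f(x), 0, 2)  # E(X) = 4/3
print(total, prob, mean)
```
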
For a large set of data we can plot the relative frequencies of the
classes, e.g. 50/1,000 = 0.05 etc., with Σ(rel. freqs.) = 1. The graph
is now a histogram.

The most important continuous distribution is the normal distribution,
with density
f(x) = (1/(σ√(2π))) e^{−(1/2)((x−μ)/σ)²}.
We say that X is N(μ, σ²). The graph of f is bell-shaped, symmetric
about x = μ, and the bigger the σ the wider the graph of f(x) is.
Theorem: (i) ∫_{−∞}^{∞} f(x) dx = 1, (ii) E(X) = μ, (iii) var(X) = σ².

Proof: With the substitution u = (x − μ)/σ each integral reduces to a
standard one, e.g.
∫_{−∞}^{∞} e^{−(1/2)((x−μ)/σ)²} dx = σ ∫_{−∞}^{∞} e^{−u²/2} du = σ√(2π).

For probabilities we need
P(a ≤ X ≤ b) = F(b) − F(a) = (1/(σ√(2π))) ∫_a^b e^{−(1/2)((v−μ)/σ)²} dv.
These integrals can't be found analytically, so they are tabulated
numerically. This would have to be done for all values of μ and σ, so
instead we use the standardized variable Z = (X − μ)/σ.

Consider F(x) = (1/(σ√(2π))) ∫_{−∞}^{x} e^{−(1/2)((v−μ)/σ)²} dv. Put
u = (v − μ)/σ; then dv = σ du, and when v = x, u = (x − μ)/σ. The
density of the standard normal N(0, 1) is (1/√(2π)) e^{−z²/2} and its
distribution function is denoted Φ.

Therefore F(x) = (1/√(2π)) ∫_{−∞}^{(x−μ)/σ} e^{−u²/2} du = Φ((x − μ)/σ).

Hence P(a ≤ X ≤ b) = F(b) − F(a) = Φ((b − μ)/σ) − Φ((a − μ)/σ).
Tables give Φ(z) for z ≥ 0 only; for negative values we use
Φ(−z) = 1 − Φ(z). There are three cases:

(i) 0 ≤ a < b:
P(a ≤ Z ≤ b) = Φ(b) − Φ(a).

(ii) a < 0 < b:
P(a ≤ Z ≤ b) = Φ(b) − Φ(a)
             = Φ(b) − (1 − Φ(−a))
             = Φ(b) + Φ(−a) − 1.

(iii) a < b < 0:
P(a ≤ Z ≤ b) = Φ(b) − Φ(a)
             = 1 − Φ(−b) − (1 − Φ(−a))
             = Φ(−a) − Φ(−b).
Example: Find (i) P(Z ≥ 2.44), (ii) P(Z ≤ −1.16), (iii) P(Z ≥ 1),
(iv) P(2 ≤ Z ≤ 10).

(i) P(Z ≥ 2.44) = 1 − Φ(2.44) = 1 − 0.9927 = 0.0073.

(ii) P(Z ≤ −1.16) = 1 − Φ(1.16) = 1 − 0.8770 = 0.1230.

(iii) P(Z ≥ 1) = 1 − Φ(1) = 1 − 0.8413 = 0.1587.

(iv) P(2 ≤ Z ≤ 10) = Φ(10) − Φ(2) = 1 − 0.9772 = 0.0228.
Example: Find c such that (i) P(Z ≥ c) = 10%, (ii) P(Z ≤ c) = 5%,
(iii) P(0 ≤ Z ≤ c) = 45%, (iv) P(−c ≤ Z ≤ c) = 99%.

(i) P(Z ≥ c) = 1 − P(Z ≤ c) = 1 − Φ(c), so 1 − Φ(c) = 0.1,
i.e. Φ(c) = 0.9 and c = 1.28.

(ii) Φ(c) = 0.05, so Φ(−c) = 0.95 and c = −1.645.

(iii) Φ(c) − Φ(0) = 0.45, so Φ(c) = 0.95 and c = 1.645.

(iv) Φ(c) − Φ(−c) = 2Φ(c) − 1 = 0.99, so Φ(c) = 0.995 and c = 2.575.
Example: Suppose X is N(0.8, 4), so μ = 0.8 and σ = 2. Find
(i) P(X ≥ 2.44), (ii) P(X ≤ −1.16), (iii) P(X ≥ 1), (iv) P(2 ≤ X ≤ 10).

(i) P(X ≥ 2.44) = 1 − Φ((2.44 − 0.8)/2) = 1 − Φ(0.82) = 0.2061.

(ii) P(X ≤ −1.16) = Φ((−1.16 − 0.8)/2) = Φ(−0.98) = 0.1635.

(iii) P(X ≥ 1) = 1 − P(X ≤ 1) = 1 − F(1) = 1 − Φ((1 − 0.8)/2)
= 1 − Φ(0.1) = 0.4602.

(iv) P(2 ≤ X ≤ 10) = F(10) − F(2) = Φ((10 − 0.8)/2) − Φ((2 − 0.8)/2)
= Φ(4.6) − Φ(0.6) = 0.2743.
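Since Φ(z) = (1/2)(1 + erf(z/√2)), the table look-ups in this example can
be reproduced with the standard library's math.erf; a short Python sketch:

```python
from math import erf, sqrt

def Phi(z: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 0.8, 2.0
p2 = Phi((-1.16 - mu) / sigma)                       # P(X <= -1.16)
p3 = 1 - Phi((1 - mu) / sigma)                       # P(X >= 1)
p4 = Phi((10 - mu) / sigma) - Phi((2 - mu) / sigma)  # P(2 <= X <= 10)
print(round(p2, 4), round(p3, 4), round(p4, 4))  # 0.1635 0.4602 0.2743
```
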
Example: A javelin thrower's distances are N(17, 4) in metres, so
μ = 17 and σ = 2. Find (i) the probability of a throw of more than
18.5 m, (ii) the distance d that the throw will exceed with 90%
probability.

(i) P(X > 18.5) = 1 − P(X ≤ 18.5) = 1 − F(18.5)
= 1 − Φ((18.5 − 17)/2) = 1 − Φ(0.75) = 0.2266.

(ii) We need P(X > d) = 0.9, i.e. 1 − Φ((d − 17)/2) = 0.9.
Hence Φ((d − 17)/2) = 0.1, so (d − 17)/2 = −1.28 and d ≈ 14.4 m.
Example: A lifetime is N(15, 6.25) years, so μ = 15 and σ = 2.5. Find
the probability of a lifetime of (i) 10 years or less, (ii) between 16
and 20 years.

(i) P(X ≤ 10) = Φ((10 − 15)/2.5) = Φ(−2) = 1 − Φ(2) = 0.0228 = 2.28%.

(ii) P(16 ≤ X ≤ 20) = Φ((20 − 15)/2.5) − Φ((16 − 15)/2.5)
= Φ(2) − Φ(0.4) = 0.9772 − 0.6554 = 0.3218.
Let X and Y be random variables on the same sample space S, taking the
values x1, ..., xn and y1, ..., ym respectively. Their joint probability
function is h(xi, yj) = P(X = xi, Y = yj), and Σ_{i,j} h(xi, yj) = 1.

Let Ai be the event X = xi and Bj the event Y = yj. Since
S = ∪_{j=1}^{m} Bj, we have
Ai = Ai ∩ S = Ai ∩ (∪_{j=1}^{m} Bj) = ∪_{j=1}^{m} (Ai ∩ Bj),
so that
f(xi) = P(Ai) = Σ_{j=1}^{m} P(Ai ∩ Bj) = Σ_{j=1}^{m} h(xi, yj),
and similarly
g(yj) = Σ_{i=1}^{n} h(xi, yj).
f and g are called the marginal distributions of X and Y.

Example: Throw a pair of dice. Let X(a, b) be the larger of a and b
and Y(a, b) = a + b.

Definition: X and Y are independent if h(xi, yj) = f(xi)g(yj)
for all i, j.
This means that P(X = xi, Y = yj) = P(X = xi)P(Y = yj)
or P(Ai ∩ Bj) = P(Ai)P(Bj) for all i, j.
Note that in the above example X and Y are not independent.

If G : R² → R, then we define a R.V. G(X, Y) on S by
G(X, Y)(s) = G(X(s), Y(s)).

Definition: E(G(X, Y)) = Σ_{i,j} G(xi, yj) h(xi, yj). For example,
(X + Y)(s) = X(s) + Y(s) and E(X + Y) = Σ_{i,j} (xi + yj) h(xi, yj), etc.
Theorem: (i) E(X + Y) = E(X) + E(Y).
(ii) If X and Y are independent, then E(XY) = E(X)E(Y).

Proof: (i)
E(X + Y) = Σ_{i,j} (xi + yj) h(xi, yj)
         = Σ_i Σ_j xi h(xi, yj) + Σ_i Σ_j yj h(xi, yj)
         = Σ_i xi Σ_j h(xi, yj) + Σ_j yj Σ_i h(xi, yj)
         = Σ_i xi f(xi) + Σ_j yj g(yj)
         = E(X) + E(Y).

(ii)
E(XY) = Σ_{i,j} xi yj h(xi, yj) = Σ_{i,j} xi yj f(xi) g(yj)
      = (Σ_i xi f(xi))(Σ_j yj g(yj)) = E(X)E(Y).
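Part (i) holds even without independence, which the dice example can
confirm: X (the larger number) and Y (the sum) are not independent, yet
E(X + Y) = E(X) + E(Y). A Python sketch with exact fractions:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely pairs
p = Fraction(1, 36)

def expect(g):
    return sum(p * g(a, b) for a, b in outcomes)

E_X = expect(lambda a, b: max(a, b))
E_Y = expect(lambda a, b: a + b)
E_sum = expect(lambda a, b: max(a, b) + (a + b))
print(E_X, E_Y, E_sum)  # 161/36 7 413/36
```
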
Sampling Theory
Suppose that we have an infinite or very large finite sample
space S. This sample space is often called a population. Getting information about the total population may be difficult,
so we consider much smaller subsets of the population, called
samples. We want to get information about the population
by studying the samples. We consider the samples to be random samples i.e. each element of the population has the same
probability of being in a sample.
Example:

Definition: For a random sample X1, X2, ..., Xn, the sample mean is
X̄ = (X1 + X2 + ... + Xn)/n
and the sample variance is
S² = Σ_{i=1}^{n} (Xi − X̄)² / (n − 1).
For observed values x1, x2, ..., xn these are computed as
x̄ = (x1 + x2 + ... + xn)/n and s² = Σ_{i=1}^{n} (xi − x̄)² / (n − 1).

Theorem: (i) The expectation of X̄ is E(X̄) = μ, the population mean.
(ii) The variance of X̄ is σ²_X̄ = σ²/n.

Proof: (i)
E(X̄) = E((X1 + X2 + ... + Xn)/n) = (μ + μ + ... + μ)/n = nμ/n = μ.

(ii)
σ²_X̄ = var(X̄) = var((X1 + X2 + ... + Xn)/n)
     = var(X1/n) + var(X2/n) + ... + var(Xn/n)
     = var(X1)/n² + var(X2)/n² + ... + var(Xn)/n²
     = nσ²/n² = σ²/n.
Theorem: E(S²) = σ².

Proof:
E(S²) = E(Σ_{i=1}^{n} (Xi − X̄)² / (n − 1))
      = (1/(n − 1)) E(Σ_{i=1}^{n} (Xi² − 2Xi X̄ + X̄²))
      = (1/(n − 1)) [Σ_{i=1}^{n} E(Xi²) − 2E((Σ_{i=1}^{n} Xi)X̄) + nE(X̄²)]
      = (1/(n − 1)) [n(μ² + σ²) − 2nE(X̄²) + nE(X̄²)]
      = (1/(n − 1)) [n(μ² + σ²) − nE(X̄²)]
      = (1/(n − 1)) [n(μ² + σ²) − n(μ² + σ²/n)]
      = (1/(n − 1)) [(n − 1)σ²] = σ²,
using Σ_{i=1}^{n} Xi = nX̄, E(Xi²) = μ² + σ² and E(X̄²) = μ² + σ²/n.
Note: If the mean or expectation of a statistic is equal to
the corresponding parameter, the statistic is called an unbiased
estimator of the parameter. Hence X̄ and S² are unbiased estimators of
μ and σ² respectively. An estimate of a population parameter given by a
single number is called a point estimate e.g. if we take a sample of
size n and calculate
x̄ = (x1 + x2 + ... + xn)/n and s² = Σ_{i=1}^{n} (xi − x̄)²/(n − 1),
we get point estimates of μ and σ². The statistic
X̄ = (X1 + X2 + ... + Xn)/n itself has mean μ and variance σ²/n.
As before, X1, X2, ..., Xn are jointly distributed random variables
defined on the product sample space. We have the very important result:
Central Limit Theorem: If X1, X2, ..., Xn is a random sample from a
population with mean μ and variance σ², then (X̄ − μ)/(σ/√n) is
approximately N(0, 1). The larger the n the better the approximation.

Note: If the population itself is normal, this holds exactly for all
values of n.

Recall that for N(0, 1) we have P(−1.96 ≤ Z ≤ 1.96) = 95%.
Example: A sample of size 100 is taken from a population with unknown
mean and variance 9. Determine a 95% confidence interval for μ if the
sample mean is 5.

Here x̄ = 5, σ = 3 and n = 100.
P(X̄ − 1.96 σ/√n ≤ μ ≤ X̄ + 1.96 σ/√n) = 95%
P(5 − 1.96(3/10) ≤ μ ≤ 5 + 1.96(3/10)) = 95%,
giving the 95% confidence interval 4.412 ≤ μ ≤ 5.588.

In general,
P(X̄ − z_{α/2} σ/√n ≤ μ ≤ X̄ + z_{α/2} σ/√n) = 1 − α
gives a (1 − α)100% confidence interval for μ.
Example: Five observations 28, 24, 31, 27, 22 are taken from a
population with known σ² = 4.84. Find a 99% confidence interval for μ.
x̄ = (28 + 24 + 31 + 27 + 22)/5 = 26.4 and σ = √4.84 = 2.2. Then
P(26.4 − 2.575(2.2/√5) ≤ μ ≤ 26.4 + 2.575(2.2/√5)) = 99%.

The length of the interval
(X̄ − z_{α/2} σ/√n, X̄ + z_{α/2} σ/√n) is 2 z_{α/2} σ/√n.

Example: How large must n be so that a 95% confidence interval for μ,
with σ = 3, has length at most 0.4? We need 2(1.96)(3)/√n ≤ 0.4, which
gives √n ≥ 2(1.96)(3)/0.4 = 29.4, or n = 865.
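The sample-size calculation can be sketched in Python (the variable names
are mine):

```python
from math import ceil, sqrt

z, sigma, max_length = 1.96, 3.0, 0.4

# The 95% interval has length 2*z*sigma/sqrt(n); solve for the
# smallest integer n that keeps it within max_length.
n = ceil((2 * z * sigma / max_length) ** 2)
print(n)  # 865
```
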
In all the previous examples we knew σ², the population variance. If
that is not so and n ≥ 30, we can use S² as a point estimate for σ².

Example: A sample of size 121 gives x̄ = 14.5 and s = 2. Determine a
(i) 95%, (ii) 99% confidence interval for μ. Here √n = 11.

(i) 14.5 − 1.96(2/11) ≤ μ ≤ 14.5 + 1.96(2/11), i.e.
14.14 ≤ μ ≤ 14.86.

(ii) 14.5 − 2.575(2/11) ≤ μ ≤ 14.5 + 2.575(2/11), i.e.
14.03 ≤ μ ≤ 14.97.

Note that the greater the confidence the greater the interval.

If n is small (< 30) this is not very accurate, even if the original
X is normal. In this case we must use the following:
Theorem: If X is normally distributed, then
(X̄ − μ)/(S/√n)
has a t-distribution with n − 1 degrees of freedom.

We denote the number of degrees of freedom n − 1 by ν. For each ν the
t-distribution is a symmetric bell-shaped distribution.

Example: A sample of 20 observations from a normal population gives
x̄ = 15.5 and s = 0.3. With ν = 19 the t-tables give
P(−2.861 ≤ (X̄ − μ)/(S/√20) ≤ 2.861) = 99%, so
P(15.5 − 2.861(0.3/√20) ≤ μ ≤ 15.5 + 2.861(0.3/√20)) = 99%.

Example: Five measurements of the flashpoint of diesel oil gave the
results 144, 147, 146, 142, 144. Assuming normality, determine a
(i) 95%, (ii) 99% confidence interval for the mean flashpoint.

Since n < 30 we must apply the t-distribution. n = 5, so ν = 4. We
have x̄ = (144 + 147 + 146 + 142 + 144)/5 = 144.6. Also s² = 3.8, so
s = 1.949.

(i) P(144.6 − 2.776(1.949/√5) ≤ μ ≤ 144.6 + 2.776(1.949/√5)) = 95%.

(ii) P(144.6 − 4.604(1.949/√5) ≤ μ ≤ 144.6 + 4.604(1.949/√5)) = 99%.
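The flashpoint statistics can be reproduced with the standard statistics
module (stdev already uses the n − 1 divisor, matching the sample
variance defined above):

```python
from math import sqrt
from statistics import mean, stdev

data = [144, 147, 146, 142, 144]
n = len(data)
xbar = mean(data)   # 144.6
s = stdev(data)     # sample standard deviation, sqrt(3.8)

# 95% interval using t = 2.776 for nu = n - 1 = 4 degrees of freedom.
t = 2.776
low, high = xbar - t * s / sqrt(n), xbar + t * s / sqrt(n)
print(xbar, round(s, 3), round(low, 2), round(high, 2))
```
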
Hypothesis Testing

Suppose that a claim is made about some parameter of a population, in
our case always the population mean μ. This claim is called the null
hypothesis and is denoted by H0. Any claim that differs from this is
called an alternative hypothesis, denoted by H1. We must test H0
against H1.

Example: H0 : μ = 90.

If H0 is true then, for samples of size n, X̄ is (approximately)
N(μ0, σ²/n) with μ0 = 90, and Z = (X̄ − μ0)/(σ/√n) is N(0, 1). There is
a 5% probability that X̄ is in either of the two end regions of
N(μ0, σ²/n) or, equivalently, that Z is in the corresponding end
regions of N(0, 1), i.e. |Z| > 1.96. If our observed value of Z is in
this rejection region we reject H0 at the 5% significance level;
otherwise we do not reject it. This is a two-tailed test.
Example: A manufacturer claims that its batteries have an average life
of 1,000 hours. In a sample of 100 batteries it was found that x̄ = 985
hours and s = 30 hours. Test the hypothesis H0 : μ = 1,000 hours
against the alternative hypothesis H1 : μ ≠ 1,000 hours at the 5%
significance level, assuming that the lifetime of the batteries is
normally distributed.

n = 100 > 30 so we can take S for σ. If μ = 1,000, then
Z = (X̄ − μ)/(S/√n) is approximately N(0, 1), and its observed value is
(985 − 1,000)/(30/√100) = −5,
which is in the rejection region |z| > 1.96, so we reject H0.
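The battery test statistic is a one-line computation; a Python sketch:

```python
from math import sqrt

xbar, mu0, s, n = 985, 1_000, 30, 100

# Observed value of Z = (Xbar - mu0) / (S / sqrt(n)).
z = (xbar - mu0) / (s / sqrt(n))
print(z)  # -5.0

# Two-tailed test at the 5% significance level: reject H0 if |z| > 1.96.
print(abs(z) > 1.96)  # True
```
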
(i) If μ = 6.6, then Z = (X̄ − μ)/(S/√n) is approximately N(0, 1), and
its observed value is
(6.1 − 6.6)/(2.5/√100) = −2,
which is in the rejection region |z| > 1.96, so H0 is rejected at the
5% level.

(ii) If μ = 6.6, then the observed value of Z is again
(6.1 − 6.6)/(2.5/√100) = −2, which is not in the rejection region
|z| > 2.575, so H0 is not rejected at the 1% level.
Example: A manufacturer produces bulbs that are supposed to burn with
a mean life of at least 3,000 hours. The standard deviation is 500
hours. A sample of 100 bulbs is taken and the sample mean is found to
be 2,800 hours. Test the hypothesis H0 : μ ≥ 3,000 hours against the
alternative H1 : μ < 3,000 hours at the 5% significance level.

In this case if our X̄ value is greater than 3,000 we do not reject it
since it agrees with H0, so we are only interested in extreme values on
the left. We use a one-tailed test. Again Z = (X̄ − μ)/(σ/√n) is
approximately N(0, 1), and its observed value is
(2,800 − 3,000)/(500/√100) = −4,
which is in the one-tailed 5% rejection region z < −1.645, so we
reject H0.

If n is small we use the fact that (X̄ − μ)/(S/√n) has a t-distribution
with n − 1 degrees of freedom.

Example: A sample of 25 observations has x̄ = 197 and s = 6; test
H0 : μ = 200. The observed value of t is
(197 − 200)/(6/√25) = −2.5, which is in the rejection region
|t| > 2.064 for a two-tailed 5% test with ν = 24, so we reject H0.
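The t statistic in this last example can be checked the same way; a Python
sketch (the rejection decision still needs the t-table value, which is not
computed here):

```python
from math import sqrt

xbar, mu0, s, n = 197, 200, 6, 25

# Observed value of t = (Xbar - mu0) / (S / sqrt(n)), with nu = n - 1 = 24.
t = (xbar - mu0) / (s / sqrt(n))
print(round(t, 2))  # -2.5
```
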