Probability and Statistics

Joe Ó hÓgáin
E-mail: johog@maths.tcd.ie

Main Text: Kreyszig, Advanced Engineering Mathematics
Other Texts: Schaum Series, Robert B. Ash, Hayter
Online Notes:
Probability Function

Definition: The sample space S of an experiment is the set of all possible
outcomes.

Definition: An event is a subset of S.

Examples:

Let P(S) be the set of all events in S i.e. the collection of all
subsets of S.

Definition: A probability function on S assigns to each event A a number
P(A) ≥ 0 such that P(S) = 1 and P(A ∪ B) = P(A) + P(B) whenever
A ∩ B = ∅.

Theorem 1: P(∅) = 0.
Proof: P(S) = P(S ∪ ∅) = P(S) + P(∅), so P(∅) = 0.

Theorem 2: P(A^c) = 1 − P(A).
Proof: 1 = P(S) = P(A ∪ A^c) = P(A) + P(A^c).

Theorem 3: P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Proof: A ∪ B = A ∪ (B − A) and B = (A ∩ B) ∪ (B − A), both disjoint
unions, so P(A ∪ B) = P(A) + P(B − A) = P(A) + P(B) − P(A ∩ B).

Theorem 4: A ⊆ B implies P(A) ≤ P(B).
Proof: P(B) = P(A) + P(B − A) and P(B − A) ≥ 0.

If all outcomes in S are equally likely, then P(A) = |A|/|S|.
Example: Draw a card from a pack of 52. Let A be the event that it is a
heart and B the event that it is a picture card. Then P(A) = 13/52,
P(B) = 12/52 and P(A ∩ B) = 3/52, so
P(A ∪ B) = 13/52 + 12/52 − 3/52 = 22/52.
If S = {x1, x2, ..., xn} is a finite sample space in which the outcome
xi has probability pi, given for example by a distribution table, then
for any event A
P(A) = Σ_{xi ∈ A} pi, where Σ_{i=1}^{n} pi = 1.
Example: Toss a coin until the first head H appears. Then
S = {H, TH, TTH, ...} and P(first H on the nth throw) = 1/2^n, e.g.
1/2 + 1/4 + 1/8 = 7/8 = the probability of a head within the first
three throws. The probability of H on an even throw is
1/2² + 1/2^4 + 1/2^6 + ... = (1/2²)/(1 − 1/2²) = 1/3,
and the probability of H on an odd throw is
1/2 + 1/2³ + 1/2^5 + ... = (1/2)/(1 − 1/2²) = 2/3.
Conditional Probability

Let A, E be events in S with P(E) ≠ 0.

Definition: The conditional probability of A given E is defined by
P(A|E) = P(A ∩ E)/P(E).

If all outcomes are equally likely this becomes
P(A|E) = (|A ∩ E|/|S|)/(|E|/|S|) = |A ∩ E|/|E|.

Example: Pair of dice. S = {(1, 1), (1, 2), ..., (6, 6)}. Find
the probability that one die shows 2 given that the sum is 6.
Let A be the event that one die shows 2 and E the event that the sum
is 6, i.e. E = {(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)}. Then
A ∩ E = {(2, 4), (4, 2)}, so P(A|E) = |A ∩ E|/|E| = 2/5.

Note that, in general, P(A ∩ E) = P(E)P(A|E).
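The dice computation above is easy to verify by enumerating all 36 equally
likely outcomes; the following small Python sketch (not part of the original
notes) counts them directly:

```python
from fractions import Fraction
from itertools import product

# Sample space for a pair of dice: all 36 ordered pairs.
S = list(product(range(1, 7), repeat=2))

E = [s for s in S if sum(s) == 6]    # event: the sum is 6
A_and_E = [s for s in E if 2 in s]   # outcomes of E in which one die shows 2

# P(A|E) = |A n E| / |E| for equally likely outcomes.
p = Fraction(len(A_and_E), len(E))
print(p)  # 2/5
```
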
Example: Let P(A) = 0.6, P(B) = 0.3 and P(A ∩ B) = 0.2. Then
P(A|B) = 0.2/0.3 = 2/3, P(B|A) = 0.2/0.6 = 1/3,
P(A ∪ B) = 0.6 + 0.3 − 0.2 = 0.7, P(A^c) = 1 − 0.6 = 0.4,
P(B^c) = 1 − 0.3 = 0.7, P(A^c ∩ B^c) = P((A ∪ B)^c) = 1 − 0.7 = 0.3,
P(A^c|B^c) = 0.3/0.7 = 3/7 and P(B^c|A^c) = 0.3/0.4 = 3/4.
Example: An experiment records whether each of two events D and A occurs.
The sample space may be taken as {(D, A), (D, A^c), (D^c, A), (D^c, A^c)}.
Let B = {(D, A), (D, A^c)} (D occurs) and C = {(D, A), (D^c, A)} (A occurs),
with P(B) = 0.8, P(C) = 0.9 and P(C ∩ B) = 0.78.

(i) P(C|B) = P(C ∩ B)/P(B) = 0.78/0.8 = 0.975.

(ii) P(C^c|B^c) = P(C^c ∩ B^c)/P(B^c) = P((C ∪ B)^c)/P(B^c)
= (1 − P(C ∪ B))/(1 − P(B))
= (1 − (0.8 + 0.9 − 0.78))/(1 − 0.8) = 0.08/0.2 = 0.4.
Let A1, A2, ..., An be mutually exclusive events with
S = A1 ∪ A2 ∪ ... ∪ An, and let E be any event. Then
P(E) = Σ_{i=1}^{n} P(E ∩ Ai) = Σ_{i=1}^{n} P(Ai)P(E|Ai)
(the law of total probability), and for each j
P(Aj|E) = P(Aj ∩ E)/P(E) = P(Aj)P(E|Aj)/P(E)
        = P(Aj)P(E|Aj) / Σ_{i=1}^{n} P(Ai)P(E|Ai)
(Bayes' theorem).
(ii) P(A1|D) = P(A1)P(D|A1)/P(D) = (0.5)(0.03)/0.037 ≈ 0.405 = 40.5%.
Example: In a population of 300, P(A) = 138/300 and P(A^c) = 162/300.
Also P(B|A) = 27/138, P(B^c|A) = 111/138, P(B|A^c) = 21/162 and
P(B^c|A^c) = 141/162. Therefore
P(B) = P(A)P(B|A) + P(A^c)P(B|A^c)
     = (138/300)(27/138) + (162/300)(21/162) = 48/300,
and
P(A|B) = P(A)P(B|A)/P(B) = (138/300)(27/138)/(48/300) = 27/48,
which is 56.25%.
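The total-probability and Bayes steps above can be verified with exact
fractions; a minimal Python sketch (the variable names and the reading of A
and B as two attributes of the 300 people are mine, not the notes'):

```python
from fractions import Fraction

# Numbers from the example: 138 of 300 have attribute A, 27 of those
# also have B; of the remaining 162, 21 have B.
P_A    = Fraction(138, 300)
P_Ac   = Fraction(162, 300)
P_B_A  = Fraction(27, 138)   # P(B|A)
P_B_Ac = Fraction(21, 162)   # P(B|A^c)

# Total probability, then Bayes' theorem.
P_B   = P_A * P_B_A + P_Ac * P_B_Ac
P_A_B = P_A * P_B_A / P_B
print(P_B, P_A_B)  # 4/25 9/16
```

Fraction reduces automatically, so 48/300 prints as 4/25 and 27/48 as 9/16.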
Exercise:
Independent Events

Definition: A and B are independent if P(A ∩ B) = P(A)P(B).

If A and B are independent then
P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A),
i.e. knowing that B has occurred does not change the probability of A.

Example: For the card example above, P(A) = 13/52, P(B) = 12/52 and
P(A ∩ B) = 3/52. Hence P(A ∩ B) = P(A)P(B), since
(13/52)(12/52) = 3/52, so A and B are independent.

Example: Suppose P(A) = 4/8, P(B) = 4/8, P(C) = 2/8, P(A ∩ B) = 2/8,
P(A ∩ C) = 1/8 and P(B ∩ C) = 2/8. Then A and B are independent and
A and C are independent, but B and C are not, since
P(B)P(C) = 1/8 ≠ 2/8 = P(B ∩ C).
(ii) Obvious by addition: for example, (1/6)(1/2) = 1/12, etc. This gives
the same probability function as assuming that all triples are
equiprobable, as before. Hence we can consider the problem in either of
the two ways.
Counting Techniques

Suppose we have n objects. How many permutations of size r,
1 ≤ r < n, can be made? Using the Fundamental Principle of Counting
the answer is n!/(n − r)!, which is written nPr.

How many combinations of size r, 1 ≤ r < n, can be made?
Let the answer be nCr. Each of these combinations gives r!
permutations, so nCr · r! = nPr. Hence nCr = nPr/r! = n!/((n − r)! r!).

Example: The probability that n people all have different birthdays is
365Pn/(365)^n.
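The birthday formula above leads to the classic question of how large a
group must be before a shared birthday is more likely than not. A short
Python sketch (assuming Python 3.8+ for math.perm):

```python
from math import perm

# P(n people all have different birthdays) = 365Pn / 365^n.
def p_all_different(n: int) -> float:
    return perm(365, n) / 365 ** n

# Smallest n for which a shared birthday is more likely than not.
n = 1
while p_all_different(n) > 0.5:
    n += 1
print(n)  # 23
```
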
Random Variables

Definition: A random variable X on a sample space S is a function
X : S → R. If X takes the values x1, x2, ..., xn, its probability
function is f(xk) = P(X = xk), which must satisfy

(i) f(xk) ≥ 0 for all k,

(ii) Σ_{k=1}^{n} f(xk) = 1.

If we let f(xk) = fk / Σ_{i=1}^{n} fi, then any frequency distribution
with frequencies fk can be treated as a probability distribution.

Note:

Exercise:
Definition: Let X be a finite random variable with probability
function f. The mean or expectation of X is
E(X) = μ = Σ_{k=1}^{n} xk f(xk).

This is the same definition as the mean in the case of a frequency
distribution.

Example: Throw a pair of dice. Let X be the larger of the two numbers
shown and Y their sum. Then
E(X) = 1·(1/36) + 2·(3/36) + ... + 6·(11/36) = 4.47,
E(Y) = 2·(1/36) + 3·(2/36) + ... + 12·(1/36) = 7.

Example: A game has six equally likely outcomes, with winnings of 2, 3
and 5 and losses of 1, 4 and 6. Then
E(X) = 2·(1/6) + 3·(1/6) + 5·(1/6) − 1·(1/6) − 4·(1/6) − 6·(1/6) = −1/6.
Don't play!
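Both dice expectations can be computed exactly by summing over the 36
equally likely outcomes; a Python sketch with exact fractions:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely pairs
p = Fraction(1, 36)

def expect(g):
    """E(g) = sum of g(a, b) * P((a, b)) over all outcomes."""
    return sum(p * g(a, b) for a, b in outcomes)

E_X = expect(lambda a, b: max(a, b))  # larger of the two numbers
E_Y = expect(lambda a, b: a + b)     # sum of the two numbers
print(E_X, E_Y)  # 161/36 7
```

161/36 ≈ 4.47, matching the rounded value in the notes.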
Definition: X a finite R.V. The variance of X is defined by
var(X) = E((X − μ)²) = (x1 − μ)²f(x1) + (x2 − μ)²f(x2) + ... + (xn − μ)²f(xn)
       = Σ_{k=1}^{n} (xk − μ)² f(xk).

The standard deviation of X is σ = √var(X), so var(X) = σ²; when the
context is clear we just write σ.

Note:
var(X) = E((X − μ)²) = Σ_{k=1}^{n} (xk − μ)² f(xk)
       = Σ_{k=1}^{n} xk² f(xk) − 2μ Σ_{k=1}^{n} xk f(xk) + μ² Σ_{k=1}^{n} f(xk)
       = E(X²) − 2μ² + μ²
       = E(X²) − μ².

Example: For the pair of dice above,
E(X²) = Σ_{k=1}^{6} xk² f(xk) = 1²·(1/36) + 2²·(3/36) + ... + 6²·(11/36),
E(Y²) = Σ_{k=1}^{11} yk² f(yk) = 2²·(1/36) + 3²·(2/36) + ... + 12²·(1/36) = 54.8.
Theorem: (i) E(aX) = aE(X), (ii) E(X + b) = E(X) + b.

Proof: (i)
E(aX) = Σ_{k=1}^{n} a xk f(xk) = a Σ_{k=1}^{n} xk f(xk) = aE(X).

(ii)
E(X + b) = Σ_{k=1}^{n} (xk + b) f(xk)
         = Σ_{k=1}^{n} xk f(xk) + b Σ_{k=1}^{n} f(xk) = E(X) + b.

Hence E(aX + b) = E(aX) + b = aE(X) + b.
Theorem: (i) var(aX) = a² var(X), (ii) var(X + b) = var(X).

Proof: (i) Since E(aX) = aμ,
var(aX) = E((aX − aμ)²) = a² E((X − μ)²) = a² var(X).

(ii) Since E(X + b) = μ + b,
var(X + b) = E((X + b − (μ + b))²) = E((X − μ)²) = var(X).

Definition: The standardized random variable corresponding to X is
Z = (X − μ)/σ.
Then E(Z) = E((X − μ)/σ) = (1/σ)(E(X) − μ) = 0 and
var(Z) = (1/σ²) var(X) = 1.
Definition: The distribution function of X is
F(x) = P(X ≤ x) = Σ_{xk ≤ x} f(xk),
where the sum is over all values xk of X with xk ≤ x.

Example: Suppose X takes the values −2, 1, 2, 4 with probabilities
1/4, 1/8, 1/2, 1/8 respectively. Then F(−2) = 1/4, F(1) = 3/8,
F(2) = 7/8, F(4) = 1. F(x) is obvious for all other x.
Example: A fair coin is tossed six times. Find the probability of
(i) exactly four heads, (ii) at least four heads, (iii) at least one
head.

(i) f(4) = 6C4 (1/2)^4 (1/2)² = 15/64.

(ii) f(4) + f(5) + f(6)
= 6C4 (1/2)^4 (1/2)² + 6C5 (1/2)^5 (1/2)¹ + 6C6 (1/2)^6 = 22/64.

(iii) 1 − f(0) = 1 − (1/2)^6 = 63/64.

Example: If the probability of success in each trial is 1/3, then the
probability of at least one success in seven trials is
1 − (2/3)^7 = 0.94.
Example: Find the least number of throws of a fair die needed to have
a better than even chance of at least one six.
Here p = 1/6, q = 5/6. Let n be the required number. Then
1 − (5/6)^n > 1/2, so (5/6)^n < 1/2,
or n ln(5/6) < ln(1/2), i.e. n > ln(1/2)/ln(5/6) = 3.8 (the inequality
reverses because ln(5/6) < 0). Hence n = 4.
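The manipulation above, including the sign flip when dividing by
ln(5/6) < 0, can be sketched in Python:

```python
from math import ceil, log

# Need 1 - (5/6)**n > 1/2, i.e. (5/6)**n < 1/2. Taking logs,
# n*ln(5/6) < ln(1/2); since ln(5/6) < 0, dividing flips the
# inequality: n > ln(1/2)/ln(5/6).
bound = log(1 / 2) / log(5 / 6)
n = ceil(bound)
print(round(bound, 2), n)  # 3.8 4
```
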
Example: Each item produced is defective with probability 0.2. In a
sample of 10 items, find the probability of (i) none, (ii) one,
(iii) two defectives.

(i) f(0) = (0.8)^10,

(ii) f(1) = 10C1 (0.2)¹(0.8)^9,

(iii) f(2) = 10C2 (0.2)²(0.8)^8.
If X is a continuous random variable and there is a function f with
P(a ≤ X ≤ b) = ∫_a^b f(x) dx for all a < b, where a, b ∈ S, then f is
called the probability distribution function (p.d.f.) or density
function for X.

f must satisfy f(x) ≥ 0 for all x and ∫_{−∞}^{∞} f(x) dx = 1. Note
that P(X = a) = P(a ≤ X ≤ a) = ∫_a^a f(x) dx = 0. We
define E(X) = ∫_{−∞}^{∞} x f(x) dx, and as before
var(X) = E((X − μ)²) = E(X²) − μ².

Example: Let
f(x) = x/2 for 0 ≤ x ≤ 2, and f(x) = 0 for x < 0 or 2 < x.
Then
P(1 ≤ X ≤ 1.5) = ∫_1^{1.5} (x/2) dx = 5/16,
E(X) = ∫_0^2 x f(x) dx = ∫_0^2 (x²/2) dx = 4/3,
E(X²) = ∫_0^2 (x³/2) dx = 2,
so var(X) = 2 − (4/3)² = 2/9 and σ = √2/3.

The distribution function is F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt.
If x < 0, then F(x) = 0.
If 0 ≤ x ≤ 2, then F(x) = ∫_0^x (t/2) dt = x²/4.
If 2 < x, then F(x) = ∫_0^2 (t/2) dt = 1, as we would
expect!
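The integrals in this example are simple enough to check numerically; a
Python sketch using the midpoint rule (the helper names are mine):

```python
# Numerical check of the density f(x) = x/2 on [0, 2].
def f(x):
    return x / 2 if 0 <= x <= 2 else 0.0

def integrate(g, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

total = integrate(f, 0, 2)                  # should be 1
prob = integrate(f, 1, 1.5)                 # P(1 <= X <= 1.5) = 5/16
mean = integrate(lambda x: x * f(x), 0, 2)  # E(X) = 4/3
print(total, prob, mean)
```
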
For a large set of data we can plot the relative frequencies of the
classes, e.g. 50/1,000 = 0.05 etc., with Σ(rel. freqs.) = 1. The graph
is now a histogram.

The most important continuous distribution is the normal distribution,
with density
f(x) = (1/(σ√(2π))) e^{−(1/2)((x−μ)/σ)²}.
We say that X is N(μ, σ²). The graph of f is bell-shaped, symmetric
about x = μ, and the bigger the σ the wider the graph of f(x) is.
Theorem: (i) ∫_{−∞}^{∞} f(x) dx = 1, (ii) E(X) = μ, (iii) var(X) = σ².

Proof: With the substitution u = (x − μ)/σ each integral reduces to a
standard one, e.g.
∫_{−∞}^{∞} e^{−(1/2)((x−μ)/σ)²} dx = σ ∫_{−∞}^{∞} e^{−u²/2} du = σ√(2π).

For probabilities we need
P(a ≤ X ≤ b) = F(b) − F(a) = (1/(σ√(2π))) ∫_a^b e^{−(1/2)((v−μ)/σ)²} dv.
These integrals can't be found analytically, so they are tabulated
numerically. This would have to be done for all values of μ and σ, so
instead we use the standardized variable Z = (X − μ)/σ.

Consider F(x) = (1/(σ√(2π))) ∫_{−∞}^{x} e^{−(1/2)((v−μ)/σ)²} dv. Put
u = (v − μ)/σ; then dv = σ du, and when v = x, u = (x − μ)/σ. The
density of the standard normal N(0, 1) is (1/√(2π)) e^{−z²/2} and its
distribution function is denoted Φ.

Therefore F(x) = (1/√(2π)) ∫_{−∞}^{(x−μ)/σ} e^{−u²/2} du = Φ((x − μ)/σ).

Hence P(a ≤ X ≤ b) = F(b) − F(a) = Φ((b − μ)/σ) − Φ((a − μ)/σ).
Tables give Φ(z) for z ≥ 0 only; for negative values we use
Φ(−z) = 1 − Φ(z). There are three cases:

(i) 0 ≤ a < b:
P(a ≤ Z ≤ b) = Φ(b) − Φ(a).

(ii) a < 0 < b:
P(a ≤ Z ≤ b) = Φ(b) − Φ(a)
             = Φ(b) − (1 − Φ(−a))
             = Φ(b) + Φ(−a) − 1.

(iii) a < b < 0:
P(a ≤ Z ≤ b) = Φ(b) − Φ(a)
             = 1 − Φ(−b) − (1 − Φ(−a))
             = Φ(−a) − Φ(−b).
Example: Find (i) P(Z ≥ 2.44), (ii) P(Z ≤ −1.16), (iii) P(Z ≥ 1),
(iv) P(2 ≤ Z ≤ 10).

(i) P(Z ≥ 2.44) = 1 − Φ(2.44) = 1 − 0.9927 = 0.0073.

(ii) P(Z ≤ −1.16) = 1 − Φ(1.16) = 1 − 0.8770 = 0.1230.

(iii) P(Z ≥ 1) = 1 − Φ(1) = 1 − 0.8413 = 0.1587.

(iv) P(2 ≤ Z ≤ 10) = Φ(10) − Φ(2) = 1 − 0.9772 = 0.0228.
Example: Find c such that (i) P(Z ≥ c) = 10%, (ii) P(Z ≤ c) = 5%,
(iii) P(0 ≤ Z ≤ c) = 45%, (iv) P(−c ≤ Z ≤ c) = 99%.

(i) P(Z ≥ c) = 1 − P(Z ≤ c) = 1 − Φ(c), so 1 − Φ(c) = 0.1,
i.e. Φ(c) = 0.9 and c = 1.28.

(ii) Φ(c) = 0.05, so Φ(−c) = 0.95 and c = −1.645.

(iii) Φ(c) − Φ(0) = 0.45, so Φ(c) = 0.95 and c = 1.645.

(iv) Φ(c) − Φ(−c) = 2Φ(c) − 1 = 0.99, so Φ(c) = 0.995 and c = 2.575.
Example: Suppose X is N(0.8, 4), so μ = 0.8 and σ = 2. Find
(i) P(X ≥ 2.44), (ii) P(X ≤ −1.16), (iii) P(X ≥ 1), (iv) P(2 ≤ X ≤ 10).

(i) P(X ≥ 2.44) = 1 − Φ((2.44 − 0.8)/2) = 1 − Φ(0.82) = 0.2061.

(ii) P(X ≤ −1.16) = Φ((−1.16 − 0.8)/2) = Φ(−0.98) = 0.1635.

(iii) P(X ≥ 1) = 1 − P(X ≤ 1) = 1 − F(1) = 1 − Φ((1 − 0.8)/2)
= 1 − Φ(0.1) = 0.4602.

(iv) P(2 ≤ X ≤ 10) = F(10) − F(2) = Φ((10 − 0.8)/2) − Φ((2 − 0.8)/2)
= Φ(4.6) − Φ(0.6) = 0.2743.
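Since Φ(z) = (1/2)(1 + erf(z/√2)), the table look-ups in this example can
be reproduced with the standard library's math.erf; a short Python sketch:

```python
from math import erf, sqrt

def Phi(z: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 0.8, 2.0
p2 = Phi((-1.16 - mu) / sigma)                       # P(X <= -1.16)
p3 = 1 - Phi((1 - mu) / sigma)                       # P(X >= 1)
p4 = Phi((10 - mu) / sigma) - Phi((2 - mu) / sigma)  # P(2 <= X <= 10)
print(round(p2, 4), round(p3, 4), round(p4, 4))  # 0.1635 0.4602 0.2743
```
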
Example: A javelin thrower's distances are N(17, 4) in metres, so
μ = 17 and σ = 2. Find (i) the probability of a throw of more than
18.5 m, (ii) the distance d that the throw will exceed with 90%
probability.

(i) P(X > 18.5) = 1 − P(X ≤ 18.5) = 1 − F(18.5)
= 1 − Φ((18.5 − 17)/2) = 1 − Φ(0.75) = 0.2266.

(ii) We need P(X > d) = 0.9, i.e. 1 − Φ((d − 17)/2) = 0.9.
Hence Φ((d − 17)/2) = 0.1, so (d − 17)/2 = −1.28 and d ≈ 14.4 m.
Example: A lifetime is N(15, 6.25) years, so μ = 15 and σ = 2.5. Find
the probability of a lifetime of (i) 10 years or less, (ii) between 16
and 20 years.

(i) P(X ≤ 10) = Φ((10 − 15)/2.5) = Φ(−2) = 1 − Φ(2) = 0.0228 = 2.28%.

(ii) P(16 ≤ X ≤ 20) = Φ((20 − 15)/2.5) − Φ((16 − 15)/2.5)
= Φ(2) − Φ(0.4) = 0.9772 − 0.6554 = 0.3218.
Let X and Y be random variables on the same sample space S, taking the
values x1, ..., xn and y1, ..., ym respectively. Their joint probability
function is h(xi, yj) = P(X = xi, Y = yj), and Σ_{i,j} h(xi, yj) = 1.

Let Ai be the event X = xi and Bj the event Y = yj. Since
S = ∪_{j=1}^{m} Bj, we have
Ai = Ai ∩ S = Ai ∩ (∪_{j=1}^{m} Bj) = ∪_{j=1}^{m} (Ai ∩ Bj),
so that
f(xi) = P(Ai) = Σ_{j=1}^{m} P(Ai ∩ Bj) = Σ_{j=1}^{m} h(xi, yj),
and similarly
g(yj) = Σ_{i=1}^{n} h(xi, yj).
f and g are called the marginal distributions of X and Y.

Example: Throw a pair of dice. Let X(a, b) be the larger of a and b
and Y(a, b) = a + b.

Definition: X and Y are independent if h(xi, yj) = f(xi)g(yj)
for all i, j.
This means that P(X = xi, Y = yj) = P(X = xi)P(Y = yj)
or P(Ai ∩ Bj) = P(Ai)P(Bj) for all i, j.
Note that in the above example X and Y are not independent.

If G : R² → R, then we define a R.V. G(X, Y) on S by
G(X, Y)(s) = G(X(s), Y(s)).

Definition: E(G(X, Y)) = Σ_{i,j} G(xi, yj) h(xi, yj). For example,
(X + Y)(s) = X(s) + Y(s) and E(X + Y) = Σ_{i,j} (xi + yj) h(xi, yj), etc.
Theorem: (i) E(X + Y) = E(X) + E(Y).
(ii) If X and Y are independent, then E(XY) = E(X)E(Y).

Proof: (i)
E(X + Y) = Σ_{i,j} (xi + yj) h(xi, yj)
         = Σ_i Σ_j xi h(xi, yj) + Σ_i Σ_j yj h(xi, yj)
         = Σ_i xi Σ_j h(xi, yj) + Σ_j yj Σ_i h(xi, yj)
         = Σ_i xi f(xi) + Σ_j yj g(yj)
         = E(X) + E(Y).

(ii)
E(XY) = Σ_{i,j} xi yj h(xi, yj) = Σ_{i,j} xi yj f(xi) g(yj)
      = (Σ_i xi f(xi))(Σ_j yj g(yj)) = E(X)E(Y).
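Part (i) holds even without independence, which the dice example can
confirm: X (the larger number) and Y (the sum) are not independent, yet
E(X + Y) = E(X) + E(Y). A Python sketch with exact fractions:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely pairs
p = Fraction(1, 36)

def expect(g):
    return sum(p * g(a, b) for a, b in outcomes)

E_X = expect(lambda a, b: max(a, b))
E_Y = expect(lambda a, b: a + b)
E_sum = expect(lambda a, b: max(a, b) + (a + b))
print(E_X, E_Y, E_sum)  # 161/36 7 413/36
```
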
Sampling Theory
Suppose that we have an infinite or very large finite sample
space S. This sample space is often called a population. Getting information about the total population may be difficult,
so we consider much smaller subsets of the population, called
samples. We want to get information about the population
by studying the samples. We consider the samples to be random samples i.e. each element of the population has the same
probability of being in a sample.
Example:

Definition: For a random sample X1, X2, ..., Xn, the sample mean is
X̄ = (X1 + X2 + ... + Xn)/n
and the sample variance is
S² = Σ_{i=1}^{n} (Xi − X̄)² / (n − 1).
For observed values x1, x2, ..., xn these are computed as
x̄ = (x1 + x2 + ... + xn)/n and s² = Σ_{i=1}^{n} (xi − x̄)² / (n − 1).

Theorem: (i) The expectation of X̄ is E(X̄) = μ, the population mean.
(ii) The variance of X̄ is σ²_X̄ = σ²/n.

Proof: (i)
E(X̄) = E((X1 + X2 + ... + Xn)/n) = (μ + μ + ... + μ)/n = nμ/n = μ.

(ii)
σ²_X̄ = var(X̄) = var((X1 + X2 + ... + Xn)/n)
     = var(X1/n) + var(X2/n) + ... + var(Xn/n)
     = var(X1)/n² + var(X2)/n² + ... + var(Xn)/n²
     = nσ²/n² = σ²/n.
Theorem: E(S²) = σ².

Proof:
E(S²) = E(Σ_{i=1}^{n} (Xi − X̄)² / (n − 1))
      = (1/(n − 1)) E(Σ_{i=1}^{n} (Xi² − 2Xi X̄ + X̄²))
      = (1/(n − 1)) [Σ_{i=1}^{n} E(Xi²) − 2E((Σ_{i=1}^{n} Xi)X̄) + nE(X̄²)]
      = (1/(n − 1)) [n(μ² + σ²) − 2nE(X̄²) + nE(X̄²)]
      = (1/(n − 1)) [n(μ² + σ²) − nE(X̄²)]
      = (1/(n − 1)) [n(μ² + σ²) − n(μ² + σ²/n)]
      = (1/(n − 1)) [(n − 1)σ²] = σ²,
using Σ_{i=1}^{n} Xi = nX̄, E(Xi²) = μ² + σ² and E(X̄²) = μ² + σ²/n.
Note: If the mean or expectation of a statistic is equal to
the corresponding parameter, the statistic is called an unbiased
estimator of the parameter. Hence X̄ and S² are unbiased estimators of
μ and σ² respectively. An estimate of a population parameter given by a
single number is called a point estimate e.g. if we take a sample of
size n and calculate
x̄ = (x1 + x2 + ... + xn)/n and s² = Σ_{i=1}^{n} (xi − x̄)²/(n − 1),
we get point estimates of μ and σ². The statistic
X̄ = (X1 + X2 + ... + Xn)/n itself has mean μ and variance σ²/n.
As before, X1, X2, ..., Xn are jointly distributed random variables
defined on the product sample space. We have the very important result:
Central Limit Theorem: If X1, X2, ..., Xn is a random sample from a
population with mean μ and variance σ², then (X̄ − μ)/(σ/√n) is
approximately N(0, 1). The larger the n the better the approximation.

Note: If the population itself is normal, this holds exactly for all
values of n.

Recall that for N(0, 1) we have P(−1.96 ≤ Z ≤ 1.96) = 95%.
Example: A sample of size 100 is taken from a population with unknown
mean and variance 9. Determine a 95% confidence interval for μ if the
sample mean is 5.

Here x̄ = 5, σ = 3 and n = 100.
P(X̄ − 1.96 σ/√n ≤ μ ≤ X̄ + 1.96 σ/√n) = 95%
P(5 − 1.96(3/10) ≤ μ ≤ 5 + 1.96(3/10)) = 95%,
giving the 95% confidence interval 4.412 ≤ μ ≤ 5.588.

In general,
P(X̄ − z_{α/2} σ/√n ≤ μ ≤ X̄ + z_{α/2} σ/√n) = 1 − α
gives a (1 − α)100% confidence interval for μ.
Example: Five observations 28, 24, 31, 27, 22 are taken from a
population with known σ² = 4.84. Find a 99% confidence interval for μ.
x̄ = (28 + 24 + 31 + 27 + 22)/5 = 26.4 and σ = √4.84 = 2.2. Then
P(26.4 − 2.575(2.2/√5) ≤ μ ≤ 26.4 + 2.575(2.2/√5)) = 99%.

The length of the interval
(X̄ − z_{α/2} σ/√n, X̄ + z_{α/2} σ/√n) is 2 z_{α/2} σ/√n.

Example: How large must n be so that a 95% confidence interval for μ,
with σ = 3, has length at most 0.4? We need 2(1.96)(3)/√n ≤ 0.4, which
gives √n ≥ 2(1.96)(3)/0.4 = 29.4, or n = 865.
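The sample-size calculation can be sketched in Python (the variable names
are mine):

```python
from math import ceil, sqrt

z, sigma, max_length = 1.96, 3.0, 0.4

# The 95% interval has length 2*z*sigma/sqrt(n); solve for the
# smallest integer n that keeps it within max_length.
n = ceil((2 * z * sigma / max_length) ** 2)
print(n)  # 865
```
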
In all the previous examples we knew σ², the population variance. If
that is not so and n ≥ 30, we can use S² as a point estimate for σ².

Example: A sample of size 121 gives x̄ = 14.5 and s = 2. Determine a
(i) 95%, (ii) 99% confidence interval for μ. Here √n = 11.

(i) 14.5 − 1.96(2/11) ≤ μ ≤ 14.5 + 1.96(2/11), i.e.
14.14 ≤ μ ≤ 14.86.

(ii) 14.5 − 2.575(2/11) ≤ μ ≤ 14.5 + 2.575(2/11), i.e.
14.03 ≤ μ ≤ 14.97.

Note that the greater the confidence the greater the interval.

If n is small (< 30) this is not very accurate, even if the original
X is normal. In this case we must use the following:
Theorem: If X is normally distributed, then
(X̄ − μ)/(S/√n)
has a t-distribution with n − 1 degrees of freedom.

We denote the number of degrees of freedom n − 1 by ν. For each ν the
t-distribution is a symmetric bell-shaped distribution.

Example: A sample of 20 observations from a normal population gives
x̄ = 15.5 and s = 0.3. With ν = 19 the t-tables give
P(−2.861 ≤ (X̄ − μ)/(S/√20) ≤ 2.861) = 99%, so
P(15.5 − 2.861(0.3/√20) ≤ μ ≤ 15.5 + 2.861(0.3/√20)) = 99%.

Example: Five measurements of the flashpoint of diesel oil gave the
results 144, 147, 146, 142, 144. Assuming normality, determine a
(i) 95%, (ii) 99% confidence interval for the mean flashpoint.

Since n < 30 we must apply the t-distribution. n = 5, so ν = 4. We
have x̄ = (144 + 147 + 146 + 142 + 144)/5 = 144.6. Also s² = 3.8, so
s = 1.949.

(i) P(144.6 − 2.776(1.949/√5) ≤ μ ≤ 144.6 + 2.776(1.949/√5)) = 95%.

(ii) P(144.6 − 4.604(1.949/√5) ≤ μ ≤ 144.6 + 4.604(1.949/√5)) = 99%.
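The flashpoint statistics can be reproduced with the standard statistics
module (stdev already uses the n − 1 divisor, matching the sample
variance defined above):

```python
from math import sqrt
from statistics import mean, stdev

data = [144, 147, 146, 142, 144]
n = len(data)
xbar = mean(data)   # 144.6
s = stdev(data)     # sample standard deviation, sqrt(3.8)

# 95% interval using t = 2.776 for nu = n - 1 = 4 degrees of freedom.
t = 2.776
low, high = xbar - t * s / sqrt(n), xbar + t * s / sqrt(n)
print(xbar, round(s, 3), round(low, 2), round(high, 2))
```
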
Hypothesis Testing

Suppose that a claim is made about some parameter of a population, in
our case always the population mean μ. This claim is called the null
hypothesis and is denoted by H0. Any claim that differs from this is
called an alternative hypothesis, denoted by H1. We must test H0
against H1.

Example: H0 : μ = 90.

If H0 is true then, for samples of size n, X̄ is (approximately)
N(μ0, σ²/n) with μ0 = 90, and Z = (X̄ − μ0)/(σ/√n) is N(0, 1). There is
a 5% probability that X̄ is in either of the two end regions of
N(μ0, σ²/n) or, equivalently, that Z is in the corresponding end
regions of N(0, 1), i.e. |Z| > 1.96. If our observed value of Z is in
this rejection region we reject H0 at the 5% significance level;
otherwise we do not reject it. This is a two-tailed test.
Example: A manufacturer claims that its batteries have an average life
of 1,000 hours. In a sample of 100 batteries it was found that x̄ = 985
hours and s = 30 hours. Test the hypothesis H0 : μ = 1,000 hours
against the alternative hypothesis H1 : μ ≠ 1,000 hours at the 5%
significance level, assuming that the lifetime of the batteries is
normally distributed.

n = 100 > 30 so we can take S for σ. If μ = 1,000, then
Z = (X̄ − μ)/(S/√n) is approximately N(0, 1), and its observed value is
(985 − 1,000)/(30/√100) = −5,
which is in the rejection region |z| > 1.96, so we reject H0.
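The battery test statistic is a one-line computation; a Python sketch:

```python
from math import sqrt

xbar, mu0, s, n = 985, 1_000, 30, 100

# Observed value of Z = (Xbar - mu0) / (S / sqrt(n)).
z = (xbar - mu0) / (s / sqrt(n))
print(z)  # -5.0

# Two-tailed test at the 5% significance level: reject H0 if |z| > 1.96.
print(abs(z) > 1.96)  # True
```
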
(i) If μ = 6.6, then Z = (X̄ − μ)/(S/√n) is approximately N(0, 1), and
its observed value is
(6.1 − 6.6)/(2.5/√100) = −2,
which is in the rejection region |z| > 1.96, so H0 is rejected at the
5% level.

(ii) If μ = 6.6, then the observed value of Z is again
(6.1 − 6.6)/(2.5/√100) = −2, which is not in the rejection region
|z| > 2.575, so H0 is not rejected at the 1% level.
Example: A manufacturer produces bulbs that are supposed to burn with
a mean life of at least 3,000 hours. The standard deviation is 500
hours. A sample of 100 bulbs is taken and the sample mean is found to
be 2,800 hours. Test the hypothesis H0 : μ ≥ 3,000 hours against the
alternative H1 : μ < 3,000 hours at the 5% significance level.

In this case if our X̄ value is greater than 3,000 we do not reject it
since it agrees with H0, so we are only interested in extreme values on
the left. We use a one-tailed test. Again Z = (X̄ − μ)/(σ/√n) is
approximately N(0, 1), and its observed value is
(2,800 − 3,000)/(500/√100) = −4,
which is in the one-tailed 5% rejection region z < −1.645, so we
reject H0.

If n is small we use the fact that (X̄ − μ)/(S/√n) has a t-distribution
with n − 1 degrees of freedom.

Example: A sample of 25 observations has x̄ = 197 and s = 6; test
H0 : μ = 200. The observed value of t is
(197 − 200)/(6/√25) = −2.5, which is in the rejection region
|t| > 2.064 for a two-tailed 5% test with ν = 24, so we reject H0.
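The t statistic in this last example can be checked the same way; a Python
sketch (the rejection decision still needs the t-table value, which is not
computed here):

```python
from math import sqrt

xbar, mu0, s, n = 197, 200, 6, 25

# Observed value of t = (Xbar - mu0) / (S / sqrt(n)), with nu = n - 1 = 24.
t = (xbar - mu0) / (s / sqrt(n))
print(round(t, 2))  # -2.5
```
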