Stats Sample Book

SUNDARAM PUBLISHERS
1 Classification and
Tabulation of Data
1. Data and Data Collection

A collection of useful information gathered for a particular purpose or field is called data.
Data are used for analysis and interpretation. There are two kinds of data:
1. Primary Data: original data which researchers collect directly themselves.
They are not published.
2. Secondary Data: data compiled by one researcher and used by many. They are
published.
2. Classification of Data
Definition: Data are meaningful information. This information relates to a
particular field and is used to make interpretations. Raw data are mostly
large in size and number, and it is difficult for a human being to draw conclusions
from them. Thus, on the basis of their features, the data are divided into separate
classes for easy understanding. This process of distribution of data is called
"Classification of data".
According to Prof. L.R. Connor, "Classification is the process of arranging things in
groups or classes according to their resemblance and affinities and gives an expression to
the unity of attributes that may subsist among a diversity of individuals."
Basic features and characteristics of classification:
1. Classification is done on the basis of facts.
2. It is the division of whole data in different groups.
3. Grouping depends upon uniformity of attributes.
4. It is real or imaginary.
Objectives of Classification:
1. To express the mutual relationship.
2. To show the data in a convenient and condensed form.
3. To make the comparative study easier.
4. To clarify the similarities and disparities of different data.
5. It becomes the base for further analysis and interpretation.
6. It makes the data easy to understand.
Types of Classification:
1. Qualitative
2. Quantitative
3. Temporal (Chronological)
4. Spatial (Geographical)
3. Data Presentation
There are three methods available for statistical presentation of data. They are as follows:
1. Text Presentation
2. Tabular Presentation
3. Graphical Presentation
1. Text Presentation: In this method of data presentation the data are shown by
combining text and figures. Though it has the advantage of directing attention,
it is not an effective device, because the reader finds it difficult to read and
understand. It also takes too much time.
2. Tabular Presentation or Tabulation of data: It is essential to present the data in
an appropriate manner because this helps in understanding, making comparisons and
drawing conclusions. To achieve this objective, data are arranged in tabular form
after classification. This is known as Tabulation.
According to L.R. Connor, "Tabulation involves the orderly and systematic
presentation of numerical data in a form designed to elucidate the problem under
consideration".
Objects of Tabulation :
1. To present the information given by data in a serialised and orderly manner.
2. To present the data in less space.
3. To make analysis and comparison easier.
4. To provide information at a glance.
5. To show the trend and accuracy of the data.
Guidelines for Tabulation :
The following general points should be kept in mind while preparing a table:
(1) Title
(2) Table Number
(3) Captions and stubs
(4) Date
(5) Head Note
(6) Body of the table
(7) Totals
(8) Foot Notes
(9) Source Note
Types of Tables :
Tables are classified on the following bases:
1. On the basis of purpose, tables are of two types:
(a) General purpose table: It contains information to be used by public in general
(b) Specific purpose table: It contains information relating to particular purpose or
organisation.
2. On the basis of origin:
(a) Primary Table: Also called original table. This table is made from the data
originally collected, for some purpose.
(b) Secondary table: Also called Derivative table. Interpretation and analysis are
presented in this table.
3. On the basis of construction, tables are of two types:
(a) Simple table: This table consists of data which show one characteristic
only.
(b) Complex table: It consists of the figures relating to several characteristics. This
table is further of three types: Double table, Treble table and Manifold table.
4. Frequency Distribution
To make the analysis of data in an easier way and to simplify the various statistical
calculations, the classified data are presented in an appropriate serial order in the form of
a frequency distribution.
In the words of Morris Hamburg, "A frequency distribution or frequency table is
simply a table in which the data are grouped into classes and the number of cases which fall
in each class are recorded. The numbers in each class are referred to as frequencies."
The following terms are related to frequency distribution:
Variables, Array, Range, Class Frequency, Class Interval, Class Mark, Size of the class
interval etc.
Methods of constructing class interval:
Class intervals are a very essential part of a frequency table. The various methods of
constructing class intervals are as follows:
1. Exclusive Method
2. Inclusive Method
3. Open End Class Interval
4. Class Interval with cumulative frequencies
Example:
Class-Interval methods compared
1. Exclusive Method: The upper limit of one class becomes the lower limit of the next
class. E.g. 0-10, 10-20, 20-30 and so on.
2. Inclusive Method: A value equal to the upper limit of the class interval is included.
There remains a difference of one between a class interval and its succeeding class
interval. E.g. 0-9, 10-19, 20-29, 30-39 and so on.
3. Open-end Class Interval: In this method the words 'more than' and 'less than' are
used with a specified limit. E.g. Less than 10, or More than 10.
4. Class Interval with cumulative frequency: In this method the frequencies and the
measurements of the class intervals are cumulated. Words like 'below', 'above',
'less than' and 'more than' are used. E.g. Below 10, Below 20, Below 30, or Above 10,
Above 20, Above 30.
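The exclusive method can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table`, the class width of 10 and the sample marks are assumptions chosen for the example, and values are assumed to be non-negative.

```python
from collections import Counter

def frequency_table(values, width, start=0):
    """Count values into exclusive classes: lower limit included, upper limit excluded."""
    counts = Counter((v - start) // width for v in values)
    table = []
    for k in range(max(counts) + 1):
        lower = start + k * width
        table.append((lower, lower + width, counts.get(k, 0)))
    return table

marks = [2, 8, 12, 14, 45, 17, 32, 29, 23, 16]  # illustrative marks of ten students
for lower, upper, f in frequency_table(marks, width=10):
    print(f"{lower}-{upper}: {f}")
```

A mark of exactly 10 would fall in the class 10-20, not 0-10, which is what "exclusive" means here.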
5. Statistical Series
According to Prof. Connor, "If two variable quantities can be
arranged side by side so that the measurable differences in the one correspond with
measurable differences in the other, the result is said to form a statistical series." There are
three types of series:
1. Individual Series
2. Discrete Series
3. Continuous Series
1. Individual Series: When the data are observed individually and are listed as
individual cases, they form an individual series. They may be placed in ascending or
descending order.
Example:
Roll No.  1  2  3   4   5   6   7   8   9   10
Marks     2  8  12  14  45  17  32  29  23  16
2. Discrete Series: In the words of Prof. Boddington, "A discrete variable is one where
the variates (individual values) differ from each other by definite amounts".
Example:
Marks            1  2  3  4  5
No. of Students  2  4  7  3  8
3. Continuous Series: Continuity is maintained in the values of the variable in a series.
The measurement lies within a class interval. According to Prof. Boddington, "In a
continuous series, the variable can take any intermediate value between the smallest and the
largest value in the distribution."
Example:
Marks            0-10  10-20  20-30  30-40
No. of Students  2     7      1      5
2 Measures of Central
Tendency and Dispersion
1. Measure of Central Tendency

In modern times, averages have gained great importance in all spheres of
research. According to Croxton and Cowden, "An average is a single value within the
range of the data that is used to represent all the values in series. Since an average is
somewhere within the range of the data, it is sometimes called a measure of central
value."
Spurr, Kellog and Smith have defined it as "An average is sometimes called a
measure of central tendency because individual values of the variable usually cluster
around it."
2. Types of Averages
In a broad sense, averages can be divided into two kinds:
1. Mathematical Averages
2. Positional Averages
Mathematical Averages: It can further be divided into three types:
(a) Arithmetic Average or Mean.
(b) Geometric Average.
(c) Harmonic Average.
Positional Average: It can further be divided into two types:
(a) Median.
(b) Mode.
Arithmetic Average: Arithmetic average is the most popular and best understood measure
of central tendency for a quantitative set of data. The arithmetic mean of a series is
calculated by adding the values of all the scores of the series and dividing the sum total by
the number of scores. For example, if the purchases of the past five years are 40, 50, 60, 70
and 80 (an increase of 10 each year), then
Average purchase = $\frac{40 + 50 + 60 + 70 + 80}{5} = \frac{300}{5} = 60$
Types of Arithmetic Average: There are two types of arithmetic averages.
1. Simple Arithmetic Average: In this all the items are treated alike for calculating
averages.
2. Weighted Arithmetic Average: In this the items are assigned weights and then the
averages are calculated.
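Arithmetic Mean ($\bar{x}$)
Direct method:
Individual Series: $\bar{x} = \frac{\Sigma x}{N}$
Discrete Series: $\bar{x} = \frac{\Sigma fx}{\Sigma f}$
Continuous Series: $\bar{x} = \frac{\Sigma fx}{\Sigma f}$, where x is the mid-value of the class
Short-cut method:
Individual Series: $\bar{x} = A + \frac{\Sigma d_x}{N}$
Discrete Series: $\bar{x} = A + \frac{\Sigma f d_x}{\Sigma f}$
Continuous Series: $\bar{x} = A + \frac{\Sigma f d_x}{\Sigma f}$, where x is the mid-value of the class
where $d_x = x - A$ and A is the assumed mean.
Step-deviation method:
Individual Series: $\bar{x} = A + \frac{\Sigma d'_x}{N} \times i$
Discrete Series: $\bar{x} = A + \frac{\Sigma f d'_x}{\Sigma f} \times i$
Continuous Series: $\bar{x} = A + \frac{\Sigma f d'_x}{\Sigma f} \times i$, where x is the mid-value of the class
where $d'_x = \frac{x - A}{i}$ and i is the common factor (the class interval).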
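As a quick check that the three methods agree, the sketch below applies them to a continuous series with classes 0-10, 10-20, 20-30, 30-40 and frequencies 2, 7, 1, 5. The assumed mean A = 15 and the common factor i = 10 are illustrative choices, not prescribed values.

```python
mids = [5, 15, 25, 35]   # mid-values of the classes 0-10, 10-20, 20-30, 30-40
freqs = [2, 7, 1, 5]
A, i = 15, 10            # assumed mean and class interval (illustrative choices)

n = sum(freqs)
# Direct method: sum of f*x divided by sum of f
direct = sum(f * x for f, x in zip(freqs, mids)) / n
# Short-cut method: deviations d = x - A taken from the assumed mean
short_cut = A + sum(f * (x - A) for f, x in zip(freqs, mids)) / n
# Step-deviation method: deviations additionally divided by the common factor i
step_dev = A + sum(f * ((x - A) / i) for f, x in zip(freqs, mids)) / n * i

print(direct, short_cut, step_dev)
```

All three values come out the same (up to floating-point rounding); the short-cut and step-deviation methods only make hand calculation lighter.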
Weighted Arithmetic Average can be calculated by two methods:
1. Direct Method: $\bar{x}_w = \frac{\Sigma xw}{\Sigma w}$
where $\bar{x}_w$ = weighted arithmetic average
∑xw = Sum of the products of variable and weights
∑w = Sum of the weights
2. Short-cut Method: Symbolically it is represented as:
$\bar{x}_w = a + \frac{\Sigma dw}{\Sigma w}$
where, $\bar{x}_w$ = weighted arithmetic mean
a = assumed mean
∑dw = sum of the products of the deviations of variable x (from a) and the weights.
∑w = Sum of the weights
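A small sketch of both weighted-average methods follows. The values, weights and assumed mean are hypothetical numbers chosen only for illustration.

```python
values = [60, 70, 80]    # hypothetical values of the variable x
weights = [1, 2, 3]      # hypothetical weights w
a = 70                   # assumed mean (illustrative choice)

total_w = sum(weights)
# Direct method: sum of x*w divided by sum of w
direct_w = sum(x * w for x, w in zip(values, weights)) / total_w
# Short-cut method: a + sum of d*w divided by sum of w, with d = x - a
short_cut_w = a + sum((x - a) * w for x, w in zip(values, weights)) / total_w

print(direct_w, short_cut_w)
```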
Advantages of Arithmetic Average:
1. The calculation is easily understandable.
2. It is a commonly used method.
3. It is adaptable to algebraic treatment.
4. All the observations are the basis of the average.
Disadvantages:
1. Extreme items in the series distort the value of the average.
2. It cannot be calculated if any of the items is missing.
Geometric Average G.M.
Geometric average is the appropriate average for quantities that change over a period of
time, because it gives the average rate of change. The geometric mean g of a set of n
positive numbers x1, x2, x3, .... xn is the nth root of their product.
In symbols, $g = \sqrt[n]{x_1 \cdot x_2 \cdot x_3 \cdots x_n}$
Formula for G.M. in different series:
1. Individual Series:
$g = \text{Antilog}\left(\frac{\Sigma \log x}{n}\right)$
2. Discrete and Continuous Series:
$g = \text{Antilog}\left(\frac{\Sigma f \log x}{n}\right)$, where $n = \Sigma f$
Merits:
1. All the observations are considered in G.M.
2. Algebraic manipulation is possible.
3. It gives a precise result as it is rigidly defined.
Demerits:
1. Calculation is difficult and is not generally understood.
2. The geometric mean cannot be used when an item is zero or negative: it becomes zero
or may be imaginary.
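The log/antilog formulas for the geometric mean can be sketched as below. The function name `geometric_mean` and the sample figures are assumptions for illustration; logarithms to base 10 are used, so the antilog is a power of 10.

```python
import math

def geometric_mean(values, freqs=None):
    """Antilog of (sum of f * log x) / n, with n = sum of f; f = 1 for an individual series."""
    freqs = freqs or [1] * len(values)
    n = sum(freqs)
    log_sum = sum(f * math.log10(x) for f, x in zip(freqs, values))
    return 10 ** (log_sum / n)

print(geometric_mean([2, 8]))          # individual series
print(geometric_mean([2, 4], [1, 2]))  # discrete series with frequencies 1 and 2
```

Passing a zero or negative item makes `math.log10` raise a `ValueError`, which mirrors the demerit listed above.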
Harmonic Mean:
The harmonic mean is the total number of items of a variable divided by the sum of the
reciprocals of the values of the variable. In symbols
$h = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3} + \cdots + \frac{1}{x_n}}$
Where, h is harmonic mean
x1, x2, x3, ....., xn = values of the n items.
n = Total number of items.
Formula for calculating H.M. in different series:
1. Individual series → $h = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3} + \cdots + \frac{1}{x_n}}$
2. Discrete series → $h = \frac{n}{\Sigma\left(f \times \frac{1}{x}\right)}$, where $n = \Sigma f$
3. Continuous series → $h = \frac{n}{\Sigma\left(f \times \frac{1}{x}\right)}$, where x is the mid-value of a class and $n = \Sigma f$
Merits:
1. Based upon all the observations.
2. Appropriate when more weight is to be given to small values.
3. Lends itself to algebraic treatment.
Demerits:
1. It is a complicated method.
2. It is not meaningful when both positive and negative values are included.
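A minimal sketch of the harmonic mean formulas follows. The function name and the two speeds are hypothetical; averaging speeds over equal distances is a common textbook use of this mean.

```python
def harmonic_mean(values, freqs=None):
    """n divided by the sum of f * (1/x), with n = sum of f; f = 1 for an individual series."""
    freqs = freqs or [1] * len(values)
    return sum(freqs) / sum(f / x for f, x in zip(freqs, values))

# Hypothetical example: the same distance covered at 40 km/h and at 60 km/h
print(harmonic_mean([40, 60]))
```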
Median
According to Connor, "The median is that value of the variable, which divides the group
into two equal parts. One part comprising all values greater and the other all values
less than the median."
Formulas used in determining median are:
1. Individual Series: The numbers are arranged in ascending or descending order.
M = the size of the $\left(\frac{n + 1}{2}\right)$th item
2. Discrete Series: In this series the cumulative frequencies are worked out and the
position $\frac{n + 1}{2}$ is computed, where n is the total of frequencies. This figure is located in
the cumulative frequencies and the item against it is taken as the median.
M = the size of the $\left(\frac{n + 1}{2}\right)$th item, located through the cumulative frequency (C.F.)
3. Continuous Series: This series is slightly more difficult, as we first have to locate the
class interval in which the median lies, as in the discrete series; the only difference is that
one is not added to n. After identifying the class interval, the median is determined by
the formula.
M = the size of the $\left(\frac{n}{2}\right)$th item
Or $M = l_1 + \frac{\frac{n}{2} - c}{f} \times i$
Where, l1 = lower limit of the class interval to which the median belongs,
c = cumulative frequency up to the lower limit,
i = length of the class interval to which the median belongs,
f = frequency of the class interval to which the median belongs,
n = ∑f
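The continuous-series formula can be sketched as follows, using classes 0-10, 10-20, 20-30, 30-40 with frequencies 2, 7, 1, 5. The function name `grouped_median` is an assumption, and classes are taken to be exclusive and in ascending order.

```python
def grouped_median(classes, freqs):
    """classes: (lower, upper) limits of exclusive classes in ascending order."""
    n = sum(freqs)
    c = 0  # cumulative frequency up to the lower limit of the current class
    for (lower, upper), f in zip(classes, freqs):
        if c + f >= n / 2:
            # Median class found: M = l1 + (n/2 - c) / f * i
            return lower + (n / 2 - c) / f * (upper - lower)
        c += f

classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [2, 7, 1, 5]
print(grouped_median(classes, freqs))
```

Here n/2 = 7.5 falls in the class 10-20, so l1 = 10, c = 2, f = 7 and i = 10.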
Mode
According to Croxton and Cowden, "The mode of a distribution is the value at the point
around which the items tend to be most heavily concentrated. It may be regarded as the
most typical value of a series."
In individual series → the item which appears the greatest number of times is taken as the mode.
In discrete series → the grouping method is adopted, in which frequencies are grouped
in twos (in two ways) and in threes (in three ways), and then an
analysis table is prepared.
In continuous series → the formula used is
$z = l_1 + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$ OR
$z = l_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times (l_2 - l_1)$
Where, l1 = Lower limit of modal class.
l2 = Upper limit of modal class.
f0 = Frequency of the class preceding the modal class.
f1 = Frequency of modal class.
f2 = Frequency of the class succeeding the modal class.
∆1 = f1 − f0.
∆2 = f1 − f2.
i = l2 − l1.
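A sketch of the continuous-series mode formula follows, again with classes 0-10, 10-20, 20-30, 30-40 and frequencies 2, 7, 1, 5. Two assumptions are made for the illustration: the modal class is simply the class with the highest frequency, and a missing neighbouring class is treated as having frequency 0.

```python
def grouped_mode(classes, freqs):
    """z = l1 + (f1 - f0) / (2*f1 - f0 - f2) * (l2 - l1) for the modal class."""
    k = max(range(len(freqs)), key=freqs.__getitem__)  # index of the modal class
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                  # assumption: no preceding class means 0
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0     # assumption: no succeeding class means 0
    l1, l2 = classes[k]
    return l1 + (f1 - f0) / (2 * f1 - f0 - f2) * (l2 - l1)

classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [2, 7, 1, 5]
print(grouped_mode(classes, freqs))
```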
3. Need for finding out dispersion
Averages explain the central tendency of a series, but they are unable to
explain the deviations of the items of a series from its central tendency. When values
are compared on the basis of averages alone, the conclusions can be misleading. Averages
thus have several limitations. For proper and scientific analysis one has to go beyond the
averages and study the measures of dispersion.
4. Methods of Determining Dispersion
1. Range Method: Range is the easiest and simplest method. In this method the
highest and lowest limits of the values are determined and the difference between the two is
termed the range.
$R = L - S$
Coefficient of Range = $\frac{L - S}{L + S}$
Where, L = Largest limit.
S = Smallest limit.
2. Mean Deviation: As the range method of dispersion does not consider all the
observations, the mean deviation method is adopted.
Formulae for computing mean deviation are:
In individual series:
$\delta = \frac{\Sigma |d|}{n}$
$\delta_m = \frac{\Sigma |d_m|}{n}$
Where, δ = delta = mean deviation from mean.
δm = M.D. from median.
|d| = deviations ignoring algebraic signs.
n = number of items.
In discrete and continuous series:
$\delta = \frac{\Sigma f|d|}{n}$
$\delta_m = \frac{\Sigma f|d_m|}{n}$
$\delta_z = \frac{\Sigma f|d_z|}{n}$
Where, δz = M.D. from mode.
Coefficients of mean deviation for all types of series are:
Coefficient of δ = $\frac{\delta}{\bar{x}}$
Coefficient of δm = $\frac{\delta_m}{M}$
Coefficient of δz = $\frac{\delta_z}{Z}$
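The range and mean deviation formulas for an individual series can be sketched as below. The ten marks are illustrative data and the helper name `mean_deviation` is an assumption.

```python
import statistics

marks = [2, 8, 12, 14, 45, 17, 32, 29, 23, 16]  # illustrative individual series

largest, smallest = max(marks), min(marks)
value_range = largest - smallest                          # R = L - S
coeff_range = (largest - smallest) / (largest + smallest)

def mean_deviation(values, centre):
    """Average of the deviations from `centre`, ignoring algebraic signs."""
    return sum(abs(v - centre) for v in values) / len(values)

mean = statistics.mean(marks)
median = statistics.median(marks)
md_mean = mean_deviation(marks, mean)       # delta
md_median = mean_deviation(marks, median)   # delta_m
print(value_range, coeff_range, md_mean, md_median)
print(md_mean / mean, md_median / median)   # coefficients of mean deviation
```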
5. Standard Deviations
It is the most popular and widely used technique of measuring dispersion. It removes the
drawbacks of other techniques. It is the square root of the mean of the squared deviations,
which are always calculated from the arithmetic average.
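Formula of standard deviation by direct method:
(i) Individual Series
S.D. = $\sigma = \sqrt{\frac{\Sigma d^2}{n}}$
Where, σ = Standard deviation.
d = Deviation of item from the arithmetic mean.
n = Number of items.
(ii) Discrete and Continuous Series
S.D. = $\sigma = \sqrt{\frac{\Sigma f d^2}{n}}$
Formula of standard deviation by short-cut method:
(i) Individual Series
$\sigma = \sqrt{\frac{\Sigma d^2}{n} - \left(\frac{\Sigma d}{n}\right)^2}$
Where, d = x − a (a = assumed mean)
∑d2 = sum of squares of deviations.
∑d = sum of deviations.
n = number of items.
(ii) Discrete Series
When deviations are not taken:
S.D. = $\sigma = \sqrt{\frac{\Sigma f x^2}{n} - \left(\frac{\Sigma f x}{n}\right)^2}$
When deviations are taken:
S.D. = $\sigma = \sqrt{\frac{\Sigma f d^2}{n} - \left(\frac{\Sigma f d}{n}\right)^2}$
(iii) Continuous Series
S.D. = $\sigma = \sqrt{\frac{\Sigma f d^2}{n} - \left(\frac{\Sigma f d}{n}\right)^2} \times c$
where d is the step deviation and c is the common factor (class interval) by which the
deviations were divided.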
6. Coefficient of Variation based on Standard Deviation
Coefficient of variation or C.V. = $\frac{\sigma}{\bar{x}} \times 100$
σ = Standard Deviation
$\bar{x}$ = Mean
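The direct and short-cut formulas for the standard deviation, and the coefficient of variation built on them, can be checked against each other with a short sketch. The five values and the assumed mean a = 50 are illustrative choices; `statistics.pstdev` from the Python standard library is used only as a cross-check.

```python
import math
import statistics

values = [40, 50, 60, 70, 80]   # illustrative individual series
n = len(values)
mean = sum(values) / n

# Direct method: deviations taken from the arithmetic mean
sigma_direct = math.sqrt(sum((x - mean) ** 2 for x in values) / n)

# Short-cut method: deviations d = x - a taken from an assumed mean a
a = 50
d = [x - a for x in values]
sigma_short = math.sqrt(sum(v * v for v in d) / n - (sum(d) / n) ** 2)

cv = sigma_direct / mean * 100   # coefficient of variation, in per cent
print(sigma_direct, sigma_short, statistics.pstdev(values), cv)
```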
3 Correlation
1. Correlation
We can define correlation as the relationship between two or more variables in which a
change in the value of one variable is accompanied by a change in the other variable. According
to Croxton and Cowden, "When the relationship is of quantitative nature, the appropriate
statistical tool for discovering and measuring the relationship and expressing it in a brief
formula is known as Correlation."
2. Types of Correlation
Correlation is of three types:
1. Positive and Negative Correlation: When the values of two variables vary in the
same direction, so that an increase in one variable is accompanied by an increase in the
other variable and vice-versa, the correlation is said to be positive.
On the other hand, if the values of the variables move in opposite directions, the
correlation is negative.
2. Linear and Curvi-linear Correlation: Linear correlation exists between two
variables when a change in one variable is accompanied by a change in the other variable
in the same ratio.
Curvi-linear correlation exists when there is no constant ratio between
the changes in the two variables.
3. Simple, Multiple and Partial Correlation: When only two variables are taken into
consideration, the correlation is simple. When more than two
variables are considered, it is multiple correlation.
The correlation is partial when we study the correlation between two variables
while excluding the influence of some other variable on both of them.
3. Measurement of Correlation
Correlation can be measured by any of the following given methods. They are as follows:
1. Scatter Diagram.
2. Two-way Table.
3. Karl Pearson's Coefficient of Correlation.
4. Spearman's Rank Correlation.
5. Concurrent Deviation Method.
6. Method of Least Squares.
1. Scatter Diagram:
For Details [Refer Question No. 10 and 11 on Page 2B.64, 2B.65 of Q & A Zone]
2. Two-way Table:
The second method of measuring correlation is the two-way table. To study correlation
in a frequency distribution, a table of double entry is drawn showing the values of the
two variables, and the distribution of the frequencies in the cells of the table is examined.
Merits and Demerits of Two-way Table

Merits:
1. This is the simplest method to study the relationship.
2. Every item is deeply analysed and shown.
Demerits:
1. It is very difficult to draw and interpret when the number of items is large.
2. To give a precise degree of correlation is impossible.
3. It is not good for further mathematical treatment.
3. Karl Pearson's Coefficient of Correlation:
The two methods discussed above have little practical utility. Therefore, for practical and
applied purposes, methods are used which are able to express the relationship in
quantitative terms. Karl Pearson's Coefficient of Correlation is one such method. It is the
most widely used in practice. It is expressed by the symbol 'r'. Pearson's Coefficient of
Correlation is based upon three assumptions:
1. A large variety of independent causes are operating in the series, so as to give a
normal distribution.
2. The forces so operating are related in a causal way.
3. The relationship between the two series is linear.
'r' can be calculated in various ways, depending upon the choice of the user.
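CASE-I: Deviations are taken from the arithmetic mean
$r = \frac{\Sigma xy}{N \sigma_x \sigma_y} = \frac{\Sigma xy}{\sqrt{\Sigma x^2 \times \Sigma y^2}}$
Where, $x = (X - \bar{X})$, the deviation of each value of the first variable from its mean
$y = (Y - \bar{Y})$, the deviation of each value of the second variable from its mean
σx = Standard deviation of variable x
σy = Standard deviation of variable y
N = Number of observations
r = Correlation coefficient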
CASE-II: Deviations taken from assumed mean
$r = \frac{N \Sigma d_x d_y - (\Sigma d_x)(\Sigma d_y)}{\sqrt{N \Sigma d_x^2 - (\Sigma d_x)^2}\,\sqrt{N \Sigma d_y^2 - (\Sigma d_y)^2}}$
Where, dx = (x − assumed mean).
dy = (y − assumed mean).
∑dx = Sum of deviations of x series from its assumed mean.
∑dy = Sum of deviations of y series from its assumed mean.
∑dx2 = Sum of squares of deviations of x series from assumed mean.
∑dy2 = Sum of squares of deviations of y series from assumed mean.
∑dxdy = Sum of the products of deviations of x and y series from assumed mean.
CASE-III: Deviations are not taken at all.
$r = \frac{N \Sigma xy - (\Sigma x)(\Sigma y)}{\sqrt{N \Sigma x^2 - (\Sigma x)^2}\,\sqrt{N \Sigma y^2 - (\Sigma y)^2}}$
Where, ∑x = Sum of variable x.
∑y = Sum of variable y.
∑xy = Sum of products of variable x and y.
∑x2 = Sum of squares of values of variable x.
∑y2 = Sum of squares of values of variable y.
N = Number of observations.
r = Coefficient of Correlation.
CASE-IV: Calculation of correlation in grouped distribution.
$r = \frac{\Sigma f d_x d_y - \frac{(\Sigma f d_x)(\Sigma f d_y)}{N}}{\sqrt{\Sigma f d_x^2 - \frac{(\Sigma f d_x)^2}{N}}\,\sqrt{\Sigma f d_y^2 - \frac{(\Sigma f d_y)^2}{N}}}$
Where, dx = deviation of x from its (assumed) mean.
dy = deviation of y from its (assumed) mean.
∑fdxdy = Sum of products of frequency with the deviations of both the series.
∑fdx = Sum of the products of frequency with the deviations of series x.
∑fdy = Sum of the products of frequency with the deviations of series y.
N = Total of frequencies.
r = Correlation Coefficient.
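As a check that the different cases give the same coefficient, the sketch below computes r from deviations about the means (CASE-I) and from raw sums (CASE-III). The function names and the five pairs of values are hypothetical.

```python
import math

def pearson_from_means(xs, ys):
    """CASE-I: deviations taken from the arithmetic means."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def pearson_raw(xs, ys):
    """CASE-III: no deviations taken, raw sums only."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / (math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2))

xs = [1, 2, 3, 4, 5]   # hypothetical x series
ys = [2, 4, 5, 4, 5]   # hypothetical y series
print(pearson_from_means(xs, ys), pearson_raw(xs, ys))
```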
Merits and Demerits of Pearson's Coefficient of Correlation
Merits:
1. Good for further algebraic treatment.
2. Both the direction and degree of the correlation between two variables are measured.
Demerits:
1. A linear relationship is assumed, which may not be correct.
2. It is tedious to calculate.
3. The chances of misinterpretation are high.
4. Spearman's Rank Coefficient of Correlation
In 1904, Charles Edward Spearman developed a formula which helps in
determining the coefficient of correlation between the ranks of individuals in two
attributes. It is also called Rank Correlation.
In this method, the variables are given ranks on the basis of the size of the numbers; then
the differences between ranks are obtained by deducting the ranks of one series from the
ranks of the other series. After this, correlation is calculated on the basis of the squares of
these differences of ranks.
Rank Correlation Formula

CASE-I: When ranks are given:
$r = 1 - \frac{6 \Sigma d^2}{n^3 - n}$
Where, d = difference of ranks of two series.
n = number of observations.
r = coefficient of correlation.
CASE-II: When ranks are not given:
Rank both the series x and y according to the magnitude of the data and then use the same formula:
$r = 1 - \frac{6 \Sigma d^2}{n^3 - n}$
CASE-III: When two or more values of a series have the same magnitude, resulting in tied
ranks.
$R = 1 - \frac{6\left[\Sigma d^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \cdots\right]}{n^3 - n}$
Where, m = number of items whose ranks are common.
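The rank correlation formula, including the correction for tied ranks, can be sketched as follows. The helper names are assumptions; ranks are assigned with 1 for the largest value, and tied values share the average of the ranks they occupy.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 = largest value; tied values share the average of their ranks."""
    ordered = sorted(values, reverse=True)
    first = {}
    for pos, v in enumerate(ordered, start=1):
        first.setdefault(v, pos)          # first position occupied by each value
    counts = Counter(values)
    return [first[v] + (counts[v] - 1) / 2 for v in values]

def spearman(x, y):
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    # One correction term (m^3 - m) / 12 for every group of m tied items in either series
    ties = sum((m ** 3 - m) / 12
               for series in (x, y)
               for m in Counter(series).values() if m > 1)
    return 1 - 6 * (sum_d2 + ties) / (n ** 3 - n)

print(spearman([10, 20, 20, 30], [1, 2, 3, 4]))
```

When there are no ties the correction term is zero and the function reduces to the CASE-I formula.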
Merits and Demerits of Rank Correlation

Merits:
1. It is easy to apply.
2. It provides a check on the calculation, because the sum of the differences between
ranks is always equal to zero.
3. It is a way of studying qualitative data.
Demerits:
1. Convenient only when the n is small.
2. Not based on full set of information.
5. Concurrent Deviation Method
Where trends are not noticed in the values, or trends have no significance in the values,
the concurrent deviation method is used. In most cases it gives an accurate result. It is
suitable for studying the correlation between short-term fluctuations. In this method only
the direction of change (positive or negative) is considered, not its extent. The formula
used under this method is:
$r_c = \pm\sqrt{\pm\frac{2c - n}{n}}$
The sign both inside and outside the root is the sign of (2c − n).
Where, rc = Coefficient of concurrent deviations.
c = the number of concurrent deviations.
n = the number of pairs of deviations compared (one less than the number of pairs of observations).
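A sketch of the concurrent deviation method follows. It assumes the usual form of the formula with a square root, takes n as the number of compared deviations (one fewer than the observations), and, as a further assumption of this sketch, counts a pair as concurrent only when both series move in the same non-zero direction.

```python
import math

def concurrent_deviation(x, y):
    sign = lambda a: (a > 0) - (a < 0)
    dx = [sign(b - a) for a, b in zip(x, x[1:])]   # direction of change in x
    dy = [sign(b - a) for a, b in zip(y, y[1:])]   # direction of change in y
    n = len(dx)
    c = sum(1 for a, b in zip(dx, dy) if a * b > 0)  # both moved the same way
    ratio = (2 * c - n) / n
    # The sign of (2c - n) is applied both inside and outside the root
    return math.copysign(math.sqrt(abs(ratio)), ratio)

print(concurrent_deviation([1, 2, 3, 2, 4], [5, 6, 5, 4, 6]))
```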
6. Methods of Least Squares
One of the assumptions upon which Karl Pearson's coefficient of correlation is based is
that there is a linear relationship between the variables. This linear relationship can be
studied by fitting a straight line to the values of x and y. The constants of this line are
found by the least squares method.
The least squares method is used in the following way:
1. A straight line equation is formed: y = a + bx
2. To find the values of a and b two equations are used:
$\Sigma y = na + b\Sigma x$ . . . . . . (i)
$\Sigma xy = a\Sigma x + b\Sigma x^2$ . . . . . . (ii)
3. By putting the sums computed from the values of x and y in the above equations, we get
the values of a and b for the equation y = a + bx
4. From this equation we find the linear (fitted) value of y for each value of x.
5. The coefficient of correlation is found by dividing the standard deviation of the linear
values of y by the standard deviation of the original values of y.
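The steps above can be sketched as follows. The function name `fit_line` and the data are hypothetical; the two normal equations are solved in closed form rather than by elimination.

```python
import statistics

def fit_line(xs, ys):
    """Solve the two normal equations for a and b in y = a + bx."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b

xs = [1, 2, 3, 4, 5]   # hypothetical x series
ys = [2, 4, 5, 4, 5]   # hypothetical y series
a, b = fit_line(xs, ys)
fitted = [a + b * x for x in xs]   # linear values of y
# Step 5: ratio of the S.D. of the fitted values to the S.D. of the original values
r_magnitude = statistics.pstdev(fitted) / statistics.pstdev(ys)
print(a, b, r_magnitude)
```

The ratio gives the magnitude of r; its sign is the sign of the slope b.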
4 Regression
1. Regression
Regression means "returning backward", 'stepping down' or 'going back'. Sir Francis
Galton, in 1877, was the first person to study the subject of regression. It is a statistical
technique by which we can estimate the unknown values of one variable from the known
values of another variable. The variable used to predict the other variable is called the
'independent' or 'explaining' variable. The variable whose value is to be predicted is
called the 'dependent' or 'explained' variable.
According to M.M. Blair, "Regression Analysis is a mathematical measure of the
average relationship between two or more variables in the term of the original unit of
data."
2. Kinds of Regression Analysis
There are two types of Regression Analysis:
1. Linear and Curvilinear Regression [Refer to Answer No. 4 on Page 2B.82]
2. Simple and Multiple Regression:
Simple regression is the study and analysis of two variables that is x and y, whereas
multiple regression means the analysis which is done between more than two
variables. In this only one variable is dependent and the others are independent.
3. Methods of Regression Analysis
There are two methods of Regression Analysis:
1. Graphical Method
2. Algebraic Method
1. Graphical Method: The first step in this method is to draw a scatter diagram on
which every observation is shown by a dot. The dependent variable is shown on the
y-axis and the independent variable on the x-axis. After this, regression lines are
drawn with the help of the dots. Regression lines are those lines which depict the best
mean value of one variable corresponding to the mean values of the other, for example,
x corresponding to mean y and vice-versa. Such a line is the line which fits best in the
scatter diagram and is used to summarise the data.
2. Algebraic Method: In this method regression lines are shown with the help of
equation formulated that is the regression line of X on Y and the regression line of Y
on X.
These are y = a + bx and
x = a + by
where a is the intercept of the line and b is the slope.
4. Coefficient of Regression
The coefficient of regression helps in determining the value by which one variable
increases for a unit increase in the other variable. As there are two equations of regression,
there are two coefficients of regression.
Coefficient of regression of x on y = $r \frac{\sigma_x}{\sigma_y}$
Coefficient of regression of y on x = $r \frac{\sigma_y}{\sigma_x}$
5. What is Coefficient of Determination?

Coefficient of determination is the square of the coefficient of correlation. It is the
measure of the proportion of explained variance to the total variance.
Coefficient of determination = $r^2$
6. What is Coefficient of Non-determination?
Coefficient of non-determination is the measure of that proportion of total variance in the
dependent variable which is not explained by the independent variable. It is represented
by K2 and is
$K^2 = 1 - r^2 = \frac{\text{Unexplained variance}}{\text{Total variance}}$
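The two regression coefficients and the coefficients of determination and non-determination are tied together: the product of the two regression coefficients equals r squared. The sketch below illustrates this on hypothetical data; the variable names are assumptions.

```python
import statistics

xs = [1, 2, 3, 4, 5]   # hypothetical x series
ys = [2, 4, 5, 4, 5]   # hypothetical y series
mx, my = statistics.mean(xs), statistics.mean(ys)
sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
# Pearson's r: sum of products of deviations divided by N * sigma_x * sigma_y
r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) * sx * sy)

b_xy = r * sx / sy        # coefficient of regression of x on y
b_yx = r * sy / sx        # coefficient of regression of y on x
r_squared = r ** 2        # coefficient of determination
k_squared = 1 - r_squared # coefficient of non-determination
print(b_xy, b_yx, r_squared, k_squared)
```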
5 Probability
1. Probability
The chance of happening or non-happening of an event is termed probability. This
statement shows that there is an element of uncertainty. A numerical measure of
uncertainty is provided by the theory of probability.
In the words of Morris Hamburg, "Probability measures provide the decision maker
in business and in government with the means for quantifying the uncertainties which
affect his choice of appropriate actions."
There are three approaches to the definition of Probability —
(1) Classical or Mathematical Definition
(2) Statistical or Empirical Definition
(3) Modern Definition
2. Classical or Mathematical Definition of Probability
Definition
If an experiment has 'n' mutually exclusive, equally likely and exhaustive cases, out of
which 'm' are favourable to the happening of event 'A', then the probability of happening of
'A' is defined as
$P(A) = \frac{m}{n} = \frac{\text{No. of cases favourable to A}}{\text{Total (exhaustive) no. of cases}}$
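The classical definition amounts to counting cases. The sketch below applies it to a single roll of a fair die and the event "multiple of 3"; the function name is an assumption, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def classical_probability(favourable, sample_space):
    """P(A) = m / n for equally likely, mutually exclusive and exhaustive cases."""
    m = sum(1 for outcome in sample_space if outcome in favourable)
    return Fraction(m, len(sample_space))

die = [1, 2, 3, 4, 5, 6]
multiples_of_3 = {3, 6}
p = classical_probability(multiples_of_3, die)
print(p, 1 - p)   # probability of the event and of its complement
```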
3. Statistical or Empirical Definition
The following statistical or empirical definition of probability has been given by Von
Mises.
If the experiment is repeated a large number of times under essentially identical
conditions, the limiting value of the ratio of the number of times the event A happens to
the total number of trials of the experiment, as the number of trials increases indefinitely, is
called the probability of happening of the event A. Symbolically, let P(A) denote the
probability of the occurrence of A. Let m be the number of times an event A
occurs in a series of n trials; then
$P(A) = \lim_{n \to \infty} \frac{m}{n}$
provided the limit is finite and unique.
Limitation of Statistical Definition:
1. Under the same condition, the experiment is required to be conducted a large number
of times.
2. The relative frequency may not obtain a unique value.
4. Modern Definition
Probability is the limit of the proportion of times that a certain event A will occur in
repeated trials of an experiment. Let S be the sample space. Let ∑ be the class of events and
let P be a real-valued function defined on ∑. Then P is called a probability measure and
P(A) is called the probability of the event A if P satisfies the following axioms.
(1) 0 ≤ P(A) ≤ 1 for every event A belonging to ∑.
(2) P(S) = 1
(3) For every finite or infinite sequence of disjoint events A1, A2, . . . ., P(A1 ∪ A2 ∪ ...)
= P(A1) + P(A2) + . . . . .
5. Types of Events
A subset of a sample space is called an event. There are many types of events:
1. Sure event: On the performance of a random experiment, an event which is sure to
happen is called sure event.
2. Impossible event: On the performance of a random experiment, an event which is
impossible to take place is called impossible event.
3. Simple event: On the performance of a random experiment an event with a single
possible outcome is called simple event.
4. Elementary event: An element of the sample space (the set which shows all possible
outcomes of a random experiment) is called an elementary event.
5. Compound event: On the performance of a random experiment an event with joint
occurrence of two or more simple events is called compound event.
6. Independent events: Two events are said to be independent if the occurrence or
non-occurrence of one does not affect the probability of occurrence of the other.
7. Dependent events: When one event affects the probability of the other, the
second event is said to be dependent on the first.
8. Mutually exclusive events: Two events are mutually exclusive if the
occurrence of one prevents the occurrence of the other.
9. Overlapping events: When two or more events can take place together, they are called
overlapping events.
10. Equally likely events: Two or more events are said to be equally likely if each of
them has an equal chance of happening, so that none is expected in preference to the others.
11. Complementary events: An event which consists of the negation of another event is
called the complementary event of the latter. In the experiment of rolling a die, the
complementary event of "multiple of 3" is "not a multiple of 3".
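Several of these event types can be made concrete with sets. The sketch below assumes one roll of a fair die; the event names are chosen for illustration and are not taken from the text.

```python
from fractions import Fraction

S = set(range(1, 7))  # sample space for one roll of a fair die

def P(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(S))

multiple_of_3 = {3, 6}
not_multiple_of_3 = S - multiple_of_3  # complementary event
even, odd = {2, 4, 6}, {1, 3, 5}
at_most_2 = {1, 2}
at_least_4 = {4, 5, 6}

print(P(multiple_of_3) + P(not_multiple_of_3))        # complementary events sum to 1
print(even & odd)                                     # mutually exclusive: empty intersection
print(even & multiple_of_3)                           # overlapping: they share an outcome
print(P(even & at_most_2) == P(even) * P(at_most_2))  # independent events
print(P(even & at_least_4) == P(even) * P(at_least_4))  # False here, so dependent
```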
$$P(E_1/R) = \frac{P(E_1)\,P(R/E_1)}{P(E_1)\,P(R/E_1) + P(E_2)\,P(R/E_2)}$$ ….
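The formula above can be evaluated directly. The sketch below uses made-up numbers (two causes E1 and E2, each with prior probability 1/2, and P(R/E1) = 3/5, P(R/E2) = 1/5); the function name `bayes_two_causes` is hypothetical.

```python
from fractions import Fraction

def bayes_two_causes(p_e1, p_r_given_e1, p_e2, p_r_given_e2):
    """Return P(E1/R) for two mutually exclusive, exhaustive causes E1 and E2."""
    numerator = p_e1 * p_r_given_e1
    return numerator / (numerator + p_e2 * p_r_given_e2)

# Hypothetical inputs, chosen only to illustrate the arithmetic.
posterior = bayes_two_causes(Fraction(1, 2), Fraction(3, 5),
                             Fraction(1, 2), Fraction(1, 5))
print(posterior)  # 3/4
```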
6 Elements of Theoretical
Distribution
1. Theoretical Distribution
Distributions which are not obtained by actual observation or experiment but are deduced
mathematically on the basis of certain assumptions are called theoretical
distributions.
The theoretical distributions most commonly in use are:
(1) Binomial distribution
(2) Poisson distribution
(3) Normal distribution
Binomial distribution
Assumptions –
(1) The number of trials, n, is finite and fixed.
(2) In every trial there are two possible outcomes of the event, which are mutually
exclusive.
(3) The probability of success p and the probability of failure q (= 1 − p) remain
constant in all the trials.
(4) All the trials are independent of each other, i.e. one trial is not affected by another.
Properties of a binomial distribution are:
1. Standard deviation = $\sqrt{npq}$
2. Mean = np
3. Variance = npq
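These properties can be verified numerically from the binomial probabilities themselves. The sketch below assumes the standard binomial probability formula and hypothetical values n = 10, p = 0.3; it is a check, not a derivation.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials, each with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3  # hypothetical values for illustration
q = 1 - p
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(mean, n * p)                                  # mean vs. np
print(variance, n * p * q)                          # variance vs. npq
print(math.sqrt(variance), math.sqrt(n * p * q))    # S.D. vs. sqrt(npq)
```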
Poisson distribution
Assumptions –
1. Every event must be independent of any other event.
2. The probability of more than one event happening in a very small interval is nil.
3. The probability of success in a short time interval is proportional to the length of
the interval.
Constants of the Poisson distribution are:
1. Mean = m
2. Standard deviation = $\sqrt{m}$
3. Variance = m
4. Mode = integral part of m, when m is not an integer.
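The constants listed above can be checked numerically. The sketch below assumes the standard Poisson probability formula P(X = k) = e^(−m) m^k / k! and a hypothetical value m = 2.5; the infinite sum is truncated at k = 59, where the remaining tail is negligible for this m.

```python
import math

def poisson_pmf(k, m):
    """P(X = k) for a Poisson distribution with mean m."""
    return math.exp(-m) * m**k / math.factorial(k)

m = 2.5          # hypothetical mean, deliberately not an integer
ks = range(60)   # truncation point is an assumption; the tail is negligible here
pmf = [poisson_pmf(k, m) for k in ks]

mean = sum(k * pk for k, pk in zip(ks, pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in zip(ks, pmf))
mode = max(ks, key=lambda k: pmf[k])

print(mean, variance, math.sqrt(variance))  # compare with m, m and sqrt(m)
print(mode, math.floor(m))                  # mode vs. integral part of m
```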
Utility of Poisson Distribution: The Poisson distribution is used in a variety of fields, for example for:
1. The number of defective items in a packing.
2. The number of bacteria per unit.
3. The number of accidents in a city.
4. The number of persons born deaf and dumb in a city.
5. The number of customers arriving at a supermarket.
Normal distribution
The normal distribution is a distribution of a continuous random variable. For a
continuous random variable, probability is not assigned to individual values; it is
assigned to intervals through a density function. The normal distribution is useful in
business and economic applications because a wide range of data sets take its form.
The density function of the normal distribution is given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
where μ = mean and σ = S.D.
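The density can be coded directly from the formula. The sketch below assumes hypothetical values μ = 50 and σ = 10, compares the hand-written function with Python's standard-library `statistics.NormalDist`, and checks by the trapezoidal rule that the total area under the curve is close to 1.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and S.D. sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma**2))

mu, sigma = 50.0, 10.0  # hypothetical mean and S.D.

# Compare with the standard library at a few points.
for x in (30.0, 50.0, 65.0):
    print(x, normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))

# Trapezoidal rule over mu - 8*sigma .. mu + 8*sigma.
steps = 10_000
a, b = mu - 8 * sigma, mu + 8 * sigma
h = (b - a) / steps
inner = sum(normal_pdf(a + i * h, mu, sigma) for i in range(1, steps))
area = h * (inner + 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma)))
print(area)
```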
7 Sampling & Estimation
1. Regression
Data are very useful for the purpose of comparison and estimation. There are two ways
of collecting statistical data: the first is census enquiry and the other is sample
enquiry. In statistical language, the aggregate of the objects under study is called the
population or universe. This becomes the subject of investigation.
The following are some of the definitions used in the theory of
sampling.
Sample
The part of the population which is examined with a view to estimating the characteristics
of that population is called a sample.
Sample Survey
Choosing only a part of the population and then examining it for the purpose of
enquiry is known as a sample survey.
Sampling
Sampling is the process of generalizing the results and findings of a sample survey
to the whole field of enquiry. Sampling is based upon two statistical
laws, which are as follows:
1. Law of statistical regularity.
2. Law of inertia of large numbers.
1. Law of statistical regularity
This law states that when a sample is selected at random from the population, it tends to
have the same characteristics as the whole population.
2. Law of Inertia of Large numbers
This law states that the larger the sample, the closer its results are to those of the
population.
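Both laws can be illustrated with a small simulation. The sketch below builds a hypothetical population of 100,000 values (an assumption made only for the demonstration), draws random samples of increasing size, and prints how far each sample mean is from the population mean.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable
population = [rng.gauss(50, 10) for _ in range(100_000)]  # hypothetical population
pop_mean = statistics.fmean(population)

def sampling_error(sample_size):
    """Absolute gap between a random sample's mean and the population mean."""
    sample = rng.sample(population, sample_size)
    return abs(statistics.fmean(sample) - pop_mean)

for size in (10, 100, 1_000, 10_000):
    print(size, round(sampling_error(size), 3))
```

The gap generally shrinks as the sample grows, although any single small sample may happen to land close by chance.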
2. Method of Sampling
There are many methods of drawing a sample. Broadly, these methods fall into
two categories:
1. Random sampling methods
2. Non-random sampling methods
Each category is further classified as follows:
1. Random Sampling Method
is further categorised into four methods:
(a) Random Sampling
(b) Stratified Sampling
(c) Systematic Sampling
(d) Multistage Sampling
2. Non-Random Sampling Method
is further categorised into six methods:
(a) Purposive Sampling
(b) Quota Sampling: In this method of sampling a quota is assigned to every
investigator, who has to select the sample from this quota by personal skill and
judgement.
(c) Cluster Sampling: In this method of sampling the population is first divided into
clusters or blocks, and the samples are then selected from these clusters or blocks
for investigation.
(d) Area Sampling: In this method of sampling, the total geographical area is
divided into smaller areas, and some of these small areas are then selected to
become the sample.
(e) Sequential Sampling: In this method of sampling, samples are taken one after
another from the population and arranged sequentially. They are then
examined one by one until an acceptable result is achieved. The
samples which are unable to reach the result are rejected.
(f) Convenience Sampling: In this method of sampling, readily available primary data
become the sample. Such samples are not very sound.
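Three of the random sampling methods listed above can be sketched in a few lines. The example below assumes a hypothetical population of 100 numbered units and two made-up strata ("low" and "high"); it is an illustration of the ideas, not a prescribed procedure.

```python
import random

rng = random.Random(42)           # fixed seed so the run is repeatable
population = list(range(1, 101))  # hypothetical units numbered 1..100

# Simple random sampling: every unit has the same chance of selection.
simple = rng.sample(population, 10)

# Systematic sampling: a random start, then every k-th unit.
k = len(population) // 10
start = rng.randrange(k)
systematic = population[start::k]

# Stratified sampling: draw separately from each stratum,
# here 10% of each stratum (proportional allocation).
strata = {"low": population[:60], "high": population[60:]}
stratified = {name: rng.sample(units, len(units) // 10)
              for name, units in strata.items()}

print(simple)
print(systematic)
print(stratified)
```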
3. Statistical estimation
Statistical estimation is the theory of how to estimate a parameter
(such as dispersion, moments, etc.) from given sample data. The main object of the
theory of sampling is to estimate features of the population from which the sample is
selected.
In this theory there are two types of estimates:
1. Point estimate
2. Interval estimate
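The difference between the two types can be shown with a small made-up sample. The sketch below takes the sample mean as the point estimate and builds an approximate 95% interval estimate as mean ± z·s/√n with z = 1.96; the data and the use of the normal (z) approximation are assumptions for illustration.

```python
import math
import statistics

sample = [52, 48, 50, 47, 53, 49, 51, 50, 46, 54]  # hypothetical observations
n = len(sample)

point_estimate = statistics.mean(sample)  # single-value estimate of the mean
s = statistics.stdev(sample)              # sample standard deviation
z = 1.96                                  # normal value for about 95% confidence (assumption)
margin = z * s / math.sqrt(n)
interval_estimate = (point_estimate - margin, point_estimate + margin)

print(point_estimate)
print(interval_estimate)
```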
4. Sample size
In estimation, the size of the sample plays a very important role: it should
be neither too small nor too large. If the sample is too small, it will not
give a correct estimate of the population parameter, whereas
if the sample is too large, it leads to wastage of time, money and
energy. Thus, it is good to have the right sample size.
Sample size for estimating a mean is
$$n = \left(\frac{\sigma z}{E}\right)^2$$
where n = sample size
σ = S.D.
z = standard normal value for the chosen confidence level
E = sampling error.
Sample size for estimating a proportion is
$$n = \frac{z^2 \, p(1-p)}{E^2}$$
where p = population proportion (or an estimate of it).
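Both sample-size formulas are easy to evaluate. The sketch below uses hypothetical inputs (σ = 15, E = 3 for the mean; p = 0.5, E = 0.05 for the proportion; z = 1.96 in both cases) and rounds the result up to the next whole unit, which is a common convention assumed here rather than something stated in the text.

```python
import math

def sample_size_for_mean(sigma, z, E):
    """n = (sigma * z / E)^2, rounded up to a whole number of units."""
    return math.ceil((sigma * z / E) ** 2)

def sample_size_for_proportion(p, z, E):
    """n = z^2 * p * (1 - p) / E^2, rounded up to a whole number of units."""
    return math.ceil(z**2 * p * (1 - p) / E**2)

# Hypothetical inputs for illustration.
print(sample_size_for_mean(sigma=15, z=1.96, E=3))
print(sample_size_for_proportion(p=0.5, z=1.96, E=0.05))
```

Halving the permitted error E roughly quadruples the required sample size in both formulas, since E enters as a square.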