
Chapter - 4

Fundamentals of Statistical Concepts & Techniques in Quality Control and Improvement


Basic Terminologies
Population
Set of all items that possess a certain characteristic of interest. Eg. All 10,000 plastic cups produced in week no. 23

Sample
A subset of the population. Eg. 200 plastic cups selected from the week 23 output

Statistic
A characteristic of a sample, used to make inferences about population parameters that are unknown. Eg. Average thickness of the 200 sampled cups is 1 mm

Parameter
A characteristic of a population, which describes it. Eg. Average thickness of all 10,000 cups

Assigning Probabilities
Classical Method
Assigning probabilities based on the assumption of equally likely outcomes.

Relative Frequency Method

Assigning probabilities based on experimentation or historical data.

Subjective Method
Assigning probabilities based on the assignor's judgment.

Basics of Probability
Probability of an event describes the chance of occurrence of that event.
A probability function is bounded by 0 and 1: $0 \le P(A) \le 1$
0 for an event that cannot occur, 1 for an event that is certain to occur

Set of all outcomes of an experiment is called the sample space (S).
If each outcome in the sample space is equally likely, then the probability of event A is $P(A) = n_A / N$, where $n_A$ is the number of outcomes in A and N is the total number of outcomes in S.
Probability associated with the sample space is $P(S) = 1$


Simple events cannot be broken into other events.
Compound events are made up of two or more simple events.
Complement of an event, say A, implies the occurrence of everything except A, i.e. $P(A^c) = 1 - P(A)$

Additive law defines the probability of the union of 2 or more events (say A & B), i.e. A may happen, B may happen, or both:
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
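The complement rule and the additive law can be checked by counting outcomes. The sketch below assumes one fair six-sided die; the events A and B are hypothetical choices made only for illustration.

```python
from fractions import Fraction

# Sample space for one fair die; every outcome is assumed equally likely.
S = set(range(1, 7))

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}   # hypothetical event: an even number shows
B = {4, 5, 6}   # hypothetical event: a number greater than 3 shows

# Complement rule: P(A^c) = 1 - P(A)
p_complement = prob(S - A)
print(p_complement, 1 - prob(A))

# Additive law: P(A u B) = P(A) + P(B) - P(A n B)
p_union = prob(A | B)
p_additive = prob(A) + prob(B) - prob(A & B)
print(p_union, p_additive)
```

Subtracting P(A n B) removes the outcomes 4 and 6, which would otherwise be counted twice.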

Multiplicative law defines the probability of the intersection of 2 or more events (say A & B), i.e. all the events in the group occur:
$P(A \cap B) = P(A) \cdot P(B \mid A) = P(B) \cdot P(A \mid B)$
$P(B \mid A)$ represents conditional probability, i.e. the probability that B occurs given that A has occurred

Two events A & B are said to be independent if the outcome of one has no influence on the outcome of the other:
$P(B \mid A) = P(B)$ and hence $P(A \cap B) = P(A) \cdot P(B)$
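A counting sketch of the multiplicative law and of independence, assuming two fair dice (red and green). The three events are hypothetical and were chosen so that one pair is independent and the other is not.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely (red, green) results of two fair dice.
S = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) by counting favourable outcomes."""
    return Fraction(sum(1 for o in S if event(o)), len(S))

def cond_prob(b, a):
    """P(B | A) = P(A n B) / P(A)."""
    return prob(lambda o: a(o) and b(o)) / prob(a)

def red_six(o):      # hypothetical event A: red die shows 6
    return o[0] == 6

def green_even(o):   # hypothetical event B: green die is even
    return o[1] % 2 == 0

def sum_ge_10(o):    # hypothetical event C: sum is at least 10
    return o[0] + o[1] >= 10

p_a_and_b = prob(lambda o: red_six(o) and green_even(o))
p_a_and_c = prob(lambda o: red_six(o) and sum_ge_10(o))

# Multiplicative law holds for any pair of events.
print(p_a_and_c == prob(red_six) * cond_prob(sum_ge_10, red_six))
# A and B are independent: P(B | A) = P(B), so P(A n B) = P(A).P(B).
print(cond_prob(green_even, red_six) == prob(green_even))
# A and C are dependent: knowing red is 6 changes the chance of a high sum.
print(cond_prob(sum_ge_10, red_six), prob(sum_ge_10))
```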


Mutually Exclusive
Two events A & B are said to be mutually exclusive if they cannot happen simultaneously.
Probability of intersection: $P(A \cap B) = 0$
Probability of union: $P(A \cup B) = P(A) + P(B)$
Mutually exclusive events with nonzero probabilities are dependent, because the occurrence of one rules out the other.
If A & B are independent, the additive rule becomes $P(A \cup B) = P(A) + P(B) - P(A) \cdot P(B)$

Statistics is the science that deals with the collection, classification, analysis and making of inferences from data

Descriptive Statistics
Describes the characteristics of product or process using information collected on it

Inferential Statistics
Draws conclusions about unknown process parameters based on sample information

Data Collection
Direct observation
Indirect observation (Questionnaires): no control over the data, and the chance of error is higher

Collected data can be described by a random variable, which is either continuous or discrete

Continuous variable
Variable that can assume any value on a continuous scale within a range Eg. Viscosity of a resin

Discrete variable
Variable that can assume only a finite or countably infinite number of values. Eg. No. of defects in a shirt
Items are often simply classified as acceptable or not
A continuous characteristic can also be viewed as discrete. Eg. Diameter of a hub in a tire, when each hub is only classified as acceptable or not

Accuracy
Refers to the degree of uniformity of the observations around a desired value, such that on average the target is realized

Precision
Refers to the degree of variability of the observations


Measures of Scale
Nominal Scale
Data variables are simply labels to identify an attribute Eg. Critical / Major / Minor

Ordinal Scale
Data has the properties of nominal data, and also ranks or orders the observations. Eg. Grades: 1 - outstanding, 5 - poor

Interval Scale
Data has the properties of ordinal data, and a fixed unit of measure describes the interval between observations. Eg. Temperature of water at different stages of cooling over a 2-hour interval

Ratio Scale
Data has the properties of interval data, and a natural zero exists for the measurement scale. Eg. Weight of cement bags: 100 kg, 100.2 kg


Measures of Scale
A nominal scale is really a list of categories into which objects can be classified. For example, people who receive a mail order offer might be classified as "no response," "purchase and pay," "purchase but return the product," and "purchase and neither pay nor return." The data so classified are termed categorical data.

An ordinal scale is a measurement scale that assigns values to objects based on their ranking with respect to one another. For example, a doctor might use a scale of 0-10 to indicate degree of improvement in some condition, from 0 (no improvement) to 10 (disappearance of the condition). While you know that a 4 is better than a 2, there is no implication that a 4 is twice as good as a 2. Nor is the improvement from 2 to 4 necessarily the same "amount" of improvement as the improvement from 6 to 8. All we know is that there are 11 categories, with 1 being better than 0, 2 being better than 1, etc.

Measures of Scale
An interval scale is a measurement scale in which a certain distance along the scale means the same thing no matter where on the scale you are, but where "0" on the scale does not represent the absence of the thing being measured. Fahrenheit and Celsius temperature scales are examples.

A ratio scale is a measurement scale in which a certain distance along the scale means the same thing no matter where on the scale you are, and where "0" on the scale represents the absence of the thing being measured. Thus a "4" on such a scale implies twice as much of the thing being measured as a "2."

Nominal Data: Male, Female, Race, Political Party (categorical data that cannot be ranked)
Ordinal Data: Degree of Satisfaction at Restaurant (data that can be ranked)
Interval Data: Temperature, Dates (data that has an arbitrary zero)
Ratio Data: Height, Weight, Age, Length (data that has an absolute zero)

Interval variables allow us not only to rank order the items that are measured, but also to quantify and compare the sizes of differences between them. For example, temperature, as measured in degrees Fahrenheit or Celsius, constitutes an interval scale. We can say that a temperature of 40 degrees is higher than a temperature of 30 degrees, and that an increase from 20 to 40 degrees is twice as much as an increase from 30 to 40 degrees.

Ratio variables are very similar to interval variables; in addition to all the properties of interval variables, they feature an identifiable absolute zero point, so they allow for statements such as "x is two times y". Typical examples of ratio scales are measures of time or space. For example, as the Kelvin temperature scale is a ratio scale, not only can we say that a temperature of 200 K is higher than one of 100 K, we can correctly state that it is twice as high. Interval scales do not have the ratio property.

Most statistical data analysis procedures do not distinguish between the interval and ratio properties of the measurement scales.
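The difference between interval and ratio scales can be seen numerically. The sketch below uses the standard Celsius-to-Kelvin offset of 273.15; the two temperatures are hypothetical readings.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius (interval scale) to kelvins (ratio scale)."""
    return c + 273.15

t1_c, t2_c = 20.0, 40.0   # hypothetical readings in degrees Celsius

# Differences are meaningful on both scales and agree with each other.
diff_celsius = t2_c - t1_c
diff_kelvin = celsius_to_kelvin(t2_c) - celsius_to_kelvin(t1_c)

# Ratios are only meaningful on the ratio scale (Kelvin).
ratio_celsius = t2_c / t1_c   # 2.0, but 40 C is not "twice as hot" as 20 C
ratio_kelvin = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)

print(diff_celsius, round(diff_kelvin, 2))
print(ratio_celsius, round(ratio_kelvin, 3))
```

The Celsius ratio changes if the zero point is moved, which is why ratio statements are reserved for scales with a natural zero.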


Measures of Central Tendency

Mean
Simple average of the observations in a dataset
Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$

Median
Is the value in the middle when the observations are ranked
It is more robust than the mean, as it is not influenced by extreme values in the dataset

Mode
Is the value that occurs most frequently in a dataset
A dataset having more than one mode is called multi-modal

Trimmed Mean
Obtained by calculating the mean of the data that remain after a proportion of the highest and lowest values has been deleted (a% trimmed)
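A short sketch of the four measures using Python's standard `statistics` module. The dataset is hypothetical and contains one extreme value; `trimmed_mean` is an illustrative helper that assumes the stated proportion is removed from each end of the ranked data.

```python
import statistics

def trimmed_mean(data, proportion):
    """Mean after dropping `proportion` of the values from EACH end (assumption)."""
    xs = sorted(data)
    k = int(len(xs) * proportion)        # number of values trimmed per tail
    kept = xs[k:len(xs) - k]
    return statistics.mean(kept)

# Hypothetical observations; 49 is an extreme value.
data = [12, 15, 11, 14, 15, 13, 15, 12, 14, 49]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
trimmed = trimmed_mean(data, 0.10)       # 10% trimmed from each end

print(mean, median, mode, trimmed)
```

The single extreme value pulls the mean well above the median, while the median and the trimmed mean stay close to the bulk of the data.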

Measures of Dispersion
Provides information on the variability or scatter of the observations around a given value
Range
Difference between the largest and smallest values in the dataset: $R = X_L - X_S$
Variance
Measures the fluctuation of the observations around the mean
Population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$
Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$


Why n-1 in sample variance?

To satisfy the property of unbiasedness, i.e. the average of the sample variance (which varies from sample to sample) should equal the population variance, which is a constant. Dividing by n instead would, on average, underestimate the population variance.
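Unbiasedness can be verified exactly on a tiny example by listing every possible sample. The sketch below assumes a hypothetical population of four values and samples of size 2 drawn with replacement, so that all 16 ordered samples are equally likely.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]            # hypothetical population
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance

def sample_var(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample) / divisor

n = 2
# Every ordered sample of size n drawn with replacement (equally likely).
samples = list(product(population, repeat=n))

avg_with_n_minus_1 = sum(sample_var(s, n - 1) for s in samples) / len(samples)
avg_with_n = sum(sample_var(s, n) for s in samples) / len(samples)

print(sigma2, avg_with_n_minus_1, avg_with_n)
```

Averaged over all samples, the n-1 divisor reproduces the population variance, while the n divisor gives a smaller value.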

Measures of Dispersion
Standard Deviation
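Most commonly used measure of dispersion; it has the same unit as the observations
Measures the variability of the observations around the mean
Population standard deviation: $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$
Sample standard deviation: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$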

Inter Quartile Range

Lower / first quartile (Q1) is the value such that 1/4th of the observations fall below it and 3/4th fall above it (position of Q1 = 0.25(n+1))
Vice versa for the third quartile (Q3) (position of Q3 = 0.75(n+1))
IQR is the difference between the 3rd quartile and the 1st quartile: $IQR = Q_3 - Q_1$
Larger the value of IQR, greater the spread of the data
To find IQR, the data are ranked in ascending order and then Q1 and Q3 are calculated
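A minimal sketch of the position rule for quartiles. The helper `quartile` is hypothetical; it assumes linear interpolation when the position 0.25(n+1) or 0.75(n+1) is not a whole number, which the text does not specify. The dataset is also hypothetical.

```python
def quartile(sorted_data, fraction):
    """Value at 1-based position fraction*(n+1), interpolating linearly (assumption)."""
    n = len(sorted_data)
    pos = fraction * (n + 1)
    lower = int(pos)                 # whole part of the position
    frac = pos - lower               # fractional part of the position
    if lower < 1:
        return sorted_data[0]
    if lower >= n:
        return sorted_data[-1]
    below = sorted_data[lower - 1]   # observation at position `lower`
    above = sorted_data[lower]       # observation at position `lower + 1`
    return below + frac * (above - below)

# Hypothetical observations (n = 11, so both positions are whole numbers).
data = [7, 3, 9, 5, 12, 4, 8, 15, 6, 10, 11]
ranked = sorted(data)                # step 1: rank in ascending order

q1 = quartile(ranked, 0.25)          # position 0.25 * 12 = 3
q3 = quartile(ranked, 0.75)          # position 0.75 * 12 = 9
iqr = q3 - q1

print(q1, q3, iqr)
```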


Measures of Skewness & Kurtosis

Skewness coefficient (V1)
Describes the asymmetry of the dataset about the mean, i.e. the degree to which the distribution deviates from symmetry
In moment form: $V_1 = E[(X - \mu)^3] / \sigma^3$
Negatively skewed: V1 < 0, Mean < Median
Positively skewed: V1 > 0, Mean > Median
Not skewed: V1 = 0, Mean = Median

Kurtosis coefficient (V2)
Is a measure of the peakedness of the dataset
Is also a measure of the heaviness of the tails of the distribution
In moment form: $V_2 = E[(X - \mu)^4] / \sigma^4$
For the normal distribution (mesokurtic), V2 = 3
Leptokurtic: more peaked, V2 > 3
Platykurtic: less peaked, V2 < 3
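The exact formulae from the slide are not reproduced in the text, so the sketch below assumes the standard moment-based definitions (third and fourth central moments divided by the matching power of the variance). Sample-adjusted versions used in some textbooks give slightly different numbers. Both datasets are hypothetical.

```python
def central_moment(data, k):
    """k-th central moment, dividing by n (population-style, an assumption)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n

def skewness(data):
    """Moment-based skewness: m3 / m2^(3/2)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    """Moment-based kurtosis: m4 / m2^2 (equals 3 for a normal distribution)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2

symmetric = [1, 2, 3, 4, 5]          # hypothetical symmetric data
right_skewed = [1, 1, 1, 2, 10]      # hypothetical data with a long right tail

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_skewed), kurtosis(right_skewed))
```

The symmetric dataset has zero skewness and a kurtosis below 3, while the right-skewed dataset has positive skewness and a mean above its median.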


Measures of Association
Indicates how two or more variables are related to each other
Small values indicate a weak relation and large values a strong one
Correlation coefficient (r)
Is a measure of the strength of the linear relationship between 2 variables
Sample correlation coefficient is always between -1 and 1
$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$
1 denotes a perfect +ve linear relationship, -1 denotes a perfect -ve linear relationship and 0 denotes uncorrelated

Sample Problem



Researchers at the European Centre for Road Safety Testing are trying to find out how the age of cars affects their braking capability. They test a group of ten cars of differing ages and find out the minimum stopping distances that the cars can achieve. The results are set out in the table below:

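Car ages and stopping distances

| Car | Age (months) | Minimum stopping distance at 40 kph (metres) |
|-----|--------------|-----------------------------------------------|
| A   | 9            | 28.4 |
| B   | 15           | 29.3 |
| C   | 24           | 37.6 |
| D   | 30           | 36.2 |
| E   | 38           | 36.5 |
| F   | 46           | 35.3 |
| G   | 53           | 36.2 |
| H   | 60           | 44.1 |
| I   | 64           | 44.8 |
| J   | 76           | 47.2 |

$\bar{x} = 415/10 = 41.5$
$\bar{y} = 375.6/10 = 37.56$
$r = \frac{10 \times 16713.3 - 415 \times 375.6}{\sqrt{(10 \times 21623 - 415^2)(10 \times 14457.72 - 375.6^2)}}$
$r = 11259 / \sqrt{44005 \times 3501.84}$
$r = 11259 / 12413.6$
$r = 0.91$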
r always lies in the range -1 to +1. If it lies close to either of these two values, then the dispersion of the scattergram points about the line of best fit is small and therefore a strong linear correlation exists between the two variables. If r equals exactly -1 or +1, the correlation is perfect and all the points on the scattergram lie on the line of best fit (otherwise known as the regression line). If r is close to 0, the dispersion is large and the variables are linearly uncorrelated. The sign of r indicates positive or negative correlation.
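The calculation for the car data can be reproduced with a few sums. The sketch below uses the ages and stopping distances from the sample problem and the computational form of r; the variable names are illustrative.

```python
from math import sqrt

# Data from the sample problem: car age (months), stopping distance (metres).
ages = [9, 15, 24, 30, 38, 46, 53, 60, 64, 76]
dist = [28.4, 29.3, 37.6, 36.2, 36.5, 35.3, 36.2, 44.1, 44.8, 47.2]

n = len(ages)
sum_x = sum(ages)
sum_y = sum(dist)
sum_xy = sum(x * y for x, y in zip(ages, dist))
sum_x2 = sum(x * x for x in ages)
sum_y2 = sum(y * y for y in dist)

# r = (n*Sxy - Sx*Sy) / sqrt((n*Sx2 - Sx^2) * (n*Sy2 - Sy^2))
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x / n, round(sum_y / n, 2))   # the two means
print(round(r, 2))
```

A value this close to +1 indicates that older cars in this sample tend to need longer stopping distances.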

Probability Distribution
Sample data can be described with histograms, while population data are described by a probability distribution
For discrete random variables, the probability distribution shows the values that the variable can take and their corresponding probabilities
$0 \le P(x_i) \le 1$ for all i, where $P(x_i) = P(X = x_i)$; i = 1, 2, ...
$\sum_i P(x_i) = 1$

A continuous random variable can take an infinite number of values, and hence its probability distribution is expressed by a mathematical function
$f(x) \ge 0$ for all x, where $P(a \le x \le b) = \int_a^b f(x)\,dx$
$\int_{-\infty}^{\infty} f(x)\,dx = 1$

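Example: Rolling 2 Dice (Red/Green)

Y = sum of the up faces of the two dice. The table gives the value of y for all elements in S

| Red \ Green | 1 | 2 | 3 | 4  | 5  | 6  |
|-------------|---|---|---|----|----|----|
| 1           | 2 | 3 | 4 | 5  | 6  | 7  |
| 2           | 3 | 4 | 5 | 6  | 7  | 8  |
| 3           | 4 | 5 | 6 | 7  | 8  | 9  |
| 4           | 5 | 6 | 7 | 8  | 9  | 10 |
| 5           | 6 | 7 | 8 | 9  | 10 | 11 |
| 6           | 7 | 8 | 9 | 10 | 11 | 12 |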

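Rolling 2 Dice: Probability Mass Function & CDF

$p(y) = \frac{\text{number of ways 2 dice can sum to } y}{\text{number of ways 2 dice can land}}$, with 36 equally likely results in the denominator
$F(y) = \sum_{t=2}^{y} p(t)$

| y  | p(y) | F(y)  |
|----|------|-------|
| 2  | 1/36 | 1/36  |
| 3  | 2/36 | 3/36  |
| 4  | 3/36 | 6/36  |
| 5  | 4/36 | 10/36 |
| 6  | 5/36 | 15/36 |
| 7  | 6/36 | 21/36 |
| 8  | 5/36 | 26/36 |
| 9  | 4/36 | 30/36 |
| 10 | 3/36 | 33/36 |
| 11 | 2/36 | 35/36 |
| 12 | 1/36 | 36/36 |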
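The mass function and CDF of the two-dice sum can be generated by enumerating the 36 equally likely outcomes. This is a sketch with illustrative names, assuming fair dice.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate, product

# Count how many of the 36 (red, green) outcomes give each sum y.
counts = Counter(red + green for red, green in product(range(1, 7), repeat=2))

# p(y) = ways to obtain y / 36
pmf = {y: Fraction(counts[y], 36) for y in range(2, 13)}

# F(y) = p(2) + p(3) + ... + p(y), built as a running total of the pmf.
cdf = dict(zip(pmf, accumulate(pmf.values())))

for y in pmf:
    # Show both columns over the common denominator 36.
    print(y, f"{counts[y]}/36", f"{cdf[y] * 36}/36")
```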

Cumulative Distribution Function

For a discrete random variable, $F(x) = \sum_{x_i \le x} p(x_i)$
For a continuous random variable, $F(x) = \int_{-\infty}^{x} f(t)\,dt$
F(x) is a non-decreasing function of x that tends to 1 as x tends to infinity and to 0 as x tends to minus infinity
Expected value: $E(X) = \sum_i x_i\,p(x_i)$ if X is discrete; $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$ if X is continuous
Variance of a random variable is given by $Var(X) = E[(X - \mu)^2] = E(X^2) - [E(X)]^2$, where $\mu = E(X)$
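As a worked instance of the expected value and variance formulas, the sketch below evaluates them for the sum of two fair dice, computing the variance both from the definition and from the shortcut form.

```python
from fractions import Fraction
from itertools import product

# Each of the 36 (red, green) outcomes has probability 1/36.
sums = [red + green for red, green in product(range(1, 7), repeat=2)]
p = Fraction(1, 36)

# E(X) = sum of x * p(x)
expected = sum(y * p for y in sums)

# Var(X) from the definition E[(X - mu)^2] ...
var_definition = sum((y - expected) ** 2 * p for y in sums)
# ... and from the shortcut E(X^2) - [E(X)]^2
var_shortcut = sum(y * y * p for y in sums) - expected ** 2

print(expected, var_definition, var_shortcut)
```

Both routes to the variance give the same fraction, as the identity requires.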

Discrete Distributions
Hypergeometric distribution
Useful in sampling from a finite population without replacement, where the outcomes are success or failure
If we consider getting a nonconforming item as a success, the probability distribution of the number of nonconforming items (x) in the sample is given by
$P(x) = \binom{D}{x}\binom{N-D}{n-x} / \binom{N}{n}$, x = 0, 1, 2, ..., min(n, D)
D: no. of nonconforming items in the population, x: no. of nonconforming items in the sample
N: size of population, n: size of sample

Mean $\mu = E(x) = nD/N$
Variance $\sigma^2 = Var(x) = \frac{nD}{N}\left(1 - \frac{D}{N}\right)\left(\frac{N-n}{N-1}\right)$
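A sketch of the hypergeometric probabilities using `math.comb` (Python 3.8+). The lot size, number of nonconforming items and sample size are hypothetical; the mean and variance computed from the distribution are compared with the closed-form expressions, including the finite population correction (N-n)/(N-1).

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, N, D, n):
    """P(x nonconforming in a sample of n) when D of the N items are nonconforming."""
    return Fraction(comb(D, x) * comb(N - D, n - x), comb(N, n))

# Hypothetical lot: 20 items, 5 nonconforming, sample of 4 without replacement.
N, D, n = 20, 5, 4

pmf = {x: hypergeom_pmf(x, N, D, n) for x in range(0, min(n, D) + 1)}

mean = sum(x * p for x, p in pmf.items())
variance = sum(x * x * p for x, p in pmf.items()) - mean ** 2

# Closed-form expressions for comparison.
mean_formula = Fraction(n * D, N)
var_formula = mean_formula * (1 - Fraction(D, N)) * Fraction(N - n, N - 1)

print(mean, mean_formula)
print(variance, var_formula)
```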


Discrete Distributions
Binomial distribution
Series of independent trials
Useful in sampling from a large population without replacement, or in sampling with replacement from a finite population
Probability of success (p) on any trial is assumed to be constant
Let x denote the no. of successes. If n trials are conducted, the probability of x successes is given by
$P(x) = \binom{n}{x} p^x (1-p)^{n-x}$, x = 0, 1, 2, ..., n

Mean $\mu = E(x) = np$
Variance $\sigma^2 = Var(x) = np(1-p)$


Trials are independent in the binomial, but not in the hypergeometric
Probability of success on any trial remains constant in the binomial, but not in the hypergeometric
The hypergeometric approaches the binomial as $N \to \infty$ while D/N remains constant
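The limiting relationship between the two distributions can be illustrated numerically. In the sketch below the sample size, the proportion nonconforming and the lot sizes are hypothetical; D/N is held at 0.1 while N grows.

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial probability of x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def hypergeom_pmf(x, N, D, n):
    """Hypergeometric probability of x successes when sampling without replacement."""
    return comb(D, x) * comb(N - D, n - x) / comb(N, n)

n, p, x = 10, 0.1, 2                 # hypothetical sample size, proportion, count
target = binom_pmf(x, n, p)

gaps = []
for N in (50, 500, 5000):
    D = N // 10                      # keeps D/N fixed at 0.1
    gaps.append(abs(hypergeom_pmf(x, N, D, n) - target))

binom_mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1))

print(round(target, 4), [round(g, 5) for g in gaps])
print(round(binom_mean, 6), n * p)
```

The gap shrinks as the population grows, because removing a handful of items from a very large lot barely changes the proportion that remains.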


Discrete Distributions
Poisson distribution
Used to model the no. of events that happen within a product unit, space, volume or time period. Eg. No. of machine breakdowns per month
Probability distribution function of the no. of events (x) is given by
$P(x) = e^{-\lambda} \lambda^x / x!$, x = 0, 1, 2, ...

$\lambda$ is the mean or average no. of events
Mean $\mu = \lambda$
Variance $\sigma^2 = \lambda$
It is used as an approximation to the binomial when n is large ($n \to \infty$) and p is small ($p \to 0$), such that $np = \lambda$ is constant, i.e. the average no. of defects per unit is constant
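A sketch of the Poisson probabilities and of their use as an approximation to the binomial. The rate of 2 events per unit and the choice n = 1000 are hypothetical; the infinite sums for the mean and variance are truncated at 60 terms, where the remaining tail is negligible for this rate.

```python
from math import comb, exp, factorial

def poisson_pmf(x, lam):
    """Poisson probability of x events when the mean number of events is lam."""
    return exp(-lam) * lam ** x / factorial(x)

def binom_pmf(x, n, p):
    """Binomial probability of x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

lam = 2.0                            # hypothetical mean number of events
n = 1000                             # large n ...
p = lam / n                          # ... and small p, with n*p = lam

# Largest pointwise gap between the two distributions for x = 0..10.
max_gap = max(abs(binom_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(11))

# Mean and variance from the distribution (sums truncated at x = 59).
mean = sum(x * poisson_pmf(x, lam) for x in range(60))
variance = sum(x * x * poisson_pmf(x, lam) for x in range(60)) - mean ** 2

print(round(max_gap, 5))
print(round(mean, 6), round(variance, 6))
```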


Continuous Distribution
Normal Distribution
Most widely used distribution
Probability density function of a normal random variable is $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$, $-\infty < x < \infty$
$\mu$ = population mean, $\sigma$ = population std. deviation
Change in mean changes the location of the distribution. As $\mu$ increases, the distribution shifts right, and vice versa
As the variance increases, the spread about the mean increases
Normal distribution is symmetric and Mean = Median = Mode
Proportion of population values that fall in the range $\mu \pm \sigma$ is 68.27%, $\mu \pm 2\sigma$ is 95.45%, $\mu \pm 3\sigma$ is 99.73%
Shape of the density function changes for different values of $\mu$ and $\sigma$, and hence it needs standardization: $Z = (X - \mu)/\sigma$
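The coverage proportions can be computed from the standard normal CDF, which is available through `math.erf`. The mean and standard deviation below are hypothetical; because of standardization the proportions do not depend on them. Computed values may differ in the last digit from figures read off rounded tables.

```python
from math import erf, exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def normal_cdf(x, mu, sigma):
    """P(X <= x), computed by standardizing to Z = (x - mu) / sigma."""
    z = (x - mu) / sigma
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 1.0, 0.05                # hypothetical: cup thickness in mm

# Proportion of values within mu +/- k*sigma for k = 1, 2, 3.
coverage = {
    k: normal_cdf(mu + k * sigma, mu, sigma) - normal_cdf(mu - k * sigma, mu, sigma)
    for k in (1, 2, 3)
}

for k, share in coverage.items():
    print(k, f"{share:.2%}")

# Symmetry about the mean: equal density at equal distances either side.
print(normal_pdf(mu - 0.02, mu, sigma), normal_pdf(mu + 0.02, mu, sigma))
```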

