NSE Sample Test

Sample Test: CBA

Basic Data Exploration with Statistics
Question 1 (1)
When would we say that we have a left-tailed distribution, based on the following observations in the data?
I. Mean < Median
II. Mean = Median
III. Mean > Median
IV. Median = Mode
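A small illustration, using a hypothetical left-skewed sample (not taken from the test), of how a long left tail typically pulls the mean below the median:

```python
import statistics

# Hypothetical left-skewed sample: one unusually small value drags the mean down
values = [1, 7, 8, 9, 9, 10]

mean_value = statistics.mean(values)      # 44 / 6
median_value = statistics.median(values)  # average of the two middle values, 8 and 9

print(f"mean = {mean_value:.2f}, median = {median_value}")
# With a long left tail the mean usually sits below the median
print("mean < median?", mean_value < median_value)
```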

Question 2 (1)
Given the sample data below, compute the mean, median, mode and standard deviation of Age:
17, 34, 23, 28, 20, 25, 37, 21, 11, 19, 39, 37, 37, 32, 32
I. Mean = 27, Median = 28, Mode = 37, Standard Deviation = 8.7
II. Mean = 18, Median = 25, Mode = 37, Standard Deviation = 8.7
III. Mean = 27, Median = 28, Mode = 37, Standard Deviation = 7.8
IV. Mean = 27, Median = 28, Mode = 37, Standard Deviation = 8.7
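The four summary statistics for the Age data can be checked with Python's standard `statistics` module. Note that the sample standard deviation (dividing by n - 1) comes out at about 8.7, while the population version (dividing by n) is about 8.4, so the answer depends on which convention is assumed:

```python
import statistics

# Age observations exactly as listed in Question 2
ages = [17, 34, 23, 28, 20, 25, 37, 21, 11, 19, 39, 37, 37, 32, 32]

mean_age = statistics.mean(ages)      # 412 / 15, about 27.47
median_age = statistics.median(ages)  # 8th of the 15 sorted values
mode_age = statistics.mode(ages)      # most frequent value
sd_sample = statistics.stdev(ages)    # sample SD, divides by n - 1
sd_pop = statistics.pstdev(ages)      # population SD, divides by n

print(f"mean={mean_age:.2f} median={median_age} mode={mode_age}")
print(f"sample SD={sd_sample:.2f} population SD={sd_pop:.2f}")
```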

Question 3 (2)
Temperature (in degrees Fahrenheit), as a variable in a study with sample observations such as 35°F, 75°F, 98.3°F, etc., would be
I. having an interval scale of measurement, as zero is assigned arbitrarily
II. having an interval scale of measurement, as we cannot take ratios of two measurements
III. having a ratio scale of measurement, as zero is absolute
IV. both I & II
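A quick way to see why ratios of Fahrenheit readings are not meaningful: the same two temperatures give very different "ratios" once the arbitrary zero point moves (Celsius), while differences convert consistently. Kelvin, which has an absolute zero, is included for contrast. The helper names below are illustrative only:

```python
def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9

def fahrenheit_to_kelvin(f):
    return fahrenheit_to_celsius(f) + 273.15

low_f, high_f = 35.0, 75.0  # two readings mentioned in the question

ratio_f = high_f / low_f
ratio_c = fahrenheit_to_celsius(high_f) / fahrenheit_to_celsius(low_f)
ratio_k = fahrenheit_to_kelvin(high_f) / fahrenheit_to_kelvin(low_f)

# The "ratio" depends on where zero is placed, so it has no physical meaning in F or C
print(f"F ratio={ratio_f:.2f}, C ratio={ratio_c:.2f}, K ratio={ratio_k:.3f}")

# Differences, by contrast, convert consistently (up to the 5/9 scale factor)
diff_f = high_f - low_f
diff_c = fahrenheit_to_celsius(high_f) - fahrenheit_to_celsius(low_f)
print(f"difference: {diff_f} F = {diff_c:.2f} C")
```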

Question 4 (3)
The lower quartile of a box plot created from a dataset with observations {2, 4, 40, 44, 46, 66, 56, 33, 45} is 33, while the upper quartile is 46. Where should we see the lower whisker?
I. 2
II. 4
III. 40
IV. None of the Above
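A sketch of the whisker calculation, assuming the common Tukey convention that whiskers extend to the most extreme data points lying within 1.5 × IQR of the box (the question does not state which rule is used). The `inclusive` quartile method reproduces the quartiles quoted in the question (33 and 46). Under this convention the lower fence is 13.5, so the lower whisker stops at 33 and the points 2, 4 and 66 are drawn as outliers:

```python
import statistics

data = [2, 4, 40, 44, 46, 66, 56, 33, 45]

# 'inclusive' quartiles give 33 and 46 for this dataset, matching the question
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Tukey's rule: fences at 1.5 * IQR beyond the box edges
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme observations still inside the fences
lower_whisker = min(x for x in data if x >= lower_fence)
upper_whisker = max(x for x in data if x <= upper_fence)
outliers = sorted(x for x in data if x < lower_fence or x > upper_fence)

print(q1, q3, lower_fence, upper_fence)
print("lower whisker:", lower_whisker, "upper whisker:", upper_whisker, "outliers:", outliers)
```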

Sampling and Hypothesis Testing

Question 5 (1)
If you get data on the efficacy of a fertilizer from field tests in which the quantity administered and the size of the field under study differ from sample to sample, which of the following would be an ideal step before you go ahead with data analysis?
I. Using percentage ratios
II. Transforming the data using normalization techniques
III. Using average values and comparing the tests
IV. Using absolute values as available, so as to keep the data unchanged

Question 6 (1)
The probability of event A is 0.4 and the probability of event B is 0.3. Assuming the two events are independent of each other, what is the conditional probability of A given B, denoted by P(A|B)?
I. 0.4
II. 0.12
III. 0.1
IV. None of the above
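The calculation follows directly from the definition of conditional probability, P(A|B) = P(A and B) / P(B), combined with the independence rule P(A and B) = P(A) × P(B):

```python
import math

p_a, p_b = 0.4, 0.3

# Independence means the joint probability factorises
p_a_and_b = p_a * p_b

# Definition of conditional probability
p_a_given_b = p_a_and_b / p_b

print(f"P(A and B) = {p_a_and_b:.2f}, P(A|B) = {p_a_given_b:.2f}")
print("equals P(A)?", math.isclose(p_a_given_b, p_a))
```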

Question 7 (1)
A sampling distribution is the probability distribution for which one of the following?
I. A sample
II. A population
III. A sample statistic
IV. A population parameter
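A simulation sketch of what a sampling distribution is: draw many samples from a population, compute one statistic (here the sample mean) for each, and look at how those values are distributed. The population (the integers 1 to 100), sample size and number of repetitions are hypothetical choices for illustration:

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: the integers 1..100 (population mean 50.5)
population = list(range(1, 101))
sample_size = 25
n_samples = 5000

# Each repetition draws a fresh sample and records one statistic: its mean
sample_means = [
    statistics.mean(rng.sample(population, sample_size))
    for _ in range(n_samples)
]

# The collection of means approximates the sampling distribution of the sample mean
print("population mean:", statistics.mean(population))
print("mean of sample means:", round(statistics.mean(sample_means), 2))
print("spread (standard error):", round(statistics.stdev(sample_means), 2))
```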

Question 8 (2)
Given that you have specified the confidence level at 95%, if the p-value is less than α, then specify α and the maximum probability of a Type I error, respectively.
I. 0.95 and 0.85
II. 5% and 0.95
III. 0.05 and 0.05
IV. None of the above
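A Monte Carlo sketch of the link between α and the Type I error rate: when the null hypothesis is true by construction, rejecting whenever p < α should happen in roughly a fraction α of repetitions. The two-sided z-test with known σ = 1, the sample size and the seed are assumptions made only for this illustration:

```python
import math
import random
import statistics

confidence = 0.95
alpha = 1 - confidence  # significance level, about 0.05

rng = random.Random(0)
normal = statistics.NormalDist()
n, trials = 20, 20000
rejections = 0

for _ in range(trials):
    # H0 is true by construction: data come from N(0, 1), so the true mean is 0
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = statistics.fmean(sample) * math.sqrt(n)  # z-statistic with known sigma = 1
    p_value = 2 * (1 - normal.cdf(abs(z)))       # two-sided p-value
    if p_value < alpha:
        rejections += 1                          # rejecting a true H0 is a Type I error

type_i_rate = rejections / trials
print(f"alpha = {alpha:.2f}, observed Type I error rate = {type_i_rate:.3f}")
```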

Question 9 (3)
Select the hypothesis formulation and the corresponding best value of α in a judicial scenario, where the aim is to avoid punishing an innocent person even if it means that some genuinely guilty defendants are pronounced not guilty.
I. H0: Defendant is Guilty, H1: Defendant is not Guilty, α = 10%
II. H0: Defendant is Innocent, H1: Defendant is not Innocent, α = 5%
III. H0: Defendant is Guilty, H1: Defendant is not Guilty, α = 1%
IV. H0: Defendant is Innocent, H1: Defendant is not Innocent, α = 1%

Predictive Analytics: Linear Regression

Question 10 (1)
The degree or strength of correlation between an independent variable, age, and a dependent variable, salary, is measured by
I. Coefficient of determination
II. Coefficient of correlation
III. Standard error of estimate
IV. All of the above

Question 11 (1)
The percentage of the total variation in the dependent variable Y that is explained by the set of independent variables X1, X2, ..., Xn is measured by
I. Coefficient of correlation
II. Coefficient of skewness
III. Coefficient of determination
IV. Standard deviation
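A sketch contrasting the coefficient of correlation (strength and direction of a linear relationship) with the coefficient of determination, R² = 1 - SSE/SST (the share of total variation in Y explained by the fitted model). The five data points are hypothetical. With a single predictor, R² equals r², which the last line confirms:

```python
import math

# Hypothetical paired observations (x = predictor, y = response)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)  # total sum of squares, SST

# Coefficient of correlation
r = s_xy / math.sqrt(s_xx * s_yy)

# Least-squares line, then coefficient of determination R^2 = 1 - SSE / SST
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - sse / s_yy

print(f"r = {r:.4f}, R^2 = {r_squared:.4f}, r^2 = {r * r:.4f}")
```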

Question 12 (1)
A coefficient of correlation of 0.9 between age and mortality rate indicates
I. a weak relationship between age and mortality rate
II. a weak relationship between age and mortality rate, which is positive
III. a strong relationship between age and mortality rate
IV. a strong relationship between age and mortality rate, which is positive

Question 13 (2)
Given the ANOVA output below, compute the missing values.
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F Ratio
Regression | 321.5 | ???? | 107.1666667 | 4.351945854
Error | ???? | 4 | XXXXXX |
Total | 420 | 7 | |
I. Regression sum of squares is 300 and Degrees of freedom for Regression is 3
II. Regression sum of squares is 210 and Degrees of freedom for Regression is 6
III. Regression sum of squares is 80.5 and Degrees of freedom for Regression is 6
IV. Regression sum of squares is 98.5 and Degrees of freedom for Regression is 3
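A sketch of the ANOVA bookkeeping for Question 13, using only the cells the table provides. Sums of squares and degrees of freedom add up across rows, Mean Square = SS / df, and F = MS(Regression) / MS(Error). Working backwards this way, the unknown sum of squares (the Error row, as the table is laid out) comes to 98.5 and the regression degrees of freedom to 3; the last two prints cross-check those values against the given Mean Square and F Ratio:

```python
import math

# Known cells of the ANOVA table in Question 13
ss_regression = 321.5
ms_regression = 107.1666667
f_ratio = 4.351945854
df_error = 4
ss_total, df_total = 420.0, 7

# Sums of squares and degrees of freedom are additive across the rows
ss_error = ss_total - ss_regression
df_regression = df_total - df_error

# Cross-checks: MS = SS / df, and F = MS_regression / MS_error
ms_error = ss_error / df_error
print("SS_error =", ss_error, " df_regression =", df_regression)
print("SS_reg / df_reg =", ss_regression / df_regression)
print("F =", ms_regression / ms_error)
```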

Question 14 (3)
From the following ANOVA output, compute the total number of observations and the number of independent variables, respectively.
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F Ratio
Regression | 400 | 12 | 33.33333333 | 2.666666667
Error | 100 | 8 | 12.5 |
Total | 500 | 20 | |
I. n = 20 and k = 8
II. n = 21 and k = 12
III. n = 19 and k = 8
IV. n = 18 and k = 12

Classification
Question 15 (1)
The Naive Bayes algorithm is a
I. Supervised learning model
II. Unsupervised learning model
III. Both of the Above
IV. None of the Above

Question 16 (1)
The decision tree algorithm is a
I. Supervised learning model
II. Unsupervised learning model
III. Both of the Above
IV. None of the Above

Question 17 (1)
The Naive Bayes algorithm is a
I. Prediction model
II. Classification model
III. Both of the Above
IV. None of the Above

Question 18 (2)
Which of these target variable types are used by CHAID for decision making?
I. Numeric
II. Integer
III. Interval
IV. Class

Question 19 (1)
Market Basket Analysis is a study of
I. Association between products
II. Link between products
III. Relation between numbers
IV. Association between dependent variable and independent variable
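Market basket analysis usually scores an association rule such as {bread} → {milk} with support, confidence and lift. A minimal sketch over a handful of hypothetical baskets (the data and the `support` helper are illustrative, not from the test):

```python
# Hypothetical transactions: each set is one customer's basket
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

antecedent, consequent = {"bread"}, {"milk"}
rule_support = support(antecedent | consequent)   # P(bread and milk)
confidence = rule_support / support(antecedent)   # P(milk | bread)
lift = confidence / support(consequent)           # confidence relative to milk's base rate

print(f"support={rule_support:.2f} confidence={confidence:.2f} lift={lift:.4f}")
```

A lift below 1, as here, means buying bread makes milk slightly less likely than its overall base rate; a lift above 1 would indicate a positive association.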
Question 20 (1)
Association rule mining is a
I. Supervised learning model
II. Unsupervised learning model
III. Classification model
IV. None of the above
Question 21 (1)
Market basket analysis is used for
I. Up-selling only
II. Cross-selling only
III. Up-selling and cross-selling
IV. None of the above

Question 22 (2)
Would you expect a good number of rules from a transaction set of 100 records as compared to one of 100,000 records?
I. Yes
II. No
III. Can't Say
IV. None of the Above

Question 23 (3)
At any point in time, for a specific customer, is it possible to see more than one consequent as a recommendation?
I. Yes
II. No
III. Can't Say
IV. None of the Above
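A sketch of how several rules can share the same antecedent: for a customer whose basket is {bread}, every rule bread → X that clears a confidence threshold yields a candidate recommendation. The transactions, the `recommend` function and the 0.5 threshold are hypothetical:

```python
# Hypothetical transactions; each set is one basket
transactions = [
    {"bread", "milk", "butter"},
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter", "jam"},
    {"milk", "jam"},
]

def recommend(basket, min_confidence=0.5):
    """Return every single-item consequent whose rule basket -> item meets min_confidence."""
    with_basket = [t for t in transactions if basket <= t]
    if not with_basket:
        return []
    candidates = set().union(*with_basket) - basket
    scored = []
    for item in candidates:
        confidence = sum(item in t for t in with_basket) / len(with_basket)
        if confidence >= min_confidence:
            scored.append((confidence, item))
    # Highest confidence first, ties broken alphabetically
    return [item for confidence, item in sorted(scored, key=lambda s: (-s[0], s[1]))]

print(recommend({"bread"}))
```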

Predictive Analytics: Forecasting Time Series Analysis

Question 24 (1)
What is on the x-axis of time series data?
I. Time
II. Sales
III. Both of the Above
IV. None of the Above

Question 25 (1)
What would we call an ordered set of data arranged in accordance with their time of occurrence?
I. Arithmetic series
II. Time Series
III. Both of the Above
IV. None of the Above

Question 26 (1)
What would a time series indicate?
I. Short term variation
II. Irregular variation
III. Both of the Above
IV. None of the Above

Question 27 (2)
What would be the systematic components of a time series, which follow a regular pattern of variation?
I. Noise
II. Signal
III. Correlation
IV. None of the Above

Question 28 (3)
Which of the following describes a time series as a weakly stationary process?
a. Constant mean
b. Constant variance
c. Constant autocovariance for given lags
d. Constant probability distributions
I. a
II. a & c
III. a, b & c
IV. a, b, c & d
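A simulation sketch contrasting a weakly stationary series (white noise) with a non-stationary one (a random walk built by summing the same shocks). It checks only one of the conditions, constant variance, by measuring the variance across many realisations at an early and a late time index. Path count, length and seed are arbitrary choices for illustration:

```python
import itertools
import random
import statistics

rng = random.Random(1)
n_paths, length = 2000, 200

noise_paths, walk_paths = [], []
for _ in range(n_paths):
    shocks = [rng.gauss(0.0, 1.0) for _ in range(length)]
    noise_paths.append(shocks)                             # white noise: variance stays at 1
    walk_paths.append(list(itertools.accumulate(shocks)))  # random walk: variance grows with t

def variance_at(paths, t):
    """Variance across realisations at a single time index t."""
    return statistics.pvariance([path[t] for path in paths])

for name, paths in [("white noise", noise_paths), ("random walk", walk_paths)]:
    print(name, round(variance_at(paths, 49), 2), round(variance_at(paths, 199), 2))
```

For the random walk, the variance at step t is roughly t (about 50 and 200 here), so it fails the constant-variance requirement even though its mean stays at zero.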

Clustering
Question 29 (1)
What is clustering?
I. Prediction of data
II. Classification of data
III. Partition of data
IV. None of the Above

Question 30 (1)
When we do clustering, do we identify a set of independent variables and a dependent variable?
I. Yes
II. No
III. Can't say
IV. None of the Above
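k-means is one common clustering algorithm (the questions do not name a specific one). The sketch below partitions hypothetical one-dimensional observations into two groups using only their feature values; no dependent variable is supplied at any point. The `kmeans_1d` function and its initial centroids are illustrative choices:

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain k-means on 1-D data: only feature values are used, no target variable."""
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(len(centroids)), key=lambda k: abs(p - centroids[k])) for p in points]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            new_centroids.append(sum(members) / len(members) if members else centroids[k])
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return labels, centroids

points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]  # hypothetical observations
labels, centroids = kmeans_1d(points, centroids=[points[0], points[-1]])
print(labels, centroids)
```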

Question 31 (1)
Would we call cluster analysis a variable reduction technique?
I. Yes
II. No
III. Can't say

Question 32 (2)
Which of these would be a dependent variable in a clustering algorithm?
I. Numerical
II. Categorical
III. Both of the Above
IV. None of the Above

Logistic Regression
Question 33 (1)
In a logistic regression model, the level of significance for a variable in the model indicates
I. The probability of accepting the null hypothesis when it is actually true
II. The probability of rejecting the null hypothesis when it is actually true
III. The probability of accepting the null hypothesis when it is actually false
IV. The probability of rejecting the null hypothesis when it is actually false

Question 34 (1)
What is the relation between the level of confidence and the significance level?
I. Level of confidence = α
II. Level of significance = 1 - α
III. Level of confidence = 1 - α
IV. Level of confidence = Level of significance

Question 35 (2)
Statistically, the likelihood term in logistic regression
I. Is the probability of observing a particular parameter value given a set of data
II. Is the same as the p-value
III. Is the parameter value which is most likely given the observed data
IV. Minimises the difference between the model and the data
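A sketch of the log-likelihood of a logistic regression model: the probability of the observed outcomes, computed under candidate parameter values (b0, b1). The data stay fixed while the parameters vary, and maximum likelihood estimation picks the parameters that make this quantity largest. The data and the two parameter pairs are hypothetical:

```python
import math

# Hypothetical observed data: one predictor x and a binary outcome y
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [0, 0, 1, 1, 1]

def log_likelihood(b0, b1):
    """Log of P(observed y | b0, b1) under the logistic model p = 1 / (1 + exp(-(b0 + b1 * x)))."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total

# Same data, different candidate parameters: a higher value means the data are more probable
for params in [(0.0, 0.0), (0.5, 2.0)]:
    print(params, round(log_likelihood(*params), 4))
```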