
The Test and Item Analyses Report

By: Owen Maphisa Ncube


BEd (Hons)
Student number 26336686
Table of contents

List of tables
List of figures
Acknowledgements
1. Introduction
2. Purpose of report
3. Test analyses
4. Item analyses
5. Conclusion
6. References
7. Appendices

List of tables

Table 1 Number of students, total score, mean, standard deviation, mode and median
Table 2 Highest and lowest scores, range, number and size of intervals
Table 3 Number of questions, sum of products and reliability
Table 4 Question, correct and answered items, difficulty index and acceptability
Table 5 Questions, number correct and answered, upper and lower level and discrimination index

List of figures

Figure 1 Histogram
Figure 2 Frequency polygon
Figure 3 Ogive

Acknowledgements

I would like to thank Prof J. Knoetze for his tireless efforts and
remarkable advice during the production of this report. I would also
like to thank my loved ones Thandi, Lungile, Nokwanda, Ohayo and
Owens for their incredible patience and amazing support during the
compilation of this report.

1. INTRODUCTION

In academic institutions such as schools, educators administer tests
and examinations to inform a range of decisions. Tests are tools that
attempt to provide objective data that can be used for making
diagnostic, instructional and grading decisions in the classroom. On a
larger scale, tests can inform other decisions concerning selection,
placement, counselling, guidance, curriculum and administration. A
test tells the educator where a student stands relative to classmates
and provides information about the student's level of proficiency and
mastery of a skill or set of skills.

Statistical analysis is a powerful technique available to instructors
for guiding and improving instruction, tests and examinations. A test
can be analysed using the mean, mode, median, standard deviation and
reliability, and the quality of the test as a whole is assessed by
estimating its internal consistency. Individual items, in turn, are
analysed by finding the item difficulty and the item discrimination.
For this to be meaningful, the items analysed must be valid measures
of the instructional objectives the educator designed for the
learners. Furthermore, the items must be diagnostic: knowing which
incorrect options students select should give a clue to the nature of
their misunderstanding and thus suggest appropriate remedial
strategies. Instructors who set their own examinations or tests may
thereby immensely improve the effectiveness of test items and the
validity of test scores.

This report dwells on issues surrounding the reliability of a multiple-
choice test written by twenty-five students, as well as the analysis of
the twenty items contained in the test. Each item was labelled and the
correct option indicated.

2. PURPOSE OF REPORT

The purpose of this report is to disseminate information about the test
and item analyses of a multiple-choice test containing twenty
questions, each with four options labelled A, B, C and D, written by
twenty-five learners.

3. TEST ANALYSES

3.1 Descriptive Statistics

Descriptive statistics are used to describe the basic features of the
data in a study and provide simple summaries about the sample and the
measures. In conjunction with simple graphical analysis, descriptive
statistics form the basis of virtually every quantitative analysis of
data. The measures of central tendency of a distribution, such as the
mean, mode and median, are employed in analysing a test. Each of the
items is recoded in preparation for quantitative test and item
analysis.

(i) The mean is the average score, found by adding the points earned
by all students and dividing that total by the number of students. The
formula:

M = ΣX / N

was used, where:

M is the mean,
Σ is the summation sign,
X is a student's score, and
N is the total number of students who wrote the test.

In Table 1 the mean is 65.79.

(ii) The standard deviation, a measure of the dispersion of students'
scores, was found using the formula:

SD = √( Σ(X - M)² / N )

Table 1 shows the calculated standard deviation as 21.90.

(iii) To find the mode, which is the most frequently occurring score in
a distribution, the frequencies of each score are checked thoroughly.
Table 1 shows the mode as 65.

(iv) To find the median (50th percentile), which is the exact midpoint
of a distribution, the scores are arranged in ascending or descending
order and the middle score is identified. In Table 1 the median is 65.

Measure              Result
Number of students   25
Total score          1645
Mean                 65.79
Standard deviation   21.90
Mode                 65
Median               65

Table 1: Number of students, total score, mean, standard deviation, mode and median
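The figures in Table 1 can be reproduced with a short script. This is a sketch using Python's statistics module and the twenty-five percentage scores listed in Appendix C; the population form of the standard deviation is assumed, since it matches the reported 21.90.

```python
from statistics import mean, median, mode, pstdev

# The twenty-five percentage scores from Appendix C.
scores = [100, 100, 90, 90, 90, 89.47, 85, 85, 75, 70, 70,
          65, 65, 65, 65, 60, 55, 55, 50, 50, 47.06, 45,
          31.58, 31.58, 15]

print("Number of students:", len(scores))               # 25
print("Total score:", round(sum(scores)))               # 1645
print("Mean:", round(mean(scores), 2))                  # 65.79
print("Standard deviation:", round(pstdev(scores), 1))  # 21.9
print("Mode:", mode(scores))                            # 65
print("Median:", median(scores))                        # 65
```

The mode is well defined here because 65 occurs four times, more often than any other score.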

3.2 Frequency Graphs

In Table 2 the highest and lowest scores are 100 and 15 respectively;
the range, the difference between the highest and lowest scores, is
85; the number of scores is 25; the number of intervals is 10; and the
size of each interval is 9. There is a wide gap between the weakest
and the best learners' performances.
Highest score         100
Lowest score          15
Range                 85
Number of scores      25
Number of intervals   10
Size of interval      9

Table 2: Highest and lowest scores, range, number and size of intervals
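The grouped frequencies behind Figures 1 to 3 can be tallied directly from the raw percentage scores of Appendix C. This sketch assumes class boundaries at 14.5, 24.5, ..., 104.5, so that each interval such as 15-24 captures all scores up to its upper limit.

```python
# Percentage scores from Appendix C.
scores = [100, 100, 90, 90, 90, 89.47, 85, 85, 75, 70, 70,
          65, 65, 65, 65, 60, 55, 55, 50, 50, 47.06, 45,
          31.58, 31.58, 15]

intervals = [(15, 24), (25, 34), (35, 44), (45, 54), (55, 64),
             (65, 74), (75, 84), (85, 94), (95, 104)]

cumulative = 0
for low, high in intervals:
    # Count scores falling between the assumed class boundaries.
    freq = sum(1 for s in scores if low - 0.5 <= s < high + 0.5)
    cumulative += freq
    print(f"{low}-{high}: frequency {freq}, cumulative {cumulative}")
```

This reproduces the frequencies 1, 2, 0, 4, 3, 6, 1, 6 and 2 of Appendix B and the cumulative totals of Appendix E.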

3.2.1 Histogram

A histogram is a bar graph of raw data that creates a picture of
the data distribution. Histograms illustrate the shape of the
distribution, which can be used for predictions. The bars
represent the frequency of occurrence of classes of data. A
histogram shows basic information such as central location, width
of spread, skewness and shape, which helps one decide how to
improve the instruction or the test. Figure 1 shows that the
histogram is negatively skewed because most of the scores lie
above the mean and median. The students did well in the test.

[Figure omitted: histogram of frequency per score interval, 15-24 through 95-104]

Figure 1: Histogram

3.2.2 Frequency polygon

The frequency polygon is an alternative way of representing data
that has been grouped. The information obtained from a frequency
polygon is similar to that from a histogram because both are
constructed from the same data. Figure 2 shows two peaks, each
with a frequency of six, at the middle values 69.5 and 89.5.

[Figure omitted: frequency polygon of frequency against middle values 9.5 through 99.5]

Figure 2: Frequency polygon

3.2.3 Ogive

An ogive is a cumulative frequency polygon, mostly presented in
percentages. Cumulative frequencies show the running total, that
is, the frequency below each class boundary, as shown in Figure 3.
The main use of an ogive is to estimate important percentiles such
as the median (50%), the lower quartile (25%) and the upper
quartile (75%).
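As an illustration, the median can be estimated from the cumulative frequencies of Appendix E by linear interpolation within the class containing the 12.5th score. This is a sketch; class boundaries of 64.5 and 74.5 are assumed for the 65-74 class, and the grouped estimate differs slightly from the raw-score median of 65 because grouping discards the exact scores.

```python
# From Appendix E: 10 scores lie below the 65-74 class,
# which itself contains 6 scores; N = 25.
n = 25
cum_below = 10         # cumulative frequency below the median class
class_freq = 6         # frequency of the median class (65-74)
lower_boundary = 64.5  # assumed lower boundary of the median class
width = 10             # class width

# Interpolate the position of the (N/2)th score within the class.
median_estimate = lower_boundary + (n / 2 - cum_below) / class_freq * width
print(round(median_estimate, 2))  # 68.67
```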

[Figure omitted: ogive of cumulative frequency against upper class limits 24 through 104]

Figure 3: Ogive

3.2.4 TEST RELIABILITY


The reliability of a test measured by KR-20 refers to the extent
to which the test is likely to produce consistent scores.

The formula:

KR20 = ( k / (k - 1) ) × ( 1 - Σpq / (Stdev)² )

for calculating KR-20 was used, where:

k is the number of items,
k - 1 the difference,
Σ the summation sign,
p the proportion correct,
q the proportion incorrect, and
Stdev the standard deviation.

In Table 3 the number of items is 20, the difference 19, the sum of
products 3.83 and the reliability 1.04. Since KR-20 should fall
between 0 and 1, a value above 1 suggests that the standard deviation
of the percentage scores was used rather than that of the raw scores;
even so, the result points to a highly consistent test.

Number of questions   k      20
Difference            k - 1  19
Sum of products       Σpq    3.83
Reliability           KR20   1.04

Table 3: Number of questions, sum of products and reliability
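The KR-20 value in Table 3 can be reproduced from the report's own figures. A sketch follows; note that using the standard deviation of the percentage scores (21.90) yields the reported 1.04, whereas KR-20 computed with the standard deviation of the raw number-correct scores stays in the usual 0 to 1 range. The conversion (dividing the percentage SD by 5, assuming each item is worth 5 percentage points) is an assumption for illustration.

```python
k = 20         # number of items
sum_pq = 3.83  # sum of p*q products from Appendix D
stdev = 21.90  # standard deviation of the percentage scores (Table 1)

# KR-20 as computed in the report, with the percentage-score SD.
kr20 = (k / (k - 1)) * (1 - sum_pq / stdev ** 2)
print(round(kr20, 2))  # 1.04

# With the assumed raw-score SD (percentage SD / 5), the value
# falls into the conventional range.
raw_stdev = stdev / 5
kr20_raw = (k / (k - 1)) * (1 - sum_pq / raw_stdev ** 2)
print(round(kr20_raw, 2))  # 0.84
```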

4. ITEM ANALYSES

Item analysis is a process which examines learners' responses
to individual test items in order to assess the quality and
accuracy of the items and of the test as a whole. It is
especially valuable for improving the quality of items that can
be reused in later tests and for eliminating ambiguous or
misleading questions. Additionally, item analysis improves the
educator's skill in setting tests and examinations and in
identifying specific areas of course content that need more
emphasis or clarity. Each item can be analysed for its
difficulty index and its discrimination index. Quantitative item
analysis was used to examine the performance of this
multiple-choice test.

4.1 Difficulty Index (p)

The item difficulty index is relevant for determining whether
learners have learnt the concepts being tested. It plays an
important role in the ability of a question to discriminate
between learners who have learnt the material being tested and
those who have not. It is the proportion of learners or
examinees who answered the item correctly out of those who
attempted it.

The p value is calculated by dividing the number of learners who
answered the item correctly by the total number of learners who
answered the item.

A high value indicates that a large proportion of learners
answered the item correctly, hence an easier item. An item with a
p value below 0.25 is very difficult and hence not accepted; a p
value above 0.75 is not acceptable because the item is very easy;
an item with a value between 0.25 and 0.75 is acceptable because
it is neither too difficult nor too easy. Table 4 shows that the
difficulty index varies from 0.33 to 0.92, so the questions are
either too easy or fine. Twelve items, namely q3, q4, q6, q7, q8,
q9, q10, q13, q17, q18, q19 and q20, are good questions, whilst
q1, q2, q5, q11, q12, q14, q15 and q16 are poor questions.
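The p values and acceptability judgements of Table 4 can be sketched as follows, dividing the number correct by the number who answered and applying the 0.25 to 0.75 cut-offs described above (a few items from Table 4 are used as examples).

```python
# (#correct, #answered) per question, taken from Table 4.
items = {"q1": (21, 25), "q3": (17, 25), "q10": (8, 24), "q12": (19, 25)}

for name, (correct, answered) in items.items():
    p = correct / answered
    # Classify using the cut-offs from the text.
    if p < 0.25:
        comment = "Unacceptable: too difficult"
    elif p > 0.75:
        comment = "Unacceptable: too easy"
    else:
        comment = "Acceptable: fine"
    print(f"{name}: p = {p:.2f} -> {comment}")
```

For instance q1 gives p = 21/25 = 0.84 (too easy) and q10 gives p = 8/24 = 0.33 (acceptable), matching Table 4.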

Question #Correct #Answered p Comment Reason


q1 21 25 0.84 Unacceptable Too easy
q2 22 25 0.88 Unacceptable Too easy
q3 17 25 0.68 Acceptable Fine
q4 12 25 0.48 Acceptable Fine
q5 21 25 0.84 Unacceptable Too easy
q6 17 25 0.68 Acceptable Fine
q7 11 25 0.44 Acceptable Fine
q8 12 23 0.52 Acceptable Fine
q9 13 25 0.52 Acceptable Fine
q10 8 24 0.33 Acceptable Fine
q11 23 25 0.92 Unacceptable Too easy
q12 19 25 0.76 Unacceptable Too easy
q13 15 25 0.60 Acceptable Fine
q14 21 25 0.84 Unacceptable Too easy
q15 20 25 0.80 Unacceptable Too easy
q16 22 24 0.92 Unacceptable Too easy
q17 15 24 0.63 Acceptable Fine
q18 8 24 0.33 Acceptable Fine
q19 13 25 0.52 Acceptable Fine
q20 16 25 0.64 Acceptable Fine

Table 4: Question, number correct and answered, difficulty index and acceptability

4.2 Discrimination Index (D)

The item discrimination index refers to the ability of an item to
differentiate among learners on the basis of how well they know
the content being tested. It is a measure of how well an item
distinguishes between learners or examinees who are
knowledgeable and those who are not. D is calculated by dividing
the difference between the number correct in the upper group and
the number correct in the lower group by the size of the larger
group. A good item discriminates between students who scored
high and those who scored low on the examination as a whole.
There are three types of discrimination index: a positive index,
where learners who did well on the overall test chose the
correct answer for a particular item more often than those who
did poorly; a negative index, where students who did poorly
chose the correct answer more often than those who did well; and
a zero index, where the numbers are equal. When interpreting the
value of the discrimination index it is important to be aware of
the relationship between an item's difficulty index and its
discrimination index. In Table 5, the discrimination index
varies between 0.13 and 0.73 and all the items discriminate
positively, which implies that all the items can be kept.
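The D values in Table 5 follow from the rule above; Appendix F gives 15 learners in the upper group and 10 in the lower group, so the divisor (the size of the larger group) is 15. A sketch, with the helper function name chosen here for illustration:

```python
n_upper, n_lower = 15, 10  # group sizes from Appendix F

def discrimination(correct_upper, correct_lower):
    """Difference in numbers correct divided by the size of the larger group."""
    return (correct_upper - correct_lower) / max(n_upper, n_lower)

# (#correct upper, #correct lower) for a few items from Table 5.
examples = {"q1": (15, 6), "q10": (8, 0), "q18": (5, 3), "q19": (12, 1)}
for name, (u, lo) in examples.items():
    print(f"{name}: D = {discrimination(u, lo):.2f}")
```

For example q1 gives D = (15 - 6)/15 = 0.60 and q19 gives D = (12 - 1)/15 = 0.73, matching Table 5.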

Question #Upper level #Lower level D


q1 15 6 0.60
q2 15 7 0.53
q3 14 3 0.73
q4 8 4 0.27
q5 15 6 0.60
q6 12 5 0.47
q7 9 2 0.47
q8 10 2 0.53
q9 10 3 0.47
q10 8 0 0.53
q11 14 9 0.33
q12 14 5 0.60
q13 12 3 0.60
q14 15 6 0.60
q15 14 6 0.53
q16 15 7 0.53
q17 12 3 0.60
q18 5 3 0.13
q19 12 1 0.73
q20 11 5 0.40

Table 5: Questions, upper and lower levels and discrimination index

5. CONCLUSION

Overall, the test is very reliable and can be used effectively
and stored for later use. The results of the multiple-choice
test form a roughly normal distribution, with very few learners
who did poorly or excellently at either end of the curve; the
rest of the learners performed around the average. The scores
are clustered around the mean, as indicated by the standard
deviation. Sixty percent of the questions, namely q3, q4, q6,
q7, q8, q9, q10, q13, q17, q18, q19 and q20, are very good
questions which discriminate positively. Forty percent of the
questions, that is q1, q2, q5, q11, q12, q14, q15 and q16, are
poor items that must either be improved or discarded, although
they discriminate positively. The distracters seem not to have
played their role.

I recommend that these statistics always be interpreted in the
context of the type of test given, the ambiguity of the items,
the length of the test, the individuals being tested and the
number of learners, as pointed out by Mehrens and Lehmann (1973).

6. REFERENCES

1. Kubiszyn, T., & Borich, G. (2007). Educational Testing and Measurement:
Classroom Application and Practice. Eighth Edition. USA: Wiley/Jossey-Bass Education.

2. Interpreting Item Analysis (n.d.). Retrieved September 08, 2007
from http://www.uleth.ca/edu/runte/tests/iteman/interp/interp.html

3. Seock-Ho, K. (1999). A computer program for classical item analysis.
Retrieved August 17, 2007 from http://shkim.myweb.uga.edu/epm99.htm

4. Stroud, K. A. (2001). Engineering Mathematics. Fifth Edition.
New York: Palgrave.

5. Interpreting item analysis. Test Validation & Construction Unit,
California State Personnel Board.

6. Kehoe, J. (1995). Basic item analysis for multiple-choice tests.
Practical Assessment, Research & Evaluation. Retrieved
September 21, 2007 from http://PAREonline.net/getvn.asp?v=4&n=10

7. Appendices

Appendix A: Recoded scores

STUDENT Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8
1 1 1 1 1 1 1 1 1
2 1 1 1 1 1 1 1 1
3 1 1 1 1 1 0 0 1
4 1 1 1 1 1 1 1 1
5 1 1 1 1 1 0 0 1
6 1 1 1 0 1 1 1 1
7 1 1 1 0 1 1 1 1
8 1 1 1 1 1 1 1 1
9 1 1 1 0 1 1 0 1
10 1 1 1 0 1 1 0 1
11 1 1 1 1 1 1 1 0
12 1 1 1 0 1 1 0 0
13 1 1 1 0 1 1 1 0
14 1 1 0 1 1 0 1 0
15 1 1 1 0 1 1 0 0
16 1 1 0 0 1 1 1 0
17 1 0 1 1 0 1 0 0
18 1 0 1 1 0 1 0 0
19 0 1 0 0 1 1 0 0
20 0 1 0 0 1 1 0 0
21 0 1 0 0 1 0 1 0
22 1 1 1 1 1 0 0 1
23 1 1 0 0 0 0 0
24 1 1 0 0 0 0 0
25 0 0 0 1 1 0 0 1

Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 0 0 0 1 1 1 1
1 0 1 1 1 1 1 1
1 1 1 1 0 1 0 1
0 0 1 1 0 1 1 1
0 0 1 1 1 1 1 1
0 0 1 1 1 1 1 1
0 0 1 1 0 1 1 1
0 0 1 1 1 1 1 1
0 0 1 0 0 1 0 1
1 0 1 1 0 1 1 1
1 0 1 1 0 1 1 1
0 0 1 1 1 1 0 1
0 0 1 1 1 1 0 1
1 1 0 1 1 1
0 0 1 1 0 0 1 0
0 0 1 0 0 0 1 1
0 0 1 0 0 0 1 1
0 0 0 0 0 0 0 0

Q17 Q18 Q19 Q20
1 1 1 1
1 1 1 1
1 1 1 1
1 0 0 1
1 1 1 1
0 1 1
0 0 1 1
1 1 1 1
0 0 1 1
1 0 0 1
1 0 1 0
1 0 1 0
1 0 0 0
1 0 1 1
1 0 1 0
1 1 1 1
0 0 0 1
0 0 0 1
0 1 0 1
0 1 0 1
0 0 0
0 0 0 0
1 0 0 0
1 0 0 0
0 0 0 0

Appendix B: Interval, middle value and frequency

Interval Middle Value Frequency

15 - 24 19.5 1
25 - 34 29.5 2
35 - 44 39.5 0
45 - 54 49.5 4
55 - 64 59.5 3
65 - 74 69.5 6
75 - 84 79.5 1
85 - 94 89.5 6
95 - 104 99.5 2

Appendix C: Students’ responses to questions

Students #Correct #Answered % Correct


1 20 20 100.00
2 20 20 100.00
3 18 20 90.00
4 18 20 90.00
5 18 20 90.00
6 17 19 89.47
7 17 20 85.00
8 17 20 85.00
9 15 20 75.00
10 14 20 70.00
11 14 20 70.00
12 13 20 65.00
13 13 20 65.00
14 13 20 65.00
15 13 20 65.00
16 12 20 60.00
17 11 20 55.00
18 11 20 55.00
19 10 20 50.00
20 10 20 50.00
21 8 17 47.06
22 9 20 45.00
23 6 19 31.58
24 6 19 31.58
25 3 20 15.00

Appendix D: Question, numbers correct and incorrect, proportion
correct and product of p and q

Question  #Correct  #Incorrect  Prop correct (p)  Prop incorrect (q)  pq
q1 21 4 0.84 0.16 0.13
q2 22 3 0.88 0.12 0.11
q3 17 8 0.68 0.32 0.22
q4 12 13 0.48 0.52 0.25
q5 21 4 0.84 0.16 0.13
q6 17 8 0.68 0.32 0.22
q7 11 14 0.44 0.56 0.25
q8 12 11 0.52 0.48 0.25
q9 13 12 0.52 0.48 0.25
q10 8 16 0.33 0.67 0.22
q11 23 2 0.92 0.08 0.07
q12 19 6 0.76 0.24 0.18
q13 15 10 0.60 0.40 0.24
q14 21 4 0.84 0.16 0.13
q15 20 5 0.80 0.20 0.16
q16 22 2 0.92 0.08 0.08
q17 15 9 0.63 0.38 0.23
q18 8 16 0.33 0.67 0.22
q19 13 12 0.52 0.48 0.25
q20 16 9 0.64 0.36 0.23
Total : 3.83

Appendix E: Class, upper limit, frequency and cumulative frequency

Class    Upper Limit  Frequency  Cumulative Frequency
15 - 24 24 1 1
25 - 34 34 2 3
35 - 44 44 0 3
45 - 54 54 4 7
55 - 64 64 3 10
65 - 74 74 6 16
75 - 84 84 1 17
85 - 94 94 6 23
95 - 104 104 2 25

Appendix F: Learners in upper and lower levels

Learners in upper level   15
Learners in lower level   10

