Capitol University

Cagayan de Oro City


COLLEGE OF EDUCATION
Center of Excellence for Teacher Education
ASSESSMENT NOTES
Dr. Ma. Jessica P. Campano

Assessment
A process where test results are subject to critical study according to established
measurement principles.
Assessment decisions can substantially improve student performance, guide teachers in enhancing the teaching-learning process, and assist policy makers in improving the educational system.

Testing and Assessment

Testing
1. Tests are developed or selected, administered to the class, and scored.
2. Test results are then used to make decisions about a pupil (to assign a grade, recommend for an advanced program), instruction (repeat, review, move on), curriculum (replace, revise), or other educational factors.

Assessment
1. Information is collected from tests and other measurement instruments (portfolios and performance assessments, rating scales, checklists, and observations).
2. This information is critically evaluated and integrated with relevant background and contextual information.
3. The integration of critically analyzed test results and other information results in a decision about a pupil.

Types of Educational Decisions

Instructional Decisions
Grading Decisions
Diagnostic Decisions
Selection Decisions
Placement Decisions
Counseling and Guidance Decisions
Program or Curriculum Decisions
Administrative Policy Decisions

Roles of Assessment
Summative - tries to determine the extent to which the learning objectives for a course are met and why.
Diagnostic - determines the gaps in learning or learning processes, hopefully to be able to bridge these gaps.
Formative - allows the teacher to redirect and refocus the course of teaching a subject matter.
Placement - plays a vital role in determining the appropriate placement of a student both in terms of achievement and aptitude.
Approaches in Assessment

Assessment of Learning
Assessment for Learning
Assessment as Learning

Modes of Assessment
Traditional
Alternative
Authentic

Comparing NRTs and CRTs

Dimension | NRT | CRT
Average number of students who get an item right | 50% | 80%
Compare a student's performance to | Performance of other students | Standards indicative of mastery
Breadth of content sampled | Broad, covers many objectives | Narrow, covers a few objectives
Comprehensiveness of content sampled | Shallow, usually one or two items per objective | Comprehensive, usually three or more items per objective
Variability | The more spread out the scores, the better | Variability may be minimal
Item construction | Items are chosen to promote variance or spread; one aim is to produce good distracter options | Items are chosen to reflect the criterion behavior
Reporting and interpreting | Percentile ranks and standard scores used | Number succeeding or failing, or range of acceptable performance, used

1.1 Cognitive Targets


Knowledge (remembering) - refers to the acquisition of facts, concepts, and theories.
Comprehension (understanding) - a step higher than the mere acquisition of facts; it involves a cognition or awareness of the interrelationships of facts and concepts.
Application (applying) - refers to the transfer of knowledge from one field of study to another or from one concept to another concept in the same discipline.
Analysis (analyzing) - refers to the breaking down of a concept or idea into its components and explaining the concept as a composition of these components.
Evaluation (evaluating) - refers to valuing and judgment, or putting worth on a concept or principle.
Synthesis (creating) - the opposite of analysis; entails putting together the components in order to summarize the concept.
Development of Assessment Tools
Planning a Test and Construction of TOS
1. Identifying test objectives
2. Deciding on the type of objective test to be prepared
3. Preparing a TOS
4. Constructing the draft of the test items
5. Try-out and validation
Table of Specification (TOS)
A map that guides the teacher in constructing a test.
It ensures that there is a balance between items that test lower-order thinking skills and those which test higher-order thinking skills.
It conveys to the teacher the number of items to be constructed per objective, their level
in the taxonomy, and whether the test represents a balanced picture based on what was
taught.
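One common way to fill in a TOS is to apportion the number of items per objective in proportion to the instructional time spent on each. A minimal Python sketch under that assumption (the objective names, hours, and 50-item total are all hypothetical, not from these notes):

```python
# Hypothetical allocation of test items for a TOS, proportional to the
# class hours spent on each objective (names and numbers are illustrative).

hours = {"Fractions": 6, "Decimals": 4, "Percent": 2}
total_items = 50

total_hours = sum(hours.values())
for objective, h in hours.items():
    items = round(total_items * h / total_hours)
    print(f"{objective}: {items} items")
# Fractions: 25 items, Decimals: 17 items, Percent: 8 items
```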
Objective Test Format
True-False
Make statements clearly True or False.
Avoid specific determiners.
Do not arrange responses in a pattern.
Do not use textbook jargon.
Use relatively short statements and eliminate extraneous materials.
Keep true and false statements approximately the same length, and include approximately equal numbers of true and false items.

Avoid using double-negative statements.

Avoid the following:
Verbal clues, absolutes, and complex sentences
Broad, general statements
Terms denoting indefinite degree or absolutes
Matching Type Items
Use a homogeneous topic.
Put longer options in the left column.
Provide clear directions.
Use an unequal number of entries in the two columns.
The matching lists should be located on one page.

Completion Items
Provide a clear focus for the desired answer.
Avoid grammatical clues.
Put blanks at the end.
Restrict the number of blanks to one or two.
Blanks for answers should be equal in length.

Essay Items
Use several short essay questions rather than one long question.
Provide a clear focus for students' responses.
Indicate limitations or scoring criteria to pupils.

ITEM ANALYSIS AND VALIDATION


Item Analysis - a numerical method for analyzing test items employing student response alternatives or options.
Criteria in determining the desirability and undesirability of an item:
a. Difficulty of an item
b. Discriminating power of an item
c. Measures of attractiveness
Difficulty Index (P)
proportion of the number of students in the upper and lower groups who answered
an item correctly.
P = (UL + LL) / 2n

where UL and LL are the numbers of students in the upper and lower groups who answered the item correctly, and n is the number of students in each group (see the sketch after the table below).
Level of Difficulty of an Item

Index Range | Difficulty Level | Recommendation
0.00 - 0.20 | Very difficult | NA
0.21 - 0.40 | Difficult | LA
0.41 - 0.60 | Moderately difficult | VA
0.61 - 0.80 | Easy | LA
0.81 - 1.00 | Very easy | NA
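A minimal Python sketch of this computation, using the formula and the index ranges above (the function names and sample counts are illustrative, not from the notes):

```python
# Difficulty index P = (UL + LL) / 2n, with the interpretation bands above.

def difficulty_index(upper_correct: int, lower_correct: int, group_size: int) -> float:
    """Proportion of upper- and lower-group students who answered the item correctly."""
    return (upper_correct + lower_correct) / (2 * group_size)

def difficulty_level(p: float) -> str:
    """Classify P using the index ranges in the table above."""
    if p <= 0.20:
        return "Very difficult"
    if p <= 0.40:
        return "Difficult"
    if p <= 0.60:
        return "Moderately difficult"
    if p <= 0.80:
        return "Easy"
    return "Very easy"

# Example: 18 of 25 upper-group and 9 of 25 lower-group students answered correctly.
p = difficulty_index(18, 9, 25)    # (18 + 9) / 50 = 0.54
print(p, difficulty_level(p))      # 0.54 Moderately difficult
```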

Discrimination Index (D)


Measure of the extent to which a test item discriminates or differentiates between
students who do well on the overall test and those who do not.
There are 3 types of Discrimination Indexes:
1. Positive
2. Negative
3. Zero
D = (UL - LL) / n

where UL and LL are the numbers of students in the upper and lower groups who answered the item correctly, and n is the number of students in each group (see the sketch after the table below).
Discrimination Index | Item Evaluation | Recommendation
0.40 and above | Very good item | VA
0.30 - 0.39 | Reasonably good item, but possibly subject to improvement | LA
0.20 - 0.29 | Marginal item, usually needing and being subject to improvement | LA
Below 0.19 | Poor item | NA
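The same item can be checked with a sketch of D, reusing the counts from the difficulty example (again, names and counts are illustrative):

```python
# Discrimination index D = (UL - LL) / n, with the interpretation bands above.

def discrimination_index(upper_correct: int, lower_correct: int, group_size: int) -> float:
    """Difference between the proportions of upper- and lower-group students answering correctly."""
    return (upper_correct - lower_correct) / group_size

def evaluate_item(d: float) -> str:
    """Classify D using the ranges in the table above."""
    if d >= 0.40:
        return "Very good item"
    if d >= 0.30:
        return "Reasonably good item, possibly subject to improvement"
    if d >= 0.20:
        return "Marginal item, usually needing improvement"
    return "Poor item"

d = discrimination_index(18, 9, 25)   # (18 - 9) / 25 = 0.36
print(d, evaluate_item(d))            # 0.36 Reasonably good item, ...
```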

Criteria for Item Analysis
The difficulty (P), discrimination, and option analysis ratings (VA, LA, NA) are read together to evaluate each item: very good items (VGI) and good items (GI) are retained (RET), weaker combinations call for revision (REV), and items unacceptable on all counts are rejected (REJ).

Properties of Assessment Methods

Validity
Reliability
Fairness
Practicality and Efficiency
Ethics in Assessment

Validity
The degree to which a test or measuring instrument measures what it intends to
measure.
Soundness (what the test measures and how well it can be applied)
Types of Validity
Content Validity - the extent to which the content or topics of the test are truly representative of the course.
Depends on the relevance of the individual's responses to the behavior area under consideration, rather than on the apparent relevance of item content.
Commonly used in evaluating achievement tests.
Appropriate for criterion-referenced measures.
Concurrent Validity - the degree to which the test agrees or correlates with a criterion set up as an acceptable measure.
Applicable to tests employed for the diagnosis of existing status rather than for the prediction of future outcomes.
E.g., validating a test made by the teacher by correlating it with a previously proven valid test.
Predictive Validity - determined by showing how well predictions made from the test are confirmed by evidence gathered at some subsequent time.
Construct Validity - the extent to which the test measures a theoretical trait.

Reliability
The extent to which a test is dependable, self-consistent and stable.
It is concerned with the consistency of responses from moment to moment.
A reliable test may not always be valid.
Methods in Testing the Reliability of a Good Measuring Instrument
Test-Retest Method - the same measuring instrument is administered twice to the same group of students and the correlation coefficient is determined.
Limitations: time interval, environmental conditions
The Spearman rank correlation coefficient, or Spearman rho, is a statistical tool used to measure the relationship between paired ranks assigned to individual scores on two variables, X and Y.
rs = 1 - (6ΣD²) / (N³ - N)

where:
rs = Spearman rho
ΣD² = sum of the squared differences between ranks
N = total number of cases
Steps
Step 1. Rank the scores of respondents from highest to lowest in the first set of administration (X) and mark this rank as Rx. The highest score receives the rank of 1.
Step 2. Rank the second set of scores (Y) in the same manner as in Step 1 and mark as Ry.
Step 3. Determine the difference in ranks (D) for every pair of ranks.
Step 4. Square each difference to get D².
Step 5. Sum the squared differences to find ΣD².
Step 6. Compute Spearman rho (rs).
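A minimal Python sketch following Steps 1 to 6 (ties here receive distinct ranks in order of appearance, a simplification; averaging tied ranks is the usual refinement, and the scores are illustrative):

```python
# Spearman rho: rs = 1 - (6 * sum of D^2) / (N^3 - N), following Steps 1-6 above.

def ranks(scores):
    """Steps 1-2: rank from highest to lowest; the highest score gets rank 1."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    r = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(x, y):
    n = len(x)
    rx, ry = ranks(x), ranks(y)                         # Steps 1-2
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))  # Steps 3-5
    return 1 - (6 * sum_d2) / (n ** 3 - n)              # Step 6

# Illustrative scores from two administrations of the same test.
first = [88, 75, 93, 60, 82]
second = [85, 70, 90, 80, 65]
print(spearman_rho(first, second))   # 0.6 for these illustrative scores
```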
Frequency Distributions
any arrangement of the data that shows the frequency of occurrence of
different values of the variable falling within defined ranges of a class interval.
applicable if the total number of cases (N) is 30 or more.
Steps:
1. Find the absolute range by subtracting the lowest score from the highest score:
R = HS - LS
2. Find the class interval (C) by dividing the range by 10 and by 20, so that the number of classes is not less than 10 and not more than 20. In choosing the class interval, an ODD number is preferable.
3. Set up the classes by adding C/2 to the highest score to get the upper class limit of the highest class, and subtracting C/2 from the highest score to get the lower class limit of the highest class. Set up the real and integral limits.
4. Tally the score.
5. Determine the Cumulative Frequency and the Cumulative Percentage
Frequency distributions.
6. Present the frequency polygon and histogram.
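A minimal Python sketch of steps 1, 4, and 5 (the class limits here start from a simple multiple of C below the lowest score rather than from the C/2 rule in step 3, and only 15 illustrative scores are used for brevity even though the notes recommend N of 30 or more):

```python
# Tally scores into classes of width C and accumulate frequencies.

from collections import Counter

def frequency_distribution(scores, c):
    r = max(scores) - min(scores)                  # Step 1: R = HS - LS
    start = min(scores) - (min(scores) % c)        # simplified lower limit (not the C/2 rule)
    tally = Counter((s - start) // c for s in scores)  # Step 4: tally the scores
    rows, cum = [], 0
    for k in sorted(tally):
        lower = start + k * c
        cum += tally[k]                            # Step 5: cumulative frequency
        rows.append((lower, lower + c - 1, tally[k], cum, 100 * cum / len(scores)))
    return r, rows

scores = [23, 45, 37, 52, 41, 29, 48, 33, 56, 39, 44, 27, 50, 36, 31]
r, table = frequency_distribution(scores, c=5)
for lo, hi, f, cf, cpf in table:
    print(f"{lo}-{hi}: f={f} cf={cf} cpf={cpf:.0f}%")
```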

Measures of Central Tendency


A single value that is used to identify the center of the data; it is thought of as the typical value in a set of scores.
Mean - the most common measure of center; also known as the arithmetic average.
Population mean
Sample mean
Properties of Mean
Easy to compute
May not be an actual observation in the data set.
Can be subjected to numerous mathematical computations
Most widely used
Each data contributes to the mean value
Easily affected by extreme values.
Median
A point that divides the scores in a distribution into two equal parts when the scores are
arranged according to magnitude.
Properties of Median

Not affected by extreme values
Applied to ordinal level of data
The middlemost score in the distribution
Most appropriate when there are extreme scores

Mode - refers to the score or scores that occur most often in the distribution.
Unimodal
Bimodal
Multimodal
Properties of Mode
It is the score/s occurring most frequently
Nominal average
It can be used for qualitative and quantitative data
Not affected by extreme values
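A minimal Python sketch computing the three measures of center above with the standard library (the scores are illustrative; statistics.multimode requires Python 3.8 or later):

```python
from statistics import mean, median, multimode

scores = [70, 75, 75, 80, 85, 90, 98]

print(mean(scores))       # 81.857...: every score contributes; pulled up by the extreme 98
print(median(scores))     # 80: the middlemost score, unaffected by the extreme 98
print(multimode(scores))  # [75]: one mode (unimodal); two or more would be bi-/multimodal
```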
Measures of Variability
Refers to a single value that is used to describe the spread of the scores in a distribution, that is, how far the scores lie above or below the measure of central tendency.
Range, quartile deviation, standard deviation (SD)
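A minimal Python sketch of these three measures (the quartiles use the simple median-of-halves convention, one of several in use; the scores are illustrative):

```python
from statistics import median, stdev

def spread(scores):
    s = sorted(scores)
    n = len(s)
    rng = s[-1] - s[0]               # range = HS - LS
    q1 = median(s[: n // 2])         # median of the lower half
    q3 = median(s[(n + 1) // 2 :])   # median of the upper half
    qd = (q3 - q1) / 2               # quartile deviation (semi-interquartile range)
    return rng, qd, stdev(s)         # stdev: sample standard deviation

print(spread([70, 75, 75, 80, 85, 90, 98]))   # (28, 7.5, ~9.79)
```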

También podría gustarte