Capitol University

Cagayan de Oro City


COLLEGE OF EDUCATION
Center of Excellence for Teacher Education
ASSESSMENT NOTES
Dr. Ma. Jessica P. Campano

Assessment
A process where test results are subject to critical study according to established
measurement principles.
Assessment decisions can substantially improve student performance, guide teachers in enhancing the teaching-learning process, and assist policy makers in improving the educational system.

Testing and Assessment

Testing
1. Tests are developed or selected, administered to the class, and scored.
2. Test results are then used to make decisions about a pupil (to assign a grade, recommend for an advanced program), instruction (repeat, review, move on), curriculum (replace, revise), or other educational factors.

Assessment
1. Information is collected from tests and other measurement instruments (portfolios and performance assessments, rating scales, checklists, and observations).
2. This information is critically evaluated and integrated with relevant background and contextual information.
3. The integration of critically analyzed test results and other information results in a decision about a pupil.

Types of Educational Decisions

Instructional Decisions
Grading Decisions
Diagnostic Decisions
Selection Decisions
Placement Decisions
Counseling and Guidance Decisions
Program or Curriculum Decisions
Administrative Policy Decisions

Roles of Assessment
Summative - tries to determine the extent to which the learning objectives for a course are met and why.
Diagnostic - determines the gaps in learning or learning processes, hopefully to be able to bridge these gaps.
Formative - allows the teacher to redirect and refocus the course of teaching a subject matter.
Placement - plays a vital role in determining the appropriate placement of a student both in terms of achievement and aptitude.
Approaches in Assessment

Assessment of Learning
Assessment for Learning
Assessment as Learning

Modes of Assessment
Traditional
Alternative
Authentic

Comparing NRTs and CRTs

Dimension | NRT | CRT
Average number of students who get an item right | 50% | 80%
Compare a student's performance to | Performance of other students | Standards indicative of mastery
Breadth of content sampled | Broad, covers many objectives | Narrow, covers a few objectives
Comprehensiveness of content sampled | Shallow, usually one or two items per objective | Comprehensive, usually three or more items per objective
Variability | The more spread out the scores, the better | Variability may be minimal
Item construction | Items are chosen to promote variance or spread; one aim is to produce good distracter options | Items are chosen to reflect the criterion behavior
Reporting and interpreting | Percentile ranks and standard scores used | Number succeeding or failing, or range of acceptable performance, used

1.1 Cognitive Targets


Knowledge (remembering) - refers to the acquisition of facts, concepts, and theories.
Comprehension (understanding) - a step higher than the mere acquisition of facts; it involves a cognition or awareness of the interrelationships of facts and concepts.
Application (applying) - refers to the transfer of knowledge from one field of study to another or from one concept to another concept in the same discipline.
Analysis (analyzing) - refers to the breaking down of a concept or idea into its components and explaining the concept as a composition of these components.
Evaluation (evaluating) - refers to valuing and judgment, or putting worth on a concept or principle.
Synthesis (creating) - the opposite of analysis; entails putting together the components in order to summarize the concept.
Development of Assessment Tools
Planning a Test and Construction of TOS
1. Identifying test objectives
2. Deciding on the type of objective test to be prepared
3. Preparing a TOS
4. Constructing the draft of the test items
5. Try-out and validation
Table of Specification (TOS)
A map that guides the teacher in constructing a test.
It ensures that there is a balance between items that test lower-order thinking skills and those which test higher-order thinking skills.
It conveys to the teacher the number of items to be constructed per objective, their level
in the taxonomy, and whether the test represents a balanced picture based on what was
taught.
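One common way to fill in a TOS is to apportion the number of items per objective in proportion to the instructional time spent on each. A minimal Python sketch under that assumption (the objective names, hours, and 50-item total are all hypothetical, not from these notes):

```python
# Hypothetical allocation of test items for a TOS, proportional to the
# class hours spent on each objective (names and numbers are illustrative).

hours = {"Fractions": 6, "Decimals": 4, "Percent": 2}
total_items = 50

total_hours = sum(hours.values())
for objective, h in hours.items():
    items = round(total_items * h / total_hours)
    print(f"{objective}: {items} items")
# Fractions: 25 items, Decimals: 17 items, Percent: 8 items
```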
Objective Test Format
True-False
Make statements clearly True or False.
Avoid specific determiners.
Do not arrange responses in a pattern.
Do not use textbook jargon.
Use relatively short statements and eliminate extraneous materials.
Keep true and false statements approximately the same length, and include approximately equal numbers of true and false items.

Avoid using double-negative statements.

Avoid the following:
Verbal clues, absolutes, and complex sentences
Broad, general statements
Terms denoting indefinite degree or absolutes
Matching Type Items
Use a homogeneous topic.
Put longer options in the left column.
Provide clear directions.
Use an unequal number of entries in the two columns.
The matching lists should be located on one page.

Completion Items
Provide a clear focus for the desired answer.
Avoid grammatical clues.
Put blanks at the end.
Restrict the number of blanks to one or two.
Blanks for answers should be equal in length.

Essay Items
Use several short essay questions rather than one long question.
Provide a clear focus for students' responses.
Indicate limitations or scoring criteria to pupils.

ITEM ANALYSIS AND VALIDATION


Item Analysis - a numerical method for analyzing test items employing student response alternatives or options.
Criteria in determining the desirability and undesirability of an item:
a. Difficulty of an item
b. Discriminating power of an item
c. Measures of attractiveness
Difficulty Index (P)
proportion of the number of students in the upper and lower groups who answered
an item correctly.
P = (UL + LL) / 2n

where UL and LL are the numbers of students in the upper and lower groups who answered the item correctly, and n is the number of students in each group (see the sketch after the table below).
Level of Difficulty of an Item

Index Range | Difficulty Level | Recommendation
0.00 - 0.20 | Very difficult | NA
0.21 - 0.40 | Difficult | LA
0.41 - 0.60 | Moderately difficult | VA
0.61 - 0.80 | Easy | LA
0.81 - 1.00 | Very easy | NA
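A minimal Python sketch of this computation, using the formula and the index ranges above (the function names and sample counts are illustrative, not from the notes):

```python
# Difficulty index P = (UL + LL) / 2n, with the interpretation bands above.

def difficulty_index(upper_correct: int, lower_correct: int, group_size: int) -> float:
    """Proportion of upper- and lower-group students who answered the item correctly."""
    return (upper_correct + lower_correct) / (2 * group_size)

def difficulty_level(p: float) -> str:
    """Classify P using the index ranges in the table above."""
    if p <= 0.20:
        return "Very difficult"
    if p <= 0.40:
        return "Difficult"
    if p <= 0.60:
        return "Moderately difficult"
    if p <= 0.80:
        return "Easy"
    return "Very easy"

# Example: 18 of 25 upper-group and 9 of 25 lower-group students answered correctly.
p = difficulty_index(18, 9, 25)    # (18 + 9) / 50 = 0.54
print(p, difficulty_level(p))      # 0.54 Moderately difficult
```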

Discrimination Index (D)


Measure of the extent to which a test item discriminates or differentiates between
students who do well on the overall test and those who do not.
There are 3 types of Discrimination Indexes:
1. Positive
2. Negative
3. Zero
D = (UL - LL) / n

where UL and LL are the numbers of students in the upper and lower groups who answered the item correctly, and n is the number of students in each group (see the sketch after the table below).
Discrimination Index | Item Evaluation | Recommendation
0.40 and above | Very good item | VA
0.30 - 0.39 | Reasonably good item, but possibly subject to improvement | LA
0.20 - 0.29 | Marginal item, usually needing and being subject to improvement | LA
Below 0.19 | Poor item | NA
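The same item can be checked with a sketch of D, reusing the counts from the difficulty example (again, names and counts are illustrative):

```python
# Discrimination index D = (UL - LL) / n, with the interpretation bands above.

def discrimination_index(upper_correct: int, lower_correct: int, group_size: int) -> float:
    """Difference between the proportions of upper- and lower-group students answering correctly."""
    return (upper_correct - lower_correct) / group_size

def evaluate_item(d: float) -> str:
    """Classify D using the ranges in the table above."""
    if d >= 0.40:
        return "Very good item"
    if d >= 0.30:
        return "Reasonably good item, possibly subject to improvement"
    if d >= 0.20:
        return "Marginal item, usually needing improvement"
    return "Poor item"

d = discrimination_index(18, 9, 25)   # (18 - 9) / 25 = 0.36
print(d, evaluate_item(d))            # 0.36 Reasonably good item, ...
```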

Criteria for Item Analysis
The difficulty (P), discrimination, and option analysis ratings (VA, LA, NA) are read together to evaluate each item: very good items (VGI) and good items (GI) are retained (RET), weaker combinations call for revision (REV), and items unacceptable on all counts are rejected (REJ).

Properties of Assessment Methods

Validity
Reliability
Fairness
Practicality and Efficiency
Ethics in Assessment

Validity
The degree to which a test or measuring instrument measures what it intends to
measure.
Soundness (what the test measures and how well it can be applied)
Types of Validity
Content Validity - the extent to which the content or topics of the test are truly representative of the course.
Depends on the relevance of the individual's responses to the behavior area under consideration, rather than on the apparent relevance of item content.
Commonly used in evaluating achievement tests.
Appropriate for criterion-referenced measures.
Concurrent Validity - the degree to which the test agrees or correlates with a criterion set up as an acceptable measure.
Applicable to tests employed for the diagnosis of existing status rather than for the prediction of future outcomes.
E.g., validating a test made by the teacher by correlating it with a previously proven valid test.
Predictive Validity - determined by showing how well predictions made from the test are confirmed by evidence gathered at some subsequent time.
Construct Validity - the extent to which the test measures a theoretical trait.

Reliability
The extent to which a test is dependable, self-consistent and stable.
It is concerned with the consistency of responses from moment to moment.
A reliable test may not always be valid.
Methods in Testing the Reliability of a Good Measuring Instrument
Test-Retest Method - the same measuring instrument is administered twice to the same group of students and the correlation coefficient is determined.
Limitations: time interval, environmental conditions
The Spearman rank correlation coefficient, or Spearman rho, is a statistical tool used to measure the relationship between paired ranks assigned to individual scores on two variables, X and Y.
rs = 1 - (6ΣD²) / (N³ - N)

where:
rs = Spearman rho
ΣD² = sum of the squared differences between ranks
N = total number of cases
Steps
Step 1. Rank the scores of respondents from highest to lowest in the first set of administration (X) and mark this rank as Rx. The highest score receives the rank of 1.
Step 2. Rank the second set of scores (Y) in the same manner as in Step 1 and mark as Ry.
Step 3. Determine the difference in ranks (D) for every pair of ranks.
Step 4. Square each difference to get D².
Step 5. Sum the squared differences to find ΣD².
Step 6. Compute Spearman rho (rs).
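A minimal Python sketch following Steps 1 to 6 (ties here receive distinct ranks in order of appearance, a simplification; averaging tied ranks is the usual refinement, and the scores are illustrative):

```python
# Spearman rho: rs = 1 - (6 * sum of D^2) / (N^3 - N), following Steps 1-6 above.

def ranks(scores):
    """Steps 1-2: rank from highest to lowest; the highest score gets rank 1."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    r = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(x, y):
    n = len(x)
    rx, ry = ranks(x), ranks(y)                         # Steps 1-2
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))  # Steps 3-5
    return 1 - (6 * sum_d2) / (n ** 3 - n)              # Step 6

# Illustrative scores from two administrations of the same test.
first = [88, 75, 93, 60, 82]
second = [85, 70, 90, 80, 65]
print(spearman_rho(first, second))   # 0.6 for these illustrative scores
```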
Frequency Distributions
any arrangement of the data that shows the frequency of occurrence of
different values of the variable falling within defined ranges of a class interval.
applicable if the total number of cases (N) is 30 or more.
Steps:
1. Find the absolute range by subtracting the lowest score from the highest score:
R = HS - LS
2. Find the class interval (C) by dividing the range by 10 and by 20, so that the number of classes is not less than 10 and not more than 20. In choosing the class interval, an ODD number is preferable.
3. Set up the classes by adding C/2 to the highest score to get the upper class limit of the highest class, and subtracting C/2 from the highest score to get the lower class limit of the highest class. Set up the real and integral limits.
4. Tally the score.
5. Determine the Cumulative Frequency and the Cumulative Percentage
Frequency distributions.
6. Present the frequency polygon and histogram.
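A minimal Python sketch of steps 1, 4, and 5 (the class limits here start from a simple multiple of C below the lowest score rather than from the C/2 rule in step 3, and only 15 illustrative scores are used for brevity even though the notes recommend N of 30 or more):

```python
# Tally scores into classes of width C and accumulate frequencies.

from collections import Counter

def frequency_distribution(scores, c):
    r = max(scores) - min(scores)                  # Step 1: R = HS - LS
    start = min(scores) - (min(scores) % c)        # simplified lower limit (not the C/2 rule)
    tally = Counter((s - start) // c for s in scores)  # Step 4: tally the scores
    rows, cum = [], 0
    for k in sorted(tally):
        lower = start + k * c
        cum += tally[k]                            # Step 5: cumulative frequency
        rows.append((lower, lower + c - 1, tally[k], cum, 100 * cum / len(scores)))
    return r, rows

scores = [23, 45, 37, 52, 41, 29, 48, 33, 56, 39, 44, 27, 50, 36, 31]
r, table = frequency_distribution(scores, c=5)
for lo, hi, f, cf, cpf in table:
    print(f"{lo}-{hi}: f={f} cf={cf} cpf={cpf:.0f}%")
```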

Measures of Central Tendency


A single value that is used to identify the center of the data; it is thought of as the typical value in a set of scores.
Mean - the most common measure of center; also known as the arithmetic average.
Population mean
Sample mean
Properties of Mean
Easy to compute
May not be an actual observation in the data set.
Can be subjected to numerous mathematical computations
Most widely used
Each data contributes to the mean value
Easily affected by extreme values.
Median
A point that divides the scores in a distribution into two equal parts when the scores are
arranged according to magnitude.
Properties of Median

Not affected by extreme values
Applied to ordinal level of data
The middlemost score in the distribution
Most appropriate when there are extreme scores

Mode - refers to the score or scores that occur most often in the distribution.
Unimodal
Bimodal
Multimodal
Properties of Mode
It is the score/s occurring most frequently
Nominal average
It can be used for qualitative and quantitative data
Not affected by extreme values
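A minimal Python sketch computing the three measures of center above with the standard library (the scores are illustrative; statistics.multimode requires Python 3.8 or later):

```python
from statistics import mean, median, multimode

scores = [70, 75, 75, 80, 85, 90, 98]

print(mean(scores))       # 81.857...: every score contributes; pulled up by the extreme 98
print(median(scores))     # 80: the middlemost score, unaffected by the extreme 98
print(multimode(scores))  # [75]: one mode (unimodal); two or more would be bi-/multimodal
```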
Measures of Variability
Refers to a single value that is used to describe the spread of the scores in a distribution, that is, how far the scores lie above or below the measure of central tendency.
Range, quartile deviation, standard deviation (SD)
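A minimal Python sketch of these three measures (the quartiles use the simple median-of-halves convention, one of several in use; the scores are illustrative):

```python
from statistics import median, stdev

def spread(scores):
    s = sorted(scores)
    n = len(s)
    rng = s[-1] - s[0]               # range = HS - LS
    q1 = median(s[: n // 2])         # median of the lower half
    q3 = median(s[(n + 1) // 2 :])   # median of the upper half
    qd = (q3 - q1) / 2               # quartile deviation (semi-interquartile range)
    return rng, qd, stdev(s)         # stdev: sample standard deviation

print(spread([70, 75, 75, 80, 85, 90, 98]))   # (28, 7.5, ~9.79)
```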

También podría gustarte