
Guide to

Writing Objective Tests

A guide to writing selected response questions and creating objective tests. Produced as part of the E-only project.

Version 1.2
April 2007

INTRODUCTION TO SELECTED RESPONSE QUESTIONS

This guide was written as part of the E-Only project, which sought to develop SQA’s
first (fully) online qualification. A package of support materials was developed as part
of that project – and this document is part of that package.

It was written after an extensive literature review of UK and international (particularly US) publications relating to objective testing. It is not a procedural guide for SQA
appointees. Specific sectors and subjects will have their own procedures for
producing objective tests and this guide does not seek to replace that guidance.
However, it does aim to describe best practice in the construction of objective tests
from a generic perspective.

This document has gone through a number of revisions since the first draft version
was written in July 2006. Special thanks to everyone who took the time to contribute
through SQA Academy (http://www.sqaacademy.com). The document is revised frequently. An online forum is available to discuss its contents at the following URL, where the latest version of the Guide can be found:

http://groups.google.com/group/objectivetesting

Bobby Elliott (bobby.elliott@sqa.org.uk)


April 2007

PURPOSE OF THIS GUIDE

Although most SQA units employ “conventional” assessment, some subject areas
(mostly Science-related) have a tradition of using objective tests. For example,
Biology uses multiple choice questions at Intermediate, Higher and Advanced Higher
levels; and HNC Computing uses an objective test as part of the Graded Unit.

More recently, there has been greater emphasis on objective testing due to its
suitability for computer-based assessment; as a result, an increasing number of unit
specifications (at both National and Higher National levels) involve an element of
objective testing.

This guide will be of assistance to any SQA Officer (or appointee) involved in creating
objective tests. It has three objectives:

1. to provide advice about the construction of objective questions;

2. to explain how to combine questions into an objective test;

3. to provide guidance on authoring items.

A subsidiary objective is to standardise our vocabulary. Objective testing is a technical area with lots of jargon – some of which is used inconsistently. This guide
is the result of a wide-ranging literature review and seeks to harmonise our
terminology with that used internationally.


Although some topics (such as item banking) overlap with computer-assisted assessment (CAA), this guide focuses on the production of paper-based objective
tests – although much of the advice is directly transferable to a CAA environment.

The document has eight sections:

Section 1 Introduction to selected response questions
Section 2 Types of selected response questions
Section 3 Choosing selected response questions
Section 4 Writing multiple choice questions
Section 5 Writing questions for higher level skills
Section 6 Item analysis
Section 7 Constructing tests
Section 8 Dealing with guessing.

While the focus of this guide is objective questions, it does not seek to promote one
type of assessment over another. Traditional forms of assessment remain as valid
today as they have ever been – but, where appropriate, objective approaches have a
role to play too. Neither does it seek to explain what you already know. Most SQA
staff have a good knowledge of objective testing - this guide simply seeks to provide
a single source of advice for busy Officers and appointees.

There are no rules for writing objective tests; there’s only advice. Do whatever you
think is right for your particular test. Assessment is an art – not a science. There is
no substitute for human judgement.

MISCONCEPTIONS ABOUT OBJECTIVE TESTING

Although this document does not seek to promote one type of assessment over
another, it does aim to dispel commonly-held, but inaccurate, views about objective
testing. Some of the most common misconceptions are rehearsed below.

1. Objective tests dumb-down education; objective tests are easy. Objective tests
are as “dumb” or “smart” as you choose to make them. Many high stakes tests
(such as university medical examinations in the UK and the SAT in the United
States) use objective tests.

2. Objective tests can only be used to assess basic knowledge. While this is largely
true in practice, there is nothing inherent in the design of objective tests to make
them unsuitable for assessing high level skills.

3. Objective tests encourage guessing. The problem of guessing can be resolved through one of a number of recognised techniques.

4. Writing an objective test is easy. While most teachers can create simple
objective tests, the construction of high quality objective questions is highly
skilled and requires significant knowledge and experience.


5. Objective testing is only fashionable because of e-assessment. It’s true that objective tests are well suited to computer-assisted assessment – but they are
also valid and reliable forms of assessment in their own right.

6. Objective tests aren’t appropriate for my subject. While objective tests have
traditionally been used in the physical and social sciences (such as Physics and
Psychology), they can be used in any subject.

QUESTION TYPES

SQA has traditionally employed a variety of question types within Unit and Course
assessment. These question types can be categorised under two headings:

• constructed response questions


• selected response questions.

Note: Some of the terminology in this guide might not be
familiar to you. It has been used because it is widely
employed in international testing literature and it was
considered best to use “industry standard” nomenclature
rather than “Scottish” terminology.

CONSTRUCTED RESPONSE QUESTIONS

Constructed response questions (also known as “open-ended” questions) are questions that require the candidate to create (“construct”) an answer. Examples of
constructed response questions (CRQs) include short answer questions and essays.

Example 1 ~ Constructed response question

Translate “Good morning mother” into Spanish.

Write here:

CRQs can be sub-divided into two sub-categories:

• restricted response questions


• extended response questions.

A restricted response question (RRQ) is a question whose answer is limited to a few words. Examples of RRQs include complete-the-sentence, missing word and short answer questions (see Example 1 above).

Figure 1 - RRQs and ERQs


An extended response question (ERQ) is one whose answer requires the candidate
to write longer responses, normally consisting of two or more paragraphs. Examples
of ERQs include reports, essays and dissertations. There is no hard-and-fast rule
about where a restricted response question ends and an extended response
question begins.

Note that many SQA assessments use a combination of restricted response questions and extended response
questions. Some question papers have two sections, one
employing RRQs and the other using ERQs.

SELECTED RESPONSE QUESTIONS

A selected response question (SRQ) is a question whose answer is pre-determined and involves the candidate choosing (“selecting”) the response from a list of options.
Because the answer is pre-determined and there is only one correct answer, these
types of questions are often referred to as “objective” questions. Examples of SRQs
include true/false, multiple choice and matching questions.

Example 2 ~ Selected response question

The capital of the United States is New York. True/False

SQA’s question papers typically consist of constructed response questions. Lower levels (up to SCQF level 4/5) generally use restricted response questions and higher
levels (SCQF level 5 and up) generally employ extended response questions
(although sometimes a mixed approach is used). A limited number of subjects
employ selected response questions.

This guide focuses on SRQs, which are becoming increasingly popular for a variety of
reasons.

ADVANTAGES OF SRQS

1. SRQs take less time to answer – reducing the amount of time that candidates
spend on assessment and increasing learning time.
2. SRQs are quick to mark – reducing the time teachers spend on assessment and
increasing teaching time.
3. SRQs are well suited to formative assessment – since candidates’ responses
can be analysed and used to provide detailed feedback.
4. SRQs are good for assessing breadth of knowledge - they are ideal for assessing
a broad range of topics in a short time.
5. SRQs are more reliable than CRQs - because they get around some of the
marking problems associated with written answers.


6. SRQs are well suited to computer-assisted assessment - and facilitate item banking.

The low writing load of SRQs means that the focus is on the candidate’s knowledge
rather than the candidate’s writing or language skills – which is a common problem
with constructed response questions. Also, the speed of answering SRQs addresses
another common criticism of assessment – that it takes up too much time for both
students and teachers.

Research into the marking of CRQs and SRQs has shown significant differences in
the reliability of the two approaches – with objective tests proving to be significantly
more reliable than written tests. This has been the major reason for the widespread
adoption of objective tests in the United States, where testing organisations operate
in a more litigious environment.

The compatibility of objective tests with computer-assisted assessment is a major driver for the renewed popularity of objective testing. SQA, along with other awarding
bodies, is in the process of building banks of questions (“item banks”) which can be
computerised and delivered to candidates over the Internet.

DISADVANTAGES OF SRQS
1. SRQs are not suitable for assessing certain abilities, such as communication
skills or creativity. They are also not appropriate when candidates are required
to construct an argument or provide an original response.
2. SRQs may be less valid than CRQs and suffer from low professional credibility.
3. SRQs that assess higher order skills are difficult (and time consuming) to
produce.
4. SRQs can be wordy and require high order reading skills.

The first and second disadvantages are linked. There is nothing inherent in the
design of SRQs to make them less valid than CRQs – but because they have often
been used inappropriately (to measure skills that cannot be properly measured by
this style of question) they have established a reputation for being invalid among
some practitioners.

Most teachers are comfortable with using SRQs to assess low order skills (such as
factual recall, typified by Example 2). They are less comfortable with their use in
assessing deeper knowledge and understanding. Most currently available examples
of SRQs re-affirm this view by focussing on the assessment of surface knowledge;
even examples of SRQs that are meant to assess deeper knowledge often only
assess surface knowledge – albeit less well known surface knowledge!

Traditionally, the costs of carrying out assessment come at the end of the process –
the setting of the question paper is relatively speedy, the time consuming part
comes when the papers have to be marked. Objective tests reverse this model – the
time consuming bit is the production of the questions, with marking taking very little time. It is, therefore, something of a culture shock to move from traditional assessment to objective testing.

Another criticism of SRQs is that they can atomise teaching and learning,
encouraging “teaching to the test” and surface learning. This, combined with their
efficiency in assessing large numbers of students in short periods of time, has
resulted in them acquiring a reputation as “weapons of mass instruction”, with poor
standing among many educationalists.

USES OF SELECTED RESPONSE QUESTIONS

As previously mentioned, objective tests are used in a number of SQA summative assessments (such as Higher Physics and some HN units). This style of assessment
is well suited to rapid, focussed assessment and is traditionally employed to assess
factual recall and basic understanding. It is less commonly used to assess deeper
knowledge and understanding, and there are few examples (within SQA or
elsewhere) of objective tests being used to assess higher level skills. When used
summatively, objective testing tends to be used for low-stakes assessment rather
than high stakes assessment, which largely remains the preserve of constructed
response questions. However, some subjects (such as Advanced Higher Biology) do
employ objective testing and Higher Education has a long tradition in using objective
testing for high-stakes summative purposes in some fields (such as Medicine).

Objective testing is well suited to formative assessment since it is quick to administer and assess (lack of time is often cited as the main reason for not using
formative assessment). It is particularly suited to diagnostic assessment since it can
be used to identify specific misunderstandings or weaknesses.

Historically, objective testing has been widely used for psychometric testing (testing
of intellect and attitudes) and, more recently, it has been widely applied to job
competence testing. It is also used in entry examinations for some professional
bodies (such as ACCA).

Objective tests are widely used internationally – including high stakes assessments
such as the SAT in the United States, which is used for university entry. They are also
widely used within vendor examinations (such as Microsoft’s global certification
programme). Awarding bodies in every country are focusing on computer-assisted
assessment, which has resulted in a renewed interest in objective testing. These
organisations share the view that the increasing popularity of e-learning will drive
demand for e-assessment – which will be underpinned by item banks consisting of
large numbers of selected response questions.


TYPES OF SELECTED RESPONSE QUESTIONS

There are several types of selected response questions (SRQs). Although they share
some common characteristics, they each have unique features and applications. But
they all share a fundamental characteristic – they have one unambiguously correct
answer.

TYPES OF SELECTED RESPONSE QUESTIONS

There are seven types of SRQ. These are:

1. true/false questions
2. matching questions

3. multiple choice questions (MCQ)

4. multiple response questions (MRQ)

5. ranking/sequencing questions

6. assertion/reason questions

7. Likert scale questions.

Each type of SRQ is now described and exemplified.

Note: This section simply introduces each type of question. It does not aim to explain how or when to use
them.

TRUE/FALSE QUESTIONS

A true/false question (T/F) is a statement (not a question!) that is either true or false. The candidate must select one of two possible responses - “true” or “false”.

Example 3 ~ True/false question

(x+1) is a factor of x²+2x-3. True/False

Because candidates have a 50/50 chance of answering these questions correctly, this type of question is considered “easy” and is associated with low order
knowledge. However, true/false questions can assess higher order skills; and setting
an appropriate pass mark can eliminate the effects of guessing.
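To illustrate the point about pass marks, the short sketch below estimates the chance of passing a true/false test by blind guessing. It is a minimal illustration only; the 20-item test and 75% pass mark are assumed values, not SQA requirements.

from math import comb

def p_pass_by_guessing(n_items, pass_mark, p_correct=0.5):
    # Probability of scoring at least pass_mark out of n_items when every
    # response is an independent guess with probability p_correct of success.
    return sum(comb(n_items, k) * p_correct ** k * (1 - p_correct) ** (n_items - k)
               for k in range(pass_mark, n_items + 1))

# A 20-item true/false test with a pass mark of 15 (75%) leaves only about a
# 2% chance of passing by guessing alone.
print(round(p_pass_by_guessing(20, 15), 4))  # 0.0207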


Note: Any question that has one of two possible answers
is considered a true/false question (for example, the
responses might be “yes” or “no” rather than “true” or
“false”). These questions are also known as “alternative
response” items.

MATCHING QUESTIONS

This type of question requires candidates to match an object with one or more
associated characteristics.

Example 4 ~ Matching question

Match the list of storage technologies on the left with the list of memory characteristics on
the right. Match each technology (A, B, C or D) with one characteristic (1, 2, 3 or 4) only.

A. Hard disk 1. Non-volatile

B. Flash memory 2. Volatile

C. RAM 3. High capacity

D. ROM 4. Low cost.

A.

B.

C.

D.

The objects on the left are called “stimulators” and the matching statements on the
right are called “responses”. No more than seven stimulators should be included in
any one question.

This type of question is often used to assess candidates’ knowledge of the characteristics of certain objects. It is particularly well suited to computer-based
assessment since it can be implemented as drag-and-drop (dragging each response
onto an associated stimulator).


MULTIPLE CHOICE QUESTIONS

A multiple choice question (MCQ) consists of a question (or incomplete statement) followed by a list of possible responses from which candidates must select one.
There are normally three to five options with four being the most common.

Example 5 ~ Multiple choice question

In psychiatry, holding two contradictory views about the same thing is called:

A cognitive dissonance

B dementia

C dissociative disorder

D factitious disorder

Note that a multiple choice question with two options is effectively a true/false question. Or, to put it more
accurately, a T/F question is a multiple choice question
with two options.

MCQs are the most common type of selected response question – and the one that
this guide focuses on in later sections.

MULTIPLE RESPONSE QUESTIONS

A multiple response question (MRQ) is similar to a multiple choice question (MCQ) but has two or more correct responses (as opposed to an MCQ’s single correct
response).

Example 6 ~ Multiple response question

Which of the following statements about earthquakes is/are true?

A An earthquake generates seismic waves.

B The boundary of tectonic plates is called the fault plane.

C The point of origin of seismic waves is called its epicentre.

D The severity of an earthquake is measured by its magnitude and intensity.


There are some misconceptions about MRQs. They are not necessarily more difficult
than MCQs; they are as hard or as easy as you choose to make them. There is no
need to indicate the number of correct options; this only encourages guessing. And
there is nothing wrong with making every option correct; in fact, prohibiting this
possibility reduces the reliability of MRQs.

Note that MCQs normally begin: “Which one of the following…”, and MRQs usually begin: “Which of the following…”.

RANKING QUESTIONS

A ranking question involves ordering the options in some defined sequence. The
sequence can be an ordered list of numbers, chronological sequence or series of
events.

Example 7 ~ Ranking question

Rank the following countries in order of their population densities (lowest density first).

I France

II Germany

III Spain

IV United Kingdom

Ranking questions are easily implemented by computers using drag-and-drop.

ASSERTION/REASON QUESTIONS

This type of question consists of a statement (assertion) and a possible explanation (reason). Candidates must decide if the assertion and reason are true, and whether
the reason is a correct explanation of the assertion.


Example 8 ~ Assertion-reason question

The following assertion and reason relate to World War II. Read the assertion and
associated reason and then choose a corresponding letter (A-E) to indicate whether the
assertion and/or reason is/are true.

Assertion Japan’s lack of raw materials was a cause of World War II in Asia.

Reason Japan lacked natural raw materials except for small deposits of coal and iron.

A Assertion is true and reason is true and the reason is a correct explanation of the
assertion.

B Assertion is true and reason is true but the reason is not a correct explanation of
the assertion.

C The assertion is true but the reason is false.

D The assertion is false but the reason is true.

E The assertion is false and the reason is false.

Assertion-reason questions are similar to multiple true-false questions.

LIKERT SCALE QUESTIONS

This type of SRQ was named after Rensis Likert, who invented the scale in 1932. It is
widely used within questionnaires to gauge respondents’ attitudes. The classic Likert
scale consists of five possible responses:

1. Strongly disagree

2. Disagree

3. Neither agree nor disagree

4. Agree

5. Strongly agree

Some psychometricians add or remove options (the neutral option – “neither agree
nor disagree” – is often removed).


Example 9 ~ Likert Scale question

My manager supports me when necessary but otherwise allows me to work without interference.

A Strongly disagree.

B Disagree.

C Neither agree nor disagree.

D Agree.

E Strongly agree.

This type of SRQ is almost exclusively used for attitudinal assessments and is rarely
employed within formal SQA assessments. It is not discussed further in this guide.

BEST ANSWER AND EXCEPTIONS

Although the existence of a single, unambiguous, correct response is a fundamental feature of SRQs, the usefulness of SRQs can be extended through “best answer” and
“exception” type questions. These techniques increase the flexibility of SRQs at the
expense of some of their objectivity.

BEST ANSWER QUESTIONS

A “best answer” question is one whose key is the closest (“best”) answer, selected from a list of options of which more than one may be true. Used carefully, best answer questions can be almost as objective as standard SRQs.

Example 10 ~ Best answer question

A user wishes to use a search engine to look for information relating to Celtic music that
originated in Scotland. Which one of the following queries is likely to produce the best
results?

A Celtic music Scotland

B “Celtic music” Scotland –football

C Scotland +celtic +music +originate

D “Celtic music that originated in Scotland”


Note that more than one of the responses is correct (in fact, they are all more-or-less
correct). But only one option is the best answer (B).

The use of best answer questions is particularly appropriate to the social sciences
and arts subjects, which tend not to have a definitive body of knowledge like the
physical sciences. Best answer questions can also be used to assess some higher
order skills since they frequently require an element of judgment.

EXCEPTION QUESTIONS

An exception question is one where all of the options are correct except one. This type of question effectively reverses the logic of the standard SRQ.

Example 11 ~ Exception question

Smoking is a contributory factor in the following conditions EXCEPT:

A diabetes.

B heart disease.

C lung cancer.

D Parkinson’s disease.

A question that includes “not” in the stem is effectively an exception question. For
example, the above question could be re-phrased: “Which one of the following
conditions is NOT caused by smoking?”.

Exception (and negative) questions are not ideal – but should not be completely
avoided since their use can simplify questions and/or increase the number of
questions that can be asked.

VARIANTS & CLONES

A question that assesses the same content as another question is known as a variant. The stems of variants are worded differently and the options may be different – but, fundamentally, variants assess the same learning objective.

A question that is (almost) identical to another question is known as a clone. Clones differ only in their variables. For example, the question below is a clone of Example 3, the only difference being the expression to be factorised.


Example 12 ~ Clone

(x+2) is a factor of x²+2x-3. True/False

Variants and clones have significant implications for e-assessment since they
provide a quick and simple way of rapidly populating an item bank.
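As a minimal sketch of this idea (the item template, field names and answer rule below are illustrative assumptions, not an SQA item bank format), clones of the factorisation item in Example 3 could be generated automatically by varying the roots of the quadratic and the factor offered in the stem:

import random

def make_clone(rng):
    # Build one true/false clone: a quadratic with known roots and an offered factor.
    r1, r2 = rng.sample(range(-5, 6), 2)       # roots of the quadratic
    offered = rng.choice(range(-5, 6))         # the root implied by the offered factor
    b, c = -(r1 + r2), r1 * r2                 # x^2 + bx + c = (x - r1)(x - r2)
    stem = f"(x{-offered:+d}) is a factor of x^2{b:+d}x{c:+d}. True/False"
    return {"stem": stem, "key": offered in (r1, r2)}

rng = random.Random(42)
for clone in (make_clone(rng) for _ in range(3)):
    print(clone["stem"], "->", clone["key"])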

Each type of SRQ has its strengths and weaknesses, and each has its best uses. The
next section looks at choosing SRQs for different purposes.


CHOOSING SELECTED RESPONSE QUESTIONS

The previous section explored the characteristics of different types of selected response question. This section looks at how each type is best used.

TAXONOMIES OF LEARNING

One of the key determinants in the selection of SRQs is the kind of knowledge or
understanding that you are seeking to assess. For example, factual recall can be
adequately assessed using true/false questions; deeper understanding may require
more complex question types such as multiple response questions.

As a starting point, we need a method of classifying knowledge and understanding. The most widely used classification system is Bloom’s Taxonomy.

BLOOM’S TAXONOMY

Benjamin Bloom wrote Taxonomy of Educational Objectives Book 1 Cognitive Domain in 1956 in an attempt to standardise the terminology used by teachers to
describe academic abilities. Until the publication of this book, different people used
different words to describe the same thing; or, worse, used the same words to
describe different things.

His book described a classification system that could be used to categorise cognitive
abilities. The taxonomy (which became known as Bloom’s Taxonomy) is widely used
within the educational community.

Note: Bloom’s Taxonomy is not the only way to classify
academic abilities. There are many alternative methods –
some linked to Bloom’s (but more up-to-date) and some
entirely different from Bloom’s. But Bloom’s Taxonomy
remains the most widely used classification system.

Bloom’s Taxonomy classifies academic abilities into six categories:

1. Knowledge
2. Comprehension

3. Application

4. Analysis

5. Synthesis

6. Evaluation.

A brief description of each cognitive skill follows.


Knowledge: Knowledge involves the recall of specific facts and figures, or the recall of specific
methods and processes. Knowledge is the bottom of Bloom’s Taxonomy but
underpins the higher order abilities. There are three types of knowledge: knowledge
of specifics, knowledge of methods, and knowledge of universals. At the higher
levels (knowledge of methods and universals) it can be intellectually demanding.
This category includes: knowledge of terminology, knowledge of specific facts,
knowledge of conventions, knowledge of trends and sequences, knowledge of
classifications, knowledge of criteria, knowledge of methodology, knowledge of
principles and generalisations, and knowledge of theories and structures.

Comprehension: Comprehension differs from knowledge in that it relates to the mental processes of
organising and re-organising information for a particular purpose. It includes:
translation, interpretation and extrapolation. Translation relates to the ability to
translate (or decode) a communication from one format (or language) to another.
Interpretation involves the explanation or summarisation of a communication.
Whereas translation involves a mechanistic, part-for-part rendering of a
communication, interpretation involves a more holistic, re-ordering or re-
arrangement of the information. Extrapolation involves extending trends or
sequences beyond the given data to infer consequences or corollaries.

Application: This involves the use of knowledge and comprehension in specific situations. For
example, the use of knowledge of computing terminology and procedures combined
with an understanding of the principles of computer hardware and software can be
applied to the assembly of a computer system.

Analysis: Analysis involves the breakdown of a communication into its constituent parts so
that the relationship between the elements is made clear. Analysis is intended to
clarify or explain communications or processes. This cognitive skill includes the
ability to: (1) analyse elements (identification of the components of the
communication); (2) analyse relationships (the ability to check the consistency or
accuracy of a hypothesis, and skills in comprehending the inter-relationships among
different ideas or concepts); and (3) analyse organisational principles (the ability to
recognise form and pattern in a communication, and the ability to recognise general
techniques used within a subject area).

Synthesis: Synthesis involves combining the parts so as to form a whole. It involves combining
and arranging parts or pieces of a communication to create something new. It may
involve: (1) the production of a unique communication; (2) the production of a plan;
and (3) the derivation of a set of abstract relations to represent physical
phenomena.

Evaluation: Evaluation involves making judgements about the value of particular phenomena for
given purposes. Evaluation is carried out using criteria and involves qualitative and
quantitative judgements based on these criteria. The criteria may be given or
created. This includes measuring the internal consistency of the communication
using criteria such as: quality of writing, accuracy of the information contained within
it, and consistency of argument; and measuring the external consistency of the
communication, which requires the evaluator to have a detailed knowledge of the type of
phenomena under review since it will be evaluated in terms of the general criteria
which are applied to phenomena of this type.

Table 1 - Bloom's Taxonomy


Bloom’s Taxonomy is a hierarchy in that each category builds on the one below. For example, application depends on comprehension which in turn depends on knowledge. Or, to put it more simply: you can’t apply something until you understand it; and you can’t understand something until you know about it. Figure 2 illustrates this hierarchy – with knowledge at the bottom and evaluation at the top.

Figure 2 - Bloom's hierarchy

It is worth noting that, in practice, every level of Bloom’s Taxonomy can be reduced to knowledge if the candidate
answers the question through rote learning. The most
sophisticated evaluation can be answered correctly if the
candidate has studied that specific scenario and learned
the correct response. Or, to put it another way, one
person’s evaluation is another person’s knowledge.

IDENTIFYING THE LEVEL OF A QUESTION

Bloom’s Taxonomy can be used to categorise the cognitive demands of a question. For example, a question asking a candidate to “describe” something is normally
associated with the knowledge domain; another question asking the candidate to
“explain” something is normally associated with analysis. In fact, the verb in the
question can provide a clue to the question’s intellectual demands.

Table 2 associates some verbs with the levels within Bloom’s Taxonomy.


Level Verbs

Knowledge define, describe, label, list, name, recall, show, who, when, where

Comprehension compare, discuss, distinguish, estimate, interpret, predict, summarise

Application apply, calculate, demonstrate, illustrate, relate, show, solve

Analysis analyse, arrange, categorise, compare, connect, explain, infer, order, separate

Synthesis arrange, combine, compose, create, design, formulate, hypothesize, integrate, invent, modify, plan

Evaluation assess, compare, decide, defend, discriminate, evaluate, judge, justify, measure, rank, recommend.

Table 2 - Verbs associated with Bloom's Taxonomy

So, for example, a question that commences: “Define…” is likely to assess basic
knowledge; a question that begins “Compare…” is likely to assess analytical or
evaluative skills.

Note: SQA does not formally use a recognised taxonomy
for assessments. However, when one is employed by
Officers or appointees, it is usually Bloom’s. Some SQA
question papers fall foul of Bloom’s Taxonomy, asking
candidates to “explain” something but actually awarding
marks for descriptions (or vice-versa).

DIFFICULTY AND DEMAND

Bloom’s Taxonomy provides an indication of the demand of a question – it does not define its difficulty. A question’s demand is a measure of its intellectual requirements; its difficulty is “how hard” it is. Although difficulty and demand are related (most demanding questions are difficult), a question can have high demand and low difficulty – or low demand and high difficulty.

Example 13 ~ Low demand, high difficulty question

Describe the main processes that take place during nuclear fusion.

This question has low demand (relating to factual recall) but high difficulty because it
relates to a complex topic (nuclear fusion). Similarly, crossing the road involves
evaluation skills (Is the road clear? Is it safe to cross? How far away is that car?), which are at the top of Bloom’s hierarchy – but is not a difficult task for most people.
So, merely climbing Bloom’s Taxonomy is no guarantee of difficulty.

The concept of difficulty and demand has important implications for question setting.
Most SQA tests employ low difficulty/low demand questions; but even the “more
demanding” questions may not be – they might simply assess knowledge in a more
difficult way (by, for example, assessing little known knowledge).

QUESTION TYPES AND DEMANDS

Each question type can be related to one or more levels in Bloom’s Taxonomy. While
it’s possible to use any one of the question types for almost any of Bloom’s levels,
some are better than others for specific levels as the following table describes.

True/False While mostly used to assess knowledge, T/F questions can, in fact, be
used to assess knowledge, comprehension and application levels.

Matching Again, mostly used to assess basic knowledge but can be used to
assess knowledge and comprehension.

MCQ MCQs are the most flexible type of SRQ and can assess all levels; they
are particularly suitable for knowledge, comprehension, application
and analysis.

MRQ MRQs can assess the same range of levels as MCQs – but have the
potential to create more difficult questions within each category.

Ranking Ranking questions are well suited to assessing application and analysis.

Assertion Suitable for knowledge, comprehension and analysis.

Table 3 - Question types and demand

So, in theory, SRQs can assess all of Bloom’s levels. However, in practice, it is
uncommon to come across SRQs that assess anything other than knowledge and
comprehension. But this is not an inherent limitation in their design. Assessing
higher order skills can be done – but it is a time consuming and skilled task to do so.


ADVANTAGES & DISADVANTAGES OF QUESTION TYPES

As stated previously, each question type has its unique characteristics and uses. The
applications of each type are determined by its strengths and weaknesses.

Type: True/False
Advantages: Well suited to basic knowledge. Easy to write. Rapid to mark. Suited to dichotomous knowledge. Good for formative assessment – especially diagnostic assessment.
Disadvantages: Limited applications (best suited to dichotomous knowledge).

Type: Matching
Advantages: Relatively easy to write. Quick to mark. Good for assessing knowledge of characteristics/features or relationships between variables. Well suited to computerisation (drag-and-drop).
Disadvantages: Limited to knowledge and comprehension. Best used for homogenous content i.e. classifying types.

Type: MCQ/MRQ
Advantages: Can assess a wide range of cognitive abilities (up to analysis). Scenario-based questions can assess higher order skills. Well suited to diagnostic assessment (distractors can target learning difficulties). Item analysis provides detailed feedback (to assessors and candidates). Simple MCQs are quick and easy to construct. High re-usability of items.
Disadvantages: Good MCQs (at any level) are difficult and time consuming to construct. MCQs that assess high level abilities require skilled authors. Unsuitable for assessing synthesis and evaluative skills.

Type: Assertion
Advantages: Well suited to assessing relationships between variables. Well suited to assessing understanding of cause-and-effect. Good for constructing demanding items.
Disadvantages: Difficult to construct. Limited applications (compared to MCQs). Wordy – difficult to read and understand.

Table 4 - Advantages and disadvantages of question types


In practice, the main barriers to constructing high quality items are the skills and
experience of the authors. A talent for writing traditional question papers does not
necessarily translate to writing SRQs – so experienced setting teams may struggle to
create high quality item banks. Even after training, some writers don’t “get” SRQs –
while others are veritable question factories.

Note: If a unit writer wishes to use objective testing, s/he
should not prescribe the particular type of SRQ in the unit
specification itself. It’s better to simply state that selected
response questions may be used – and leave the choice
of SRQ to the assessment writers (although the Support
Notes may suggest specific forms of SRQ).


WRITING MULTIPLE CHOICE QUESTIONS

This section focuses on the construction of a specific type of selected response question (SRQ) – the multiple choice question (MCQ). However, much of the advice
is transferable to other forms of SRQs.

Multiple choice questions are the most common type of SRQ; they’re also the most
flexible and most difficult to construct. MCQs are used in all types of objective testing
(including high stakes assessment) and are the most common form of SRQ
employed by SQA.

ANATOMY OF AN MCQ

A single, complete multiple-choice question is called an item. It poses a question and allows a candidate to select the correct answer from a list of possible options. An MCQ has the following structure:

Figure 3 - Anatomy of an MCQ

Stem (or stimulus): the question or problem.

Options (or responses or alternatives): the list of possible answers.

Key: the correct (or best) answer.

Distractors: the incorrect alternatives to the key.

Note the spelling of “distractor” – which is the US-English spelling rather than the International English spelling (“distracter”).
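Where items are to be stored electronically, the sketch below shows one possible way of representing this anatomy as a record in an item bank. The field names are illustrative assumptions rather than any SQA or industry schema.

from dataclasses import dataclass

@dataclass
class MCQItem:
    stem: str                    # the question or problem
    options: dict[str, str]      # letter -> option text (the key plus the distractors)
    key: str                     # letter of the correct (or best) answer
    learning_objective: str = "" # what the item is intended to assess

    def is_correct(self, response: str) -> bool:
        return response.strip().upper() == self.key

item = MCQItem(
    stem="In psychiatry, holding two contradictory views about the same thing is called:",
    options={"A": "cognitive dissonance", "B": "dementia",
             "C": "dissociative disorder", "D": "factitious disorder"},
    key="A",
)
print(item.is_correct("a"))  # True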

WRITING MULTIPLE CHOICE QUESTIONS

There is no formula for constructing high quality items. However, there is some
guidance that aids their construction.


THE ITEM

The key to writing good items (“authoring”) is to ensure that the question directly relates to the underlying Arrangements, is clearly presented, and is free from unnecessary details. A question should not be a test of reading ability; the focus
must be on the knowledge or skill that it is seeking to assess.

 Ensure that each item is relevant to the course/unit outcomes.

 Ensure that the level of language is appropriate to the target cohort.

 Assess one thing at a time (unless you intend to ask an integrative question).

 One correct answer only.

 Don’t write questions in isolation.

 Don’t include unnecessary words.

 Pre-test items whenever possible.

The most difficult part of writing an item is to ensure that there is only one correct
answer. Having more than one potentially correct answer is the most common
complaint from teachers and candidates. It’s a challenge to write items with one
clearly correct answer – at least non-trivial items. It’s easy to be subjective or context
dependent (i.e. the key is correct in some circumstances but not others). One solution is to spell out the context – but this may make the item clumsy or wordy or give clues to the correct answer. Another option is to use words and phrases like
“best” or “most likely” in the stem (it’s easier to argue that the key is the most likely
answer rather than the only answer).

Although the initial construction of questions has to be the work of an individual, it’s
vital that items are reviewed prior to being used operationally. It’s impossible for a
single author to both write and review items independently.

SRQs are well suited to pre-testing – which means trying them out on students before using them operationally. Pre-testing will confirm the item’s suitability (or not). It also generates valuable data about the question that can be used in item analysis (see Section 6).

STYLE GUIDE

Each item should follow an agreed house-style to provide guidance on language use.
A style guide for item writing would normally include advice about:

• spelling


• punctuation

• use of emphasis

• prose style

• language.

For example, spelling advice would include the treatment of numbers (spelled in
words or written as digits?); punctuation advice would include information on the
punctuation to use within options (should they end with a period or without any form
of punctuation?); emphasis rules would include the use of bold and italics; prose
style and language would provide general advice about the type and level of
language to be used.

THE STEM

It’s best to phrase the stem as a self-contained question rather than a partial
statement – although the latter approach is neither uncommon nor invalid.

 Try to phrase the stem as a complete question (unless this is too contrived –
when an incomplete statement may be used).

 Use clear, straight-forward language – suitable for the target cohort in terms of
level of language.

 Place necessary wording in the stem – not in each of the options.

 Avoid irrelevant or unnecessary information.

 Avoid negative wording if possible – or use negatives sparingly.

 Specify any standards implied.

 Avoid the use of personal pronouns (“I”, “You” etc.).

 Avoid subjectivity e.g. “Which one of the following do you think is…” (what the
candidate “thinks” is subjective – and her response cannot be wrong).

Any words that would be repeated in each of the options should be included in the
stem. Options should not begin or end with identical words and phrases.


Example 14 ~ Repeated text

If the pressure of a certain amount of gas is held constant, what will happen if its volume is
increased?

A The temperature of the gas will decrease.

B The temperature of the gas will increase.

C The temperature of the gas will remain the same.

Example 15 ~ Repeated text removed

If the pressure of a certain amount of gas is held constant, what will happen to the
temperature if its volume is increased?

A Decrease.

B Increase.

C Remain the same.

Avoid words like “could” and “would”. For example, asking a candidate “What would
you do…” cannot be answered incorrectly (since only the candidate can know what
she would do in any given circumstance) – instead write: “What should you do…”.
The following example illustrates a poor question.

Example 16 ~ Using subjective wording

A computer is running slowly. What could be responsible?

A Insufficient memory

B Over-heating

C Small hard drive

D Virus


The author intends D to be the correct answer – but any of the options could be
correct. Here is an improved version.

Example 17 ~ Subjective wording removed

A computer suddenly runs slowly without any changes to its configuration. What is most
likely to be responsible?

A Insufficient memory

B Over-heating

C Small hard drive

D Virus

Notice the added contextual information in the stem to improve the clarity of the
question – and the replacement of “could” with “most likely”.

Specify any standards implied. If an item calls for a judgment, specify the authority
or standard upon which the correct answer is based.

Example 18 ~ Standards specified

According to the American Medical Association, the diet of the average American provides
vitamins in amounts that are what?

A Adequate for normal consumption.

B Inadequate for normal consumption.

C In excess of normal requirements.

D Variable in relation to individual requirements.

The key to good stem construction is to keep the question (or statement) as short as
possible – consistent with providing sufficient information to clearly pose the
question. But don’t be tempted to reduce the length of the stem by moving
information into each of the options; this complicates the question and increases the
candidate’s reading time.

Negative wording is not prohibited but it’s better to word a question positively when
this is possible. Double negatives should be completely avoided i.e. two negatives in
the stem or a negative in the stem and a negative in the options. However, some

questions can be made unnecessarily complex by avoiding a single negative – in which case, use negatives. When negatives are used, emphasise “NOT” (or whatever
construct is used) in the stem (or the options).

THE OPTIONS

 Provide between three and five options – four options is most common.

 Options should be internally consistent (e.g. all consisting of people’s names, not three names and a measurement).

 All of the options should be plausible.

 All of the options should be of equivalent quality.

 Ordering of the options should follow a consistent and logical sequence.

 The length of options should be comparable.

 Options should be mutually exclusive.

 Only one correct (or best) answer.

 The one correct answer (key) should be actually correct.

 The key should not be worded in a way that would make it likely to change over
time.

 Ensure that none of the distractors is conditionally correct (depending on circumstances or context – unless these are defined in the stem).

 Do not create distractors that are too close to the key.

 Don’t use words such as “not”, “never” or “always” to make an option incorrect.

 Avoid the use of “All of the above”.

 “None of the above” should be used sparingly (and when used should be the
correct answer some of the time).

 Avoid pejorative language (such as “bad”, “low”, “ignore” etc.).

 Avoid syllogistic reasoning e.g. “Both A and B are correct”.

Some of the advice is conflicting – such as “Stems should be short and simple” and “Move information to the stem
rather than repeat it in each option”. Dealing with these
tensions is the art of item construction!


The advice about pejorative language is quite subtle. Any option that uses words
such as “bad”, “low” and “ignore” is usually a distractor – authors rarely use such
words in the key.

At higher levels of understanding, it can be difficult to construct questions with one objectively correct answer and it is a common error in such questions to offer
options that include more than one potentially correct answer. Careful wording
(“Which one of the following is likely to be the best answer…”) can get round this
potential problem.

SEQUENCING OPTIONS

Ordering of the options within an item should follow a logical order. If using numbers
or dates then they should be displayed numerically or chronologically in ascending or
descending order (normally ascending). Text answers should normally be sorted
alphabetically unless there is a “natural” sequence to the options, in which case the
natural sequence should be used in preference to alphabetical order. Do not order the options to try to evenly distribute the answers (i.e. to ensure each option – A, B, C and D – is used approximately the same number of times) nor attempt to avoid clustering keys (e.g. A-B-B-B-C) since both of these strategies reduce the randomness of the test.
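A minimal sketch of this ordering step, assuming items are prepared programmatically (the function and labelling scheme are illustrative): text options are sorted alphabetically and the key letter simply falls wherever the sort places it, with no attempt to balance the distribution of keys.

def order_options(options, key_text):
    # Sort text options alphabetically, relabel them A, B, C..., and report
    # which letter the key ends up under.
    ordered = sorted(options, key=str.lower)
    labelled = {"ABCDEFG"[i]: text for i, text in enumerate(ordered)}
    key_letter = next(letter for letter, text in labelled.items() if text == key_text)
    return labelled, key_letter

options = ["dementia", "cognitive dissonance", "factitious disorder", "dissociative disorder"]
labelled, key = order_options(options, "cognitive dissonance")
print(labelled)   # options relabelled in alphabetical order, A to D
print(key)        # 'A'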

USE OF “NONE OF THE ABOVE”

The option “None of the above” should be used sparingly. It is preferable to avoid the use of “None of the above” as well as “All of the above”. Studies have shown that they decrease item discrimination and test score reliability (see Section 6). However,
“None of the above” can be used if authors ensure that:

• it is used in several items in a test

• it is sometimes the correct option (but not always)

• it is not used after a negative stem

• it is not used as “padding” (because you are short of other options).

“None of the above” may be particularly useful in questions that require candidates
to carry out calculations, since this option effectively mops-up a large range of
potential errors. But, if it’s used, it must sometimes be the key.


Example 19 ~ Good use of “None of the above”

Which one of the following is the solution for x in the equation 5(x-1)=10?

A 0

B 2

C 4

D None of the above
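Working through the equation confirms that the key must be option D:

5(x - 1) = 10  ⇒  x - 1 = 2  ⇒  x = 3, which does not appear among options A-C, so “None of the above” is the key.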

ADVICE ON WRITING DISTRACTORS

The quality of distractors has a huge impact on the quality of the question.
Distractors have a particularly important role to play in formative assessment since
their careful selection can provide a wealth of diagnostic information about the
candidate’s present understanding. In summative assessment, carefully selected
distractors can catch out unprepared (or under-prepared) candidates. Writing
distractors, therefore, requires as much thought as writing the key.

Distractors should be as plausible as the key; do not use unrealistic or humorous distractors as this effectively reduces the number of (real) options.

 Distractors should be as plausible as the key – no silly distractors – although some can be relatively weak.

 Common misunderstandings make good distractors.

 Incorrect paraphrasing of the question makes for good distractors.

 Correct sounding distractors are good for the poorly prepared candidate.

 True statements that do not answer the question are good distractors.

There is a balance to be struck between writing good distractors and trying to dupe
candidates. Distractors should not “entrap” candidates – that is, catch out
candidates through clever wording, very fine distinctions or tricks-of-the-trade. If you
want to write a difficult question then do so through the knowledge and skills
required to answer it – not by tricking the candidate into giving the wrong answer.


ADVICE ON AVOIDING CUEING

“Cueing” is the tendency for the stem (or the options) to imply the key. It is a common
problem with SRQs. The following question has only one option (A) which is
grammatically correct (the stem ends with “an” and only option A begins with a
vowel).

Example 20 ~ Cueing

A word used to describe a noun is called an:

A adjective

B conjunction

C pronoun

D verb

 The wording in the stem should not provide obvious clues to the correct answer.

 Don’t give clues to the correct answer – ensure that all the options flow from the stem, are in the same format and tense, and are grammatically correct.

 Don’t allow the wording of the options to provide obvious clues to the correct
answer.

 Avoid the use of “always” and “never” in the options since these responses are
rarely correct.

 Avoid the use of “sometimes” and “often” in the options since these responses
are often correct.

 Avoid using stereotypical language that could give away the answer.

 Avoid using phrases from textbooks.

 Avoid pejorative wording (“bad”, “low” etc.) since these words are rarely used in
the key.

 Avoid absolute language such as “always”, “never”, etc. since these are rarely
correct.


 Avoid complex language in one option compared with other options (this option
tends to be the correct answer).

 Avoid similar language in the stem and the options since the option with the
most similar language is most likely to be the key.

 Avoid visual cueing i.e. one option being much longer or standing out in some
other way from the other options – this one is likely to be the key.

The length of options should be similar. An option that stands out from the others
can indicate to a student that it is the right answer. If different lengths are
unavoidable then use two long options adjacent to each other and two short options
adjacent to each other.

The following example illustrates some of this guidance.

Example 21 ~ Advice in context

“Shakespeare wrote plays and they reflect both the depth of human emotion and the
complexity of human society.”

Which one of the following phrases improves the wording of the underlined fragment?

A “Shakespeare wrote plays who reflect…”

B “Shakespeare wrote plays that reflect…”

C “Shakespeare wrote plays which reflect…”

D “Shakespeare wrote plays being that they reflect…”

The question appears to be a valid assessment of candidates’ knowledge of English grammar (presuming that this is what the author intended to assess) – although a
more familiar context could have assessed the same knowledge (the mere mention
of Shakespeare can disorientate candidates).

The question is clearly worded – although some of the language in the stem is
unnecessarily complex (words such as “fragment” could confuse candidates).

The options look homogenous, with none standing out (no visual cueing). They have
been ordered in a logical sequence (sentence length). They are all plausible to the
under-studied candidate. There is some repeated text in the options that a rewording
of the stem may avoid (but maybe not without making the question less clear). The
distractors have been chosen to reflect common misunderstandings among
candidates with respect to the use of “that”, “which” and “who”. And there is one
unambiguously correct option (B).


All in all, a reasonable (albeit, imperfect) question.

DISCLOSERS

A concept associated with cueing is disclosing. A discloser is a question that contains the answer to another question. Unless otherwise intended, every question
contains the answer to another question. Unless otherwise intended, every question
should be independent of every other question and should contain the minimum
information required to answer the question. However, it can happen that the stem
or options in one question inadvertently help candidates to answer another question.

Disclosure is a particular problem in item banking when it is impossible to predict which items will be included in a particular instance of a test (such tests are usually
dynamically generated by a computer – and a computer is unlikely to spot the
subtleties of disclosure).

A checklist, summarising the advice for item construction, is provided in the appendices.


WRITING QUESTIONS FOR HIGHER LEVEL SKILLS

Multiple choice questions (MCQs) have gained a reputation for being a quick-and-
dirty way of assessing low level knowledge. However, they can also be used to
assess higher level skills – but this requires a great deal more effort on the part of
the writer. This section explores the potential of MCQs to assess higher level skills.

As has been previously stated, MCQs can be used to assess all of the levels within
Bloom’s Taxonomy – although they are more suited to the lower levels. This section
explores a couple of techniques for writing higher order questions and exemplifies
this against each level in Bloom’s Taxonomy.

Writing MCQs to assess higher order skills frequently contradicts some of the
previous advice about writing good items. For example, such questions often involve
long stems; complex language is frequently used; standards are often omitted (or
the question becomes one of knowledge of the standard); and they often require an
element of judgement on the part of the candidate (and, as a consequence, are less
objective).

Note: There is a fundamental distinction between writing questions that assess higher level skills and writing items that assess lower order skills in a “difficult” way. Writing a question that assesses some esoteric piece of knowledge is not a higher order item – although few candidates will answer it correctly, it is still only assessing a low level ability, albeit in a difficult way (see previous discussion on difficulty and demand).

TECHNIQUES FOR WRITING HIGHER ORDER QUESTIONS

Writing higher level questions is easier in some subjects than others. Some fields,
such as mathematics, are problem solving based and in such subjects it is relatively
straight-forward to produce questions that assess more than knowledge and
comprehension (see example 3 for a straight-forward application level question in
Maths). In other subjects it’s not so easy.

However, there are a few techniques that can be used to help authors produce more
demanding MCQs. We will look at two:

1. scenario questions
2. passage-based reading.

Before we do, there is a very simple technique that can be used to transform a
simple knowledge question into one that is more demanding. Instead of asking
“What…?”, ask “Why…?”. For example, in a Geography test, instead of asking “Which one of the following cities is the capital of the United States?” (which assesses basic knowledge), ask why Washington is the capital of the US (which requires an explanation).

Example 22 ~ Upgrading questions

Why is Washington DC the capital of the United States?

A It is a planned city, capital by design.

B It is the largest city in the United States.

C It is located beside a large river and manufacturing base.

D It is located in a position safe from British troops during the American Revolution.

This is a quick-and-dirty technique to generate more demanding questions, upgrading basic knowledge questions to comprehension or analysis levels.

SCENARIO QUESTIONS

The main method of writing demanding items is to present a scenario to candidates and then pose one or more related questions. The scenario can be anything from a
paragraph to a page (although a very long scenario really requires a number of
follow-on questions to justify its length). The associated question(s) may involve a
range of cognitive abilities including interpretation (comprehension), prediction
(comprehension), calculation (application), problem solving (application), explanation
(analysis), inference (analysis), categorisation (analysis) and decision making
(analysis and evaluation).

Scenarios can be used in all subjects but are particularly suitable in the social
sciences. Science subjects are inherently suited to problem solving and it is easier in
these areas to pose demanding questions without the need for lengthy scenarios.

The examples provided in this section are given without detailed comment. You are
encouraged to critically appraise each question yourself. When you do, you will
appreciate that no (non-trivial) question is without its weaknesses.

A scenario question has a straight-forward construction. It consists of some text, which may be illustrated with a diagram or photograph, and one or more associated
questions. The scenario can take one of a number of forms including:

• a description of a specific environment

• a description of a specific situation

• a description of a principle or theorem


• a description of a problem

• an explanation of an event

• the results of an experiment (or the results of research).

Most scenario questions involve an element of interpretation on the part of the candidate.

The candidate will take more time to process a scenario question as it often requires
a high level of reading skills. This should be taken into account when determining
the duration of a test (see Section 7).

Example 23 ~ Application skills

Julie is 14 years old and frequently uses an online community called MyParty, which is a
social network used by many of her friends. However, the service is open to any member
of the public. She has become very friendly with Jamie, who is another user of the service,
whom she has never met. Jamie’s profile reports that he is 16 years old and attends a
nearby school. Julie and Jamie share many common interests and Jamie has asked to
meet Julie, who wants to meet him.

Which one of the following is Julie’s best course of action?

A Refuse to meet with him.

B Agree to meet with him but accompanied by a responsible adult.

C Agree to meet with him but accompanied by a friend.

D Agree to meet with him.

This question uses a specific situation to ask a question that involves application
skills. Any question that uses a scenario that the candidate is unfamiliar with is, in
effect, assessing application skills.


Example 24 ~ Application and analysis skills

A user is having problems reading files from a flash drive. While most files work correctly, any attempt to access a few specific files results in an operating system error message: “Cannot read file. Storage device may be corrupt.” Which one of the following is normally the best course of action in such circumstances?

A Copy the readable files from the device and do not re-use the device.

B Copy the readable files from the device, reformat it and recopy the files to the
device.

C Ignore the error and continue to use the part of the device that is usable.

D Reformat the device and re-use it.

Note that this question is an example of problem solving. Note also that there are at
least two weaknesses. The key (B) “looks” correct (it is the longest and most detailed
option); at least one of the distractors is weak (C) and uses pejorative language
(“ignore”). But it has its strengths too. The key is clearly the best answer (not always
an easy task when writing demanding questions) and it’s a challenging question
(admittedly made easier by the options). And the author didn’t resort to “None of the
above” as a final option! It is a moot point whether this item can be “fixed” or
whether it has to be discarded.

The following example uses a single scenario and a number of linked questions of
increasing demand.


Example 25 ~ Application and analysis skills

Raj and Sophie, who have never been married, have two children – Ben aged 8 and
Shazia aged 2. Raj and Sophie’s relationship has ended, and Sophie has married Carlton.
Raj has agreed that the children can live with Sophie and Carlton for the time being.

For questions 1-4, the options are:

A Raj and Sophie.

B Raj, Sophie and Carlton.

C Sophie and Carlton.

D Sophie only.

E Raj only.

1 Who has parental responsibility for the children at present?

2 If Section 8 orders are required in respect of the children, who could apply as of right (without leave) for any Section 8 order?

3 Who would be able to apply as of right (without leave) for a residence or contact
order?

4 If Raj obtained a contact order to see the children every week, who would have
parental responsibility for the children?

PASSAGE-BASED READING

A second technique to aid the writing of demanding questions is to use passage-based items. This involves presenting a passage of around 100 to 800 words and asking one or more linked questions about the passage.

The passage can be narrative, argumentative or expository in nature. The questions can ask candidates about the meaning of words in the passage (vocabulary in context); ask about significant information the passage is seeking to impart (literal comprehension); or measure candidates’ ability to analyse information as well as to evaluate the assumptions made and the techniques used by the author (extended reasoning).


Example 26 ~ Passage-based question

1  “Psychoanalysis has been criticised on a variety of grounds by Karl Popper,
2  Adolf Grünbaum, Mario Bunge, Hans Eysenck, L. Ron Hubbard and others.
3  Popper argues that it is not scientific because it is not falsifiable. Grünbaum
4  argues that it is falsifiable, and in fact turns out to be false. The other schools of
5  psychology have produced alternative methods of psychotherapy, including
6  behaviour therapy, cognitive therapy, primal therapy and person-centred
7  psychotherapy.

8  An important consequence of the wide variety of psychoanalytic theories is that
9  psychoanalysis is difficult to criticise as a whole. Many critics have attempted to
10 offer criticisms of psychoanalysis that were in fact only criticisms of specific
11 ideas present in one or more theories, rather than in all of psychoanalysis.
12 For example, it is common for critics of psychoanalysis to focus on Freud's
13 ideas, even though only a fraction of contemporary analysts still hold to Freud's
14 major theses.” (Wikipedia)

A number of linked questions could be asked about this passage. For example, a
vocabulary-in-context question could ask about the meaning of a word (or term) such
as “falsifiable” (line 3) or “cognitive therapy” (line 6); a literal comprehension
question could ask about the candidate’s understanding of this passage (such as
asking her to choose the best (one line) summary of the passage); and a number of
extended reasoning questions could be posed (such as one asking about criticisms
of Freudian psychoanalysis).

Passage-based reading can also be used to measure evaluation skills by asking candidates to judge the logical consistency of written material, the validity of experimental results, the interpretation of data, or the quality of writing.


Example 27 ~ Evaluation skills

The Fibonacci sequence of numbers can be defined by the following mathematical recurrence relation:

F(n) = F(n-1) + F(n-2), where F(0) = F(1) = 1

The following Java method (i.e. function) specifies an implementation of this recurrence relation.

public static int fibonacci(int n) {
    if (n == 0 || n == 1) {
        return 1;
    } else {
        return fibonacci(n - 1) + fibonacci(n - 2);
    }
}

Which one of the following statements best evaluates this function?

A The algorithm will produce the correct result and is efficient.

B The algorithm will produce the correct result but is inefficient.

C The algorithm will not produce the correct result.

D The algorithm will fail.


ITEM ANALYSIS

One of the major advantages of selected response questions (SRQs) is that they can
be easily analysed.

Item analysis permits a more scientific approach to assessment. If you know the
properties of each question (for example, how difficult it is or how well it separates
candidates of differing abilities) then you can construct a better test.

This section explores two classical ways of analysing items: (1) measuring their
difficulty; and (2) measuring how well they separate candidates. The next section
explains how these measures can be used to construct tests.

FACILITY VALUE

The facility value (FV) of an item is a measure of its difficulty – or, more accurately,
its “easiness”. It represents the proportion of candidates who answer the item
correctly and is expressed as a decimal fraction between zero and one.

A FV of zero means that no-one answered the question correctly; a FV of one means
that everyone answered the question correctly; and a FV of 0.6 means that 60% of
the test takers answered it correctly. The lower the FV, the more difficult the item;
the higher the FV, the easier the item (hence, it is better thought of as an “easy
index”). A very easy item might have a FV of 0.9 (meaning that 90% of candidates
are expected to answer it correctly) and a very difficult item might have a FV of 0.1
(meaning that 10% of candidates are expected to answer it correctly).

Note: In a competency-based system (such as SQA’s), the FV measures the probability of a minimally competent candidate answering the question correctly – not a typical candidate.

Facility values are best assigned during pre-testing. Once a sample group of students
has attempted the item (assuming that this sample is representative of the target
cohort), an initial FV can be assigned. If pre-testing is not possible (or, more likely,
not feasible) a predicted facility value (PFV) can be assigned by the test authors.
Predicted FVs are assigned by subject matter experts (SMEs) and represent the
“best guess” of two or more SMEs. This initial estimate can be re-calibrated once the
item is used operationally.

Note that a FV is a relative measure of an item’s difficulty – relative to the target cohort’s age and stage. For example, a simple addition question might have a low FV for Primary 2 pupils but a high FV for Primary 4 pupils.

Note that, in theory, any SRQ will have a minimum FV greater than zero. For example, any true/false question will have a minimum FV of 0.5 (which represents the 50-50 chance of guessing the answer correctly) and any MCQ (with four options) will have a minimum FV of 0.25 (no matter how difficult it is). However, in practice, some FVs will be lower than this due to the way the item has been constructed – with a plausible distractor attracting more than its fair share of candidates and the key attracting very few.

It is recommended that items with FVs greater than 0.9 are discarded (too easy);
similarly FVs lower than 0.1 should be avoided (too difficult).
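
Because the arithmetic is so simple, facility values are easy to calculate automatically. The sketch below is illustrative only, written in Java simply because that is the language that appears in Example 27; the class name, method names and the use of the 0.1 to 0.9 thresholds are assumptions made for this example rather than part of any SQA system.

// Illustrative sketch: computes an item's facility value from pre-test data
// and flags values outside the recommended 0.1 to 0.9 range.
public class FacilityValue {

    // FV = number of correct responses / number of candidates
    public static double facilityValue(int correctResponses, int totalCandidates) {
        return (double) correctResponses / totalCandidates;
    }

    // Items with FV > 0.9 are too easy; items with FV < 0.1 are too difficult.
    public static boolean isAcceptable(double fv) {
        return fv >= 0.1 && fv <= 0.9;
    }

    public static void main(String[] args) {
        double fv = facilityValue(18, 60);   // 18 of 60 candidates answered correctly
        System.out.printf("FV = %.2f, acceptable = %b%n", fv, isAcceptable(fv));
    }
}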

DISCRIMINATION INDEX

The discrimination index (DI) of an item is a measure of how well that item separates
candidates. It relates each candidate’s test score with his/her performance on a
specific item, and then compares the top candidates with the bottom candidates.

For example, if 30 candidates attempt an item, the DI compares the performance of the top third (top 10) of the candidates with the bottom third (bottom 10) of candidates (based on final test scores). If eight of the top ten answered the item correctly and two of the bottom ten answered it correctly then the item’s DI is:

DI = (8-2)/10 = 6/10 = 0.6.

DI values range from +1 (all of the top candidates answered it correctly and none of
the bottom candidates) to -1 (all of the bottom candidates answered it correctly and
none of the top candidates!); a DI of zero means that the same number of top and
bottom students answered it correctly. A positive DI is essential (which shows some
discrimination). If an item yields a zero or negative DI, discard it. The above example
illustrates good discrimination. It is recommended that an item has a DI of at least 0.2;
items with DI values of 0.4 and above are considered to have good discrimination.

Discrimination indices cannot be predicted. They must be derived through pre-testing or operational use.
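
The index can be derived automatically from pre-test data. The sketch below is illustrative only; it assumes two parallel arrays (one holding each candidate's total test score, the other recording whether that candidate answered the item correctly), a data layout invented for this example. With the 30-candidate figures above (eight of the top ten and two of the bottom ten answering correctly) it returns 0.6.

import java.util.Arrays;
import java.util.Comparator;

// Illustrative sketch: derives a discrimination index by comparing the top
// third of candidates (by total test score) with the bottom third.
public class DiscriminationIndex {

    public static double compute(int[] totalScores, boolean[] answeredItemCorrectly) {
        int n = totalScores.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Sort candidate indices from highest to lowest total test score.
        Arrays.sort(order, Comparator.comparingInt((Integer i) -> totalScores[i]).reversed());

        int groupSize = n / 3;
        int topCorrect = 0;
        int bottomCorrect = 0;
        for (int i = 0; i < groupSize; i++) {
            if (answeredItemCorrectly[order[i]]) topCorrect++;            // top third
            if (answeredItemCorrectly[order[n - 1 - i]]) bottomCorrect++; // bottom third
        }
        // DI = (correct in top group - correct in bottom group) / group size
        return (double) (topCorrect - bottomCorrect) / groupSize;
    }
}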

There is a link between a question’s facility value and its discrimination index. A
“good” question that is designed to be difficult will have a low facility value and high
discrimination. But not all questions with low FVs will have high DIs. A poorly
designed question that is difficult to answer due to lack of clarity or inappropriate
language may have a low FV and low discrimination (since few candidates can
answer it – and poor candidates are as likely to get it right as good candidates).

The following example (see over) illustrates the facility value and discrimination
index for a specific question. The item was designed to assess the mathematical
knowledge of S2 candidates. It was pre-tested on 60 candidates of whom 18
answered it correctly; 15 in the top third and three in the bottom third. This gave the
following item analysis:

FV = 0.30
DI = 0.60


Example 28 ~ Item analysis

If the radius of a circle is increased by 20%, which one of the following represents the
corresponding increase in the circle’s area?

A 40%

B 44%

C 120%

D 144%

This item is difficult. Given that blind guessing would produce a one-in-four chance of
answering it correctly (FV=0.25), the recorded FV of 0.30 (representing 30% of the
sample) is very low. It also discriminates well, meaning that it is likely to separate
candidates and aid grading.

It is worth noting that this item is slightly cued. The digits “44” appear in two options (B and D) – which might encourage some candidates to assume one of these options is correct (which would be a correct assumption – the key is B). This could have been avoided by selecting a different value for D (such as 160%).

OTHER METRICS

There is a range of other metrics that can be calculated for SRQs. Most are complex
and, unlike facility values and discrimination indices, have no “real” interpretation.
However, the distractor pattern provides useful information about which of the
options candidates choose. For example, the following distractor pattern illustrates
the choices made by 100 candidates for Example 28 (above).

Option    Frequency of selection
A         15
B         40
C         10
D         35

This distribution would suggest that distractors A and C are under-performing and need to be strengthened or replaced. It might also indicate that distractor D is too strong and may require weakening. It would appear that this question comes down to a
straight choice between options B and D for most candidates.

There isn’t a perfect distribution for the options – but options that are rarely selected
or a distractor that is more popular than the key warrant attention.
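
Checks of this kind are easy to automate. The sketch below is illustrative only (the class name, the map-based representation of the distractor pattern and the one-fifth threshold are all assumptions made for this example): it flags any distractor that proves more popular than the key and any option chosen by fewer than the given share of candidates.

import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: reviews a distractor pattern, flagging rarely selected
// options and any distractor that is more popular than the key.
public class DistractorPattern {

    public static void review(Map<Character, Integer> selections, char key,
                              int candidates, double minimumShare) {
        int keyCount = selections.getOrDefault(key, 0);
        for (Map.Entry<Character, Integer> entry : selections.entrySet()) {
            char option = entry.getKey();
            int count = entry.getValue();
            if (option != key && count > keyCount) {
                System.out.println("Distractor " + option + " is more popular than the key - review the item.");
            }
            if (count < candidates * minimumShare) {
                System.out.println("Option " + option + " is rarely selected - strengthen or replace it.");
            }
        }
    }

    public static void main(String[] args) {
        Map<Character, Integer> selections = new LinkedHashMap<>();
        selections.put('A', 15);
        selections.put('B', 40);   // B is the key
        selections.put('C', 10);
        selections.put('D', 35);
        review(selections, 'B', 100, 0.20);   // 20% threshold chosen arbitrarily for this sketch
    }
}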

Item analysis provides a means of evolving item banks by identifying under-performing (“weak”) items – and eliminating them (“survival of the fittest”). The initial calibration of items can be done formally (through field testing items prior to their use) or informally (using predicted facility values for example) and these initial values can be re-calibrated once the items are used in earnest. However, to be effective, item bank evolution (like biological evolution) needs a mechanism to identify weak items and replace these with stronger ones.


CONSTRUCTING TESTS

AUTHORING TESTS

This section looks at the process of combining questions into a test. The following
diagram illustrates the test generation procedure.

Figure 4 - Test generation procedure

TEST SPECIFICATION

The test specification is the document (or “blueprint”) that defines the precise
nature of the test. It is normally created by the Principal Assessor (or equivalent)
under advice from the SQA Officer. The test specification will include the following
information:

• description (including links with source unit(s) and outcome(s))

• question format(s)

• number of questions

• duration

• rubric (including the marking scheme)

• pass mark (including grade boundaries where applicable)

• conditions of assessment.

A sample test specification is provided in the appendices.

The description of the test must (at a minimum) define the learning objectives that
the test is seeking to measure. In the context of SQA, this would mean the unit(s)
and outcome(s) that the assessment is testing (its “domain”).

The question format defines the type of question that the test will employ. This might
be true/false, matching, multiple choice or multiple response – or a mix of these
types. For example, a test might use 15 MCQs and 5 MRQs – the test spec’ should
spell this out.

The number of questions is self-evident but note that where more than one question
type is employed, the spec’ should specify the number of each type.

The duration of the test will depend on the number of questions and the complexity of the questions. Simplistic formulas for the duration of a test (“two minutes per question”) should be avoided. Scenario questions, in particular, take time to read, assimilate and answer. The duration should be based on a typical test undertaken by a typical candidate. If in doubt, err on the side of generosity – unless speed of response is a critical aspect of the assessment.

The rubric defines the marking scheme and provides instructions to candidates.
Setters may adopt a simple marking scheme (one mark per question) or more
complex schemes (involving one, two or more marks for each item depending on its
importance or complexity). Simple marking schemes are recommended. This section
should also provide any special instructions for candidates.

The pass mark (or cutting score) is the minimum mark that candidates must gain in
order to achieve a pass in the test. There are a number of techniques for setting
pass marks, some of which are discussed later in this section. But pulling a figure
out of thin air is not one of them. And 50% is rarely a suitable cut score for an
objective test (due to the effects of guessing – see below).

If a test is graded (beyond the basic pass/fail threshold), the grade boundaries must
be defined. The grade boundaries define the marks required to gain an A or B or C
pass. For example, a C pass might require a total score between 60% and 74%, B
between 75% and 89%, and an A pass 90% or more.

Finally, the test spec’ should describe any special conditions that have not already
been described elsewhere in the specification. Examples include: access to
reference material (Is the assessment open book? Or open web?) and permitted
materials (such as calculators or special instruments).

ASSEMBLING THE TEST TEAM

The test team is responsible for constructing the test, using the test specification as
a blueprint. This team will normally consist of an SQA Officer and a number of setters
– or, in testing terminology, a test expert (the SQA Officer) and a number of subject
matter experts (the setters). The SMEs should have prior knowledge and experience
of writing SRQs. The size of the team will depend on a number of factors such as the
number of items required and the time available to write them. The more items
required and the less time available, the greater the number of SMEs needed.

Subject matter experts may need training in the construction of selected response
questions. This can be done at the authoring event (see below) or prior to this event,
at a specific training event.


AUTHORING EVENT

Due to the collaborative nature of item writing, it is recommended that questions are
produced over a short period of intensive activity rather than the more traditional
SQA approach to question setting. For example, a team of four SMEs might be asked
to produce 200 items over an intensive working weekend. A suggested workflow
during the authoring event is provided below.

Figure 5 - Authoring event workflow

Authors need to be crystal clear about the learning objectives (outcomes) that they
are to assess. Where more than one outcome is to be covered by an individual SME,
the number of questions for each outcome should be agreed. Each author’s targets
should also include the types of question and number of each type of question (for
example: “Twenty multiple choice questions and 10 multiple response questions”),
the average facility value for their set of questions (see below), and the expected
productivity rate (for example, five items per hour).

Writing items is a solitary activity. Although authors may seek advice when they write
questions, the act of putting pen to paper (or, more likely, finger to keyboard) is an
individual task. Authors should be provided with a question template before
commencing. This template (which is normally a Word document) defines the precise
format of the question and will include metadata about the item (such as the associated keywords and its predicted facility value). A sample template is provided in the appendices.

If the items are being written for a test with a known pass mark, authors will require
to know the target facility value (FV) to aim for. For example, if the writers are
producing items for a test with a pass mark of 15/20 then the target FV will be 0.75
and each author should ensure that each batch of questions has an average FV of
0.75 (so that the overall item bank has a “correct” FV).
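
A check of this kind is trivial to automate. The sketch below simply averages a batch of predicted facility values and compares the result with the target implied by the pass mark; the class name and figures are invented for illustration.

// Illustrative sketch: checks that a batch of items has an average (predicted)
// facility value close to the target implied by the pass mark.
public class BatchCheck {

    public static double averageFacilityValue(double[] facilityValues) {
        double sum = 0;
        for (double fv : facilityValues) {
            sum += fv;
        }
        return sum / facilityValues.length;
    }

    public static void main(String[] args) {
        double[] batch = {0.8, 0.7, 0.75, 0.7, 0.8};   // predicted FVs for one author's batch
        double target = 15.0 / 20.0;                   // a pass mark of 15/20 implies a target FV of 0.75
        System.out.printf("Average FV = %.2f (target %.2f)%n",
                averageFacilityValue(batch), target);
    }
}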

Authors should batch items before passing a group of questions to a designated reviewer for checking. The reviewer will then review each item and do one of three things: (1) accept it without change; (2) accept it with revisions; or (3) reject it. It is unlikely that the author and reviewer will be unable to reach a compromise about a disputed item but, where they cannot, the Principal Assessor should make the final decision. Reviewing is best done blind (i.e. without knowing the identity of the author) to prevent personality conflicts from interfering with the process. While group reviewing is a good means of training writers and reviewers, it is an inefficient way to create large numbers of items.

The output from the authoring event will be an item bank of approved and calibrated items. The SQA Officer will play a crucial role in maintaining workflow and ensuring a productive event. Target setting and regular milestones will play an important part in ensuring a successful outcome. At various points during the event, the officer should convene review meetings when progress can be measured, and problems or bottlenecks can be collectively identified and addressed.

DETERMINING TEST LENGTH

Determining the number of questions to include in a test is an important decision. The length of a test has a direct relationship with the test’s reliability – the longer the test (and, by implication, the more questions in the test), the more reliable that test will be as a measure of the candidate’s ability.

There are a number of factors that affect test length including:

• the importance of the test
• the size of the domain being assessed
• the range of knowledge and skills contained within the domain
• the time available.

A high stakes test needs to be more reliable than a low stakes test – and therefore
needs to be longer. However, the improvement in reliability levels off over a certain
number of questions.

The number of learning objectives being assessed also has a bearing on the size of
the test. A test that assesses several outcomes (or one large outcome) will obviously
require more items than one that assesses fewer outcomes (or smaller outcomes).
However, even a test that assesses a single outcome may require lots of questions if
that outcome covers a broad range of knowledge and skills.


And, finally, the time available needs to be considered. There is no point in designing
a test with 60 questions, requiring two hours to complete, if this is disruptive to
centres. For example, most Scottish schools operate a 50 minute period and tests
that last longer than this can be difficult to administer.

There is no formula for test length. Criticality, domain size and practical
considerations need to be balanced. However, in most instances of unit assessment
it is best to keep tests as short as possible to reduce the assessment burden on
centres (and candidates).

TECHNIQUES FOR SETTING PASS MARKS IN OBJECTIVE TESTS

There are a number of ways to set a pass mark. We will look at three methods:

1. informed judgement
2. Angoff method
3. contrasting groups

Some are more “scientific” than others but, no matter which method is used, none of
them replace the need for human judgement.

INFORMED JUDGEMENT

This technique involves the most human judgement and, as a consequence, is the
most subjective way of setting pass marks (it is also the method most similar to the
way that SQA sets cut-scores).

At its most basic level, informed judgement involves the opinion of the members of
the setting team. These subject matter experts (SMEs) agree a sensible pass mark
based on their expert judgement and the following considerations:

• the minimum mark achievable through guessing

• the criticality of the judgement being made about candidates

• the complexity of the subject domain

• the difficulty of the test items

• the age and stage of the candidates.

No matter how little a candidate knows, s/he is unlikely to score zero marks in an
objective test due to the effects of guessing. For example, in an objective test
consisting of 100 multiple choice items, each with four options, blind guessing
should produce a minimum mark of 25% (representing the one in four chance of
guessing the correct answer to each question). For this reason, the pass mark in an
objective test is usually higher than 50%.

The importance of the assessment also has a bearing on the pass mark. For
example, an assessment that grants a license to practice as a surgeon is more important than an assessment that confers a pass in a unit. Where it is critical that candidates possess particular competences, both the test duration (see above) and the pass mark (see below) should be increased.

If there is an existing item bank, the difficulty of the items in the bank can be used to
determine the pass mark. For example, if we know that an item bank contains
difficult questions then that would result in a lower pass mark; conversely, a simple
item bank would lead to a higher pass mark. Associated with this is the complexity of
the subject domain. For example, a test on nuclear physics might have a lower pass
mark than one on multiplication tables – although this is dependent on the age and
stage of the candidates.

In practice, the informed judgement would be based on all of these considerations – some of which may drive the pass mark up and some may push it down. For example, an undergraduate true/false test for medical students would have a significantly higher pass mark than a multiple response test for a low level unit.

The initial judgement may be refined after further consultation or pre-testing. For
example, practicing teachers may be asked for their views on the proposed
pass mark; and/or the assessment may be field-tested and the pass mark adjusted
in the light of the resulting scores.

ANGOFF METHOD

This method of determining the pass mark is less subjective than the informed
judgement approach. It involves aggregating the facility values (FVs) for each item
and estimating the pass mark based on this figure. The following example illustrates
this method.

Question     FV
1            0.8
2            0.6
3            0.6
4            0.3
5            0.4
Total        2.7
Pass mark    3/5

Table 5 - Setting pass marks using Angoff

Recall that the facility value is a measure of the probability (between 0 and 1) of minimally competent candidates answering the question correctly. For example, based on the above table, there is an 80% probability that candidates will answer
question one correctly (FV=0.8). Adding the FVs for each question, therefore,
provides an indication of the total score that a minimally competent candidate
should achieve (in this case 2.7). Subject matter experts would then either round
this value down or up using their professional judgements (in this case the aggregate
FV was rounded up). The resulting pass mark for this test is three out of five.

In practice, pass marks are defined in the test specification, and the task, therefore,
becomes one of selecting questions with FVs that aggregate to this pass mark. We
effectively reverse engineer the Angoff method. For example, if the test specification
defines a pass mark of 7/10 then the test should consist of questions whose FVs
add to seven (give or take a decimal place). This is a very simple task for a computer.
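
This selection step is easy to sketch in code. The example below is an illustration only (the class name, method and greedy strategy are assumptions, not a description of any SQA tool): it repeatedly picks the item from the bank whose facility value is closest to the average still required, so that the selected FVs aggregate roughly to the target. For a pass mark of 7/10 the call would be select(bankFacilityValues, 10, 7.0).

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch: greedily selects items whose facility values sum
// (approximately) to the pass mark defined in the test specification.
public class TestAssembler {

    public static List<Double> select(List<Double> bankFacilityValues, int count, double targetSum) {
        List<Double> pool = new ArrayList<>(bankFacilityValues);
        List<Double> selected = new ArrayList<>();
        double remaining = targetSum;
        for (int i = 0; i < count; i++) {
            // Average FV still required from the items yet to be chosen.
            double required = remaining / (count - i);
            Double best = pool.stream()
                    .min(Comparator.comparingDouble(fv -> Math.abs(fv - required)))
                    .orElseThrow(() -> new IllegalStateException("item bank exhausted"));
            pool.remove(best);
            selected.add(best);
            remaining -= best;
        }
        return selected;
    }
}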

CONTRASTING GROUPS

This method, unlike the previous ones, requires pre-testing. The test is issued to two
groups of students – one group who are expected to pass and one group who are
expected to fail. The actual scores are then plotted on a chart and the intersection of
the graphs provides an initial pass mark. This initial pass mark is then refined using
the SMEs’ expert judgement.

The graph below illustrates the results for two groups of students – one group (the
blue line) expected to fail and one group (the red line) expected to pass.

Figure 6 - Setting pass marks using contrasting groups (number of candidates plotted against marks from 0 to 100)

The initial cut score would be around 55% (the approximate intersection of the two
lines). Raising this to 60% would reduce the number of “incompetent” students who would pass the test – but increase the number of “competent” students who would
fail. Conversely, decreasing the pass mark to 50% would reduce the number of
“false fails” but increase the number of “false passes”. The final decision is based
on the professional judgement of the SMEs.
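
Given pre-test data, the intersection can also be estimated programmatically. The sketch below is a crude approximation based on an assumed data layout (two arrays holding the number of candidates achieving each mark from 0 to 100, one for the group expected to fail and one for the group expected to pass); it returns the first mark at which the pass group outnumbers the fail group, which the SMEs would then adjust using their judgement.

// Illustrative sketch: estimates an initial cut score from contrasting groups
// data as the first mark at which the "expected to pass" group outnumbers the
// "expected to fail" group.
public class ContrastingGroups {

    public static int initialCutScore(int[] expectedFailCounts, int[] expectedPassCounts) {
        for (int mark = 0; mark < expectedFailCounts.length; mark++) {
            if (expectedPassCounts[mark] > expectedFailCounts[mark]) {
                return mark;   // approximate intersection of the two distributions
            }
        }
        return expectedFailCounts.length - 1;   // no crossover found
    }
}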

These methods can be used alone or in combination. They all provide some scientific
basis to the process of setting the pass mark. The alternative – pulling a pass mark
from thin air – is not an option.


DEALING WITH GUESSING

Guessing is often cited as a major problem with selected response questions and it
is true that blind guessing can produce relatively high marks for candidates in an
objective test. For example, blind guessing in a true/false test should produce a
result of approximately 50%. However, there are well established ways of dealing
with guessing. These are: pass mark setting, negative marking and correction-for-
guessing.

SETTING AN APPROPRIATE PASS MARK

The simplest way of dealing with guessing is to adjust the pass mark accordingly.
Instead of the “traditional” 50% pass mark, the cut score can be made higher to
compensate for the effects of guessing. For example, a multiple choice test that has
a pass mark of 75% is unlikely to be passed by blind guessing. We have already seen
three ways of determining the pass mark for an objective test (informed judgement,
Angoff method and contrasting groups). Any of these methods will eliminate (or
greatly reduce) the effects of guessing.

NEGATIVE MARKING

Negative marking involves deducting marks for incorrect answers. For example, the following table illustrates a candidate’s scoring pattern in a five item test where one mark is awarded for a correct answer, zero marks where a question is not answered and one mark is deducted for an incorrect answer.

Question    Mark
1           1
2           1
3           0
4           -1
5           1
Total       2

The main problem with negative marking is that it penalises partial knowledge.
Selecting a “good” distractor is better than choosing a “bad” distractor – but both
choices will result in the loss of a mark.


CORRECTION-FOR-GUESSING

This technique involves deducting a certain number of marks from every candidate
to compensate for the effects of guessing. The number of marks deducted can be
worked out in a number of ways, ranging from the crude (a fixed number of marks
deducted from every candidate) to the more sophisticated (when the number of
marks deducted is not fixed and is based on an estimate of how many guesses each
candidate has made). An example of the second approach follows.

In a 50 item test, where each item is a multiple choice question consisting of four
options (a key and three distractors), a candidate scores 38/50. The proportion of
marks deducted is based on the number of incorrect answers (which are assumed to
be guesses) and is worked out as follows:

No. of marks to be deducted = No. of wrong answers x (1/No. of distractors)

In this case:

No. of marks deducted = 12 x 1/3 = 4 marks.

So, four marks would be deducted from this candidate giving her an adjusted score
of 34.
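
The same calculation can be expressed in a few lines of code. The sketch below is illustrative only; it simply applies the formula given above.

// Illustrative sketch: applies the correction-for-guessing formula described
// above (marks deducted = wrong answers x 1 / number of distractors).
public class GuessingCorrection {

    public static double adjustedScore(int correctAnswers, int wrongAnswers, int distractorsPerItem) {
        return correctAnswers - (double) wrongAnswers / distractorsPerItem;
    }

    public static void main(String[] args) {
        // 50-item test, four options per item (three distractors), raw score 38/50.
        System.out.println("Adjusted score = " + adjustedScore(38, 12, 3));   // prints 34.0
    }
}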

While less crude than negative marking, this method suffers from similar problems –
it penalises partial knowledge as much as no knowledge, and disproportionately
affects low risk takers who will choose not to attempt a question rather than answer
it for fear of losing marks, resulting in many unanswered questions – and deflated
marks.

SEQUENCING QUESTIONS

When deciding the order of items in a test, bear in mind that tests should begin with relatively simple questions and progress to more complex questions. It is also advisable to group item types together – for example, all true/false items and all MCQs. So, in most cases, it is advisable to begin with straight-forward, low difficulty true/false questions and progress to more complex, higher difficulty MRQ or assertion/reason items.


APPENDIX 1 – SAMPLE TEST SPECIFICATION

Source unit
Provide details about unit, outcomes and performance criteria that the test is assessing.

Title: Internet Safety     Ref. No.: 10 1234     SCQF level: 4

Outcome(s) and Performance Criteria:

Outcome    Performance criteria    No. of questions
1          All                     9
2          d, e                    9
3          a, b, c                 7

Test details
Provide details about the test.

No. of questions: 25     Duration: 50 min.

Question format(s):

Type    Number    Additional info.
MCQ     20        4 options for each question
MRQ     5         4 options for each question

Selection of questions
Explain selection criteria for questions.
There must be a fixed number of questions for each outcome. See distribution above. The question types (MCQ and MRQ) can be distributed between outcomes as desired.

Pass mark(s)
Including grading thresholds where applicable.
16/25

Rubric
Marking instructions, instructions to candidates, assessment conditions etc.

Marking instructions: One mark per question.

Assessment conditions (such as reference materials, location, authentication):
No access to reference material (paper or web). Candidate authentication is required.

Instruction to candidates: No special instructions.

Author: Bobby Elliott     Date: 12 May 2006


APPENDIX 2 – SAMPLE TEMPLATE FOR MCQS

Item

    Stem:

    Options:
        Key:
        Distractors:

Metadata

    Outcome:
    PC(s):
    PFV:
    Tags:

Workflow

    Writing       Writer:          Date:          Time:
    Reviewing     Reviewer:        Date:          Time:
    Banking       Banker:          Date:          Time:


APPENDIX 3 – CHECKLIST FOR MULTIPLE CHOICE QUESTIONS

Test ID:          Item ID:          Reviewer:

ITEM
The question relates to learning outcome(s) and performance criteria.
The level of language is appropriate to the target candidates.
The question is set at an appropriate level of difficulty.
There is one unambiguously correct answer.
Cueing is avoided.

STEM
The stem is phrased as a question.
Unnecessary information is not included.
Necessary standards are specified.
Negative wording is avoided.
Personal pronouns (“you”, “we”, etc.) are avoided.
Subjective wording is not used e.g. “What do you think…”.

OPTIONS
Options are sequenced in a definite order.
Options are of similar length.
Options are mutually exclusive.
The key is not distinctive in terms of length, wording etc.
Distractors are incorrect in every context (unless a specific context is given).
Definitive wording (“never”, “always”, etc.) is not used.
Pejorative wording is avoided e.g. “bad”, “little”, etc.
“None of the above” is used sparingly.
“All of the above” is not used.

COMMENTS
