
Social implications and ethics of testing


Dragos Iliescu, Dan Ispas and Michael Harris

Table of contents
Social implications and ethics of testing
Abstract
Introduction. Ethics as reflection and personal choice
What is ethical? Right and wrong beyond the law
What makes a test ethical?
(a). Adherence of the testing process to the general principles of the scientific method
(b). Characteristics of the interaction process between the testing professional and the other stakeholders involved in the testing process
Informed consent
Test security and user qualifications
References
Suggested hyperlinks to web pages
Questions for classroom discussion

Uploaded June 2009

Abstract
This paper aims to increase testing professionals' awareness of ethical issues. Ethics is treated as a meta-category, defining right and wrong beyond law, morality or religion. International guidelines addressing ethics in the field of psychological testing are discussed. We then discuss what makes a test and the testing process ethical: adherence to the general principles of the scientific method (objectivity, reliability and validity) and certain characteristics of the interaction process between the testing professional and the other stakeholders involved in the testing process (fairness, procedural justice, sharing and communication of results, informed consent, test security and user qualifications).

Introduction. Ethics as reflection and personal choice


A discussion of the ethics of testing should probably begin with a rigorous definition of what ethics is. But ethics is difficult to pin down, and most theorists in the field avoid giving clear definitions, preferring instead to build the case for ethics through examples. We should state from the outset that ethics describes criteria for assessing the appropriateness of behaviors, be they actions, decisions, or intellectual stances.

Ethics is, from a scientific point of view, a branch of philosophy. Philosophers distinguish between normative ethics, which prescribes what people should believe to be right and wrong, and applied ethics, which focuses on the examination of specific real-life situations. As such, applied ethics is not exclusively the turf of philosophers, but also of the practitioners who are confronted with specific real-life issues (Barnhart, 2002).

In a very broad sense, ethics refers to the principles of right and wrong conduct. Ethical standards are prescriptions about what humans in general (or, in our case, professionals) ought to do, usually worded in terms of rights, obligations, or benefits to society or a greater good. As such, ethics has very loose boundaries with other domains, such as morality, religion and law. We will address some of these differences in a later section, building the case for a definition of ethics as a significantly broader intellectual endeavor than the common conception of analyzing right and wrong.

In our understanding, ethical judgments are action-guiding without being merely prescriptive. Of course, ethics is normative and prescriptive: it is concerned with how we ought to act and what results we ought to try to bring about. Still, while ethics is normative, stating how we ought to act and what results we ought to aim for in our actions, ethics is more than a set of principles. It is a matter of course that ethical behavior has to be based on sound principles and values. However, ethics is not an automatic comparison of a real-life situation to a set of norms; it also requires active intellectual processing. Automatic normative judgment is impossible because real-life situations most often require the practitioner to react to complicated issues that bear on multiple and conflicting values or ethical principles, thus defining ethical dilemmas. It is therefore appropriate to state that ethics is the study of what happens when there are no simple answers to a situation.

Ethics also requires ethical thinking, i.e. ethical reflection. Ethical reflection is based on the perception of ethics, as well as on ethical judgment (Roberts & Wood, 2007). In order to engage in ethical reflection, a practitioner should be able to perceive and identify a dilemma as a situation which involves ethics in some way. He or she should be able to apply one or more ethical principles to this situation, should consider alternatives, and should come to a personal decision about how he or she will behave. Thus, being ethical is something one does by one's own choice, beyond legal or moral prescriptions.

Defining ethics on the basis of reflection and personal choice has a bearing on the course of this paper. We argue that practitioners should be prepared to judge professional situations and issues critically and creatively from an ethical point of view. In order to facilitate such behavior, a normative stance in this paper would be of little if any help. It is not our intention to cover all the possible combinations of conflicting principles that define dilemmas requiring ethical reflection. Instead, we hope to provide practitioners in testing with heightened awareness of what we consider to be the relevant ethical categories in the field of psychological testing, thereby facilitating ethical reflection when real-life situations bear on these issues.

What is ethical? Right and wrong beyond the law


Developing competencies for ethical reasoning is important for testing practitioners, because ethics also has a bearing on the social responsibility of testing professionals. Ethics is about relationships, about the place of a service to society or, in a broader sense, about the place of a profession in society.

Ethics is at the core of virtually every discipline or profession, but it is considered of high importance in relatively few. A statement of ethics is thus a statement of the social responsibility of a profession and, at the same time, a statement of the personal responsibility of those who practice that profession (Oakland, 2005).

Ethics in testing or, in a broader sense, in psychological work, usually provides explicit norms of correct behavior. Most national associations of psychology have explicit ethical standards by which their members abide. Leach & Oakland (2007) discussed 31 ethics codes impacting the practice of psychology in 35 countries and found them to be important benchmarks for professional competence. However, Pope & Vasquez (2007) urge that awareness of existing ethics codes and formal standards, while crucial to professional competence, is not a substitute for an active and deliberative approach to fulfilling the ethical responsibilities required by professional practice. Ethics is thus more than blind and indiscriminate adherence to a standard or law. Ethics moves the discussion towards right and wrong beyond the law, and ethical judgment may and should be exercised apart from the law and even in the absence of a law. For example, ethical judgment enables psychologists and test users to be active in countries without formal standards in these domains (Leach & Oakland, 2009).

The statement that ethical judgment goes beyond the law addresses the basis for that specific judgment, which is not so much the law as the moral principle underlying it. Furthermore, we underline the fact that moral principles are not enforceable. While a law, or even an ethics code adopted by a national association, may and will be enforced and violations will and should be sanctioned in some manner, moral standards lie beyond the capacity of any organization to enforce them.

Most countries around the globe have laws regarding the activity of psychologists in general. However, there are very few laws concerned with tests and testing. As ever more psychologists work cross-nationally and as psychology becomes ever more internationalized, it is important for international organizations with an interest in tests and testing to assume leadership and to generate a normative body that is valid from an international point of view. So far, psychologists, educators and other professionals active in testing have turned to the implicit moral standards which are the foundation for laws and ethics codes. Until a relevant international association provides a comprehensive set of rules, in the form of a standard or guideline targeted directly at tests and testing, the main body of principles governing ethical reasoning related to tests will remain implicit and related to our deepest beliefs as human beings and as psychologists.

Some formal documents, the result of pioneering work and adherence to high standards of professional practice, have proven to be very influential with respect to the ethics of testing. We should mention in this respect the continuous work done by the APA, reflected in its latest form in the Standards for Educational and Psychological Testing (AERA, APA, NCME, 1999), a voluminous document of over 100 pages, discussing more than 250 principles of ethical testing grouped in 15 categories.
Also, we should mention the Principles for the Validation and Use of Personnel Selection Procedures, published by the Society for Industrial and Organizational Psychology (SIOP, 2003), which has proven to be a landmark for I/O testing around the globe. However, in spite of their wide reach and pioneering foresight, these documents and others like them remain rooted in local values, customs and practices and do not command adherence from psychologists around the world.

An important step forward was the exceptional leadership assumed by the International Test Commission, visible in a set of three standards related to test use (ITC Guidelines on Test Use; Bartram, 2000), test adaptation (ITC Guidelines on Adapting Tests; Hambleton, 1994; Oakland, 2005) and computer-based and internet-delivered testing (ITC International Guidelines on Computer-Based and Internet-Delivered Testing; Bartram & Coyne, 2005). Several regional codes of ethics also have provisions relevant for testing. Examples include the common code of the five Nordic countries of Denmark, Finland, Iceland, Norway, and Sweden (Nordic Psychologists' Associations, 1998), the brief declaration of ethical principles of the four South American countries of Argentina, Brazil, Paraguay, and Uruguay (Ethical Principles Framework for the Professional Practice of Psychology in the Mercosur and Associated Countries; Ferrero, 2006) and the EFPA (European Federation of Psychologists' Associations) Meta-Code of Ethics, which was approved in 1995 and revised in 2005 (Lindsay, Koene, Øvreeide, & Lang, 2008). However, none of these codes and guidelines has the international coverage needed for universal acceptance.

We feel that one document is especially relevant in this situation. The Universal Declaration of Human Rights is founded on universal human values and enumerates universal human rights. This international document has a strong moral basis and has become embedded in the laws and ethics codes of a large number of countries and professions. Building on this tradition, the Universal Declaration of Ethical Principles for Psychologists was adopted by the International Union of Psychological Science (IUPsyS) and the International Association of Applied Psychology (IAAP) in 2008 (Gauthier, 2008). The Universal Declaration is the first document approved internationally, by relevant organizations, that states a set of general principles for the profession of psychology. It consists of a preamble followed by four principles, each elaborated through five to seven underlying values. The four broad principles are: Respect for the Dignity of Persons and Peoples; Competent Caring for the Well-Being of Persons and Peoples; Integrity; and Professional and Scientific Responsibilities to Society. The objective of the Universal Declaration is not to provide a cross-frontier code of ethics, but "to provide a moral framework and generic set of ethical principles for psychology organizations worldwide" (p. 1). In doing so, the Universal Declaration builds upon ethical principles that are based on shared human values, describing principles and values that are "general and aspirational rather than specific and prescriptive" (p. 1).
In view of this situation, where there are few if any laws regarding tests and none are binding internationally, where judgment is moral rather than legal, and where implicit and universal moral values are captured in relevant international documents, we believe that for a test or testing procedure to be considered ethical, in a broad sense, it should abide by the principles of the Universal Declaration of Ethical Principles for Psychologists.

What makes a test ethical?


Being ethical is a characteristic of a behavior and not of a product of behavior. As such, it is inappropriate to discuss a test itself as ethical or unethical. Instead, this attribute may only be applied to the way a test is used. Even a test with the most peculiar characteristics, for example a test which does not cover the intended domain correctly or which has too large an error of measurement, could be used in an ethical manner. For example, a new test which still lacks extensive proof of validity may be used in an ethical manner if the test taker is informed in advance of its experimental nature and if the test is only used for low-stakes decisions, as an icebreaker, or in conjunction with other assessments. Therefore, the discussion should be about the ethics of testing or, in a broader sense, the ethics of psychological or educational assessment.

Another important point is that a practice or procedure is only ethical or unethical if defined as such by the code of ethics a professional abides by (Leach & Oakland, 2007; Øvreeide, 2008). This specification is important in the context of the internationalization and globalization of psychological practice, in which we are often tempted to force our own values and our own ethical points of view upon practitioners from other areas of the world, where those values do not apply in the same manner. As such, a specific behavior may be labeled as unethical by a certain code of ethics and still be ethical within the limits of another one.

Ethics codes are an expression of underlying values. We would all like to believe in universal values, and subsequently in universal ethical principles, but so far it has been difficult to sum up ethical principles in universal, cross-national codes. Even though, as noted, several cross-national initiatives have been attempted and have had some impact in the international scientific community, it is the national laws and the codes of ethics of national organizations that most test users abide by. It would be wishful thinking to assume that we won't find disagreements between codes of ethics stemming from different sources. We should not conclude, however, that one cannot act ethically unless at least one code of ethics supports that specific behavior. Values, or reified constructions of what is good or desirable, shape our evaluative reasoning regarding ethics as much as formal documents, such as codes of ethics or laws. Still, a behavior should only be deemed ethical or unethical with regard to the specific norm the psychologist adheres to, and not based on our own construction of what is correct.

As a result, it is virtually impossible to discuss in this paper how ethical or unethical a certain behavior or procedure is, as the legitimate question would arise: based on what code? Especially in a document such as this, published under the patronage of the International Test Commission, the need to cover cross-national practice leads us to refer continuously to the category of good practice. We will discuss two main classes of characteristics which should be considered when discussing how good the practice of a testing procedure is: (a) adherence of the testing process to the general principles of the scientific method and (b) characteristics of the interaction process between the testing professional and the other stakeholders involved in the testing process.

(a). Adherence of the testing process to the general principles of the scientific method

Testing is conducted following the scientific method. Broadly speaking, testing is carried out with the explicit purpose of generating scientific data for decision makers. Because testing uses the scientific method, professionals in this area, be they psychologists, educators, or other professionals, are called upon to apply the method to the best of their abilities. According to the generally accepted principles of the scientific method, this means that testing should be used to generate the needed information in an objective, reliable and valid manner. Any usage of a test that does not adhere to this principle cannot be labeled as good practice.

Objectivity

Objectivity is the main principle of the scientific method. As an expression of the scientific method, psychological and educational testing should be as objective as possible. In testing, objectivity refers to "inter-user consistency in the execution, scoring and interpretation of standardized assessment procedures" (Westhoff & Kluck, 2008, p. 68). Of course, complete objectivity is never possible, and certain facts accepted at a given moment in time as correct, true or objective in a scientific sense are often overthrown by new scientific discoveries. This is a characteristic of scientific reasoning, which is consensual by nature and reflects the shared understanding of the scientific community at a certain point in time, rather than truth in an absolute sense.

The principle of objectivity translates into the domain of tests and testing in three ways.

First, we will consider a test as objective if the procedures for administration, scoring and interpretation are standardized and constant across time, users and test takers (Kline, 1993). All test users should administer, score and interpret the test in the same way, and all test takers, irrespective of their characteristics or of the moment in time the test is administered to them, should have the same opportunity to perform. While administration and scoring may easily be standardized so as to be considered objective, interpretation always calls for the professional judgment of the testing professional and as such inherently brings subjectivity into the equation. The need to interpret test data in an objective manner is acknowledged by one of the values of the second principle of the Universal Declaration of Ethical Principles for Psychologists: "(f) self-knowledge regarding how their own values, attitudes, experiences, and social contexts influence their actions, interpretations, choices, and recommendations".

Second, objectivity refers to measurement without error (Anastasi, 1997). Scientific knowledge develops constantly in its attempt to minimize error in the human understanding of phenomena. Thus, the principle of objectivity translates into ethics as a dedication to the latest state of scientific knowledge. This is reflected in one of the values of the second principle of the Universal Declaration of Ethical Principles for Psychologists: "(e) developing and maintaining competence". Specifically, for test users this prescribes the need to critically analyze the construction, administration, scoring and interpretation of tests, in order to ascertain whether they are in accord with the latest scientific developments in the field. The state of scientific knowledge has advanced radically in recent years in the area of test construction. New technologies have emerged, and it is nowadays often inappropriate to use a test in whose construction these new technologies have not been employed. We now have new ways of approaching the design of a test, as well as sophisticated statistical procedures, such as Structural Equation Modeling, Item Response Theory and Differential Item Functioning analysis. These and other procedures are an assurance a test author may give to the users of his/her test, as they help demonstrate that the test is up to the latest scientific knowledge. We may consider a procedure, be it newly developed or older, as not representing good practice to the extent to which it cannot live up to the criteria imposed by current scientific understanding. Subsequently, good practice in test usage will orient the preference of test users:
- towards newer procedures;
- towards using well-documented procedures;
- towards procedures (be they old or new) which have been shown, by empirical evidence, to live up to the latest developments in science;
- towards using tests with new or updated norms.

Third, objectivity is not simply a characteristic of a test, but also of the situation. A test may be objective for a given situation or for a given population and not for another, a phenomenon called differential functioning. Ethical test usage calls for an evaluation of the behavior of a test across situations and across populations. Sometimes tests behave differently in different contexts or for different populations, and there should be empirical evidence or reasonable theoretical backing in order to state clearly what the psychometric features of a test are for the target population in which it is used. Good practice (i.e. ethical behavior) in this respect will never simply assume that a test performs well for a specific population or in a specific situation, but will rather ask for scientific evidence that this is indeed so.
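
One way to gather such evidence is a differential item functioning (DIF) screen. The sketch below is only an illustration, not part of the original guidelines: it computes the Mantel-Haenszel common odds ratio for a single dichotomously scored item, stratifying test takers on their total score; values far from 1.0 flag possible uniform DIF between a reference and a focal group. Function and variable names are ours.

```python
import numpy as np

def mantel_haenszel_dif(item_correct, total_score, group):
    """Mantel-Haenszel common odds ratio for one dichotomous item.

    item_correct : 0/1 array of responses to the studied item
    total_score  : integer array, the matching criterion (e.g. total test score)
    group        : 0/1 array, 0 = reference group, 1 = focal group

    Returns the MH odds ratio; values near 1.0 suggest no uniform DIF.
    """
    item_correct = np.asarray(item_correct)
    total_score = np.asarray(total_score)
    group = np.asarray(group)

    num, den = 0.0, 0.0
    for k in np.unique(total_score):            # stratify on the matching score
        s = total_score == k
        a = np.sum((group[s] == 0) & (item_correct[s] == 1))  # reference, correct
        b = np.sum((group[s] == 0) & (item_correct[s] == 0))  # reference, incorrect
        c = np.sum((group[s] == 1) & (item_correct[s] == 1))  # focal, correct
        d = np.sum((group[s] == 1) & (item_correct[s] == 0))  # focal, incorrect
        n = a + b + c + d
        if n == 0:
            continue
        num += a * d / n
        den += b * c / n
    return num / den if den > 0 else np.nan
```
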

Reliability

Reliability is important for the topic of ethical testing because it describes the error associated with the measurement. Decisions based on test scores should only be taken with careful consideration of the error associated with the measurement of those scores. While recent publications have started a modern debate on reliability (e.g. Thompson & Vacha-Haase, 2000; Dimitrov, 2002; Fan & Thompson, 2001), we will address this construct here as outlined by AERA, APA & NCME (1999): "Reliability refers to the consistency of [...] measurements when the testing procedure is repeated on a population of individuals or groups" (p. 25). Reliability poses at least two ethical questions, related to the dichotomy of high- vs. low-stakes decisions and to the different types of reliability.

High vs. low-stakes decisions. The first question related to reliability is a fundamental one: how much should we rely on the results of the test? The discussion is not one of relying or not relying, but of how much to rely. The degree of reliance on the test result describes the limits of its ethical usage. If we may not rely heavily on test results, then they are useless as a basis for decisions. Reaching decisions, or feeding decision makers information, based on unreliable data is unethical behavior. Naturally, as our discussion underlines the degree of reliance, the following question arises: how much is acceptable? Where should we draw the line and view a specific test result, based on its low reliability, as being unethically used in a decision?

Scientific consensus sets certain limits on reliability and describes the types of decision that may be reached within a certain span of reliability. While "a satisfactory level of reliability depends on how the measure is being used" (Nunnally & Bernstein, 1994, p. 264), the generally recommended limits are .70 and .90 (Nunnally & Bernstein, 1994, p. 265). High-stakes decisions should never be reached if the reliability of the procedure used as a basis for decision-making falls below the .90 level. Low-stakes decisions may be reached with scores whose reliability is below .90, but not below the .70 level. High vs. low stakes refers not only to the impact of the decision, but also to its scope: highly reliable tests are needed to sort individuals into many different categories based upon relatively small individual differences (e.g. intelligence), while less reliable tests are sufficient if they are used to sort people into a smaller number of groups, based on rough individual differences. Procedures with a reliability below the .70 level should be used only with the utmost care.

The segmentation of decisions into high vs. low stakes again places emphasis on the situational aspects of ethical behavior: it is only unethical to use a test for a decision for which it is not qualified by its reliability. Still, in spite of these rather clear guidelines, some situations require professional ethical judgment on behalf of the professional. In many settings, for example in some I/O settings, tests represent pass/no-pass hurdles. In these situations, the discussion around reliability gains a supplementary significance, and the professional using the test should probably include in his/her decision not only the criterion of reliability, but also the validity of the test and the possible costs for the company of selecting false positives. Also, the test user could inform decision makers of this dilemma and build together with them an ethical approach to the business case. The basic rule of this approach should be the principle that the usage in itself of a sub-optimal test is not unethical, if the limitations are known and acknowledged in advance by decision makers.
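
As an illustration only (not part of any standard), the following sketch turns the conventional .70/.90 benchmarks discussed above into an explicit check a test user might run before committing a score to a decision; the thresholds are rules of thumb and the function name is ours.

```python
def decision_types_supported(reliability: float) -> str:
    """Map a reliability coefficient onto the kinds of decisions it can
    reasonably support, following the .70/.90 conventions cited in the text
    (Nunnally & Bernstein, 1994). These cut-offs are guidance, not law."""
    if reliability >= 0.90:
        return "high-stakes, fine-grained individual decisions"
    if reliability >= 0.70:
        return "low-stakes decisions or coarse group classifications only"
    return "utmost care required; not a sound sole basis for decisions"

print(decision_types_supported(0.84))  # -> low-stakes decisions or coarse group classifications only
```
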

Standard Error of Measurement. The reason for the high emphasis placed on reliability as a psychometric feature of a test is not reliability in itself, but the ability of this characteristic to predict the distribution of the errors associated with the measurement for the specific test (Dudek, 1979). The concept used in this respect is the Standard Error of Measurement (SEM). The SEM estimates how repeated measurements of the same person on the same test are distributed around his or her true score. The true score cannot be measured directly, because it is impossible to construct a test which is completely error-free. From a statistical point of view, the SEM is defined as "the standard deviation of errors of measurement that are associated with test scores from a particular group of examinees" (Harvill, 1991, p. 1). From a logical point of view, the SEM is directly related to the reliability of a test: the more reliable a test is, the smaller its SEM, i.e. the less error and the more precision is associated with the measurement.

The acknowledgment of imperfect measurement and of the existence of the SEM poses a difficult problem for test users: the impossibility of looking upon an obtained test score as a true score. Instead, a measurement provides the test user with a range, not with a point score: the range in which the true score is placed, with a certain probability. The obligation to operate with ranges of scores, and not with scores, brings the discussion into the field of ethics. Most often, decision makers need clear and unequivocal information on which to base their decisions. Operating with ranges of scores makes, for example, the comparison of two scores difficult, especially when they are close to one another. Differentiation between scores stemming from different test takers is thus jeopardized. But failure on behalf of the testing specialist to recognize the need to operate based on test scores and the SEM will lead to unwarranted decisions. Good practice will take the Standard Error of Measurement into account when communicating test scores or when reaching decisions.

The problem is further complicated by the fact that decision makers are usually not comfortable with using interval scores. Most are not trained in understanding confidence intervals and will discard this information, focusing on the obtained score. Ethical behavior and good practice on behalf of the testing specialist will take this possibility into account. The specialist will take special precautions when reporting test data, will include the SEM in reports in such a way as to make it impossible for this information to be discarded by decision makers not trained in psychometric theory, and will warn decision makers of the problems associated with the use of obtained scores.
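
As a minimal illustration (ours, not the authors'), the classical-test-theory formula SEM = SD * sqrt(1 - r_xx) can be used to turn an obtained score into the kind of score band discussed above. The IQ-style metric below is chosen purely for the example, and some authors would centre the band on the estimated true score rather than on the obtained score.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1.0 - reliability)

def score_interval(observed, sd, reliability, z=1.96):
    """Approximate 95% confidence band around an observed score, built from the
    SEM (a simplification: the band is centred on the observed score)."""
    e = z * sem(sd, reliability)
    return observed - e, observed + e

# Example: a scale with SD = 15 and reliability .90
low, high = score_interval(observed=108, sd=15, reliability=0.90)
print(f"SEM = {sem(15, 0.90):.1f}; report 108 as roughly [{low:.0f}, {high:.0f}]")
```
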
What type of reliability? The second ethical question posed by reliability addresses the type of reliability employed. The reliability of scores on a test is assessed through two fundamentally different approaches. One approach considers a test more reliable if there is a higher correspondence between different parts of the test. This is called internal consistency and is measured through such procedures as Cronbach's coefficient alpha, split-half correlation, or the correlation of parallel forms of the test. The other approach states that a test is more reliable if there is a higher correspondence between the results of the test obtained at different moments in time. This is called test-retest reliability and is the correlation between two administrations of the test.

This problem is subtle insofar as many test specialists use the two types of reliability interchangeably, even though they are clearly not interchangeable. Tests are always employed in order to give answers to specific questions. The type of reliability that should be taken into account for a specific question is dictated by the theory behind the analyzed construct. If the theory addresses a concept which should be relatively stable in time, then test-retest reliability should be measured. If the theory states that the parts of the test (i.e. items, subtests) should measure the concept in a similar way, internal consistency should be measured. The underlying theory thus tells us what kind of reliability should be measured, and some theories require the use of both types. Reliability is relevant to ethics because failure to consider the correct type of reliability will most probably stem from a lack of documentation of the intended application of the test. The test in itself will not be better or worse because of this, but the conclusions and decisions will possibly be based on reasoning which is not consistent with the intended use of the test, as prescribed by the underlying theory. Therefore, good practice will take the correct type of reliability into account, adapted to the testing situation and the target construct.
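
To make the two approaches concrete, here is a short sketch (ours, for illustration only) computing Cronbach's alpha from an item-score matrix and test-retest reliability as the correlation between two administrations; the data shapes and names are assumptions.

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Internal consistency: alpha = k/(k-1) * (1 - sum(item variances) / var(total)).
    `item_scores` is an (n_persons x k_items) array of item scores."""
    x = np.asarray(item_scores, dtype=float)
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_vars / total_var)

def test_retest(scores_time1, scores_time2):
    """Test-retest reliability: Pearson correlation between two administrations
    of the same test to the same persons."""
    return np.corrcoef(scores_time1, scores_time2)[0, 1]
```
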

Validity

The current scientific understanding defines validity as a complex and integrated corpus of scientific knowledge and demonstrations, which examines the psychological variables measured by a test. Validity thus refers to "the degree to which evidence and theory support the interpretation of test scores" (AERA, APA, NCME, 1999, p. 9). The knowledge and the demonstrations are rarely collected in a single place and in a coherent manner; most often they are presented in various formats, in various places. Examining the validity of a test requires an active search and an attentive examination of the pieces of knowledge related to the test. Validity is "the most fundamental consideration in developing and evaluating tests" (AERA, APA, NCME, 1999, p. 9). Validity tells us what a test measures and allows us to interpret the results of the test, to formulate descriptive conclusions and predictions based on test scores.

The first way validity is related to ethics is through the way a test is selected for usage. Test users have an obligation to use only tests which have been sufficiently validated for the intended purpose of the testing and the intended target population.

There are no tests which are valid for all situations and all populations. Validity is very much a situational aspect. As noted in Anastasi (1997), validity has a bearing not on the test itself, but on the interpretations of the test. Interpretations are very situational and integrate, along with test scores, the objective of the assessment and personal, cultural and situational variables. Tests should only be used if there is enough evidence supporting the benefits of using the test in relation to these other variables.

Second, test users should only base the opinions and recommendations they make in assessment reports on data which offer sufficient validity to support those opinions. This has a bearing on the interpretability of test scores: aside from the obvious descriptive step, test users often tend to make predictive judgments. They use test scores to predict future behaviors of the test taker. These predictions should be based not only on logical or theoretical assumptions, but also on empirical evidence (such as predictive validation studies). Most of the time, this also means that the test by itself is not sufficient to warrant a valid decision. Using test data for decisions is partially independent of the test: the validity of decisions based on test data is different from the validity of the test. Valid decisions require the test user to take into account other relevant sources of information and to integrate these with the test scores. Especially in high-stakes contexts (AERA, 2000), we will consider it good practice when test data are integrated with data from other sources in order to reach valid decisions. Still, as ethical judgment focuses on the result and not on the automatic application of any principle, the integration of test data with data from other sources should be considered carefully. There are times when data derived from other sources are less valid than the data resulting from tests. In the integration of data, proper consideration should thus be given to the weighting of data.
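
One simple way to make such weighting explicit, offered here only as an illustration under the assumption that a linear composite is acceptable for the decision at hand (regression-derived or purely judgmental weights are equally common), is to standardize each source and combine it with documented weights:

```python
import numpy as np

def weighted_composite(sources, weights):
    """Combine several assessment sources (e.g. test score, structured interview,
    work sample) into one decision score with explicit, documented weights.
    `sources` is a list of equal-length score arrays; weights are normalized."""
    z_scores = [(np.asarray(s, dtype=float) - np.mean(s)) / np.std(s, ddof=1)
                for s in sources]                  # put every source on the same scale
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                # make the weighting transparent
    return sum(wi * zi for wi, zi in zip(w, z_scores))

# Hypothetical example: a cognitive test weighted 0.5, interview 0.3, work sample 0.2
composite = weighted_composite(
    [np.array([102, 95, 110, 99]),    # test scores
     np.array([3.5, 4.0, 2.5, 3.0]),  # interview ratings
     np.array([70, 85, 60, 75])],     # work-sample scores
    weights=[0.5, 0.3, 0.2])
print(np.round(composite, 2))
```
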

Conclusions on the adherence of the testing process to the general principles of the scientific method

There are no perfect tests. Objectivity, reliability and validity are not switches with only on or off states; there are many shades in between. The psychometric characteristics of tests are, like many other characteristics, distributed across the population of tests. It is not unethical to use a test which is placed at the average, or even below the average, of this distribution, if the test user understands the limitations of the respective test, if the usage is done with a clear and complete understanding of the dangers, if the caveats are accepted and taken into account by the specialist using the test, and if these drawbacks are communicated in a transparent way to the client. After all, a test is always employed to answer a specific question, and one of the main criteria for choosing a specific test in a specific situation is the cost-benefit ratio for the particular client's question. It is, however, bad practice for a test user to ignore the need to document the procedure, as it is to ignore shortcomings, or to use tests with shortcomings without communicating those to his/her client. Professional decisions and circumstances often allow for the usage of a test which is less than perfect for the intended purpose. This in itself is not unethical behavior, on the condition that the testing specialist understands the caveats and explains the drawbacks and cautions to his/her client.

Virtually all the principles and all the characteristics of ethical behavior or good practice discussed here are a mix of common sense and a high level of professional judgment. In order to follow ethical guidelines, a testing specialist not only has to be aware of ethical practices in the respective area, but also has to have a high level of professional understanding, in order to be able to evaluate the technical implications of his/her decisions.

(b). Characteristics of the interaction process between the testing professional and the other stakeholders involved in the testing process

Testing is an interactional process between the testing professional, the test taker and the client of the testing process. The client of the test user (i.e. the decision maker) is sometimes the tested person himself or herself. At other times, however, as in the case of testing in the field of I/O, forensic, educational or clinical psychology, the test taker is different from the decision maker. The testing professional has ethical responsibilities towards both categories of stakeholders.

Fairness

From this interactional point of view, the concepts of ethics and fairness are often interchangeable. Even though fairness is much closer to everyday language, and thus closer to the test taker, the concept of fairness is used in many different ways. For example, even though it discusses four different meanings of fairness in testing (fairness as lack of bias, fairness as equitable treatment in the testing process, fairness as equality in outcomes of testing, and fairness as an opportunity to learn), the Standards for Educational and Psychological Testing (AERA, APA, NCME, 1999, p. 80) state that consensus on what is and what is not fair has not been achieved in the professional community, and even less so in the larger society. Ultimately, fairness is a mental construction of the test taker and the relevant community, and as such it is subject to many influences. Like all other mental constructions of the test taker, the construction of fairness may be influenced by the test user through the way he/she communicates about the test itself and the testing procedure.

Procedural justice

Even when testing is perceived by the test taker as objective, reliable and valid (and thus scientifically correct), the perception of the test taker regarding the control he/she has over the outcome of the test may vary widely. The assumption that outcomes drive the evaluation of a certain event is pervasive in the social sciences (Lind & Tyler, 1988); according to this assumption, people judge their social experiences in terms of the outcomes they receive. Attitudes towards tests and testing could thus be explained by these outcome-based judgments (Ambrose & Rosse, 2003). Contrary to this belief, process-based models assume that the psychological construction (perception) of an event is driven not only by the outcome, but also by the process itself. The main postulate of these models is that people care not only about the allocations, but also about how the allocations are made.

Process-based models are relatively new (Thibaut & Walker, 1975), and there is some tension between outcome-based and process-based models (Lind & Tyler, 1988). The concept of procedural justice is central to research conducted in the relatively new process-based tradition. This tradition differentiates between objective and subjective procedural justice. Objective procedural justice focuses on whether a procedure conforms to normative standards of justice (Kaplan, 1986; Kassin & Wrightsman, 1985). Objective procedural justice is enhanced by reducing clearly unacceptable bias or prejudice, and psychological testing has long moved away from clearly unacceptable practices. In order to maximize objective procedural justice we should focus on what normative documents (such as the Standards for Educational and Psychological Testing) describe as the rights of test takers, among them the right to have the procedure explained, the right to be tested in one's own language, etc. Subjective procedural justice concerns the capacity of a procedure to enhance the fairness judgments of those who encounter it. This is the area we will focus on in the following discussion.

Generally, procedural justice concerns people's judgments that procedures and social processes are just and fair (Lind & Tyler, 1988). Specifically, in psychological and educational testing, procedural justice concerns the judgments of test takers regarding how just and fair a certain procedure they are subjected to is (Konovsky, 2000). The questions which arise in this context are multiple and apply to the whole process of testing: how a test is chosen, how a test is administered, how the test is scored and interpreted. Subjective procedural justice is enhanced by promoting communication and transparency in the testing process and by allowing the test taker to get involved and take some responsibility in the different phases of the testing process (Ambrose & Rosse, 2003): choosing the test, administering the test, scoring the test, interpreting the test. All these behaviors are to be considered good practice. More specific questions arise for every phase of the testing process.

Should test takers have a say in the decision to choose a certain test over another? Clearly, most test takers are not qualified to make this decision, as they have neither knowledge of the possible options for test usage in a testing context, nor the necessary technical expertise to reach a valid decision. The decision to use a certain test over another is taken by the testing specialist, based on the objective of the assessment, after which he/she usually narrows down a larger list of possible test options. Many times the final decision to choose one test over another from a short list is made in an arbitrary manner. Involving the test taker in at least this final decision will sometimes do no damage to the assessment process, while boosting the subjective feeling of procedural justice he/she experiences. Certain testing settings are more amenable to the application of this principle than others. For example, test taker involvement is easier in an educational setting, when the goal is the measurement of vocational interests, but it can be extremely difficult or even unfeasible in a high-stakes selection context.

Should test takers have a say in the time or mode of administration?
Often there is no possibility for the test user to accommodate the schedule of every test taker. In I/O or educational contexts, where testing is sometimes a large-scale and carefully scheduled procedure, accommodating the wishes of test takers will be impossible. At other times, flexibility is possible and is an important signal of cooperation. In most testing procedures there is at least the possibility of allowing the test taker a decision between alternative options for the time of testing. The test user should allow the test taker as much liberty as possible without damaging the assessment process. Also, the current state of technology has enabled many test authors and test publishers to offer more than one standard way of administering a procedure. Many tests may be administered, with the same results and in a standardized manner, in different ways, for example paper-and-pencil or computer- and Internet-based. Even though the test user may have a personal preference, if there is consistent evidence that the mode of administration has no bearing on the results of the testing process, allowing the test taker to choose between paper-and-pencil and electronic testing will enhance the subjective feeling of procedural justice that the test taker develops.

Should test takers be allowed to ask for a retest or re-scoring? It used to be considered a proof of good practice and cooperation, especially in educational testing (AERA, 2000), to allow test takers to ask for re-scoring. Information technology has made scoring errors very unlikely, especially when the test is scored electronically. Also, hand-scoring forms have become more and more sophisticated and, for many tests, have embedded ways of easily checking whether the scoring has been done correctly. However, even though scoring errors have become more and more unlikely, test takers' requests for re-scoring should be accommodated. At least in psychological testing (for example in personality assessment), feedback from the test taker that he/she does not acknowledge the result of the test as being descriptive of his/her person should be reason enough for the test user to check the scoring. Sometimes test takers ask for a retest, i.e. for the repetition of the whole testing process. Certain domains, such as cognitive ability testing or knowledge tests, to name only a few, involve learning as an important variable in test performance, and retesting would clearly go beyond reasonable accommodation. Other domains, such as personality testing, would most of the time allow for retesting. Such a concession on behalf of the test user would certainly heighten the test taker's subjective feeling of procedural justice.

Should test takers be allowed to get involved in the interpretation of the test? For most testing areas, test takers do not have the technical background required to play an active part in the interpretation of test results. It is, nevertheless, considered good practice to involve the test taker in the interpretation of the results in such a way as to reach a common understanding between test user and test taker on the meaning of test scores. The process of giving feedback to the test taker has, as one of its main targets, the construction of such a shared understanding. Consequently, test takers are and should be involved as secondary parties, with a rather passive role, in the construction of meaning from test scores.
Sharing of results

Test users have a responsibility to the stakeholders in the testing process. One of them is the test taker him/herself. Sometimes the test taker is the only stakeholder. Most times, there are several stakeholders, such as relatives of the test taker, parents of minors, supervisors or employers in organizational settings, attending doctors, psychiatrists, teachers and other professionals in educational settings, representatives of law enforcement, correctional or court personnel, etc.

Whom to share the results of a test with is sometimes a difficult question, which clearly falls in the area of ethics. Different codes of ethics approach this aspect in different ways. Some do not address it at all, and there is a significant difference in approach even between those codes which do address this issue. For example, Leach & Oakland (2007) have shown that clients have greater access to test data in the U.S. than in South Africa. Even in the U.S., different states rule in different manners on this point, and even in the same state the same court may rule differently on different occasions, as shown by the decisions of the California Supreme Court in Tarasoff v. Regents of the University of California (Leach & Oakland, 2009). In this respect, as Øvreeide (2008) shows, the connection between ethical and legal systems is important, and it is much more likely that the latter will be followed in issues related to the sharing of results.

The right to have the results communicated to him/her is a fundamental right of the test taker. However, sometimes the right of the test taker to be informed acts against the very purpose of testing. This may happen, for example, in the fields of forensic psychology, clinical psychology and (in some cases) I/O psychology. Which other stakeholders to share results with is a difficult decision to make. Even more difficult is the decision of which information, and in what depth, to share with whom. From this point of view, psychologists adhere to diverging principles. On the one hand, psychologists adhere to the principle of Respect for the Dignity of Persons and Peoples (Principle I of the Universal Declaration of Ethical Principles for Psychologists), with explicit values such as "(e) privacy for individuals, families, groups, and communities" and "(f) protection of confidentiality of personal information, as culturally defined and relevant for individuals, families, groups, and communities". On the other hand, psychologists adhere to the principle of Competent Caring for the Well-Being of Persons and Peoples, manifested in values such as "(a) active concern for the well-being of individuals, families, groups, and communities", "(b) taking care to do no harm to individuals, families, groups, and communities" and "(c) maximizing benefits and minimizing potential harm to individuals, families, groups, and communities". As we see, the values and principles of good practice urge the psychologist to respect privacy, while at the same time suggesting and enforcing the need for the test user to share results with stakeholders other than the test taker, when good is done by this, or for the benefit of minimizing harm to the test taker or to others. This could be the case, for example, when warning victims of possible harm. Discussions regarding the duty to protect are complex, and include different aspects of risk and protection management, as shown by Werth, Welfel & Benjamin (2009) and Leach (2009). One of the possible solutions to this problem is informed consent, which will be discussed in a later section. Thus, aside from the situations prescribed by the law, which differ from country to country, sharing of data should only be done according to the wishes of the test taker and with the explicit consent of the test taker.
The procedure for collecting informed consent includes an enumeration of the uses to which the test data will be put and of the persons with whom these data will be shared. It is therefore good practice to share test data only with the people the test taker agreed, prior to testing, to share them with. Asking the test taker for consent to the sharing of data after the testing is not an equivalent alternative, as consent elicited in this way might be forced on the test taker (though in a subtle manner) and might involve mechanisms other than the real wish of the test taker to share.

Communication and reporting of results

As noted above, the right to have the results communicated to him/her is a fundamental right of the test taker. This right is explicitly stated as such by many codes of ethics and is suggested, though not explicitly stated, by the Universal Declaration of Ethical Principles for Psychologists. However, standards are unclear regarding to whom to communicate, what to communicate and in what manner to communicate test results. The obligation of the test user to communicate test results is set by the provision of adherence to the value of "(a) honesty, and truthful, open and accurate communications". Post-testing feedback to the test taker on the results of the testing process and their meaning is considered good practice (Pope, 1992). Feedback to the test taker encourages mutual communication and not only has an important role for the test taker in clarifying the meaning of the results, but also an important role for the test user in validating the interpretation he/she came to (Aiken & Groth-Marnat, 2005).

However, situations sometimes arise when the discussion of test results with the test taker may seem inopportune. In clinical settings, as well as in forensic and correctional settings, disclosure of test results to the test taker is a decision the test user has to balance between his/her obligation to the test taker and to the client, who is often someone other than the test taker. Also, there are situations when complete disclosure may seem dangerous or inappropriate. At least two of the values in the Universal Declaration of Ethical Principles for Psychologists apply to this difficult situation. First, psychologists abide by the value of "(g) respect for the ability of individuals, families, groups, and communities to make decisions for themselves and to care for themselves and each other". In light of this value, psychologists should prefer disclosure of data to the test taker, trusting his/her ability to understand the results and respecting his/her right to make his/her own decisions regarding the results of the testing process. Psychologists also accept the value of "(b) avoiding incomplete disclosure of information unless complete disclosure is culturally inappropriate, or violates confidentiality, or carries the potential to do serious harm to individuals, families, groups, or communities". Disclosure of data should thus be complete, and incomplete disclosure should be avoided. Complete disclosure will be avoided in all those situations when the receiver of feedback is not the test taker him/herself and complete disclosure would violate confidentiality. Also, complete disclosure to the test taker him/herself will be avoided when there is potential that the disclosure will do harm to the test taker. A free-floating discussion in light of these values gravitates around the adjective "complete", as applied to disclosure.

Should all test data be released? Should this include item answers, raw scale scores, and un-interpreted (visual) reports, or only interpreted, verbatim information?

Some of the concerns regarding the release of low-level test data, such as item answers, are closely connected to the problem of test security, which will be discussed in a later section of this paper. For example, the APA Statement on the Disclosure of Test Data (http://www.apa.org/science/disclosu.html) specifically points to this concern. On the other hand, under the American Psychological Association Ethics Code (2002), documents containing the responses of test takers are ordinarily subject to disclosure, considering that "the virtues of secrecy regarding testing are often exaggerated and its vices underestimated" (Erard, 2004).

The release of un-interpreted data, such as raw scores or visual reports, is coupled with concerns regarding user qualification and the technical ability of the test taker to understand these un-interpreted data. In this regard, test takers are not technically competent to understand raw data and their limitations, making it possible for the results to be misinterpreted and misused, and to lead to misguided decisions with harmful effects. In light of the value of "(c) maximizing impartiality and minimizing biases", held by the Universal Declaration of Ethical Principles for Psychologists, test users should only disclose data which may be understood by the test taker and make every effort to ensure that the data, including their limitations, have been correctly understood.

The situation is further complicated by the fact that, as we well know, disclosure of all the data for an individual is most of the time pointless without revealing the mean, standard deviation or other characteristics of the distribution of scores for the intended reference group. Also, there are different methods of using test scores (e.g. top-down selection, multiple hurdles, banding), and data from a single individual mean nothing without knowledge of the underlying algorithms and cut-off scores used in the decision. Should these data be shared? Is it even realistic to expect the test taker to understand and use this kind of information in a coherent manner? It is our view that data of this depth are an inherent part of the testing procedure and should as such be protected by the principle of test security, which is discussed in a later section. In cases like this, test users should question the usability of any such information for the test taker and other stakeholders. For example, sharing with a test taker the fact that he/she failed to pass a test by one point could make him/her upset, angry, and perhaps eager to sue. In some settings, for example in educational testing where no decisions are being made, it may be worthwhile to share this information, while in other settings, such as a hiring context, this would be helpful neither to the company nor to the test taker. Subsequently, the depth of information to share should be carefully considered and balanced with respect to usefulness and fairness for all stakeholders involved.
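
To illustrate why a single raw score means little without the underlying decision algorithm, the sketch below shows one common SEM-based, top-down banding variant; it is an illustration of ours, not a description of any particular organization's procedure, and the function names, example figures and the 1.96 multiplier are assumptions.

```python
import math

def sed(sd, reliability):
    """Standard error of the difference between two scores on the same test:
    SED = sqrt(2) * SEM, with SEM = SD * sqrt(1 - r_xx)."""
    return math.sqrt(2.0) * sd * math.sqrt(1.0 - reliability)

def top_down_band(scores, sd, reliability, z=1.96):
    """Return the scores that are statistically indistinguishable from the top
    score, i.e. those within z * SED of it."""
    width = z * sed(sd, reliability)
    top = max(scores)
    return sorted((s for s in scores if s >= top - width), reverse=True)

# Hypothetical applicant scores on a scale with SD = 10 and reliability .85
print(top_down_band([78, 74, 71, 66, 59], sd=10, reliability=0.85))  # -> [78, 74, 71]
```
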
Hood & Johnson (1997) set the frame for a professional approach to feedback by explaining that feedback should be given in conjunction with the intended purpose of testing and in light of the very specific questions that were raised and for whose resolution the testing was employed in the first place. Pope (1992) discusses ten fundamental aspects of the feedback process which would qualify communication of test results as good practice, among them the framing of the feedback and the acknowledgment of fallibility. The framing of the feedback has serious implications for the way the data and their implications are received and integrated, being influenced not only by evident variables, such as order of presentation or language, but also by tone of voice and other subtle mechanisms. Acknowledging fallibility is crucial because it makes the test taker aware of the limitations of the data and of the potential sources of bias or error in the results. In accordance with the APA ethical principles (1992), we consider it a case of best practice for psychologists to indicate any reservations they have about, or limitations they see in, the results presented.
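The point made above, that a single raw score means little without the reference-group distribution and the decision rule applied to it, can be illustrated with a short sketch. The following is a minimal, hypothetical example: the norms, applicant scores, cut-off and standard error are invented for illustration and do not come from this paper. It standardizes a raw score against assumed reference-group norms and then shows how the same set of scores leads to different outcomes under top-down, multiple-hurdle and banding rules.

# Minimal sketch with hypothetical values: why a raw score is uninterpretable
# without reference-group norms and knowledge of the decision rule.
from statistics import NormalDist

# Assumed reference-group norms (illustrative only).
norm_mean, norm_sd = 50.0, 10.0
raw_score = 63.0

# Standardize the raw score against the reference group.
z = (raw_score - norm_mean) / norm_sd
percentile = NormalDist().cdf(z) * 100
print(f"z = {z:.2f}, percentile = {percentile:.1f}")

# The same scores lead to different decisions under different rules.
applicants = {"A": 63.0, "B": 71.0, "C": 58.0, "D": 66.0}

# Top-down selection: rank by score and take the best k.
k = 2
top_down = sorted(applicants, key=applicants.get, reverse=True)[:k]

# Multiple hurdle: pass or fail against a fixed cut-off (hypothetical value).
cut_off = 60.0
hurdle_pass = [name for name, score in applicants.items() if score >= cut_off]

# Banding: scores within one standard error of the top score are treated as
# equivalent (the standard error of measurement here is also hypothetical).
sem = 4.0
top = max(applicants.values())
band = [name for name, score in applicants.items() if score >= top - sem]

print("top-down:", top_down, "| multiple hurdle:", hurdle_pass, "| band:", band)

In this invented example, applicant D is selected under a top-down rule with k = 2 and clears the cut-off, but falls outside the band around the top scorer; telling a testee only that his/her score was 66 therefore conveys almost nothing about the decision or its fairness.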

Informed consent

The testing process may be envisioned as a complex consulting process which requires communication on the delivery end and also on the inception end. Clear and straightforward communication of the scope and goals of the testing process will align expectations for both parties involved and will set the stage for a correct relationship. This is even more important in testing and assessment than in other psychological work, as testing may involve an invasion of privacy (Anastasi, 1997). Psychological testing runs this risk more than educational testing, because it requires not only performance but also self-disclosure, and self-disclosure is an invasion of privacy. The test taker should therefore be able to consent to, or refuse, the specific invasion of privacy implied by the test administered. Such an approach as a basis for testing will also have positive implications for the outcome of the testing, as it will minimize presentation bias and other forms of faking (Aiken & Groth-Marnat, 2005).

Informed consent is formulated as an underlying value of psychological work, and thus also of psychological testing, by the Universal Declaration of Ethical Principles for Psychologists, which, as part of Principle I, Respect for the Dignity of Persons and Peoples, states that psychologists follow the value of (d) free and informed consent, as culturally defined and relevant for individuals, families, groups, and communities. Informed consent is not only a legal provision; indeed, in some codes of ethics it is not mentioned at all, and some psychologists are not even aware of it (Iliescu, 2008). Still, we consider it good practice for psychologists to respect the right of clients to have full explanations of the nature and purpose of the techniques, in language the client can understand (APA, 1992). In certain testing settings, informed consent should be viewed in a narrower manner, as the consent component is necessarily limited.

For example, in the I/O arena, refusing consent to take an employment test would usually be grounds for rejection in most countries. Or, in the educational arena, refusing consent to take a knowledge test is of course grounds for failing the exam. Still, even in these and other settings where consent is presumed or enforced, the testee has a right to be informed correctly of his/her rights as a test taker and should have the scope and goals of the testing process communicated to him/her.

Informed consent is not a mechanical collection of a verbal acknowledgement or of written consent forms. Informed consent has not only a consent part, but also an informed part, which relates to the right of test takers to be informed, prior to testing, about relevant issues related to the testing process and the test itself (APA, 1998). In keeping with the value of respect for the ability of individuals, families, groups, and communities to make decisions for themselves and to care for themselves and each other, test takers have a right to understand not only that an assessment is being conducted, but also why it is conducted, what procedures or techniques it will require, why these procedures are needed, and what the outcome will be. In discussing the outcome of the testing, issues such as disclosure of data, sharing of results, and feedback will need to be addressed.

Informed consent has the status of a psychological contract between the test taker and the test user. As such, it may easily be given and received in verbal form. APA's The Rights and Responsibilities of Test Takers: Guidelines and Expectations (1998) and other similar documents state that test takers have the right to receive a brief oral or written explanation prior to testing. Albeit brief in coverage, the information procedure which precedes the consent or refusal by the test taker is more often than not rich in information. It is therefore often important to formalize this psychological contract and collect informed consent in written form; under these circumstances, both the information and the consent components are less prone to misinterpretation. We will consider it a case of good practice if informed consent is collected in written form.

Test security and user qualifications


Test security is a major ethical concern of psychologists and is included as such in many codes of ethics. Even though the Universal Declaration of Ethical Principles for Psychologists has no direct correspondent for this ethical issue, several of its principles and values apply to test security. There are multiple reasons why test security is important from an ethical point of view, and we will briefly discuss some of the more important ones. As a preamble, though, it should be noted that, even though researchers active in the field of ethics note that psychologists and others commonly abuse this requirement (Oakland, 2005), there is very little published research on practices related to test security.

The performance of a test taker on a test may be considered a valid reflection of the target construct only if the test taker has been assessed in a controlled manner. In this case, controlled refers, among other things, to having had no prior exposure to the items of the test. This is especially important in testing situations where learning or training on the test items could enhance the performance of the test taker. One of the main reasons why test security is of great concern is that, by controlling a test, the professional community ensures the viability of testing with the respective instrument. Restriction of the reproduction and dissemination of psychological materials thus has a professional reason, related to the viability of a measure.

Test security also has a strong legal reason. Test authors and test publishers are entitled to financial compensation for their work in developing and publishing the test. Issues of copyright and of potential infringement arise from this point of view. Some countries have very strict regulations with respect to copyright, while others ignore this issue altogether. Leach & Oakland (2009) consider that good practice in psychological testing should abide, from the point of view of copyright, by two international documents which have extensive legitimacy and provide a sound background: the Universal Copyright Convention (portal.unesco.org/culture/en/ev.php-url_id=1814&url_do=do_topic&url_section=201.html) and the Berne Convention for the Protection of Literary and Artistic Works (http://en.wikipedia.org/wiki/Berne_Convention_for_the_Protection_of_Literary_and_Artistic_Works).

Third, there is a strong reason related to the quality of psychological measures. The translation or adaptation of tests without the approval of the current holder of the copyright, be it the author or the test publisher, is also a violation of international copyright treaties (Leach & Oakland, 2007). Test authors and publishers usually control test adaptation in a careful manner, in order to make sure that the adapted version lives up to the heritage of the original test, i.e. has reasonable equivalence with the original (van de Vijver & Poortinga, 1991). Illegal test adaptation is a disturbingly widespread practice across the globe, especially in developing countries (e.g., Iliescu, 2008).

The advent of the Internet and related technologies has confronted psychologists with new versions of these old issues pertaining to test security. Information presented over the Internet is much more volatile, which increases the risks. But the potential advantages of this pervasive technology should not be dismissed lightly.
Responsible professional associations have prepared statements and guidelines for professionals dealing with these issues (e.g., British Psychological Society Psychological Testing Centre, 2002; Naglieri et al., 2004). Of international importance in this respect is the initiative of the International Test Commission, which published its Guidelines on Computer-Based and Internet-Delivered Testing (Bartram & Coyne, 2005). This document addresses, in addition to technology, issues such as quality, control and security, all of which are clearly grounded in ethical concerns.

The Internet has generated new challenges, which are technological in nature. For example, there is a strong expectation that test publishers and other organizations which maintain testing sites provide thorough security, controlled user access and data protection on their websites. On the client side, good practice in testing requires the prevention of unauthorized copying of test content by the unproctored test taker (Bartram, 1999). But aside from these technological challenges, testing over the Internet does not really raise new issues; it only places old issues in new containers (Naglieri et al., 2004). The Internet has generated not only new problems regarding the testing process, but also potential pitfalls in the face of globalization and the increased ease of commercial transactions. Gregoire & Oakland (2008), for example, have recognized the danger posed by the free and unauthorized trading of protected materials over the Internet and urged eBay and other companies to establish and maintain standards that prevent the unauthorized sale of tests and other professionally protected materials.

The very formulation of the subject of our current discussion, test security, calls for a continuation: one secures a test against something. Tests are secured against unqualified users. This is why test security also raises the issue of qualified and competent usage of tests. We will not discuss test user qualification in detail. It is, however, important to state that not all countries have clear rules and regulations related to test user qualifications. A clear regulation from this point of view should not only prescribe levels of qualification, but also define specific competencies (i.e. knowledge and skills) for every level of qualification. We consider APA's (2000) Report of the Task Force on Test User Qualifications an important document which could guide the development of national guidelines and which could define the limits of good practice in this domain. Another important document is the EFPA (European Federation of Psychologists' Associations)-EAWOP (European Association of Work and Organizational Psychology) joint initiative entitled European Test User Standards for test use in Work and Organizational settings (2005). Though limited to test usage in the domain of I/O psychology, this initiative has cross-national impact. In light of the above, we will consider it a case of good practice when user qualifications are treated in a transparent and structured manner, with a clear and theoretically founded rationale for the breakdown into levels of qualification and a clear description of knowledge and skills for every level.

As part of the discussion regarding the qualified usage of tests, two special classes of potential test users attract attention. One is the case of students and others who study to become specialists in testing but are not yet qualified to use tests. The situation is difficult insofar as these people have to use tests in order to become proficient, yet are not yet qualified to do so. The only dedicated document covering this situation, to the best knowledge of the authors, is the APA Committee on Psychological Tests and Assessment (1994) Statement on the Use of Secure Psychological Tests in the Education of Graduate and Undergraduate Psychology Students. This short document offers best-practice guidelines regarding four main areas: security of test materials, testing demonstrations, teaching students to administer and score tests, and using tests in research.

Another interesting discussion in this respect regards the usage of tests by test takers themselves. Ethical questions arise here: Can test takers be trusted to self-administer? Can they be trusted to self-score? Are they qualified to do either? Is it possible for them to misinterpret the results? Is it possible for them to understand issues of test security? Will they abstain from training others? Is it ethical for psychologists to design and market tests for use by test takers themselves? If yes, under what circumstances? These questions are not hypothetical, as ever more tests are self-administered and self-scorable, and there are good examples of successful tests of this category in the areas of vocational counseling and personality.

References
AERA (2000). Position Statement on High-Stakes Testing in Pre-K12 Education. Adopted July 2000.
AERA, APA, NCME (1999). Standards for educational and psychological testing. Washington, DC: AERA.
Aiken, L. R., & Groth-Marnat, G. (2005). Psychological Testing and Assessment (12th Edition). Upper Saddle River, NJ: Allyn & Bacon.
Ambrose, M. L., & Rosse, J. G. (2003). Procedural Justice and Personality Testing. Group & Organization Management, 28(4), 502-526.
American Psychological Association (2002). Ethical principles of psychologists and code of conduct. American Psychologist, 57, 1060-1073.
Barnhart, M. (2002). Introduction. In M. Barnhart (Ed.), Varieties of ethical reflection: New directions for ethics in a global context. Lanham, MD: Lexington Books.
Bartram, D. (1999). Testing and the Internet: Current realities, issues and future possibilities. Keynote paper for the 1999 Test User Conference.
Bartram, D. (2000). International Guidelines for Test Use. Punta Gorda, FL: ITC.
Bartram, D., & Coyne, I. (2005). International Guidelines on Computer-Based and Internet-Delivered Testing. Punta Gorda, FL: ITC.
Bartram, D., & Hambleton, R. (Eds.) (2006). Computer-based testing and the internet: Issues and advances. New York: John Wiley.
British Psychological Society Psychological Testing Centre (2002). Guidelines for the Development and Use of Computer-based Assessments. Leicester: British Psychological Society.
Dimitrov, D. M. (2002). Reliability: Arguments for multiple perspectives and potential problems with generalization across studies. Educational and Psychological Measurement, 62(5), 783-801.
Dudek, F. J. (1979). The continuing misinterpretation of the standard error of measurement. Psychological Bulletin, 86(2), 335-337.
EFPA, EAWOP (2005). European Test User Standards for test use in Work and Organizational settings. [http://www.efpa.eu/download/2d30edd3542f33c91295487b64877964].
Fan, X., & Thompson, B. (2001). Confidence intervals about score reliability coefficients, please: An EPM guidelines editorial. Educational and Psychological Measurement, 61, 517-531.
Gauthier, J. (2008). Universal declaration of ethical principles for psychologists. In J. E. Hall & E. M. Altmaier (Eds.), Global promise: Quality assurance and accountability in professional psychology (pp. 98-105). New York: Oxford University Press.
Gregoire, J., & Oakland, T. (2008). On the need to secure psychological test materials. [http://intestcom.org/archive/ebay.php].
Harvill, L. M. (1991). Standard error of measurement (An NCME instructional module). Items, 9, 181-189.
Hood, A. B., & Johnson, R. W. (1997). Assessment in counseling: A guide to the use of psychological assessment procedures (2nd Edition). Alexandria, VA: American Counseling Association.
Iliescu, D. (2008, July). Romanian psychologists' view on ethical test usage. Paper presented at the 29th International Congress of Psychology (ICP), Berlin.
Keith-Spiegel, P., & Koocher, G. P. (1995). Ethics in psychology: Professional standards and cases. London: Lawrence Erlbaum.
Kline, P. (1993). The handbook of psychological testing. London: Routledge.
Konovsky, M. A. (2000). Understanding Procedural Justice and Its Impact on Business Organizations. Journal of Management, 26(3), 489-511.
Leach, M. M. (2009). International ethics codes and the duty to protect. In J. Werth, E. L. Welfel, & A. Benjamin (Eds.), The duty to protect: Ethical, legal, and professional considerations in risk assessment and intervention. Washington, DC: American Psychological Association.
Leach, M. M., & Oakland, T. (2007). Ethics standards impacting test development and use: A review of 31 ethics codes impacting practices in 35 countries. International Journal of Testing, 7, 71-88.
Leach, M. M., & Oakland, T. (2009). Displaying ethical behaviors by psychologists when standards are unclear. Manuscript submitted for publication.
Lind, E. A., & Tyler, T. R. (1988). The social psychology of procedural justice. London: Springer.
Lindsay, G., Koene, C., Øvreeide, H., & Lang, F. (2008). Ethics for European Psychologists. Göttingen: Hogrefe & Huber.
Naglieri, J. A., Drasgow, F., Schmit, M., Handler, L., Prifitera, A., Margolis, A., & Velasquez, R. (2004). Psychological testing on the Internet: New problems, old issues. American Psychologist, 59, 150-162.
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.
Oakland, T. (2005). Selected ethical issues relevant to test adaptations. In R. Hambleton, P. Merenda, & C. Spielberger (Eds.), Adapting educational and psychological tests for cross-cultural assessment. Mahwah, NJ: Erlbaum.
Pope, K. S., & Vetter, V. A. (1992). Ethical dilemmas encountered by members of the American Psychological Association: A national survey. American Psychologist, 47, 397-411.
Pope, K. S. (1992). Responsibilities in providing psychological test feedback to clients. Psychological Assessment, 4(3), 268-271.
Pope, K. S., & Vasquez, M. J. T. (2007). Ethics in Psychotherapy and Counseling (3rd Edition). NY: Jossey-Bass.
Pope, K. S., Tabachnick, B. G., & Keith-Spiegel, P. (1987). Ethics of practice: The beliefs and behaviors of psychologists as therapists. American Psychologist, 42(11), 993-1006.
Roberts, R. C., & Wood, W. J. (2007). Intellectual Virtues: An Essay in Regulative Epistemology. Oxford: Oxford University Press.
Rolls, S., & Feltham, R. (1993). Practical and professional issues in computer-based assessment and interpretation. International Review of Professional Issues in Selection, 1, 135-146.
Society for Industrial and Organizational Psychology (SIOP) (2003). Principles for the validation and use of personnel selection procedures (4th ed.). Bowling Green, OH: Author.
The APA Committee on Psychological Tests and Assessment (CPTA) (1994). Statement on the Use of Secure Psychological Tests in the Education of Graduate and Undergraduate Psychology Students. [http://www.apa.org/science/securetests.html].
The APA Test Taker Rights and Responsibilities Working Group of the Joint Committee on Testing Practices (1998). The Rights and Responsibilities of Test Takers: Guidelines and Expectations. [http://www.apa.org/science/ttrr.html].
Thibaut, J., & Walker, L. (1975). Procedural Justice: A Psychological Analysis. Hillsdale, NJ: Erlbaum.
Thompson, B., & Vacha-Haase, T. (2000). Psychometrics is datametrics: The test is not reliable. Educational and Psychological Measurement, 60, 174-195.
Tippins, N., Beaty, J., Drasgow, F., Gibson, W., Pearlman, K., Segall, D., & Shepherd, W. (2006). Unproctored Internet testing in employment settings. Personnel Psychology, 59(1), 189-225.
Tyler, T. R. (1989). The psychology of procedural justice: A test of the group values model. Journal of Personality and Social Psychology, 57, 330-338.
van de Vijver, F. J. R., & Poortinga, Y. H. (1991). Testing across cultures. In R. K. Hambleton & J. N. Zaal (Eds.), Advances in educational and psychological testing (pp. 277-308). Boston: Kluwer Academic Publishers.
Westhoff, K., & Kluck, M. L. (2008). Psychologische Gutachten [Psychological reports] (5th Edition). Heidelberg: Springer.

Suggested hyperlinks to web pages


http://www.apa.org/science/disclosu.html; APA Statement on the Disclosure of Test Data.
http://www.efpa.eu/download/2d30edd3542f33c91295487b64877964; EFPA, EAWOP (2005). European Test User Standards for test use in Work and Organizational settings.
http://www.apa.org/science/securetests.html; The APA Committee on Psychological Tests and Assessment (CPTA) (1994). Statement on the Use of Secure Psychological Tests in the Education of Graduate and Undergraduate Psychology Students.
http://www.apa.org/science/ttrr.html; The APA Test Taker Rights and Responsibilities Working Group of the Joint Committee on Testing Practices (1998). The Rights and Responsibilities of Test Takers: Guidelines and Expectations.
http://www.aera.net/?id=378; AERA (2000). Position Statement on High-Stakes Testing in Pre-K12 Education. Adopted July 2000.
http://www.intestcom.org/guidelines/index.php; Guidelines of the ITC (ITC Guidelines on Adapting Tests, ITC Guidelines on Test Use, CBT & Internet Guidelines).
http://www.psychtesting.org.uk/downloadfile.cfm?file_uuid=64877B7B-CF1C-D577-971D-425278FA08CC&ext=pdf; British Psychological Society Psychological Testing Centre (2002). Guidelines for the Development and Use of Computer-based Assessments. Leicester: British Psychological Society.
http://www.sipsych.org/english/Universal%20Declaration%20as%20ADOPTED%20by%20IUPsyS%20&%20IAAP%20July%202008.pdf; Gauthier, J. (2008). Universal declaration of ethical principles for psychologists. In J. E. Hall & E. M. Altmaier (Eds.), Global promise: Quality assurance and accountability in professional psychology (pp. 98-105). New York: Oxford University Press.


Questions for classroom discussion


1. Should we follow local or international documents prescribing ethical principles? Which should be followed in what situations, and why?
2. Can a test with known reliability problems be used ethically? If yes, under what circumstances?
3. Which would you choose: a fairer test or a more valid one? Which of these two principles should have precedence: fairness in testing, or validity and the responsibility towards the client to achieve the best possible result?
4. The feedback organizations give to job applicants after the selection process is usually quite vague. Do you see any ethical problems here?
5. In your opinion, has the profession of I/O psychology abandoned the psychological heritage of responsibility and service to society?
6. When the needs and wishes of the client who commissioned the testing project and those of the testee are different, whom should the psychologist follow? Is it possible to accommodate both?
7. Regarding the conflict between adhering to professional standards and yielding to client demands: Should a professional always conform to professional standards? Under what circumstances is it acceptable to lower standards in order to accommodate client demands? Under what circumstances is it not acceptable?
8. Under what circumstances do you think the situation or context, rather than a set of values, will take precedence in governing an action as ethical or unethical?
9. Do you think ethical codes are sufficient, or do we need laws to govern the activities of testing professionals?

Author information
Dragos Iliescu (dragos.iliescu@testcentral.ro) holds a PhD in I/O psychology from the Babes-Bolyai University, Cluj-Napoca, Romania. He is Associate Professor at the National School for Political and Administrative Studies (SNSPA) in Bucharest, and Managing Partner of D&D/Testcentral, the major test publisher in Romania. He has worked for 12 years as a consultant in the field of business research, in areas such as marketing research, branding and HR, for Romanian and international clients.

Dan Ispas (dispas@mail.usf.edu) is a doctoral candidate in I/O psychology at the University of South Florida, Tampa, Florida, USA. He holds an M.A. in I/O psychology from the same university. His research interests include organizational interventions, counterproductive work behaviors, personality and affect in the workplace, and test development and validation. His research has been presented at the Annual Conferences of the Society for Industrial and Organizational Psychology and the Academy of Management, and has been published in Human Resource Management Review, Industrial and Organizational Psychology: Perspectives on Science and Practice, Psihologia Resurselor Umane, and The Industrial-Organizational Psychologist.

Michael M. Harris (d. 2009) was the Thomas Jefferson Professor of Management in the College of Business Administration at the University of Missouri-St. Louis. He held a Ph.D. in I/O psychology from the University of Illinois-Chicago. Most of his work revolved around selection and hiring practices and compensation systems, with a focus on staffing/selection, compensation, and performance management, in both the domestic and the international context. Michael published numerous peer-reviewed articles in the area of human resource management and edited several books, including the Handbook of Research in International Human Resource Management (Lawrence Erlbaum, 2007). He served as a keynote speaker at the International Test Commission's conference in England, June 2002, where he made a presentation entitled "Patrolling the Information Highway: Creating and Maintaining a Safe, Legal, and Fair Environment for Test-takers."
