Table of contents

Social implications and ethics of testing
Abstract
Introduction. Ethics as reflection and personal choice
What is ethical? Right and wrong beyond the law
What makes a test ethical?
(a). Adherence of the testing process to the general principles of the scientific method
(b). Characteristics of the interaction process between the testing professional and the other stakeholders involved in the testing process
Informed consent
Test security and user qualifications
References
Suggested hyperlinks to web pages
Questions for classroom discussion
Abstract
The paper aims to increase testing professionals' awareness of ethical issues. Ethics is treated as a meta-category, defining right and wrong beyond law, morality or religion. International guidelines addressing ethics in the field of psychological testing are discussed. We discuss what makes a test and the testing process ethical: adherence to the general principles of the scientific method (objectivity, reliability and validity) and certain characteristics of the interaction process between the testing professional and the other stakeholders involved in the testing process (fairness, procedural justice, sharing and communication of results, informed consent, test security and user qualifications).
Ethics is, from a scientific point of view, a branch of philosophy. Philosophers distinguish between normative ethics, which prescribes what people should believe to be right and wrong, and applied ethics, which focuses on the examination of specific real-life situations. As such, applied ethics is not exclusively the turf of philosophers, but also of the practitioners who are confronted by specific real-life issues (Barnhart, 2002). In a very broad sense, ethics refers to the principles of right and wrong conduct. Ethical standards are prescriptions about what humans in general (or, in our case, professionals) ought to do, usually worded in terms of rights, obligations or benefits to society or a greater good. As such, ethics has very loose boundaries with other domains, such as morality, religion and law. We will address some of these differences in a later section, building the case for a definition of ethics as a significantly broader intellectual endeavor than the common conception of analyzing right and wrong.

In our understanding, ethical judgments are action-guiding without being merely prescriptive. Ethics is, of course, normative: it is concerned with how we ought to act and what results we ought to try to bring about. Still, while ethics is normative, it is more than a set of principles. It is a matter of course that ethical behavior has to be based on sound principles and values. However, ethics is not an automatic comparison of a real-life situation to a set of norms; it also requires active intellectual processing. Automatic normative judgment is impossible because real-life situations most often require the practitioner to react to complicated issues which bear on multiple and conflicting values or ethical principles, thus defining ethical dilemmas.
It is therefore appropriate to state that ethics is the study of what happens when there are no simple answers to a situation. Ethics also requires ethical thinking, i.e. ethical reflection. Ethical reflection is based on the perception of ethics, as well as on ethical judgment (Roberts & Wood, 2007). In order to engage in ethical reflection, a practitioner should be able to perceive and identify the dilemma as a situation which involves ethics in some way. He/she should be able to apply one or more ethical principles to this situation, should consider alternatives, and should come to a personal decision about how he/she will behave. Thus, being ethical is something one does by one's own choice, beyond legal or moral prescriptions. Defining ethics on the basis of reflection and personal choice has a bearing on the course of this paper. We argue that practitioners should be prepared to judge professional situations and issues critically and creatively from an ethical point of view. In order to facilitate such behavior, a normative stance in this paper would be of little if any help. It is not our intention to cover all the possible combinations of conflicting principles which would define dilemmas requiring ethical reflection. Instead, we hope to provide practitioners in testing with a heightened awareness of what we consider to be the relevant ethical categories in the field of psychological testing, facilitating in this way ethical reflection when real-life situations have a bearing on these issues.
(a). Adherence of the testing process to the general principles of the scientific method

Testing is conducted following the scientific method. Broadly speaking, testing is carried out with the explicit purpose of generating scientific data for decision makers. Because testing uses the scientific method, professionals in this area, be they psychologists, educators, or other professionals, are called upon to apply the method to the best of their abilities. According to the generally accepted principles of the scientific method, this means that testing should be used to generate the needed information in an objective, reliable and valid manner. Any usage of a test that does not adhere to this principle cannot be labeled good practice.
Objectivity

Objectivity is the main principle of the scientific method. As an expression of the scientific method, psychological and educational testing should be as objective as possible. In testing, objectivity refers to inter-user consistency in the execution, scoring and interpretation of standardized assessment procedures (Westhoff & Kluck, 2008, p. 68). Of course, complete objectivity is never possible, and facts which are accepted at a certain moment in time as correct, true or objective in a scientific sense are often overthrown by new scientific discoveries. This is a characteristic of scientific reasoning, which is consensual by nature and reflects the shared understanding of the scientific community at a certain point in time, rather than truth in an absolute sense. The principle of objectivity translates into the domain of tests and testing in three ways. First, we will consider a test to be objective if the procedures for administration, scoring and interpretation are standardized and constant across time, users and test takers (Kline, 1993). All test users should administer, score and interpret the test in the same way, and all test takers, regardless of their characteristics or of the moment in time the test is administered to them, should have the same opportunity to perform. While administration and scoring may be easily standardized so as to be considered objective, interpretation always calls for the professional judgment of the testing professional and as such inherently brings subjectivity into the equation. The need to interpret test data in an objective manner is acknowledged by one of the values of the second principle of the Universal Declaration of Ethical Principles for Psychologists, (f) self-knowledge regarding how their own values, attitudes, experiences,
Reliability

Reliability is important for the topic of ethical testing because it describes the error associated with the measurement. Decisions based on test scores should only be taken with careful consideration of the error associated with the measurement of those scores. While recent publications have started a modern debate on reliability (e.g. Thompson & Vacha-Haase, 2000; Dimitrov, 2002; Fan & Thompson, 2001), we will address this construct here as outlined by AERA, APA & NCME (1999): reliability refers to "the consistency of [...] measurements when the testing procedure is repeated on a population of individuals or groups" (p. 25). Reliability poses at least two ethical questions, related to the dichotomy of high- vs. low-stakes decisions and to the different types of reliability.

High- vs. low-stakes decisions. The first question related to reliability is a fundamental one: how much should we rely on the results of the test? The discussion is not one of relying or not relying, but of how much to rely. The degree of reliance on a test result describes the limits of its ethical usage. If we may not rely heavily on test results, then they are useless as a basis for decisions. Reaching decisions, or feeding decision makers information, based on unreliable data is unethical behavior. Naturally, as our discussion underlines the degree of reliance, the following question arises: how much is acceptable? Where should we draw the line and view a specific test result, based on its low reliability, as being unethically used in a decision? Scientific consensus sets certain limits on reliability and describes the types of decision that may be reached within a certain span of reliability. While a satisfactory level of reliability depends on how the measure is being used (Nunnally & Bernstein, 1994, p. 264), the generally recommended limits are .70 and .90 (Nunnally & Bernstein, 1994, p. 265).
High-stakes decisions should never be made if the reliability of the procedure used as a basis for decision-making falls below the .90 level. Low-stakes decisions may be reached with scores whose reliability lies below .90, but not below the .70 level. "High vs. low stakes" refers not only to the impact of the decision, but also to its scope: highly reliable tests are needed to sort individuals into many different categories based upon relatively small individual differences (e.g. intelligence), while tests of lower reliability are sufficient if they are used to sort people into a smaller number of groups based on rough individual differences. Procedures with reliability below the .70 level should be used only with the utmost care. The segmentation of decisions into high- vs. low-stakes again places emphasis on the situational aspects of ethical behavior: it is only unethical to use a test for a decision for which it is not qualified by its reliability. Still, in spite of these rather clear guidelines, some situations require ethical judgment on the part of the professional. In many settings, for example in some I/O settings, tests represent pass/no-pass hurdles. In these situations, the discussion around reliability gains a supplementary significance, and the professional using the test should probably try to
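The practical consequence of a reliability coefficient can be made concrete through the standard error of measurement, SEM = SD × √(1 − r), which translates a reliability coefficient into a confidence band around an observed score (see Harvill, 1991; Dudek, 1979, in the references). The following sketch is illustrative only; the scale parameters and scores are hypothetical, not drawn from any test discussed here.

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - r): expected spread of observed scores
    around a test taker's true score."""
    return sd * math.sqrt(1.0 - reliability)

def confidence_interval(score: float, sd: float, reliability: float,
                        z: float = 1.96):
    """Approximate 95% confidence band around an observed score."""
    sem = standard_error_of_measurement(sd, reliability)
    return (score - z * sem, score + z * sem)

# Hypothetical IQ-style scale (SD = 15). At r = .90 the band around
# an observed score of 110 is fairly tight:
low, high = confidence_interval(110, sd=15, reliability=0.90)
print(f"r = .90: {low:.1f} .. {high:.1f}")   # roughly 100.7 .. 119.3

# At r = .70 the same observed score carries a much wider band,
# which is why cut-off-based high-stakes decisions become unsafe:
low, high = confidence_interval(110, sd=15, reliability=0.70)
print(f"r = .70: {low:.1f} .. {high:.1f}")   # roughly 93.9 .. 126.1
```

The widening band illustrates why the .70/.90 guidelines above are tied to the stakes of the decision: near a cut-off score, a low-reliability instrument cannot distinguish a true pass from a true fail.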
Validity

The current scientific understanding defines validity as a complex and integrated corpus of scientific knowledge and demonstrations which examines the psychological variables measured by a test. Validity thus refers to "the degree to which evidence and theory support the interpretation of test scores" (AERA, APA, NCME, 1999, p. 9). The knowledge and demonstrations are rarely collected in a single place and in a coherent manner; most often they are presented in various formats, in various places. Examining the validity of a test requires an active search and an attentive examination of the pieces of knowledge related to the test. Validity is "the most fundamental consideration in developing and evaluating tests" (AERA, APA, NCME, 1999, p. 9). Validity tells us what a test measures and allows us to interpret the results of the test and to formulate descriptive conclusions and predictions based on test scores. The first way validity relates to ethics is through the way a test is selected for usage. Test users have an obligation to use only tests which have been sufficiently validated for the intended purpose of the testing and the intended target population.
Conclusions on the adherence of the testing process to the general principles of the scientific method

There are no perfect tests. Objectivity, reliability and validity are not switches with only "on" or "off" states; there are many shades in between. The psychometric characteristics of tests are, like many other characteristics, distributed normally across the population of tests. It is not unethical to use a test which is placed at or even under the average of this distribution, if the test user understands the limitations of the respective test, if the usage is done with a clear and complete understanding of the dangers, if the caveats are accepted and taken into account by the specialist using the test, and if these drawbacks are communicated in a transparent way to the client. After all, a test is always employed to answer a specific question, and one of the main criteria for choosing a specific test in a specific situation is the cost-benefit ratio for the particular client's question. It is, however, bad practice for a test user to neglect studying the documentation of the procedure, as is the decision to ignore shortcomings, or to use tests with shortcomings without communicating them to his/her client. Professional decisions and circumstances often allow for the usage of a test which is less than perfect for the intended purpose. This in itself is not unethical behavior, on the condition that the testing specialist understands the caveats and explains the drawbacks and cautions to his/her client. Virtually all the principles and characteristics of ethical behavior or good practice discussed here are a mix of common sense and a high level of professional judgment. In order to follow ethical guidelines, a testing specialist not only has to be aware of ethical practices in the respective area, but also has to have a high level of professional understanding, in order to be able to evaluate the technical implications of his/her decisions.
(b). Characteristics of the interaction process between the testing professional and the other stakeholders involved in the testing process

Testing is an interactional process between the testing professional, the test taker and the client of the testing process. The client of the test user (i.e. the decision maker) is sometimes the tested person himself or herself. At other times, however, as in the case of testing in I/O, forensic, educational or clinical psychology, the test taker is different from the decision maker. The testing professional has ethical responsibilities towards both categories of stakeholders.

Fairness

From this interactional point of view, the concepts of ethics and fairness are often interchangeable. Even though fairness is much closer to everyday language and thus closer to the test taker, the concept of fairness is used in many different ways. For example, even though they discuss four different meanings of fairness in testing (fairness as lack of bias, fairness as equitable treatment in the testing process, fairness as equality in outcomes of testing, and fairness as an opportunity to learn), the Standards for Educational and Psychological Testing (AERA, APA, NCME, 1999, p. 80) state that consensus on what is and what is not fair has not been achieved in the professional community, and even less so in larger society. Ultimately, fairness is a mental construction of the test taker and the relevant community, and as such it is subject to many influences. Like all other mental constructions of the test taker, the construction of fairness may be influenced by the test user through the way he/she communicates about the test itself and the testing procedure.

Procedural justice

Even when testing is perceived by the test taker as being objective, reliable and valid (and thus scientifically correct), the perception of the test taker regarding the control he/she has over the outcome of the test may vary widely. The assumption that outcomes drive the evaluation of a certain event is pervasive in the social sciences (Lind & Tyler, 1988); according to this assumption, people judge their social experiences in terms of the outcomes they receive. Attitudes towards tests and testing could thus be explained by these outcome-based judgments (Ambrose & Rosse, 2003). Contrary to this belief, process-based models assume that the psychological construction (perception) of an event is driven not only by the outcome, but also by the process itself. The main postulate of these models is that people not only care about the allocations, but
interpreted, verbatim information?
Some of the concerns regarding the release of low-level test data, such as item answers, are closely connected to the problem of test security, which will be discussed in a later section of this paper. For example, the APA Statement on the Disclosure of Test Data (http://www.apa.org/science/disclosu.html) specifically points to this concern. On the other hand, under the American Psychological Association Ethics Code (2002), documents containing the responses of test takers are ordinarily subject to disclosure, considering that "the virtues of secrecy regarding testing are often exaggerated and its vices underestimated" (Erard, 2004). Release of un-interpreted data, like raw scores or visual reports, is coupled with concerns regarding user qualification and the technical ability of the test taker to understand these un-interpreted data. In this regard, test takers are not technically competent to understand raw data and their limitations, making it possible for the results to be misinterpreted and misused, and to lead to misguided decisions with harmful effects. In light of the value of (c) maximizing impartiality and minimizing biases, held by the Universal Declaration of Ethical Principles for Psychologists, test users should only disclose data which may be understood by the test taker, and should make all efforts to ensure that the data, including its limitations, has been correctly understood. The situation is further complicated by the fact that disclosure of all the data for an individual is most of the time pointless without revealing the mean, standard deviation or other characteristics of the distribution of scores for the intended reference group. Also, there are different methods of using test scores (e.g., top-down; multiple hurdle; banding), and data from a single individual mean nothing without knowledge of the underlying algorithms and cut-off scores used in the decision. Should these data be shared?
Is it even realistic to expect the test taker to understand and use this kind of information in a coherent manner? It is our view that data of this depth is an inherent part of the testing procedure and should as such be protected by the principle of test security, which is discussed in a later section. In cases like this, test users should question the usability of any such information for the testee and other stakeholders. For example, sharing with a test taker the fact that he/she failed to pass a test by one point could make him/her upset, angry, and perhaps eager to sue. In some settings, for example in educational testing where no decisions are being made, it is perhaps worthwhile to share this information, while in other settings, such as a hiring context, this would be helpful neither to the company nor to the testee. Subsequently, the depth of information to share should be carefully considered and balanced with respect to usefulness and fairness for all stakeholders involved. The obligation to communicate test results to the test taker is treated by many test users in a mechanical manner, which abuses the underlying principle stated above and cannot be labeled good practice. Anastasi (1997, p. 543) notes that psychologists have to approach communication of the test results "in a form that will be meaningful and useful to the recipient". Pope, Tabachnick & Keith-Spiegel (1987) urge that feedback to the test taker and the communication of results should be approached in a professional manner. Hood & Johnson (1997) set the frame for a professional approach to feedback by explaining that feedback should be approached in conjunction with the intended purpose of testing and in light of the very specific questions that were raised and for the solving of which the testing was employed in the first place.
Pope (1992) discusses 10 fundamental aspects of the feedback process which would qualify the communication of test results as good practice, among them the framing of the feedback and the acknowledgment of fallibility. The framing of the feedback has serious implications for the way the data and their implications are received and integrated, being influenced not only by evident variables, like order of presentation or language, but also by tone of voice and other subtle mechanisms. Acknowledging fallibility is crucial: it makes the test taker aware of the limitations of the data and of the potential sources of bias or error in the results. In accordance with the APA ethical principles (1992), we will consider it a case of best practice for psychologists to indicate any reservations they have regarding, or any limitations they see in, the results presented.
Informed consent

The testing process may be envisioned as a complex consulting process which requires communication on the delivery end, and also on the inception end. Clear and straightforward communication of the scope and goals of the testing process will align expectations for both parties involved and will set the stage for a correct relationship. This is even more important in testing and assessment than in other psychological work, as testing could involve an invasion of privacy (Anastasi, 1997). Psychological testing runs this risk more than educational testing does, because psychological testing requires not only performance but also self-disclosure, and self-disclosure is an invasion of privacy. The test taker should consent to, or refuse, the specific invasion of privacy implied by the test administered. Such an approach as a basis for testing will also have positive implications for the outcome of the testing, as it will minimize presentation bias and other forms of faking (Aiken & Groth-Marnat, 2005). Informed consent is formulated as an underlying value of psychological work, and thus also of psychological testing, by the Universal Declaration of Ethical Principles for Psychologists, which, as part of Principle I, Respect for the Dignity of Persons and Peoples, states that psychologists follow the value of (d) free and informed consent, as culturally defined and relevant for individuals, families, groups, and communities. Informed consent is not only a legal provision; indeed, in some codes of ethics it is not mentioned, and some psychologists are not even aware of it (Iliescu, 2008). Still, we consider it good practice for psychologists to respect the right of clients to have full explanations of the nature and purpose of the techniques, in language the client can understand (APA, 1992). In certain testing settings, informed consent should be viewed in a narrower manner, as the consent part is forcibly limited.
protected materials.
References
AERA (2000). Position Statement on High-Stakes Testing in Pre-K-12 Education. Adopted July 2000.
AERA, APA, NCME (1999). Standards for educational and psychological testing. Washington, DC: AERA.
Aiken, L. R., & Groth-Marnat, G. (2005). Psychological Testing and Assessment (12th Edition). Upper Saddle River, NJ: Allyn & Bacon.
Ambrose, M. L., & Rosse, J. G. (2003). Procedural Justice and Personality Testing. Group & Organization Management, 28(4), 502-526.
American Psychological Association (2002). Ethical principles of psychologists and code of conduct. American Psychologist, 57, 1060-1073.
Barnhart, M. (2002). Introduction. In M. Barnhart (Ed.), Varieties of ethical reflection: New directions for ethics in a global context. Lanham, MD: Lexington Books.
Bartram, D. (1999). Testing and the Internet: Current realities, issues and future possibilities. Keynote paper for the 1999 Test User Conference.
Bartram, D. (2000). International Guidelines for Test Use. Punta Gorda, FL: ITC.
Bartram, D., & Coyne, I. (2005). International Guidelines on Computer-Based and Internet-Delivered Testing. Punta Gorda, FL: ITC.
Bartram, D., & Hambleton, R. (Eds.) (2006). Computer-based testing and the internet: Issues and advances. New York: John Wiley.
British Psychological Society Psychological Testing Centre (2002). Guidelines for the Development and Use of Computer-based Assessments. Leicester: British Psychological Society.
Dimitrov, D. M. (2002). Reliability: Arguments for multiple perspectives and potential problems with generalization across studies. Educational and Psychological Measurement, 62(5), 783-801.
Dudek, F. J. (1979). The continuing misinterpretation of the standard error of measurement. Psychological Bulletin, 86(2), 335-337.
EFPA, EAWOP (2005). European Test User Standards for test use in Work and Organizational settings. [http://www.efpa.eu/download/2d30edd3542f33c91295487b64877964].
Fan, X., & Thompson, B. (2001). Confidence intervals about score reliability coefficients, please: An EPM guidelines editorial. Educational and Psychological Measurement, 61, 517-531.
Gauthier, J. (2008). Universal declaration of ethical principles for psychologists. In J. E. Hall & E. M. Altmaier (Eds.), Global promise: Quality assurance and accountability in professional psychology (pp. 98-105). New York: Oxford University Press.
Gregoire, J., & Oakland, T. (2008). On the need to secure psychological test materials. [http://intestcom.org/archive/ebay.php].
Harvill, L. M. (1991). Standard Error of Measurement (An NCME instructional module). Items, 9, 181-189.
Hood, A. B., & Johnson, R. W. (1997). Assessment in counseling: A guide to the use of psychological assessment procedures (2nd Edition). Alexandria, VA: American Counseling Association.
of Psychology (ICP), Berlin.
Author information
Dragos Iliescu (dragos.iliescu@testcentral.ro) holds a PhD in I/O psychology from the Babes-Bolyai University, Cluj-Napoca, Romania. He is Associate Professor at the National School for Political and Administrative Studies (SNSPA) in Bucharest, and Managing Partner of D&D/Testcentral, the major test publisher in Romania. He has worked for 12 years as a consultant in the field of business research, in areas like marketing research, branding and HR, for Romanian and international clients. Dan Ispas (dispas@mail.usf.edu) is a doctoral candidate in I/O psychology at the University of South Florida, Tampa, Florida, USA. He holds an M.A. in I/O psychology from the same university. His research interests include organizational interventions, counterproductive work behaviors, personality and affect in the workplace, and test development and validation. His research was presented at the Annual Conferences of the Society for Industrial and Organizational Psychology and the Academy of Management, and was published in Human Resource Management Review, Industrial and Organizational Psychology: Perspectives on Science and Practice, Psihologia Resurselor Umane and The Industrial-Organizational Psychologist. Michael M. Harris (d. 2009) was the Thomas Jefferson Professor of Management in the College of Business Administration at the University of Missouri-St. Louis. He had a Ph.D. in I/O psychology from the University of Illinois-Chicago. Most of his work revolved around selection and hiring practices and compensation systems, with a focus on staffing/selection, compensation, and performance management, both in the domestic context and the international context. Michael published numerous peer-reviewed articles in the area of human resource management and edited several books, including the Handbook of Research in International Human Resource Management (Lawrence Erlbaum, 2007).
He served as a keynote speaker at the International Test Commission's conference in England, June, 2002, where he made a presentation entitled: "Patrolling the Information Highway: Creating and Maintaining a Safe, Legal, and Fair Environment for Test-takers."