Está en la página 1de 214

Essentials of Political Science
jan~esA&.Thurber, A&rnericanUniversity, Ecfitor

The Essentials of Pcllitical Science Series will present faculty and
students with co~lcisctexts designcrf as prirr~ersfor a given college
course, Many will be 200 pages or shorter. Each will cover core concepts
central to mastering the topic under scutly, I>rawing on their reaching as
well as research cxgericnccs, the authors present narrative and
analytical treatments designecf to fit well within the conf?-ines of a
crt~wtlecJcourse syl'iabrts.

Essentials c?fAmericun Gover12ment, I>avid AMcKay

Essentia

RESEARCH

A Menlber of the Perseus Books Group

All rights reserved. or any information electronic or mechanical. Essentials of politicaI research / AIan 19.A Member of the 13erseusBooks Group 13ublished in 2000 in the United Stares of Ainerira by Wesrview Press. 2. Political science-Research. Cumnor Hill. Copyri&t 82000 by Westview 13ress.westviewprerssorn L.1 1. Tide. Alan D. fjrinted in the United Scates of America. I. Series. Colorado 80301-2877. em . fjolirical science-Methodology 11. and in the United Kingdom by Wesrview Press. Monroe. 12 Hid's Copse Road.(Essentials of political science) Includes biograpl~icalreferences and index. ISBN 0-8 133-6866-V(pbk. Clxford OX2 9JJ Find us on the W<>rIdWide Web at ww. p. The paper used in this publication meets the requirements of the American National Standard for Permanence of Paper for Printed Library Materials 239.lkrary of C:ongress Caratoging-in-Publicatic~nData Monroe. inctudirzg phott~copy? sttlrage and retrieval systern. . 5SUIl Central Avenue. Boulder. No part of rhis publication may be reproduced or transmitted in any form or by any means. recording.48-1984. without permission in writi~lgfrom the putllisber.

Ill'elissa. and Mollie .For Paula.

This page intentionally left blank .

10 Exercises. Exercises. 32. 31 Types af Research Design.Contents List of Tables izzd Figures Preface 1 The Scienrific Study of Research Questiians What Does It Mean to Be Scientific?. 3 Reformulating Normative Questions as Empiricill. 44 1 . and Operational Definitions: An Overview. 23 2 Building Blocks of the Research Process Theories. 12 Suggested Answers to Exercises. 6 Research Qt~estions. 22 Operational Definitions. 29 3 Research Design The Concept of Causality.8 The Scietltific Research Process. 2 '7 Types of I-Iypotheses. 25 Exercises. 19 Theoretical Role. Hypotheses. 2 Distinguishing Empirical and Normative Questioils. 43 Suggested Answers to Exercises. 28 Suggested Answers to Exercises. 20 Units of Analysis.

98 Exercises. 92 Multivatriate Statistics. 1 13 Exercises. 54 Data 0x1 U. S9 lssues in Content Analysis.4 Published Data Sources The Xnternet as Data Source. SS Steps in Content Analysis. 1 OS) The Need far Standardization. 79 Levels of Measurement. 67 Interviewing. 78 Suggested Answers to Exercises. 73 Exercises. 106 Graphics for Muftivariate Relationships. 112 Principles for Good Graphics. SO Some Genera1 Data Sources. 11S Suggested Answers to Exercise A. 44 Exercises. 83 Uilivariate Statistics.S. 71 Writing Survey Items. S2 Demographic Da ta. l U7 How Not to Lie with Grapl-rics. 90 The Concept of Relationship. S4 Survey Data. Government and Po!itics. 48 The X~nyortanceof Units of Analysis. 65 5 Survey Research Sampling. 180 Suggested Answers to Exescises. 48 Strategies for Finding Data Sources. 116 . 52 Political and Governmental Data for Natioils. 5'7 Content Analysis. 64 Suggested Answers to Exercises. 102 7 Graphic Display af Data Graphics far Univariate Distributions.

130 Interpreting Contingency Tables Using Statistics. 2 20 Chi-Square. A Significance Test.177 Causal Interpretation. 1 4 l Pearson" r. 173 Significance Test for R" 176 Beta WeigI~ts. 1 33 Exercises. 15X Suggested Answers to Exercises. 190 References Inda . 153 10 MuXtivariate Statistics Coxztrolling with Corztingeliicy Tables. "11 Exercises. 147 Relationships Between Interval and Nominal Variables. 167 The Multiple Correlation. 1 59 What Can Happen When You Control. 178 Exercises. 160 Controlling with Ilntervali Variables: Partial Correiations. 136 9 Interval Statistics The Regression Line. I44 Nonlinear Relationships.8 Nominal and Ordinal Statistics Correlations for No~ninalVariables. 135 Suggested Answers to Exescises. 124 Additional Correlations for Nominal Variables. 1 1 7 Correlations for Ordinal Variables. 186 Suggested Answers to Exescises.

This page intentionally left blank .

3 7.11 .es and Figures Tabke.1 Stages in the research process 2. by ethnicity. 1996.1 Probability of chi-square 10. 1960-2 996 7.3 The correlatioilal design and examples 5. 1996 Popular vote for president. 1996 Reported voter turnout.2 7.1 C.c 5. 1996 7.1 The classic experiment and an exiil~~ple 3.1 Types of hypotheses and exaxnples 3.1 '7.1 faxnple size and accuracy Popular vote for presidetit.1 ProbahiIity of F for partial and multiple correlations (0.5 proba lsitiry Ievelj Figure5 1 . faxnple size and accuracy 1 Common bivsriate statistics 8.5 Turnout of voting-age population in presidetitial elections. Reported voter turnout.2 The quasi-experimental design and an example 3.. by ethnicity and education.

I 10.1 10.68 A U S . per pupil spending cm education. per pupil speriding on education. 19%-1996 9.7. by ethnic status.7 Percentage af persons below poverty level. 1990-1 996-correctly presented 7.6B U.S. 1990-1 996-incorrectly presented '7.8 Percentage of persolis below poverty level. 1996 7.2 Example of a curvilinear relationship Causal rnadefs for three variables and tests An example of a causai model: 1972 presidential election .

Any errors that =main. hr~wever. I have attexnpted to concentrate on wllat seem to be the most ixnporrailt pr>intsnecessary to understanding the research process. X have drawn on over twenty-five years of teaching this subject matter to students of Political Science at Hinois State University. Given the broad scope of this book and its relatively brief length. Drafts of the manuscript have been used as a text for several semesters. I have attempted to cover those points in sufticie~~t depth tl-rat the reader will be able to understand them. The inteiltioil in this book is to concentrate on the essentza:als. and my students have been helpful in correcting and refining the text. Therefore. it has been necessary to dispense with some technical details that a longer and more advanced text inight include. it would be appropriate for similar disciplines. are my respmsibility . At the same time. In wrltir~gthis book.Preface This book is intended as a comprehensive text for an introductory course in research methods for the sr>cial sciences* While written with students of Political Science in mind.

This page intentionally left blank .

When they ga beyond textbooks and the classroom. philosopl-ry. Second. First of all. students are often asked to do some research on their own-tl-re dreaded term paper. students in any subject spend most of their class time and study tirne Learning about the results of past research. to know about research methi>dology-that is. The need to understand and to be able to use research metl-rods continues beyond one" formal education. particularly those into which students from political. science and related disciplines go.The Scientific Studv of Research Questions The reason we have accumulated knowledge of any subjectwhether pl-rysics. they may have to ~udgewhether a piece of research is valid and whether its results ought to be believed. They can better understand what those findings mean if they have sorrte familiarity with the rnethods used to obtain thern. Brtt why is it important for people who are nut professionals in those fields. and those contracting for their services . their papers will be more meaniwful and rewarding if they can actually conduct original investiga tions. In advanced courses-and certainly in graduate school-this is a x~ecessity. particularly students. Consultants often use such methods. Although they may be able to get by with just su~rtmarizirtgwhat others have said. employees are asked to rnake decisions about the value of research methods and findings. In all sorts of occupations. or political science-is that others have undertaken systematic investigations of particuiar topics and reported the results. how research is done? There are several answers to this question.

as citizens wl-ro rnay be asked to vote on a tax referendum for a project recommended by a consultant" rreearch findings. or the real world: that which exists and can he known through the experiences of our senses-what cart be seen. the rnethods are comxnon to all social sciences. in local government.ilit~~-a finding cannot be accepted unless it can be replicated by others. heard. For that reason. the meaning of whick is discussed below. The first key part here is e~npirical. tbat is.The tenn refers to the facts. such as a swvey of potential clients. Moreover. Under this requirement-which is known by its technical term. The testing must be ol2jective. its results must not be dependent on any particular researcher's biases. intersuhective &s~"al. Much of wllat we might believe about things is not empirical. Understanding research methods is useful to all of us beyond tile workplace as well-ffjr example. Jt deals only with scielztific research. and in citizen organizations have a particttfar need to know something about research methods. What Does ]It Mean to Be Scientific? There are many definitions of science. it reflects our judgments about what should be. econt-jmics.should be able to evaluate their reports and findings. . Those who become active in politics. political science journals are increasingly requiring that authors of articles reporting empirical researcl1 make their data available for analysis by otl-rers. the next section of this chapter explains how to identify them. Perhaps the simplest one would be an attempt to i d e ~ z b band test erapirictlf gerzemlirntions. The purpose of the methods and techniques of scie~lceis to test empirical statements. and psychology. people may have to conduct some sort of research project on their own. Similarly. This book is an introduction to the process of research. but rather nornative-that is. Altl-rough the book is designed for students of politics and therefore uses examples f%omthat field and gives more attention to the techrliqrres that political scientists use most frequently. touched. A vitally import'dnt point to understand is that scientific methods cannot deal directly with nonempirical questions. and smelted. including sociology. it: is always important tbat scientific research reports carefully explain how data were coltected and analyzed.

The Sciefztific Stzady nf Research Qzaestions 3 The other key part of science is genemlzzation. Therefore. and scientific explanation requires generalizations. Some presidents runlling in good economic times are defeated. But when we collect that information on a large number of people from many places and across time. Tf we have generalizations about many phenomena.not just individual cases. are of little value by themselves. though the observation must he of individuals. the i n c u ~ ~ b e n t president is usually reelected.h not the only one. thoug. assuming that the econrlmic data were availahfe befcjrehand. Therefore. Jf there is a high rate of economic growth. was reelected in 1996. Scientists seek to rnake statements about entire classes of ab~ects. Alrlzough generalisations may not state this probabilistic quality explicitly. President Glintt~n.the incumbent. Note that the same reason could also be a basis for a prediclion of who would win the election. we can put them togetl~ef. The generalizations made in the social sciences are almost never absolute. The main purpose of science is to explain and predict. Gonsicter this simple logical syllogisxn: 1. There was a high rate of growth in 1996. (Generalization) 2 . This argurnerit is an explanation. we can make a generalization that people with rnore education are more likely to vote than people with less education. it is alrnr>st always implied. Distinguishing Empirical and Normative Quesrcions As noted earlier. (Observation) 3.Jmes has an advanced degree and always votes. it is important to be able to dis- . Some people with high leveis education do riot vote.into theories.ments. and some with little schooling vote regularly. science can answer only empirical questions or test empirical statc. The point is chat we must have generalizations to explain what has happened and to predict what will happen-and indeed. whereas MS. The f ~ t that s Mr. Smith has only a grade school education and does not vote. The election e x m p l e illustrates another important point. to understand h<-~w the world works. a term defined in the next chapter. for the election outcome.

Mathematics. Even if we find a normative proposition with which virtually everyone agrees ("Murder is bad"). is an example of purely analytical reasoning familiar to most people. desirable or undesirable. Exercise A at the end of the chapter presents some additional examples for readers to test their understanding. Whether they are simple descriptive statements ("Bill Clinton was reelected in 1996") or deal with complex relationships ("Controlling for presidential popularity. sometimes deal with analytical questions as a way of investigating the way things would be if abstract theories were true. particularly when one is selecting a topic for scientific research. these normative questions are fundamentally different because they cannot be answered objectively. Empirical statements refer to what is or is not true and can be confirmed or disproved by sense experience.4 The Scientific Strcciy o f Research Qzcestions tinguish empirical statements from other kinds.l presents some examples and comments on the rationale for their classification. Box 1. Social scientists. They deal with value judgments. . beautiful o r ugly. it still is normative and not empirically testable. This activity can help to develop empirical propositions whose testing would shed some light on the applicability of theories. the greater the increase in average real income. The answers to normative questions depend on the value judgments of the individual who answers them. or future ("Will the Democrats win the next election?"). present. Analytical statements refer to propositions whose validity is completely dependent on a set of assumptions or definitions rather than on empirical observation. they are empirical if objective analysis of data from sensory observation could potentially prove o r disprove them. It does not matter whether they are posed as questions o r as statements or if they deal with the past. questions of what is good or bad. There is one other classification of questions and statements: analytical. the higher the proportion of votes received by the incumbent partyn). particularly economists. Political scientists have often looked at different methods of casting and counting votes t o see what the consequences would be under these arrangements. Normative questions are different. that is. Examples could include: "Was Bill Clinton a good president?" "Should taxes be increased?" "Is democracy the best form of government?" According to the philosophy of science. including classical geometry with its proofs from postulates.

"iMost African Americans vote Republican. tllis is a false empirical statement. Normaaive. S." jn'czrmative) Afthough the extent of corruption under a nonpartisan system rnight be an empirical question. as the questiolz really asks which policy goal is more desirable.BOX. 2.1 Empirical. hut that is not the point. (It actually has l-rappened several times. 4. and it can be empiricaliy tested by surveys.'" Empirical j As it l-rappens. it is used as a value ~udgmenthere. but it is still empirical and could tested by observatioil. and Analytical Sentences 1. "is it more impc~rtantto adopt policies that will protect the environment or policies that will maximize economic growth?" "ormative) Although the word "ixnportant" is not necessarily normative. ""Sxty-two percent of the Arnerican people think the president is doing a good job." "c~rmative) The Supreme Court has in fact taken this position. . 1.) 6. because then there would be Iess cc~rruptic>il.S. "is it possible for a candidate to be elected president by the electoral college without havi~zgthe ggreatest n u r ~ b e rof popular votes?" "nalyticalf This question asks wl~etl-rerit is possible. Constitution. "It is better to have nonpartisan elections for local government. so it can he answered simply by looking at the way the electoral system is set up and constructing a hypothetical scenario about how it could l-rappen. 3. but it is still a normative judgment." ((Empirical)Although the evaiuaticrrt is obviously normative. ""Abortion is a fundamental right guaranteed by the U. the judgment that llonpartisailship is therefore better is normative. the statement is an empirical one about what value judgments people make.

is to change the frarne of reference. especiatly in politics. Tlze other method of refc~rmularing normative into empirical questions is to ask empirical questions about the assumptions bel-rind narrna tive ~udgxnents. Although chm~girrgthe frame of reference may be quite useful ftrr svrne topics. "A democratic political system is one in which government tends to respond to the wishes of tlze citizens. This can be done by taking the normative qtlestions that motivate our interest and reformulating:them as empirica! questions in one of two ways. are less likely But are to start wars." "naiytical) This is simply a definition and dues not require any empirical observation to test it. We have already seen an example of this in Box 1. the question of whether the public thinks his performance is good is an empifical one. Most normative judgments are based in part on beliefs about what is empirically true. far others tlze results produced would be trivial. many people believe that democracy is a betcer form of government than dictatorship because they believe that democracies are more stable.1. Afrer all. For instance. the political process is largely concerned with questions about wllat ought to be. tlzough often not the most valuable.7. . this was the basis of much of the objection to the scientific orientation that became dominant in political science in the 1950s and 1960s. The first m e t h d . such as presidential approval ratings. This means moving from a normative judgment to a question about the nomative ~udgme~its some persol1 or persms make. one might well abject that this excludes many of the moft interesting and important topics. Althougfi the question of wlzetizer the president is doing a good job or not is a normative one. In fact scientific research can deal with normative phenomena. Such refor1nu1ations can be made with any set of individualsthe public. political scientists. which is the easiest. and produce greater econoi~icdevelop~~ent. Indeed. or Left-handed civil servants. but it can do so only indirectly as it seeks to answer empirical questitms. Reformulating Normative Questions as Empirical On learning that scientific study does not attempt to answer normative questions.

Box 1. and Exercise B at the end of the chapter offers more. ( Assumptions) these asslullptio~lscorrect? Scientific investigatiorz trtay be able to test them. (Frame) Spending limits tend ta increase the reelection rate for incumbents. The assumptions method can be valuable in formulating interesting and important research questions. (Assumptions) 4. Strict limits on campaign spending far congressional elections should be adopted. (Normative) Democrats favor spending limits more than do Republicatls. troops generally l-ras not resulted in long-term prevention of disorder in the past. most reco~rtmendationsfor public policy changes are based on. thereby creating lobs and ultimately increasing tax revenue.The Sciefztific Stzady nf Research Qzaestions 7 BOX 1. Similarly.S. Whether or not these effects would occur is an empirical question that economists attempt to answer. Advocates of a tax decrease may argkle that it will stimulate the economy.2 Keformda tiag Normative Sentences as Empirical by the Frame of Reference and En?pirical Assumptions Meehads I. Should term limits he adopted far Gongresd (Normative) Do mtlst political scientists favor term limits? (Frame)VCiould term limits increase the influence of interest groups on congressional decisionmaking? (Assumptions) 2. but its limitations must be .2 presents some examples of refc3rmulation rrsirlg both methods. (Normative) Nations in the European Union favor the U. The United States should csntinue to send troops to the third world to attempt to restore order.S. Wc3ttld it be a gos~didea to legalize drugs? (Normative) Do most Arxtcricans favor legalization of drugs? (Frame) Would legalization of drugs decrease the occurrence of other crimes? (Assumptions) How tnuch would legalization of: drugs increase the frequency of addiction? (Assumptions) 3. (Frame) The support of peacekeepiw activities with U. sending of troops in trtost cases. assumptions about wllat the effects of tl-rose decisioils wilt he.

ernpiricai research can never actually answer a normative question. only to have instructors criticize the resulting papers for lack of focus? A thoughtfully chosen and clearly establisi-red researcl-r question can avoid this problem in both scientific and ntjnscientific inq~iiq. even better. the formtrlatio~~ of a research question (also called a research problem) is of paramount importance. A inore useful version would be "Is voter ttlmout reduced by political aiieilation?'\or. from siinply search question. peaceful. election laws. like any other serious intellect~~af investigation. or prosperous. It is not only prokssional scientists who must articulate a research question. Athctugh empirical reformulation may lead to research that will aid normative decisionmaking. but several criteria shoufd be kept in mind in choosing a topic and Eormulating a scientific re. and general enough that it suggests what a possible answer would be. and persons with particular economic i n t e ~ s t smay favor or oppose a tax cut regardless of its overal! effect t m the economy Research Questions Scientific research. Since this starting point will determine the design and conduct of the inquiry. But what are the elements of a desirable research questioll? This is ctiffic~1i.r:to answer in the abstract. The first criterion is c l ~ r i t y Aside being comprehensible in t l ~ eusual sense. but also beginners. begins with a question that the research is intended to answer. To restrict the e h v e examples to particular cities or elec- . or any number of other possible factors. For instance. Mow often do stuclents start with term paper topicsbut not research questions-and assemble stacks of information and write extetlsive summaries. "Does foreign investment result in long-term increases in the standard of living?'" Although research questions require specificiry for clarity. To use the previous exatrtples. "Does the use of election day voter registration increase turnout?'Yimilarly3 a question such as ""Wow can poverty in less-developed nations be rernedied?'+ould be improved by asking. this means that a question must be specific enough to give direction to the research. a believer in democracy trtight favor that fonn of governmetlr even if it were nor more stable. limiting their scope in time or place is neither necessary nor generalily desirable. the question "Wl-ry is voter turnout low in the United States?" "ves no direction as to whether we should look at citizen attitudes.kept in mind.

Although a given research project may weil be confined to a single time or place as a practical maccer. A second consideration is whether the necessary investigation can be devised and carried out with the resources available. Answering the question should potentially increase our general knowledge and understanding of the topic. Wc~rkingfrom existing theories or past research does not mean that the irlvestigator necessarity believes tkexn to be correct. at least. not a normative question. Answering the research question should be useful in some real-life application. Indeed. researchers in orher fields may have developed theories that can be applied. Evaluating a potential research question therefore requires finding out what past research findings exist or. First of all. its significance would greater than if the question came only from the researcher" iimagination. The research question must be one that can be potentially answered by empirical inquiry. But whether the research proves tlze past suppositions to be right or wrong. and it is an absolute requirement. wl-rich is a costly enterprise beyond the budget of even professional political scientists. Researching questiorls about attitudes of voters in presidential elections may require condt~ctingnational surveys. This is particularly true for questions dealing with causes of social yroblerns and their possible solutions ('"E-iave time limits on eligibility for welfare payments increased employment rates among past recipi- . A similar criterion is practiat relevance.The Sciefztific Stzady nf Research Qzaestions 9 tions in the case of voter turnom. tl-re suspicion that existing explanations are fundamentally inaccurate or no longer applicable in a changing world is often a major motivation h r research. Brit those who lack this abilith including undergraduate students. Anotlzer criterion is theoreticill siglzifiunce. may still pursue such questions by rnaking use of surveys conducted by others or by conducting surveys of limited popuiations. two methods for reformulating a normative question as an empirical one have already been presented. The second criterion is testiabilifiu. it must be an empirical question. or a single n a t i o ~in~ the case ~f economic development. what others have geilerally ass~lmedto be true. it is the more general question that science seeks to answer. Although political scientists map not have corzducted much theorizing on a given subject. would reduce the theoretical significance and practical relevance of the findil-rgs (these two criteria are discussed beiowf. because it represents building on previous research.

or conditions under which members of ethnic minorities with limited education become activists-that wt>rrld be more promising. Howewr. . Then we prepare a research design that could test those hypotheses (Chapter 3). 1 presents an oudine of the entire research process. and prefcrahly both. and ways in which they might be strengthened. Exercise C at the end of the chapter does the same. there may well be refatc. Box 1. it should have sorr. a survey of past research and tlzeorizing on a topic. The Scientific Research Process Figure 2 .. If neither is true. each stage of which will be covered in this book.3 presents several exarllples of passible research questions. the generalization that people with mare education have a higher voter turnout rate than people with Iess education is so well estahlished-in the United States and in the world in generai-that pursuing it as a research topic would not be a wise use of resources. then why pursue that topic? A final criterion is orzgiinulity. their strengths and weaknesses. Finally. For example. hypotheses are developed (Chapter 2). As discussed earlier. Athough there is a commtrn tendency to think of theoretical significance and practical relevance as opposing qualities.le degree of oriXinality. Thus there are five criteria to keep in mind in selecting a question for scientific research.ents?'". This does not mean that a research question must he completely new.d questions-such as why contemporary college students have low rates of political participation. From there. Xr slzouid have some degree of either theoretical skrtificitnce or pmclical r e k uance. but it does meall that the answer should riot be so weif established that there is fittie reason to expect a different outcome. The point is that there should be some poteritial value in answering a research question-eitlzer it should increase our general knowledge of tlze world. It shouId be clear and reasonably specific. Then one or more =search questions that meet the five criteria can be formulated. even for an undergraduate student. keeping in inincl what was already known. we rnust always start with. and it must be a q~zestion that can be investigated given available resources. Xt must be empirical to be -&file. or it should help in accomplishing sc~methingsomeone wants to do. the strongest research questions have some of both.

Additionally. and therefore would be low in significance. we collect the necessary data (Ct~apters4 and 5 ) . Finally. 9. Questioil: ""Slould the United States give military aid to Bolivia next year?'This question is obviously normative and therefore nut testable. However. ff it were improved in spccificity-for example. ""Has congessional voting been rnore along party lines since 154943'"then it would be much clearer and reaclily testable. Next. since the extent of party regularity in legislatures is a variable that politicai scientists have long studied. 8. it would have some degree of significance. Question: "How has Axnerican politics changed since the 1994 elections?'" This question is extremely vague. it is completely lacking in any theoretical significance or practical relevance.S. and it would have practical relevance for those who seek to influence public policy 2 . representatives?" This question is clear. . swtistical artalysis usually is needed to evaluate it (Chapters Q. It could be transformed by using the a s s u ~ ~ p t i o nmethod s and further strengtlzened by posing it rnore generally. easily testable. Improved: "Does receiving military aid cause less-developed nations to increase or decrease their spending on health and education"" 3. These findings then add to the body of existing knowledge and may lead us or others to raise new research questions. and probably original. senators tend to have higher levels of education than the spouses of U. Since empirical researchers in the social sciences typically collect large amourlts of infrirmation. it deals with only a single case.3 Evaluating and Improving Research Questions I. Moreover.The Sciefztific Stzady nf Research Qzaestions ff BOX 1.S. and so it does not meet the criterion of clarity. and 10). Question: "'Do the spouses of U. we draw our concltlsions and present them in a research report (infornation on presexitir-rgfindings graphicall y appears in Chapter 7).

If a fareign palicy decision would increase U. Ptltting courtrooEE trials o n television distorts the ~udicial pracess and defeats justice. 1s affirmative action an unconstitutional form of reverse discrimination? . Identify each of the following as empirical. normative. only oudaws wili have guns. Why do commui~istand socialist nations have lower irrcsmes than capitalist nations? 4. It is strongly suggested that the reader attempt to complete the exercises before iookiag at the answers. B.1. The current practice of campaign fund-raising is corrupting the character of American democracy. or analyticrtl. as the problerns could be answered well in a number of ways. then that's what should be done. 1. exports.FIGURE-. Allowing people to carry concealed weapons lowers the crime rate. Note that on Exercises B and C the answers provided are only suggestions.f. 8. If guns are outlawed.1 Stages in thc rcscarcf~process Formullate research questions -1 Formulare hypotheses -t Research design -1 llata collection 4 Data analysis -t Draw conctuslons Exercises Suggested answers to these exercises appear at the end. 5. 2. 7. 3. PeopIe who think that potiticiarrs are dishonest are less likely to vote than those who trust government.

Are votersVecl. Who shot President Kennedy? 3. 5. Should the United States increase the axnount of foreign aid it gives to poor natioils! 2. How democratic is the U S . and originality ff there are serious weaknesses. Evaluate each on the critetria of clarity. Political parties have fulfilled a majority of their platform promises over the years. Is political instability related to political change? Each of the following sentences is normative. Would we be better off i f Congress and the presidency were controlled by the same political party! 3. practical relevance. 1.The Sciefztific Stzady of Research Qzaestions f . spending for schools should be increased. Do appointed judges make fairer decisions than elected judges do? 4.S. I . suggest an improved version. Negative ca~llpaigltadvertising is what's wrong with elections today. Normative 2. Normative . testability. theoretical significance.? 9.I'seC Following are some pote~ltialresearch questions. Do we need a new political parry in this country to represent middle-of-the-road views Z Exer~. Since poor education is the biggest problem facing the nation. House had the poorest attendance record oil rotf calf votir-zg in the last session? 5. 4. Reformulate them using the empirical assumptions method. 10. Which member of the U.isions in recelit presidential elections influenced more by their attitudes on abortim or by their perceptions of the economic situation? Suggested Answers to Exercises I . political sysrern? 2.

government in agreement with the preferences of a inajority of the people?'" 2. as it deals with only a single event. Analytical 1. Xn addition. Do students in scl-rool. Would a new political party with an ideologically centrist pc-~sitionon most issues receive more than 20 percent of the votes? 1. 'Wow much of the time are the policy decisions of the U. and its mswer could conceivably have some practical relevance. Analytical C. The problem here is a lack af clarity. Empirical 4..S. Empirical 10. Normative 9. and it is definitively unoriginal. Normative '7. The yuescion is clear and specific. Xf made more specific. But it is not Likely to be testable. Improved: ""Dopolitical assassinations in modern Jexrlocracies lead to changes in the governing political party Z" 3.3. Is tl-re amount of U. the question certainly could have considerable theoretical sigt~ificanceandlor practical relevance. econornic aid received by a nation reIated to subsequent grawtk in per capita income? 2 . it lacks theoretical significance. Xf some empirical measure were . for example. Was the hequency of negative advertising greater in the 1990s than in the 198QsZ S. Ernpirical S. The problem here is that fairness is a normative concept.S. as tl-re term democracy is used in rnany ways and each has many aspects. districts that spend mare on public education have higher test scores after the average education and income of parents in those districts are taken into account? 4. Are federal budget deficits greater in years of unified party control than in years of divided control? 3. so the question nut testable. Empirical 8.

then the q~zestioncould be testable. the question is still of interest. as the answer is not completely clear and it r~eedsto be reinvestigated fc~reach new election. sigz~ificant. This is a clear question that could easily be tested. Therefore. Although it is not coxnpletely original. but it lacks any theoretical significance and has little practical relevance. . and relevant. nu improvement is needed. Improved: 'Wms a representative's attendance record affect his or her chances of reelectic~n?" S. This is a reaso~lablyclear and testable question that has considerable theoretical significance for our knowledge of voting behavior and same practical relevance for contemporary politics. for example. ""Are elected judges mare likely than appointed judges to render verdicts favoring the del'enbant in crirninail cases?" 4.The Sciefztific Stzady nf Research Qzaestions f5 subsrituted.

This page intentionally left blank .

Science starts and ends with theories. The concepts discussed in tl-ris chapter constit~~te tlze very heart of social science research. and familiarity with them is not only helpful in understanding how others conduct research but also viral to being able to do it yourself.\! is used in wide variety of ways. AIthough tlzese concepts might seem very abstract at first. Box 2. ect~nomicdevelopments. Theories. such as voting decisions.1 contains a diagram of these levels with two examples. by the end of the chapter you shouid be able to apply some of them to specific examples yourself. A theory consists of very general statements about hr>w some phenomenon. the term theor. and Operational Definitions: An Overview One of tlze difticulties in simply describing these building blocks of researcfi is that science operates at several levels. it could be defined as a set of empirinll gcmemEixatiuns abuzgt a t q i c .ding Blocks o f the Research Process This chapter presents a number of different concepts involved in the research process. mcurs. or outbreaks of war. The goal here is not to teach terminology but to help you keep these ideas straight as you work with them. Although. Hypotheses. But tlzearies are too general to test directly because they make statemetlts about the relatioilship between abstract concepts-sttch as economic development and political alienation- .

This is done by testing h3~12otheses. the higher the percentage of the population of voting age that participated in the most recent national election.R~ildlfzgR Locks C>( &:re Research Prc~cess f8 BOX 2.Variable 2. OPERATIONAL: The higher a survey respondent" answer when he or she is asked. ""Did you vote in the election fast November?'" that are co~nptexand not directly observable. OPERATIONAL: Operational Definition l is related tct Operational Definition 2. THEORY: Econo~nicdeveioprnerit is related to political development. To actually investigate the empirical apglicabitity of a theory.A hypothesis is simply an empirical statemertt derived from a theory. HYPOTHESES: The mare industriafized a nation. according the StatkrstnanUearbook. HYPOTHESES: Variable 1 is related tc-." the more likely that person wili ailswer "Yes" when asked. EXAMPLE 1. The logic linking the two . according to the U~2tel-lNations Yearbook. "Wfiat is your household" ailnual income. the rnore likely he or she is to vote. the greater tl-re level al mass political participation. HYPOTHESES: The higher a person" incorne. it inust be brought down to more specific terrns.2 An Overview of the Levels o f Research LEVEL THEORY: Concept 1 is related to Concept 2. OPERATIONAL: The higher the percentage of the labor force engaged in manufacturing. EXAMPLE 2: THEORY: S~cioecoliomicstatus affects political particiption.

Most scieritific hypotheses are mrritivariate as well as directional. The stages illustrated in Box 2. the opposite ~ c c u r s . if the hypothesis is confirmed by empirical observation.1 illttstrate. However. as one variable rises. Hypotheses are statements about v~rzables. la set of directions as t o how the variabkr is to be observed and measzdred." h ~nrzorni~ni relationships.1. If the hypothesis makes a statement about only one property or variable. ""Tbe wealthier a nation. the other tends to rise." h1 negative or inverse relationships. Hypotheses are those answers to our research questions that seem to be the most pramising on the basis of theory and past research. Each variable in a hypothesis must have an operatio~laldlz(init-lo~. but one or both of the variabtes are .1 show hr>w we move from very gerieral theoretical propositions down to specific instructions about how to measure variables. In a positive or direct relationship between two variables. the greater one% income. the lower its Level of illiteracy. Moreover. for exaxnple. which were discussed in the yrevic>uschapter.A variable is an empirica/ proper9 that ca.z take on two or more differerzt vai~es. Constructing operatioilal deflnitiolls is a vital part of the research process and is discussed later in this chapter. h rtzuitirrarilate hypothesz's rna kes a statement about l-row two or rnore variables are related. they specify not just that the variables are related to one another but also what the direction of the relationship is.that is. that is. ""The rnore education one has. hypatlleses are much more specific than theoretical statements. as shown in Figure 2. But even variables are not specific enough lor observatitrn.is that if a genera1 theory is correct. then the more specific hypothesis derived from it ought to be true. whether by looking on a particular column in a reference book or asking a specific question in a survey Types of Hypotheses Hypotheses rnake staternents about variables. then it is referred to as a lilrszvarkte hypothesis. These statements can take a variery of fcrrms. for example. as one variable rises.As the examples in Box 2. Hypotheses are also related to our research questions. the hypothesis does predict the direction. tile other tends to fall.that is. we must question the validity of the theory Gorn which it was derived. then our confidence in the general theory is inrreased. if a hypothesis is not confirmed.

the tower the turnout. Iadepe~zdentvauicables are those presumed in the theory underlying the hypothesis to be the caz. "causes." Theoretical RaXe Trr mtlst mltivariate hypotheses. An example of such a oofrtinal reiatic~nshipwould be ""Catholics are rnore likely t l ~ a n13ratestants to vote Republican. 1 Directional l No~dzrecrional Agc is retated to tQ turnout.lse and dependent variablles are the effects or consequences. such that they can. but here an introduction to the concept is needed. l Nonzi~zaE Catholics have higher turnout than Protestants. if we hypothesize a relationship between a person" gender and his or her a~itudes. The statement may include explicit language to that effect-for example. the higher the turnout. 2. Although this distil~ctionis sometimes diffmft to make. the substantive nature of the variables permits only one direction." ""ads to.cive The higher one's sinco~ne.it is cr~nceivableonly that gender is the inde(Which peiident variable and attitude is the d e p e n d e ~ ~ variable. each variable takes on a particu- lar theoretical role. For instance." or "resutrs in.tlot be described in quantitative terms. the presumed causal relaionship between the variables is specified. I3osz'.R~ildlfzgR locks C>( &:re Research Prc~cess 20 FICiliRi-." h other instances. t .1 Types of hyporl~esesand exampies Hypo $heses U~sivariate Xblultiva~iatc Turnout was 49% Nonassociatiorzal Getlcter is n ~ refaced r to turnout. l Negative Thc rnorc alienated. in trtost hypotheses it is apparent. Causality is discussed in greater detail in Chapter 3.

Control variables can go a Iollg way toward clarifying relationships between variables. Similarly. 7%)take a well-know example. In contrast. reiigion. As a practicai matter. then the dexnographics would probably be the independent variables. Gender and race. if we hypothesize a relatit3nship betweell dernographic attributes (econoxnic development. but it is n ~ passible t for your thoughts to affect your gender. Tlze control variable takes on a third theoretical role. Otle might readily condude that race is somehow the cause of h w e r turnout and advance explarxations based cm racial discrixnination in voter registration or cultural Qifkrences in politicai attitudes. if: we compare Afi. Yet. or cities) on the one hand. are subject to alteration with the passage of time. to conclude that one caused the other. if we statistically control for other characteristics such as education.gerider you are might intluencr: your thoughts. policies they adopt) on the other.ican Americans and whites who have the same Xevel of education and inco~neand live . ~zsuaflywith the terms cauttrolling Jar or holding constant.. It can be al! too easy. most social characteristics of individuals. such as education. are determined before birth. But we must always he alert to the possibility that other hctors rr. economic status. the intent is to ensLtre that their effects are excluded-that is. for example. the difference largely or even entirely disappears. as a number of studies have shown.g. are usually determined early in life. and region of' residence. urbanization. nations. When control variables are used. Hence we usually presume that the stjciai factors are independent variables and tlze behaviors are dependent variables.. to ensure that it is not these variables that are in fact responsible for the variations observed in the depellderrt variable. when we find that two variables are related and we look no further. aspects of political behavior. and their behaviors (e. and region of residence. and the like) of geographic or political units (e. states. In other words.g. African Americans l-rave lower rates of voter turnout than do whites.lay be involved. Ifitimateiy the decision as to which are the independent and whii-h the dependent variables is based on our theoretical tznderstanding of the phenomena in question. Control variahies in a hypotbesis are aiways expiicitly fabeled as such. Control variables are additiurral! vwkbles that mkhr affect the relationship between the independ~ntand dependent variables.) Often the nature of the relationship lies in the timing between variables. such as voting decisions and opinions.

If we say that people with one characteristic also tend tc-. depending on which unit of analysis is used. The student collected data on counties in the Southern states for a xlumber of variables and coxnputed correlations for all the variables. 1980. hut of what arc they properties? The answer is the unit c>f analysis in the hypothesis. the relationship between income and turnom may he very different.have another characteristic. then natioils are the unit of analysis. certainly. Indeed. One of his findings was a strong positive relationship between the proportion of a co~zntg'spoyuiation that was African Arrterican and . Units o f Analysis As mentioned earlier. there may be a choice.2 also identifies the anit ofn~czlysisimplied in the hypothesis. In the example just given. then the unit is the individual person. The choice of which unit to use in testing a hypothesis is extrexnely important. for both individuals and groups have both incomes and voting. thc~ughin the case of groups it would he totals OF averages. Exercise A provides additional examples. If the hypothesis is simply that "income is related to voter turnout. Box 2. such as the populations of states or cities. Jn rnany hypotl-reses the unit of analysis is explicit. variables are empirical. 90-91 j. Note that although most multivariate hypotheses l-rave only one independent and one dependent variable.2 presents several examples of hypothaes.22 R~ildlfzgR Locks C>( &:re Research Prc~cess in the same part of the courltry3each is as iikely as the other to vote (WoIfinger and Rosenstone. One of the major pitfaits that can occur if the wrong choice of unit of analysis is made is committing the ecologictzt fallacy: ermrrecr~sEydrawing conchsiorrs about irrdividu~lsfrom J ~ t ao t z grozfps. that is. the olrjects that the hypothesis describes. it is possible to have more than one of each. a collceyt discussed in the next section. properties. identifying the variables and their roles. any investigation of turnout should control for tbexn. or it could be groups of people. "fhis error is well illustrated in a paper subxnitted by a student in a poiitical scieizce class at Illinois State tlniversity. This would lead us to conclude that the main reasons for racial disparity in voter turnout are these de~rtograghicfactors." the unit of analysis could he individuais. Bttx 2. Sometimes the unit of analysis in a hypothesis is not so obvious. Tf the hypothesis says that some types of nations are higher in some factor than otlzers.

states 4. Wirh age held constant. Independeslt variable: Tiipecof economic system Deperidexit variable: Military sgeriding Control varia blie: GNP Unit vf analysis: Nations S.f. the greater the proportion of votes received by the party of the president. Identifying Independent. Independent variable: Negativity of campaign advertising Dependent variable: Turnort~rate Unit of analysis: U. senatorial campaign.S.BOX 2. communist nations spend inore tltan capitaiist nations for the military. and Control Variables and the Unit of Analysis 1. such as states or counties 2. the lower the voter turnout rate. With GNP hefd ci>nstant. Independent variable: Urbanization Dependent variable: Grime rates Unit of analysis: Geographic areas.2 Examples of Hypotheses. Independent variable: State of the economy Dependent variable: Proportion of votes for incumbent party Unit of analysis: Elections corzti~ilues . The more negative the advertising in a U. edttcation and political particip"rion are po&tivety =lated. Dependent. Urban areas have lower crime rates than rural areas. 'The better the stare of the economy. Independen t variable: Education Dependent variable: Political parcicipatian Cotltrol variable: Age Unit of analysis: Individuals 3.

such as populations of geographic areas. This conclusion also contradicted the surveys of the time. in which almost no minorities reported voting for \Vallace. Independent variable: Religion. the American Illdependent Party candidate. in which we construct operational definitiom. Committing the ecological fallacy trtay s f t m be texnpting. Controlling for political party. The studexlt" data and statistics were correct. draw collclusions only about states. hut this tells us wtfiing about how African Americans voted. This example & m i d serve t a rernind us of: the ixnportance of using the appropriate unit of analysis f>r testing hypotheses and drawing conclusit>ns. Xf the data coxlcern states. others have found that areas in the South with higher nonwlaite papulations voted more for Wallace. The student conclrtded that it was African Americans who voted for wailace-an axnaaing finding since wallace was a well-known segregationist who opposed civil rights legislation. This strange outcome was a result of the ecological fallacy. The decision about the appropriate unit af analysis becomes crucial at the next step of the research process. a legisiator" vvotes on abortion are related to his or her religion and educatioil.24 R~ildlfzgR Locks C>( &:re Research Prc~cess 6. which usually must come from surveys. Tt may be that 30 percent of a county was African Axnerican and that 30 percent of the vote went to a particular candidate. because data on groups. . indeed. His error l a y in drawing cunclusions about which individuals cast which votes. Education Dependent variable: Votes on abortion Control variable: Po!itical Party Unit of analysis: Legislators the proportion of the vote in the 1968 presidential election that was received by Ceorge Wallace. are much easier to obtain from published sources than data on individuals. Tlae best way to avaid the problem is t a draw conclusions only about the units of analysis for which the data were actually collected.

rial here is critical to geteing started. and the researcl1 question may have to be modified or even abandoned entirely. then those g r o w data would he appropriate. if our units are population groups. color of skin and eyes. It must specify prer w a ~and t whew (or h o d we wit/ get it. if the unit of analysis were a state. Xf a variable cannot he operationally defined. so it is necessary first to determine wl-rat the appropriate unit is for the hypothesis. Operational defirtitions are a cruciai part of the research process. Asian American. so that we wiif eventually be able to compare the frequency with which individuals who have one characteristic also have ailc~ther. what we want is to know which ethnic group each person identifies with. such as census figures and voting totals for cities and states. the hypothesis cannot he tested. people for whom data are available on each of our variables." Or. etc.. Often the unit of analysis will he individuals. Operational definitions have alrrtost nothing in common with the definitions one finds in a dictionary. then what we would want for race woutd be the . Whereas a dictionary might say that "race" refers to ""anyof the major biological divisiolls of mankind." an operational definition could be '%ask survey respondents whether they csnsider themselves to be African American.Operational Definitions Testing hypotheses requires p ~ c i s eoperational definitions specifying just how each variable will he measured. the unit of analysis will often determi~lehow a variable is operationalizcd. or aggregates. constructing an s p erattional definition has two requirements. hut the 1natc. it cannot be measured. A fundamental principle to be remembered is cllac all variLzbles in a hypothesirs must be operatiorsnlked firr the same zhnit of nnaIysis. In the excisely ~ h a we ample of race for individuals used above. On the other band. Data on population groups. census of 1990. will not suffice. If the same hypothesis concerned states. or other. particularly Chapters 4 and 5.S.d. distinguished by color of texture and hair. according to the U." As suggested in the previous section. and how we will get it is through a survey. the operational definition might he ""the percentage of the population that is nonwhite. that is. Afler the unit of analysis has been selectc. Native American. White. You will be better able to construct operationai definitions after learning the material in later chapters. Hispanic.

leasure would be the percentage of the vote tkat was Democratic. The methodology of surveys will be presented in Chapter 5. . usually by standardizing to the population.l.S. sucl-r as interest groups.r"td~rdz'x~d~ that it should be measured in a way that makes comparison of different cases meaningful. Unstandardized xneasures usually reflect tl-re total size of the population group more than anything else. persmal history data. Data sources for geclgraphic population groups and governments at ail levels are discussed in Chapter 4. simply heca~zselarger states or rlations have more of almost everything than smaller ones. Elowever.r. (If we do not standardize these aggregate measures. for tl~ereare very few pieces of politically relevant information about ordinary people tkat can be obtained in other ways. Ehwever. often sources may be found of infclrmation already collectd on them. and where we would get it could be the U. not tile total number u l vrltes. if the "iindividual" is a special type of person.3 presents examples of hypotheses and of how the wriables might be operationalized.) Box 2. Bureau of the Census. and votes on legislntive issues are a rr." the appropriate rr. then many other variables are readily available. Thus if the variable is ''how Democratic a state voted. ""lndividrrals'hs a unit of analysis can also be insticutians. though surveys of institutions may aXso be necessary. 11 we are concerned with the wealth of nations. 11 the unit of analysis is the individual. then almost any variable will correiate with any other.26 R~ildlfzgR Locks C>( &:re Research Prc~cess proportion of the population that is nonwhite. An astonishing variety of information is collected by governments across the world as well as by other agencies. then per capita gross rlatioxlal product (GNP) would be a better measure than total GNP. two units of analysis are very common in political science. and each has a typical type of data source. for mexnbers of Congress.latter of public record. As this example stlggests. such as the holder of a government office. For example. then the source us~talfymust be a survey. one prillciple to keep in mind when constructing operational definitions using data on This means groups is that the data usually must be st. and political parties. meaning people in general. Exercise B at the end of the chapter presents other exaxnpies for self-testing.1 contributic->nsand spending. corporations. campaigr.

3. Succas: The percellrage o f the total votes received by the candidate according to America Votes. according to the Worfd Haadbook of Political a ~ Social d Indicators.3 Examples o f Hypotheses and Oprrarionat Definitions 1. The higher the level of: a person's education. "Do you believe that ahortion should be legal under any circu~nstancesor not?" . S p e n d i ~ g :Tlze amount of campaign spending reported to the Federal Election Commission. Eiiucati~pl:Ask a survey respondent. "How far did you go in s c l ~ o o ~ ? " Opinion on abortion: Ask the survey respandent. assassinations. the lower the level of political instability Economic development: Per capita GNP as reported by the United Nations Yearbook.BOX 2. the more successful his or her campaign. the more likely he or she is to favor legal abortion. and irregular execrltive transfers per year since 1970. The more econoxnicaily developed a nation. Poiitical insmbility. The average number of coups d'itat. 2. The more a congressional calldidate spends.

and (if any'.28 R~ildlfzgR Locks C>( &:re Research Prc~cess 4. Spending fc~reducation: Per pupit spending for public elemeiltary and secondary education. With education. citl~~lilwd fr01rt data in Anzeric~Votes. The Larger a city. the more the state spends on education. the higher the crixne rate tends to be. Party cc~mpetz'tionzThe difference between the Republican and Democratic percentages of the vote k ~ goverrlor r subtracted from 100. Media attention is necessary for a candidate to succeed in a primary election. 4.S. control variables and the unit of analysis. Far each of the following hypotheses. democracies are trtore stable than dictatctrships. l. The trtore csmpetitive political parties are in a state. Exercises Suggested answers far these exercises appear at the end of the chapter. there is little difference in turnout between whites and African Americans. income. For each af tl-re following hypotheses. and region held constant. dependent. Statistical Abstract. 3. W11en Length of time since i~~dependeilce is held constant. identify wl-rat appear to be the independent. It is suggested that you attempt to complete the exercises before looking at the answers. 5. 2. according to the U. construct opemtional definitions for the variables. Southern states have less party competition than Northern states. .

according t a the U. 2 . control variable: time since independence. Cantroiling for education. controf variables: education.1. Indepelldent variahle: media attention. Independent waria ble: ty ye of government. the better the candidates s f the incumbent president" party do in congressional elections. The better the state of the econr>my. dependent variable: party competition. Suggested Answers ta Exercises l . according to tlze U. Urbanization: The proportion of persons living in places with poyulations of 2. Nations that receive U. Statistical Abstract. dependent variable: electinn success. Voter turnout: The proportion of persogls of voting age casting ballots in the 1996 presidential election. region. according to the U. Winning candidates have more positive perceptions of voters than do losing candidates. the lower the voter ttlrnout. Education: The median years of education af persons 25 years af age and over.S. unit af analysis: nations S. Independent variable: region.500 or more. Bureau of the Census. S. 4. dependent wariable: stability. Independent variable: race. unit of analysis: individuals 3. foreign aid are more likely to support the Uilited States in foreign policy. race. ~ l r ~of i t ar~alysis:cities 1. Independent variahle: size.S. dependent variable: voter turnout. People who perceive that they are better off: economicalfy tend to vote for the incumbent candidate for president. the more urban an area. . unit of analysis: states 4. unit of analysis: candidates 2. dependeilt variable: crime rate.S.S. Statistical Abstract. 3.

according the U.somewhat informed. "Do you ttlink tl-rat voters in this district are highly ink>rmed.S. or not very well informed about the issties?'" WinninglXosing: Look at the report of the State Election Crjmmission to see which of the candidates won the election and which iost. 4. 5. calculated from data in the Uni&d RTatz'ons Yearbook. Ross 13erat. State of the economy: The change in real per capita disposable personal income for the year of the election. ""Do you think you and your hmily are better off eccrnotnically. Foreigxl aid: Did a nation receive any military or economic assistance from the United States in 1997. worse off. Bob Dole. according to the Annual Report of $he Council of Ecorromic AduiSe~s. Success of the incuxnbent president" party: CaXculate what percentage of House seats were wail by that party" scndidates in each election from results in Coqressionab Qsdauterly Weekly Report. or about the same as you were four years ago?" Presidential vote: Ask survey respondent.3 fl R~ildlfzgR Locks C>( &:re Research Prc~cess 2. Positive perceptions: Interview candidates for the state legislat~~re and ask. "Did you vote for Bill Clintan.or surneone else in the electioil last Nc~vember?" 3. State Department? Support in foreign policy: Percentage of time a nation voted with the United States in the United Nations General Assembly in 1997. . E c o n o ~ ~ perceytic~n: ic Ask swvey respondent.

The first is co- . we must determine that three criteria have been met. But in a braader sense research design can refer to a whole proposal fur a research project that would also include the review of the literature. This step. As explained in the previous chapter. "influences. i s critically ixnportant in the research process." The previous chapter itlcrodwed the idea of an independent variable (the cause) and a dependent variable (the effcct). People use the term research design in two different ways. This broader sort of research design i s what you would submit if you were asking for financial support for a projecc or approval Eor a graduate thesis proposal." or "is a result of. In order to draw the conclusion that one thing causes another. the next step is to fc~rmulatea research design. and possibly even a budget far the proposed expenditures." ""leadsto. Here we will see more completely what this idea of causality means and how it can be determined. along with the building blocks covered in the previous chapter. a discussion of the statistical tests that will he used once the data are collected. details of how data will be collected.Research Design Once we have selected a research question and set forth one: or more testable hypotheses. research design refers to the logical method by which we propose to test a hypothesk. In this chapter. many hypotheses use the language of callsation-far example. The Concept of Causality The types of research designs presented in this chapter are all intended to test wllether one variable causes anutl-rer or causes tl-re variatioil in another.

for example. that storks are respmsible for babies or that umbrellas cause rain. The third criterion is nonspurkousness. each type of research design attempts to fulfil! these criteria. If we observe. We must be sure that any c~variationwe observe betweeri the independent and dependent variables is not caused by other factors. Two other criteria must also he met. Ail types of research designs intellded to determine whether causation exists are set up to measure the extent of covariation.uaricatiorr. evidence that two phenomena tend to occur at the same tirnes or for tl-re same cases. It is true that the physical and biological sciences and some of the social sciences use experimentation frequently. not because experiments are terribly comxnon in political science. Govariation is also called correlagkon. correlation does not mean causality. that is. This kind of reasoning can lead to the conclusion. One is time order. presidential popularity increases.lust have evidence that the presumed cause (the independent variable) happened before tl-re presumed effect (the dependent variable).is sometimes used to describe all sorts of tl-rings that are not experiments at ail. Experimentation has its own vocabulary.e rr. we are noting evidence of covariation. but because the logic involved is relevant to all types of research design.ueWExperirne~ztulDexigli~ When many people tl-rink of ""science. though never exclusively. for example. But." they think of experiments. It is i~nporttlntto understand how an experiment is set up. As we will see. and statistics that measure the strength of covariatic~nare referred to as correlatiofz coefficients-or simply curreliatiorzs. We sometimes use the modifier ""true'3ltecause the term experr'p~enir. or that people with high incomes are more likely than poor people to be Republicans. that every time there is a crisis in foreign policy. Figure 3. People s o ~ ~ e t i m have e s stopped there and assumed that covariation alone is grounds for concluding that causation exists. employing such terms . fW. Types of Research Design The "?). as is often repeated in methodology courses.1 presents an outline of what is required bp the 'kciassic" experiment-the sixnpless version of a true experiment. with varying degrees of success.

the units ofa~aIysis. The C:lassic Experiment Expcrimcntat group Assip subjects f randomly or by matching Stimulus { f~tdcpcrmdermt variable) Pasttcst i L il Posttcst (Llepcndcnt varia blc) Control group B. laboratory animals. mainly because the number of . tl-rat is. then the two groups should. also on any otlzes variables. Expcrimsntal R'Oui? Assign students /fl randomly \ Control group S t i ~ ~ ~ u t u s Posttcst {rake {Political coursef Interest score) $ {Do nor take courscf 130srrcst f ( ffolitical Interest score) Compare as sul2jecd. The classic experiment starts with a group af subficts. Sornetimes randomization is not used.1 The dassic experiment and an example A. The best: way to do this is to rartdomb iasskrz the subjects tct the two groups by same inethod such as flipping a coin.whether individual people. we will use them. or anything else. These subjects or units are then divided into two groups by soxne method that would assure tl-rat the two groups are as identical as possible on the dependent variable in the hypothesis.FIGURE 3. statisticalljf. An Example: Hypothesis: Taking an introctuctory American C. wllether or not those variables can be measured. be identical in their distribution on not only the depende~itvariable but. but we will also see how they are translated into the terms we have used to describe hypotheses.s and slinzulus. Xf this is done.overnment course increases political interest.

>upsin any other way. for example. If. then we can conclude that the hypothesis is confirmed. After the stimulus has hacj time to work its expected effects. (""Significantly" k a statistical term that will be explained later in the bor>k. Tlnis is assured by the fact that the experi~nentaiand treatment groups were exactly the same in all ways before the stirnulus was applied. it is the posttest comparison that shows whether there is covariation.) Tc) understand how the classic exyerimerit can ""pove39he hyporhesis. all subjects in both groups are given a posttest that measures the dependent variable. Let us see how the classic exprirnellt could be used to test the hypothesis that taking an introductory American Govern~nentcourse increases the degree of political interest among college students. the experimental group measures higher on the dependent variable in the posttest.1. (It is aiso assumed that all sub~ectswere treated in the same way in all other regards. The stimulus (or lack of it) is the independent variable in the hypothesis. does not receive the stimulus. (This example is also diagrammed in Figure 3. tl-ren we see that she subjects who received the stimulus measure higher on the test than those who are not. First. then receive a stimuists. suck as randarnization or matching. we inust be certain that the results are nonspurious. a properly conducted experilneat call provide a ct~nvincingtest of a hypothesis that one variahle causes has a causal effect on-another.) Finally. Then a procedure catted "matching" is used to divide the subjects into two groups that have very similar distributions on the dependent variable. Thus. If they are significantly different in the way predicted by the hypothesis. Finaliy. Under those circumstances it is necessary to use a pretest to rneasure the dependent variable. The other group. If they were assiglled to grt. the criterion of time order is clearly satisfied by the fact that the stimulus (independent variable) is applied before the posttest measures the depelldent variable. we might take as our subjects ail of the incoming freshmen at a college . often called the experimental or treatment group. called the colztrol grow.34 Research I>wign subjects in the experiment is too small. Second. it is useful to see how the three causaliq criteria are met. The subjects in the first group. That is why it is so important that the sstbjetlts be assigned to groups by an appropriate method. the results of the two groups' ppasttests are coxnpared. then we could not be sure that any difference between groups was caused by the stixnulus.) First of all.

and instrwtional events such as lectures. at least temporariIy. It is important to emphasize that m a n i p ~ f a f i of u ~subjects is a rlecessary part of any true experiment. race. so the fact that likely to choose t ~ enroll. Typically researchers conducting an experiment advel-tise for people willing to spend a few hours of their time at a specified location participating in a study in exchange for a . At the end of the semesler. Using the ~zniversity" corrtputer.make that decision.one year. Such manipufation is necessary because self-selection would probably yield two groups tl-rat would not be identical in their political interest initially. which is the posttest in this experiment. we require every freshman to fill out a questionnaire that asks a list of questions about their interest in politics. it is rarely possible to involve anythir~glike a sa~rtpleof the general public in. an experiment. The questionnaire. We cannot change a person's gender. rather than allowing them tc-. a r ather events. for we can manipulate. whereas those in the other group (the control group) are not allowed to take the course. such as wars. If the experimental groupthe group that took PS 101-has a lzigher average score than the controt group. age. One of the biggest problems is that it is difficult or impossible to trtanjpulate trtany independent variables. then we conclude that PS 101 caused greater interest. is structured such that tl-re responses yield a score reflecting degree of political interest. we had to tell students wl-rether or not they would take tl-re course. we randomly separate them into two groups. Whereas nonexperirnexltal researchers usually make a careful effort to use random samples of the entire adult populaticrn for surveys. We schedule one group (tl-re experimental group) to take the course (let's call it PS IM). they are also subject to a number of practical limitations. Another problem with experimentation is a lack of representative saxnples. Indeed. economic conditisns. elections. the use of experimentation in political science has largely been iimited to investigations of communications. Nor can we manipulate larger social phenomena. news reports. a r rnany atlzer social characteristics or people" beliefs or attitudes. advertising. Xn the PS 101 example. In fact. confirming our hypothesis. individuals' exposure to sucl-t stimuli as campaign speeches. Although true experiments are generally considered to be the best test of hypotheses. students who have more interest in politics are more t in American Government. they have more interest after taking the course than tl-rose wllo did not take the course would prove ~lothingin itself.

since the relevant population consisted only of college studems. and human beings cannot he as closely controlled as Laboratory animals. Consider the typical. A related probkm is that of outside influences. But if tl-re experiment runs far weeks or months. Indeed. expriments do somethit~gto subjects that they might not otherwise experience. in which subjects are only observed. as in tl-re 13S 101 example. but possil?ie dangers must aiwa y s be considered. It is often a diternxna far the researcher as to wl-rether to construct a Iixnited. knowing that they will have to fill out a questionnaire afcerward. where stimuli usually are limited to c. might affect surne subjects. ethical considerations are of particular concern in human experimentation. Hence we can never he completely sure about wl-rether the effects observed in the experimental situation would be the same in real life. well-controlled experiment in a higMy artificial setting or to use a real-world setting over a longer period a i d run the risk of havixlg external influences affect the outcome. Indeed. It is seldosrl a serious probiexn in political science experiments. then this corrcerrt is minimized. such as corrversations. and even some psycholr~gicalresearch. presumably with minimal or no disturbance to them. there are innumerable possibilities for other influences to exert an effect and contaminate the experiment. Thus it is always possible that other stimuli. In the PS I01 example this was not a problem. surrounded by strangers. This is obviously a serious consideration in biological. and personal experiences. medical. Finally. the experiment may require watching material about politics by people who would never expose themselves to such stimuli on their own. situation in experiments on effects of the mass media: Most people do not usually watch television in a strange place. news events.r>mmui~icatioils. where stimuii or other experimental conditions (suck as the withholding of medical treatment) could be very harmful. Another freq~zentproEtlexrl is that experimeaits often are conducted in an artificial setting. Most experirnents in political science use hurnan beings as subjects. as it might well be in a highlf artificial setting. but this inevitably will excfude large segments of the population. federal law requires that researcfi invoivitlg human subjects undertaken by any institution receiving federal funds (wi~ichincludes almost all colleges and universities) he approved by . Unlike other research designs. If the time between the stimulus and the posttest is minimal.36 Research I>wign mtrdest fee.

A better name might he the before-and-after design. including survey research.a local panek (The rule even extends to nr)nexperimeiital research involving any contact with individuals.) Despite all these potentiaf prc. .sctorial desigfs. A solution to this problem is the Solomon four-group design. The Solomon four-group design is actually a version of the f. The preceding discussion should serve to point out that aithough experimellts are logically the best way to fulfil1 the causalit)i criteria. An example would be a study on the effect of: precinct-level campaigning in which one group of subjects were exposed to politicat appeals only by Democrats. 13asttestcomparison can then determine the effect of the pretest as well as that of the stimulus. One addresses the possibility that giving a pretest inay have an effect on the subjects. so that each possible combination of stirnuli can be applied. Indeed. the logic of a11 experirnent is the same. and a control group that received no appeals. A number of variations in experimental design expand on the si~rtpleclassic model to circn~rtventsome of the potential problems. The experiment i s simply done two or more times with different subjects.>blems. one by both parties. This is an unfortrli~ateiahel as it is not a true experiment. that alone may increase their interest or affect their opinions and thus potentially influence their responses on the posttest given an hour or two later. hut it is [>hen used without any such references. in many situations they are nut the best choice of research design. every method has its limitations. It can be presented in rnuch the same terms as a true experiment. If the subjects are initially given a yuestiuilnaire on some political topic. which is used when there i s rnore than one stimulus (and thus mare than one independent variable) or difkring levels of the same stimulus. one only by Republicans. once with pretests and once without. f c x that is clre essence: comparison of the dependent variable belore and after the independent variable has heten applied.experimentation does have consideratlte merit as a technique for testing hypotheses. The Quaii Eperiment (Natgral Experiment) The second type of research design i s comrrtoniy called the quasi experiment or natural experiment. Regardless of the number of groups and combination of stimuli. in which the experiment is done twice.

It is a significant difference in change between groups that would lead to a conclusion that the independerit variable influences the dependent variable. Figure 3. and therefore any differences between the grou ys in the extent of change frcrrn pretest to posttest is presumed to result from tl-re stixnulus. Second. diagrams the quasi-experimental design. that might influence the dependent variable l-rave had their effects on all subjects at the time af the pretest. But what about the other two criteria? The criterion of time order is met. Rather. The hypothesis to be tested is that watching a presidential debate increases intensity of support for candidates.. the criterion of cova&tz'tzn is met in this desig~l. so that t11e amount of change can be measured for each group. that is. First. the subjects are not assigned to groups. Tile quasi experiment relies on the assumption that all of the other possible factors. the quasi experiment requires a pretest af the dependent variable.We can observe whether the stimulus fix. the independent variable. Admittedly. but it differs in two vital ways.2. as this before-and-after design always includes a rneasure of the dependent variable after the stixnulus-and so we always know that the independent variahle came before the dep e d e n t variable. which is the essence o f a true experiment. Thus the quasi experiment lacks manipulation of the independent variable. the two groups may be (and ~zsually are) quite different from one anather in many respects. But in the quasi-experimentd design. they are given a survey that measures . But it makes possible the testing of causal hypotheses in situations where a true experiment would be difficult or even imyr>ssibfe. as is the case in a true experiment. Before the debate. But what about the criterion of nonspuriousness? A true experiment assures nonspurious results by starting out with identical experimentai and cnlltrol groups. randomly assigned groups will be identical. The subjects are students enrolled in large sections of an introductory political science course.1. It does look similar to the classic experiment.2 also outlines an example af a quasi experiment tl-rat is similar to the example of a classic experiment in Figure 3.38 Research I>wign Figure 3. this assumption is something we can be less sure about than the principle that large. the independent variable) is associated with a different amouxlt of chaxlge in the dependent varia ble. In this way. kaiown and unknown. we observe which subjects have something happen to them and then go hack and sort them into the experi~xtenta!and control groups.

a variet): of studies over the years. a second survey is administered. do not generally make voters favor one candidate over the . it seems. have generally confirmed this hypothesis. including one by the author using this design.FIGURE-. they are sorted after ~t is known whrcls experienced the sr~mulu?. Tncidemall?i.2 The quasi-cxpcrimcntal dcsigrl and an cxarnpIc A.3. f Stimulus 130srtesr f"feresr (Delsendcnt (Independent (Deperldent Varta ble) VdriabIe) Stimulus (Delsendcnt (Independent Variable) VartabIe) + r \\ "JI f"feresr Compute Change Compare Change 130srtesr Compute ' (ll"eper~dent Change %nabre) B. again asking for strength of preference and also asking whether or not the student watched the debate. The surveys include a coded means of identification so that the results of an individual's pretest can be compared with his or her posttest while guaranteeing confidentiality or anonymity. An Example: Hypothesis: Watching a presidenral debaee increases itltensit-y of support. it is possible to calculate whether the intensicy of candidate preferences increased more in those who saw the debate (the experimental group) than in those who missed the debate (the control grttup). The Quasi-experimental Llesign Subjects are not ass~gncdto groups m advance. With matched pretests and postrests in hand. After the debate. Presidential debates. including which catldidate they prefer and how strongly they hold that preference. Stimulus f'rereur (Report (Intensity of sumart) watchmg debate) Suhjecn: all studerlts tn a ctass ot support) Conzpare \\ Stinzulus Pretest (Report (intensity of support) nor watchit: debate) their attitudes about the candidates.

Hawever. VVe know that although a person" gender or race might affect his or her vote. our knowledge of many subjects makes that determination fairly easy.3 presents a11 outline of this sirtlple procedure. it is also called a "crross-sectionaImdesign. they srrengthen the preference for the choice the voter has already made. A1tkougl-s. rathes. Hence. The correlational design attempts to meet the criterion of nonspuriousness by analyzing the effects of control variables.40 Research I>wign other. This method is nc>tas strong as that achieved by true experiments or even quasi experiments. can be used not just in this type of design but also in quasi experiments and in true experiments. Howeve4 as the discussion of independent and dependent variables in Chapter 2 poiltted out. It is on the criterion of time order that the correlational design is weakest.) It is bp far the 111ost common a ~ p r o a c hin political scieltce research. How does this si~npledesign fulfil1 the three criteria of carzstlfiw The extent of covariation is clearly deterrnined by rneasuring the extent of correlatioil between the independent and dependent variables. (For that reason. To avoid confmion.some correlational research may control for a considerable number of other factors. I-.'"thnt is. T h e Correlational Design The correlational design is very simple. because here we can control only k)ir those variables of which we are aware and can measure. we can never be sure that one must be the cause and the atl-rer the effect. Since no difference is required in the point in time when the indeyendent anif dependent variables are collected.Iowever. Figure 3. it is olren possible to ensure that some of the most prominent complicating hctors are not creating a spurious relationship between the independent and dependent variables. statistical measurements of the strength of the relationsl~ipbetween variables. it: shoutd be pointed out that "cr>rrelations. The correlational design differs from the quasi-experimental design in that it does not require any repeated measurements of a variable over time. it: could nor be tile other way around. At a hare minimum it requires only collecting data on an independent and a dependent variable and determining whether tl-rere is a pattern of relationship. it is usually advisable also to colfect data on other potentially relevant variables and statistically control for them. it is rlever possible to control for e v e ~ t h i n gthat rllight he relevant. although the correlational desigr~is funda~nentally .

it can be applied in any situation where data can be collected a n two a r mare variables. \ / *\. The units af analysis were counties within a state. The indepellde~ltvariable.The a~zthorwished to test the hypothesis that voter turnout is tower in urban areas.3). voter turnout. census data. Contrat for income.3.. it can provide considerable evidence of causality. Independent varia hle Correlation? Dependent variable K. Czontrat for respondent's party identif cation / Recall call-tpaign contact \\ C:arrelation? M.S. The Correlational I3csign Control variabtes L\ i . Here is an example ot: a carrelational design (also diagrammed in Figr~re3.Voted for contacting party weaker than the experimental and quasi-experiment4 designs. An example: Hyporl~esis:Campaign contact afiects voter. And since it does not require any manipulation or even continued measurernenrs over time. When these two figr~reswere analyzed. was operationalized as the percentage of population Iivirlg in "iurhan places" according to U. party competition. etc.3 The corrclationaI dcsigrl and exaxnpIcs A. the . h . Urbanization Voter tumout C . urbanization. The dependent variable. education age.FIGURE-. was simply the number of votes cast divided by the votitlg-age population. An example: Hypothesis: Voter turnout is Iower in urban arcas.

the relationship between urbanization and turnout was only slightly dirninisl~ed(Monroe 1977). The independent variable was measured by a survey question that asked whether the respondem remembered being contacted by any workers from either of the political parties before the election. What had happened was that party workers tended to contact vtlters who had supported their party in the past. ail availabie from published sources. Analysis of these two variables revealed a definite pattern. percentage in professic~naland managerial occ~tpaticjns. median age.. the relationship between contact and vote disappeared. But did this mean that door-to-door contact really affected votes? When the respondents>party identification (i. which were almost entirety urban. the analyst does not attempt to . Here is an example (also diagrarrtmed in Figure 3. for one reason or another. Correlational designs are frequently used in connection with data from surveys. Therefore. The dependent variable was the respondent" reported vote. Those people voted for the party of the contact. this example showed that such attempts to persuade voters rarely change their prekrences. median education. including median income. and those wlla had heard from the Demo~cratsusually vr~tedfor the Democratic candidate. but they would have anyway.42 Research I>wign relationship was vesy apparent.percentage nonwhite. Tl1e example also ilfustrates the importance of using control variables. and turnout declined as urbanization increased. urban and rural areas differ on many other characteristics known to he related to turnout. were used as control variables. But one trtight questiort whether it is realty urbanization that affects turnout. whether respondents identified themselves as Republicans.3) where a control variable proved to be important. percentage employed in manufacturing. ar independents) was used as a control variable. after all.When these other variable were controlled statistically (using multiple regressioil. Democrats. Respondents who recalled having been contacted by Republican workers tended to vote Republican. The researcher (Mramer 1970) wished to test the hypothesis that conta~tlngvoters in a doocto-door campaign caused them to vote for the party that rnade the contact. Like many other studies of campaigning. several other variables. and a measure of party cs~rtpetition. The ct~untieswith no urban population had the l-rigl-restturnout. Some correlational research reports can he found in which. a procedure that will be discussed in Chapter IQ). the Ic~westturnout was iil the metropolitail areas.e.

However. there is also a great deal of research in the literature of political and social science that does not meet the requirexnents of even a correlational design without control variables. The weakness of a case study is that it Iacks the ability to measure covariation. Methods of statistical controlling and their application to causal interpretation are presented in Cl~apter10. The results nevertheless have some value. but it may be quite ernpirical. l. However. such work is descriptive and may serve to increase our knowledge.control for any variables. Even il a case study could determine causality in some way. It is suggested that you attempt to write these designs hefore you look at the answers. our ability to draw any cr~nclusionsabout causaliq between the variables is more limited. we have no way of knowialg what the outcome would have been if conditions and actions had beexi differexit. case studies and other. hut since only one case is studied. Propose a hypothesis m d a research design of the type specified. sometimes in great depth. in which the history of a particular event is recounted and analyzed. hut it cannot "'prove" anything in a scientific sense. There many examples of lengthy studies on how particular policy decisions were made. Although there are a great number of vtariatic~ilson these three basic types of design as well as ways of combinkg them. similar types of research can be valut~biebecause they may suggest research questions and hypotheses to which more rigorous designs involving larger numbers of cases can be applied. Their authors seek to shed some tight on why those decisions were reached. Write an experimental desigil for the research question "Dc3ets negative political campaigning decrease voter turnout?'" . Often this research does not invc~fve: quantitative data (though it could do so). Exercises Suggested answers follow the exercise questions. because tlzey tell us that two variables da occur together. An example of such descriptive work is the case stgdy. Essentially. its conclusions would not he generalizations.

When Congress allawed states to increase speed limits on interstate highways. Write a quasi-experiment4 design for the research question ""Boes increasing speed limits increase the number of traffic fataiities? 3. Write a correlatiollal design for the research question ""Does election day registration lead to higher voter tt~rnout?" 13roposel-rypothesesand write research designs of each type for the research question ""Do the efforts of precinct workers contacting voters drrring a campaign g a k votes for their party" candidates?'" 1. the subjects are asked if they intend to vote in the Senate election or x~ot. Write a quasi-experimental design h r this question.S. 2. that is. it makes critical comments about the cmdidate's opporrent. Write an experi~nentaldesign for this question. ( 1994). They are randomly assigned to tlze experimentai and control groups. The hypothesis is that increasinlg speed limits inrcreases highway fatdicies. The percentages of each group intending to vote are tl-ren compared. some states did so and . This experimental design was used by Ansolabehere et ai.44 Research I>wign 2. The hypatl-resis is that exposure to negative advertisernents will decrease tl-re intention to vote. The control group watches a tape with the same content except that a nonpoliticai product commercial has been inserted instead of the political ad. the researchers also iwestigated the sarne research question with a quasi-experirnencal design using aggegate data. Afterward. Senate candidate tl-rat is ""negative" in nature. Suggested Answers to Exercises 1. Write a correlational desigr~for this question. 3. The experimental group is shown a videotape of a recent local newscast into which has been inserted an advertisement far a U. Subjects are recruited by advertisements and offered $15 to participate in a stucly of iwal news. 2 .

The hypothesis is that voters who recalt having been contacted by a campaign worker for a candidate will be more . if so. The percentages voting for the Democratic candidate supported by the campaign workers is then compared for the two groups. The cl-ranges in death rates from pretest to posttest Eor the two gn(>upsare then compared.others did not. and whether it was a southerr-r state or not. 3. The dependent varia bIe is the percentage of voting-age population casting batlots in tl-re 1996 presidential election. The posttest is the traffic fatality rate in each state during the first year that some increased the limit. 2. The hypothesis is that election day voter registration results in higher voter t u r n ~ u t . degree of party competition. controlling for other characteristics of each state's population.The units of analysis are states. the postcesr is administered by using a tetepho~le survey asking whether each person in the sampte voted and. A random sample of registered voters is selected. and the s m p l e is rmdomly divided into experimental and control groups. This makes a quasi-experimental design possible. States are then divided into tvvo groups: those that increased the speed Limit dtlring the next year and those that did not. Xmmediately after the election. fur whom they voted. median age. percentage living in ~lrbanareas. The l-rypothesis is tl-rat people contacted by someone working for a candidate will be xnore likely to vote for the candidate. The pretest is the traffic fatality rate in each state during the last year that the speed limit was SS mifes per hour in aII states. includitlg medial1 years of education. 1. Tl-re independent variable is whether or not a state had election day voter registration in 1496. xnedian hmity income. The relationship between these tvvo variables is analyzed. Workers go to the homes of voters in the experimental group and give a piece of Democratic party campaign literature to the selected voter arid deliver a short speech asking for support for the candidate b r Gongess* Those in the control groups receive a nonpartisan brochure and message urging them to vote.

and median age. The relatiomhip between these two variables is analyzed.iicanand Democratic precinct committee members fro117 a random sample of precincts in a state at the time of an election. and all respondents are asked their voting ilatenticzn in the coming election for governor. A number of studies have used this sort of design. includinf: Katz and Eldersveld (1961j and Cutright (1963). the better that party will do in the etection. mtlst have found that precina campaigning had oniy a small impact on the vote. the same individuals are interviewed and asked for whom they voted. The data are then analyzed to see whether there was greater cl~ange amoilg those who were contacted by either party. A panet survey is conducted three months beiore a gubernatorial election. including median income. percentage in professionat and managerial employment. A random sample of registered voters is selected. is measured by surveying botlz the RepuI?. 3. They are asked haw much time they put in during the campaip. or not contacted. . The independent variable. The l-rypathesis is that the more time put in by precinct workers fc~ra party during an eiection campaign. Tlze dependent variable is the Republican percentage of the vote for a minor office in each precinct. contacted by both parties. 'The voting intention fi-om the first survey for each individual is c o ~ ~ p a r to e dhis or her response from the postelection survey to see whether there was arty change.3C. worker time. percentage nonwhite.Research I>wign likely to vote for that candidate. But the design proposed here is a quasi-experimental design because the dependent variable (voting intention) is measured both before and after the independent variable (possible contact by a party worker) is measured. They are also asked if they recalf havirtg been personally contacted by workers for either candidate. Immediately after the electian. controlling for otlzer clzaracteriscics of the precinct available from census data. Note that this is similar to the research by Krarner (1970) used as an example of a correlational design in Figure 3. and the net advantage in time to Republicans over the D e ~ ~ o c r a tiss computed for each precinct.

Data xnight be defined as empirical observations of. one or more zilnriables for a rr~mberof cases. this is not necessarily the case.The examples of operalional definitions presented in Chapter 2 included several that were based oil published data frorrl a reference source. Tl~iscllapter introduces some of the major published sources of data that political scientists use in their researctl and suggests some strategies for discovering other sources. Although we usually think of data as numerical. because it is rare that even a very well funded project would allow the researcher to travel to many cities or states. a technique for turning verbal messages into quantitative data. medium. collected acrordil. or low-but since the infc~mationfc~und in published sorirces often csncerns groups or aggregates. we must construct our operational definitions in terms of the data available. the data . Republican.Published Data Sources How do we get the data rlecessary to execute our research designs and test hypotheses? Often it is possibie to use information others have collected and made available to the public. Catholic. Northeastern. The chapter concludes with a description of content analysis. An explanation of the term d n a is needed here.tg t o the same opercltional definitio~s. let alone to aII the nations of the world. Many variables are actually a record of which category a case falls into-for example. P-iaving some familiarity with what kinds of data are available and where they might be found makes this task less difficult. high. to collect information first-hand. This is fortunar-e. When we have to rely on existing sources for our data.

however. it is necessary first to make sure that the information is reported for the particular unit of analysis needed. districts. but keep in mind that they may have changed. (The Internet addresses cited here were accurate at the tirne of this writing. In planning a research project that will use published data. although searching for data over the -Internet offers the advantage of not having to travel to a library. such as nations.) Data obtained h a m the Internet should be used with caution. which generatly can be found in a library or. Much of such data is reported by geographic or pc>litical units.Published Data . Second. the choice of unit of analysis is vitally imporcam in planning a research project. on what can be placed on the Inter~~et. for several reasons. One is that since there is virtually nt->limitation. is likely to be mucl-r less tirne consuming than randornly searching Web sites.Sozarccs are in x~urnericalterms. This is especially true for research that relies on published data. counties. actually going to a research library (s~tchas most college and universitjr libraries). municipalities. some Internet addresses are nt->tedthat can provide access to such sources. The Internet as Data Source This chapter is mainly collcertled with published data. found there that rnay be l-righfy misleading. armed with the kind of background provided in this chapter. if not completely inaccurate. should have rnade clear. cexisus tracts. states. and precincts. legal or practhere are ""data" to be tical. In the saxnpling of data sources presented here. Probabiy the safest strategy would be to limit one" use of the Internet for research purposes to those sites that contain inforrnation such as government documents and standard reference books of the type one would find in the library. Often a given reference book includes data on . usually as totals or in strine standardized form such as percentages or averages. The Importance of Units of Analysis As the discussion of hypotl-reses and variables in Chapter 2. A major advantage of searching the Internet for data is the possibility of finding informatio~~ that is more up-to-date than printed data. increasingly9on the -Internet. as these data sources usually are organized by type of unit of analysis.

is reacliity avaita bie. one must be careful to avoid the ecological fallacy: Do not attempt to draw conclusions about individuals from aggregate data. For example. Fur exaxnple. such as in terms of percemges. political. the presentation of major sources of data below is organized not only by the substantive type of data but also by the units for which the data are reported.Published Data Sozdrces 49 many different kinds of variables (economic. corporations. published sowces provide little i h r mation of relevance to political research about ordinary people as individuals. though there is a great deal about groups of per~ple. aggregate data usually are meaningf~rlonly if they are standardized in some way. two reminders of points made in Chapter 2 might be useful here. The sowces suggested in this chapter are primarily of the type that would provide the information necessary for testing hypotheses. such as by dividing a total by the population of the unit of analysis to produce the percentage or per capita figure. Congress. such as stavs or cities. political parties. . your search would be much more time consuming. Second. And "individuals" h the sense of unit of analysis can include goverilment agencies. it is sometimes necessary to collect such inlormation nor from a library but through an original survey. Most published data relevant to political research are aggregate data. First. Aggregate data ofren are akeady in an appropriate standardized form. but not always. the methodoIogy of which is presented in Chapter 5. Tlzerebre. preferably almost all of tfzem. Hence the sources sugested here report data for many cases. rnainly where tl-re individuals are not ordinary peopie. Usually the researcher can convert the data into a useful form. Therefore. you obviously need to find sources that report these data for a large number of nations. dara on a number of personal characteristics of members of the U. but soEBe are irtdividual. to name only a few institutions o n which published data can be found. But in general. Therefure. that is. they report summary figures on the population of geographic a r political units. including their individual votes on bills.S. social) but only for a single kind of unit. if you wish to test a hypothesis about the relationship between the per capita income of nations and their level of voter turnout. and unions. and often for d l possible cases. Most published data are aggregate. and you might wel! find that different sources use somewhat different definitions. If you had to reiy on individual sources fur each nation.

include information on so many topics that not ail would be inc1w&d in the catalog. the easier your search will be. when you find one reference sowce. Many of the most important collections of data. G i n Familiarity with Major SourceThe mare ftzmiliarity you have with the important sources. such as the Statistical! Abstract of the United States (discussed below). such as states.Sozarccs 50 The following sectiorzs of the chapter. ALtbough this is an appropriate resource for finding books that discuss research topics. They would not be helpfwfin locating research findings or generally doing the background Iiterature review rlecessary to formulate a research question.Published Data . whether you read them it1 the library or at an Jnternet site. Given the way libraries are organized. you may well find similar and possibly mtrre useful sorzrces nearby As was emphasized in Chapter I. because you can see wl-rere otl-rers found their information. it is importa~ltto review past research literature when fomulating your research questions and hypotheses. However. to get . Here are some tips that might lead you to what you need more quickly. This chapter is intended to provide the begir~ningsof that familiarity. Jn additiorz. it is just a sampling to get you started. arranged by type of inforrnation and unit of analysis. so information on cities or nations would not be useful fc~ryou. Strategies for Finding Data Sources The resorirce to which many students turn first to find idormation in a library is the subject catalttg. Note also that the sources Listed here are suggested only as places to find data. you will probably be interested only in a particular unit of analysis. are intended to introduce you to a few of the published data sources frequently used in political science research. it is nut necessarily the most promising for locating data sources a n those topics. The Iiterature review is also useful for loating data. This tells you what was avaihabie and where it was found.

. This is important for two reasuns. Your questions are likely to be better received if you have thought out exactly what yori need. Consult Librarians or Other "Expert. because fibraries often catalog this material in different ways from other publications. often you will. . Recorditzg complete information is particularly important for Xnterrlet sites. typically a journal article. Take (rcurefgl Note of the Soulz-6.and some material rnay be available only 0x3 inicrotitm or micrt~ficlne. But he receptive for sugestions on alternative indicators for your variables. Your iibrary also may have databases on CD-ROAMS. the title. and more importailt. you may need to check the origind source of its datca for more detailed information.Published Data Sozdrces 51 this information. Most college and university libraries have personnel who specialize in different subject areas. consult the library staff. Second. including the unit of analysis. you may need to consult tl-rat source again. rather than relying on a summar)r. 123-1241. need to go to the original report. Even when you have located a reference source. First. Faculty members are mother source of expertise.c You F h d Once you do find information that may fill your research needs.S. including all of the information about the publication.(" When at a loss for where to find inforrnatiorz on a particnlar type of variable. be sure to write down just where you found it. and tl-re date as well as tl-re exact site address and tl-re date you accessed it f Scott and Garrison 1998. Although bibliographic formats for citing electroilic sources have not yet been staildardized. Muck help is available if you ask far it. Cansuiting the library staff may he particularly important when using U. suck as exactly how the variables were defined. it is certainly necessary to include the author (if available). They have a great deal of experience with subjects in their disciplir~esand rnay be able to point you directly to the source you need. such as you might find in a textbook.so advice frt~ma staff member is partict~larlyuseful for the uninitiated. any research you present using those data will require a full citati~rzof the source. government documents.

Also worthy of melltion is the World Almanac. and other nations. it is also the most widely available reference book.S. More detailed inforrnation can be found in other UN volumes such as the Demogmphk Y e ~ r hook.The Statistical A h ~ t r a c tof the United States. graphic. and UNESCO Statistical Yearbook. economic.S. g u v e m e n t publications. . governxnent agencieschttpz fedstats. published annut~llyby the U. Iatrrrret sit-f.:Fedstats is an on-line source that provides access to statistical reports from many U. and government spending. For the world as a whole and the nations as units.S. the primary sources are publications by the United Nations. econoxnic.S.gov>.S. Although most of the information in the Statistical Abstract comes frorn the U. Department of Coxnmerce. it includes xnaterial lrom a wide variety ol private sources as well. The America~Sriazisdw I ~ d e xis a comprehensive guide to data found in inost U. employment. artd social-for the United States as a whole and for tl-re fifty states as well as a limited amount of inforrnation on U. demoincludes data o n a wide variety of variables-political. incl~~ding economic and social indicators-data such as income. and is reasonably priced and sold on newsstands. The sources are preselited in terms Of units of analysis reported. Sd-atzstz'cal "Yearbook.Sozarccs Some General Data Sources A few sources encompass a number of categories of both types of data and r~nitsof ax~a'iysis. metropolitan areas. it allows searches by subject matter as well as by geogaphic. The Wc~rld Almarrac reports information on an enorxnous nuxnber of topics. Brtreau of the Census and sther government agencies. literacy rates. and demographic categories. Demographic Data This section lists some sources of data on general population characteristics.Published Data . age. m 4 o r cities. The most general source is the United Nations Yearbook. and the latest edition will include some information more recent than other published books. which has been privately published every year for over a century. race.

S. The census af the United States is conducted every ten years. but often p=setitir-rgthem in a more convenient farm. and World Econo1. A number of other international agencies publish statistics on naic The International Monetar). Intt. it is always possible that there are considerable irraccuracies in solBe of the data.S.Two overall volumes cover the x~atior~ as a whole and by state: U.S. including counties and xnunicipaiities. particularly e c o n o ~ ~indicators. U.a Book. A number of private pubiications also report these kinds of data. Morgan" State R ~ x n k i ~ g ~ .arislicalA ~ S L ~ofCthe I' United States. Polibicnl H a ~ d b o o ka f t h e World. whether by design or by accident. Fund (IMF) publishes the lnterniational Financial Sli. demographic inhnnation is the US. A list of scjurces for c~thernatiotls can he found in The Statistiat Abstract of the Ul-zitsd States. Therehre. The Organization for Econt~micCooperation and Development ( QECD) publishes the annual Economic O~tloi>k. Social and Economic Characteristiw. Soxnewhat easier to use is the county and City Dat. States and LOL-alitZeS The most convenient and coxnprel-rensive source for dexnographic.S. cconoxnic. which contains similar data for those units. Bureau of the Census.iricDafa. The Wc~rlciBank publishes the World Develczgf~entReport and World Tables.7. atld social data for staees is the S~.Published Data Sozdrces 53 Note that the information on individual x~atior~s in these (and most other sources) is compiled from reports submitted by the governments of those nations. which includes a number of widely used variables for all counties and larger cities in every state. usually drawing them from the more ofGcial sources. and each census produces a set of vofu~nes. described earlier. gov>. Privately published reference books for demc~graphicdata on states and units within them include the Alfifanac of the Fifty States and Katlzleen 0. General Population ChariacteriPstics and U. Examples include the annual Sr~atennan'sYearbook. Separate volumes Eor each state provide more detailed breakdowns for units within the state. . which reports it in a nurnber of publications. The basic source of almost all U.rrlet size: The site fc~ron-line cetisus data is qhttp: cesus. tions.ltistics Yearbook. and the State and Metropolitan Area Data Book.

Data an U. Jodice. Kenneth janda" Political Parties contains data evaluating parties and related topics for fifty-three nations. which are.S. and irregular executive transfers is Charles L. Particularly valuabie for its data 0x1 variabies such as assassinations.S. politicai rights.ls Yearbook. O'Leary3 Political Risk Yearbook. and the Ipzfernational Yearbook and Statesman2 W h o S W h o . World Enclyclopedia of Political S y s t e t ~ sand Parties. The l~ternatiulzalAlmanac oJ Electoral History. h a n g the possibfe sources that report some of this political information are the Politic~alHandbook of the World. and Military Balance.Sozarccs Political and Governmenr. Vi. democracy. l Williarn D. based on information reported by the rlations themselves. as noted earlier.Nierni.S. Cr>plin and Mictzael M. This sort of data is generally not found in United Nations publications. offers up-to-date assessments and predictions about likely political and economic conditioils in all nations. federal government as well as state and loca1 units. Wcjrld Military Expendztures and Arms Transfers. Congre-~s and the Presidency As American political scientists have probably devoted more time to studgillg the U. Of considerable interest to students of international politics are data on milicary and defetlse activities. Congress than any other institution. and civil liberties. This is particularly true of indicators that might be used to measure variables such as political instability. Government and Politics This section lists a few of the most useful sources for finding infarmation on the branches of the U. a vast . though hardly compreheilsive. Sources for this sort of data include Ruth Silvard. World Arufamct3nts and Disl-krmanzct3n.Published Data . Taylor and David A.. The largest collection of international voting results data is Thornas T. Mackie and Richard Rose. the Statesman's Yearbook. One geileral.t~alStatbfics 0%A~ntrricnvrPolilics. source is Harsld W Stanley and Richard 6.ll Data for Nations This section lists a few sources of infc~rmationabout the governmental structure and politics fcjr a large number of nations. World Military and Soczal Expenditures. wl~ickiis designed for undergraduate students. World Handbook o(Politic7al and S o c i ~ Irtdiccltors.

which is similar to CQ Weeky Report but concentrates sornewhat rnore on. The biennial Pi?l'itics in America provides profiles of mernbers m d el-reir districts. Open Secre&: The DolEur Power of PACs in Coggress. Ornstein's Vital Statkttcs Co15gressassembles many useful sets of variables. B~lreauof the Census called Population arzd Housing Characteristics far Congressional Districts. the Congrwstonal Record is large and not particularly well organized. their districts. which includes personal data on every member of Congress. districts that appear in maliy of the aforementioned sources is a publication of the U. A competing weekly publication is the Nationai Jourvtal. To track down the content and status of hills currently under consideration. The ultimate source fur the data on congressional.Published Data Sozdrces 55 xiumber of sources of data are available 0x1 the two houses. more useful for most research projects. The mtlst important referefices on C~ngressare the vario~zspublications of Corlgressional Quarterly. their votes. the researcher may consult a Commerce Ctearing House publication. a measure of how often Congress has agreed with the administration. If your research deals with past years. Particularly useful is the bienxiial Al-ma~zacofA8"tterican Politics. . which includes news stories on what is l-rappening in Congress and in gowmmeitt and politics generally as well as the votes of each member on biifs and important procedural questions. the annual Congressio~alQuar&rly Ajmanac compiles much of the weekly information systematically. their campaign finances. Cortgress alzd the N i l t i o ~is a set of books that compiles information over many years. However. the Congressional Index. Bibby and Norman J. and the districts tl-rey represent. and a nurnber of private publications are usuall). and ratillgs of their voting records by interest groups. their members. The mast basic source for Coltgress is the Cclrzgressional Record. More detailed data on campaign finance may be found in the Almanac of Federat' PACs arid Larry Makinsoxi arid Joshua Coldstein. John F. Congressional: Quarterly has long provided measures such as the presidential support score. published every day Cotigress is in session. The Congressiovtisf Record reports everything said on the floor (and text that is inserred "into the record'" but was not said) as well as all of the votes cast by individual members. the executive branclt. The basic source is the C Q Weekly Report. There are malty other private publications on Congress. which presents data in separate volumes for each state. Inc.S.

published by the Council of State Covernments. Walter Dean Ballots.36-1 842 has presidential results Bumham" Preszde~~iclJ by counties.X~otlse. fc~rthe presidench the Senate. Balachax~dranand S. The mast general source for data on state governments is the annual Book of the Stdks. Congressional Quarterly's Cude m U. 2 8. and the House-are refativeiy easy to find. Garwood. I ~ t e r n e tsites: Information on the two houses of Congess. BaXachar~dran's State and Local Statistics Soz-lrces provides a detailed Listing. Kendra A.clerkwetn. may be forzlxd at ~I-xttp://uvww. S a t e R a n k t ~ g s .56 Published Data . Almanac of the Fifty States.senateegovr. Many of the general sources cited above. Morgan. Other sources include Coilgressional Quarterly's Guide t o the Presiderscy and Lyn Ragsdale. r e p r t s vr>tesfOr federal offices and governor by county. and Alfred N. Eleiticms reports statewde and district figures for these offices since 1824. published every two years since 1956.S.g~vr and <http:llwww. including the Satistical Al~stract. Most state governments publish reports on each election for statewide and state legislative elections for the district and county level. Hovey and HaroXd A. Other sources include Kathlcen 0. the basic sowce is the M u ~ i c i p a lYearhook* Results of federal eilections-that is. The Statistical Abstract ir-rcludes a list of major state sources. More derailed information may require rekreltce to publications from ir-rdividual states. such as the CQ Weekly Report and the A l ~ ~ a n aare c also very useful far information on the presidenr. including documents and votes fc>r recent years. Results for state and focal elections are rnore problematic. CQS Stage Fact Finder: Rankings Across America. and M. which deals mainly with spending. The World Alrntzrt~zcprovides county-by-county returns for recent presidential elections. For local governments. The America Votes series.Sozarccs Many of the sources cited above for Crjngress.also provide some state-level data. Hovey. For . America at the Polk does the same at the state level for tile earlier years of the twentieth century. Viul Statistics ciln the Presidency.

. Page and Shapiro 1983. a topically arrangcd list of survey questions. which provides a breakdown of the responses to each question by a standard set of demographic variables. Floris W moll% A AAnzeric~~z ~ Profile reports results from a nurnber of questions repeated from 1972 to 1989 in surveys by the Natioilai Opinion Researcfr Genter. most researchers are nat in a position to conduct their own surveys on a large scale and must instead make use of the results of surveys conducted by others.g. which begin with l 9 8 l data. E x a ~ ~ p l of ses of how presidential popularity changes over time je.. Survey Data Although political science research frequently relies on survey data. A number of other sources are available. Although published results of srtrveys from sources such as those cited above are necessarily aggregated. must then consult the D a t ~ a. microfiche collection of survey reports from a wide variety of sources. typically one must turn to Local sources. The largest body of pubtished survey resuits is fc~undin the American Public Q p i ~ i o nIndm and the accompanying Americiarl P ~ b l i cOpirtion Datu. Hastings" Igdex to International Publr:c O p i ~ i o n(annual since 1978) reports surveys from the United States and many other nations. The Galiap Poil is a set of volumes going hack to X935 reporting all Gallup surveys in a more limited form* Elizabeth Hann Hastings and Philtip K. Edwards 1983). Sometimes election results are published in local newspapers shortly after the election.Published Data Sozdrces 57 smaller llnits. Mueller 1973. it is especially important to make sure that the data can be obtained before proceeding any further. Tf you are contemplating a project that would require such localized election data. Mor~roe$5398). But for precinct returns it may well he necessav to go to the city or csunty office responsible h r administering elections to obtain such inbrmation. The Gallup Poll publishes The Ciallup Report (monthly since X 53651. they can be used as sources of data for research designs that compare the results of different e s this type of research include the many analysurveys.g+. The Igdex is just that. such as wards and precincts. There is also a body of research that uses results of surveys from many sources and coi~birtesthis with data on governxnent policy decisions to assess the relationship between public opinion and public policy (e. To find out the answers to a question cited in the I ~ d e xone .

But often researchers in the social sciences wish to make use of information structured very differently. thus allowing them to test hypotl-reses about individual behavior. such as the text of speeches. an organization to which most uiliversities and many colleges belong.galIup. htrnlz.nsd. The National Election S t ~ ~ d i ediscussed s. Indeed. but can also be used in conjunction with answers to spen-ended q~zestionson surveys.uib.ed~-nes.no/cessda/namer.irss. ? r ~ rby ~ ~objectively e~ am! sysztmnticc7EEy identif5ti~zgspecified cCf~7racteristics of messdges" "erelson 19"7it). Content Analysis The sources cited in the previous sections provide information that is already in the fcjrm rleeded for data analysis s r can be turned into a data set relatively easily.. Textual data can be analy zed quantitatively througl-2 content analysis.edu>. a Iarge part of the research oil voting behavior in the United States since 19413 is based on the National Election Studies ( N E S )conducted every two years by tl-re Institute for Social Research at the University of Michigan.. the Roger Center <http://www. news articles. including the use of statistical analysis? Xn fact it is. The TCPSR also archives the results of hundreds of other surveys as well as other data sets. The complete set of NES survey data from 1948 to 199"7s available on CD-RO:V.unc. all available in computer-readable form. The ECPSR representative at a member institution should be contacted for frrrther information.Published Data . below. .uconn. and the Social Science Data Archives-Nl"ortt-2 Arrterica . Political scientists also make considerable use of the individuai responses to surveys conducted by others. Content: analysis is mast ctmmoniy associated with published verbal texts.edul>.corn>.rogercenter. may be consulted at *~http://www~umi&. This method has been defined as "any technique fir m~zkirzg i n f .Sozarccs 58 Jatenzet site: Recent survey results from the Gallup Poll may be found at <http://www.=http:llwww. or o t h e r documents. Data files containing the answers gven by individual respondents to each of these extensive surveys are distrib~zted through the Inter-University Cansortiurn for Political and Social Research (ECPSR).pri~-~~etc1n~edul-ahe1sc~nl index>. The Qdu~rtXnstitute at the University of North Carolina <http:l/www. Is it possible to analyze such material in the same objective and systematic way as aggregate data. Otl~ersites include the Princetoil Srrrvey Research Center *1http:Nwww.

and countless studies of news media content (e. but we can systexnatically analyze what they wrote in speeches. but they are particularly important wl~enplanning content analysis. history. cornxnunications. It has been used by researchers in many fields. but some deserve particular ernpl-rasis. not a type of research design. It is obviausly appropriate and often essential if the research question deals with content itself. we cannot interview the popniation from past generations. research question. One is the importance of having a clear theoretical framework.Published Data Sozdrces 59 Content analysis was developed in the early twentieth century and was first used for the analysis of newspapers.. For instance. and education as well as all of the social sciences. 1963). [inguistics. These are highly advisable for any kind of research.. The steps that must be taken in a content analysis are the same as those in any other scientific investigation. Robinson and Sheehax~$983). and the other is that newspapers tend to give more favorsthle . Content analysis is a valrtable research tool that should not be overir~okedin planning a research project. and political party platforms (Pomper 19&0). particularly during World War II. letters. Later it was applied t o propaganda. Ilewspapers. All of the usual stages in the research process apply when ~zsingcontent analysis. content analysis can be used in conjunction with any of the research designs presented in Chapter 3.g. speeches by presidents. Content analysis is a datld collection method. Indeed. content analysis will he illustrated with the example of a simple research yuestion: Do newspapers give better coverage to incumbent candidates than to challe~~gers? This question rnight produce two hypotheses. Patterson f 980. and other documerrts. But content analysis is also valuable as an indirect measure in situations where more direct observational methods camot be used. Steps in Content Analysis In the following explanation. because faiiure to d o so could mean that the whole process of analyzing a large amount of textual material is wasted effort. and hypotheses. including literature. but they have some slightiy different twists. One is that newspapers tend to give more coverage to incumbent candidates far local office. Examples from pc-~liticaiscience include the analysis of diplomatic messages (North et a1. such as the question of whether news coverage of a political caxnpaign is biased.

In tl-ris case we can. only papers with a circulation over a certain number. For our example. We can do frequency counts on the occurrence of individual words. the context in which a word is used is so important that longer units are frequently needed. P o ~ ~ p e r serltence could be classified o n a n u ~ ~ b of . that is. Rather. Finallj~~ we must specify the time period to he c m ered. it is the segFEent of content for which data o n the variables wilE be collected.t the liecording Unit The recording unit is not necessarily the same as tl-re unit of analysis that the hypotl-resis would seem to imply. such as how many times an individual's name is me. we must specify wllat kind of stories we will analyze. However. all daily papers.In our example. we are obviously interested in newspaper stories about candidates. it miglzt be from May to the Novernkr election in a particular year.. content analysis is s o ~ ~ e w hdifferent at from other data coIlection metl~ods.tltioned. Sele6. Trr this respect. or only one particular paper? Our decision would be based on the arnourlt of time and effort we can devote to the content analysis as well as on how accessible the papers are to us. as discussed below.60 Published Data . Each e r variables. we might: select all stories about candidates in any general elections for courltJr offices. such the party affiliation of each candidate for the eoffices we are studying. We would also need to control for other potentially relevant variables.000-and then take a sample of that population-say. papers in a single state. In this example. These hypotheses csuld be tested with a correlational design. f econd. Define the Population We must first define tl-re population. all daily newspapers in the United States with a circulation of over 50. tl-rere is the sentence (or possibly the i~ldepelldentclause in a compound sentence). Since we are not interested in everything prirtted irt those papers. define a Iarge: population-say. but in which newspapers-air newspapers. The smallest recordirlg unit in content analysis is the word.because verbal texts can be divided several different ways.Sozarccs coverage tct incumbents. a random sample of twenty of those newspapers. specify the b o b of content to which we wiSh t o 6r(?~em&e.

it would also be the story or sqmsnt. a relatively objective and unambiguous variable. Themes might be used as recording units in analyzing. The choice of unit depends greatly on tl-re type of content to be analyzed as well as on the research question to be investigated.Published Data Sozdrces 6l (1980) used the sentence as a unit in his analysis of Republican and Democratic platforrns froxn 1948 to 1 976. h r example. Although an item can be of any length.s and seconds. We can measure the quantity of newspaper coverage in terms of the nurnber of words or the 1errgtl-r of the story in coluxnn inches. What constitutes an item can vary greatly depending on the type of comxnunication being studied. far most purposes very long iterns. a single book. But there are several ways to operationalite each. These examples are just a sampling of the ways verbal content can be divided for the purposes of analysis. An analpis of television entertainment programs. In our two hypotheses. in news broadcasts. that is. we would select each story about candidates for coumy office as our recording unit. might well use the program as the recording unit. meailing a whofe unit of communication. such as whether the story . Broadcast news stories are usually measured in terms of time. the independent variable is whether the candidate was an i~~cumbent or a challenger. The length-of-. are problematic because of the difficulty of classifying such large bodies of content. Identfi and Operationully Defi~zethe Variables Next come the variables. In the example of newspaper coverage of local elections. minutc. such as whole books. and we might wish to use more than one. A theme is rather bard to define. the story is typically selected. we might find it useful to measure other strucrural attributes as well. The qutlntity of coverage is an exaxnple of a st-iuctural characteristic of a message. The dependent variables are the quantity of coverage and the quality of coverage.story measure we select becomes our operational definition of quantity In our newspaper example. The must commonly used recording unit is the item. With newspapers. but more typically we woufd record the occurrence and frequency of a particular theme within each recording unit. Another possilsle unit is the theme. it might be described as any occurrence of a particular idea that we are interested in. such as one investigating the axnount of violence depicted.

partisanship. In tlze case of these newspapers. but with content analysis it is usually a simple process. hut we might not have the resources to analyze all of the local campaign stories over a sixmonth period. as that would give us the same day of the week every time. More useful would be first to specify the catqories we will use to evaluate each story. Instead we c m take a random sample of those stories. and the page number. either by using a random rlumher table or simply by taking every sixth day. we know that they are published each day.62 Published Data . if only to rnake it possiHe to check for errors in data collection. we have already decided t o look at a sample of twenv daily newspapers. The other dependent variable. involves the sgbstarttive characteristics of a message. negative. the date. VVe would have to know. we could identifr the cornrnon categories of commentary about local candidates-experience.Sozarccs appeared on the front page or whether it was accompanied by a picture of the candidate. ""hoesty" would be a positive persona! referetlee. and neutral toward the candidate in question. and issues. so we could take a random sample of thirty days from each paper. In our example. quality of coverage. We would also need to record wl-rich candidate and office was the subject of the story. Sample the Pop# lution Whetl-rer or not we Ir~okat all of the content in the population we have defined is a question of how much time and other resources are available. persmai attributes. but tl-ris can be difficult to do. and it would be advisable to keep a record of which newspaper it appeared in. For example. We should then attempt to specify the kind of wc~rdsand phrases that would qualify for each subcategory. (It would not be advisable to take every seventll day. who a11 of the possible candidates were and which were incumbents. as we usually can identify all of the possible text materid and specify where to find it. preferably in advance. We might attempt simply to classify each campaign story as positive or negative toward the candidate. plus the inevitable "iotf-ter.'"ach of these categories would then be subdivided into comments that were positive.) . Randonl sampling is discussed in Chapter S in connection with survey research. After reading a good number of stories.

One is simply to record whether or not there were any rekrences such as. as rneasured by the number of positive and negative comments each received. Contingency analysis wt~uldenable us to coxnpare incumbent candidates and challengers on the quantity of coverage each received. would be to record the rlttmber of rekrences in each category. . First of all is frequency analysis. Analyze the Data It is now passible to test our hypotheses.Published Data Sozdrces Glkect the D a t ~ We would then be ready to go through the selected issues of the newspapers. such as issues and experience. the information from our coding sheets can be entered into an appropriate computer program for analysis. frequency analysis would tell us such things as how much coverage the newspapers gave to the local campaigns and the extent to which it concentrated on the different categories of evaluation. Tjfpically this entails simply tabulating how often different variables occur. We would record that inlormation for each story we found about a local catrtpaignthis is referred to as coding. like m y other data. In our example. such as a sheet of payer that lists each variable. We could also control for the party of the candidate and the particular office being contested (Chapter 50). The metl-rods of statistical analysis to be used wilt be described in later chapters. There are two ways to record the data on the various categories c>f pc~sitiveand negative coverage. but more valual>le.These analyses could be conducted for each newspaper as well as for tl-re sample as a whole. Slightly more time consuming. When we have finally gone through all of the selected newspapers and csded all of the relevant data. which is another name fur multivariate statistics (Chapters 8 a1-d 9). as well as the quality. as measured both in the number of stories artd in their length in column inches. It would be advisable to prepare a form for the data collection. can he evaluated in two general ways. for exmple. including all categories of the quality of the coverage. positive comrrtents on experience. anotl-rer name for univariate sta~istics (Chapter 4 ) . we would have to perfor~n contingency anafysis. But CO test our hyyocheses. but we can preview some of' it now. Data prc~ducedby content analysis.

Pomper (1380) not only used the content analysis of party platbrxns to catalog the promises rnade by the parties hut also used documentary sources to determine the extent to which those promises were fulfilled in later years.Published Data .Sozarccs Issues in Content Analysis An inherent problem in any content analysis. This is particularly important when a content analysis seeks to draw conclusions about the effects of communications. It is also important to make as clear as possible what kinds of words and phrases should be included in each category Finally. Your task is to devise an . that is. when the results of the content analysis are presented. Even if one individual will be doing all of the data colfection. particularly that of the substantive varieth is objectivity. it is val~lableto incorporate data from different sources.. A decisiorz as to whether or not a particular word or phrase fafis into one of our categories is often somewhat subjective. the unit of analysis is given. one. Exercises Answers to the exercises follow.intg the coding at that moment. First of all. The solution is to have more than one person csde the same subsample of text and then compare their resuits to see whether they coded the same material in the same way. it may depend on the personal j~ldgmentof the person dt. it is important to include as many examples as possible of how actual statements were coded. 1st using content analysis. this is particularly a problem when several people are intvolved in the data collection. Thus researchers such as 13atterson (1980) and Graber (1988) have combined surveys of individuals with content analysis of the news coverage to which their responderlts were exposed. Although this problexn cannot be avoided entirely. Follt~witlgare several variables that might appear in hypotheses. there are some steps that can be taken to minimize it. For each. as with many other methods of dam collection. the same approach could be used by having several other people code some of the same material to see if there are any subjectivity probfexns. It is suggested that you attempt to formulate solutions before fookirrg at the answers. The extent of the similarity of their decisions is called intercoder reliability and can be evaluated by several statistical measures.

Military expenditures as a percentage of each nation" grc~ss national product in 1996 (or latest year available). Source: US. 3998 (Washinrgton. representative's voting record 4. 1. The levei of mass political participation in U.S. 45-47. 3. 1998). 2 .S. Liberalism of a U.S. The exact data source should he cited with csmplete bibfiogragflic information.) . DC: Wcjrld Pric~rities. This datca source should be one that would provide the information for all or most of the possible cases.19961. 298. f uccess of a U. it is necessary to actually look at that source to see exactly what informa tion is available.nshington.Published Data Sozdrces 65 operational definitiorl based olx a published data source.The Almanac of Amertcapl Politics 2000 (Wi. BC: Nationai Journal. Statktical Abstruct of the U ~ i t e dStates. Bureau of the Census. The rating given to each representative's voting record by the interest group Americans for Democratic Action in 1994. fn order to do this. (Data on individual representatives are found throughout the bhook. 1 9 9 ) . Economic development of a nation 5.S. DC: U. Source: Ruth Leger Sivard. World Military arzd Sockl Expendilures. Government Printing Office. 1996 (Washington. Milit21ry spending of a nation 3. states 2. The percentage of the population eighteen years of age and older in each state castiw votes for presidential electors in 1996. Source: Michaei Barc~neand Grant If~ifusa. president in dealing with Congress Propose a research design using content analysis that could be used to investigate the research questions 'Tb what extent have American party platlorxns increased their attention to the problexn of crime over the years?" a d 'TElave Republican platforms given mtrre attention to crime than Democratic platfC~rmshave?" Suggested Answers to Exercises 1.

lines. The content analysis could be conducted in several.66 Published Data . The unit of analysis would he the Reprtblican and Democratic platforms since 1960. . words. in which case one wauld count the number of sentetlees in which some reference to crime appears."s and that Republican platforms tend to give more attention to crime than Dcrnocratic piathrms. Source: t.) The hypotheses to he tested could he that parties have gven more attetltion to crime since 1980 than they did in the 1960s and 19". or inches.Sozarccs 4. 760-861. Averai&-epercentage total Mouse and Senate concurrence. Alternatively. The recording unit could be the sentence. Source: The World Almarzac and Book of Facts. (These data are available only from 1953 on. The per capita gross domestic p r o d u ~ tJGDP) of each nation. 5. the measureEnent should be standardized. 1998f. Whatever method is used. computed in comparison to the total number of sentences. 2 999 ( IWalnvvah. DG: Congressional Quarterly. 390-391. Vi.. revised edition (Washington. lines. This is important because party platforms vary in length. the texts of which can be found in the annual Congrassictnal Quarterly Alitnnnrlc fsr each presidential election year and also in tl-re CQ Weekly Report after each national party convention. it would then be possible to calculate whether relatively more attention was given to criine in later platforms than earlier and whether there was a differelice between the politicai parties. that is. NJ: Wc~rldAimatlac Books). (me could count the number of times the word "crime" (m a synonym) aypears or xneasure the length of the sections deaIi13g with crime (in words. generally increasing over the years.taE Statktz'cs the Presidency. or inches).yn Ragsdale. If these data were collected. ways.

Altkotlgh this is derntrnstrated by long experience with surveys. Survey interviews are used for large samples of the general population as weil as far specialized g r o u p such . Survey research is a such a cornrnon rnetliod of data coIXection-it is used not only in social science research hut also in political campaiglls and market research-that understanding how it is conducted is valuable for everyone. but they usually are. tl-re rationale far savnplil~gis mathematical.Survey Research Survey research. a pattern would . arsd recordi~gthe a. r asking qzresgiorrs. People sarnetimes express doubt that estimates based on only a tiny fraction.I$ hofders of govertlrrtent pr>sitions. based on probability theory Suppose you were faced with the task of determining the relative number of red m d black marbles in a very large basket. If you started to draw more marbles out of the basket. Tf you lcroked at only a single marble.000 out of a population of 209 million. such as election predictions. they ~zsesamples. can be accurate.'kea~~s t i l k i ~ ga sample o f a l ~ q e popuhtiort.rzswers. also called ""polling. or anything else. perhaps 2. that would tell you very little. Sampling Since researchers are us~talIIyinterested in drawing conclusions about poyuiatitlns that are so large that it would be impossible to interview ail of the individud members.The logic of sampling is tl-re same wl-retlzer one is selecting citizens for a survey. laboratory animals for experimental and control groups.

it is possible to draw a sample of ten red marbles or even a bunclred. The researcher cannot select more marbles of one color on purpose.5 percent sure that a sample would be off by no more than 3. for a sample of 1. Even if the basket is evenly divided in color. (This relationship occurs because the amount of sampling error is proportional to the square root of sample size.000 should almost always come out between about 52 and $8 percent Republican. Furthemore.000 cases unless there is some special need. they will rend to approximate the characteristics of that population. the percentages of red and black would resemble those of the whole basket. such as a desire to ~ b t a i naccurate trteasurernents for sultsamples of the population. If we were taking a survey of how people had voted in an election in which the total vote was 5. the emore accurate the measurement is likely to be. however. A frequently asked question is ""How large should a sample be?" As nc~tedabove. Such considerations are necessary to assure a ""random sample.1 are based on several assumptions.1 illustrates this principle. For accuracy. Increasing the size of small samples considerably increases accuracy.) I-fowever.000. (On average-----S0percent of the time-we would expect to not be off' more than about one percentage point. As the sample pew. and the basket should be well mixed beforehand.68 Survey R wearch tend to emerge. In other words. the most important of which is that a simple random sample is used. and no black marbles. the relationship between sarnple size and accuracy is not a straight line. The paint of this example is that if sufficiently Large random samples are taken horn a populatitril.1 shows. tkorrgh that is extremely unlikely. Hence even well-financed commercial surveys rarely exceed 2. By the time you had drawn 100 marbles. this process must be free of bias." Note that the results are a matter vf chance.5 percent Republican. The column t~eaded "95% Confidence Interval" &sht>wsthe maximum amouilt of error a sample would make 95 percent of the time. .) Note that the figures in Table 5." "but this requires some qualification. then a saxnple of 1. the dis~fibutionof these samples takes the form of a normal disfl'ibaidkn-a bell-shaped curve-which allows us to estimate the accuracy of a given sample. As Figure 5. we could be 9. but the relative gains di~ninishwith larger samples. the considerable costs of survey research are directly proportional to the number of interviews conducted. the proportions would remain fairly constant hut would come closer and closer to the proportions of the total.1 percentage points in either direction. The larger the sample size. Table 5. the ailswer is ""the larger the better.

" that is. One factor that makes little difference is the size of the population from which the sarnple i s drawn.Survey Research 69 TABL. SampIes of a few hundred or even fewer c m be quite useful for many research yurposes. tl-zen we could number them and use a random llunlber table to select the needed sample. then there are many ways of selecting such a sample. in which a random start- . A variation that produces essentially the same result is the systemtic sr-znzpk. bm unless the sample size is one half or more of the population size. I Sample Size and Accuracy Sample stzc 95% Confidence Xntervai f NOTE: These figures assuxne simple rarldorn samplir~gfrom an it&nitely large population of a characteristic lzeltl by one-half the poptlfation. ff our population is the students enrofted at a particular university. A simple or pure rundvm siznzpk i s a sample taken by a inethod ensuring that each mgnzber of a population has an equal chance of be&wselected. the gain in accriracp is very small. a csEnputer csuld readily perform the same frznction. If we have a list of all of the members of a population. nineteen tixnes out of twenty (another way of expressing 95 percent). Keep in mind also that the ranges sl-rown in Table 5.1 are what could be considered the ""maxixnuxn error. the survey wiII be more accurate than the intervat s h o w . Sampling can he dune in several different ways. The name of each student could be placed on a slip of paper and the saxnpie drawn from tbe figurative hat.E S . It i s true that a saiinpie of any given size taken from a single city wifL be more accurate than one drawn from the whole world.

mul~Y was developed for large surveys using pertistage C ~ B S ~ LsampI2'ng sonal interviews. if a list of the members of a population is available. Cluster sampling involves sampling of geographic areas dawn to the city block." individuals select . In the "street corner sample99heinterviewer stands in a public piace and questions whoever will stop. such lists are not available. it is easy to select a random sample. that is. so a survey that employs it. Random and cluster samples arc both probabiIity samples. such as the Gallup poll. whereby tellephoile numbers are randomly coilstructed from the range of possible n~zmbers. Jn the ""strawpoll.000 Sample size ing point is used and then every tenth name (or every hundredth. Large-scale telephone surveys that use random digit dialing.actually use a fcjm of cluster sampii~~g of area codes and exchanges. Hawevet. cluster sampling is somewhat less efficient than pure random saxnpling. Fur technical reasons. A number of other methods are used that Qanot meet tl-ris test. if the sample is to be drawn from the general population of the nation. ezier3l case in the popcejatiorz has ia know%cltlia~ceof selectiarz.OQQ. resulting in the selection of a nuxnber of '"clusters" around the country where interviewing is done. needs a sampIe of as many as 1. Becmse of that and other practical considerations. or whatever ir-rcrement is needed) is chosen. In short. or even h0117 a particular city.Survey R wearch 70 FIGURE 5.1 Sall-tple size and accuracy 0 20f2 200 300 400 500 600 700 800 90f2 1.500 respondents to achieve the accuracy level of a pure random sample of I.

Telephc~nesurveys also offer the advantages being conducted more quickly. in which iilterviewers approach people leaving the pc~llingplace. There are two ways that people can be asked questioils. Personal interviews are generally considered to result in a higher quality of measurement than telephone interviews. may appear to be a variation of ""street corner sampling. This can be done in a personal (or Eace-toface) interview. and so most surveys in recent decades have been done by telept~one.Some degree of bias is built into this metl-rod. The exit polls conducted by the television networks since 1980 appear t o be highly accurate. in which the respondent reads the questions and records his or her own answers. Respondents in personal interviews have been found to be somewhat rnore at ease. since some people do not have telephones. However. The "exit polls'konducted by journalists on election day. and each is commonly done by two different methc~ds. it is possible to select a reasona bly representa tive sample." but they avoid tile usual bias of chat ayyroacl~in that everyone wl-ro is voting that day (aside from those casting absentee ballots) must leave a polling place. in c ~ ~ l f p a r i with s ~ n personaI interviews. to understand questions better9 and to be rnore likely to express preferences. at least in their estimates of election outcomes. personal interviews conducted by going door-to-door are extremely expensive. Personal interviews can he longer than telephone interviews. presenting fewer problems of access (such as respondents unwilling to open their doors to strangers).Survey Research 71 themselves to be respondents. One versiorz of the fatter is the practice of encouraging people to pl-rone in to express their opinions. and they are not used for serious research. An alternative rneans of conducting a survey is the selfadministered srarvq. One problem with this . and visual displays can be shown to the respondent. By sampling precincts and usinrg a predetermined formula br what proportion of voters should be approached. the interviewer reads the question and records tile response.In intervieweradmir~istered surveys. or over the telephone. a i d allowing more callbacks to households where 110 one was horrte. Neither of these has any guarantee of relative accuracy. academic or athewise.. usually in the respondent's home. but today tl-ris is a relatively small problem.

the greater the probable bias in sample selection. the mait survey is not a good approach far this population. Another common method o f conducting a self-administered survey is to use a captive pupulatiurz. Even tl-ren.administered questionnaire. . The advantage of using a captive population is that it is inexpensive. Mail surveys can be mtrre useful in researching specialized populations. that is. they may he those with more intense feeli~lgs about the survey's general. it can be quire useful if the research question deals with a specific group whose members are available and willing to filf out a survey questionnaire. and the project will necessarily take several weeks or months. The most common example would be a classroom of stndelrts. One method of conhcting self-administered surveys is to mail the questionnaires out and hope that the respondents return them. The I o w a the response rate. People attending a meeting and employees on the job are other possibilities. In these circuxnstances a list of the population is available and those sampled Iikeiy have greater interest and possi hly a hove-average reading levels. The self-administered survey also has a potential sarnpling problem. since cssrtplete lists ot the general population are not available. topic* Respo~~se rates can be increased by including a cash payment or calling respondents to encourage their participation. However. but such steps erode the cost advantages of self-admirristered surveys. questionnaires are sent by first class inail addressed to a specific respondelrt. This mealls that some pote~ltialrespondents will not he able to resyo~rdat all to a self-. Those who do choose to parricipa~einay well be different from tl-rose who do not. a group that is assembled for soxne other purpose and over whom the researcl-rer has some millimal control. However.72 Survey R wearch method is that a significant proportion of the adult population of the United States (as high as 30 percent by soxne estimates) has a low reading level. a well-done mait survey requires sending one or more additional waves of surveys and followup reminders to those who have not responded. The great disadvantage is that this method can never resuit in a random sample or even a representative sample of the whole population. for example. such as members of an organized group or occupation. feadirlg to higher response rates. and many otl-rers will be reluctant to do so or not understand the q uestions. The great disadvantage of this approach is that response rates are typically very fow. In a well-done mail survey.

or other dichotoxnies being tl-re simplest. Most pe[~PIwvvilf make choices on long lists of typical yes-or-no questions even if they have no prekrevices 017 those topics. to be presented to respondents. In an effort to measure more precise degrees of intensity. various kinds of visual scales can be employed. or strongly disagree?" When it is possible to show visual aids to respondents. In other words.Survey Research Writing Survey Items The most critical step in survey research is writing the questions. discussed in Chapter 4. These are summarized in Box 5. The latter process is actually a form of content analysis. Whatever the format. But i f they are given open-ended items.in which respondents are given all of the possible answers. The case can be made that open-ended questions are often better h r xneasuring tile opinions. or i&ms. so that anyone's opinion would fall into one of them. This is nut because closed-ended questions are better measurements. and the categories must cover alt possibilities. and oj>en-encklrl'. with the yes-orno. '"Bo you strongly agree. their real feelings can be expressed. This means that there is a reasoElable expectation that most of the population to be saiixlyled has some howledge of the subject matter and terminoiogp to be used. even if the resolution refers to a . process. There are a number of common prohle~xlsin the csnstructio~~ of survey items. the answers to a closedended question should meet two criteria: They must be mutanaliy exclusive and collectzvely exhaustive. Most surveys consist of closed-ended items. Asking members of the general public whether they favor passage of House Resolution 1314 is silly. but becase they are easier and less costly to administer. There are two basic types of questions: close~d-ended.) One of the most important considerations is that respotlitlmts m11st be competent to answer a question.for example.1 along with examples and how the problems might be corrected. and concerns of respondents. in whch respondents indicate where along the scale their opinions fall. attitudes. disagree. agree-or-disagree. agree. the axlswers sl-roufd not overlap. (Mditional examples can be ft2und in the exercise at the end of this chapter. and analyze. more complex sets of choices can be ~zsed. Closed-ended items can take a variety of kxrns.in which respondents are given a more general question and asked to articnlate their own answers. The problem with open-ended items is that it is more difficult far the intenriewer to record the responses and fc~rthe analyst to classify the responses into categories for tabulation.

" Because some respondents are eager to agree with an interviewer. ""Do you agree with this prop o s a l l ' k e are ""leading'9hern into a positive response. it is especialty irnparcant to xnake clear that negative responses are acceptable. it is harder for the respondent to understand what is Lteing asked. In this way. Asking whether the death pellalty should be used for ""bloodthirsty killers who torture their innocexit victims" is inappropriate and unnecessary. but even with personal: knowledge. Another technique is to use a flter questiorr?whereby respondents are first asked whether they are familiar with a topic. hut the solution in such cases is to set forth the details. Tf a question is long and coxnplicated. short and s i m l e items are best. such as associating a political figure with a substanthe policy proposai. but interviewers should always be ready to accept it as a response and not attempt to force a choice. and then ask a simple yrrestion. some topics are more complicated a1-d require more exylartation. . F-fence it is necessary to include phrases such as ""d you agree or disagree. h o t h e r rule is to rrever stutc qmestions in the negative.74 Survey R wearch prominent issue. If we ask respondents only. the problem of bias can be lntrre subtle when any csntroversial individual or gri~upis unnecessarily introduced into a question. 111 survey questions. Admittedly. it is permissible and often advisable to present a surnmary of a proposal before asking about preferences.'" "would you say we should or should lot." "Id0 you f a v ~ r[or oppose. However. in several sentences if necessary. as we cannot assume that most people know such things as tfte amount of incorne tax their family paid last year or the population of their own community. Most surveys do not customarily present (ino opinion" to the respondent as a possible choice. An obvious requisite is to avoid crsing any binsed or e~fotionali kangzkiage in survey questio~ls. Although such extreme emotionalism is ntjt iikely to be used. asking ""Do you agree or disagree that the United States should not reduce its contributirons to the United Nations?" is likely to he csnfusing to the respondent. The problem of competency arises not only with technical knowledge. A common pitfall in writing survey items is failure to avoid leading questions-items that f2il to present all of the possible alternatives. all respondents are being asked about the same subject.The choice of wording should be as neutral as possible so that tfte phrasing of the question does not sway the respo~ldentto one side. For exaxnple.

with Examples 2 . Respondent must be competent to answer. Wro~sg:"Do you favor or oppose the United States continu- ing to waste your hard-earned tax dollars on foreign aid?9' Better: " D o you think that the aEnount of money the United States spends on foreign aid should he increased. or remain the same?" 3. Short and simple questions are best. Wro~sg: you agree tlzat there should he term lixnits for all elective c~ffices? " Better: " D o you agree or disagree with the idea that there sl-routd be term limits for all elective offices?" 4.BOX 5. Wrc)ng:" D o ycju think Section 14-B of the I947 TaftHartIey Act should be repealed or not?" Better: ""At the present time. Avoid biased or emotiorzal language. Wrong: "Would you favor or oppose the idea that all empir~yersbe required to provide health insurance for all their employees meeting certain minimum staildards. "Avoid leading questions. Wc~uIdyou favor or oppose this idea? 5. states can prohibit contracts that require w r k e r s to join a ~znion. Do not state questions in the negative. with the goverrzment providing health insurance klr peopie who are unemployed?" Better: ""I has k e n proposed that all employers be required to provide health insurance for all their e m p k e e s meeting certain rninixnurn standards.1 Rtrfes for Writing Survey Iterns. decreased.Wc3uld you favor or oppose taking away a state" power to prohibit such contracts? 2. The government would provide health insurance Eor people who are unemplcryed. Wrong: ""Toyou think the United States should not decrease its invofvernerit in Bosnla or not?" .

but they were of a different race.tle-barreledquestions. Would this bother you or x~ot?'" 9. Avoid dr>ul. should the new school t ~ something r eke?" be named Central OF NOTPI? . Wrong: ""DO you favor (12 t)ppOse the prt>ptxal to improve edticatit~n?" Better: "It has been proposed tlzat all public scllools test children in the third and sixth grades and the senior year in high school to make sure they have learned what they should. Avoid ambiguous questions. Would you favor or oppose this idea?" S. Minimize threats.Better: "Do you think the United States should decrease its involvement in Bosnia or keep it at the current level? " 6. TXryo~g:"1s ideological proximity more important in your electmal decisit~nmakingthan fiscal c~~ilsiderati~ns?~' Better: "Which is more important to you in deciding how to vote-how liberal or conservative a candidate is. Avoid unhmiliar language. Wro~zg:((Should Central High Schoof and North HighSchool be merged and the new school be named Central or not? Better: "'Do you agree OF disagree with the proposal to merge Central Higlz School and North High School? X I the two schools were merged. TXryo~g:"Do YOU want to keep black people our: of your neigtzbcjrh~~c~d? '' Better: "'Suppose a family who had about the same income and education as you were going to move into your neighborhuad. or how the candidate stailds on taxes and spending?" 7 .

threafi shotjld be avoided. h e way to help ensure that questions are clearly worded and unlikely to be confusing to respondents is to try the questions a number of times hefore adrnin- . This is a matter not only of tlze wording but aiso of the substance of the qrzesrion. Language familiar to almost everyone can be substituted. It m s t be clear to the respondent just what the question is about." "recidivism.nJanzilirlr to the avemgc person. '90you think that the United States should reduce fc~reign aid and spend the money on welfare here at homel'TThese subjects can and should be covered in two separate questions. Ambzguous questions must be avoided. Writing good survey irems is a combination of good communication skills and experience. If a technical term cannot be avoided. Some survey questions may be threatening to respondents. asking whether a person watched the presidential candidate debates may seem to imply that they were not good citizens if they did not. An ambiguous question is one that could have more than one meatling. then it must he explained. This problem can occur with less controversial topics as well. more likely. asking someone a question using the aphorism that "politics makes strange bedfellows" migl~tcause some respondents to come up with some very interesting interpretations today. A c o m m ~ t lreason h r ambiguity is vagueness.Survey Research 77 h ~ b v i t consideratit~n ~~~s is vocabulary used: Never w e ''big9' worrl's that w~ukII!be t. "Were you able to watch the debates or not?" This offers an implied excuse for those who did not watch. A final rule is avoid do~lrk-barreledqmestions. and it extracts the same inlormatioil. These are i t e m that ar-tempt to get one answer to two different questions. there is a risk that the respondent wilt refuse to answer or. but certainly not in a survey In almost all cases. For instance. o r at least mzutivutixed." and "philanthropic"' might be appropriate in a college classroom. such as use of dangerous or illegal substances or exhibiting racial prejudice. for example. The threat in this case could he reduced by asking. Terms such as ""ideological. For example. Even a reference to such familiar phrases as "'Right to Life" a t ~ d"Freedom of Choice" might be misinterpreted if it was unclear whether the question concerned abortion. be less than honest. When asking a bout whether the respandent engages in socially unaccepta bte bebavior.

Even if'the precise topic is not covered in another survey. Exercises Following are soxne survey questions. Identify the problems in each and then write an improved version of the question that would avoid the problems.)ff your survey uses the same wording as another survey has used. Arenk yyou concerned about the state of the economy and in hvor of the bcziariced budget amendment? 2. or should we rely on bilateral negutia tions? 8. f ndeed. Do you think that those money-hungry tobacco companies should be severely puaisl~edfor killing all those innocent people? 6. I. you may gain the added advantage of comparing your results with those from a differerit sample. even novices can draw clcl the experience of others by looking at questions that have been used in other surveys. each of which contains one or more of the common probterns discussed in this chaptea. in well-do~~e surveys researchers ofken select a samyfe of actual respondents far a pretest and conduct a small-scale survey in the same way they proposed for the actual project.Survey R wearch 715 i s ~ r i n gthe final: version. Should the United States use retaliatory tariff barriers tc-. similar wording can Often be adopted. Do you agree that the death penalty should nc>the used as a punisl-rment for murder? .reduce our balance of payinents deficit. Do you think we should do more to reduce crixne? 3. This is not to say that all published surveys. As for experience. but they offer a good stafting paint fc~rthe researcher in training. Do you think that people should be allowed to do things that are not good for them or not? 4. Do you agree or disagree that we stloufd not get involved in the situatioil in Kosovo? 5. are well written. cornrnercial and acadexnic. (~Wanyof the sources of survey data presented in Chapter 4 include the wording of qrzestic>ns. Which candidates for county office did you vote for in the election? '7.

This is a leadirlg question and it is double-barreled Improved: 'V-Iow concerned are you about the state of the economy today-would you say that you are very concerned. where most of the people are of Aibas~ianancestry and where the Serbian government has been accrised of kiIling civilians.. as the respol-rdent would not know what kinds of ""things" are being considered. Improved: "Would you firvor or oppose imposing heavy fines on tobacco companies to cover the costs of health care h r people who smoked cigarettes?" . or not very concerned at all?"" "'Do you favor or oppose the idea: of an amendment to the U. the trtore liberal they tend tct be on social issues. Tmproved: 'T~~outd it be a gaod idea or a bad idea if smoking cigarettes were made ijlegal??' 4. Improved: ""Asyou may have heard. This is an ambiguous question. somewhat concerned. Do you think that the United States should send troops to try to keep the peace in the area or nat?" 5. and operational definitions of all variables (these wilf be the survey questions you would ask). This is an ambiguous question. saiirtpling method. details of the survey (population. as there are many proposals on this topic. sarnple size.S. there is a section of tile former Yugoslavia called Kosovo. The question includes emotional language and it is leading.Survey Research Suppose that you wished to test the hypothesis that the more education people have. Suggested Answers to Exercises l. Propose a research design using survey research to test tl-ris hypothesis. Improved: "Do you fa:vor or sppose Icjnger prison sentences as a means to reduce crime?" 3. Uou should specify the type of design you would use. and interviewing inethod). Constitution that would require a balanced budget every year?'' 2 . This question is stated in the negative and also map raise questions of coxnpetency to answer. as respondents may not he familiar with this situation is1 the former Yugoslavia.

Improved: What should the United Seates do about the trade i~nbalancethat comes af our buying more from atl-rer countries than we sell to them-should we raise our taxes cm goods we import or should we try to work it out with those countries? f3. This question uses unfamiliar language. the dependent variable is the individual's degree of social Iiberalism. Improved: ""Didyou happen to vote in the election last November for Sheriff?" "Did you vote I'c~rJohn Smith.500. attend college. and religion. Would you favor a r appose making it illegal to discrixninate against hiring someone because he or she was a homosexual? 3. graduate from high school. social status. Would you favor a r appose a constitutional amendment that would allow prayer in the public schools? 4. This is a leading qriestion and is stated in the negative. tl-re Republican. government make a payment to all African Americans to make up for what they suffered as a result of slavery in the United States. How far did you go in school-did you attend high school. The pvpuiatlon to he surveyed wouid he the adult population of the United States. Improved: "Do you agree a r disagree that the death penalty should be used as a punishment for murder?" C. The respoildent" seducation would be determined by asking. or graduate from college? Social liberalism could be determined by askirtg the following questions: 1. race. The most appropriate design here would be a correlational design in wl-rich the independent variable is an individual" education. T t has been proposed that the US. Would you hvor or oppose this? . Would you favor or oppose adoption af a constitutional amendment that would make abortion illegal under any circu~nstances? 2. the Democrat?" 7. or Bill jones. and control variables are the inrdividnal's age. The data ctluld be obtained by means of telephone survey using random digit diaiing with a sample size of 1.. because they would probably not remember their votes.80 Survey R wearch Respondents wouid not be competent to answer this question.

black or African American. oppose). The control variables wc~uldbe measured by answers to the following questions: Age: HOWold are y ~ > u ? Social status: Wc~ufdyou describe yourself and your family as generally being in the upper class. Asian Axnerican. working class. oppose. Hispanic or Latino. oppose. or something else? . 3. rniddle class. favor. The answers to these questions would then be coded as to which was liberal ( l . 49 favor. or lower class? Race: Would you describe your racial or ethnic status as white. 5. Jewish. and each respondent h e n would be given a score equal to the number of liberal Rsponses. Would you favor or oppose stranger laws that would restrict the sale of yorn~grap~ly ?. or Native American? Religion: Is your religiorz Protestant. 2. Catholic.Survey Research 81 5.

This page intentionally left blank .

9. Chapters 8. There are many satistical measures. and interval data. the next step in the research process is to analyze those data in order to draw conclusir>ilsabout the hypothesis. and 10 wilt show you how ta compute several of them. that is. hegir-rning with some basic irlformation that is necessary to be able to use any statistical measures correctly. is the nominal level. we need statistics. Levels of Measurement The term Eevel o f n z e a s u r e ~ ~ erefers n ~ tc-. Mwever. This is particularly true in tlze social sciences. ordinal. Tb look over such a vast array of data to "xe" what is there would he a very difficult task. the bits of data are often numerous ir~deed. the feast precise. There are three levels of IneasureInent with which you need to be familiar: nominal. wlzere we may have stlrvey results a n dozens of questions from hundreds or even thousands of respondents.Statistics: An Introduction Once the observations of tlze variables in a hygotlzesis have been rnade and assernbied into a data set. This chapter presents an overview. The ""lowest" Ievef of measurement.the classifications or units that result when a variable has been operationally defined. 12 rtominal variable simply places each case into one . In order to evaluate our data and determille what patterlls are present.

Perat. 1 rnillian to 10 xnitlion. middte class. ordinal variables. and it is preferable to treat such variables as interval. and vote for president (Clinton. whether rank order OF ordered categories. These would be it~&rz~al values (discussed below). Ti. Marninal variables contain inforrnation on "what kind. For example. Ordinal category variables may also come directly from rneasures tkat do not have interval precision. and under I million. with California being first. For example. ordilzal variables rank cases in relation to each other. In the rest of this hook any references to ordinal variables will mean ordr?red catggoYiesr the more common form of an ordinal variable. and so on.The first. Dole. Hispanic. variables are put into categories-as are noirtirtal variables-h~zt the categories have an inhererst order. Asian. This can take two fc~rms. or wt~rking class. Note tkat it would make no sense to describe such variables in quantitative terms. none. For example. other. Note that these rank values do not carry as rnuch ir~formationas the actual population figures on which they are based would. survey respondents might be ranked in social class by asking them if they consider themsefves to be upper class. or other). may he described in quantitative terms. litllike nominal variables. other). puts the cases in exact order according to svrne characteristic.Examples would include an individtlalk raclallethnic stattts (African American. Native American. states could be grouped by population into categories such as aver 10 million. Catholic.'"'Eess sures wauld be silly. speak of "'more religion. A state that is railked tenth in population drxs not have twice as trtan)r people as the state ranked twentieth. we usually would need numerical measures of the actual quantity of the variable. With ordered catqordes. religious preference (Protestant. nut voting). This could be done by taking a variable for which numerical (interval) data are available and grouping the cases into categories. It is . Note that this sheds some of the information originally available.of several u~orderedc~tegories. mnk order. Rank order is not rnuch used in analysis for research pwposes." As the name implies.'"^ "more voting" 'from data t ~ nthese mearace. Jewish. wl-rite." not ""hc>wmuch. New Ycxk second. we could rank states in order of population. In order to get an exact ranking.

For example.or college. and unskilled manual) could be used as an ordinal measure of social status. skilled xnanual. similar level of measurexnent called a m~r'o s a l e ." "not ascertained. The addition of residual categories such as "dc)ri3t know. the total number of votes received by a candidate in a district or a person" annual income. high schr~ol.proper to say that some cases in a data set have more education tl-rall others." or ""other" will always cause the ordinal quality to be lost. it is imyortmt to rexnetnber that all categories xnust fir a pattern of high to iovv (crr low to high) on the variable. it will not be discussed here. Exercise A at the end of the chapter provides additional examples for you to test your understanding. But the application is coi~pticatedby the fact that there are two .000 and $10. There is also anotl-rer. The highest Ievel of measurement is the interval level. though which level applies in some actual cases may be debatabie.000. Or it may be a st. this problem may he avoided if the researcher is willing to exclude all such cases from the analysis.000 and $15.000 of income is the safBe as the difference beween $I 0. However.1 provides a number of examples of variables and their level of measurement. tlie addition of the categov of "farmers and farm laborers'kould render the level as only nominal. Jn determining whether a set of categories may be considered as ordinal. This xnay be an actual count? for example. even tl-rough education is measured only in tenns of grade school.'zndardized form. fn actual practice. the dirference between $5. such as the percentage of the district voting Wernocratic or the average income of families in a state. This means that not only may ir-rterval variables be described in quantitative terms ("the higher the income. Box 6. As the difference between interval and ratio levels is rarely importmt in social statistics. The census categories of sccupation (professional and managerial. clerical and sales. An i n ~ r v a l variable provides an exact rlurnher of whatever is being measured. the lower the percentage Wexnocratic" "j but also exact comparisons may be made. Rulc~fi~r Using Levels of Mm:turement These three levels of measurement are relatively simple concepts.

d ~ l l a r s ) * Voter turnout (as percentage of voting age population) e Perceiltage Ga tholic * Years of education * Crime rate (number of crimes per 100. newspapers. no opinion) Lowest rules that allow variables tc-. middle of the road. Senator Strorn Thurmox~dis first.1 Exampies of Level of Measurement Interval level. somewhat conservative.BOX 6. less developed) * Age ( Z 8-20. n o ~ ~ e ) * Party preference (Republican. keep at present level. military authoritarian. Rule I is that a tlavirzble may always be treated its a lower lezlel of measurement.) * Level of econoxnic development (developed. * Gross national product (in r~illionsof U S . very liberal) Moxnina l: e Region (Northeast. eliminate entirely) * Ideology (very conservative. other) e Source of political infc~rmation(television.be treated as other levels under certain circumstailces. independent.. and an vrdiilal variable . Democrae. decrease. 40-59. etc. newly industrialized.000 population) Ordinal: e Seniority in the Senate (as of this writing. radio. South.talking to others. r~agazines. monarchy. noile) * Opinion on gays in the military (allow. This means that an interval variable rnay be treated as ail ordinal or nr~nzinalvariable. midw west. not allow.2 1-39. 60 and older) * Opinion on dekrlse syendiw (increase. marxist. somewl~atliberal. other. West) * Farm of goverIIment (democrat).

as a nominal variable, Thus, the percentage of a state" vote that
went to tl-re Democratic candidate, an interval variable, could be
used to put the states into rank order from most Democratic to
least Democratic, States csuld also be put into ordinal categories,
such as aver 60 percent Democratic, SO percent to 60 percent Democratic, 40 percent to 49 percellt Democraic, and so on, 3% treat
a l no changes are needed; one simthese categories as n o ~ ~ i n data,
ply ignores tl-re fact that tl-rey l-rave an orderIn applying rule I, it is critical to keep in mind that although
you may go down in level of measurement from interval ttr srdirlal to nominal, it is not permissible to go up, that is, to treat a
nominal variable as ordinal a r an ordinal variables as interval.
There is uile exception to that statement, and it constitutes the
other rule,
Rule 2 is that a dichotomy may be treated as a ~ z ylevel of measurement, A dickotomy is a variable that has two and only two
possible values or categories. An example would be a perso~l'sgender (female or male), assuming that there were no cases in which
that infamation was missing, A state could he classified as having
a Republican or a Democratic governor. This would be a dichotomy as Long as no state had an independent a r third party gavernor, But if there are only two possible categories into which any
cases can fall, the variable inay be treated as interval, ordinal, or
nominal, regardless of i t s substantive concent. Thus, rule 2 might
be expressed as "dichotomies are wildm-in the card-playing sense,
of course,
In order to take advantage of rule 2, it is common for researchers
to modify their data tc-,create dichott3mies. The motivation for this
is that the statistics that can be used only br interval variables are
more powerful than those for ordinal and nominal data. Hence, for
example, the ethnicity of individuals might he condensed from the
nominal set of categories of white, African American, Hispanic,
Asian American, and other into the dichotomy of wl-rite and nonwhite, In political analysis it is common to collapse the regions of
d i c h o t o ~ ~ 50y,
the United States into a S~uthern/Non-Sc)t~ther~l
phisticated multivariate analyses sometimes create what is called a
&mmy variable by using each categclry in a nominal variable, such
as religious prefere~ice,to create new dichotomous variables-for
example, Pratestantmon-Protest nt, CatholiclNon-Cat holic, and
SO on.

Box 6.2 provides some examples of the application of these two
rules, as does Exercise B at the end of tlze chapter,

Why LeveL of Meusuremmt Are Important
The reason it is so important to be able to identif.y the level of measurement and correctly apply the rules is that each of the many statistics designed far data analysis makes assumptions about the variablesqwel of measurement. If you use an inappropriate statistic to
evaluate your data, the results may be ~neaninglessand lead you to
draw erroneous conclnsit>ns, This is something to bear in mind
when using computers in staeisticaf analysis. The coxnpmer programs we use to calculate statistical values do not know what the
content of your variables is and therefore caxlnot determine what
statistics should be used. Since it is common to enter all kinds af
data as numbers, the computer will readily treat any variable as interval data, even though the numbers may represent arbitrary
codes for naxninal categories. A variable such as region may be
coded 1 for Northeast, 2 for midw west, 3 for South, and 4 for West.
To compute the ""average region" would be senseless, hut a sk~tistical program will do it if you request it.
Therefr~re,always be aware of the level of measurement of your
variables and of what leveIs the tvvo rufes will aiXow you to treat
them as. As noted earlier, you may choose to modify a variable,
such as by collapsing it into a dichotomy, tto take advantage of rule
2, Most computer programs can do this for you autctmaticaf!y.

What lls ~ S~ati~tic-7
As noted at the start of this chapter, in social science research we
are often faced with the task of looking at a large collection of observatio~lsand trying to see what patterns are present. Such a task
would be diff"icrtlt.and in many cases impossible if we did nor have
statistics to assist us, A statistic may he defined as a nur~ericatmeasgre t/?at summarizes some characteristic of a larger bod$i of dcntil.
That is why statistics are useful, They can reduce very Large
amouilts of information, such as the census of the United States, to
single numbers that convey information we need.
Statistics are found in everyday life, and everyone uses tlzem. The
most common statistic is the total, such as the total population of
a nation or the total amount of Enolley in one" pocket. Anather

BOX 6.2 Rules far Using Level of Measurement
and Exampies sf Their Application
Rule k '""own, Bttt Not Up": A variable may always be
treated as a Eower level of measurement (is., interval may he
and ordinal may be treated as
treated as ordinal, or nof~ir~aX
a nominal. But never treat a variahle as a higher level.

Rule 2: ""r>ichotr>miesAre Witd" A dichotomy-a varia hle
with only two possihie values-may be treated as any ieve) of
measurement.

Percentage of a nation's budget spent on defense: This is an
i~ltervalvariable, so it could also be treated as ordinal or ncminal (rule l f.
Party competition in a state f highly competitive, less competitive, one party): This is an ordinal variable, so it could also be
treated as x~ominal(rule l f .
NATO membership (~nemkr,mnmember): This is a dichotomy,
so it could be treated as nominal, urdir~al,or interval (rule 2 ) .
Form 01 municipal government (strong mayor, councilmanager, cornmissinn, other): This is a ncjminal variable and
not a dichotomy, so it could only be treated as nominal.
Level of education, variation I (grade scl-rool, some kigl-2
school, high school gmdtrate, some college, college graduate): This is ordinal, so it could also be treated as nominal
(rule I f .
Level of education, variation 2 (grade schtlol, some high
school, high school graduate, some coliege, college graduate,
trade scbooi, stilt in school, unknown): TI.ris is a x~orninal
variable because clre addition of m y of the last three cacegories deprives it of its otherwise ordinal quality, Therefore,
it can be treated only as x~orninal,
csl.tbfzzlc?s

Population density (number of people per square mile): This
is an interval variable, so it could be treated as nominal and
ordinal as well (rule 1).
Legislator's vote oil bill f yea, nay): This is a dichotc.>my,sc.3 it
map be treated as nominal, ordinal, or interval (rule 2 ) .

common statistic is the proportiorz, which can be expressed as a
decimal, a fraction, or a percentage, Ra&s are also a familiar statistic, such as miles per gallon fclr automobile fuel consumption.
The average, the term mast people use for tl-re arithmtt3tZc mean, is
a well-knom statistic. Uewed in this way, the subject of statistics
is not an exotic undertaking, hut simply an extensiorl of a tool you
have been using far years, Since scientific research goes beyond
sirnple descriptioil and attempts to analyze relationships and test
hypotheses, you will need some new tools in your toolbox,

All of the examples of everyday statistics cited above are gniuariate, that is, they describe characteristics of one variable at a time,
Since most readers already have some knowledge of them and since
scientific research is usually concerned with multivariace questions,
the discussion here wilf he brief,

Measures of Cmtml Tendency
The mast familiar univariate statistics are measures of central tendency-r,
as they are cornxnonly called, averages. There is a measure for each Level of measurement. Each one is way of describing
what the "'typical" ccas in a set looks like on some variable.
The best known is the mean, or arithxnetic average, which can be
computed only for interval data. The mean is computed by adding
up alt of the individual values and dividing by the number of cases.
A, similar measure is the median, or "middle" value in a distribution: Half of the cases have higher values and half have lower values. Technically, a inedian can be determizzed froin ordinal data,

000.000.000.one ailother. But it: should be remembered that the mean actually includes more information than the rnedian. Suppose we have a very small town of five farnilies and their incarnes are $2.000.000.000.000. because inore people fail into that category than any other. whereas in the second the incomes are mtlre similar tc-. The meail family income for this town wo~zldbe $20. and $38. income is dispersed over a wide range. the median is often considered to be a better measure of central tendency. The simplest rneasure of dispersion is the mlsge. with highly skewed distributions (i. which can geatly affect the mean). and the atl-rer five families with incornes af $18.e.000. but the median would be only $3. $2.000.000.000. $2. that is. $20.000.S. Modes are not very useful for inrerval data. and in the second it is $4. $38. which is simply the most frequently occurring value or category fn the example above. the mode would be $2.000. $19.000. In both cornmunities the meal? and the rnediar? income is $20.000. the modat ethnic category in the U. For example. Another characteristic af a set of observations is the extent t a which they are dispersed. because it is so easily affected by the presence of even one extreme case. one might have five families with incssrtes of $2.000. is white. $38.000. and $38.. $4. $3. Measures of dispersictn can be cc~mputed only for interval data. to take t w more very small towns. where there are some extreme cases. In the first town the range is $36. But in the first cornmunitgr. however. There are inore sophisticated versions such as the guartike range. $20.000 better describes the typical family than the mean of $20.000.000. We could have two distributions of sbservations with the same mean and rnediall that are very differem from one another. A measure af central tendency that can be applied even to naminal data is the mode. l-row closely. ~Vodesare sometimes useful for describjng orditlaf category or nurni~taldata.000. which is half the difference betweeri the values of the cases that rank one-fourth and .or widely cases are separated on a variable.000. which is simply the difkrerlce between the highest and the lowest values. especially when the values have a large potential range. In cases such as this.000. The range is not a very usefut measure.but it is usually computed for interval values. and $89. In this example. the median income of $3. For example.

one woufd first go through the data and count up how many males voted Republican. ii>r each row. However. But even this sort of Ineastire is not as precise as one xnigbt wish. which is based 0x3 a summation of the differexice of each case from the mean. but x~owwe will see what such relationships look like.three-fourths of the way between the highest and lowest scores. the percentages might add up to 100 for each column. Note that it is desirable to include tl-re N. which is the number of cases on which each set of percentages is based. it is most commonly used in performing certain tests of statistical significance. scientific research is ustially concerned with multz'vavht~:questions-the relationship between two or more variables. The way data on two nominal or ordinal category variables are customarily presented is by use of a cross-tabulation. Coxistructing one is simply a process of counting tip how xnany cases fall into each combinatioil. The variables and categories should also be clearly iabeled.) Box 6. it is usually clearest for the reader if the fotlowjng conventions are followed: ( 1 ) Let the independent variable define the columns and the dependent variable define the rows. then how many females.3A shows a set of "raws3data and the resulting contingency table. In order to do this. Contingency tables are often presented in terms of percentages. the percentages for each coluxnn will add up to 100. The Concept of Relationship As sl-rauld be clear from earlier chapters. ( 2 ) Compute column percentages by dividing the frequency of each cell by the total for that coluxnn. and so on. The most common measure of dispersion is the standard devzatz'on. Box 6. Although tl-ris is sometimes useful as a measure in itself. The concept of relationships between variables was introduced earlier. or contingency table. or for the elltire table.3B shows a contingency tahle with raw frequericies and their percentages in proper form. This can be done in several ways. we must first understand how data can be assembled to view possible relatir>nships. (fl this is done. 1st this example. This is a table showing the frequencies of each comhix~ationof categtjries on the two variables. .

To show interval data in a contingency table would not make much sense. 100 %. Constructing thc Tabfc Contingency T3blr: GENDER GENDER VOTE Male Female M R VOTE Republican: 3 2 F R Democratic: 2 3 M R.3 The Contingency Table A. as there would have to be rows and columns for each of the individual values of the variables. and most cells would have a frequericy of l or 0.4 gives an example of a small set of interval data and the resulting scattergram. Male Female Male Female VOTE VOTE Republican: 557 423 Republican: 56 % 42% Democratic: 439 586 Democratic: 44 58 100 %. F W 1 M D F D M R F R F D M W B. Note that the ehorizrr~fialaxis is a l w y s z-rsed fix the irt- . Box 6. Expressing the Table in Terms of Perccnttlges RAW FREQUENCIES PERCENTAGES GENDER GENDER.BOX 6. reiationships between two interval variables are shown in a scattergram (also called a scatterplot).Instead.

900 $46.500 $19.000 PERCENT" REPUBLICAN 33 46 73 S4 60 62 65 3s Scattergram Median Incaxne ( $ 1 000's) dependent variable and the vertical axis for the dependent vanable. 70 construct this scattergram.4 Constructing a Scattergram Data MEDIAN INCOME $ z 0.000 $2"7"500 $72.000 $40.700 $s2.BOX 6.000 $3 l . one would first go acmss the horizontal axis to the value of the independent variable-income in rl-ris case-and then straight up to the height of the dependent variable--percent Republican--and at that intersection place a dot .

all CathoIics are Wexnocratic. 37 percent of each religiuri is Republican. As one moves across a row. the example of a perfect relationship shows a different situation entirely. But the different types of possible relationsl-rips can best be illustrated with contingency tables and sca ttergrams. m e r e there is no relationship.5attempts to do this by shswitlg what contingency tahles and scattergrams would look like if there were absoluteiy no relationship between two wriables as compared with a "'perfect" rerelationship. that Catholics are more likely to be Democrats than are Protestants. It makes no dit-tierence in this hypothetical data set whether a persc-~nis Protestant. On the other hand. the more education one has. all individuals who went to college have a high income. those who went to high schc~olall have a medium income. for example. When this is done for all cases. and one variable csuld perfectly predict the other.) What Doo. (In some cases. hut rather that there is some identifiable pattern. In the v T .indicating the positisn of the case. With ordinal or interval data this can be described in quantitative terms. The no-relationship example shows that each educational group has exactly the same income distribution. Consider part A for noxninal variables. we can say that tile more education a person has. the percentage colum~lsin the contingency table are exactly the same. for this ilypotl-recical data set. I he same is true of the examples for ordinal variables in part B of Box 6. the higher his or her income. tl-re figures do not change. and those who went only to grade school all have a low incoxne. nuxnbers or letters identiityiag the cases are used instead of dots. Relationships between nominal variables may be described in terms of contrast between categories. for example. But in the example of a perfect positive relationship. the higher one's income tends to be. All Protestants are Republican. Therefore. Religion would be of no value in predicting a person's party affiliation. Box -6. Catholic. This xneans that we could perfecrly predict a person" party identification by knowing his OF her religion. the result is a scattergram. which cart take either a positive or negative fonn with ordinal and interval variables. and all Jews are independent.5. or Jewish.u Relationship Look Like? To say that there is a relationship between two variables implies that the cases are not distributed randomly.

BOX 6.00 P P - EI>UCATION IfFI$ GS Col HS C. Ordinal V~ria61es Pcrfecr Relatiotlships Na Relationship EI3UCATIC)N m C O M E C d HS GS' -30% -30% 30% Hi Med 42 42 42 MeJ Low 28 28 28 tow 100% 100% 100% Correlation = 0.S 100% 0% 0% Hi 0% 0% 100% O 100 O MeJ 0 100 O O 0 100 to\v10O O O 100% 100% 100% 100% 100% fO0% Correlation = + 1. 100% Correlation = 0.5 Examples of No Relationship and Perfect Relationships No Iielarionship REI.00 L 100 IOfb% 100% 100% Currclation = 1.00 R.IC.00 Currclation = -1.IC)N Prot Cktj? feu) In Ind I 39 ) c 39 m x Perfect Iielarionship Prot Cath Jew 39 x L Tlem 100% 10001.00 C01 p NO IiE:I.ATIONSHlI~ Perrcntage Urban p .

5. except that the . This wouM allow us to compute the equation for that scraigfic line and therefore predict the vote for any case from its urbanization score (how to 40 this win be covered in Chapter 9 j. so it is clear that the more urban an area. all the cases fall on a straight line. The same is true in the negative relationship example. In part C of Box 6. In this unlikely exaxnple. all college people have low incomes and ail those who went only to grade scboof have high incomes. the cases are randomly distributed with no patterzl. but in the opposite direction. In the example of a perfect positive relationship. the higher the Democratic percentage of the vote. scattergrams are presented for a pair of interval variables.continued Perfect Positive Relationship Perfect Negative Relationship Percent Urban example of a negative relationship. tl-re predictability is again perfect. In the no-relationship example.

The ""perfect relationship" tables and graphs would each have a correlatic~nvalue of plus one or rr. all will have a value of one. the "no relationship" tables and graph in each part of Box 6. if there is a '"gerfect" relationship. Strength ofa Relationship The s t ~ n g t hof a relationship is a measure of where the relationship falls between no reiationship and a perkct relationship.1. If the dependent variable also increases. and 10.line slopes downward. Jt is critical to understand the difkrence between tlzem. if there is absolutely n o relationship between the variables.5 all would have a correlation of exactly zero. Three characteristics of a relationship between variabkes can he summarized by statistics: stre~zgth. for example. depending on wlzetlser the relationship is in a positive or negative direction. as discussed below Thus. using any of tlze many measures of strengtl-r of association.linus one. depending on tlze direction of tlze relationship. they all have two things in common. and several are presented in detail in Chapters 8. indicating that the more urban an area. First. If the dependent variable decreases. . 9. These are commonly called correlatiuns. (However. It can also be thought of as a relative rBeasure of how good a predictor the independent variable is of the dependent variable. though it might he either pius one or rninus one. Jt alswers the question of what happens to the dependent variable as the independent variable increases. Direction of u Relationship The diwctiovt of a relationship is a simple concept.dkection. and significrlnce. the less Wexnocratic its voting pattern. they will have a value of zero.) Although these statistics are designed for difkrent c s ~ ~ b i n a t i o of n s levels of measurement and differ in their sensitivity to various aspects of the distribttcion of the variables.) Second. the relationship is negative. soxne define ""no relacionship" a little difkrentliy than others. then the relatioi-rshipis said to be positive. (A nrlmber of them are summarized below in 'Table 6. There are many statistics designed to meclsure strength of association.

Significance refers to the probability that ca retattonshii~between variiables could h ~ v occurred e by d a m e irr. such as all fifty U.T lezjel of sigrziJicavrce lapplks t o all signjlicance tests. The probability of a relationship occlrrring by chance is. In most social science research. That is one reason why it is always important to look closely at the contingency table. the probability that one might make a mistake by drawing the conclusion that the relationship observed in the sample is true of the Larger population. preferably one in terms of percentages. states or all I Q0 Senators? . This. such as an individual's religious preference or ethnic it)^. then significance tests have no validity But what if the data are not from a sample at all. In the example in part I3 of Box 6. the smaller that p r o b a b i l i ~ . Therefore. For that reason.O. 9. But the . A purely nominal va ria ble. cannot he said to increase or decrease. is . The direction of relationships as indicated by statistics computed on ordinal category data is completely dependent on the order of the columns and rows. Recall f%omthe discussion of survey sampling in Chapter 5 that even properly taken samples are a matter of chance. are quite a nuxnber of significance tests.5. is the same thing as the 9. though it is expressed differently. if the probabilit).5 percent level of confidence cited in the discussion of survey sampling in Chapter 5.. and I Q. The term significdlzce has a special meaning in statistics.1 and several of which are covered in detail in Chapters 8. some of which are listed below in Table 6. It is important to re~nemberthat szg~zificancetests sliouM be zdsed only if the data are fiom a random sample.S. the% the relationsh@ is s a d t u be s z g n i f i ~ ~ $There l.a rartdom s a ~ ~ p lift ) there . If the data are from a sample that has not been selected by one of the appropriate rnethods described in Chapter S. between them in the p o p ~ l u t i ofiom ~ which the sample was dwwn. incidentally. the more signifisan$ the relafl'onship. but constitute a whofe population. essentially. The same idea applies to relationships between variables in sample data. UI(?'JP EO rel~tio~shiit. reversing the order of the colu~nnson education or the rows on inco~ne(httt not both) would reverse the plus or minus sign for any correlation. before drawing conclusions a bout relatiolls between categorized variables. there is always a confidence interval around an estimate made from a sample.Direction in this sense applies only to ordinal or interval variables.05 or h s .

" ever1 though they are of littfe substantive importance. But it also depends on how large the s m y l e is. But it is highly inadvisable to use a statistic with whicl-r one is not familiar. The same degree of ritrerigth might he significant in a large sample. Exercises For each of the fclllowing variables. the less the probability that it was a chance occurrence and. the more significant it will be. even very weak relationships map be "statistically significant. but these are al! extensions of Pearson" rr. ordinal. only under certain circumstances. while not necessarily inaccurate. are unnecessary If there is even a very weak correlation between two characteristics of the fifty states.1 cailnot cover. such as surveys with over 1. identify the level of measurement (nominal. As will becorne clear when you learn how to conduct surne significance tests in iater chapters. TI~ere are also statistics that deal with the relatioilship between three or more variables. Table 6-1 can be useful when reading the results of someone else's rreearch and encountering m unfamiliar scatistic. therefore. It is important to keep this in mind when interpreting data. All of these are biwriate scatistics-they evaluate relationships becween two variables. or interval). we can now take a Look at Table 6.1. With a11 of this background. then we can be sure that it exists. . The stronger tl-re correlation between two variables.Then signiticarxcr: tests. never). In large samples.so the same assumptions and interpretations apply. whether in analyzing your own or reading the results of another person" rreearch. Opinion on legality of abortion (always. but not achieve significance in a small sample.000 eases. These statistics are discussed in Chapter 10. It can also he useful when analyzing data using a computer program that offers a wide choice of possible statistics. the significance of a relationship is determined by two factors: the strc~gthof the correlcation and the sample size. I . though it may not be of any importance. which summarizes a number of (hut certainly not all) the statistics designed to evaluate relationships. There are many details and variations that a simple summary like Table 6.

faif).E 6. 2. .0 *13earsr>nks -l .0 *<:hi2 F-test t-test I>iffcrenee of Means *Statistics covered in detai! in Chapters 7. incf udirrg the original: level. -1 .0 -Z. Outcoxne af a congressional vote on. Spain. apply rules 1 and 2 and identifii aSI of the levels of measurement the variable could be considered as. Far the examples in Exercise A.TABL. 200. France. 8. 3.1 Cornrnc~nUivariate Statistics Level oJ N easzdremenf Measz-zres of Association Tests of Range Sigazficilnce T k o noxntnal variables Thc ordinal variables Two interval variables One nominal variable and one intavaf variable *Chi" *Lambda *l3l1i Cramer" V F&uB if tocl. Size of largest city (Over 1 million. Nuxnber of irregular executive transfers in a nation since 1980.O Vt) to+ l .0 to 4-1.0 to cl.if to + 1.O *F-test Eta if tocl. S. 4.0 0 to+l . none).0 *[Gamma MendafPs Taug Mendati's Tau. and 9.OQOf.000 to 1 million. a bill (pass.0 -1.O if tocl. Previous coionial power (Britain. other.O to c l . less than lOO.

Ordinal. Retiglsn P G Turnout V Retiglsn V V W V G G . C = C:achofic. Present the table in terms of percentages. ordinal. Moxninal 3. Nominal 5. using proper form. nominal (rule I ) 4. N = Nctt~voter Suggested Answers to Exercises 1. nominal (rule 1 ) . Nominal (neither rule applies) S. Draw a conclusion about the relationship between religion and turnout for tl-rese individuals. Interval 4. nominal (rule 1 ) 2. 3. Interval. Construct a contingency table showing the frequencies. Ordinal 2. 2 . Far these data: 1. nominal (rule 2 ) 3. Ordinal.l G P Turnout V W V W W Codes for briables: Refigion: f3 = I)rotestant. ordinal.l X' G J Turnout V X' X3 W V V V Retigion 6" P . Interval. Ordinal I . j = Jcwisl~ Turnout: V = Vc?ter.Below are data on religion and turnout for fifteen people.

There is a relationship between religion and turnow in chat Catholics have higher turnout than 13ratestants. . and Jews have the highest.Frequency sable Reiigioil Prot Cath Jew lvumout: Voter 3 4 3 Nonvoter 3 2 0 Percentage ta hie Reiigioil Prot Cath Jew Xlmout:Voter 50% 67% 100% Nonvrlter SO 33 Q 100%) 100% 180%~ 3.

This page intentionally left blank .

includi~lgthe distinction between independent and dependent variables and tlie three levels of measurement discussed in Chapter 6. and papers are not familiar with tliese concepts. ~ The first is to illustrate how to construct several common types of graphics. Graphic displays of data can be very useful. The purpose of these graphic displays is primarily to convey important characteristics trtore effectively than a verbat description or table of numhws would be able to do. but doing it correctly involves undersranding concepts covered earlier in this book. This chapter has t w prgrposes. The use of graphics bas increased markedly in the past decade.Graphic Display of Data Popular media such as n e w s p a p e r ~ n dmagazines frequently use graphics to report the distribrrtion of resufts in some form of picture-a chart or graph instead of (or in addition to) reporting the relevant numbers. the grapl-rics that result are frequently meaningless or even misleading. primarily hecause of the ease of constructing and printiilg graphs and charts with widely available computer programs. Construction of graphics may seem simple to do with a coxnputer. Since many people who pr~t graphics into their articles. reports. The second is to explain how to interpret graphics you might ellcounter in your reading-and nat be mislied when others make the cofrtfrton mistakes. while avoiding inany common mistakes. both for conveying infornation to the reader and for re- .

g. such as pie charts and bar charts. A nrtlllber of authorities on graphic presentation advise against using pie charts (e. A more useful method of displaying category fi-equencies is the bar chart. it is comInon to it-rciude the exact nuxnbers or percentages in tl-re pie chart-but tl-ris is exactly the same information that would he presented in a simple numerical table. Figure 7. Tufre 1 983.searchers to better understand their data. For this reason. Typically what is being graphed is a nominal or ordinal category variable or a variable that has been made into one. First. Most readers have trouble making a precise comparison of the size of circular wedges. Here the relative frequency of each category is represented by . graphics of the type preserited in this chapter can almost never present information as complete as a numerical table can-and generatly they present much Less.) But from the standpt~intof scientific research. Such variables can be visually displayed in several ways. they are really not very useful. Mtlaough pie charts are frequently found in newspapers. Second. Pie charts are circles that are divided into segments representing different categories. and similar popular media. magazines. 178).) Graphics for Univariate Distributions The simplest use of graphics is to display the distribution of cases on a single variable such as the proportion of people who belong to different religions. (A brief yet comprehensive treatment of the subject c m be fou11d in Wallgren et ale 1996. such as by placing individt~alsYncornesinto different ranges. two disclaimers are in order. the relative size of:the segment being proportional to the frequelicy of the categov. Often different colors or shadings are used to distinguish the categories.1 is an example (all of the figures in this chapter were produced by Microsoft Excel). reports of scientific research such as those found in scho1arIy journals gerieraily do not use these sixnple grapl-rics.. This chapter provides only a fimited introduction to the topic. (The scattergram described in Chapter B is particularly useful for this latter function.

FICiliRi-, 7-1 130pularvote for president, 1996

Rtchard Al. Scammt>n,Mice V. McCitiivray, and Khodes
V i ~ t e svol.
,
22, Wasl~ington,13C:: Congressional
Quarterly, 1998, p. 13,

SQURC;E:

M ,Cook, America

the height: of a bar. The bars are usually vertical, but may be horizontal. Bar charts are somewhat superior to pie charts in that most
people can xnore easily cornpare the simple lengtlzs of bars ar lines
than the relative sizes of segments of a circle, btlr again the iniormation communicated is less precise than would be a simple reporting of the actual frequencies, especially in terrns of percentagcs.
Therefore, the bar chart, too, may we11 include the precise numbers.
If a bar chart does not include the precise frequencies, then it sl~ould
present a scale on the vertical axis, as was done in Figure '7.2. Unfortunately, such charts in popular media often fail to do this,

Graphics for Multivariate Relationships
There are a nuxnber af ways the relationship between two or mare
variables can be shown graphically, One is to use the bar chart.
Here the different bars represent different categories of the indcpendent: variable, and their heights represent: the dependent variable, Hence, tl-re independent variable must be a norninal or ordinal category variable, and the dependent variable either
frequencies-----whetheractual numbers or percetitages

FIGURE 7.2 X30pular vote for president, 1996

souacr,: Ric-hard M, Scalni~~on,
Atice V. hfcGillivraj~,and Rhodes
AM.

Cook, Anzerzlla Votes, vol. 22. Wa sfiington, DC: C:ongresstonat

val variable. Figure 7.3 is an example. As with the univariate bar
chart, showing the exact nrlrnerical value of the height of the bar,
or at least including a scale, is desirable but unfortunately is not
always done.
Bar charts can also be used to illustrate the relationship between
three variables. These charts use bars whose height represents the
frequency for interval value) of the dependent variable for each
cornbillation of categories of the independent and control variables. (It does not matter wl~ichvariable is tl-re independent and
which is the coiltrol variable,) Such charts could he constructed
from the results of corttroiliq usizg contirtp~cyt~bles,which is
discussed in Chapter 10, This approach could be extended to any
number of independent and/or control variables, but the results
would be very hard for the reader to interpret. Figure 7,4 is an example of a chart showing the effects of controlling.

Line Gruphs
Another method of illustratir-rgthe relationship between an interval.
dependent: variable and an ordinal category independent variable is

FIGURE-,7.3 Reportcci voter turnout, by ethnictry; 1996

White

/

SOURCL:

African Arncrican

C?rher

Center for Political Studies, L996 National ELection Study.

ll

the line graph, Essentially3 a line graph is the saiiBe as a bar chart,
except that instead of using a bar to represent the value of the dependent variable, a single point takes the place af cl-re top of each
bar, and then the points are connected with a line. Although line
graphs can be used where the independent variabk categories are
nominal (such as ethnic groups), it is best reserved for instances
where the independent variable is ordinal. The line graph is prekrable to the bar chart when there are so many categories of the independent variable that a bar chart would be conftzsing, Therefore,
line graphs o h n are used to display data over a iengthy time period, Figure 7 3 is an e x m p l e of a line graph. Note that line g r a p h
sl-rauld ?;rotbe cc~nfgsedwith scat~ergrcams(Chapter 6 ) and the line
connecting the points in a line graph should never be ~07.tfgsedwilFh
the rqrsssion line (Chapter 8).

How Not to Lie with Graphics
How to Lie with StatiStics w u l f 1954) is a famous hook first published nearly half a century ago but still available, Its purpose is to
show how the popular media-par tic dart^.. advertising-frequentiy
rnislead the reader tl-rrough tlzeir presentation of quantitative data,
and frequently involving graphics. The kinds of problems I-fulf
cited, whether committed intentionally or by mistake, are all the

FIGURE-,7.4 Reportcci voter turnout, by ethnictry and cciucation, 1996

White
College

iZlrrcan

Amergcan
Coifege

I

Clther
Whre
African
Other
College High School rimerrcan I-Ilgh School
XIl& School

sor1,tci.: Center for Political Studies, L996 National ELection Study.

I

more common today, (A receilt attempt to make the same point can
be found in Almer 2000,) It is important to he aware of these errors, both to avoid making them oneself and to prevent being misfed when Looking at tl-re work of otl-rers.

The Miislng Zero Point
Perhaps the most frequent problem with bar charts and line graphs
is that the vertical axis either does not go dawn to zero or part of
the axis is omitted. The effect of this is to exaggerate the contrast
between different categories of the independent variable. For example, if we were to draw a graph or chart of the budget of soiBe
government agency over several years, and the budget increased
from $100 million to $105 million, then a correctly rendered
graphic would show what it should-that spending increased only
very slightly, However, if we were to place the horizontal line that
showed the years nat at the zero doltars point on the vertical axis
but at the $95 miliion level, then the graph would at first sight give
the impression that spending had doubled over this period. If we
omitted any specific numbers or scales, the graph would he completely misleading, Including the numbers would ~ ~ a the
k egraphic
technically correct, but it still might rnislead tl-re casual reader. Figures 7,6A and 7.6B show an example of how such a gaphic should
and should not be constructed,

Graphic Display

of

Data

fff

FIGURE-,7.5 Turnout of voting-age population in prcstdcntial elections,
1960-1 991;

A

Sri

60
50

G

40
Sri

30
20

3

l0

K. Abrtlmson, J o l ~ nH, Altlrich, and l3avid W. Rhode,
C h a ~ g earzd C:onthzdit~~
t ~ zthe 2 996 and 2 998 EEections, Washington,
13C: CC) Press, 1999, p. 69.
SCIEIRCE: I%ul

Sc;.ule~and Axes
Line graphs can also he misleading because of problems with how
the hr~rizontaland vertical axes are defined. Assigning the ixldeyendent and dependent variables to the wrong axes can be a major
problem. When the independent variable is erroneously shown on
the vertical axis and the dependent variable is erroneously shown
a n the horizontal axis, the relationship between the two variables
may appear completely the opposite of what it really is. Relationships also may he distorted if the range of possible values for one
variable is sl~ownin a much sl-rorter length than that used for the
other varia hie,

13ictorialsare graphics similar to bar charts, except that rather than
simple bars whose length represents the value of a variable, a picture of some object is used, such as a sack of grain, a dollar sign, or
a person, Pictorials are rlever used in scientific reporting, hut they

7. Since most geaphics present aggregate data. per pupil spending on ed~rcation. suck as cities a r states. LX:. Washington. its picture would give the impression that the value was four (or even eight) tirxtes as great.S. The Need for Standardization The x~eedfor standardization was de~nonstratedin the discussion of operational definitions in Chapter 2. 298. 1998. Thus if one category of the variable has a value twice as high as another. and sometimes in depth.FIGURE-. Whenever we are presenting data on aggregates. A bar graph showing the total number of crimes comhtted in different states might give the impression that Cali- .S.1990-1 996correctly presented U. 1998. And since these pictorials are sometimes presented with no specific values or scales attached. SIIURGE: of the l are found in popular media and advertising. the reader would have n0 way of detecting the misrepresentation. such as percentages or per capita figures. the measure is likely to be meaningful only if it is presented in some way that is standardized. Statktical Abstract Urzited States. this is particularly important. usuaily to population. Bureau of the Census. p. They are particrrlarty likely to he misleading because the picture size is proportional to the variable" value nat only in lzeight but also in widtlz.6A U.

p. 1998. the amounts are ad~ustedfor inflation. we also need to control far inflation. Rureacr of rhc Census. and small states would not always I-rave the Ir~westrates.000 population) would show trtuch less diflerence. resyo~~sible grayf~ics (or verbal presentations of the same information) always present these figures in terms of consunt dollam. 298. But when dealing with variables measured in dollars or any other unit u l currency. crimes per 100.S. SOIIRCE: farnia and New York are far more dangerous places to live than smaller states. pcr pupil spcrldtng on cciueadon.. Washington.S. what are the rules for using graphic displays correctly and effectiveiyi . S.S.Graphic Display of Data FIGURE-. that is. population gri~upin different years will generally show a significant increase over time.tatbtz"clalAbstract ofthe U~zE'tedS. because population sizes change. but that would be largely the result of decreases in the value of the dollar every year for many decades. f 998.e. .tages. The same principle holds when our unit of analysis is time (i. 1990-1 996incc~rrectfypresented U. DC. A graphic showing the incomes of my U. whereas the same chart based on crime rates f i x . Therefore.7.6R U. Principles for Good Graphics Aside from avoiding the errors noted above (it is assumed that you would not want to mislead anyone). comparing different time periods).

or graph that appears in a scientific report ought to have at least a page of discussion. Although a page may be more thall is always necessary. including the uilits in which they are measured. if the data are nut generated from the research you are presenting but are from another source.in the Tewt Too often graphics are tl-rrawn into a paper with little or no discussion in the text. But keep the shadings as simple as possible. again including the variables. avoiding the use of crosshatcfning. it is essential that the variables be clearly Xabeled. some use of words is essential to any chart or graph. If there is nothing to be said about a graphic. ~zsuallyon a line below the graphic. then a table i s a better choice tl-ran a chart or graph. as should unnecessary arwork. The same rules. If' you are printing a graphic such as a pie chart or a segmented bar chart where categories m s t be distinguished by their appearance and it is not possible to print them in diflerent colors. If a large number of categories are rlecessary fc~rfull presentation of the data. Although unnecessary wordirlg within a graphic sho~zld be avoided. There sho~lldalways be a description of the table. certainly a paragraph i s needed. Finaily. incidentally. Witlain the graphic. then dif'ferent shadings must he used. that source should be ideritified. Every graphic should have a titfe above it specifying what the graph is. including the conclusioil that the author wishes the reader to draw. chart. and the like. also apply to any nuxnerical tahles you present. . Extensive verbal expianations in the body of a graphic shc~uldhe avoided. then one would have to question wl-rether it is really worth iilcluding. Large numbers of categories in pie or bar charts are apt to be confusing. hncy borders. Describing the Gruphi~. 117 same circles it is a maxim that every table.The purpose of a gaphic is to convey certain characteristics of data to the reader more effective15 and this is best done by making the graphic as sixnple as possible.

it should be fabeled in its title (e. Exercises Exerc3i3-eA Belr~wis a table sl-rowing tile frequency of poverty in different e h nic groups in the United States for several years. Evaluate tl-ris graphic-is it misleading in any way? Are there any details or inbrmation that should have beer1 includedWas there an adequate discussion in the accompanying text (if any)? Could you suggest a better type of graphic to present this information? . DC. Persor~sBelow Poverty Level 1976-1996 (percentages) l976 1986 1996 A El Races 11.56. Find an exaxnple of one of the types of graphics described in this chapter from a newspaper or magazine.1 11. write a verbal description of what appears to he happening.4 28. table '7.0 29.1 31. 3 998. Washington. Figure 1)and then specific reference can be made in the text to that figure so that the reader will be Looking at the appropriate picture.Jf you have more than one graphic. 13.4 Hispanic 26. and (2)tile change in the frequemy of poverty h r tile whole population ("'A11 Races") from 2976 to 1996 For each graphic. Bureau of the Census..9 29. Design and produce two appropriate graphics (either by hand or on a computer) illustrating ( I ) the relative frequency of poverty in ethnic groups in 1996.7 WI7ite 9.8 13.: U.0 11.6.S.4 sc3r~~cr. Again.g. 1998. Statistical Abstract of the brrrited S t a t e ..2 Black 31. these comments apply to tables as well as to graphics.

1998. 477. FICiliRi-. 477.sertjXce: Bureau of the Census.Suggested Answers to Exercise A FIGURE-. 1376-1396 of the Census. SOL~RCE:Bureau I . Washington. f 998. DC:.7. p. by ethnic W White Black Hispanic . Stagistical Abstrac~ofthe Ufzzted States. 1996 I Percentage of persons beiiow poverty Isvci. S&tistical Abslract of the I_ilzited States. Washington. DC'.7 status. p. 1998. 1998. 7-24 13ercenrage of persons be1tj-w poverty level.

is that knowledge c>f how a statistic is defined and computed provides a deeper understanding of its meaning. Correlations for Naminaf Variables Lamkrdla (2)is a correlational statistic that measures the strength of assocktion between two nominal variables. however.Nominal and Ordinal Statistics This chapter presents detailed explanations of several measures of strength of association (correlations) and one test of significance appropriate for contingency tables with nominal and ordhal variables. h a m nu relationship to a perfect relacionship. One is that you may occasionalty find yourself looking at a simple frequency table for which it itlight be quicker sittlply to compute a statistic by hand than to enter the data into a computer. The range of possible values for lambda is from O to +I. TXierefore. it may be used for any contingency tabie. Students sometimes wonder whether it is practical to learn haw actually to compute such measures. Therefore. There are two reasons why it is useful to have some familiarity with methods of computation. a value of lambda that results in a negative number or a r~urllhergreater than 1 is a resutt of an error in cs~tlputation. The more important reason. computer progralrts are almost afways used for the task. wlzich is valtrahle in understanding how to apply and interpret it correctly. .that is. according to rule 1 for the use of levels of measurement. after all.

but. as that is the best. we would predict that all Jews voted for Clinton. but it can he a Little tricky at first.guess. religion. hut he wrong on the 2 who voted for Dole and the I who voted for Perot. but ws~uldmake errors on tl-re 16 Catholic Dole voters and the 4 13erot-voters. this would he a total of 80 errors.TAamhdameasures proportional redtaction of error.5 who voted for Dole and the 15 wl-ro voted far Perot. 'This is a simple idea. . The formula for l a ~ ~ b dis aa simple one: b-a Lambda = b where b is the nuxnber of errors one would make in predicting the value of each case a n the dependent variable if one did not know the value of the independent variables. it would be best to guess that he or she voted for Glinton. We would predict that aII Catholics. Similarly. but wrong on the 6. We would predict that each Prr~testantvoted for Dole. they are included with the table. VOTE Clint-on Prot 39 Cath Jew (Tc~taif Suppose we had a group of l56 people and k~levvnothing abr~ut them except the overall distribution of their votes (the raw total4 from tl-re table above. it measures how much better one can predict the value of each case on the dependent variable if one knows the value of the independent variable. Consider the c~ntingencytable below Since we will need the marginal row totals. we can xnake another set of predictions using the same method as before. and a is the nrlmber of errors one would make when the value of the independent variable is known. voted for Clinton. Ef we had to guess haw any given individual voted. that is.we would be wrong on the 39 Protestants who voted far Glinton and the 10 Protestants who voted for Perot. We would be correct on the 76 who did vote for Clinton. which is therefore the value of b. and look within each column of the table. But then if we take account of the indeperident variable.

1 sl~owsthat there is some relationship. the value would prove to be 0. Brtt note that in comparison to soirte other correlations (particularly gatrtma. Knowing a person" religion improved our predictiorr by 10 percent. if we used the data from the first example to try to predict a person" religion from his or her vote. Eanzhda must he confpzated fronf a table with " r a z ~ ' ' Jreque~cies.not from a table expressed in percentages.IQ 80 80 The value o f .5 49 5 If you were to compute lambda (you might try this for practice).iMale Female 51 9. that is. This is ant>ther reason one shouIct always set up a contingency ta hle with the independent variable defining the columns and the dependem variable defining the rows. Second. even though that was not the case . it makes a difference which variable is considered the independent and which the dependent variable. larrrhda will be zero.Adding up a11 of these errors made within the religio~lscategories (39 + 10 + 16 + 4 + 2 + l). Eambcia is asy~~unetrtc. values of lambda tend to be low. Whenever all categories of the independent variable have their greatest fi-eyuency in the same categov of the depedent variable. For instance. we would find that the value of IIambda was Q. First of all. Certain other features of lambda should he kept in mind. even though it was to a very different degree. We can then use the formula to compute larnbda: b-a Lambda =b 80-2 8 = =. This is because a table expressed in terms of column percentages will weight each column equally.we arrive at a total of 72. Third. which is the value of a. discussed below). iambda str~rtetimeshas a value of zero evexi though there is a relationship between the variables. The reason is that the largest number of voters in each gender category voted Democratic. Consider the following table: VOTE Democratic Republican GENDER . This is a relatively weak relationship.

for the raw data.) One way tc-.. Therefore. the computation would be: . Phi is anotlzer statistic fur measuring the stretlgth of association between two nr~minalvariables. and both variables are ordinal. to mmpute Yule's Q. as sametirnes occurs with lambda.this could be any two-by-two table. The formula Eor Yule%Q is: where a. using a percentage table will ~zsuaflg result in an incorrect answer. c. It uses a method of prediction that will riot fail tct detect certain relationships.J is similar to lambda. one would simply multiply together the two diagonal pairs of cases and then divide the difference between these products by tl~eirsum. Goodman and Krrrskalk tau-h (z.since bat11 variables would be dichotr>mies. Correlations for Ordinat Variables Suppose we have a table with only two rows and two columns.evaluate the strength of the relationship would be to csmpute a statistic called Y ~ l e kQ. Box 8. Using the frequencies in the table on the right. Additional examples can be found in the Exercises A and B at the end of the chapccr. (Actualt). and d are the frequencies in the h u r cells of the table arranged as shown below. VARIABLE 1 INCOME High Low High Low VARIABLE 2 High a b PQLZTICAL High 8 4 INTEREST Low c ci LOW 2 6 Thus. h. It is discussed in detail later in this chapter.1 summarizes the critical informatioil about lambda and provides another example of its computation.

Example: State Party Competitbn. b-a Lambda = b7 where: b = number of errors in predicting the dependent variable when the independent variable is not known. nor from percentage tables. by Region REGION PARTY GQMPETIDOPIIF NortJ? &lid East West SOU~J?West (Totals) 2 8 High 1 5 (16) Xlcdiurn 6 3 2 3 (14) Law 3 2 10 S (20) .1 Lambda and an Example of Its Computation Statistic: Zamhda (h) Type: Measure of association Assumptions: Two nominal variables Range: O to +l Interpretation: 13roportional reduction of error Notes: Lambda is asymmetric. Tt should be computed only from raw frequencies.BOX 8. a = number of errors in predicting the dependent variable when the indeprildent variable is k~lown.

Althawgh it is not apparent from the computation procedure. it does not make a distinction betweeri the indeperident and Qeperident variables. that is. States in the Midwest tend to have high party competion. Gamma lney also be cumpziteci!fiom percentage t. we need to use a statistic such as pmma. Unlike tambda. while states in the South are the most likely to l-rave low competition. If all tables had only two rows and two columns.: where P is the number of pairs of cases consistent with a positive relationship and Q is the number of pairs inconsistezlt with a positive relationship. the value for gamma may be interpreted as the proportionate reduction in error of prediction of one variable by the other.Conclusion: There is a definite relationship between region and party competition. Consider the following table.The answer will be the same whether percentages or raw frequencies are used. It has a range of possible values from -1 t<>+ g . The formula for gamiBa is. Gamma (y) is a correlational statistic that measures the strength of association between two ~rdiinalvariables.s. Yule's Q is actually a special case of gamma and was presented first in order to show how gamma depends on the extent to which cases are clustered along one diagorial more than the other. The idea of "consistent pairs" and "inconsistetlt pairs" "requires some explanation. . Yule's Q could be used every time. with riegative values indicating a negative relationslnip Ltnd zero indicating no relationship. as was the case with lambda.rzlik. Rut since marry tables are Larger. gamma is symmefric.

e. cells e. f. the calculation w o u l d b e Q = 1(3 + 8 c 2 c 7) + 4 ( 3c 2 f c S ( 2 c 7 ) c 8f2f = 101.SS indicates that there is a xnoderately strong positive relationship between income and political i~lterest. Again.that is.iitical interest example.350+101 -451 M M The value of .e. people with higher incomes tend to have more political interest. helow and to the right on. d. The number of ""inconsistent pairs" is the nuxnber of coxnparisons u l cases that are higher on uile variable but lower on the other. 13utting tl-rese numbers into the formula. and g. h. Cells h. that is. . the n~zmberof such pairs can be calculated by multiplying the frequencies in each pair of "cansistent" cells and adding up the total. cell c is iower olx variable 1. below and to the left. They would include a coinparison of the higwhigh cases on each variable (cell a) with all of those in cells below and to the right (i.VARIABLE 1 INCOME bIigi9 Med Low Hi@ Med Lout VARIABLE 2 X~QLXTICAL Nigh a Medl'gnt J g Edow b e c f h t C. t l ~ etable). the calculation would be P = 6(8 + S + 7 c 9) + 4(5 c 9) c 3(7 + 9) c 8 ( 9 )= 350. and i). Celts b and f also may be compared to cases that are inconsistent. and e also have cases that are lower on both variables (i.Q 350-101 249 = 4-53 Gamma = P + Q . We are not realty interested in individual comparisons. h. hut only in how many such comparisons could he made. every case that was higher on the first variable than another would also he higber on the second variable. In the income-pc. we have: P .e. Such comparisoils are therefore "c~~nsistei~t" with a positive relationship. the total number of inconsistent pairs would be compu~edby xnuttiptying the frequencies of atl of such pairs and summing. XNTEREST H i h Mecfigm 3 Edow 2 4 8 7 1 S 9 I f there were a perfect positive relationship. but higl-rer on variable 2 tllan ceits Q.... fn the exanlpfe above. In the example for income and political interest.

may be used. Note that whenever Q. the value of gamma will be negative* Garnma. Kendail's tau-b is essentially the same as gamxna.T h ~ l sthe c s ~ ~ y u t a tof i ogamma ~~ is the saEBe as that of Yule's Q except that there are more possiHe comparisons. is greater than P. where cases are the same on one variable but difkrent o n the other. another step is taken to determine the associated .it is a h y s appropriafe as far as level of measwement is concerned. Kendall"S. Chi-Square: A Significance Test The most cornmonly used test of significance ior concillgency tables is chi-square jlC9). This would make chi-square difficult to interpret.2 s u m a r i z e s the critical information about gamma and provides another exafnple of its computation. the results are meaningful only if tl-re data come from a random sample. the number of inconsistent pairs. For this reason. One is that it ignores instances where there are "ties." that is. Adctitional examples can be found in Exercises A and B at the end of the chapter. Box 8. chi-sqtxare has a range of O to N.a~-6. except that we rarely make use of the chi-square value directly. The computed value of Kendall's taub will usually he iess than but never greater than the value of gamma for the same table. Unlike any of the other statistics we have presented.even though the relatiollslzip might better he described as a weak one. where W i s the total number of cases in the table. The effect can be seen in a table like this one: POLITICAL INTEREST INCOME Hi& Law kiigh 5 5 Low O 1 The value of gamma for this table would be a "perfect'" +l. but it adjusts the value to take account of ties. Rather. However. the number of consistent pairs. Like laxnbda. has some drawbacks. a similar statistic. like all significance tests. as we will see below.Since it assumes that the variables are rzt>mi~znE.

I$ = number Exztmpfe: Vocer turnout.2 Information About Gamma and an Example of Its Computation Statistic: Gamma jy) Type: Measure of association Assumptions: Two ordinal variables Range: -1 to +l Interpretation: Proportional reduction of error Formula: where: of pairs of cases consistent with a positive relationship.BOX 8. by age 60 TURNOUT Voter Nonvoter AGE 01der 50-59 4 0 4 9 30-39 12 9 13 6 I4 7 9 Il 38-29 7 l4 . Q = number of pairs o f cases not consistent with a positive relationship.

probabiliq-which is always the end product of a sigllificance test. (The row. Chi-square must be comgated from raw fiey~enczes. which indicates that one should perform the operation that hllows for each of the cells and then add up the results.3 showing the relationship between race and voting for a sample of 100 people. In this table it is easy to see how the expected frequencies are determined. the more likely they are to be voters. which is explained below. To make this a little clearer. The formula for chi-square is: where f 3 refers to the observed fieqtrency of each cell. Although one could take the proportion of torai cases in each c o h n and then multiply it by the column tcttal.not from a table expressed in percentages. a perfect nonrelationship would mean that both racial g o u p s were evenly split as welt. coluxnn. a quicker metl-rod tl-rat achieves the sane result is this: (6)) fe = (row total x column total) t table total.Conclusion: This indicates that there is a rnoderately weak positive relationship beween age and turnout. The older people are. consider the example given in Dt~x 8. . that is. Sigma (C) is the summation sign.) The observed f ~ e quencies are the number of cases each cell would contain if there were no relatz'tznship between the varkbles. and fe refers to the expected freqgency of each cell. the value of tl-re expected frequencies is not so obvitrus. given tl-re existing totals for each row and each column. Since the overall distribution of the vote is split evenly between the parties. In most tabtes. and table totals are shown because they will he needed in the computation. the numbers in the table.

.fe) column in srep 3 must always total to zero. Dem.3 is recoxnmended when computing chi-square.17. srep .e white (total's) VOTE Rep.3 Compura~onof Clni-Square Observed Frequencies RACE NorzWj3il. the computation would be fe = (50 x 70) i 200 = 35. (Note that the (fc. the squared values from the previous column are each divided by the value of fe from step 2 in that line. the difference between the first two columns is calculated./(10=35 50~301100=15 30-35=-5 (-.67 20 fc *-fp For the upper left cell in the example (wl-ritelRepubiican).) 117 step 4. STEP 1 L 40 IQ) 30 20 STEP 2 Expected Frequencies RACE NoniVhi~c:white (to~als) (58) VOTE Kej>. In step 1.BOX 8. Finally. In step 5.3. the expected frequencies are cornpured as si~own. STEP 3 STEP 4 3. the values in the previous cslu r n are squared Nhich has the effect of eliminating the xninus signs).5 10-IS=--S (-5)"=2 S2S/lS=1.71 10 . Setting up a table like that in Box 8. The results for the other mIls and the remaining steps in the table are shown in Box 8.5 15 (58) 35 1.!i 20-I5=5 (5)"=25 25135=0.5 (SO) STEP 5 O-te (6 P (6*-fePfJ 40 50~701100=35 40-35=5 (5)"=25 2.In step 3.67 30 5 0 ~ " 71. (SO) Dcnz. fn step 2.50~30/100=1.5)2=2. the observed frequencies from the original table are listed.5135=0.5115=1.1 2.

In Tabfe 8. chi-squt~reis 4. As noted earlier. This means that the probahility is less than the lowest probability found in the table. a version of which is reproduced in Table 8 . such as Fisher3 exact test. For that reason. which is . which is highly significant. In order ttr determine the prob~bility. and 5.76. if the calculated value is less thart any value in the appropriate fine of tl-re table.001. and that for 5.67 would best fit.841. an alternative method.when the expected frequmcy for a cell is small. This is done by multiplying the number of rows minus one by the x~umberof columns ininus one: df = (r .1) ( 2 . because the observed frequencies cannot be frractionat values. in which the table has twr) rows and two columns. the probability is greater than the highest probability shown and is therefore not significant.OS.1. Even when there is no relationship in a table. or a correction of chi-square for continuity. you may sornetimes find that the chi-square you h a w calculated is larger than any value in the appropriate fine. the value of chi-square does not mean much in itself. it may not be pclssi ble for observed frequencies to be exactly eyuai to expected frequencies.Q$ > p z . which is in the . we can conclude that this reiarionshiy is significant because the protlability of such a relationship occurring by chance in a random sa~npleis less than .6 entails tcrtaling the values in step 5. This means that the probability (p) associated wit11 our chi-square value is between that for 3.05. one more calculation is needed: The degrees of freedom (do in the original table must be computed. this would mean that p <r . n we look across the table to see where our chi-square value of 4. less than five. some inflation of chi-square is possible. though.412. the calculation is as follows: df = (2 . that is. Similarly.it is necessary to consult a prc~babl'lity of chi-square table. which produces the value of chi-squt~re. Recalling the discussion of significance in Chapter 6.02.I f .l f (c . can be used. this problem will make no practical difference. F r t ~ ~there. Before looking up the value of chisquare in the table. When the number of cases is large. IMany statistical computer programs provide this when x~eeded. But. in the . We see that it falls between 3.02 column. which is -02. hence .412.In this exaxnple.841. . In the above exampfe.I ) = 1.OS coltirnn.1 . When using a probability of chi-square table. This means that we look to row I in the degrees of freedom columr~on the left side of the table.

790 42.996 22.429 29.412 7.741 37.760 23.475 20.803 11.693 47.553 30.679 21.688 29.706 4.797 48.566 38.795 32.007 33.728 51.264 32.275 18.919 18.289 41.638 42.017 13.595 14.311 17.419 46.924 3.457 24.769 25.893 58.362 23.812 16.59 38.703 .5 6.622 18.697 16 17 18 19 20 20.I0 1 2 3 4 5 1.912 34.472 26.980 44.578 31.191 37.815 9.652 36.465 21.c .251 7.301 28.141 30.289 2.476 56.151 19.030 12.900 25.687 35.869 30.278 49.435 37.557 43.465 20.962 45.528 36.725 26.064 22.631 15.140 45.877 29.549 19.210 11.488 11.892 54.841 5.685 24.675 21.161 16.837 11.5 14.985 18.587 28.668 13.346 33.070 5.633 30.566 39.172 36.517 6 7 8 9 10 8.834 9.827 13.812 21.204 28.219 4.558 9.588 11 12 13 14 1.642 46.645 12.666 23.987 12.027 35.038 23.314 46.eue1.168 19.620 26 27 28 29 30 31.885 40.242 13.179 52.382 32.820 45.171 27.671 33.312 43.209 22.20 .256 38.020 32.250 35.618 24.873 28.410 29.615 22.307 15.916 39.995 32.422 10.268 49.615 30.302 59.02 Degrees qf Freedom .000 33.815 16.1 Probability of Chi-Square Probability I.052 55.675 29.5 26.270 41.963 48.337 42.090 21.779 9.412 26.252 40.989 7.341 13.813 32.773 42.989 27.113 41.296 27.932 40.033 16.123 37.144 31.635 9.507 16.259 24.001 continires .388 6.588 50.277 15.315 21 22 23 24 2.642 5.217 27.086 10.037 19.409 34.026 22.05 .322 26.362 14.684 15.563 36.268 18.805 36.343 37.067 15.5.542 24.6.196 34.139 36.125 27.642 3.812 18.909 34.236 3.0I .087 40.968 40.856 44.054 25.60.129 TABLE 8.991 7.

1%63). note that phi2 is the ratio of the actual value of chi-square to the value it would have if there were a perfect relationship between the two variables. and assigned numbers to the categories).47.4 summarizes information about chi-sqrrare and provides anotl-rer example of its computation. Reprinted by pcrrnissiorl of karson Edueadon. Agricultural.NOTE: Larger tables including bigl~erprobability levels and more degrecs of frccdarn rnay bc found in marly comprehensive statistics texts.t: <>Liverand Uoyct. but the following simpie formula may he used if chi-square has already been co~nyuted: where W i s the total nrlrnber of cases in the table. so it can be used with any contingency table. Phi assuxnes tlzat both variables are nominal. phi has the same value as the interval correlation Pearson" r (if one treated each dichotomous variable as interval. since it is equal to the proportion of variance expiahed. Statistical Fables for Rioiogical. Additional examples rnay he fuund in Exercises A and B at the end of the chapter. AdditiarzaX Correlations for Nominal Variables A s inentioned earlier. However. Indeed. Phi can be computed in a number of ways. Note that the formula calculates phiL (the squared value of phi). .A. Yates. Fisher and Frank Yates. SOURCE: Box 8. it makes no difference which variable i s independent or dependent. a csncept that is explained in Chapter 13. p. Ronald A. phi (@li s another correlation for no~ninal data. Plzi i s symmetric. One can rake the square root to obtain phi. p h i q s often reported. and Medical Research. The range of possible values for phi is O to + 1 for tables up to 2 x 2 (see the cominent in col~tlectiol~ with Cramer's V below). X~tmited. Sixth Edition (Editlkurg1. The interpretarioa tor pl-ri is that its squared value (phi2)is equal to the proportion of variance i~zone vilrinble e z p l a i ~ t ~byd the otl?er. Fisher and F. Recalling that the maximum possible value of cbi-quare i s N. @l<. for a 2 x 2 table.

4 Information About Chi-Square and an Example of Its Computation Statistic: Cbi-squxe (x2f Type: Significance test Assumptians: Two nominal variables. where N is the total numher of cases Formula: where: fo = observed (actual) frequency for each cell fe = expected frequency for each cell Nore: Ghi-square must he computed from raw frequencies.BOX 8.ir17. Example: Form of city government and crime rate Form of City Government Strong C0~4nc1'1 Mayor Manager C017.irissi0~ (TotaEs) CRIME RATE High Medizam Low (Totals) 7 2 S (14) 3 4 8 (15) 9 6 I (16) (19) (12) (14) (45) . not frtm a table expressed in terms of percentages. random sarnpling Range: Q to N.

00= 0.73 2-3.. One problem with pl-ri is that far tables larger than two rows and two columns.99 2.94)f3/73=1.64 14~15/45=4.00=0.64)" 0.00/4.f15316.27 6--4.20 (-3.41 0.63~-3.36 S-4. f "73.f153 1t. it is possible for phi have a value larger than 1. Jn the previous exampIe for race and voting.) = 2 (l.Z ~ 9-6.73~-1. the computation w w l d be phiL = chi-square t N = 4.9 2.91= 1.36= 0.74.57 f2.09 19x15/45=1. Wc cannot conclude that thcrc is any relationsfiip bctwecn form of city gavcrrlrncnt and thc crime rare for thc urholc population Eroi1-t which this sail-tple is drawn.24)'= 5-02 5*02/6./33 3-3.5=3.98)'=15.27=0. Although this is not an impressive figure in terms of strength of association. calculated as fc11Lows: .6"7" 3.09f2=1.5-4.must be emphasized that phi. it. like lambda.05 Conclz~siorz:Since the probability of chi-square is greater than .73 12x15/45=4.70 (0.76~ 2.048.915 df = (3 .45.915 f 4.-3. One of these is Criamer"sV.76=0.80/3.00 12xf 6/45=4.24 16?~14/4.33 14x16/45=4.33)'=ll.19/5.f15314/98=2.24 (O.91=0. it is not considerect significant.80 15. f 7 7.41/4.33)'=ll.5.f113 1t . tends to X-rave relatively low values.73)" 22.91 7-.36=0. Therefore.00 0.76 c 100 = 0.OS.24.19 l.2"7 f -73 14~1414.37 f-3.6".00 44.00 f 2.488 . and gaiiBma would be 0.73)'= 2. 'This shows that race explained a little less tl-ran 5 percent of the variance in voting.915. a number of statistics have been devised to adjust phi to avoid this difficulty.33 1 9f 6145~6.99/4.33=0. l 0 > p > . The value of lambda br the racelvotirtg table is 0.09 (3.79 < Chi2 9.OO)'= 0.7 3 9 2 4 6 S 8 f 19~14/4S=5.l ) f S.74 (-1. particuiarly compared to statistics like gamma.

aclns only of he same statistical measure. and this computation is unnecessary Box 8. highly likely to be misleading. clre best approach for the novice is to think of them as relative measures of strength. particutarly when produced by . Bivariate statistics. When using ordinal statistics. it is very important to be aware that the order in which the categories q p e a d n the rows and columns will determine wlletl-rer tl-re value is positive or negative. statistics are a tool far helping us interpret our data. But what is considered to be a ""strong" association and what is a "Mienk" association? There is no simple answer to that question. But what different statistics teII us can be confusing. such as those presellted in this chapter. such as defining a gamma value af -7% or greater as ""very strong.l . measures of association or correlations (such as lambda.5 summarizes the information about Phi and ayplies it to the example horn Box 8. thus ensuring that a pasitive reiationship will produce a positive value h r gamma. Comparing a gamma value with a lambda value. Interpreting Contingency Tables Using Statistics As stated earlier. Although some authors have suggested ranges. there would have to be different fists for every statistic.I f means the number of rows minus one or the number af col~lmnsminus one.1 are both equal to 1." d-rese ranges are arbitrary. But tables are not always set up that wait. thus facilitating a decision as to whichrelationship was the strongest. r . In the racelvoting example (a 2 x 2 table).where Min(r . which shows the direction of the relationship. tell us something about relationships. But it is important to rernernber to make direct compari. gamma. This can he useful if one is comparing several relationships between similar pairs of variables. and phi) tell us something about the strength of: a relationship. All of the examples in this chapter have the hhighest values of ordinal variables in the tap row and the !eft coluxnn.4. Furthermore. so V = phi.. Although the statistics have varying mathernatictll interprettltions. such as the correlation between tl-re attitude af individuals on the abortion issue and their votes in several presidential elections. for e x a ~ ~ p l is e.1. whicl-rever is less. such as gamma. and c . c .

always look cczrefully at h e cont%~gency &Etlee One c m then see what the direction of the relationship appears to be and what a positive or rlegative value of a correlatio~lwould rBean. Since the table was larger than 2 by 2. This i s a moderalely strong relationship. . etc. and that will often be the code for the Lowest actual value (e.g. 30-49 years = 2. V" 0. the total number of cases in the table Example: For the data in Box 8. To prevent this problem. Cramer's V would be a more appropriate measure.20 + 2 = . Most statistical programs will put the first or lowest value in the left column and top row.10 NOTE: coxnputers.4: Conclzasiun: PhiQshows that 20 percent of the variance in crime rate i s exptained by the form of city gaverrtnnent. age might be coded as 18-29 years = 1 .BOX 8.5 Inhrmation About Phi and an Example of Its Computation Statistic: Phi (@) Type: Measure of association Assumptions: Two nominal variables Range: Q to 1 (for a 2 x 2 table) where N =.)..

Compute chi-square and determine its probability.00 INCOME $2. Is this sigtzificaxzt? 8. What assumptions would have to he made to use chi-square as a test of significance for tl-rese data? 7 . Over $50.z.S. if appropriate. Is it appropriate to coiByute garr.0011 . 4.e A Using the data on educatiorz and ideology in the following table.S. Is it appropriate to compute lambda for these data? Why or why nut? 3.00050. compute gamma. complete itexns 1-1 Q. If appropriate.Exercises Exer~. 2. If appropriate. School IDEOLOGY Libcr~l SO 60 20 10 Consemtive 20 60 30 24) I. draw a conclusion about the relationship. 6. On the basis of all af these computations.lma for these data? Why or why not? S..000 Urzder $25. 10. csmplete items 1-3 0 from Exercise A. using proper Eom. compute lambda.. Present the table in terms of percentages. compute phi. Some Circzdc C:ollege Grad H. Usirtg the data on incsme and vote in the following table.5. Is it appropriate to compute phi for these data? Would Cramer" V be a better measure? 9. EDUCATION H.

3000 and u p ) S. identily all of the foifuwing statistics that would be appropriate: lambda. Social class (upper. Lambda requires oilly llorninal variables. IDEOLOGY Liberal 71% 50 % 4 0% (Jonscrva$ive 29 50 60 100% 100% 100% N=70 N=120 N=50 Grade School 33% 67 100% N=30 2.Perotf Suggested Answers to Exercises EDUCAmQN H. medirtm. decrease) and defense spending (increase. Largest minority group (African American.S. $1.= 350-110 130 --- 20 -. Opinion on welfare spending (increase. Winduism. Some CoElege Grad H. Gender (male. keep the same. working) and vote (Republican. so it may always be used. and phi. Isiam.000 to $2. middle. Buddhism. Hispanic.5 130 . Ciintr>n. $. Yes. Native American) and crime rate (high.S. female) aild vote (Bush.For each of the following pairs of variables.= -1. Democrat ) 4. Lambda .999. other) and per capita GNP (up to $999. low) 3. keep the same. gamma. decrease) 2. Dominant religion (Christianity. Asian. I .

chi-square requi""son1y nominal variables. Gamma requires two ordinal variables.OO1 > p (significant) 8. it is always appropriate. . .2613 < chi2. The more education people have. 16.it is valid as a significance test only if the data come from a random sample. Cramer" V would be the same as phi. Since phi requires only nominal variables. they more likely they are t o be liberal. Education is ordinal and ideology is a dichotoxny. But.l ) f 4. c .900 6. S.1. Since Min(r .i2 dF =r ( 2 .1 ) = 1. so it is always appropriate. Yes. L-f.1)= 3.4.900 Q = 10(20 + 60 + 30) C 20f20 + 60) C 60f20) = 3. In terms of level of rReasurement. P = 50f60 + 30 C 20) = 40(30 C 20) + 20(2Q)= 8. Tlzere is a moderately weak significant positive relationship between education and liberal ideology. g-f. so it may be treated as ordinal. '7. 10.

No. Q. . Lambda requires only nominal variables. But it is valid as a significance test m l y if the data come from a random sample. In terms of level of measurexnent. so it may aiways he used. Yes.04 Lambda = 117 127 4. vote is nominal and not a dichotomy. h = 4 9 + 4 9 + 1 9 = 1 1 7 a = 11 c 9 c 1 7 c 1 9 + 2 3 e 7 + 8 +1 5 c 3 = 112 117-112.S = .i~H 1.Exet-6. Not applicable. Income 2. so it always appropriate. =. chi-squt~rerequires oniy llominal variables. 3. Altl-rough income is ordinal. S. Gamma requires two ordinal variables.

96 0.20 c p c .7 0. Since phi requires only nominal variables.1)= 6.9 -4.1 -2.5 30. there is a tendency for people with higher incomes to be more likely to vote for Dole and Perot. 9. .81 0.84 1.19 + 182 = . and the lower people's income.01 1.2 -5.07 23 17.19 = chi-square df = (3 .34 22 15. c .25 1.1 16.9 3.Nominal and Ordinal Statistics fo f. Phi2 = 10.9 -0.24 0.645.1.5 5.61 0. .04 2.9 0.86 17 21.1)= 2. the more likely they are to vote for Clinton or to be nonvoters.fJ2 (fo.25 9 6.03 10.4 1.1 4. Cramer's V would be a better measure. Since Min(r .1 -4.41 0.26 7 7.1)(4.73 10.1 (fo.2 27.9 24.8 7.80 25 26. it is always appropriate.48 15 13.1 9. 37.04 8 13.l0 (not significant) 8.49 0.81 0. There is a weak relationship that is not significant.06 i2 = .7 -0.05 11 15.06 3 5.2 1. 8.8 3.fJ2/f.06 V = . &-C 6. For the sample data.21 2.9 19 19.51 23 19.588 c chi-square c 10.2 2.4 -1.

so only lambda and phi could be used. and phi c ~ u l da11 be used. 5.1. gaEBrna. 3. so lambda. so only lambda and phi coufd be used (and Gramer" V would he a better measure than phi). and largest minority group is nominal and not a dictlotsm!. and gamlBa could a'iI be used. . phi. 4.so lambda. and religio~~ is nominral and not a dichotomy. GI-ime rate is ordinal. so only lambda and phi could be used (and Cramer" V would he better tl-ran phi). Per capita GNP is ordinal. Social class is ordinal and vote is a dichc~tomy. Both variables are ordinal. and vote is nominal and i s not a dichotomy. 2 . Gerider is a dichotomy.

and even careful visual inspection of a scattergram will tell us only so much about the relationship between the variables plotted. . that this fine is a scraight one. mast relationships are far from perfect. The examples of ""perfect" relationships shown there were instances in which all of the points representing the cases fell along single strai&t lines. These statistics are derived from a procedure called regresszon. we do not have to du this with a rulier. for now.'a Iir-re that describes the relationship betweet1 the variables better than any other line would. The Regression Line The idea of regressir~ilis best illustrated with the use of scattergrams.Interval Statistics In this chapter we will fook at statistics that evaluate the relationship between two interval varialzles. if we he 171. But in the imperfect world of the social sciences. there are formulas to determine the exact locatictil of the Iine and a measure of how good a fit the line is to the points. perfecrly correlated-we wr~uldnot need many statistics. which were introduced in Chapter 6. The key idea of regessian i s that there is a single. Fortunately. the@the total wzll be less thart what the mtal wogM be fi~ran). they and their multivariate extensions fcr>veredin Chapter 10) are by far the mc-1st commonly used statistics in contemporary poliricaf scietice research. If all relationships between variables we= perfect in that way-that is.easur. b6best-fitting. Let us assume.ott~erh e .e the distance of each case from that line and sq~lareeach ualzdc. that is. Regression statistics define this as the least-sqgnrrrs line.

a is the height of the line where it crosses the y-axis.e.. those of the squared values of each variable (i. we need to find the value of five sums: those of the original values of X (i.. (EX)JI.GXY means that one must first multiply the value of X by the value of 3' for each case and then add up these producrs for all cases.1 for every increase of 1 unit in X. E X ) and 3' (i.1 shows an example of a scattergram with the feast-squares fine. Z X q s different from (XX)L. which uses the data for the scattergram in Box 9.. and h is the slope..7 + 1. How did we determine the values of a and b? There are formulas for each. The value of b.e. the slope. The equaticrn far a straight line may be written as Y = a + bX. ZXband ET2).e. EY). Sigma (C)9the stlmmation sign. It is useful to set up a table like the one belou. SirniIarly. Box 9. T e equation for the line is Y = 0.l)(. the number of cases..e.7 and goes up by 1. is cafcnlated as follows: where X and Y are values of the independent and dependent varie rcases.. To calculate b. STEP 1 X Y 1 2 2 3 3 3 4 6 5 6 Sums: "L S20 STEP 2 X2 I 4 9 16 25 55 STEP 3 Y2 4 9 9 36 36 94 STEP 4 XV 2 Q 9 24 30 71 .This rneans that the line crosses the y-axis at a height of 0. and tlnat of the product oi X times Y (i. and PearsonS r (discussed below).Y) means that one first adds up the miginal values of X and Y and then multiplies the products. Nate that ZXV is nof the same as /GX)JZU). CXY). a. W also use N.1 to itlustrate the procedure. X is the independent variable. where Y is the dependent variable.Any straight line can be completeiy described by two facts: the 10cation of a single point through which it passes and the slope or angle at which it rises or falls. ahles and N is the n u ~ ~ bof indicates that one rnust add up the value for all cases.

we square each of the values of X and add up the column to get Z X L 5. we take the original values of X and V and add up each coluxnn. giving us ZX = 1. To calculate the value of a.BOX 9. often called the corzstanf or the yintercept. we do the same for the original values of Y to get Cl' = 94. In step 4. we multiply the value of X by the value of: Y for each case and then add up the column to get EXU = 73. the formula is: . along with the number of cases (N = 5) in the furrnula for b.1 Example of a Scattergram and Regression Line In step 1.5 and ZV = 20. In step 3.5. In step 2. Now we place these sums.

j Pearson" r assumes that there are t w o interval variables. In this sense. And whether it has a plus or a minus sign tells us whether the relationship is positive or negative. it has the disadvantage of being highly Jetpendent on the units in which tl-re variables are measured. it is a measure of hr>w good a predictor one variable is of the other. or Pearson's r.2. picture a horizontal line across the scattergram at the height of the trtean. Although the slope of the line is important. gives us a very important piece of information. This idea of ""explained variance" is a crucial one in statistical theory. (It is so widely used that it is ofren reported simply as '"r. it measures how closely the case points cluster around the regression line. it is common to compute a standardized version of the slope called beta. For that we use a statistic called tlse Pearson product-moment correlatz'on. would be the mean value of Y. Its range is from -1 to +l.It is a measure of association. Age can be measured is days and moilths as well as years. As was the case with Phi" rr is isqgal t o the pmporciorz of vurinnce in one varlialale explained b y the other. thousands of dollars. T h e slope is a direct measure of the effect of the ipzdependent variable on the dependent variable. For that reason. income in dollars. the dependent variable. If we knew nothing about any other variables. then the best predictor of the value of every case of Y. Essentially. b.1.The total variance in U would be the sum of the sq~zareddeviations of the actual cases from this rrtean line. which in this example is 4 (computect by adding up the values of V and dividing by Nj.Thus. using the figures for this example. of the strelsgth of the relatiorzship. other currencies. that is."and rehrences only to a ""correlation" probabfy refer to it as ts"ei1. Making a different choice of units could drastically affect the value of b. in Box 9. a measure that will be discussed in Chapter 10. However. it does not give us a measure of strength of association in the way that other measwes such as gamma and phi do. artd so on. we have: Another example of these computatic->ilsis showil in Box 9. The slt~peof the line. To the . For example.

400 10.900 3.600 2.200 2.600 27.800 1.bX X .35X 70 C* I I 40 0 30 F: 20 E • .400 XY 0 3.000 .000 86.000 4. 10 0 .5 -.(500)(500) N CX"(CX)L 10(33.500 1.675 W N 10 10 10 W 67.000 = . .500 + 175 . 0 20 40 h0 80 100 120 Percent Urban N CXY .500 1.000 -. 50 J Y = 67.000 1.35 336.600 22.000).2 Example of Regression and Computations of b and a % % URBAN TURNOUT X X2 Y 0 100 90 20 50 30 40 70 60 40 500 SUMS: 80 30 50 70 60 40 50 50 30 40 500 Y2 0 6.250.500 3.500 1.500 900 1.600 1.600 900 1.500 .10(22.5 .400 3.100 2.220.600 2.900 2.500 400 4.000 90 80 W 9 U r: 60 t.000 b= a= X Y .(-.000 3.600)-(SO0)' -30.250.000 900 8.BOX 9.600 33.35)(500) .600 4.( C X )(CY) .000 .

the answer will he the same no matter wlzich role the variables are placed in. we have: This value or r. is of some value as a predictor. of course. This test assumes. The formula for Pearson's s is similar to that for b and a in that it uses the sums of the values. tlzeir squares. V" 94. take the results of steps 1 through 4. and tlzeir products: Although it may not seem immediately obvious from a look at the lormuia. shows that there is. which yielded X = IS. -93. note that Pearson" r is symmetrical. XU =: '71. Pearson's r2 directly measures this improvement in prediction. X" 660. f ubstituting these values into the formula. X. that the data come from a randoxn sample. as we would expect from the scattesgrarR?a very strong positive relationship.io. or F. Although the lormuia requires that one variable be designated as independent ( X ) and the other as dependent (V). We can also test the significance of Pearson" r far significance using the F-mt. and N = 5. V = 20. tlzen the deviations around tlze least-squares regression fine will he less. The value of F is computed as fcsltows: .SG.extretlt that an independent variable. The proportion of variance explained is indicated by r" which is .-test. To calculate r for the previous example.

after the values reach 30. so including ir-ztermediate values would be a waste of space. and this would alrnost always lead to the correct conclusion. they skip to 4O. as this one was. requires a table to determine the prohabilir):. this path is not described by a straight fine.2 is the number of degrees of freedom. This value of F. the numbers change very little. we go down to line 3 and look across.1. Nonlinear Relationships Thus far we have assumed tl-rat a ""perfect" relationsl-rip between two interval variables would take the form of a straight line a n a scattergram.2 were SO. Our F value of 18-43 wt>uld fall between 10.Usirtg the values of r = 9 3 and N = 5 from the previous exarrtple. This illustrates tl-re fact that even a tiny random sample of five cases can produce a sigrlificant correlation-if that correlation happens to be very strong. but by a curve (a parabola).f. thougl-r in this one. The table is used much like tl-re chi-square table. But this is not necessarily the case far perftect relationships in the real world. which shows the path of m object hurled in the air.O.01 and would be considered significant. like chi-square values. Box 9.1 that in the N . Therefore. This is silnyly ftrr convenience. Thus if N . which is reproduced in Table: 9. the probability would be between . the best way to proceed would be to use the next Lowest available value. M .2 column. Wheri you have an N . Consider Figure 9. For this example.2 value that does not appear in the table. 120.12. and then to ilafinir5i.13 and 34. It is a perfect relationship in that knowing the horizontal disrai-zce traveIed enables you to predict the height perfectly However. Note in Table 9.OS and .3 summarizes the critical infarmation about Pearson" r and preserlts an additional exa~rtpleof its computation a1-d the Ftest. This illustrates why it is impartant always to look at a scattergram when investigating interval re- .1. as inspection of the values in the body of tl-re table shows. Other examples can be found in the exercises at the end of the chapter. one could use the figures for line 40.

40 8.05 .53 8.28 4.88 7.12 21.74 12.00 98.1 Probability of F PROBABILITY LEVELS N-2 .82 21 22 23 24 25 4.32 5.68 19.25 11.32 4.51 10.38 14.82 7.28 8.14 16.35 8.94 7.10 16.01 .4 18.64 17.86 21.84 4.50 167.20 16.60 4.50 74.04 6 7 8 9 10 5.1 8 8.14 47.148 TARLE 9.54 9.49 34.13 7.001 1 2 3 4 5 161.052.59 5.24 8.61 4.96 13.38 15.26 405.59 16 17 18 19 20 4.86 8.02 7.88 corrtirfrres .26 10.71 6.22 25.75 4.49 4.12 15.72 15.19 14.42 22.99 5.51 29.67 4.38 4.00 998.45 4.03 13.56 10.41 4.30 4.69 18.07 8.08 14.59 14.284.65 9.12 4.04 11 12 13 14 15 4.81 17.26 4.33 9.04 35.77 14.

Agricultural. SS. Statistical Tables for Biulogzcal. Fisher and E Yates. A variety of techniques-all beyond the scope of this hor>k+atl he used to analyze nonlinear or curviLinectr relationships. tlre need for this might never be apparent. l a d Medical Research. Sixth Editzon (Edinburgh: Clliver and Bayd. Table 10. 19631... which would then yield a reasonahlp correct analysis.) But if one rlever looked ar the scaaergram.: This table is destgncd for tesrir~gsignificance whcrc there is only one independent variahte.irnited. O R. (The simplest approach for this example would be to divide the data at the midpoint of the independent variable and analyze each hall separately with linear regression. Viewing the scattergram could prevent accepting that erroneous concIusiotl. the linear correlation and regression statistics described in the previous section (h and r) would indicate that there was nt->relationship between height and distlance. SOURCE: Kcprintect by permissitjn of Pearson Education. In an example like this one. Konald A. pp. fatianshigs.1 may be used for rn~xftipleand partial correlations.NOX'P.53. Larger tables can be h u n d in many comprel~ensivc statistics texts.. l. . Fisher and Frank Vater. A. 57.

3 Information About Pearson's r. the F-Test.000 .6. and an Example of ?'heir Computation Statistic: 13earsonk r Type: Measure of association Assumptions: Two interval variables Range: -1 to +l Interpretation: 13roportion of variance explained (r2) Formula: Exaxnple (Continued from Box 5 3 2 ) ZX=500 EY=500 CXL33.400 EXY=22.BOX 9.00 N=IQ F-test Assumptions: Random sampling Formula: F = 1 -rZ CYZ=27.

such as the t-test and diffireace of merlgs. A number of statistical tests could be used tc_tdo this. so . the lower its level of turnout.r F . . Relationships Between Interval and Nominal Variables There are many instances wl-rere one may want to evalriate the relationship between a nc. where there is a passibility that the relationship is curvilinear.01 (significant) Conclusian: There is a strong significant negative relationship between % U r h n and Oio Turnout.c: 11. qpically this occurs when we are co~nparingtwo groups defined by the noxninal or ordinal variable t o see whether they are &&rent a n the interval variable.>minaIor ordinal variable and an interval variable.Example (from above) 532 . VVe might. have a sample of individuals and wish to determine wl-rether the difference in income between males and females was large enough to be considered significant.26.05 p > . The more urban an area. Exercises Answers to these exercises follow It is suggested that you attempt to cc~mpletethe exercises hefclre lor>kingat the answers. for example. a measure of strength of association similar to Pearson" r railed eta is useful. Altl~oughsignificance tests are the main s a tistics used far the comparisons of groups.

What sort of relationship does there appear to be? 2. 3. Conduct the F-test and dererrnine the significance. k a r s of Education # of Wtes Years of Education # of Votes Years of Education Jf of Votes 1 . Draw a scattergram.Distance Tra veled Using the data in the following table a n the relationship between years of education and nuxnber of times a person voted in the past five elections. Compute Pearson's r. . S. Carngute b and a and draw the regression line on tl-re scattergram. 4. complete items 2-5. Draw a conc2usioil about the relationship.

Conduct a n F-test to determine the significance of this re'latiartship.l 3 between urbar~izatior~ and crime. complete items 1-5 korn Exercise A. Suggested Answers to Exercises Scattergram for Exercise A .Using tile data in the failowing table on the relationship between per capita income (in thousands of dollars) and percentage of a nation%budget spent a n defense. Xncarnc Dcfe'ense Income Defmse Income Dcfensc Suppose a random sample of seventy-two counties showed a value for Pearsor~'~ c of: .

288 Y2 16 1 0 25 25 9 9 4 16 16 16 4 9 25 0 175 XY 32 9 0 80 75 36 39 24 48 56 64 20 33 6 0 576 (TOTALS) .lnterval Statistics EDUCATION AND VOTES X 8 9 10 16 15 12 13 12 12 14 16 10 11 12 12 182 Y 4 1 0 5 5 3 3 2 4 4 4 2 3 5 0 45 X-' 64 81 100 256 225 144 169 144 144 196 256 100 121 144 144 2.

Q1 (significant!) 5.4.000~) . a random sample. so . If the data were horn. we could conclude that tl-ris positive relationship occurs in tl-re popufation from which the sample was drawn. The rnore education people have. the more electioils they tend to vote in.67 C F C 9. Per Capita Income ($1. There is a strong and significant positive relatiorlship between education and frequency of voting.07.05 > p > .

934 .541 1.Interval Statistics 156 INCOME AND DEFENSE X Y X2 Y2 XY 10 3 2 1 20 30 25 7 6 4 12 9 22 15 10 5 1 3 15 15 16 8 7 6 11 3 14 15 100 9 4 1 400 900 625 49 36 16 144 81 484 225 100 25 1 9 225 225 256 64 49 36 121 9 196 225 100 15 2 3 300 300 400 56 42 24 132 27 308 225 166 N = 14 129 3.074 1.

so p > . There is a scrong and significant positive relationship between a nation's per capita income and defense spending. .= 4. ff these data were from a random srrxnple of nations.the more spent t ~ ndefense. we cannclt conclude that there is any relationship for the whole psptllaticzn from which this sample was drawn.05 (NOT significant) Although tl-rere is a relationship between urbanization and crirne for the counties in this sample.00. we could conclude that there is a positive reIatisnship between per capita income and defense spending aERong nations in general. The higher the inctlt~e.5. F .

This page intentionally left blank .

tivariate Statistics This chapter presents techniques for dealing with the analysis of the relatioilship between three or more varidbfes. the use of control variables is essential in the correlational research design. to a variety of a t t i t ~ ~ d eand s opinions. we freq~zetltlyface situations where there are several. a table is constructed sl-rowing tlte relationship between the independent and dependent variables. to news broadcasts and campaign appeals immedia tely before the electio~~. Just think how many different factors might go into an individual's voting decisit>n. or even many. Sorting out potential independent variables is largely a matcer of controlling-and. as yaw know from Chapter 3. learn techniques for iartyositlg those controls. . Give11the nature of tine social and political world. possible causes of some pl-renomenon. ranging from the party identification adopted in childhood. Each of these tables may then he presented in terms of percentages and appropriate statistics may be calculated. Contingency tables also may be used to control for third variables. relationships between categorized nominal and ordinal variabtes are analyzed using contingency tables. This is fairly easily done: For each category of the control variable. Trr this chapter you will. it is Ilecessary to compare the contrt>l tables to a table without a control variabie. Note that to evaluate the effect of the control variable. We will begin with the method for nominal and ordinal category variables and then turn to intervai techniques. Controlling with Contingency Tables As you have already learned.

1 illustrates this procedure for a simple case i r ~which all variables are dichotoxnized. the effect of religious preference s n the vote was tot due ICIa person's income* What Can Happen When You Control Several things can happen to a relationship between two variahles when you control for a third variable. The ""uriginal" ttahle for all of the cases (part A) shows that there is a moderatefy strong. hut significant. chi-square could have heerr used. This is shown in part B of BOX 10.) What does the example in Box 10. Whe.2 wl-ren we coltcrol for gender. larnbda.tl we look at each of the control rabies. The tables far xnales and females are exactly the same and therefore have the same strength of relationship. that is. For this exaxnple. The first possible outcome of controlling is that nothing happens. Note that the frequencies far each cornbination of the independent and depildent variables (such as Protestant Republican) in the control tables add up to the frequency in the original table. but basically they show the same relatioilship as in the original table. re'lationship: People with higher incomes were xnore likely to vote Republican. Then we would construct the same table for each category (high and low) of the control variable (incssrte). Protestants tend to vote Republican. (Assuming the data were from a randc~msample.The statistics measuring the strength of tl-re association vary slightly. Each tahlc could then be expressed in terms of percentages and appropriate statistics computed. In other words. gamxna. (The chi-square values are srrtaller because the control ta- . there is a weak relatio~ishipbetweeri religion and vote. and Catholics tend to vote Democratic.Box 10. but with the small nu~nberof cases it would not have been significmt. and phi are reported.1 shc-IW?For all of the cases.and iowe~incomeresyondet~ts. Box 10. Suppose we wanted to see wl-retl-rerthe relationship between religion and voting was affected by an individual's inco~rtelevel. First we would construct a table showing the relatiansl-rip between the independent variable (religion) and tlze dependent variable (vt~tef. This outcome demonstrates that the control variable (income) had little or n o effect on tl-re rela tionship between the independent variable (religion) and the dependent variable (vote).2 illustrates this with an exa~rtpleof the relationship betweeii income and voting as we control far h u r other characteristics of the individtials. the same is true for both higher. the relationship is unchanged.

ow High I.Mzaltivariate Statistics f 61 BOX 10.1 Coneofling Using Contingency Tables MCOME RELXG'N VOTE XNCOME RELZG9N VOTE XNCQME RELZG9N VOTE High High I. High I. B. Carh. Rep. Pror.61 1-nmbda .amrna = +. Pror. Prot.: . Cath.ow High 1.ow I. When this happens. 2. Percellrage Tables and Statistics CQPITTROLLLPJG FOR INCOm ALL CASES ( H 0 CONTROLS) HIGH INCOME LOW WCOME RELIGION VOTE Prcit Cat/? Rep 62% 29% IJrm 38 71 100% 100% N= 8 7 KELXGXQN W T E Prot Ckth Rep 67% 2.1 . Rep. Dem. Prot.ow Carh. 13rtlr.ow I. Dem. C.) In real-life examples the percentages would rarely stay exactly the same. Dem.50 Phi2 = . we can conclude that the apparent relationship httiveen the illdependent and dependent variables was not caused by the control variable. High I.33 Gamma = +. Rep.ambda .ow 13rtlr. Rep. Rep.25 C. Cath.l 9 Phi" -12 S Cath 33% 67 IOO<Y* 3 bles are based on fewer cases.: . This is the same ourcome as in the example in Box IQ.l 7 I.71 Phi2 = . Rep. Pror. but the important thing is tbat the measures of strength are not much altered. l9em.29 Gamma = +. Cath. Tlern. . L>ern.ow 1. Rep.5% Ilent 33 75 100% 100% S 4 KELIGlQN W T E Prcit Rep 60% Ilet~t 40 100% I. Frequencies CQPITTROLLLPJG FOR INCOm ALL CASES ( H 0 CONTROLS) RELIGION VOTE Prcit Cat/? Rep 5 2 IJrm 3 5 HIGH INCOME KELXGXQN VOTE Prot Cath Rep 2 1 Dem 1 3 LOW WCOME KELIGlQN VOTE Prot Ckth Rep 3 1 Ilent 2.ambda = . L>ern. Cath. Catt~.ow High 13rtlr. Dem.

1 . Relationship Weakened: Controlling for Ideulr~gy LIBERALS INCOME Low N2gi1 VOTE Repzdbliccan 36% 36% CONSERVATIVES INCOME Lout High VOTE Repzdbliccan 63% 63% .3 3 1% = .BOX I Q 2 What Can Happen When Controlling: An Example A.00 .04 Chi2 = 20.001 > p FEMALES INCOME HigCs Lout VOTE Repzdbliccan 60% 40% L>enzc>crat 40 68 1.20 Gamma = .20 Gamma = .04 C h i b 20.1 . All Cases (No Controls) VOTE XNCQME High L.ambda = .001 > p C. Relationship Unchanged: Controlling for Gender MALES INCOME H2gi2 Low VOTE Repzdbliccan 6O% 40% L>enzc>crat 40 68 1.3 3 f3hiL .00 .ow Reptdblica~~ 60% 40% B.ambda = .

g3 f3hi2= ..arnbda = .61 .l8 1.and Iow-income individuals. and this is .&l Phi2 = .Ol Phi" . where we control: for ideology.@01> p .71 f3hi2= .001 > p The second possibility is that the relationship is weakened.Mzaltivariate Statistics f 63 Gamma = .SS 1.08 1.01 p > .arnbda = .OS Garnma = +.arnbda = .C30 D. o ~ VOTE Republican 86% 43% Democrat 14 89 1. N = 320 300 N--180 170 1. A glance at the percentage tables shows that within the income categories there was no difference betweerl the voting of high.axnbda = .00 CbiL= .48 Garnma = +.C30 Gamma = . l 6 Chi" 66. Interaction: Controlling for Region NON-SOUTH SOUTH INCOME INCOME High Low High Low VOTE Republzcan 75% 17% VOTE Republzcan 33% 75% D e n z o m ~ 67 25 100% loo%.@Q1> p E.07 Chi2 = 36.03 f3hi2= .I8 Garnma = +.78 f3hi2= .arnbda = .@l p > .17 Chi2 = 21 .05 Chi2 = 24. perhaps to the point sf disappearing.Cl0 Chi2 = .55 . This is shown in part C. Relationsl~ipStrengthened: Czantrolling for Education COLLEGE INCOME High Low VOTERepublican 58% 1 1 % Democrat 42 89 HIGH SCHOOL INCOME High 1 .54 .axnbda = .g8 Garnma = -.@Q1 >p 1.

This is the more logical iilterpretatioil in this example. where we control for education. One is that the relationship is sptlcrioas-that the indepelldent variahle really does not affect the dependent. This is illustrated by the example in part W of Box 10. It would be reasonable to suppose that income affects a person's ideology and then ideology affects the vocing decision.2. fn real-lik situations it is rare that a relatitznship would disappear as completely as in this example. An3 since there was a strong tendency for conservatives to vote Republicail and liberals to vote Democratic. Determining which interpretation applies in a particular case involves the assumptions one makes about the causal priority of the variables. income did not make any difference within those categories of ideology When we have this sort of outcome. A third possible outcome of controlling is that the origi~~al relationship is strengthened. But it is also possible that the independent variable is an intervenzng factor between the other two variables. In this example. we would say that it was partially caused by the corztrol variable. we conclude tl-rat the original relationship between the independent and dependent variable was caused by the control variable. This reasoning is presented in detail later in tl-ris chapter. If the relationship was weakened but did not disappear. How is this possible? ft c m e about because most of the higl-r-incoxnerespondents were conservatives and most of the low-income respondents were liberals (as can be seen by the N's in the control tables).confirmed by all of the statistics. This ineans that the effect of the control variable was to ""kde" h e relationship between tl-re independent and dependent variable to some extetlt. tl-re control variable apparently was a complete cause of the relatir~nship. but significance tests like chi-square (assuming raildom sampling) tell us whether the relationship still exists or not. There are two possible interpretations of this example. In this exaxnple. and this is confirmed by the higher value of the correlational statistics. where the original relationship completely disappeared.and low-income responderits is greater within the college and high school education categories than it was when alt respondents were pooled in the original table. As the percentage tables show3 the contrast in voting between high. H w can this happen? It occurs beca~zsethe control variable has a relatiansl-rip with the dependent variable zn the opposite direction from that of the independent variable. re- .

we might need to look at variables such as the respondent's race and religion. wilereas highincome whites in the Somh typically suppr>rteda conservative Democra tic party. for the decision must be based on our tlreoretical understanding of the suhject under study as well as on past research findings. Although contrt. Given the range of effects third variables can have on relationskips.tl we coiltrol for region. the effect of education was the apparent correlation between income and voting. Althougl-r. high income is associated with Democratic votkg and IOW income with Repuklican voting.the exaxnple in part E would not he realistic today. Among these southerners. Therefore. it might have been found in earlier decades when there was a tendency h r Africa11 Americans (most of whom were low-income southerners) to vote Republican. people who went to college tend t o have reduce higher incsmes.>iling techniques are riot an inherent part of the experimental and quasi-experirrtental designs. This makes an important point: Even when there appears to be little or no relationship between the indeperident and dependent variables when looking at all the rases at once. But there is a strong positive relationship between education and income. Additional examples of csntrof ling with contingency tables are found in Exercises A and B at tire end of the chapter. The final possible owcome of controlling is that the relationship is dift'erent within the various categories of the control varialsle.2 shows an example of this phenomenon. Part E of Box 10. it may be valuable to control for other factors. Interyreting interactive resulrs is difficult. Flow does one know which variables should he selected as controlsflhere is no simple answer. but actually reverses direction for respondents who live in the South. and region. . but it often suggests that we need to look more closely at other factors that might account for the difference between the categories of the control variable. because the North and South have different distributions on those characteristics. it is extremely important to control far additional variables. which is called interaction.Mzaltivariate Statistics f 65 spondents with college experience actually tend to vote more for Wexnocrats. they can also be applied to tl-te data resulting from those methods. we see that the relationship between income a i d vote becomes stronger for nonSouth respondents. particulariy in the cc>rrelationd design. voting. In this example of income. Whe.

the example in Box 10. the drawback is that each of the resuiting tables would be based on relatively few cases. ~Voreover. N-S SO. . then there would he nr> purpose in using gender as a control variable when investigating the effect of region on anything else. especially if some control variables l-rad highly unequal category frequencies.2 rllipht look like this: &$ale /" Liberal / \ ' .But it is important to remember one principle: A control variable can affect a relui~ionshipo ~ l ify it is velrzted to l>o~h the independent and depende~ztvariablese For example. The interval tecl-rniques described in the next section provide such an alternative.S. Although this could easily he do~le. such as inale conservatives with a college education living in the South.S. \ L i bertll C:ot~servative / \ '' / \ C:ot~servative / "--\ H. C p l l ~ g e H. \ I\\ College \ H. This is done by Looking at the independentldegendent relationship within each possible combination of the categories on two or more control vclriables. 1 . N-S S(>. Our examples here have looked only at cc~ntrolfingfor one variable at a time. N-S So. if there is no difference between geographic regiolls and the relative proportion of males and females (and therefore no correiation between region and gender).S. But it is theoretically possible to control simultaneously for the effect of several varisltltes using contingency tables. the exatrtples we have looked at t h ~ faT but it is comxnon for control variables to l-rave three or more categories. contrcllling simultaneously for several variahles requires another approach. Thus. N-S SO.the control variables in s have been dichotc~mies.especially by a computer. l\. each relatir-rg income and voting for one of the combinations of' categories. ! \ N-S So. / \ Cyllege i'/ \ The result would be sixteen tables. \ N-S SO. College H. unless one bas an extremely large data set. Therefore. / I N-S SO. N-S S(>.S.

00 -.43 Vote ((V) -23 re. Each of the other numbers appears twice because the correlation of . wl-retl-rer fetters or numbers.4 3 1. it must now be designated with subscripts.54> r\rf = -. The partial correlatio~~ measures the relationship betweer1 an independent variable and a dependent variable when one or rBore ocl-rer variables are controlfed. is partldl correlation. and the one most similar to the results of controlling with contingency tables. for example. The simplest: technique. This is because they each represent the correlation of a variable with itself. It is customary to list the dependent variable first..23.ibercalkm (J. = . 54 1. tl-rat is. The partiai correlation coefficient is simply an extension of Pearson" r. meaning that it is the correlation hemeen variable Y and variable X. may be used for this pwrpose. so that the cell at which the row and column for two variables intersect reports the correlation coefficie~~t ftrr those variables.) . Jt has the safBe range of -1 to + 1 and the same interpretation.72..81. It requires that the variables (three or more) he interval. Mulrivariate analyses often use a correhtion rutatrh.4 1 1. LWCQME LIBERALISM I I. An example appears belowVV ELIFSCAnON E E~ciz~ccati:~'u~z(E) 1-00 Irzccznzc?(1) . rle= -43.00 Note that tl-re values alr~ngthe diagonal are all 1. S~lbscriptsare used tct distinguish the different correlations involved. the squared value is equal to the proportion of variance explained.72 .81 1.00.S4 -. r\re = -.8 1 . rvl=-41 VOTE V -.00 -. r. .Mzaltivariate Statistics f 67 Controlling with Interval Variables: Partial Correlations The procedure presented in Chapter 9 for regression and calculation of the Pearson correlation for interval variables can be extended in several ways to look at the relationships between three or more variahles. 72 . Any convenient symbols. Althoud~normally 13earsonScorrelation is referred to simply as r. This is a rectangular listing of a set of variables.23 -.4 1 = -.

. To do this.81. and the matrix shows it to be . of .. x the indepeildeilt variable. seemingly indicating that areas with higher-income residents had somewhat higher crime rates.20. the corvel~tionbetween the independerlt ai"ld depende~tvariables with o ~ l one y COF~~. it is common to see correlation matrices presented as only one diagonal half.4J. per capita income (Jj. we wish to control for percentage urban (U). it is relatively easy to compute a partial correlation Erom the "'simple" "arson correlaticrns between variables...iltrol variable. = . As partial correlations can have any number of control variables.80. HCWever.). we need to employ the correlations of both cri~neand incorne with percentage urban... The iolfowrng example ilibrstrates the computation of partial r. The line under the matrix shows the use of subscripts to report the saiirte inhrmation.. and the indeperident variable. The correlation betweell liberalism and educatioli is r. r. Suppose we took a randoxn sample of I00 counties in eke United States and found that the dependent variable. Wit11 this natation system. = . and z the cc.variable X with variable Y is the same as the correlation of variable Y with variable X ..6O and c.that is.Y(ZIviarlble. Tlie formula is: where the subscript y denotes the dependent variables. Suppose these were r. The correlation between education (E) and income (X) is written as re. had a correlation.. a period is used to separate them from the independent and dependent variables (e. r.. Here we will look only at the formula for the first-order p~artiikl.'To cofrtpute the partial.. as follows: . = .g. Therefore. we need t o substitute these three simple correlations into the formula above. crime rate (C).

3 summarizes the critical infarmation on partial r and gives another example of its cornputatio~. When we control for urbanization.OS level is reproduced in Table 10. the value of (N . that is.I ) was always fN . is: where N is the nuxnber of cases and k is the number of independent and control variables. we see that an even stronger correlate of crime was urbanization. but since there was only one independent variable. And the rnore urban the county. the higher the income. the higher the crime rate. was stronger and negative jr. Also dilferent is cllac in this case we must use a probability of F table that takes into account the number of variables as well as the number of cases. the higher the income. The formula above can be used Eor partials with any number of control variables.Additional exarllgles can be h u n d in Exercise C at the end of the chapter.k . we see that the real relationship betweeri i n c o ~ ~and e crime is negative.1. the lower the crixne rate. This is actualfy the same formula as was used to calculate F for the simple Pearson's r.. The original correlation was positive jr. Sign$cance Test f i r Partial u Assuming that the data are from a random sample. the F-test can be used to determine significmice in much the same way as with 13earsonkreThere are two differences. = --.>iling for urbanimtion.Mzaltivariate Statistics f 69 The result shows that co~itrollingfor urbanization clearly had an effect on the relationship between income and crime. = . .58).20). What occurred here? Altkougli the initial relationship between crirrte and i l ~ c ~ m ievel e was surprisingly negative. both resulting from the fact that a partial correlarian is based on more variables than a simple Pearson's re The forrnula for X. Box 10. however. contrt. thereby removing its effects.2).. but the partial. This necessitates a different table for each level of probability. The table for the .. the more urban an area..

L) Formula: Exaxnple: Given tl-re following correlation matrix af 13earson's r's. it almost wrupletely disappeared when educatior~was controtled for.80 1.00 Education (E) .. the F-Test. calculate the partial correlatiw between a respondent's reported Frequency af Voting (V) with Incoxne (X). Data are from a random sample of 500.80 Frequency of voting (V) .... and Examples of Computations Statistic: Partial r Type: Measure of association Assumption: Three or more interval variables Range: -1 to + 1 Interpretation: Proportion of variance explained (r.. r.00 Conclusion: Although there was an initial fairly strong positive correlation between income and voting frequency.00 .QO V .Q0 2.50 .3 XnformL-ionAbout Padal and Muldple Correlations. i.BOX 20..e. rGTatrix af Pearsun" r 1 Income (If 1.. This suggests chat the tendency Eor respondents with higher education .SO E . controlling for Years of Education (E).

r. Statistic: multiple R Type: Measure of association Assumption: Three or more interval variables Range: 0 to 4-1 Interpretation: Pn~portionof variance explained (RL) Formula: . which is much larger tlzan the E for this example. r. the probability is greater than -05 and this partial correlation is not significant. we locate the F value for N .04.07. Statistic: F-test for partial R Assumption: Random sampliq Interpretation: The probability of F is the probability that the partial correlation observed in the sample data could occur by chance if there were no relationship in the population from which the sample was drawn. N = S00..1 = 120 (the next-lowest to 497) and the column under the heading k = 2. The vaiue &ere is 3.1. Therefore.. We substitute the vaiues into the formula for F: Using Table 10. and k = 2 Ithere are two independent variables). = .l>rmula: Example: Using the partial correlation computed above.to vote more frequently is almost entirely due to their higher level al edmcatian.k .

= . and education. income (I) and education (E). = .. rve = . Example: To test the multiple R previously computed for voting frequency. Statistic: F-test for multiple R Assumption: Random sampling Interpretation: The probability of F is the probabilbty that the partial correlation observed in the sample data could occur by chance if there were no relationship in the population from which the sample was drawn. we substitute the relevant values: . The Pearsank r correlations needed are rvi = .and r.36. rv:. we can calculate the multiple correlation of the independent: variable. Conclusion: Income and edtlcation together explain 36 percent of the variance in kequency of voting. . and k = 2.N = 500.50. voting frequency (V) with two independent variables.&Q. income. and k = number of independent variables.W.Example: Using the correlation matrix in the first part of this table. This is virtualIy n o improvement over the explanatory value of education alone. where N = sample size.

a Republicail identifier will tend to have a higher inct>meand a more conservative ideology*Simply adding up the explanatory value of these separate independent .SS. The Multiple Correlation Depeildent variables in social research conznzr~nlyhave several distinct but related causes.k . Since our F is much larger. This resuks in the following: We now lr~okin Table 10. Since our F i s much iarger. we are sure that the retationship is significant at the . We go down to the fine opposite 60 (the closest one tc-.Therefore.1.We then go to Table 10.3 and in the Exercises at the end of the clzapter. But these factors are themselves interrelated.~for F: N = 100. for example. for example. idealog): and attitudes rovvard a numher of specific issues. illcome.the value of 97 far N .OS. R2 is significant. an individua19svote for a presidential candidate. the e~umberof independent and corztrol variables is 2. r = -. race. This decision c m be partially predicted or explained by each of a considerable number of factors.1 5 would be required to assure that the probability of chance occurrence of this relationship m u I d be less than . Other examples of the F-test for the partial correlation can he fc~undin Box 10.05 level.1. The value there is 3.1 = 120 (the next-lowest value to 4337") and to the coiuxnn beaded k = 2. we can conclude that the probability of chance occurrence i s less than .M. including the person" party identification.07. 1 Tc? find the significance for the partial we just computed. We look down to the line to N . we insert the values into the formul.k .1) and look at the second column. religion. We see that an F value of onfy 3. Consider. because k. and k = 2.

10 4.6 19.41 3.26 3.71 6.00 2.92 2.24 3.44 2.63 2.01 2.35 3.35 3.11 3.16 3.84 4.32 5.95 2.96 2.60 4.84 2.74 4.0 19.48 4.28 6.22 4.18 3.51 2.32 4.16 9.37 3.07 3.13 7.76 2.60 21 22 23 24 25 4.05 234.89 2.49 3.94 5.94 6.54 3.47 3.38 3.33 4.90 2.47 2.66 2.85 2.26 4.77 2.43 2.20 4.5 Probability Level) k = Number of independent and control variables k = l k=2 k=3 k=4 k=5 k=6 1 2 3 4 5 161.TABLE 10.17 3.90 3.63 3.56 2.69 2.59 5.59 3.85 2.39 5.22 11 12 13 14 15 4.55 6.60 2.26 4.96 2.44 3.39 3.52 3.38 4.03 3.1 Probability of F for Partial and Multiple Correlations (0.30 4.28 3.00 9.53 2.21 4.2 19.57 2.70 2.48 3.01 2.02 2.79 16 17 18 19 20 4.16 4.79 215.26 5.11 3.14 4.41 224.4 18.70 2.18 4.57 2.46 4.66 2.05 3.74 2.10 3.07 3.49 26 27 28 29 30 4.06 3.40 3.7 19.59 2.87 3.75 4.53 4.49 3.28 4.81 2.76 4.55 3.80 2.67 4.33 3.61 199.12 6.09 3.5 19.54 2.34 3.98 3.92 2.96 5.49 4.95 6 7 8 9 10 5.42 3.74 2.69 3.68 3.68 2.80 3.93 2.41 4.12 4.96 2.74 3.71 2.63 3.37 3.24 3.87 2.20 3.93 2.71 2.62 2.73 2.99 5.29 3.30 9.74 2.86 3.33 8.12 3.13 3.19 230.34 3.71 4.20 3.01 6.32 2.82 2.59 3.36 3.78 2.46 2.35 4.97 3.45 4.84 3.88 3.58 3.53 2.99 2.59 5.25 9.64 2.51 10.42 N-k-l continues .55 2.

57.. 19631. and the subscripts begin with the depelldent variable. that is. Fisl-ter and E rates.argcr tables showing additional stgrlificancc Ievcls may bc fo~rndin many coil-tprefiensivestatistics texts. in effect.and Medical Research. followed by the independent variables. As with tl-re partial correlation.Rorlald ~: A. A. the dependent variable.. '"overlap" tto some degree. it does not show direction (because sofBe of the independent variables may have a positive relationship to the dependent variable and others a llegative relationship). O R.t: Ofiver and Boyd. measures the total effect of the independent variables. Limited. Normally the square of multiple R is computed. Therefore. s o u ~ c . SZXgh EEdiL-ion (Edinburg1. which tells is the proportion of variance . Statistical Tables fir BioEogilraE. the range of possible values for R is O to i l . pp 53.Mzaltivariate Statistics f 75 N W E : I. rnultipie R can easily be calculated from the simple Pearson's r vvalues. Thus R. Howeve4 multiple R differs from the others in that it can oniy be positive. The details of multiple R are similar to those Pearson3 r and the partial r in that all tlavicjreks w s t be i~tcrrvaland that the sqtrilred w l ~ of e R is the eqgal tu propurgion of'uarIILdl~ceexpkkined.. a n y. 5. Agvicz--tlturaE. variables would be misleading. for their contributi<~nst o the vote. Fisher and Frank Yares.5. x and z. their cox~tri The rnuleiple correlation cuefticient is symbolized by a capital R. Reprinted by perlltission of Pearson Education. The multiple correlation coefficient is designed to measure the total contribution of several independent variables to the explanation of a single depellde~ltvariable while taking into accrlunt any ""overlap" in bution.

. the forxnula is: R itself can be caiculated by taking the square root of the result. = -20. We can illustrate this computation with the previous example for crime rate (C). = . and k = 2. in which R" -77. Multiple correlations with more independent variables may be computed ~zsingmore csmplicated fc3rmulas involving partial correlations. is the sample size and k is the number of independent variables. Substituting the letter identifying the variables for the example in tlze forxnula and then substituting the corresponding values. N = 1 OO. we have: This shows that incsme and urbanization together explain '77percent of the variance in crime rate. ~ the significance of R2 may be Jererrnined by the F-test in rrlucln the same way as for the partial correlation. The formula is: where P. thus. Suppose we wish to coxnpute the multiple correlation of two independent variables (income and percentage urban) with the depelldellt variable (crime rate). For multiple R h i t h two independent variables. The Pearson correlations were r. Significance Test for R' Assuming that the data are from a r a r z d o ~sample. and r....80. percent urban (U) and per capita incc>me(1).. we substitute these values and obtain: . but Rqis rnore meaningful a d hence is tlne figure usually repmted.GO.explained. r. For the preceding example.. = .

bL.3 and in Exercise C at the end of the chapter. and in fact they are almost always done on a corrtputer* However.Mzaltivariate Statistics (Note that the value of -77 previously computed for R2 was already the squared value. to interpret directly because they are dependent on the units in which each of the variables is measured. F would have to equal 3. they are like partial corretations. The equtltion takes the form: where Y is the dependent variable. Mthough the b values for the slopes are quite meaningful.."a d then over to column 3 (headed "k = 2"). and b. The computations for these multiple regression statistics are beyond the scope of this book. Additional examples of the F-test h r R b r e found in Box 10.. In this respect.. and therefore the relationship is significant.77 by chance is less than . we can be confidex~tthat the probability of having obtained an Rhaalue of . Second. they use the standard . First. We see that in order to be statistically significant. f S or more. controlling far all of the other independem variables. we go down column t to where N . X. they show the effect of each independent variable on the dependent variable. Since our F is much larger.) litrning now to the probability figures in Table 10. they can be difficult..1. it is ixnportant to be aware of them as they are widely used in contemporary political science re sear cl^.I is 60 (the table's next lowest value from 9". and so on are the independelit variables. X. and so on are the corresponding values of the siape fnr each independent variable. For that reason. the results of multiple regression analyses are commonly reported in terms of s a ~ d a r d i x e dr e g r g s s i ~coefficients ~ or beta (@) weights.05.k . Beta Weights The process of deterrlnini~~g the "ibest-fitting" rregression line and the equation that defines it can be extended to any number of independent variabtes. Betas are standardized in two ways.

The key point is . F-tests are used with hetas to determine the significance of each independent variable. more precisely. This is the purpose of the controlling tecl-rniques discussed earlier in this chapter. The second criterion is time order or. This concluding section will focus on some principles that are vital Eor interpreting what the results of these techniques mean. The first is cowiariatitsn. The various ineasures of association. Ti.You should now have a much clearer idea of what this meaxls. interpret the results of inultivariate analysis correctly. Causal Interpretstion The chapter thus far has presented techniques far analyzing the relationship of three or more variables.deviations of the variables to remove the effects of the particular units in which tl-re variables are measured. causal priority. Tl-rus if the beta for the first independent variable is twice as high as that far the second independent variable. we must be very clear about our assumptions about tl-re order in which we believe the variahfes occur. its essentials can he simplified and used to analyze a small number of variables with the techniques covered earlier. from Xambda to inuftiple R. or correiatic~n. Although the process of causal modeling in its complete form is rnathernatically sophisticated and beyond the scope af this book. before we can draw any causal inferences. A quick review of the three ""criteria for inkrring causality'9hat were in traduced in Chapter 3 will be useful here. Interpreting the results of multivariate a~lalysisis a process leading to conclusions about patterns of causalion. particularly procedures for Ictoking at the relationship between two variabks wl~ilecontrolling for a third. we must make sure that relationships between variables are vtot spurious. The muIrip)e R% a measure of the expianaeory value of the whole equation. Finally. we can sap that the first variable had twice as rnuch impact a n the dependent variable as did the second. are all measures af covariation.

a person's ideology undoubtedly influences his or her party identification. there is littfe doubt &out which variable "caxne first" "cause we know when the variables occurred. But whatever the basis for the assumptions. But the same reasoning can be applied to noxninal and ordinat data.1 illustrates the need for causal modeling in even the simplest case.I~wever. No reverse causation is permitted-that is. some or all of tile possible intercorreiations are not zero. 18. As Figure-. Y cannot cause X. that are not at all related. for X to ir-rfluence U" while Y influerices X. = 0. Model 1 is the simplest case. X and Y. we must specifv the causal priority before assessing the applicability of any causal models. then X causes l' and Z. the causal priority X. where there are two independent variables. It is quite possible for causatioil to he reciprocal. Vlre first specifj. The exampie in Figure 10. we must assume that causatioil is unidirectional and that we know what the directio~iis. For example. Figure 10. Y. We would conclude that this is the case only if there were no simple Bearson correlation between X and Y. in that case the assumption of causal order must be based on the kind of reasoning presented in Chapter 2 in the discussion of the variables-cheoretical role and the difference between independent and dependent variables. where there are only three variabfes. We must also make tl-re assumption tl-rat tl-rere are no additional variables that could he affecting the relationships. When our data are derived from.. This means that if there is causation between the three variables. Z. this causal order is less clear. . There are a number of techniques for analyzing two-way ca~zsation. and I( causes Z.but they require r ~ u c hrRore satistical background tl-ran can be provided here.1 shows. but party loyalty may also affect ideological views.with a correlatiollal design (which is where we typically use causal modeling). that is. I-. design. We can use Pearsank r and partial correlations to determine whether each model fits any given set of data. a true experiment or a quasi-experimentai. r. there are four passible causal models that might underlie a pattern of observed intercsrrelation between only three variables.Mzaltivariate Statistics f 79 that we must he prepared to a s s w e that arty cagsial relationship between two vnrliahles can be in only one dzrection.1 assumes that we have interval data so that Pearsuil's r and partial r can be computed. as will be discussed later. and Z cannot cause either of the otl-rer two. Therefore. We would undermke causal modeli% for this set of variables hecause we have data that indicate some relationship between them. that is.

it occurs only through V. X causes Y.'" Y Z TEST: r.00 TEST: r. does that mean that the originaX relationship was spurious! No.l \ K Z TEST: r..00.. the test for this model i s tlze partial correlation between Z and X. not equal t o 0. then we can conclude that model 3 can be applied to this data set. = 0.1illttstrates spurious correlation.10.. r. and rzxVy not equal to 0...00 TEST: ry. that is./. = 0.= 0. This means that wl-rile we may have observed some correllatioil between X and Z .00. If the control variable vvas more . then we would conclude that model 2 fits our data. Model 3 iliustrates the presence of an irttervening uariubl'e.= 0. controlling for Y.. The difference between model 2 and rnodei 3 highlights the importance of the assumptions we make about causai priorify.. in wllich there is some apparent relationship between two variables (Y and Z in this case). If we find that a correlation between two variables disappears when we control for a third... not equal t o 0.. Xf r..00 Model 2 in Figure 1Q. not unless the control variable was logically prior to the independent variable..FIGURE-..1 Causal models for three vartablcs and tests 1VOL)EL 1: IGIIODEL 2: INDEPENDENT CAUSAA%'XON SPURIOUS CORRELATION Y X\ z X V X". The test for this model i s the partial correlation between Z and % controlling for X.= 0. the intervening variable. Therefore. and then V causes Z . but that relationship disappears when controlled for a prior variable (X in this case). If r.00 iMODEL 3: INTERVENING VARIABLE MODEL 4: COMPLETE CAUSATION XV .

indepelldent causation. To keep the example simple. If none of the test correlations (r. As Box 10. Hence the causal priority is wealth. military spending. But one may be working with nonsample data. where any correlation. How clr~seto zero must a correlation be? If the data are from a random sample. on tl-re other I-rand. then model 4 applies. so there is no need k>r causal interpretation. then the F-test may be used for Pearson" r a d the partial correlations. one may look at aIX o f the tests and see that because one of the test statistics is extremely weak. model 1. If the probability is greater than . Although examples such as these-in whictl correlations turn out to be exactly zero-can occur with real data. one should not draw such a conclusion until the appropriate partials have been computed.4 shows. But when we test model 3. the '"best fitting. given our assumptions and available information. It is also possible that more than one of these test statistics will be equal to zero. indeed.05 level. because wealth and denlocraq are strongly correlated. we have assumed that the control variable is causally prior to the other two. Model 2.. The dependent variable is military spending (measured as a percentage of:national budget). r ..-. The causal priority of the other two variables is not obvious. is. in a statistical sense. This means that. and r are equal to zero. because it is possible for the value of Pearson's r between two variables to be zero while the partial is significamly positive or negative. In such instances.. If. spurious correlation. however small. also does not apply$because the partial r between military spending and wealth. clearly does not apply.. then their relationship would be spurious." Box 10.Mzaltivariate Statistics f 81 likely a result of the independetit variable. we will assurr. then the mtrdel 3 interpretation of an intervening factor is correct. Intervening Variable. we . the corresponding model is.OS. GNP) and democracy (measured on a tenpoint scale) would have a lengthy history.le that wealth causes democracy. significant. usually they do not.&)r is quite strong. as both wealth (measured as per capita. dexnocracy. or with data from a such large sarnple that even rninute correlations indicating no practical relatirrnship are still significant at tile . then the correlation can be assumed to be zero for the population.4 illustrates the process of causal modeling with an example using data on nations. we can1i~)tsimplify the model and m s t assume that all of the correlations do imply causal linkages. controiling fc~rdemocracy (rgnW. This simply means that some or atl of these variables are not even related. However.

Although that is best done by writing simultaneous equations for all of the possible patterns (BXalock 1964). tl-re apparent relationship of wealth to military spending is a result of the effect of wealth on the type of g u v e m e n t . in which controlIing is dcrr-re usirlg contingency tables as explained in the first part of this chapter. measures of the relative stretlge1-r (in this case. In other words. controllix far dernocritcy. The relatively simple three-variahle example in Box 10. Another example of causal trtodeling can be fotjnd in Exercise C at the end of tlze chapter. wealtl-r. but the relative strength of the different linkages showed that party idetitification declined somewhat as an influewe on votiw while the importance of issues increased. analyze votillg belravior in the 1972 presidential election.2 shows a causal model that fchutman and 130mper (197. As is ccjmmon in the presentation of such models.5)constructed tc-. causal modcling can reveal important generalizations about complex phenomena. and the more democratic. More elaborate models may he constructed for larger numbers of variables. tl-re relatively simple approach using partial correlations can easily be extended to more complex problems (Blalock 1962). To do this for three variables. is very nearly zero (rmdSw Elence we conclude that model 2 is the best fit.4 illustrates how controlling allows us to understand these basic patterns in statistical anatgsis. This mc~delshows how the effects s f social hackground and family partisallship are mediated largely through an individual's party identification. Thus. Figure 10. particularly to distinguish cases of intervening variables from spurious correlations. the higher tl-re military spending. the more democratic it tends to be.find that the partiat correlation between military spending and = . Then three sets of contingency tables m s t . beta weights) are included for each of the causal arrows. almost identical causal patterns were found far elections in three different decades.05f. Party identification then has both a direct effect cm the vote and an indirect effect through i t s influence on attitudes toward particnlar issues and evalut~tionof the candidates. explicit ass~tmptivnsmust be made about causal priorities. The wealthier a nation. Causul Interpretution Using Contingmcy ?bble~ Although the complete cartsal modeling procedure requires interval data and partial correlatioils. Interestingly. the same logic can be applied to nominal and ordinal category data.

l

BOX 10.4 An Example of Causal Modeling
Correlation Matrix (X)earsank sr)
W
D
M
Wealth (W)
1.00 -85 ,S1
Democracy (D)
.85 L.80 .62
Military spending (1M) -51 -62 1.00
W = l 86 Nations

Relevant
Partiais:
r,,,,, = -78
r mw>,c = -.Q5

Assumed causal priority: Wealth, democracy, military spending

I

Model 1 : Independent Causation
Test: Does rdw= O! No, rd, = .g5
Conclusion: Model 1 does not apply-

1Vodel2: Spurious Correla tion
Test: Does rmdSw
= O! No, rmdew
= -78.
Conclusion: LWodel 2 does not apply.

W *D

J

iZ/1

f i s t : Does r

= 01 rtnd+,= -.OS,
wllicb is very close to zero,
Gonclusictn: LMc~del3 may apply.

1Vodei 4: Complete Causation
W *D

'I/
M

Test: Are rdw9rn,d,r, and rmsd a11 not
equal to =so? Since rrrrdew
= .OS,
Mc~del4 does not apply very well.

~vtzliszzled

Conclusion:

Model 3 is the best fitting causal model:

3"

&M

be constructed: (1)tables cross-tabulating each pair of variables
without controls; ( 2 ) tables cross-tabulating the second independent ('"middle") variabie with the depelldeat variahle while controlling for the first independent variable; and (3) tables crosstabulating the first independent variable witl-r the dependent
variable while controlling for the second independent (""middle"')
variable, Appropriate statistical measures of association and (if
randorn sarnple data are used) significance levels are then cornputed. When all of this has been done, it may be possible to distinguish the four possible causal models previously presenwd,
The results of this procedure may be more ambiguous than those
obtained In causal modelinfi for interval variables, The problem is
that there may be substantial ilzteracticm, that is, the relationship
may be of different strengrhs within different categories of a control variable, On the other hand, this can be ail advailtage of the
contingency table method, since partial correlations do nut reveal
whether interaction is present.. The contingency table approach
also may be extetlded tc-,a larger number of variables, which would
require controlling for two or more variables at once, As noted earlier, simultaneously controlling for several variables produces numerous tables, many with inadequate numbers of cases.
Box 10.5 presents the contingency tables rlecessary to mdertake
this version of causal analysis. The example deals with the question
of racial differences in voting participation and the extent to which
these differences can be attributed to education, We assume that
the causal priority is race, education, turnout, That ttlmout could
only be a consequence of the other two is O~VIQLIS, It also makes
sense to assume that race more lilcely influences education (i.e.,
members of minority groups tend to have less education) for a variety of reasoils, whereas the nt>tioilthat educatioil could influence
race and ethnicity does not make sense.

Mzaltivariate Statistics
0.2

F

f 85

An example t>f a causal model: 1972 presidential election

,285

FAiZilILY SOCIOECONOMIC
PARTXSAN PREDIf POSITION

IDENTIFICATION

RESPONDEN
SOCIOECONO
PARTISAN PREDISPOSITXC3N
-i

A/'

X3ARTISAN ISSUES
INDEX

\-:*

",

/'

/

i

'
I

.l38

Y

RESPONDENT" PARTY
IDENTIFICATION
.249
.3lZ

,/

/

/

/+-CANDIDATE
EVALUATION

,/*S l 0

RESPONDENT'S
VOTE

N = 827
RL= ,4713 (p < .OO f )
NOTE:

Figures by arrows are beta weights,

Addagtcd from hlark A. Scbutman and Gerald brnper,
"hriabitity in Electoral Behavior: Longitudinal Perspectives from
Causal &lodeling," Amerzcan jozar~talof Politic~alS~ie$?ceI9 ( f 975),
1-1 7.

SOURCE:

Box 10.5 first presents the relationsl-rips between each pair of
variables. It tl-ren explores the relationship between tl-re dependent
variable burnout) and each of the independent varirtbles (race and
education), Recatlirlg the four causal models presented earfier, we
can easily see that rnodel 1, independent causation, is not a possibility, because the two independent variables (race and education)
are strongly related. The second set of tables jtrtrnout with education, controlling foe race) would test rnodel 2, spurious correlation,
because it determines whether the relationship between the second
and third variables disappears when corztroiling kjr the first. Modet

2, does not fit the data, as the turnoutleducation relationship re-

mains about the same strength and is significant for both racial categories. But when we look at the relationship between turnout and
race, controlling for education, the relationship within each education category virtually disappears, in both strengeh and significance. When we compare individuals of a given level of education,
there is virtually no difference in the turnout rates of whites and
nonwhites. Since we l-rave assumed that race i s causally prior to
education, model 3, intervening variable, fits these data very well,
This analysis aids in our substarttive interpretation of: turnout,
Race is not irrelevant to turxlout, because it is ultimately a cause,
but it had i t s entire effect tl-rrougk education, This might suggest
that if we are concerned about increasing tumout among racial minorities, we shsulct address the larger question of why there are
racial differences in educational attainment,

Exercises
Answers to the exercises follow. T t is recom~neridedthat you attempt to complete the exercises before looking at the answers.

Below are tables showing the relationship between party competition and spending Eor education in the fifty states with a control for
the state" per capita income. Wl-rat conclusion would you draw
about the hypotl-resis that higher Levels of party competition calrse
states to spend more on education?
C:QNTRCILLING FOR ZNGCjM1t-i
(ALL CASES)
COMPETITION
SLrEmMC E-IlgI? Lozv
H~gi?
72% 36%
1-ow

28
64
100% 100%
N = 25
25
p

p

HIGH INCOME:
COMPETITION
SL3EmIIPJG Hzgh Low

85% 83%
15 17
100% lot>%
N = 20
4

LOW INCOME
COMPETITION
SPEmING E-ItgI? Low

20%
80
100%
N=5

21%
79
2110%
19

1C) (JolEege High School N Gamma Ghi" Phi" EDUCATION High School 29% Ciamrma = .63 140. l 8 700 B. Turnout by Education.i = 100 300 Ciarnma = .001) .00 ( p < .001) 1C)O% Phi" .51 ( p c .14 (p < .001 ) Phi2 = . Tables with N o Controls RACE RACE W R N O U T WC~ite Non-whzte \Toter "73% 50% 30 50 100% 100% M = 1.5 (p < .17 NON-WHTES EDUCATION High TURNOUT g Schorjl WO ter 7Q1' 30% Non-voter 70 100% 100% is.000 400 = .OIlO 4110 Garnlna = .C14 Non-voter College Voter 7 2 "0 N ~ P z -ter vo 29 100% IiJ = 700 TURNOUT EuUc~TiQr\l White Non-whzte 60% 25'% 40 75 100% 100% = 1.00 (p < .68 Cht" 50.40 Chi" 49.72 71 C:l-iiL = 257.3.5 Using Contingency Tables for Causal Xnterprearion Assuxnccl causal priority: Race. W l ) Phi2 = . turnout A.12 .BOX 10.71 ChtZ= 168.001) Phi2 = . education. C:antrolling for Race WHITES EDUCATION High TURNOUT Collegre School Voter 72% 30% 70 Non-voter 213 100% 100% N = 600 400 Ciarnma = .

What conclusion would you draw about the hypothesis that people who approve of the president's perhrmance in office are more likely to vote for the calldidate af the president" party? (As you migl-rc guess. (ALL <:ASES) C:ONTRCILLING FOR PARTY IDENTIFICTION APPROVm VOTE Demo. Controllir~gfor Education COLLEGE HIGH SCHOOL RACE RACE TUKNQm W/?l'te Non-white TURNBUT White Non-whit@ Vc~ter 72% 70% Voter 30% 30% Nc11.t-voter '7'0 70 100% 100% IOt7% 10OC% N=600 100 N = 400 300 The best-fitting model would look like tl~is: Below are tables showing the relatioilship between a responderrt's approval rating of the president and his or her vote in the next election with a control far the respondent" party identification. Tumour: by Raec.) Data are from a survey using random sampling. the president in this example was a Democrat. Rej1~6.6". Approve 80% 20 100% DEMOCRATS APPROVa I>isappruve VOTE Approve Disappfiwe 20% 90% 50% 88 I0 5() 100% 100% 100% .t-voter 28 SO P$ol.

).86 Czl-rih 169. 2.86 Czki h 83.:.. cantrolling fur development (rtYd). Calculate the partial correlation between instability and years since independence.alnbda = . Calct~iatethe partial correlation between instability and development.a01) Phi b -68 VOTE L3emo.ewnomic developmellt (measured as per capita GDP). The variables are the number of years since independe~lce. 80 Cht = 61.63 Gamma = +. Use the F-test to determine significance.Q0 Ciarnma = +.Mzaltivariate Statistics f 89 cot~tmussl! N = 500 500 1.Use the F-test to determine significance.ambcfa = .42 Below is a matrix of Pearson's r data on a r a r z d o ~sample ~ of fifty nations that were a11 a t some time in the past under the csntrsl of a colonial power.20 REPIJBHCANS INDEPENDENTS APPROVAL APPROVAL Approve t)&apprave Approve Disapprove 60% 10% VOTE 80% 15% 1.43 (p < . N = 200 100 I.ambda = .00 (p < ..42 (p c: .22 (p . 1.ifQlf 1% G .25 Gamma = +.28 1. and political instability Measured as the relative nur~berof "irregtilar executive transfers" f i a t have occurred in the nation. controlling for years since independence (r. .Q9 (iarnma = c.l.Q01f 1% G .arnbda = .a01) Phi L . Using the correlations in the matrix.88 Chi = 680.

This indicates that the reiationship between cc~mpetitionand spending was due to the effect of income and that these two variables do x~otaffect each other. and those who disapproved voted Republican. When we look at all respondents. Use the F-test to determine significance. when we control for states>er c q i t a income. Therefore. 4.34 -. development. both variaMes. that is. we see that there is a strong and significant relationship between approval and the vote. the relationsl-rip almost completely disappears. states with high competition are more likely to be states with high vending than states with low competition.00 DEVELOPMENT I> INSTABILITY I . YEARS V Ueam (yl) 1.3. Assuming tl-re causal priority years since independence. Calculate the multiple correlation with instability as the dependent variable wit11 development and years since indepelldence as the independent variables. Note that (as you can tell horn the N's in the control tables) party is related tc.52 Suggested Answers to Exercises When we loc>k at all the states. there appears to be a fairly strong positive relationship between party c s ~ ~ p e t i t i oand n spending on educacian. we c m col~clndethat presidetitial approval does affect voting in the next election. When we control for the respondent" party identification. instability3 determine the hest-fitting causal model for these variables. Democratic identifiers are more likely to approve of presider~tiaiperformance . the relationship remains strong and significant within each group of party identifiers. those who approwd of presidential performance voted Democratic. However. that is.

> 3-21 .r . 3.OS. This partial is significant. This partial is significant. F . .OS.21. But the effect of approval is clear even within the party identification cat- F 3. so p c . so p .Mzaltivariate Statistics f 91 and are more likely to vote for the Democratic candidate.

. is wketl-rer the partial correlation between instability and development. model 1 does not apply. The test for model 2.&. = -.Q5level). rd. r. spurious correlation. independent causation.71 and it is significant. As. is whether the simple Pearson correlation between years since ixldepe~iderlceand development is zero. = -34 (and an F-test shows that this is significant at the . As the inatrix shows. controlling for years since independence. model 2 does not apply. the calculations in question 1 above show. Therefore. Therefore.Mode1 1 Made1 2 Made! 3 Model 4 The test for m d e l 2 . is zero.

is whether the partial correlation between instability and years since independence. Bottl years since independence and econaxnic develr~pment (which are themselves interrelated) have a direct effect on political instability. .Mzaltivariate Statistics f 93 The test for model 3. r. Since the data fail to meet any of the tests for the first three mtrdels. = -. we conclude that model il.:. model 3 does not apply.42 and it is significant. controlling for development.is the most applicable. complete ca~zsation. is zero. Therefore. As the calculations in question 2 above show. intervening variable..

This page intentionally left blank .

" PPulzlic Opirtio~iQuarterly 34: 560-572. 62: 6-28. "The Impact u l Party Activity on the Electorate. Processing the News. Katz. 1954. Ennis C. Ansolahehere. Stephen.". How to Lie with Statistics. Graber. Phillips. 1994. Monroe. S . Bernard. New York: W W. Los Angeles: Pyrczak Pu blishing. Bereison. Cutright. 1962.. Kramer. ""Measuring the Impact of Local Party Activity a n the General Election Vote." Americican Journal of Political Scierice 21: 71-81. Ceorge C. "hoes Attack Advertising Demobilize the EIectorate?" American Political Science Review 88: 829-838. 510-512. Geratd H." American Journal of' Sociology 68: 182-194. New Yark: 1. 1970. et al. Chapel I-fill:University of North Carolina Press. 1998. Darrell." h b f l c Opinion Quarterly 25: 1-24. and Sarr. 'TJrbt~nismand Voter Turnout: A Mote on Some Unexpected Findings. 19". Huff. ()The Impact of Party Activity a n the Electorate. 2000. 1977.luel J. Causal Inferences in Nonexperixnentd Research. 2983. The Public Presidency New York: Sr. Mew York: Hafner. -1964. 2d ed. Content Analysis in Cr>mmuilicatioil Research. Eldersveld. Martin's. Alan D." Pubtic Opinion QtiarterIy 2".7 372-3861. Statistical Tricks and Traps. Doris A..References Alxner. Edwards. Huberr: M. Bfaiock. Norton. ""Public Opinion and Pu blic Policy. 1988. ""Four-Variable Causal Models and Partial Correlations. -. Daniel. 1 980-1 9939'a Pul-tlic Opinir~nQuarterly.oxigman. 1963.

Robinson.Mueller. CT: Graphics Press. 1980." Arnerican Journal of Political Science 2 l : 1 .. Gregory M.. Over the Wire and trrr TV. Thousand Oaks.. Edward R. "kriability in Electoral Behavior: Longitudinal Perspectives from Causal Modelirtg. 1975. and Stephen M. 2d ed. "Effects of Public Opinion on Policy. John E. . Prestde~ztj. CA: Sage Publications. Gerald M. Michael J. 1973. Pomper.. 1980. Benjamin I.2d ed. Mark A. New York: RrrsselI Sage. Thomas E. Lederman. Scott. Graphing Statistics and Data: Cresting Better Charts. a d Gerald M. Content Analysis: A Elandlsr>ok with Applications for the Study of international Crisis. Garrison. The Student Politicat Science Writer" ~Vanual. 1980. and Steven J. nlfte.1 8. Robert C. 1983.. Wolfinger. 1963.. IL: Northwestern University 13ress." American Political Science Review 77: 1071-1089. and Robert Shapiro. Anders. North. Raymond E.and PubEic Opinio~z. The Visual Display of Q~~ancicative Xnlormation. 1996. et al. and ~MargaretA. 1983. War. Mew York: Longman. Sheehan. with Susan S. Patterson.. 1998.New York: WiXey. Pomper. Electiolls in America. et al. The mass IVedia Election.. 1983. Upper Saddte River. New York: Praeger. Evanston. Cheshire. Rosenstone. Schufman. Page. Wallgren. Who Wjtes? New haver^: Yale University Press. NJ: Prentice Hall.

56-57 Empirical sentences. John H. 151 Exit poll. 52-54 Dichotomy. 56 Bar chart. 32-37 Factorial design. 108 Coplin. 132-134 Cutright. Stephen. Alfred N. 64 Graphics. 72 Causaiiry. 56 Generaliizations.Index Abramson.. 177-1 78 Bibby3John E. Walrer D. 159-1166. 101. 71 Explanation. 87 Difference of means test.S... 1 10 Analytical sentences. Bernard. 70 Congressional data sources.. 49 Edwards. 101. Doris A. 37 Fisher.--1. Cregory M.. 65 Bereison.5 Btaiock Hubert M. 1 1 1 Aldrich. 106-108 Bar one. Ennis C. Samuel J. 175 F-test. George C. 57 Eldersveld.167-173 Cook.5. 178 Causal modeling..55 . 5. 1i 3 3.. 46 Electicjn return sources.. 182 Burnhaxn. 3-8 Eta. 169-17.182-186 Controll variable. 107. 54 Cramer" V. 178-190 Case study. 176-3 77 Gamma. 54-56 Content Analysis. 101. 46 Data.124-3 32 Cluster sample. Philiips. 4 Ansolabehere. 2. 44 Balachandran. 1 1 1 Almer.. 5 8-64 Contingency tables. 101. 5 1 Carwood. 122-124 Garrison. Paul R. 47 Demographic data sources. 149. William D. 56 Captive population. IV.. LVichael. 56 Balachax~dran.. Joshua. 2.. 159-166. Rona'id A. principles for. 58 Beta weight. 151 Ecological fallacy322.. Rl-rodes M. 146-149.. 92-93. 55 Craber. 21-22. 101. 43 Chi-square. 24. 3 Experimental design. 31-32. 4 0 4 3 . 3 Goldstein.

. 108-189 Local data sources. 147-149. 98-1 Q 1 Natural experiment. Kenneth.. 54 Operational definition. 57 Hovey.56 IVueller. Kendra A. 101. Elizabeth Hann. 71 Phi. 108 ~Vackie. 68-70 Range. 71-72 Janda. 121-122 Level of measurement. S9 O' Lear)i Michael K.. 101.Index Graphics. 173-1 77 Multivariate statistics.Tbomas X. 64 Random digit Qialing. 90. 53. 48. 52-33 Internet sources. 124 Kendall's Tau C.151 Normative sentence. Norman J. 57' Multiple R.AIan D. 84-85 Ornstein. 117-120. Benjamin I. 54 Katz. 54 Nominai variabie. Gerald IV. 85 Intervening variable. 91 Mt~nroe. 64. 56. See Qt~asi-experiinentatdesign Niemi. John E. 109-1 l 2 Elastings. S5 Page. Thornas E. 130-232 Pie chart. 12. 1 44-147 Personal interview.-84 Noniinear relationship. problems with. 5 7 Partial correlation. Darrell. 101. 1 67-1 n3. 72 ~Vean. 46 Kendall's Tau B. 18-1 9. 1 80 Interviewing. 181 Kramer. 184 Inrternational data sources. 52. 37-40 Ragsdaie. 101. 64 Pearsun's r. 106 Pomper.. Daniel. 17-20 Interaction. Phillip K.. 70 Random sample. 42..-9 2 IVode.. 56 McGilfivray.. 83-.. Cerald H.. 46 Lambda. 109 Hypothesis. 5 7 Hastings.. 187. 3 Quasi-experime~ita!design. 54 Jodice. 57 Morgan. Harold A. 58 Intersubjective testhilit5 2 Interval variable. 59. Robert C. 83-89 Line graph. Lyn. S 1.. S6 Elovery. 56 Huff. 1 82 Prediction. Ricbard G. 3-8 North. 2. 179-1 82 Patterson. Katkleen O. David A. 59. 9 1-92 . Larry.. 55 Mait survey. AIice V. 23-28 Ordinal variable. 56.. 60"-61 . 54 Mackinson. 53.90 Median.

Sreven J. Charles L.. 93-95 Science. 65 S o l o ~ ~fo~zr-group o~l design. 90-92 United States data sources. Rohert. 112-113 ft-anley. S4 Statistic. 62. 22 Sampling. 48-49 Wallgren. Stepher1 M. 53-54 Unit of analysis. 5 7 b t e s .. %>avidW. Q5 Univariate statistics. 71-72 Scattergram. 60-6 1 Regression. 73-78 Survey research. 141-145 Research desigil. 5 7 Sheefran. 59 Rose. Raymond E. 67-71 Scarnmon. ~VichaelJ. See Research question Research question.. 108 Schulmail. Margaret A. 178 Ujifusa. 10-1 1 Shapiro. Richard. Difference of Means test Silvard. 175 Yule" Q. 57-58 Survey items. I82 Scott. S1 Seff-administered survey. 12.. 90 State data sources. Harold W*. 2. 120-122 .Recording unit. Ruth. 22 Woad.. Anders. f 06 Wolfirtger. Edward R. 92 Standardization. 8-1 l Rhode. 20-22 Theor)i 17-19 T-test. 49.. F-test. 101 Taylor.100 Significance test. S4 Rosenstone. Floris W.. 67-78 Tau B. 22-25.. 3 1-43 Research problem. 101. 111 Rubinson. 99. S4 Theoretical role of variables. 151 Tufte. Frank. Richard M. IVark A. 59 Significance. Grant. 54.. See Cbisquare. 149. 26. 180 Standard deviation. 37 Spurious correlation. 107.. 56 f tirvey data sources. T-rest..