November 3, 2017
RCL 137
Rosenberg
SATTACA
The formation and later criticism of standardized tests as a metric of intelligence and aptitude
In 1997, a science fiction film called Gattaca was released. The movie was, by all
accounts, a flop in its time, earning barely a third of its budget back at the box office. Since then,
Gattaca has gained recognition for its unique premise, and has become a staple in high school biology
classrooms across the country. Its plot was centered around the concept of personal DNA
sequences becoming readily available for anyone to view. The idea was intriguing, both as a
simple sci-fi premise and for how well it anticipated technological advances that occurred after
the movie’s release. But at its core, Gattaca was about more than just genetic code. The film
offered a more fundamentally chilling premise: that a person could be defined by a simple set of
data, and that this data could be used to discriminate against them. Of course, no one wants to be
reduced to a tight little box; it’s dehumanizing. We all want to believe that we’re special in some
way, and being described by a number (or numbers) takes that away.
On the other hand, classifications are often necessary. It’s much easier to take a simple
set of data than a list of vague, qualitative features, and so we often use them for efficiency’s
sake. And, in situations where sorting and classifying a large number of people is necessary,
simple numbers often represent the easiest go-to. For these reasons, researchers spent years
attempting to quantify the abstract quality of “intelligence,” with institutions jumping at the first
possible chance to make use of the results. As time went on, however, our understanding of
intelligence, and social views on how people should be assessed, expanded significantly, and
these “simple numbers” gradually fell out of fashion. Still, other applications for
intelligence testing (such as those used in the college admissions process) have persisted, albeit
in a lesser capacity. In this way, the development of intelligence testing has reflected broader
shifts in how society believes people should be assessed.
The Intelli-Gents
For most of history, intelligence was discussed only in vague, informal terms like
“book smarts” and “street smarts” (forgive my use of jargon). In 1905, this all changed. A man
by the name of Alfred Binet decided to create a test quantifying intelligence. Binet, along with
his colleague Theodore Simon, released a study in 1905, which compiled various established
tests of intelligence with some new ones of their own (Boake 385). In Binet’s view, none
of the individual tests mattered too much; rather, “the important information contributed by
intelligence scales was the subject's average performance over various tests” (386). Binet
used this average performance to ascertain a subject’s “mental age.” A later revision by
psychologist Lewis Terman modified the test,
replacing the “mental age with the intelligence quotient (IQ),” and, “supplementing the Binet-
Simon test with arithmetic reasoning items.” Following these changes, the new Stanford-Binet
assessment, released in 1916, “quickly became the dominant measure in American intelligence
testing” (388). Further revisions to the Stanford-Binet test added “a point scale format,” and,
“extended scales using similar items and activities” (Becker 2). The latest version of the
Stanford-Binet also seeks to measure eight different abilities, including quantitative reasoning,
visual-spatial processing, and verbal/non-verbal IQ, while the original only measured “general
intelligence” (4).
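For reference, the intelligence quotient that Terman adopted is the classic ratio IQ, in which a score of 100 indicates performance exactly at one’s chronological age level. (This is the standard historical definition of the ratio IQ, offered here as context rather than as a claim about the exact scoring rules of any particular revision of the test.)

```latex
\mathrm{IQ} = \frac{\text{mental age}}{\text{chronological age}} \times 100
```

Under this definition, a ten-year-old performing at the level of a typical twelve-year-old would score (12 / 10) × 100 = 120.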
However, all of the intelligence testing up to that point did have a drawback: namely, the
extreme emphasis placed on verbal skills made the test unsuitable for many candidates. In
intelligence testing done on immigrants arriving at Ellis Island, it was discovered that “the task of
mental screening was complicated by the fact that many immigrants spoke no English and had
little or no formal education.” For this reason, “the Ellis Island physicians rejected the Binet-
Simon scale as inappropriate for testing” (Boake 388). In response to such criticisms,
psychologist David Wechsler sought to create a scale that measured intelligence as “the global
capacity of a person to act purposefully, to think rationally, and to deal effectively with his
environment” (qtd. in Cherry 1). Wechsler divided intelligence into two subcategories, the
“verbal,” and “performance” sections, which had already been the standard for some other tests
at the time. His Wechsler-Bellevue scale (the latter name taken from the psychiatric hospital
where he worked) featured “fewer verbal subtests (Information, Digit Span, Arithmetic, and
Comprehension) than performance subtests… and does not include any memory tests” (Boake
397). The ability to distinguish between verbal and performance results, rather than having one
score comprehensively denoting intelligence, meant that Wechsler’s tests had more nuance.
Around the same time (after Binet but a little before Wechsler), some other scientists
attempted to quantify intelligence for different uses. In 1926, the College Board, a group of
universities in the Northeast, commissioned a team led by psychologist Carl Brigham to create a
test predicting the academic success of individuals based on intelligence. This assessment was
known as the SAT, or Scholastic Aptitude Test (“Brief History of the SAT”). Like other
intelligence tests of the time, the SAT had a verbal component which was integral to its overall
result. The 1926 SAT contained “nine subtests: seven with a verbal component… and two with
mathematical content” (Lawrence et al. 1). However, colleges didn’t necessarily see this as a
flaw: after all, most institutions would prefer students with good verbal skills, even if those skills
don’t necessarily reflect innate intelligence.
Aside from the obvious need to distinguish between verbal and quantitative ability in
testing (given that tests less dependent on verbal skill are arguably better measures of general
intelligence than those that are), some assert that intelligence is broader than just those two
categories. Howard Gardner famously outlined his own theory on the topic in his 1983 book
Frames of Mind: The Theory of Multiple Intelligences. Gardner’s theory redefined intelligence in
a more specific way than Wechsler’s view, and over time he identified nine categories that fit
this definition (“Howard Gardner’s Theory of Multiple Intelligences”). While this idea does
provide a more expansive view of the concept than previous schools of thought, it has received
some criticism, namely for Gardner’s failure to provide a way to actually measure the categories
he discussed.
Binet’s original research was intended for use with children. The
French government commissioned Binet to determine which students required more assistance in
schools (Cherry 1). While the French government was mainly concerned with their students, the
US had other interests for these assessments. The earliest domestic use of intelligence testing
came in the US military. Psychologist Robert Yerkes developed a test “designed to uncover
recruits with intellectual deficiency, psychopathic tendencies, nervous instability, and inadequate
self-control” (qtd. in Carson 284). These tests were widely used on soldiers during WWI, as a
way to identify suitable army candidates, and measure the mental capacity of officers (286).
Naturally, college-relevant tests, such as the SAT, and later the ACT, were used to
measure college readiness and candidate quality among applicants. After all, “the SAT was
intended to assess aptitude for learning rather than the received knowledge tested on the College
Boards” (“The History of the SAT”). One important fact to recall is that:
“The development of the IQ test in 1905 would eventually cause the College Board to
rethink its approach to the evaluation of university applicants. World War I-era U.S.
Army experiments with the IQ test led directly to the creation of the SAT” (1).
This means that the same criticisms that apply to other IQ testing of the time can also be applied
to these tests (given that they use similar methodology and have their basis in the same army
experiments).
Critical Mass(es)
For starters, it’s worth noting that the army testing was abandoned almost immediately
after World War I, due in part to the nefarious roots of said programs. As
Carson writes:
“it is important to note the invisible hands of American Progressivism and American
eugenics throughout this story. For the army even to have entertained the notion of…” (Carson).
The adoption of army intelligence testing did, in fact, coincide with the rise and fall of certain
movements. “Although social Darwinism was highly influential at the beginning of the 20th
century, it rapidly lost popularity and support after World War I (1914-1918)” (Bannister 1).
The ties of intelligence testing to eugenics-related ideology led to its quick downfall.
While eugenicist ideology has long since lost its hold on the practice, IQ testing has
come under fire in the modern day for consistent disparities in outcomes across races. In a
meta-analysis of studies on intelligence testing, it was found that “reviews of the empirical literature in
the area focus on the Black-White differences, and the reviews conclude that the mean difference
in cognitive ability (g) is approximately 1 standard deviation” (Roth et al. 1). The problem of
minorities scoring lower on IQ tests has been evident for decades, but the issue came to a head in
1994, with the infamous release of The Bell Curve. Written by Charles Murray and Richard
Herrnstein, the book examined the variation in intelligence of the American populace, as well as
its effect on various components of life. It sparked a large
controversy, with criticism that it implicitly asserts minorities are of a lower innate intelligence.
This criticism persists today, as Eric Siegel writes in an article published this year:
“The Bell Curve” endorses prejudice by virtue of what it does not say. Nowhere does the
book address why it investigates racial differences in IQ. By never spelling out a
reason… the authors transmit an unspoken yet unequivocal conclusion: Race is a helpful…
While Murray’s observation that a black-white achievement gap exists is legitimate, there are
environmental reasons for that gap. In his book Race and Intelligence: Separating Science from
Myth, Jefferson Fish writes that, “Cultural content, values, and assumptions are an inherent part
of IQ tests. Formal schooling teaches people ways of thinking that are then measured by the
tests” (xii). In this way, innate intelligence is inseparable from education. Any intelligence test
that takes place after a certain age will necessarily have results influenced by environmental and
educational factors. A test of purely innate ability is effectively impossible to formulate, as it’s
simply not feasible to fully control for all exogenous factors.
So where do college assessments fit in? Well, the same discrepancies that are common
for IQ tests in general are found to hold for college tests (including the SAT) as well.
Recall that the original intent of the SAT’s creators was to form a test that would be able to
quantify intelligence, with the theory being that a higher measured score would correspond to
better performance in-university. But if the SAT is similarly unable to account for external
variables, then this use is nonsensical: how can we employ an estimator known to be biased for
purposes of trying to discern natural intelligence? This problem pointed to a fundamental flaw in
SAT testing which administrators had pretty much no choice but to address.
In response to this argument, the rhetoric of the SAT has changed significantly. The
College Board's own website now reads that its test is “focused on the skills and knowledge at
the heart of education,” measuring, “what you learn in high school,” and, “what you need to
succeed in college” (“Inside the Test”). The change in language implies that the newest version
of the test does not attempt to measure “natural” intelligence, but rather, skills and knowledge
that are generally helpful for college curricula. This change in language is in line with a
suggestion made by Christopher Jencks and Meredith Phillips in their book, The Black-White
Test Score Gap, in which they write, “testers should be able to rebut most charges of cultural
bias by labelling their products more accurately” (59). The idea of flawed labelling changes the
main point of criticism for tests like the SAT, by reframing the argument altogether. The new
language asserts that the tests are not psychometrically flawed; rather, their validity is simply
limited to measuring a specific skill, and not some greater innate quality.
Of course, regardless of how you define the SAT’s purpose, the actual scores matter far
more to those individuals taking it. In this respect, the test’s racial bias is harmful to minority
applicants. After all, lower scores mean lower chances for college admissions. This bias presents
itself in two ways: internal problems with the test, and larger, societal flaws. To the former,
certain portions of the SAT have been found to unfairly discriminate against certain races. For
instance, the previous versions of the “writing” portion had a vocabulary section, in which
participants were asked to identify the meaning of relatively obscure words. In an analysis of
questions which exhibited differential item functioning (meaning that they showed different
outcomes across racial groups among test takers of otherwise similar ability), it was found that:
“Some of the easier verbal questions… favored white students. Some of the most difficult
verbal questions... favored black students… easier questions are likely reflected in the
cultural expressions that are used commonly in the dominant (white) society, so white
students have an edge… the more difficult words are more likely to be learned, not just…” (Jaschik 1).
Recent redesigns of the test have attempted to rectify this issue, but there’s no evidence that the
underlying disparity has been eliminated.
To the latter case, studies have found that, “large gaps reflect the inequities in American
society -- since black students are less likely than white students to attend well-financed,
generously-staffed elementary and secondary schools, their scores lag” (1). But even if this fact
removes some of the blame from the test itself, it doesn’t really help solve the larger issue. If
minority candidates are being adversely affected by environmental factors which limit test
performance, the practical outcome is that they’re still at a disadvantage in the applications
process. To fix this, some universities have employed policies to “even things out.”
Specifically…
The NCSL defines affirmative action as policies “in which an institution or organization
actively engages in efforts to improve opportunities for historically excluded groups in American
society.” The most common form this takes is the lowering of specified admissions standards
when considering minority candidates. In theory, this makes perfect sense: if races are
essentially equal in terms of innate intelligence and exhibit similar capacities for learning, then
all affirmative action does is correct for a fundamental error in testing. In fact, the
widespread adoption of affirmative action could be seen as a paradigm shift in and of itself. That
same article explains that since 1965, when President Lyndon Johnson signed an executive order
requiring affirmative action, more and more colleges have begun to use it, leading to increased
minority enrollment over time.
Of course, not everyone benefits from affirmative action. In a book called No Longer
Separate, Not Yet Equal: Race and Class in Elite College Admission and Campus Life, author
Thomas Espenshade presents data stating that Asian applicants are effectively penalized 140
points on the SAT relative to white applicants, and 3.4 points on the ACT (qtd. in Jaschik 1).
Using whites as a “baseline,” this implies that Asians are actively discriminated against in the
college admissions process. This problem is the root of an ongoing “lawsuit accusing Harvard of
discriminating against Asian-Americans by imposing a penalty for their high achievement and
giving preferences to other racial minorities” (Hartocollis and Saul 1). This throws the purported
“fairness” of affirmative action policies into question: while certain groups may be put in a
better position, others are actively disadvantaged.
In trying to quantify something as abstract and nebulous as intelligence, one runs into the
fundamental issue of either trying to account for various extraneous factors, or simply ignoring
them in testing and acknowledging their presence after the fact. In either case, whatever
measurement is reached does not accurately represent any semblance of innate human
intelligence. Interestingly enough, the solution to the problems posed by intelligence testing is
simple: ignore them. By disregarding intelligence tests, we can easily curb their use. And this has
happened: since the court cases of the ’70s, traditional IQ tests have rarely been seen in schools.
Unfortunately, the college dilemma is not so simple. While schools have shifted away
from using SAT scores as a primary metric for acceptance, the benefits of a standardized
assessment are almost too good to give up. After all, a 1600 on the SAT in the wealthiest
Massachusetts prep school is the same 1600 in a poor urban school in Chicago. By contrast,
other parameters for applicant evaluations (transcript, GPA, extracurriculars, achievements) vary
significantly by school. In a sense, these other measurements could be seen as generally more
discriminatory than the SAT itself, specifically because they put poorer (and, by extension,
minority) students at a greater disadvantage. Regardless, it’s unlikely that college assessments
will go anywhere in the near future, even if the emphasis on them gradually diminishes. That
being said, things could certainly be worse: going by Ethan Hawke movies, maybe it’s better to
be judged by a test score than by your DNA.
Boake, Corwin. “From the Binet-Simon to the Wechsler-Bellevue: Tracing the History of
Intelligence Testing.” Journal of Clinical and Experimental Neuropsychology (Neuropsychology,
Development and Cognition: Section A), vol. 24, no. 3, Jan. 2002, pp. 383–405,
doi:10.1076/jcen.24.3.383.981.
Cherry, Kendra. “History of Intelligence Testing.” Verywell,
www.verywell.com/history-of-intelligence-testing-2795581.
“Brief History of the SAT.” Frontline: Secrets of the SAT, PBS,
www.pbs.org/wgbh/pages/frontline/shows/sats/where/history.html.
Lawrence, Ida, et al. “A Historical Perspective on the Content of the SAT.” The College Board,
Oct. 2003, www.ets.org/Media/Research/pdf/RR-03-10-Lawrence.pdf.
“Howard Gardner’s Theory of Multiple Intelligences.” Northern Illinois University, Faculty
Development and Instructional Design Center,
www.niu.edu/facdev/_pdf/guide/learning/howard_gardner_theory_multiple_intelligences.pdf.
Bannister, Robert. “Social Darwinism.” Microsoft Encarta Online Encyclopedia,
autocww.colorado.edu/~toldy2/E64ContentFiles/SociologyAndReform/SocialDarwinism.html.
Carson, John. “Army Alpha, Army Brass, and the Search for Army Intelligence.” Isis, vol. 84,
no. 2, 1993, pp. 278–309.
“The History of the SAT.” Manhattan Review: Test Prep for GMAT, GRE, LSAT, SAT, ACT, TOEFL.
Fish, Jefferson M. Race and Intelligence: Separating Science from Myth. Lawrence Erlbaum
Associates, 2002,
ezaccess.libraries.psu.edu/login?url=http://search.ebscohost.com/login.aspx?direct=true&db=nle
bk&AN=63483&site=ehost-live&scope=site.
“Inside the Test.” SAT Suite of Assessments, College Board, 31 Jan. 2017,
collegereadiness.collegeboard.org/sat/inside-the-test.
Jencks, Christopher, and Meredith Phillips. The Black-White Test Score Gap. Brookings
Institution, 1998.
Roth, Philip L., et al. “Ethnic Group Differences In Cognitive Ability In Employment And
Educational Settings: A Meta-Analysis.” Personnel Psychology, vol. 54, no. 2, 2001, pp. 297–
330., doi:10.1111/j.1744-6570.2001.tb00094.x.
Siegel, Eric. “The Real Problem with Charles Murray and ‘The Bell Curve.’” Scientific
American, 2017,
blogs.scientificamerican.com/voices/the-real-problem-with-charles-murray-and-the-bell-curve/.
Jaschik, Scott. “New Evidence of Racial Bias on SAT.” Inside Higher Ed, 21 June 2010.
“Affirmative Action: Overview.” National Conference of State Legislatures,
www.ncsl.org/research/education/affirmative-action-overview.aspx.
Jaschik, Scott. “A Look at the Data and Arguments about Asian-Americans.” Inside Higher Ed,
7 Aug. 2017, www.insidehighered.com/admissions/article/2017/08/07/look-data-and-arguments-about-asian-
americans-and-admissions-elite.
Hartocollis, Anemona, and Stephanie Saul. “Affirmative Action Battle Has a New Focus: Asian-
Americans.” The New York Times, The New York Times, 2 Aug. 2017,
www.nytimes.com/2017/08/02/us/affirmative-action-battle-has-a-new-focus-asian-
americans.html.