ABSTRACT. If a parsimony criterion may be used to choose between theories that make different predictions, may the same criterion be used to choose between theories that are predictively equivalent? The work of the statistician H. Akaike (1973) is discussed in connection with this question. The results are applied to two examples in which parsimony has been invoked to choose between philosophical theories - Shoemaker's (1969) discussion of the possibility of time without change and the discussion by Smart (1959) and Brandt and Kim (1967) of mind/body dualism and the identity theory.
Ockham's razor - [...] solve scientific and philosophical [...] arguments [...] two applications of the principle related? This [...] is related (perhaps imperfectly) to a second distinction; sometimes the principle of parsimony is used to choose between theories that make different predictions; sometimes it is used to discriminate between theories that are predictively equivalent. Does the rationale [...] provide [...] rationale for the other?
1. REICHENBACH'S THESIS
In Experience and Prediction, Hans Reichenbach argues that a difference in simplicity among competing hypotheses can have two quite different sorts of significance. When two theories fit the available data equally well
ELLIOTT SOBER
substance
theories
of what
are related
claims
one
can be
incompatible
even if we
with
and Prediction
about predictive
that predictively
as a critique
equivalence
equivalent
to set this idea to
want
of simplicity
remains.
do
so, Reichenbach
The
one circumstance,
but not in the other.
To evaluate this thesis, we first would have to understand what justifies the use of a simplicity criterion in the case of predictively non-equivalent theories. We then would have to determine whether that rationale transfers to the case of theories
view
to draw
short-circuits
the problem;
it does
not solve
it.
a way
approach to Reichenbach's
that is more subtle. A number
when
theories
are
and why
simplicity
are not empirically
then will
Only
equivalent.
to see if an inferential principle
that makes good sense in one
empirically
is relevant
when
we be able
theories
leads to nonsense
circumstance
Reichenbach
in another.
his thesis
about
the principle
of
simplicity.
[...] simplicity to data sets of ever increasing size will eventually converge on the truth, if there is a truth on which inference could converge. Reichenbach illustrated this general line of argument by discussing the curve-fitting problem; the accompanying figure comes from page 375 of Experience and Prediction. His [...] this figure is preferable over the curve that is more complex. By simplicity, Reichenbach meant a curve obtained by connecting the data points by straight lines and then smoothing the curve to remove discontinuities.

PARSIMONY AND PREDICTIVE EQUIVALENCE

Figure 1. Reichenbach's (1938) illustration of the curve-fitting problem.
Reichenbach
this method
more
and more
that perfectly
there be).
The standard objection to this justification is that other principles besides the principle of simplicity converge on the truth in the infinite limit. A procedure that introduces crazy bumps into the curves it postulates, but which diminishes their magnitude as the size of the data grows, will agree with the principle of simplicity when the data set is infinite. However, for finite data sets, a procedure of this sort will disagree with the principle of simplicity as to which curve is best. In short, convergence in the limit is not a sufficient condition for justifying the principle (Hacking 1965).
There is a second objection to Reichenbach's argument. It is possible to show that quite sensible inference procedures sometimes violate the requirement of convergence in the limit.2 A method is convergent only if the method is certain to yield the truth when applied to an infinite data set. If we abandon the demand for certainty in the face of finite data, why should we impose it in the hypothetical circumstance in which
Reichenbach
[...] point (x, yc) on the curve; then square this distance and sum the squared distances for the entire data set. The fact that a curve's SOS value is greater than zero hardly disqualifies it from scientific consideration.
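The sum-of-squares (SOS) measure just described can be sketched in a few lines. This is only an illustration: the function name `sum_of_squares`, the sample data, and the particular line are assumptions made for the example, not anything taken from the text.

```python
def sum_of_squares(curve, data):
    """Return the SOS of `curve` with respect to `data`.

    curve: a function mapping x to the curve's value yc at x
    data:  an iterable of observed (x, y) pairs
    """
    # For each observation, take the vertical distance to the curve,
    # square it, and add up the squares over the whole data set.
    return sum((y - curve(x)) ** 2 for x, y in data)


# Hypothetical data: three observations that are nearly, but not exactly, collinear.
data = [(0.0, 1.0), (1.0, 3.5), (2.0, 5.0)]


def line(x):
    """A hypothetical straight line, y = 1 + 2x."""
    return 1.0 + 2.0 * x


# The line misses the middle point, so its SOS is greater than zero.
print(sum_of_squares(line, data))
```

A curve with a nonzero SOS is still a perfectly respectable candidate; the score only measures how closely it fits the data at hand.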
That Reichenbach's argument ignores the impact of error is no idle criticism; the reason is that simplicity and goodness-of-fit are usually in conflict. A simple curve will usually fail to pass through the data points exactly; a sufficiently complex curve
means
assuming
are
of the problem
glides over this matter by
selects curves whose SOS values
of simplicity"
treatment
one? Reichenbach's
zero.
evaluate
bach's
mere
[...] simplicity is relevant to choosing between empirically nonequivalent theories, but it is certainly a very central case.3 We will see that this treatment of the role of simplicity considerations in the curve-fitting problem provides no rationale whatever for choosing between theories [...] equivalent. This doesn't decisively prove [...] for nothing in the case of predictively equivalent theories. However, it does lend support to that epistemological conclusion. To use the principle of simplicity in one context because it makes good sense in the other is to commit an epistemological equivocation.
Besides discussing Reichenbach's thesis, I also want to consider the role of parsimony considerations in philosophical theorizing. The idea has
gained
currency
of parsimony
can be invoked
to evaluate
raises
the question
of whether
use of a parsimony
philosophical
to the use made of that principle
in
[...] philosophical [...] why [...] argu[...] contexts.

2. AKAIKE'S THEOREM

The statistician H. Akaike and his school have developed a set of ideas [...] et al. 1986, Forster and Sober [...] a family of curves [...] may be [...]

(LIN) y = a + bx.
In this equation, a and b are adjustable parameters. Once values are fixed for these parameters (i.e., once they are adjusted), a specific straight line is obtained.
Scientists use families of curves to predict new data from old data. The process comes [...] best-fitting member [...] smallest SOS score). Then, [...] to predict what new data will [...] is how well the curve in the family that best fits the old data will do in fitting the new data. A family might do well in this two stage prediction task on one occasion, but not so well on another. Intuitively speaking, the predictive accuracy of a family is how well it would perform on average, were this two step process repeated again and again.
To make this idea more precise, [...] to fix ideas, imagine that x is the temperature of the gas inside a kettle and y is the pressure that the gas exerts on the sides of that rigid chamber. We heat the kettle to different temperatures and observe what pressure the kettle then experiences. We thereby obtain several observations, each of which can be represented as a pair of numbers - a point (x, y) in the x-y plane.
[...] also influences the observed value obtained for y. For example, even if the true curve happens to be a straight line, the data points will almost certainly fail to be exactly collinear. The reason is that observation is always subject to error, at least to some degree. In the kettle example, there is some true relation between temperature and pressure, but the thermometer and the pressure gauge don't always report values that are perfectly accurate. The (x,y) values that comprise the data don't necessarily represent the true pressure associated with a given temperature value; they represent the observed pressure gauge reading associated with a given thermometer reading.
Given [...]

(PAR) y = a + bx + cx^2,

the family of parabolic curves? Because (LIN) is a special case of (PAR) (obtained by setting c = 0), we know in advance that (PAR) will fit the data at least as well as (LIN) will. However, this does not guarantee that (PAR) will do a better job at predicting new data. If the true relation of temperature and pressure is linear, (PAR) will probably do worse than (LIN) in this prediction task. This is because (PAR) will "over-fit" the data. (PAR) will interpret the data's departure from linearity as an indication that the true relation of x and y is genuinely nonlinear; (LIN), on the other hand, will interpret these deviations as due to error. Over-fitting, so to speak, is the mistake of confusing signal and noise. How might this mistake be avoided?
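The contrast between fitting old data and predicting new data can be simulated. The sketch below is illustrative only: it assumes a hypothetical true line y = 2 + 0.5x with normally distributed error, ten observations per data set, and a hand-written least-squares routine (`fit_polynomial`); none of these specifics comes from the text.

```python
import random


def fit_polynomial(data, degree):
    """Least-squares polynomial fit; returns coefficients, lowest power first."""
    n = degree + 1
    # Normal equations A c = b, with A[i][j] = sum x^(i+j) and b[i] = sum y * x^i.
    A = [[sum(x ** (i + j) for x, _ in data) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in data) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / A[i][i]
    return coeffs


def sos(coeffs, data):
    """Sum of squared vertical distances between the data and the fitted curve."""
    return sum((y - sum(c * x ** k for k, c in enumerate(coeffs))) ** 2 for x, y in data)


def sample(n, rng, noise=1.0):
    """Draw n noisy observations from the assumed true line y = 2 + 0.5x."""
    return [(x, 2 + 0.5 * x + rng.gauss(0, noise))
            for x in (rng.uniform(0, 10) for _ in range(n))]


rng = random.Random(0)
trials = 2000
fit_lin = fit_par = new_lin = new_par = 0.0
for _ in range(trials):
    old, new = sample(10, rng), sample(10, rng)
    lin = fit_polynomial(old, 1)   # best-fitting member of (LIN)
    par = fit_polynomial(old, 2)   # best-fitting member of (PAR)
    fit_lin += sos(lin, old)
    fit_par += sos(par, old)
    new_lin += sos(lin, new)
    new_par += sos(par, new)

print("average SOS on old data (LIN, PAR):", fit_lin / trials, fit_par / trials)
print("average SOS on new data (LIN, PAR):", new_lin / trials, new_par / trials)
```

On the old data (PAR) can never do worse than (LIN), because the families are nested. On new data, when the true relation really is linear, the extra parameter tends to cost (PAR) accuracy on average, which is the over-fitting effect described above.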
Let us pose this problem more generally. As noted earlier, given a body of data, any desired level of goodness-of-fit may be obtained by constructing a curve that is sufficiently complex. Simpler hypotheses often do worse at fitting the data at hand, but do a better job of predicting new data. The predictive accuracy of a family of curves apparently is influenced by how simple it is (i.e., by how many adjustable parameters it contains). The question is whether this is just a brute fact or can be understood in some general and mathematical way.
[...] of curves. [...] respect to data set D:

Predictive inaccuracy of family F =df Average-SOS[Best-fit(F, D1), D2]
When
We
predictive
generally.
[...] that errors are symmetrically distributed around a curve, with large errors being less probable than small ones, the SOS value of the best fitting curve in a family F has a special meaning. The member of F that has
Fisher
Predictive accuracy is obviously a desirable feature of families of curves; we want to use families that, when fitted to old data, will do a good job at predicting new data. However, for all that, predictive accuracy seems to be epistemologically inaccessible. It seems that we can't tell, [...] data.
Akaike's remarkable theorem shows that predictive accuracy is, in fact, epistemologically accessible. The theorem says that

An unbiased estimate of the predictive inaccuracy of family F, given data set D, is provided by the quantity

SOS[Best-fit(F, D)] + 2kσ² + constant.

Here k is the number of adjustable parameters [...] - the variance [...] degree of dispersion [...] observations [...] to have. The constant in Akaike's theorem disappears when [...] ignored. Notice [...] the conclusion [...] member may fit the available data poorly (i.e., have a high SOS score). Second, the family may have a large number of adjustable parameters. This second term in Akaike's theorem gives simplicity its due; the complexity of a family is measured by how many adjustable parameters it contains.
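The quantity in the theorem, SOS plus twice the number of adjustable parameters times the error variance, is simple enough to compute directly. The sketch below drops the additive constant, since it is the same for every family being compared; the function name `akaike_estimate` and all the numbers are hypothetical.

```python
def akaike_estimate(best_fit_sos, k, sigma2):
    """Estimated predictive inaccuracy, up to the additive constant.

    best_fit_sos: SOS of the family's best-fitting member on the data at hand
    k:            number of adjustable parameters in the family
    sigma2:       error variance (assumed known for this illustration)
    """
    return best_fit_sos + 2 * k * sigma2


sigma2 = 1.0  # assumed error variance

# Hypothetical scores: (PAR) fits the old data slightly better than (LIN).
lin = akaike_estimate(best_fit_sos=8.4, k=2, sigma2=sigma2)  # y = a + bx
par = akaike_estimate(best_fit_sos=7.9, k=3, sigma2=sigma2)  # y = a + bx + cx^2

# The family with the lower score is estimated to be more predictively accurate.
preferred = "LIN" if lin < par else "PAR"
print(lin, par, preferred)
```

With these made-up numbers the half-point improvement in fit does not pay for the extra parameter: each additional parameter has to lower the SOS by more than 2σ² before the more complex family comes out ahead.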
It is important to see that the number of adjustable parameters is not a syntactic feature of an equation. Although the equations "y = ax + bx" and "y = ax + bz" may each seem to contain two adjustable parameters (a and b), this is not so. The former equation can be reparameterized; let a' = a+b, in which case the first equation can be restated as "y = a'x". For this reason, the first equation in fact contains one adjustable parameter, while the second contains two. If you like, think of the number of parameters in a family as the number of quantities whose values need to be fixed for the family to make predictions about data (given standard assumptions about error).
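The reparameterization point can be checked numerically. In this small sketch (function names and parameter values are illustrative assumptions), two different settings of a and b that share the same sum yield the very same curve in the first family but different predictions in the second.

```python
def family_one(a, b, x):
    """y = ax + bx: both coefficients multiply the same variable."""
    return a * x + b * x


def family_two(a, b, x, z):
    """y = ax + bz: the coefficients multiply different variables."""
    return a * x + b * z


xs = [0.0, 1.0, 2.5, -3.0]

# Two different (a, b) settings with the same sum a + b give the same curve,
# so data can only ever fix the single quantity a' = a + b.
same_curve = all(family_one(1.0, 2.0, x) == family_one(2.5, 0.5, x) for x in xs)

# In y = ax + bz the same two settings disagree once x and z differ,
# so a and b are two separately adjustable quantities.
differs = family_two(1.0, 2.0, x=1.0, z=2.0) != family_two(2.5, 0.5, x=1.0, z=2.0)

print(same_curve, differs)
```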
The second term in Akaike's theorem, besides adverting to the number of adjustable parameters, also mentions the error variance. When this variance is large, this second [...] noisier.
Let us apply Akaike's theorem to (LIN) and (PAR). Suppose the data at hand fall fairly tightly around a straight line. In this case, the best fitting straight line will be very close to the best fitting parabola. So Best-fit (LIN, D) and Best-fit (PAR, D) will have almost the same SOS values. In this circumstance, Akaike's theorem says that the family with the smaller number of adjustable parameters is the one we should estimate to be more predictively accurate. A simpler family is preferable if it fits the data about as well as a more complex family. Akaike's theorem describes how much improvement in goodness-of-fit a more complicated family must provide for it to make sense to prefer the complex family.5
Akaike's
ever
theorem
it is, remains
of predictive
in the definition
hood function is "asymptotically
so that the
As noted before, Akaike's theorem identifies an unbiased estimate of a family's predictive inaccuracy.6 This leaves open the possibility that individual estimates may stray quite considerably from this true value. The theorem does not say whether there are other unbiased estimators. In addition, there are other desirable statistical properties of an estimator besides unbiasedness, so there is a genuine question as to how various optimality properties ought to be traded off against each other. However, the fact that important details remain unsettled should not obscure the fact that Akaike's approach has made significant headway in the task of explaining the role of simplicity in hypothesis evaluation.
Although Akaike's theorem applies directly to curve-fitting, the illumination it provides is more general. The theorem explains why a unified theory is sometimes preferable to a disunified theory; it also shows why models that postulate fewer causes are sometimes preferable to models that postulate more (Forster and Sober 1994). Nor should the surface appearance of the curve-fitting problem lead one to think that the Akaike [...] format [...] in the Akaike framework. Akaike's theorem addresses the general problem of "model selection", meaning the problem of evaluating propositions that contain adjustable parameters. The focus on the predictive accuracy of models in no way limits the theories that can be considered to ones expressed in some sort of "observation language". The quantities [...] violates this requirement is called "unidentifiable".
So as to give the reader more of a feel for how the Akaike framework works, let's now turn to two famous controversies in the history of physics.
The Copernican and Ptolemaic systems fit the observations then available concerning the relative positions of heavenly bodies about equally well. However, the Ptolemaic model included a far larger number of adjustable parameters; these represent the "epicycles" that made Ptolemaic astronomy the very paradigm of an unparsimonious theory. Although many philosophers (notably Kuhn 1957, p. 181) have claimed that the virtues of the Copernican hypothesis are purely aesthetic, the Akaike framework offers a much more down-to-earth explanation of why the Copernican model is preferable; its estimated predictive accuracy is much higher.
Contrast this example with the controversy that arose in connection with Newton's postulate of absolute space. Leibniz and many others took this to be a defective element in Newton's model; absolute space seems to be a perfect example of an unparsimonious postulate. However, this example is not analyzable in the Akaike framework. There is no way to estimate the value of a parameter that represents the velocity of a physical object relative to absolute space. It isn't that the Akaike framework applies and tells us what is wrong with Newton's model; the framework simply does not apply at all, because the Newtonian model is not identifiable.
At first glance, it might seem that parsimony considerations explain why Copernican astronomy should be preferred over the Ptolemaic system in the same sense that parsimony considerations explain why a physics that does without absolute space is better than one that includes [...] Akaike approach strongly suggests [...] equivocation.
3.
Since I'll be using the concept of "predictive equivalence" throughout this paper, I should say something about what it means. This is a big job, not least because it requires a treatment of the concept of observation (on which see Sober 1990a, 1993b, 1993c). However, a few remarks are worth making here, incomplete
There is first of all [...] equivalence [...] assumptions [...] Typically, it isn't theories by themselves that make predictions, but theories when supplemented with auxiliary assumptions. This means that the primary concept of predictive equivalence should not be a two-place equivalence relation, but a three-place relation of equivalence relative to a set of antecedently accepted background assumptions.
Far more important, and less often recognized, is the idea that theories, even when supplemented with background assumptions, do not have deductive implications about observations; rather, they assign probabilities to different possible observational outcomes. This means that we should not understand predictive equivalence in terms of what theories entail about observations. Rather, predictive equivalence should be understood [...] When [...] theories as making only probabilistic contact with observations (Forster 1988).
Although my discussion in what follows will require the distinction between what is observed and what is not observed but only inferred, the concept of observation I'll deploy is not a rigid or absolute one. Whether a given statement is an observation report, or formulates a conjecture under test, often depends on the problem at hand. In addition, observation statements, as I'll use the term, often employ theoretical concepts and their confirmation and disconfirmation often depends on the use of instrumentation and background theories. Roughly, the idea is this: When the question is raised as to which of two theories is more plausible, an "observation statement" will describe a detectable feature of the environment about which it is possible to make a reasonable judgment without already having formed an opinion about [...] theory is true. It is in this sense that observations are relatively theory-neutral, not absolutely so (Sober 1990).
about the "observa?
Despite widespread
among philosophers
skepticism
which
tion/theoretical
of observation
practices make
able to provide
theorem
when
is to estimate
we
apply
[...] to be predictively equivalent? There are two possibilities. The first is that their Akaike estimates of predictive accuracy turn out to be the same; the second is that they do not. In the first instance, we find that the estimation procedure merely reinforces what we already knew; in the second, we [...] the outset [...]
of a simplicity
applications
the Akaike
without
others;
are all of a piece.
criterion
in philosophy
it is quite
framework,
4.
Can there be time without change? That is, can time pass without any changes occurring, other than the mere lapsing of time itself? If the external world is frozen for a while and if the thought processes of perceivers are likewise frozen, none of them will experience the passage of time during [...] In addition, one might imagine that a moment of [...] arrest will leave no trace that later perceivers will be able to detect. This suggests that universal freezes are not the sorts of events for which observational evidence could be mustered. And [...] conclude that it could never be reasonable [...] occur.
Sydney Shoemaker (1969) has invented a clever example that seems to show that this line of reasoning is mistaken. He asks us to imagine a
and Z
At
a freeze
planet
(1) [Table: freezes (F) of planets X, Y, and Z in the years up through 59.]

What [...] planets observe is that each planet has its own fixed periodicity; [X] freezes every 3 years, [Y] every 4, and Z every 5. This pattern holds in the data we see so far [...] through year 59.
the events
Although
recorded
ismore
happens subsequently
[...]ing what the people on the planets observe. They note that when their calendars read "60", no one is frozen. However, there is subsequently a shift in the timing of the regularities in play. This is how they record their observations:
(2) [Table: freezes (F) of planets X, Y, and Z in observable years 55 through 69.]

After the hiatus [...] around [...] again. They continue [...] at which point [...]

(3) [Table: freezes (F) of planets X, Y, and Z in observable years 115 through 123.]

[...] the 3, 4, and 5 year cycles begin again, but once again there is a hiatus after a certain number of repetitions. We may imagine that the inhabitants experience this pattern numerous times. They have lots of data.
If we take this data at face value, we will infer the following general pattern:
(NUF) For all observable years o,
Planet X freezes iff mod[(o + c)/3] = 0,
Planet Y freezes iff mod[(o + c)/4] = 0,
Planet Z freezes iff mod[(o + c)/5] = 0,
where c is the largest integer such that 59c < o.
its cycle
of periodic
freezes.
Shoemaker asks us to entertain an alternative theory concerning what is happening. Suppose there is a universal freeze that occurs right after the observable year 59 and that lasts a single year; suppose the same thing happens immediately after the observable year 118. These years in which universal freezes allegedly occur are "hidden", in the sense that no one [...] that the universal freezes are happening, nor could [...] rise to a new, [...] observable [...] supplemented by hidden [...]
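observable (- marks a hidden year):  1  2 ... 58 59  -  60 61 ... 117 118  -  119 120
augmented:                           1  2 ... 58 59 60 61 62 ... 118 119 120 121 122

Here is how observable years (o) and augmented years (a) are related; the value of a can be described as a function of o:

a = o for o ≤ 59,
a = o + 1 for 60 ≤ o ≤ 118,
a = o + 2 for 119 ≤ o ≤ 177, [...]

In other words, a = o + c, where c is the largest integer such that 59c < o.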
Note [...] year is numbered [...] isn't true. Given this explanation of what the augmented calendar amounts to, we can now state the universal freeze hypothesis in terms of it:

(UF) For all years a in the augmented calendar,
Planet X freezes iff mod(a/3) = 0,
Planet Y freezes iff mod(a/4) = 0,
Planet Z freezes iff mod(a/5) = 0.
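The relation between the two calendars, and the sense in which (NUF) and (UF) agree about everything observable, can be checked mechanically. The sketch assumes the reading on which an augmented year is a = o + c, with c the largest integer such that 59c < o, and on which planets X, Y, and Z freeze when the relevant year number is divisible by 3, 4, and 5; the function names are invented for the illustration.

```python
def hidden_years_before(o, period=59):
    """Largest integer c such that period * c < o, for an observable year o >= 1."""
    return (o - 1) // period


def augmented_year(o):
    """Translate observable year o into the augmented calendar: a = o + c."""
    return o + hidden_years_before(o)


PERIODS = (("X", 3), ("Y", 4), ("Z", 5))


def freezes_nuf(o):
    """(NUF): planets frozen in observable year o, stated in the observable calendar."""
    c = hidden_years_before(o)
    return {name for name, n in PERIODS if (o + c) % n == 0}


def freezes_uf(a):
    """(UF): planets frozen in augmented year a."""
    return {name for name, n in PERIODS if a % n == 0}


# The two hypotheses say the same thing about every observable year ...
agree = all(freezes_nuf(o) == freezes_uf(augmented_year(o)) for o in range(1, 1000))

# ... and the augmented years that match no observable year are exactly
# the ones in which (UF) says all three planets are frozen at once.
observable_images = {augmented_year(o) for o in range(1, 1000)}
hidden = [a for a in range(1, 200) if a not in observable_images]

print(agree, hidden, [sorted(freezes_uf(a)) for a in hidden])
```

On this reading the hypotheses differ only in what they say about years that no calendar kept by the inhabitants ever records.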
(UF) agrees with the observations recorded in data set (2), but views the observable calendar it uses as incomplete. (UF) supplements (2), arriving at the following pattern for what supposedly happens in the augmented calendar:
(2a) [Table: freezes (F) of planets X, Y, and Z in augmented years 55 through 65.]

Whereas (NUF) postulates [...] sequences containing [...] calendar, (UF) postulates uninterrupted fixed periods in the augmented calendar.8 In particular, (UF) can be viewed as a conjunction, one of whose conjuncts is (NUF) and the other is a postulate about hidden years:

(H) Immediately after each observable year o such that mod(o/59) = 0, there is a hidden year in which there is a universal freeze.

Since (UF) is equivalent to (NUF) & (H), we must be careful [...] that our [...] these two hypotheses.
are from which
hypotheses
families must
the families
Each
of
of these
contain
the data. At
can be obtained
member
of
3 + 2x
from
the family
adjustable parameter.
which family we should consider.
Two ideas should guide our choice, however.10 First, the context of inquiry tells us something about the families we should associate with (NUF) and (UF). In the Shoemaker problem, the families should endorse or deny the existence of universal freezes, it being left to the data to decide what specific patterns are asserted to obtain. The second piece of guidance is that it must be possible for the adjustable parameters to be estimated from the data. Let us begin by considering the following two families:
FAM(NUF) For all observable years o, [...] c is the largest integer such that c[LCM(x,y,z) - 1] < o.

FAM(UF) FAM(NUF) & Immediately [...] such that mod{o/[LCM(x, [...] hidden year in which [...]
listed.1
calendar
First, that is how the data are described, and parameters must be estimated from the data. Second, it is hard to see how (NUF) could even be described in the augmented calendar. (NUF) denies that there is such a thing as the hidden years postulated by the augmented calendar; this hypothesis cannot be described as denying that there are universal freezes in hidden years or as remaining agnostic about what happens during hidden years.
are obtained
[...] judgment, I concede, is an intuitive one. How, then, are we [...] for the fact that intuitions are misleading in this case?14
postulated
[...] of parameters in each family of hypotheses whose values must be estimated from the observations. This is why the calendar of observable years is fundamental. The observable calendar and the augmented calendar have quite different epistemological standings, and this asymmetry is reflected in the analysis. It isn't that Akaike pays no attention to theoretical simplicity; the point is that the simplicity of a theory must be judged by seeing how the theory makes contact with observations. One of the beauties of Shoemaker's example is that it illustrates an important difference between "simplicity of abstract pattern" (algorithmic simplicity) and the simplicity captured by the Akaike framework. Besides the fact that these two approaches yield different analyses of the problem at hand, there is an additional difference that is quite fundamental: Whereas the Akaike approach explains why simplicity (as measured by the number of adjustable parameters) is epis[...]
with
Akaike
should
framework
of their intuitions
ask
freeze
in this case
hypothesis
postulates
less parsimonious
than the
ontology,
[...] hypothesis. Why does ontology take precedence over pattern when the overall simplicity of the two theories is compared?15
Given that Akaike's theorem concerns the estimate of predictive accuracy, it is no surprise that the theorem should fail to distinguish between two predictively equivalent theories. It is nonetheless instructive to see
a simpler
alternative
abstract
pattern,
but a
5. THE MIND-BODY PROBLEM
Brandt and Kim (1967) go further; they maintain that dualism is quite consistent with the existence of perfect correlations between the mental [...] physical properties, we will begin with this idea, even though there is more to the theories than this.
Let's consider two properties, a physical property P and a mental property M. With dichotomous characters such as these, the standard measure of their degree of association is their covariance;

Cov(P, M) = Pr(P & M) - Pr(P) Pr(M).

If P occurs when and only when M does, their covariance will be positive and will [equal] Pr(P) Pr(-P), which [...] more extreme. Note that when properties are "perfectly correlated", their covariance isn't a constant, but varies as a function of Pr(P). On the other hand, if the occurrence of one property is probabilistically independent of the occurrence of the other, the covariance will be zero, independent of the value of Pr(P). It doesn't make much sense
of the mental [...] What does [...] the probabilities of these conjoint events? [...] are one and the same property, the identity theory asserts that Pr(P & -M) = Pr(-P & M) = 0, and that Pr(P & M) and Pr(-P & -M) may take any pair of values that sum to one. Thus, the identity theory endorses a model that contains a single adjustable parameter:

(Ident) Pr(P & M) = p and Pr(-P & -M) = 1 - p.

[...] now [...] three adjustable parameters:

(Dual) Pr(P & M) = p1, Pr(-P & M) = p2, Pr(P & -M) = p3, Pr(-P & -M) = 1 - p1 - p2 - p3.
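The covariance of two dichotomous properties, and the way the one-parameter and three-parameter models constrain it, can be illustrated with a short computation. This is a sketch under stated assumptions: the representation of a joint distribution as a dictionary, the function names, and the assignment of p2 and p3 to particular cells of (Dual) are choices made for the example.

```python
def covariance(joint):
    """Cov(P, M) = Pr(P & M) - Pr(P) Pr(M).

    joint maps (p_present, m_present) to a probability; the four values sum to one.
    """
    pr_p = joint[(True, True)] + joint[(True, False)]
    pr_m = joint[(True, True)] + joint[(False, True)]
    return joint[(True, True)] - pr_p * pr_m


def ident(p):
    """(Ident): one adjustable parameter; both mixed cells have probability zero."""
    return {(True, True): p, (True, False): 0.0,
            (False, True): 0.0, (False, False): 1 - p}


def dual(p1, p2, p3):
    """(Dual): three adjustable parameters; the fourth cell is fixed by normalization.

    Which mixed cell is called p2 and which p3 is an assumption of this sketch.
    """
    return {(True, True): p1, (False, True): p2,
            (True, False): p3, (False, False): 1 - p1 - p2 - p3}


# Perfect correlation: the covariance equals Pr(P) * Pr(-P), so it varies with Pr(P).
for p in (0.5, 0.25, 0.1):
    print(p, covariance(ident(p)), p * (1 - p))

# Independence is one special case available within (Dual): the covariance is zero.
pr_p, pr_m = 0.3, 0.6
independent = dual(pr_p * pr_m, (1 - pr_p) * pr_m, pr_p * (1 - pr_m))
print(covariance(independent))
```

Every distribution that (Ident) allows is also allowed by (Dual), with p2 = p3 = 0, which is why the two models differ in the number of parameters left for the data to fix.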
[...] imagine an experiment that assembles empirical frequencies from which maximum likelihood estimates of the parameters in (Ident) and (Dual) may be obtained. This experiment will mimic the structure of the simple inference problem concerning [...] that was discussed at the beginning
in a psychology
experiment
monitors
record when subjects
and
are
with
with
and pressure
the subjects
of time. The
This
[2-by-2 table: Subject says "ouch" (yes / no) against Meter says "c-fibre firing" (yes / no).]
If observational error is possible, then all four possible observations will occur at least sometimes, if the data set is sufficiently [...] characteristics [...] causes of observational [...] To describe [...]tions, we must supplement (Ident) [...]tional error. A reasonably general model [...] contain four parameters:
(Error)
Pr(Subject says "Ouch" | Pain) = e1
Pr(Subject says "Ouch" | No pain) = e2
Pr(Meter says "c-fibres firing" | c-fibres firing) = e3
Pr(Meter says "c-fibres firing" | No c-fibres firing) = e4
If we assume [...] firing"), conditional on the conjoint underlying states (±Pain [...] are firing), as products of the above probabilities and their complements.16
In the Akaike framework, error probabilities are either specified in advance of the experiment or are estimated from the data, in which case they represent further adjustable parameters in the models under evaluation. In the present experiment, only the former option is viable. The observations given in the previous 2x2 table tell us the four frequencies of [±Ouch & ±Meter says "c-fibres firing"]. These must sum to unity. This means that there are three independent frequencies [...] the observations. If we treat the parameters in (Error) as unknowns [...] be estimated from [...] we know that [...] of values to these parameters [...] the probabilities of the observations.
If this were all one could say about the occurrence of error in the experiment, then the Akaike framework would not underwrite the parsimony argument in favor of the identity theory that Smart, Brandt, and Kim put forward. However, it is not absurd [...] independent evidence that allows [...] (Error). Perhaps we have [...] conjoint frequencies associated with utterances of "ouches" and meter readings now can be used to find the maximum likelihood estimates of the parameters in (Ident) and (Dual).
Because [...] while [...] consider what [...] the identity [...] say about the explanation and causation of behavior. Dualists often maintain [...] mental properties, [...] but [...] in a way that the purely physical properties of organisms cannot. Even if the physical characteristics of an organism were fully specified, this would not provide a complete account of why the organism behaves as it does. Parallel remarks pertain to the issue of prediction; for the dualist, the physical traits of an organism provide, at best, an incomplete basis on which to predict what the organism will do. To be sure, it is possible to formulate dualism in such a way that mental properties have no causal efficacy or explanatory power of their own. But historically, a good part of the interest in dualism has centered on its claim concerning the irreducible causal and explanatory importance of the mental.17
If we understand dualism in this way, what are we to make of the identity theory? If a mental and a physical property are identical, then the causal efficacy of the one just is the causal efficacy of the other. I see nothing to fault in the idea that names for the same property are intersubstitutable in causal contexts, salva veritate. Explanation is perhaps a less straightforward matter, since explanation has to do with gains in understanding, and so may involve subjective elements.18 In any case, I'll set to one side the question of what the identity theory should say about explanation, and concentrate on what it says about causation.
Before addressing the identity theory and dualism directly, I want to make a general point: the Akaike framework explains why paucity of postulated causes enhances a theory's estimated predictive accuracy (Forster and Sober 1994).
The [...] example. Suppose B [...] causes. [...] dichotomous [...] first model [...] whether [...] causally [...] a difference [...] and M may be [...] in [...]

(One Cause)
Pr(B | P & M) = Pr(B | P & -M) = Pr(B | P) = a
Pr(B | -P & M) = Pr(B | -P & -M) = Pr(B | -P) = c

(Two Causes)
Pr(B | P & M) = a
Pr(B | P & -M) = b
Pr(B | -P & M) = c
Pr(B | -P & -M) = d
Notice [...] a 2-by-2 table in which cell entries indicate the fraction of individuals in each treatment who exhibit the effect B:
If we use the empirical frequencies (w, x, y, z) to estimate the parameters in the two models, it must emerge that the likeliest member of (Two Causes) will fit the data at least as well as the likeliest member of (One Cause). The two models will show the same degree of goodness-of-fit only when w = y and x = z; if the four frequencies differ even slightly, (Two Causes) will fit the data better.
The Akaike framework tells us how to evaluate these two families of models, i.e., how to bring together the conflicting considerations of simplicity and goodness-of-fit. If w and y are close, as are x and z, then the slightly better goodness-of-fit of (Two Causes) will not compensate for the fact that this model is less parsimonious. In this case, we choose the simpler model because it has a higher estimated predictive accuracy. However, if w, x, y and z are all very different, then we should sacrifice parsimony and prefer the more complex model.19
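The trade-off just described can be made concrete with made-up counts. The sketch below scores each model with the standard likelihood form of Akaike's criterion, -2 log L + 2k (lower is better), which is an assumption about how to instantiate the theorem for frequency data; the cell ordering, the counts, and the function names are likewise hypothetical.

```python
import math


def log_likelihood(successes, totals, rates):
    """Binomial log-likelihood of per-cell rates, dropping terms that don't depend on them."""
    ll = 0.0
    for s, n, r in zip(successes, totals, rates):
        if s > 0:
            ll += s * math.log(r)
        if n - s > 0:
            ll += (n - s) * math.log(1 - r)
    return ll


def compare(successes, totals):
    """Return (score for One Cause, score for Two Causes); lower is better.

    Cells are ordered (P & M, P & -M, -P & M, -P & -M); `successes` counts the
    individuals in each cell who exhibit B.
    """
    # (Two Causes): each cell gets its own rate, estimated by its own frequency.
    two_rates = [s / n for s, n in zip(successes, totals)]
    # (One Cause): M makes no difference, so cells with the same P-status are pooled.
    a = (successes[0] + successes[1]) / (totals[0] + totals[1])
    c = (successes[2] + successes[3]) / (totals[2] + totals[3])
    one_rates = [a, a, c, c]

    def score(rates, k):
        return -2 * log_likelihood(successes, totals, rates) + 2 * k

    return score(one_rates, 2), score(two_rates, 4)


totals = [100, 100, 100, 100]
close = compare([62, 60, 21, 19], totals)  # frequencies barely differ within a P-status
far = compare([80, 40, 30, 5], totals)     # M clearly makes a difference

print("close frequencies:", close)
print("very different frequencies:", far)
```

When the paired frequencies are close, the two extra parameters cost more than the small gain in fit is worth and (One Cause) gets the better score; when they are far apart, the improvement in fit outweighs the penalty and (Two Causes) is preferred.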
represented
by (One Cause),
family
(Ident-2)      Pr(B | P & M) = Pr(B | P) = Pr(B | M) = a,
               Pr(B | -P & -M) = Pr(B | -P) = Pr(B | -M) = d,
               and Pr(B | P & -M) and Pr(B | -P & M) are both not defined.
Note that (Ident-2) has two parameters whereas (Two Causes) has four.
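A minimal sketch of what this difference in parameter count does, assuming data in which P and M are perfectly correlated, so that only the P & M and -P & -M treatments are ever observed. On such data the best-fitting members of (Ident-2) and (Two Causes) assign the same likelihood, because the two extra parameters of (Two Causes) are not constrained by anything observed. The scoring rule (-2 log L + 2k), the counts, and the function names are assumptions made for the example.

```python
import math

def binomial_loglik(hits, trials, rate):
    """Log-likelihood of `hits` out of `trials` under `rate` (binomial coefficient omitted)."""
    loglik = 0.0
    if hits > 0:
        loglik += hits * math.log(rate)
    if trials - hits > 0:
        loglik += (trials - hits) * math.log(1.0 - rate)
    return loglik

def score_models(cells):
    """cells maps (has_P, has_M) -> (number exhibiting B, number observed).

    Returns AIC scores for (Ident-2), with k = 2, and (Two Causes), with k = 4.
    """
    mixed = [(True, False), (False, True)]
    if any(cells.get(key, (0, 0))[1] > 0 for key in mixed):
        # (Ident-2) leaves Pr(B | P & -M) and Pr(B | -P & M) undefined,
        # so it cannot be scored on data that contain such individuals.
        raise ValueError("data include a treatment that (Ident-2) does not allow")

    # Both models estimate a and d from the two observed treatments, so their
    # maximized log-likelihoods coincide on this kind of data.
    loglik = 0.0
    for key in [(True, True), (False, False)]:
        hits, trials = cells[key]
        loglik += binomial_loglik(hits, trials, hits / trials)

    return {"ident_2": -2.0 * loglik + 2.0 * 2,
            "two_causes": -2.0 * loglik + 2.0 * 4}

# Hypothetical counts: 40 of 50 individuals with P & M exhibit B,
# and 5 of 50 individuals with neither property do.
print(score_models({(True, True): (40, 50), (False, False): (5, 50)}))
```

Under these assumptions the two scores differ only by the penalty term, so the comparison is settled entirely by the number of adjustable parameters.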
I so far have provided
two treatments
The first has them postulating
different
which
mental
endorsing
and physical
models
different
of dualism
properties
of how mental
and physical
properties
confer
and their efficacies. The identity theory endorses a model with three adjustable parameters - two representing the efficacies of a single cause and one representing the probability of that cause [Pr(P & M)]. Dualism, on the other hand, puts forward a model with seven independent parameters - four for the efficacies of different combinations of causes [Pr(B | ±P & ±M)] and three for the probabilities with which these combinations occur. When these models fit the data about equally well, the Akaike framework favors the model associated with the identity theory.

Let us now return [...]
the identity theory. But they deny that this is so. Brandt and Kim suggest that the identity theory doesn't explain observations in the same sense that evolutionary theory and other scientific theories do. They conclude (pp. 533-[?]) that Smart's analogy between the identity theory and the theory of evolution "is pernicious in that it lends, or at least tends to lend, a false air of scientific respectability to what is essentially a philosophical and speculative interpretation" of perfect correlations between the mental and the physical.
bach's
ical" character
the identity theory provides no reason
is false.
theory is true and dualism
It should
be clear
dualism
make
6. CONCLUSION

I have defended quite different
assessments
of
examples
of time without
in this paper. Shoemaker's
discussion
change
and the Smart, Brandt and Kim discussion
of the mind/body
problem both
are
a
to
of
that
theories
be
present
supposed
predictively
pair
equivalent;
discussed
have
fitting problem [...] does [...] as a justification for using simplicity to discriminate between predictively equivalent theories. The second conclusion is that a difference in simplicity between predictively equivalent theories counts as a merely aesthetic or pragmatic consideration; it is not a ground for thinking that one theory is true and the other is false. I suppose it is possible to accept the first of these conclusions but not the second. I see no reason to do this, but those inclined to extract this lesson should do so with their eyes [...]
[...] belief is to be judged holistically by the twin criteria of conceptual economy and empirical adequacy (Quine 1953); to think that scientific propositions answer to one set of standards while philosophical propositions answer to another is to be trapped by an untenable dualism. Kuhn's (1970) work in the history of science has defended a similar picture. For Kuhn, science is unavoidably saturated with philosophical presuppositions; perhaps the principal impulse in Kuhn's view of science has been to downplay the
else.
Scientists
desert
This blurring of the boundary between science and philosophy has largely replaced the crisp contrasts posited by empiricism and positivism, but for reasons that bear rethinking.
In "Empiricism,
and Ontol?
Semantics,
In Experience
internal and external questions.
viewed
the Akaike
framework
as elements
framework,
that can be estimated
the fundamental
are known
to be false,
reasonably
accurate
predictions.
Although these elements in the Akaike approach do not accord well [...] said in favor of scientific realism, the approach is equally at odds with some forms of empiricism. There is no distinction between theories that are strictly about observables and theories that are not.
with much
of philosophy.
Philosophy, they believed, was mired [...] important [...] epistemological critique would allow us to see these problems for what they are. The subject of the present paper has been more limited, and my conclusion is accordingly more modest. Parsimony and simplicity considerations, when understood in terms of the Akaike framework, do not license the parsimony [...] that philosophers have sometimes constructed.
[...] of parsimony are always misguided; as we have seen, the identity theory can be defended over dualism in a way that fits fairly well into the Akaike framework. However, other appeals to parsimony [...] uses [...] are more alien to the Akaike [...] are more dissimilar [...]
NOTES
* I am grateful to Martin Barrett, Ellery Eells, Malcolm Forster, Gregory Mougin, and Denis Walsh for comments on earlier drafts. My thanks also to members of the philosophy departments at London School of Economics and Wayne State University for useful discussion.
1 Examples would include the constructive empiricism of Van Fraassen (1980), the contrastive empiricism of Sober (1990a, 1993a, 1993b), and the gentle empiricism of Earman (1993).
2 I argue this point in connection with maximum likelihood estimation in Sober (1988a, 1988b).
3 In Sober (1990b), I discuss examples in which differences in simplicity or parsimony reflect differences in likelihood or differences in prior probabilities. From a Bayesian point of view, these considerations must exhaust the relevance of simplicity. The approach to the curve-fitting problem I'll outline in the next section fits neatly into neither of these two Bayesian formats.
4 We assume here that old and new data sets contain the same number of observations, so that what is defined above is predictive accuracy with respect to data sets of size n. Without this restriction, the definition should be given in terms of predictive accuracy per datum.
5 Akaike's theorem also reflects the fact that the amount of data is relevant to deciding how much weight simplicity deserves. If there is a very slight parabolic bend in the data, it may make sense to favor (LIN) when the data set is relatively small; however, if the data set is quite large, (PAR) may be preferable. Since a curve's SOS almost inevitably goes up as the number of data points is increased, this means that the Akaike estimate of predictive inaccuracy is determined more and more by goodness-of-fit considerations, and less and less by simplicity, as the data set increases in size.
6
n ounces. A
To understand
consideran
that weighs
the idea of an unbiased
estimate,
object
an
of
if
estimate
the
balance
unbiased
repeatedly
spring
object's weight
weighing
provides
make
of n ounces;
he goes
of what
these
Whether
events
the physical
are consistent
are
laws
with
how we would
epistemically
8
Scientists
about
theorize
them.
not whether
possible,
to express
often prefer
that permit
the
is an
physics
at hand. In effect,
current
the calendar
might
prefer
For
(UF')
all augmented
If planet
their
and
is
freeze
possible.
freezes
The
fact
years
a,
in year
generalizations
that X\
first
in a format
that doesn't
a, then
it freezes
freeze
in year
a +
3,
4,
5.
If planet
freezes
in year
a, then
it freezes
in year
a+
If planet
freezes
in year
a, then
it freezes
in year
a+
in year
o, then
it freezes
every
in which
in year
case X
o+
3 years,
in year
case Y
o+
4, unless
next
freezes
For
(NUF')
it is physically
a universal
is whether
question
depend
to occur
in a year
happened
the number
the point. We
3 is besides
of observed
years
assigns
to consider
of the two hypotheses:
free" representations
"coordinate
on a unit of measure.
essentially
to which
Shoemaker's
of
may
interesting
Shoemaker
therefore
on to describe.
measurements
individual
all observed
years
If planet X freezes
in a series of freezes
If planet Y freezes
in a series of freezes
If planet Z freezes
in a series of freezes
o,
in year
o, then
it freezes
every
4 years,
in which
in year
o, then
it freezes
every
5 years,
in which
3, unless
next
freezes
o+
in year
5, unless
case Z next freezes
o is the
19th
o+
in year
o is the
in year
in year
14th
o+
o is the
5.
7.
11th
o+
9.
The conclusion I'll reach concerning (UF) and (NUF) would not be affected if these formulations were used instead.
9 I assume that equivalent hypotheses cannot differ in their epistemic status. This is an assumption common to all theories in which probability is the basic confirmation concept.
10 I omit here discussion of the important "error theorem" discussed in Forster and Sober (1994). Estimating the predictive accuracy of lower dimensional families is often more subject to error than estimating the accuracy of families with higher dimensionality.
11 In setting up this problem, there is no reason to assume in advance that the periodicities must be integer valued. So let's allow that they may have up to, say, a hundred decimal places.
12 In describing Shoemaker's example, I neglected to say anything about whether observations are subject to error. It would be easy to introduce this idea; just assume that inhabitants of one planet can make mistakes in observing what happens on others. However, the conclusion that the two families have the same number of parameters makes this detail superfluous. Regardless of whether there is error or not, the two families must have the same estimate of predictive accuracy.
13 Of course, it is easy enough to associate (UF) with a family that has more adjustable parameters than the one associated with (NUF). What I doubt is that there is any reason to associate (UF) with a family that has fewer parameters than the one we should associate with (NUF).
14 One possibility [...] thinking about a quite different inference problem. Let us forget about the epistemological consequences of local and universal freezes and imagine that inhabitants of the planets see that various planets have property "F" (e.g., the occurrence of snow storms) periodically in years 1-59. Given the data presented in (1), how should these people choose between the following hypotheses?

(H1) [...]
(H2) [...]

Notice that neither of these hypotheses mentions the calendar of augmented years. Presumably, the "years" they talk about concern what is recorded in the observed calendar. If so, (H1) and (H2) are not predictively equivalent; they disagree about what will happen in observed year 60. And as data set (2) shows, both hypotheses are about to make false predictions. (H1) falsely predicts a universal F in the observed year 60; (H2) falsely predicts that the three planets will have [...] in the observed years 63, 64, and 65, respectively. We should not confuse Shoemaker's problem with the problem of choosing between (H1) and (H2).
15 I am grateful to Denis Walsh for drawing my attention to this question.
16 It is a substantive assumption that ouches and meter readings are probabilistically independent, conditional on the underlying state of pains and c-fibre firings. A model of error in which this is not assumed is possible, and it will contain more parameters than (Error) does. Adopting this more complex model would not affect the points I'll argue for in what follows.
17 Although this view about causal efficacy captures something of dualism, it does not correspond to the type of property dualism defended, [...] Kim (1984). Kim holds that the mental and physical properties [...] that each is a cause of behavior, but that mental properties do not make an irreducible contribution. M is a cause of B, even though M makes no difference to the occurrence of B, once one holds fixed the physical properties P that the individual possesses.
18 In this connection, it is worth considering Enc's (1986) idea that theoretical identity statements often involve an explanatory asymmetry. Being made of H2O explains why something is made of water, but not conversely. Having one's c-fibers fire explains why one is in pain, but not conversely. And so on.
19 This decision will depend, not just on the spread among the four empirical frequencies, but on the amount of data, as pointed out in footnote 5.
20 The same style of argument can be developed to permit dualism to be compared with an anti-reductive physicalism in which the mental supervenes on the physical (Fodor 1975; Kim 1984). For example, suppose that a mental property is "multiply realizable" by [...]
REFERENCES
Akaike, H.: 1973, 'Information Theory and an Extension of the Maximum Likelihood Principle', in B. Petrov and F. Csaki (eds.), Second International Symposium on Information [...]
Carnap,R.:
and Ontology',
Semantics,
'Empiricism,
Chaitin,
west
Enc,
'Randomness
1975,
J.: 1993,
Earman,
in Philosophy,
Studies
B.:
and Mathematical
'Underdetermination,
of Notre
University
without
'Essentialism
1986,
Realism,
Individual
Revue
Internationale
de Philoso?
American
Proof,
Scientific
and Reason',
in H. Wettstein
Dame
Notre
Press,
Essences:
Dame,
Causation,
232,47-52.
(ed.), Mid?
pp.
Kinds,
19-38.
Superve
Fisher, R.: 1925, Statistical Methods for Research Workers, Oliver and Boyd, Edinburgh.
Fodor, J.: 1975, The Language of Thought, Thomas Crowell, New York.
Forster, M.: 1988, 'Unification, [...] of Causes in Newtonian Mechanics', Studies [...] 55-101.
Forster, M. and Sober, E.: 1994, 'How to Tell When Simpler, More Unified, or Less ad hoc Theories Will Provide More Accurate Predictions', British Journal for the Philosophy of Science 45, 1-36.
I.: 1965.
Hacking,
J.: 1984.
Kim,
'Salmon's
32, 269-271.
Philosophy
of Science
and Supervenient
In P. French,
Causation'.
T. Uehling
Vindication',
'Epiphenomenal
and
Kuhn,
Revolution,
Harvard
University
Press,
Cambridge,
Mass.
of Chicago Press,
Chicago.
H.:
Putnam,
1975,
'Mathematics,
Matter,
and Method',
Philosophical
volume
Papers,
I,
Mass.,
pp. 20-46.
Cambridge,
and Prediction,
of Chicago
1938, Experience
Press,
University
and G. Kitagawa:
Criterion
Y., M. Ishiguro,
1986, Akaike
Information
Dordrecht.
Publishers,
Press,
H.:
Chicago.
Statistics,
Smart, J.: 1959, 'Sensations and Brain Processes', Philosophical Review 68, 141-156.
E.:
Sober, E.: 1988a, 'Likelihood and Convergence', Philosophy of Science 55, 228-237.
Sober, E.: 1988b, Reconstructing the Past: Parsimony, Evolution, and Inference, MIT Press, Cambridge, Mass.
1990a,
'Contrastive
of Minnesota
versity
Biological
E.:
Sober,
Point
1990b,
inW.
Empiricism',
Press,
Minneapolis,
pp.
of View, Cambridge
University
'Let's Razor Ockham's
Razor',
Reprinted
Press,
Cambridge,
in D. Knowles
Its Limits,
Sober,
Uni?
Theories,
Scientific
a
in E. Sober, From
(ed.),
Savage
392-412.
Press, Cambridge,
University
Cambridge
From a Biological
Point of View, Cambridge
Press,
Mass.,
pp.
University
1994.
Mass.,
(ed.),
Explanation
and
Reprinted
in E.
73-94.
Press,
Cambridge,
Mass.,
1994.
Sober,
E.:
for Empiricists',
'Epistemology
of
Notre
Dame
Press,
University
Philosophy,
Sober, E.: 1993b,
Sober,
Van
E.:
Fraassen,
1993c,
B.:
'Mathematics
and
Philosophy
Scientific
February
September
Department of Philosophy
of Wisconsin
Notre
Indispensability',
Westview
of Biology,
1980, The
submitted
Manuscript
Final version
received
University
in H. Wettstein
1993a,
Madison
13,
26,
Image,
1995
1995
Oxford
(ed.), Midwest
pp. 39-61.
Review
Philosophical
Studies
Dame,
Press,
Boulder,
University
Colorado.
Press,
Oxford.
102,
35-58.
in