Parsimony and Predictive Equivalence

ELLIOTT SOBER
ABSTRACT. If a parsimony criterion may be used to choose between theories that make different predictions, may the same criterion be used to choose between theories that are predictively equivalent? The work of the statistician H. Akaike (1973) is discussed in connection with this question. The results are applied to two examples in which parsimony has been invoked to choose between philosophical theories: Shoemaker's (1969) discussion of the possibility of time without change and the discussion by Smart (1959) and Brandt and Kim (1967) of mind/body dualism and the identity theory.
Ockham's razor, the principle of parsimony, has been invoked to help solve scientific problems; it also has been used to evaluate philosophical theories. How are these two applications of the principle related? This contrast between scientific and philosophical arguments is related (perhaps imperfectly) to a second distinction; sometimes the principle of parsimony is used to choose between theories that make different predictions; sometimes it is used to discriminate between theories that are predictively equivalent. Does the rationale for one use of the principle provide a rationale for the other?
1. REICHENBACH'S THESIS
In Experience and Prediction, Hans Reichenbach argues that a difference in simplicity among competing hypotheses can have two quite different sorts of significance. When two theories fit the available data equally well but make different predictions about what new data will look like, the theories may differ in their inductive simplicity; if so, this difference counts as a reason to think that the simpler theory is more plausible. However, when two theories are predictively equivalent, agreeing not just about the extant data but about all possible observations, they can differ only in their descriptive simplicity. In this instance, the difference in simplicity is merely aesthetic or pragmatic; here it would be wrong to think that a difference in simplicity is a basis on which to attribute different truth values.
Reichenbach's discussion anchors this distinction to a view of theories that few would now accept. He says that theories that are predictively equivalent are "logically equivalent"; they differ only verbally, not in the substance of what they say. Reichenbach thinks that predictively equivalent theories are related to each other as different systems of measurement are related. Even if the metric system is "simpler" than the system of inches and feet, it would be foolish to conclude that "the box is 25.4 centimeters wide" and "the box is 10 inches wide" could differ in their plausibility.
Erkenntnis 44: 167-197, 1996. © 1996 Kluwer Academic Publishers. Printed in the Netherlands.
Reichenbach says that he wrote Experience and Prediction as a critique of logical positivism. Yet, this pronouncement about predictive equivalence is echt positivism. Those of us who think that predictively equivalent claims can be incompatible with each other will want to set this idea to one side. But even if we do so, Reichenbach's claim about two types of simplicity remains. The question, of course, is not whether the term "simplicity" is ambiguous. Surely the word "simplicity" is nothing like the word "bank". Rather, what I'll call Reichenbach's thesis asserts that a difference in simplicity is grounds for assigning different truth values in one circumstance, but not in the other.
To evaluate this thesis, we first would have to understand what justifies the use of a simplicity criterion in the case of predictively non-equivalent theories. We then would have to determine whether that rationale transfers to the case of theories that are predictively equivalent. Those inclined to view the "principle of simplicity" as a sui generis constituent of "rationality" may think this problem has an easy solution. But if Russell's remark about the advantages of theft over honest toil applies here, the conclusion to draw is that this approach short-circuits the problem; it does not solve it.
Another approach to Reichenbach's thesis also misses the mark, but in a way that is more subtle. A number of epistemological positions, which might loosely be termed "empiricist", end up denying that objective reasons can be given for choosing between empirically equivalent theories.1 Even if an epistemology of this sort turns out to be plausible, it may or may not deliver a full assessment of Reichenbach's thesis. For what is wanted is not just a verdict on the role of simplicity considerations when theories are empirically equivalent, but an understanding of whether and why simplicity is relevant when theories are not empirically equivalent. Only then will we be able to see if an inferential principle that makes good sense in one circumstance leads to nonsense in another.
Reichenbach faced head on the problem of justifying his thesis about the principle of simplicity. His idea was that repeated applications of simplicity to data sets of ever increasing size will eventually converge on the truth, if there is a truth on which inference could converge. Reichenbach illustrated this general line of argument by discussing the curve-fitting problem; the accompanying figure comes from page 375 of Experience and Prediction.
Figure 1. Reichenbach's (1938) illustration of the curve-fitting problem.
Imagine a set of data points; the goal is to choose the best curve. Many curves pass through the data points; some are smooth while others are bumpy. Reichenbach argues that the simple curve depicted in this figure is preferable over the curve that is more complex. By simplicity, Reichenbach meant a curve obtained by connecting the data points by straight lines and then smoothing the curve to remove discontinuities. Reichenbach says that simplicity should guide our choice of curve because this method will lead to the true curve in the infinite limit. As we examine more and more data points, in each instance selecting the simplest curve that perfectly fits the data, we eventually will recover the truth (if truth there be).
The standard objection to this justification is that other principles besides the principle of simplicity converge on the truth in the infinite limit. A procedure that introduces crazy bumps into the curves it postulates, but which diminishes their magnitude as the size of the data grows, will agree with the principle of simplicity when the data set is infinite. However, for finite data sets, a procedure of this sort will disagree with the principle of simplicity as to which curve is best. In short, convergence in the limit is not a sufficient condition for justifying the principle (Hacking 1965).
There is a second and less familiar objection to Reichenbach's argument. It is possible to show that quite sensible inference procedures sometimes violate the requirement of convergence in the limit.2 A method is convergent only if the method is certain to yield the truth when applied to an infinite data set. If we abandon the demand for certainty in the face of finite data, why should we impose it in the hypothetical circumstance in which the data are infinite? In my view, convergence is not a necessary condition for a principle of inference to be a reasonable one to use.
There is a third objection that bears mentioning. The two curves that Reichenbach considered in the accompanying figure both pass through the data points exactly. However, in the real world, observation is always subject to error. Scientists recognize this; they compute a curve's sum-of-squares (SOS). For each data point (x, yd), measure how far it is from the point (x, yc) on the curve; then square this distance and sum the squared distances for the entire data set. The fact that a curve's SOS value is greater than zero hardly disqualifies it from scientific consideration.
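The SOS computation just described can be sketched in a few lines of Python. This is only an illustration: the data points and the candidate line are invented, and the function name `sum_of_squares` is a label chosen here, not something taken from the literature.

```python
def sum_of_squares(curve, data):
    """Return the SOS of `curve` (a function x -> y_c) on `data` (pairs (x, y_d))."""
    # For each data point, take the vertical distance to the curve, square it, and sum.
    return sum((y_d - curve(x)) ** 2 for x, y_d in data)


# Hypothetical data and a hypothetical candidate curve, for illustration only.
data = [(0.0, 1.0), (1.0, 3.5), (2.0, 4.5), (3.0, 7.0)]


def line(x):
    return 1.0 + 2.0 * x


print(sum_of_squares(line, data))
```

The line misses two of the four points by half a unit each, so its SOS is positive; that alone is no reason to discard it.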
That Reichenbach's argument ignores the impact of error is no idle criticism; the reason is that simplicity and goodness-of-fit are usually in conflict. A simple curve will usually fail to pass through the data points exactly; a sufficiently complex curve can always be made to do so. This means that the justification for using a simplicity criterion in curve-fitting problems must include an account of how simplicity and goodness-of-fit should be traded off against each other. Which rate of exchange is the right one? Reichenbach's treatment of the problem glides over this matter by assuming that the "method of simplicity" selects curves whose SOS values are zero.
Reichenbach, I conclude, did not succeed in justifying the thesis I have named for him. But his thesis lives on as an important claim with which to reckon. As long as the rationale for using simplicity considerations to evaluate predictively nonequivalent theories remains a mystery, Reichenbach's thesis should also remain a puzzlement. It cannot be dismissed as a mere residue of positivism.
In this paper, I'm going to outline a solution to the problem of understanding why simplicity is a ground for assigning different truth values in the case of predictively nonequivalent theories. My focus will be on the curve-fitting problem; this doesn't cover all circumstances in which simplicity is relevant to choosing between empirically nonequivalent theories, but it is certainly a very central case.3 We will see that this treatment of the role of simplicity considerations in the curve-fitting problem provides no rationale whatever for choosing between theories that are predictively equivalent. This doesn't decisively prove that simplicity differences count for nothing in the case of predictively equivalent theories. However, it does lend support to that epistemological conclusion. To use the principle of simplicity in one context because it makes good sense in the other is to commit an epistemological equivocation.
Besides discussing Reichenbach's thesis, I also want to consider the role of parsimony considerations in philosophical theorizing. The idea has gained currency that the principle of parsimony can be invoked to evaluate philosophical theories because it is a legitimate criterion in scientific inference. This raises the question of whether philosophical use of a parsimony principle is related only metaphorically to the use made of that principle in science. So, after outlining a solution to the problem of understanding why simplicity matters in curve-fitting problems, I'll discuss parsimony arguments that have been made in two quite different philosophical contexts.
2. AKAIKE'S THEOREM
The statistician H. Akaike and his school have developed a set of ideas concerning how the predictive accuracy of a family of curves may be estimated (see Akaike 1973, Sakamoto et al. 1986, Forster and Sober 1994). Their theorems show how simplicity (as measured by the number of adjustable parameters in an equation) and goodness-of-fit are relevant to estimating how predictively accurate an equation is.
To explain the idea of predictive accuracy, we first must be careful to distinguish a family of curves from the specific curves that are members of that family. For example, consider the infinite family of straight lines in the x-y plane. These all have the form:
(LIN) y = a + bx.
In this equation, a and b are adjustable parameters. Once values are fixed for these parameters (i.e., once they are adjusted), a specific straight line is obtained.
Scientists use families of curves to predict new data from old data. The process comes in two steps. First, they use the available data to obtain the best-fitting member of the family (i.e., the member of the family with the smallest SOS score). Then, this best-fitting member of the family is used to predict what new data will look like. The question is how well the curve in the family that best fits the old data will do in fitting the new data. A family might do well in this two stage prediction task on one occasion, but not so well on another. Intuitively speaking, the predictive accuracy of a family is how well it would perform on average, were this two step process repeated again and again.
To make this idea more precise, let us begin by considering how the old data are obtained. Our ultimate goal is to discover what the true relationship is between the independent variable x and the dependent variable y; just to fix ideas, imagine that x is the temperature of the gas inside a kettle and y is the pressure that the gas exerts on the sides of that rigid chamber. We heat the kettle to different temperatures and observe what pressure the kettle then experiences. We thereby obtain several observations, each of which can be represented as a pair of numbers, a point (x, y) in the x-y plane.
This set of data points is generated by the true (but unknown) curve; given an input value for x, the observed value for y is obtained because of the true equation that links x to y. However, there is a second factor that also influences the observed value obtained for y. For example, even if the true curve happens to be a straight line, the data points will almost certainly fail to be exactly collinear. The reason is that observation is always subject to error, at least to some degree. In the kettle example, there is some true relation between temperature and pressure, but the thermometer and the pressure gauge don't always report values that are perfectly accurate. The (x, y) values that comprise the data don't necessarily represent the true pressure associated with a given temperature value; they represent the observed pressure gauge reading associated with a given thermometer reading.
Given a data set thus obtained, which family of curves should be used to predict new data in the way sketched above? For example, should (LIN) be used, or would it be better to use
(PAR) y = a + bx + cx²,
the family of parabolic curves? Because (LIN) is a special case of (PAR) (obtained by setting c = 0), we know in advance that (PAR) will fit the data at least as well as (LIN) will. However, this does not guarantee that (PAR) will do a better job at predicting new data. If the true relation of temperature and pressure is linear, (PAR) will probably do worse than (LIN) in this prediction task. This is because (PAR) will "over-fit" the data. (PAR) will interpret the data's departure from linearity as an indication that the true relation of x and y is genuinely nonlinear; (LIN), on the other hand, will interpret these deviations as due to error. Over-fitting, so to speak, is the mistake of confusing signal and noise. How might this mistake be avoided?
Let us pose this problem more generally. As noted earlier, given a body of data, any desired level of goodness-of-fit may be obtained by constructing a curve that is sufficiently complex. Simpler hypotheses often do worse at fitting the data at hand, but do a better job of predicting new data. The predictive accuracy of a family of curves apparently is influenced by how simple it is (i.e., by how many adjustable parameters it contains). The question is whether this is just a brute fact or can be understood in some general and mathematical way.
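The contrast between fitting old data and predicting new data can be illustrated with a small simulation. The following Python sketch rests on assumptions introduced here and nowhere in the argument above: the true curve is taken to be the line y = 1 + 2x, errors are Gaussian with unit variance, each data set has ten points, and the helper names (`fit_poly`, `sample`, `average_scores`) are hypothetical. It fits (LIN) and (PAR) to an old data set by least squares and scores each best-fitting curve against a fresh data set, averaging over many repetitions.

```python
import random


def fit_poly(data, degree):
    """Least-squares polynomial fit via the normal equations and Gaussian elimination."""
    n = degree + 1
    # Normal equations A w = b with A[i][j] = sum of x^(i+j) and b[i] = sum of y * x^i.
    A = [[sum(x ** (i + j) for x, _ in data) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in data) for i in range(n)]
    for col in range(n):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution recovers the coefficients, lowest power first.
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w


def sos(w, data):
    """Sum of squared vertical distances between the data and the polynomial w."""
    return sum((y - sum(c * x ** i for i, c in enumerate(w))) ** 2 for x, y in data)


def sample(rng, n=10, sigma=1.0):
    """Draw n noisy observations from the ASSUMED true line y = 1 + 2x."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 1.0 + 2.0 * x + rng.gauss(0, sigma)) for x in xs]


def average_scores(degree, trials=2000, seed=0):
    """Average SOS on the old data (fit) and on fresh data (prediction)."""
    rng = random.Random(seed)  # same seed, so both families see the same data sets
    fit_total = pred_total = 0.0
    for _ in range(trials):
        old, new = sample(rng), sample(rng)
        w = fit_poly(old, degree)
        fit_total += sos(w, old)
        pred_total += sos(w, new)
    return fit_total / trials, pred_total / trials


lin_fit, lin_pred = average_scores(1)  # (LIN): y = a + bx
par_fit, par_pred = average_scores(2)  # (PAR): y = a + bx + cx^2
print("LIN  fit:", lin_fit, " prediction:", lin_pred)
print("PAR  fit:", par_fit, " prediction:", par_pred)
```

Under these assumptions (PAR) should come out ahead on the old data and behind on the new data, which is the over-fitting pattern described above; with a genuinely curved true relation the ranking on new data could of course reverse.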
We now are in a position to define the predictive accuracy of a family F of curves. For convenience, we will do this by characterizing the concept of predictive inaccuracy. Let Best-fit(F, D) denote the member of family F that best fits data set D. Let SOS(C, D) denote the sum-of-squares that curve C has with respect to data set D:
Predictive inaccuracy of family F =df Average-SOS[Best-fit(F, D1), D2].
When the quantity on the right side of this equation is large, the family is not very good at predicting new data by fitting itself to old data; the family is predictively inaccurate.4
We so far have defined predictive accuracy in terms of small SOS, but the idea can be stated more generally. If we adopt the standard assumption that errors are symmetrically distributed around a curve, with large errors being less probable than small ones, the SOS value of the best fitting curve in a family F has a special meaning. The member of F that has the smallest SOS value is the member of the family that assigns to the data the highest probability. This best fitting curve is the likeliest member of the family, in the technical sense of likelihood introduced by R. A. Fisher (1925). Best-fit(F, D) is the hypothesis H in F that maximizes the quantity Pr(D | H). Predictively accurate families are families whose likeliest members, relative to old data, also have high likelihoods relative to new data.
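The link between small SOS and high likelihood can be made explicit under one common way of filling in the error assumption: independent, normally distributed errors with a shared variance σ². That is stronger than the symmetry condition stated above and is adopted here only for illustration; H(x_i) denotes the y-value that curve H assigns to x_i, and n is the number of data points.

```latex
\Pr(D \mid H) \;=\; \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^{2}}}
  \exp\!\left(-\frac{\bigl(y_i - H(x_i)\bigr)^{2}}{2\sigma^{2}}\right)
\quad\Longrightarrow\quad
\log \Pr(D \mid H) \;=\; -\,\frac{\mathrm{SOS}(H, D)}{2\sigma^{2}}
  \;-\; \frac{n}{2}\,\log\!\bigl(2\pi\sigma^{2}\bigr).
```

The second term does not depend on H, so within a family the curve with the smallest SOS is the curve with the greatest likelihood.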
Predictive accuracy is obviously a desirable feature of families of curves; we want to use families that, when fitted to old data, will do a good job at predicting new data. However, for all that, predictive accuracy seems to be epistemologically inaccessible. It seems that we can't tell, from the data at hand, how predictively accurate a family of curves will be. To be sure, it is easy to determine how well a family fits the present data; what seems inaccessible is how well the family will do in predicting new data.
Akaike's remarkable theorem shows that predictive accuracy is, in fact, epistemologically accessible. The theorem says that
An unbiased estimate of the predictive inaccuracy of family F, given data set D, is provided by the quantity SOS[Best-fit(F, D)] + 2kσ² + constant.
Here k is the number of adjustable parameters in F and σ² is the error variance, the degree of dispersion around the true curve that we expect observations to have. The constant third term in Akaike's theorem disappears when hypotheses are compared with each other, and so may be ignored. Notice that there are two properties of a family that can lead to the conclusion that it will be predictively inaccurate. First, its best-fitting member may fit the available data poorly (i.e., have a high SOS score). Second, the family may have a large number of adjustable parameters. This second term in Akaike's theorem gives simplicity its due; the complexity of a family is measured by how many adjustable parameters it contains.
It is important to see that the number of adjustable parameters is not a syntactic feature of an equation. Although the equations "y = ax + bx" and "y = ax + bz" may each seem to contain two adjustable parameters (a and b), this is not so. The former equation can be reparameterized; let a' = a + b, in which case the first equation can be restated as "y = a'x". For this reason, the first equation in fact contains one adjustable parameter, while the second contains two. If you like, think of the number of parameters in a family as the number of quantities whose values need to be fixed for the family to make predictions about data (given standard assumptions about error).
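One way to make the point about reparameterization concrete, offered here as an illustration rather than as part of the Akaike framework itself, is to count linearly independent coefficient columns. In the Python sketch below the observed values of x and z are invented, and `rank` is a small helper written for this example.

```python
def rank(matrix, tol=1e-9):
    """Rank of a small matrix, computed by Gaussian elimination."""
    rows = [list(map(float, r)) for r in matrix]
    rank_count = 0
    for col in range(len(rows[0])):
        # Find a row at or below the current pivot position with a nonzero entry.
        pivot = next(
            (r for r in range(rank_count, len(rows)) if abs(rows[r][col]) > tol),
            None,
        )
        if pivot is None:
            continue  # this column adds nothing new
        rows[rank_count], rows[pivot] = rows[pivot], rows[rank_count]
        for r in range(rank_count + 1, len(rows)):
            f = rows[r][col] / rows[rank_count][col]
            rows[r] = [a - f * b for a, b in zip(rows[r], rows[rank_count])]
        rank_count += 1
    return rank_count


# Hypothetical observed values of the input quantities x and z.
xs = [1.0, 2.0, 3.0]
zs = [0.5, -1.0, 4.0]

# One column per coefficient: y = a*x + b*x versus y = a*x + b*z.
design_first = [[x, x] for x in xs]
design_second = [[x, z] for x, z in zip(xs, zs)]

print(rank(design_first), rank(design_second))


def predict_first(a, b):
    """Predictions of y = a*x + b*x at the observed x values."""
    return [a * x + b * x for x in xs]


# Different (a, b) pairs with the same sum a + b are indistinguishable by any data.
print(predict_first(1.0, 2.0) == predict_first(2.0, 1.0))
```

The first design matrix has rank one because its two columns coincide, which is the reparameterization a' = a + b in matrix form; the second has rank two.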
The second term in Akaike's theorem, besides adverting to the number of adjustable parameters, also mentions the error variance. When this variance is large, this second term plays a larger role in estimating the family's predictive inaccuracy; when observation is largely free of error, the second term makes only a negligible contribution. Simplicity cannot matter when observation is error free; it matters more and more as the data become noisier.
Let us apply Akaike's theorem to (LIN) and (PAR). Suppose the data at hand fall fairly tightly around a straight line. In this case, the best fitting straight line will be very close to the best fitting parabola. So Best-fit(LIN, D) and Best-fit(PAR, D) will have almost the same SOS values. In this circumstance, Akaike's theorem says that the family with the smaller number of adjustable parameters is the one we should estimate to be more predictively accurate. A simpler family is preferable if it fits the data about as well as a more complex family. Akaike's theorem describes how much improvement in goodness-of-fit a more complicated family must provide for it to make sense to prefer the complex family.5
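The comparison of (LIN) and (PAR) can be worked through numerically. The Python sketch below simply evaluates the quantity SOS[Best-fit(F, D)] + 2kσ² with the additive constant dropped; the SOS values and the error variance are made-up numbers, and the function names are labels chosen for this example.

```python
def akaike_estimate(sos_best_fit, k, sigma2):
    """Estimated predictive inaccuracy, up to the additive constant:
    SOS[Best-fit(F, D)] + 2 * k * sigma^2."""
    return sos_best_fit + 2 * k * sigma2


def required_sos_gain(k_simple, k_complex, sigma2):
    """SOS reduction the more complex family must exceed to earn the lower estimate."""
    return 2 * (k_complex - k_simple) * sigma2


# Hypothetical numbers, for illustration only.
sigma2 = 1.0
lin = akaike_estimate(sos_best_fit=10.0, k=2, sigma2=sigma2)  # (LIN): parameters a, b
par = akaike_estimate(sos_best_fit=9.5, k=3, sigma2=sigma2)   # (PAR): parameters a, b, c

print(lin, par, required_sos_gain(2, 3, sigma2))
```

With these numbers (PAR) fits slightly better, but not by enough to offset its extra parameter, so (LIN) receives the lower estimate of predictive inaccuracy. Setting `sigma2` close to zero shrinks the required gain toward zero, which mirrors the remark that simplicity matters more as the data become noisier.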
Akaike's theorem is a theorem, so it is essential to ask what the assumptions are from which it derives. Akaike assumes that the true curve, whatever it is, remains the same for both the old and new data sets considered in the definition of predictive accuracy. He also assumes that the likelihood function is "asymptotically normal". And finally, he assumes that the sample size is large, in the sense that enough data are available so that the value of each parameter can be estimated.
As noted before, Akaike's theorem identifies an unbiased estimate of a family's predictive inaccuracy.6 This leaves open the possibility that individual estimates may stray quite considerably from this true value. The theorem does not say whether there are other unbiased estimators. In addition, there are other desirable statistical properties of an estimator besides unbiasedness, so there is a genuine question as to how various optimality properties ought to be traded off against each other. However, the fact that important details remain unsettled should not obscure the fact that Akaike's approach has made significant headway in the task of explaining the role of simplicity in hypothesis evaluation.
Although Akaike's theorem applies directly to curve-fitting, the illumination it provides is more general. The theorem explains why a unified theory is sometimes preferable to a disunified theory; it also shows why models that postulate fewer causes are sometimes preferable to models that postulate more (Forster and Sober 1994). Nor should the surface appearance of the curve-fitting problem lead one to think that the Akaike format applies to inductions over observational regularities and not to abductions that postulate unobservable mechanisms. This is a distinction without a difference in the Akaike framework. Akaike's theorem addresses the general problem of "model selection", meaning the problem of evaluating propositions that contain adjustable parameters. The focus on the predictive accuracy of models in no way limits the theories that can be considered to ones expressed in some sort of "observation language". The quantities represented in (LIN) and (PAR) may be as "theoretical" as you please.
However, it remains true that each parameter in a family of curves that is to be treated within the Akaike framework must be such that its maximum likelihood value can be estimated from the data. A family of curves that violates this requirement is called "unidentifiable".
So as to give the reader more of a feel for how the Akaike framework works, let's now turn to two famous controversies in the history of physics.
The Copernican and Ptolemaic systems fit the observations then available concerning the relative positions of heavenly bodies about equally well. However, the Ptolemaic model included a far larger number of adjustable parameters; these represent the "epicycles" that made Ptolemaic astronomy the very paradigm of an unparsimonious theory. Although many philosophers (notably Kuhn 1957, p. 181) have claimed that the virtues of the Copernican hypothesis are purely aesthetic, the Akaike framework offers a much more down-to-earth explanation of why the Copernican model is preferable; its estimated predictive accuracy is much higher.
Contrast this example with the controversy that arose in connection with Newton's postulate of absolute space. Leibniz and many others took this to be a defective element in Newton's model; absolute space seems to be a perfect example of an unparsimonious postulate. However, this example is not analyzable in the Akaike framework. There is no way to estimate the value of a parameter that represents the velocity of a physical object relative to absolute space. It isn't that the Akaike framework applies and tells us what is wrong with Newton's model; the framework simply does not apply at all, because the Newtonian model is not identifiable.
At first glance, it might seem that parsimony considerations explain why Copernican astronomy should be preferred over the Ptolemaic system in the same sense that parsimony considerations explain why a physics that does without absolute space is better than one that includes that postulate. The Akaike approach strongly suggests that this intuitive judgment involves an equivocation.
3. SOME PHILOSOPHICAL PRELIMINARIES
Since I'll be using the concept of "predictive equivalence" throughout this paper, I should say something about what it means. This is a big job, not least because it requires a treatment of the concept of observation (on which see Sober 1990a, 1993b, 1993c). However, a few remarks are worth making here, incomplete though they must be.
There is first of all the familiar Duhemian point that the predictive equivalence of two theories must be gauged against a set of background assumptions that allows the theories to make contact with observations. Typically, it isn't theories by themselves that make predictions, but theories when supplemented with auxiliary assumptions. This means that the primary concept of predictive equivalence should be not a two-place equivalence relation, but a three-place relation of equivalence relative to a set of antecedently accepted background assumptions.
Far more important, and less often recognized, is the idea that theories, even when supplemented with background assumptions, do not have deductive implications about observations; rather, they assign probabilities to different possible observational outcomes. This means that we should not understand predictive equivalence in terms of what theories entail about observations. Rather, predictive equivalence should be understood in terms of identity of probability distributions. It is sometimes thought that this "probabilistic turn" is appropriate for statistical theories such as population genetics and quantum mechanics, but not for deterministic theories such as relativity theory and Newtonian mechanics. This is entirely wrong! When observation is subject to error, we must model even deterministic theories as making only probabilistic contact with observations (Forster 1988).
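The two points just made, that predictive equivalence is relative to auxiliary assumptions and that it amounts to identity of probability distributions, can be given a toy rendering in Python. Everything specific below is an assumption made for the sake of the example: the "theories" are bare claims about a box's width, the auxiliary assumption is an invented error model for a ruler that reports whole inches, and the function names are hypothetical.

```python
from fractions import Fraction

# ASSUMED auxiliary: the ruler reports inches and errs by -1, 0, or +1 inch.
ERROR_PMF = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}


def reading_distribution(true_width_inches, error_pmf):
    """Probability of each ruler reading, given a claimed true width and an error model."""
    return {true_width_inches + err: p for err, p in error_pmf.items()}


def predictively_equivalent(theory_a, theory_b, error_pmf):
    """Three-place relation: identical outcome distributions, relative to the auxiliaries."""
    return reading_distribution(theory_a, error_pmf) == reading_distribution(
        theory_b, error_pmf
    )


ten_inches = Fraction(10)
# "25.4 centimeters", converted to inches with exact arithmetic (2.54 cm per inch).
in_centimeters = Fraction(254, 10) / Fraction(254, 100)
eleven_inches = Fraction(11)

print(predictively_equivalent(ten_inches, in_centimeters, ERROR_PMF))
print(predictively_equivalent(ten_inches, eleven_inches, ERROR_PMF))
```

Even though each width claim is deterministic, it meets the data only through the error model, so what gets compared is a pair of probability distributions over readings rather than a pair of entailed observations.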
Although my discussion in what follows will require the distinction between what is observed and what is not observed but only inferred, the concept of observation I'll deploy is not a rigid or absolute one. Whether a given statement is an observation report, or formulates a conjecture under test, often depends on the problem at hand. In addition, observation statements, as I'll use the term, often employ theoretical concepts and their confirmation and disconfirmation often depends on the use of instrumentation and background theories. Roughly, the idea is this: When the question is raised about which of two theories is more plausible, an "observation statement" will describe a detectable feature of the environment about which it is possible to make a reasonable judgment without already having formed an opinion as to which theory is true. It is in this sense that observations are relatively theory-neutral, not absolutely so (Sober 1990).
Despite widespread skepticism among philosophers about the "observation/theoretical distinction", I hope these few words show that the concept of observation which I'll use is fairly innocuous. It is a routine feature of scientific testing to ask what types of information constitute the data; it also is routine for scientists to distinguish what is observed from what is inferred on the basis of observation. I will be content if the reader allows that these practices make good scientific sense, even if philosophers have not been able to provide a completely adequate analysis of why this is so.
In light of my brief exposition of Akaike's theorem in the previous section, it is possible to provide a streamlined version of the kind of argument I'll construct in what follows. The whole point of Akaike's theorem is to estimate predictive accuracy. What, then, should we expect when we apply the Akaike framework to two theories that are known at the outset to be predictively equivalent? There are two possibilities. The first is that their Akaike estimates of predictive accuracy turn out to be the same; the second is that they do not. In the first instance, we find that the estimation procedure merely reinforces what we already knew; in the second, we find that the procedure has provided us with misleading estimates, which we then discard. In short, we know in advance that the Akaike framework will never give us a justification for choosing between predictively equivalent theories.
The question may then be raised of why one should work through the details of the philosophical examples I'll present, since the "take home message" is already apparent at the outset. The main reason is that attention to these details illuminates both the philosophical theories and the intuitions about simplicity that have been cited in their defence. As we will see, some applications of a simplicity criterion in philosophy are better grounded than others; without the Akaike framework, it is quite easy to think that they are all of a piece.
4. SHOEMAKER'S EXAMPLE OF TIME WITHOUT CHANGE
Can there be time without change? That is, can time pass without any changes occurring, other than the mere lapsing of time itself? If the external world is frozen for a while and if the thought processes of perceivers are likewise frozen, none of them will experience the passage of time during that frozen moment. In addition, one might imagine that a moment of universal arrest will leave no trace that later perceivers will be able to detect. This suggests that universal freezes are not the sorts of events for which observational evidence could be mustered. And from this one might conclude that it could never be reasonable to infer that universal freezes occur.
Sydney Shoemaker (1969) has invented a clever example that seems to show that this line of reasoning is mistaken. He asks us to imagine a universe composed of three planets, each inhabited by intelligent beings. Shoemaker stipulates that no planet will be able to detect that it is freezing while this is happening. However, each planet can observe the other two.7 For convenience, we may imagine that the three planets use the same calendar (with years numbered i = 1, 2, 3, ...) and that they start observing each other in year 1. In year 3, planets Y and Z observe that X has frozen. At the end of that year, Y and Z inform X that this has happened; X's calendars have been frozen, so the people on X must correct their calendars to take account of what has happened to them. In year 4, X and Z observe that Y has frozen, so at the end of that year, Y must similarly be brought up to date. And in year 5, X and Y observe that Z has frozen. The pattern of freezes is displayed in the following table. During a given year, there is a freeze on the planet or planets marked "F" and the unmarked planet or planets observe what has happened:
(1)  Year   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15 ...  59
     X              F           F           F           F           F ...
     Y                  F               F               F             ...
     Z                      F                   F                   F ...
What we see in the data so far is that each planet has its own fixed periodicity; X freezes every 3 years, Y every 4, and Z every 5. This pattern holds through year 59.
Although the events recorded in data set (1) are uncontroversial, what happens subsequently is more subject to interpretation. I'll begin by describing what the people on the planets observe. They note that when their calendars read "60", no one is frozen. However, there is subsequently a shift in the timing of the regularities in play. This is how they record their observations:
(2) [Table: freezes observed on planets X, Y, and Z in calendar years 55 through 69.]
After the hiatus around year 60, the familiar 3, 4, and 5 year cycles begin again. They continue with perfect regularity until the calendar reads 119, at which point there is again an interruption:
(3) [Table: freezes observed on planets X, Y, and Z in calendar years 114 through 123.]
Then the 3, 4, and 5 year cycles begin again, but once again there is a hiatus after a certain number of repetitions. We may imagine that the inhabitants experience this pattern numerous times. They have lots of data.
If we take this data at face value, we will infer the following general pattern:
(NUF) For all observable years o,
      Planet X freezes in year o iff mod[(o + c)/3] = 0,
      Planet Y freezes in year o iff mod[(o + c)/4] = 0,
      Planet Z freezes in year o iff mod[(o + c)/5] = 0,
      where c is the largest integer such that 1 + 59c ≤ o.

Here mod(n/m) is the remainder of dividing n by m. (NUF) entails that there are no universal freezes in what I am calling "the observable years". These observable years are the intervals marked off by the calendars that the people on the planets use, once they are corrected for the local freezes that occur on one planet and which are observable by the people on another.
Although (NUF) perfectly fits the available data, it postulates a pattern for each planet that is somewhat complicated. Each planet freezes with a regular period for a certain number of repetitions, then suffers a slightly longer sequence of years in which there is no freeze, after which it resumes its cycle of periodic freezes.
Shoemaker asks us to entertain an alternative theory concerning what is happening. Suppose there is a universal freeze that occurs right after the observable year 59 and that lasts a single year; suppose the same thing happens immediately after the observable year 118. These years in which universal freezes allegedly occur are "hidden", in the sense that no one in the three planets could experience them while they are happening, nor could they be detected after the fact by observing traces that the universal freezes leave. The postulate of these hidden years gives rise to a new, "augmented" calendar, in which the observable years are supplemented by hidden years:
observable   1   2  ...  58  59      60  61  ...  117  118       119  120  ...
augmented    1   2  ...  58  59  60  61  62  ...  118  119  120  121  122  ...

If O is the set of observable years and A is the set of augmented years, we can describe A as a function of O. Here is how the two calendars are related: Given a value o ∈ O, we can construct the value a ∈ A for the same year as follows:

      a = o      for 1 ≤ o ≤ 59
      a = o + 1  for 60 ≤ o ≤ 118
      a = o + 2  for 119 ≤ o ≤ 177

In other words, a = o + c, where c is the largest integer such that 1 + 59c ≤ o. Note that each observable year is numbered in the augmented calendar, but the converse isn't true. Given this explanation of what the augmented calendar amounts to, we can now state the universal freeze hypothesis in terms of it:

(UF)  For all augmented years a,
      Planet X freezes in year a iff mod(a/3) = 0,
      Planet Y freezes in year a iff mod(a/4) = 0,
      Planet Z freezes in year a iff mod(a/5) = 0.
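
The relation between the two calendars can be checked mechanically. The Python sketch below is an illustration only: the function names (offset, augmented_year, nuf_freezes, uf_freezes) are hypothetical, and the periods 3, 4, 5 and the 59-year block length are simply the values used in the example. It confirms that (NUF) and (UF) name the same frozen planets in every observable year, and that the augmented years with no observable counterpart are exactly the years in which (UF) has all three planets freeze.

```python
PERIODS = {"X": 3, "Y": 4, "Z": 5}  # freeze periods from the example
BLOCK = 59                          # observable years between hidden years


def offset(o):
    """Largest integer c such that 1 + 59c <= o."""
    return (o - 1) // BLOCK


def augmented_year(o):
    """Map an observable year o to its augmented year a = o + c."""
    return o + offset(o)


def nuf_freezes(o):
    """(NUF): set of planets frozen in observable year o."""
    c = offset(o)
    return {planet for planet, n in PERIODS.items() if (o + c) % n == 0}


def uf_freezes(a):
    """(UF): set of planets frozen in augmented year a."""
    return {planet for planet, n in PERIODS.items() if a % n == 0}


observable = range(1, 178)

# The two hypotheses agree about every observable year.
agree = all(nuf_freezes(o) == uf_freezes(augmented_year(o)) for o in observable)

# Augmented years with no observable counterpart are the hidden years.
seen = {augmented_year(o) for o in observable}
hidden = [a for a in range(1, max(seen) + 1) if a not in seen]

print(agree)                                    # True
print(hidden)                                   # [60, 120]
print([sorted(uf_freezes(a)) for a in hidden])  # all three planets, twice
```

The agreement holds by construction, since (NUF) is just (UF) re-expressed through a = o + c; the sketch makes that dependence explicit rather than adding evidence for either hypothesis.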
(UF) agrees with the observations recorded in data set (2), but views the observable calendar it uses as incomplete. (UF) supplements (2), arriving at the following pattern for what supposedly happens in the augmented calendar:

(2a)    ...  55  56  57  58  59  60  61  62  63  64  65  ...
     X  ...           F           F           F          ...
     Y  ...       F               F               F      ...
     Z  ...   F                   F                   F  ...

Although (UF) may seem to be simpler than (NUF), it is well to remember that they are formulated in terms of two quite different calendars. Whereas (NUF) postulates patterns with interruptions in the observable calendar, (UF) postulates uninterrupted sequences containing fixed periods in the augmented calendar.8 In particular, (UF) can be viewed as a conjunction, one of whose conjuncts is (NUF) and the other is a postulate about hidden years:

(H)   Immediately after each observable year o such that mod(o/59) = 0, there is a hidden year in which there is a universal freeze.

Since (UF) is equivalent to (NUF) & (H), we must be careful that our assessment of the simplicity of (UF) also applies to (NUF) & (H).9 It is important not to be misled by the syntactic simplicity of (UF); since (UF) and (NUF) & (H) say the same thing, they must be equally plausible.
Shoemaker argues that (UF) should be preferred to (NUF) on grounds of simplicity. We now need to assess what the Akaike framework says about these two hypotheses.

To do this, we must ask what the families of hypotheses are from which (UF) and (NUF) are obtained. Each of these families must contain adjustable parameters whose values are estimated from the data. At this point, however, we face a problem that is quite general in the Akaike framework: a hypothesis that contains no adjustable parameters is a member of many different families. For example, y = 3 + 2x can be obtained from (LIN) and also from (PAR); indeed, it also is a member of the family "y = (a + 1) + ax", which contains just one adjustable parameter. The syntactic form of a hypothesis does not tell us which family we should consider.
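
The point about multiple family membership can be made concrete with a few lines of Python. This is a sketch under an assumption: (LIN) and (PAR) are defined earlier in the paper, outside this section, and are taken here to be y = a + bx and y = a + bx + cx^2 respectively. The helper names lin, par, and one_param are hypothetical.

```python
def lin(a, b):
    """Assumed form of (LIN): y = a + b*x, two adjustable parameters."""
    return lambda x: a + b * x


def par(a, b, c):
    """Assumed form of (PAR): y = a + b*x + c*x**2, three adjustable parameters."""
    return lambda x: a + b * x + c * x ** 2


def one_param(a):
    """The family y = (a + 1) + a*x, one adjustable parameter."""
    return lambda x: (a + 1) + a * x


def target(x):
    """The parameter-free hypothesis y = 3 + 2x."""
    return 3 + 2 * x


candidates = {
    "LIN, 2 parameters": lin(3, 2),
    "PAR, 3 parameters": par(3, 2, 0),
    "(a + 1) + ax, 1 parameter": one_param(2),
}

xs = range(-5, 6)
same = {name: all(f(x) == target(x) for x in xs) for name, f in candidates.items()}
print(same)
```

Each family recovers y = 3 + 2x for a suitable setting of its parameters, so the number of adjustable parameters is a property of the family chosen, not of the hypothesis itself.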
Two ideas should guide our choice, however.10 First, the context of inquiry tells us something about the families we should associate with (NUF) and (UF). In the Shoemaker problem, the families should endorse or deny the existence of universal freezes, it being left to the data to decide what specific patterns are asserted to obtain. The second piece of guidance is that it must be possible for the adjustable parameters to be estimated from the data. Let us begin by considering the following two families:
FAM(NUF)  For all observable years o,
          Planet X freezes in year o iff mod[(o + c)/x] = 0,
          Planet Y freezes in year o iff mod[(o + c)/y] = 0,
          Planet Z freezes in year o iff mod[(o + c)/z] = 0,
          where c is the largest integer such that c[LCM(x, y, z) - 1] < o.

FAM(UF)   FAM(NUF) & Immediately after each observable year o such that mod{o/[LCM(x, y, z) - 1]} = 0, there exists a hidden year in which there is a universal freeze.

LCM(x, y, z) is the least common multiple of the three numbers listed.11 I've formulated the two families in terms of the observable calendar for two reasons. First, that is how the data are described, and parameters must be estimated from the data. Second, it is hard to see how (NUF) could even be described in the augmented calendar. (NUF) denies that there is such a thing as the hidden years postulated by the augmented calendar; this hypothesis cannot be described as denying that there are universal freezes in hidden years or as remaining agnostic about what happens during hidden years.
The fundamental point is that FAM(NUF) and FAM(UF) have precisely the same number of adjustable parameters. The Akaike framework concludes that they have the same degree of estimated predictive accuracy.12

Other specifications of the two families from which (UF) and (NUF) are obtained might be considered. For example, one might associate with (NUF) a family in which there are three adjustable parameters for each planet; for each planet, we must estimate the parameters that occur in "the planet freezes every a years for b repetitions, after which there is a hiatus of c years, at which point the cycle recurs". There is nothing wrong with this nine parameter family. The point I would make, however, is that a nine parameter family can now be associated with (UF).13
Shoemaker judges the universal freeze hypothesis to be simpler than its competitor. This turns out not to be true according to the present analysis. Shoemaker's judgment, I concede, is an intuitive one. How, then, are we to account for the fact that intuitions are misleading in this case?14
In part, our intuitions rest on something that is true: (UF) postulates a pattern in the augmented calendar that is simpler than the pattern that (NUF) postulates in the observable calendar. Constant periodicity is a simpler pattern than a two-phase pattern of constant periods with interruptions. The theory of algorithmic simplicity (Chaitin 1975) underwrites this intuition; given a canonical language, the abstract sequence of events postulated by (UF) in the augmented calendar can be specified by a shorter algorithm than any algorithm that generates the pattern that (NUF) says obtains in the observable calendar.

All this is true, but entirely irrelevant as far as the Akaike approach is concerned. What is relevant is not the two abstract patterns, but the number of parameters in each family of hypotheses whose values must be estimated from the observations. This is why the calendar of observable years is fundamental. The observable calendar and the augmented calendar have quite different epistemological standings, and this asymmetry is reflected in the analysis. It isn't that Akaike pays no attention to theoretical simplicity; the point is that the simplicity of a theory must be judged by seeing how the theory makes contact with observations. One of the beauties of Shoemaker's example is that it illustrates an important difference between "simplicity of abstract pattern" (algorithmic simplicity) and the simplicity captured by the Akaike framework. Besides the fact that these two approaches yield different analyses of the problem at hand, there is an additional difference that is quite fundamental: Whereas the Akaike approach explains why simplicity (as measured by the number of adjustable parameters) is epistemically relevant, no comparable explanation has ever been developed for the theory of algorithmic simplicity.
To explain the intuition that the universal freeze hypothesis is simpler than the hypothesis that denies that there are universal freezes, I have focused on the abstract pattern of freezes that each hypothesis puts forward with respect to its own proprietary calendar. However, there is something else on which this intuition depends. One must ignore (or discount) the fact that the universal freeze hypothesis postulates something whose existence the competing hypothesis denies; the universal freeze hypothesis asserts that there are such things as hidden years. Those inclined to dismiss the Akaike framework and insist on the authority of their intuitions in this case should ask themselves this: The universal freeze hypothesis postulates a simpler abstract pattern, but a less parsimonious ontology, than the alternative hypothesis. Why does pattern take precedence over ontology when the overall simplicity of the two theories is compared?15

Given that Akaike's theorem concerns the estimate of predictive accuracy, it is no surprise that the theorem should fail to distinguish between two predictively equivalent theories. It is nonetheless instructive to see why this is so in the case of Shoemaker's problem. The hypotheses (UF) and (NUF) are each obtained from families containing adjustable parameters whose values are estimated from the data. The important point about this estimation process is that the same information in the data is consulted to construct each hypothesis. It is for this reason that the number of adjustable parameters in the two families must be the same; they are the same in number because the adjustable parameters in the two families are in fact identical.
5. THE MIND-BODY PROBLEM
Mind-body dualism is perfectly compatible with the existence of regularities connecting mental and physical event types. When people take aspirin, their headaches usually disappear; when they want to take off their eyeglasses, their arms usually move to do so. J. J. C. Smart (1959) and Brandt and Kim (1967) go further; they maintain that dualism is quite consistent with the existence of perfect correlations between the mental and the physical. If such perfect correlations obtain, how are we to decide whether dualism is more or less plausible than the identity theory? To use an example that was popular in the 1960's, why say that the property of being in pain is identical with the property of having one's c-fibres fire, if we observe that one of these properties is instantiated in a person at a time if and only if the other is instantiated in that person at the same time? Why not say, instead, that these are distinct though perfectly correlated properties?
Smart, Brandt, and Kim point out that the ontology postulated by the identity theory is more parsimonious than that demanded by dualism. If a mental property and a physical property are identical, then there is just one property; but if they are distinct, there are two. This seems to provide a parsimony argument for choosing the identity theory over dualism.
To evaluate the identity theory and dualism from the point of view of the Akaike framework, we must associate with each theory a family of hypotheses in which there are adjustable parameters whose values can be estimated from observations. Since Smart, Brandt, and Kim focus on what the identity theory and dualism say about the correlation of mental and physical properties, we will begin with this idea, even though there is more to the theories than this.
Let's consider a physical property P and a mental property M. With dichotomous characters such as these, the standard measure of their degree of association is their covariance; Cov(P, M) = Pr(P & M) - Pr(P) Pr(M). If P occurs when and only when M does, their covariance will be positive and will take the value Pr(P) Pr(-P), which is maximal when Pr(P) = Pr(-P) = 0.5 and declines as Pr(P) becomes more extreme. Note that when two properties are "perfectly correlated", their covariance isn't a constant, but varies as a function of Pr(P). On the other hand, if the occurrence of one property is probabilistically independent of the occurrence of the other, the covariance will be zero, independent of the value of Pr(P).

It doesn't make much sense to view covariance as a simple and unanalyzable feature of the mental and physical properties under study. Rather, their degree of covariance is the upshot of a complete specification of values for the four conjoint probabilities Pr(P & M), Pr(P & -M), Pr(-P & M), Pr(-P & -M). Once these are specified, the degree of association between the two properties is a consequence.

What does the identity theory say about the probabilities of these conjoint events? Since it maintains that P and M are one and the same property, the identity theory asserts that Pr(P & -M) = Pr(-P & M) = 0, and that Pr(P & M) and Pr(-P & -M) may take any pair of values that sum to one. Thus, the identity theory endorses a model that contains a single adjustable parameter:

(Ident)  Pr(P & M) = p and Pr(-P & -M) = 1 - p.
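
The covariance claims can be checked numerically. The following Python sketch is illustrative only; the function names covariance, ident_model, and independent_model are hypothetical, and the second marginal probability 0.4 is an arbitrary choice. It computes Cov(P, M) = Pr(P & M) - Pr(P) Pr(M) from the conjoint probabilities, first under the one-parameter (Ident) model and then under probabilistic independence.

```python
def covariance(p_pm, p_p_notm, p_notp_m):
    """Cov(P, M) = Pr(P & M) - Pr(P) Pr(M), from three conjoint probabilities.

    The fourth, Pr(-P & -M), is fixed by the requirement that all four sum to 1.
    """
    pr_p = p_pm + p_p_notm
    pr_m = p_pm + p_notp_m
    return p_pm - pr_p * pr_m


def ident_model(p):
    """(Ident): Pr(P & M) = p, Pr(P & -M) = Pr(-P & M) = 0."""
    return (p, 0.0, 0.0)


def independent_model(pr_p, pr_m):
    """P and M probabilistically independent with the given marginals."""
    return (pr_p * pr_m, pr_p * (1 - pr_m), (1 - pr_p) * pr_m)


grid = [i / 10 for i in range(1, 10)]

# Under perfect correlation the covariance equals p(1 - p), so it varies with p.
perfect = {p: covariance(*ident_model(p)) for p in grid}

# Under independence it is zero whatever Pr(P) is (0.4 is an arbitrary Pr(M)).
independent = {p: covariance(*independent_model(p, 0.4)) for p in grid}

for p in grid:
    print(p, round(perfect[p], 4), round(abs(independent[p]), 4))
```

The covariance under (Ident) peaks at Pr(P) = 0.5, which is why covariance alone is not a stable summary of "perfect correlation"; the conjoint probabilities carry the information.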
Dualism, on the other hand, is compatible with the four conjoint probabilities having any values at all, as long as they sum to 1. The properties P and M may be independent of each other; they also may show any degree of positive (or negative) association. As a result, dualism deploys a model that contains three adjustable parameters:

(Dual)  Pr(P & M) = p1, Pr(P & -M) = p2, Pr(-P & M) = p3, and Pr(-P & -M) = 1 - p1 - p2 - p3.

Let's now imagine an experiment that assembles empirical frequencies from which maximum likelihood estimates of the parameters in (Ident) and (Dual) may be obtained. This experiment will mimic the structure of the simple inference problem concerning the kettle's temperature and pressure that was discussed at the beginning of this paper. Suppose the subjects in a psychology experiment wear monitors for some length of time. The monitors record when subjects say "ouch"; they also detect c-fibre firings and record when those events take place. I'll assume that both indicators are subject to error. Utterances of "ouch" need not be perfectly associated with pains and neurological detector readings need not always coincide with the occurrence of c-fibre firings.

This experiment will yield data that describe how often subjects say "ouch" and how often the neurological detector says "c-fibres are firing". For example, one might obtain frequency data like the following:
                                      Subject says "ouch"
                                        yes        no
     c-fibre meter            yes
     says "firing"            no
If observational error is possible, then all four possible observations will occur at least sometimes, if the data set is sufficiently large. The identity theory must interpret all observations on this table's anti-diagonal as erroneous; dualism need not, though it will interpret all observations as the result of an interaction between the individual's underlying physical and mental characteristics on the one hand and various causes of observational error on the other.

To describe how (Ident) and (Dual) make contact with these observations, we must supplement each of these theories with a model of observational error. A reasonably general model of error will contain four parameters:

(Error)  Pr(Subject says "Ouch" | Pain) = e1
         Pr(Subject does not say "Ouch" | No Pain) = e2
         Pr(Meter says "c-fibres firing" | c-fibres firing) = e3
         Pr(Meter says "no c-fibres firing" | No c-fibres firing) = e4
If we assume that the way "ouches" indicate pain states is probabilistically independent of the way meter readings indicate c-fibre firings, then we can state the probability of the conjoint observations (±Ouch & ±Meter says "c-fibres firing"), conditional on the underlying conjoint states (±Pain & ±c-fibres are firing), as products of the above probabilities and their complements.16
In the Akaike framework, error probabilities are either specified in advance of the experiment or are estimated from the data, in which case they represent further adjustable parameters in the models under evaluation. In the present experiment, only the former option is viable. The observations given in the previous 2x2 table tell us the four frequencies of [±Ouch & ±Meter says "c-fibres firing"]. These must sum to unity. This means that there are three independent frequencies that we know from the observations. If we treat the parameters in (Error) as unknowns that must be estimated from the data, then neither (Ident) nor (Dual) is identifiable. (Ident & Error) contains five adjustable parameters, whereas (Dual & Error) contains seven. There are infinitely many assignments of values to these parameters that will maximize the probabilities of the observations.
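
The failure of identifiability can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the function name ident_error_probs and the numerical values are hypothetical, and the sketch exhibits just one pair of distinct parameter assignments that yield identical observation probabilities under (Ident & Error), not the full continuum of such assignments. The independence of the two indicators, conditional on the underlying state, is the assumption stated in the text.

```python
from itertools import product


def ident_error_probs(p, e1, e2, e3, e4):
    """Observation probabilities under (Ident) combined with (Error).

    p is Pr(P & M); e1..e4 are the four (Error) parameters. Returns
    Pr(ouch, meter) for each of the four joint observations, assuming the
    two indicators are independent given the underlying state.
    """
    probs = {}
    for ouch, meter in product((True, False), repeat=2):
        # Underlying state: pain and c-fibres firing.
        given_pain = (e1 if ouch else 1 - e1) * (e3 if meter else 1 - e3)
        # Underlying state: no pain and no c-fibres firing.
        given_no_pain = ((1 - e2) if ouch else e2) * ((1 - e4) if meter else e4)
        probs[(ouch, meter)] = p * given_pain + (1 - p) * given_no_pain
    return probs


first = ident_error_probs(p=0.3, e1=0.9, e2=0.8, e3=0.95, e4=0.85)

# A different assignment: the roles of the two underlying states are swapped.
second = ident_error_probs(p=0.7, e1=0.2, e2=0.1, e3=0.15, e4=0.05)

same = all(abs(first[k] - second[k]) < 1e-12 for k in first)
print(same)  # True
```

Because both assignments predict the same four observation frequencies, no amount of data from this experiment could favor one over the other, which is the sense in which the five parameters cannot all be estimated from three independent frequencies.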
If this were all one could say about the occurrence of error in the experiment, then the Akaike framework would not underwrite the parsimony argument in favor of the identity theory that Smart, Brandt, and Kim put forward.

However, it is not absurd to imagine that we might possess independent evidence that allows us to assign values to the parameters in (Error). Perhaps we have observed in other settings how often pains are associated with utterances of "ouch"; it is similarly quite possible that we should have neurological evidence concerning how sensitive the meter is to c-fibre firings. If so, we can assign values to the parameters in (Error), and so this model of error will contain no adjustable parameters. The observed conjoint frequencies of "ouches" and meter readings now can be used to find the maximum likelihood estimates of the parameters in (Ident) and (Dual).
Because (Dual) contains more parameters than (Ident), the former model will fit the data at least as well as the latter. (Dual) will almost always do better; however, for some specific arrays of empirical frequencies, the two models will tie. (Dual) tends to beat (Ident) when it comes to goodness-of-fit, just as (PAR) tends to beat (LIN).

In any event, the greater goodness-of-fit of (Dual) is just one of the factors that is relevant in the Akaike framework. The other is simplicity as measured by the number of adjustable parameters. It is here that (Ident) has an advantage over (Dual), since the former has one adjustable parameter while the latter has three. If the data are such that the two models fit the data about equally well, we should prefer (Ident) over (Dual) on grounds of simplicity, just as Smart, Brandt, and Kim maintain.

This would be the end of the story if the identity theory and dualism did nothing more than make claims about how mental and physical properties co-occur. But there is more to both theories than this. I next want to consider what the identity theory and dualism say about the explanation and causation of behavior. Dualists often maintain not just that some organisms have irreducibly mental properties, but that those properties explain and cause behavior in a way that the purely physical properties of organisms cannot. Even if the physical characteristics of an organism were fully specified, this would not provide a complete account of why the organism behaves as it does. Parallel remarks pertain to the issue of prediction; for the dualist, the physical traits of an organism provide, at best, an incomplete basis on which to predict what the organism will do. To be sure, it is possible to formulate dualism in such a way that mental properties have no causal efficacy or explanatory power of their own. But historically, a good part of the interest in dualism has centered on its claim concerning the irreducible causal and explanatory importance of the mental.17

If we understand dualism in this way, what are we to make of the identity theory? If a mental and a physical property are identical, then the causal efficacy of the one just is the causal efficacy of the other. I see nothing to fault in the idea that names for the same property are intersubstitutable in causal contexts, salva veritate. Explanation is perhaps a less straightforward matter, since explanation has to do with gains in understanding, and so may involve subjective elements.18 In any case, I'll set to one side the question of what the identity theory should say about explanation, and concentrate on what it says about causation.
Before addressing the identity theory and dualism directly, I want to make a general point: the Akaike framework explains why paucity of postulated causes enhances a theory's estimated predictive accuracy (Forster and Sober 1994). To see why, let's consider a simple example. Suppose B is a dichotomous effect and P and M are two putative dichotomous causes. The first model says that P is the only cause that makes a difference in whether B occurs. The second model says that both P and M may be causally relevant:

(One Cause)   Pr(B | P & M) = Pr(B | P & -M) = Pr(B | P) = a
              Pr(B | -P & M) = Pr(B | -P & -M) = Pr(B | -P) = c

(Two Causes)  Pr(B | P & M) = a
              Pr(B | P & -M) = b
              Pr(B | -P & M) = c
              Pr(B | -P & -M) = d

Notice that (One Cause) has two adjustable parameters (a, c) while (Two Causes) has four (a, b, c, d).

To choose between these two families of models, one might run an experiment in which a large number of individuals are placed in each of four "treatment cells". The results of the experiment could be displayed in a 2-by-2 table in which cell entries indicate the fraction of individuals in each treatment who exhibit the effect B:
If we use the empirical frequencies (w, x, y, z) to estimate the parameters in the two models, it must emerge that the likeliest member of (Two Causes) will fit the data at least as well as the likeliest member of (One Cause). The two models will show the same degree of goodness-of-fit only when w = y and x = z; if the four frequencies differ even slightly, (Two Causes) will fit the data better.
The Akaike framework tells us how to evaluate these two families of models, i.e., how to bring together the conflicting considerations of simplicity and goodness-of-fit. If w and y are close, as are x and z, then the slightly better goodness-of-fit of (Two Causes) will not compensate for the fact that this model is less parsimonious. In this case, we choose the simpler model because it has a higher estimated predictive accuracy. However, if w, x, y and z are all very different, then we should sacrifice parsimony and prefer the more complex model.19
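
The trade-off just described can be sketched in Python. Several things here are assumptions made for illustration: the function names, the cell counts, the equal cell size of 100, and the use of the familiar form AIC = 2k - 2 ln L (lower is better), which may differ in presentation from the formulation used elsewhere in this paper. Each treatment cell is modeled as a binomial sample; (Two Causes) gives each cell its own probability, while (One Cause) pools the two cells that share a value of P.

```python
import math


def binom_loglik(successes, trials, prob):
    """Binomial log-likelihood, dropping the constant binomial coefficient."""
    return successes * math.log(prob) + (trials - successes) * math.log(1 - prob)


def aic_scores(counts, n):
    """AIC for (One Cause) and (Two Causes).

    counts maps (P, M) truth values to the number of individuals, out of n
    per cell, who exhibit B. Counts must lie strictly between 0 and n.
    """
    # (Two Causes): four parameters, each estimated by its own cell frequency.
    ll_two = sum(binom_loglik(k, n, k / n) for k in counts.values())

    # (One Cause): two parameters, each estimated by pooling over M.
    ll_one = 0.0
    for p in (True, False):
        pooled = (counts[(p, True)] + counts[(p, False)]) / (2 * n)
        ll_one += sum(binom_loglik(counts[(p, m)], n, pooled) for m in (True, False))

    return {"one": 2 * 2 - 2 * ll_one, "two": 2 * 4 - 2 * ll_two}


# Hypothetical data: frequencies barely depend on M.
close = aic_scores({(True, True): 62, (True, False): 60,
                    (False, True): 21, (False, False): 19}, n=100)

# Hypothetical data: frequencies depend strongly on M.
far = aic_scores({(True, True): 90, (True, False): 40,
                  (False, True): 60, (False, False): 10}, n=100)

print(min(close, key=close.get))  # one
print(min(far, key=far.get))      # two
```

In the first data set the small gain in fit from the two extra parameters does not repay their cost, so the simpler family scores better; in the second the gain in fit is large and the more complex family wins.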
How does this comparison of (One Cause) and (Two Causes) apply to the identity theory and dualism? Half this problem is straightforward; dualism's claim about the possible efficacies of mental and physical properties is captured by (Two Causes). Representing the identity theory, however, requires that we fine-tune (One Cause) a bit. The identity theory entails that Pr(B | P & -M) and Pr(B | -P & M) are both not defined, because the hypothesis says that Pr(P & -M) = Pr(-P & M) = 0. This means that the identity theory's claim about the causal efficacies of P and M is not perspicuously represented by (One Cause), but by the following family:

(Ident-2)  Pr(B | P & M) = Pr(B | P) = Pr(B | M) = a,
           Pr(B | -P & -M) = Pr(B | -P) = Pr(B | -M) = d,
           and Pr(B | P & -M) and Pr(B | -P & M) are both not defined.

Note that (Ident-2) has two adjustable parameters whereas (Two Causes) has four.
I so far have provided two treatments of dualism and the identity theory. The first has them postulating different models of the probabilities with which mental and physical properties tend to occur; the second has them endorsing different models of how mental and physical properties confer probabilities on behavior. These separate treatments may be combined into a single formulation in which each theory describes both the probabilities of causes and their efficacies. The identity theory endorses a model with three adjustable parameters: two representing the efficacies of a single mental/physical cause [Pr(B | P & M), Pr(B | -P & -M)], and one representing the probability of that cause [Pr(P & M)]. Dualism, on the other hand, puts forward a model with seven independent parameters: four for the efficacies of different combinations of causes [Pr(B | ±P & ±M)] and three for the probabilities with which these combinations occur. When these models fit the data about equally well, the Akaike framework favors the model associated with the identity theory.20

Let us now return to the arguments advanced by Smart (1959) and Brandt and Kim (1967). They agree that parsimony is the main advantage that the identity theory can claim over dualism, but they disagree about the significance that this consideration should be assigned. Smart (pp. 155-6) says that the choice between the identity theory and dualism resembles the choice between evolutionary theory, with its postulate of an ancient earth in which successive layers of fossils are gradually deposited, and the hypothesis "that the universe just began in 4004 BC ... with sediment in the rivers, eroded cliffs, fossils in the rocks, and so on". Brandt and Kim (p. 533) agree that if the identity theory and dualism were related in this way, then "there would be no question whether a rational person must accept" the identity theory. But they deny that this is so. Brandt and Kim suggest that the identity theory doesn't explain observations in the same sense that evolutionary theory and other scientific theories do. They conclude (pp. 533-4) that Smart's analogy between the identity theory and the theory of evolution "is pernicious in that it lends, or at least tends to lend, a false air of scientific respectability to what is essentially a philosophical and speculative interpretation" of perfect correlations between the mental and the physical.
A natural interpretation of Smart's argument is that he denies Reichenbach's thesis; he thinks that parsimony can be a reason for assigning different truth values even when theories are predictively equivalent. Interpreting Brandt and Kim is a little less straightforward. If they regard parsimony as having probative force in the mind/body problem, though of an attenuated sort, then they also are flying in the face of Reichenbach's thesis, though perhaps with a bit less bravado than Smart displays. On the other hand, if their remarks about the "speculative", "metaphysical", and "philosophical" character of this problem are understood as marking an endorsement of Reichenbach's position, then their point about the greater parsimony of the identity theory provides no reason at all for thinking that the identity theory is true and dualism is false.

It should be clear that the analysis of the dispute between the identity theory and dualism that I have offered goes contrary to what Smart, Brandt, and Kim claim; their disagreement assumes that the identity theory and dualism make the same predictions about all possible observations. They then try to assess how strong a reason parsimony provides for choosing one theory over the other. In contrast, the Akaike framework tells us that if observations are error free, simplicity plays no role in estimating a theory's predictive accuracy. I have argued that if observational error is possible, the identity theory and dualism are not predictively equivalent, and that the identity theory's simplicity is a feature in its favor.
6. CONCLUSION

I have defended quite different assessments of the two main examples discussed in this paper. Shoemaker's discussion of time without change and the Smart, Brandt and Kim discussion of the mind/body problem both are supposed to present a pair of theories that are predictively equivalent; in both cases, parsimony is cited as a reason for preferring one theory over the other. I have not contested Shoemaker's claim that his two theories are predictively equivalent, but I have argued that parsimony, understood in terms of the Akaike framework, provides no reason to prefer one theory over the other. In contrast, I have tried to show how the parsimony argument for the identity theory can be justified within the Akaike framework, but I have argued that the two theories are not predictively equivalent.

Besides discussing these examples, I have advanced two more general epistemological conclusions; however, one of these is more compelling than the other. The first is that the rationale for using simplicity in the curve fitting problem does not carry over as a justification for using simplicity to discriminate between predictively equivalent theories. The second conclusion is that a difference in simplicity between predictively equivalent theories counts as a merely aesthetic or pragmatic consideration; it is not a ground for thinking that one theory is true and the other is false. I suppose it is possible to accept the first of these conclusions but not the second. I see no reason to do this, but those inclined to extract this lesson should do so with their eyes open. Vague pronouncements about the scientific value of simplicity, parsimony, unification, and explanatory power are insufficient.

The idea that scientific questions and philosophical questions are different in kind has taken a real beating over the last forty years or so. Quine's whole philosophy was devoted to undermining the distinction. Our web of belief is to be judged holistically by the twin criteria of conceptual economy and empirical adequacy (Quine 1953); to think that scientific propositions answer to one set of standards while philosophical propositions answer to another is to be trapped by an untenable dualism. Kuhn's (1970) work in the history of science has defended a similar picture. For Kuhn, science is unavoidably saturated with philosophical presuppositions; perhaps the principal impulse in Kuhn's view of science has been to downplay the idea that theory change is or can be driven by unbiased observation. For example, as noted earlier, Kuhn claimed that the choice of Copernican over Ptolemaic astronomy was based on aesthetic preference and nothing else. Scientists prefer fewer epicycles just as philosophers have a taste for desert landscapes. The principle of parsimony thus appears to be univocal; it seems to play a fundamental role in both science and philosophy.
This blurring of the boundary between science and philosophy has largely replaced the crisp contrasts posited by empiricism and positivism, but for reasons that bear rethinking. In "Empiricism, Semantics, and Ontology", Carnap distinguished internal and external questions. In Experience and Prediction, Reichenbach distinguished two types of simplicity. Carnap and Reichenbach drew these distinctions in order to defend a thesis that is essentially epistemological: that there is no way to tell which of two theories is true if the theories are predictively equivalent. However, in both cases, this epistemological point was anchored to faulty ideas in the philosophy of language. Carnap thought that internal and external questions can be separated by a syntactic test; Reichenbach, that predictively equivalent theories are in fact synonymous. It is a point of the first importance that the epistemological claim can be detached from the accompanying linguistic formulations.
The Akaike framework resurrects some epistemological ideas that are often viewed as elements of a positivism that is best forgotten. To use the Akaike framework, it is essential to distinguish parameters in a model that can be estimated from observations and parameters that cannot. In addition, the fundamental goal is to discover models that are predictively accurate; this differs from the goal of finding models that are true. As noted before, it is quite possible for (LIN) to be more predictively accurate than (PAR). However, this cannot be interpreted as evidence that (LIN) is true and (PAR) is false; after all, (LIN) entails (PAR). Although (PAR) must have a higher probability of being true than (LIN) does, no matter what the observations say, (LIN) can nonetheless have a higher degree of predictive accuracy. Akaike's framework helps make sense of the pervasive scientific goal of finding simplified models that are empirically useful; these models are known to be false, but that does not bar them from making reasonably accurate predictions.
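The gap between goodness of fit and estimated predictive accuracy can be illustrated numerically. The following Python sketch is only an illustration under stated assumptions: it takes (LIN) to be the family of straight lines and (PAR) the family of parabolas, assumes a known error variance `sigma2`, and scores each family by SOS + 2k·sigma2, the form that Akaike's criterion takes for Gaussian errors with known variance (lower is better). The function names and the simulated data are hypothetical and are not taken from the paper.

```python
import random

def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (pure Python)."""
    m = degree + 1
    # Normal equations A c = b with A[r][c] = sum x^(r+c), b[r] = sum y * x^r.
    A = [[sum(x ** (r + c) for x in xs) for c in range(m)] for r in range(m)]
    b = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution; coeffs[p] multiplies x^p.
    coeffs = [0.0] * m
    for r in reversed(range(m)):
        s = sum(A[r][c] * coeffs[c] for c in range(r + 1, m))
        coeffs[r] = (b[r] - s) / A[r][r]
    return coeffs

def sos(xs, ys, coeffs):
    """Sum of squared residuals of the fitted curve."""
    return sum((y - sum(a * x ** p for p, a in enumerate(coeffs))) ** 2
               for x, y in zip(xs, ys))

def akaike_score(xs, ys, degree, sigma2):
    """SOS of the best-fitting curve plus a penalty of 2 * k * sigma2 (assumed form)."""
    k = degree + 1  # number of adjustable parameters in the family
    return sos(xs, ys, fit_poly(xs, ys, degree)) + 2 * k * sigma2

# Hypothetical data: a straight line plus Gaussian noise of variance 1.
rng = random.Random(0)
xs = [float(i) for i in range(10)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
for name, degree in (("LIN", 1), ("PAR", 2)):
    fit = sos(xs, ys, fit_poly(xs, ys, degree))
    print(name, "SOS =", round(fit, 3), "score =", round(akaike_score(xs, ys, degree, 1.0), 3))
```

Because the families are nested, the best-fitting parabola never has a larger SOS than the best-fitting line; whether (PAR) also gets the better score depends on whether its gain in fit exceeds the extra penalty of 2·sigma2 for its additional parameter.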
Although these elements in the Akaike approach do not accord well with much that has been said in favor of scientific realism, the approach is equally at odds with some forms of empiricism. There is no distinction between theories that are strictly about observables and theories that are not. For Akaike, evidence comes from observation, but the content expressed by theories may concern matters that are not directly observable.
Carnap and Reichenbach thought that their epistemology should have an important impact on the conduct of philosophy. Philosophy, they believed, was mired in pseudo-problems; epistemological critique would allow us to see these problems for what they are. The subject of the present paper has been more limited, and my conclusion is accordingly more modest. Parsimony and simplicity considerations, when understood in terms of the Akaike framework, do not license the parsimony and simplicity arguments that philosophers have sometimes constructed.

It isn't that philosophical uses of parsimony are always misguided; as we have seen, the identity theory can be defended over dualism in a way that fits fairly well into the Akaike framework. However, other appeals to parsimony are more alien to the Akaike outlook. Perhaps philosophy and science are more dissimilar than a shared vocabulary sometimes suggests.
NOTES
*
I am grateful to Martin Barrett, Ellery Eells, Malcolm Forster, Gregory Mougin, and Denis Walsh for comments on earlier drafts. My thanks also to members of the philosophy departments at London School of Economics and Wayne State University for useful discussion.
1
Examples would include the constructive empiricism of Van Fraassen (1980), the contrastive empiricism of Sober (1990a, 1993a, 1993b), and the gentle empiricism of Earman (1993).
2
I argue this point in connection with maximum likelihood estimation in Sober (1988a,
1988b).
3
In Sober (1990b), I discuss examples in which differences in simplicity or parsimony reflect differences in likelihood or differences in prior probabilities. From a Bayesian point of view, these considerations must exhaust the relevance of simplicity. The approach to the curve-fitting problem I'll outline in the next section fits neatly into neither of these two Bayesian formats.
4
We assume here that the old and new data sets contain the same number of observations, so that what is defined above is predictive accuracy with respect to data sets of size n. Without this restriction, the definition should be given in terms of predictive accuracy per datum.
5
Akaike's theorem also reflects the fact that the amount of data is relevant to deciding how much weight simplicity deserves. If there is a very slight parabolic bend in the data, it may make sense to favor (LIN) when the data set is relatively small; however, if the data set is quite large, (PAR) may be preferable. Since a curve's SOS almost inevitably goes up as the number of data points is increased, this means that the Akaike estimate of predictive inaccuracy is determined more and more by goodness-of-fit considerations, and less and less by simplicity, as the data set increases in size.
6
To understand the idea of an unbiased estimate, consider an object that weighs n ounces. A spring balance provides an unbiased estimate of the object's weight if repeatedly weighing the object would yield an average reading of n ounces; individual measurements may of course deviate from that average value.
7
Shoemaker does not explore the question of what the physical laws are that permit the events he goes on to describe. Whether these events are consistent with current physics is an interesting question, but it is not the only way to think about the problem at hand. In effect, Shoemaker's question is whether a universal freeze is epistemically possible, not whether it is physically possible. Shoemaker therefore sets current physics to one side and asks us to consider a set of observations and how we would theorize about them.
8
Scientists often prefer to express their generalizations in a format that doesn't depend essentially on a unit of measure. The fact that X's first freeze happened to occur in a year to which the calendar of observed years assigns the number 3 is besides the point. We might prefer to consider "coordinate free" representations of the two hypotheses:
(UF') For all augmented years a,
If planet X freezes in year a, then it freezes in year a + 3,
If planet Y freezes in year a, then it freezes in year a + 4,
If planet Z freezes in year a, then it freezes in year a + 5.
(NUF') For all observed years o,
If planet X freezes in year o, then it freezes in year o + 3, unless o is the 19th in a series of freezes every 3 years, in which case X next freezes in year o + 5.
If planet Y freezes in year o, then it freezes in year o + 4, unless o is the 14th in a series of freezes every 4 years, in which case Y next freezes in year o + 7.
If planet Z freezes in year o, then it freezes in year o + 5, unless o is the 11th in a series of freezes every 5 years, in which case Z next freezes in year o + 9.
The conclusion I'll reach concerning (UF) and (NUF) would not be affected if these formulations were used instead.
9
I assume that equivalent hypotheses cannot differ in their epistemic status. This is an assumption common to all confirmation theories in which probability is the basic concept.
10
I omit here discussion of the important "error theorem" discussed in Forster and Sober (1994). Estimating the predictive accuracy of lower dimensional families is often more subject to error than estimating the accuracy of families with higher dimensionality.
11
In setting up this problem, there is no reason to assume in advance that the periodicities must be integer valued. So let's allow that they may have up to, say, a hundred decimal places.
12
In describing Shoemaker's example, I neglected to say anything about whether observations are subject to error. It would be easy to introduce this idea; just assume that inhabitants of one planet can make mistakes in observing what happens on others. However, the conclusion that the two families have the same number of parameters makes this detail superfluous. Regardless of whether there is error or not, the two families must have the same Akaike estimate of predictive accuracy.
13
Of course, it is easy enough to associate (UF) with a family that has more adjustable parameters than the one associated with (NUF). What I doubt is that there is any reason to associate (UF) with a family that has fewer parameters than the one we should associate with (NUF).
14
One possibility is that our intuitions concerning (UF) and (NUF) are influenced by thinking about a quite different inference problem. Let us forget about the epistemological consequences of local and universal freezes and imagine that inhabitants of the planets see that various planets have property "F" (e.g., the occurrence of snow storms) periodically in years 1-59. Given the data presented in (1), how should these people choose between the following hypotheses?
(H1) For all years i,
Planet X has F in year i iff mod(i/3) = 0,
Planet Y has F in year i iff mod(i/4) = 0,
Planet Z has F in year i iff mod(i/5) = 0.
(H2) For all years i,
Planet X has F in year i iff mod(i/3) = 0 and mod(i/60) ≠ 0,
Planet Y has F in year i iff mod(i/4) = 0 and mod(i/60) ≠ 0,
Planet Z has F in year i iff mod(i/5) = 0 and mod(i/60) ≠ 0.
Notice that neither of these hypotheses mentions the calendar of augmented years. Presumably, the "years" they talk about concern what is recorded in the observed calendar. If so, (H1) and (H2) are not predictively equivalent; they disagree about what will happen in year 60. And as data set (2) shows, both hypotheses are about to make false predictions. (H1) falsely predicts a universal F in the observed year 60; (H2) falsely predicts that the three planets will have F in the observed years 63, 64, and 65, respectively. We should not confuse Shoemaker's problem with the problem of choosing between (H1) and (H2).
15
I am grateful to Denis Walsh for drawing my attention to this question.
16
It is a substantive assumption that ouches and meter readings are probabilistically independent, conditional on the underlying state of pains and c-fibre firings. A model of error is possible in which this is not assumed, and it will contain more parameters than (Error) does. Adopting this more complex model would not affect the points I'll argue for in what follows.
17
Although this view about causal efficacy captures something of the spirit of Cartesian dualism, it does not correspond to the type of property dualism defended, for example, by Kim (1984). Kim holds that the mental and physical properties of a person are distinct, that each is a cause of behavior, but that mental properties do not make an irreducible contribution. M is a cause of B, even though M makes no difference to the occurrence of B, once one holds fixed the physical properties P that the individual possesses.
18
In this connection, it is worth considering Enc's (1986) idea that theoretical identity statements often involve an explanatory asymmetry. Being made of H2O explains why something is made of water, but not conversely. Having one's c-fibers fire explains why one is in pain, but not conversely. And so on.
19
This decision will depend, not just on the spread among the four empirical frequencies, but on the amount of data, as pointed out in footnote 5.
20
The same style of argument can be developed to permit dualism to be compared with an anti-reductive physicalism in which the mental supervenes on the physical (Fodor 1975; Kim 1984). For example, suppose that a mental property M is "multiply realizable" by different physical properties P1, P2, ..., Pn; in the biological species S1, organisms have M by having physical property P1, in species S2, organisms have M by having P2, and so on. In this case, a pair of "local" models may be specified for each biological species - one of them dualistic, the other physicalistic. Global anti-reductive physicalism is thus represented as a set of local reductive identity theories.
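The calendar arithmetic in notes 8 and 14 can be checked mechanically. The Python sketch below is an illustration only, under assumptions the notes do not spell out: X, Y and Z have their periodic events every 3, 4 and 5 augmented years, and the observed calendar omits exactly one year, augmented year 60. All names in it (`to_augmented`, `observed_events`, `first_irregular_gap`, `first_false_positive`) are hypothetical.

```python
# Assumed setup: periods in augmented years, and one unobserved augmented year.
PERIODS = {"X": 3, "Y": 4, "Z": 5}
SKIPPED = 60  # assumption: the only augmented year missing from the observed calendar

def to_augmented(o):
    """Map an observed year to the augmented year it corresponds to."""
    return o if o < SKIPPED else o + 1

def observed_events(planet, last_observed=118):
    """Observed years in which the planet is seen to freeze (or to have F)."""
    return [o for o in range(1, last_observed + 1)
            if to_augmented(o) % PERIODS[planet] == 0]

def first_irregular_gap(planet):
    """Return (n, gap): the n-th observed event is followed by an unusual gap."""
    years = observed_events(planet)
    for n, (o, nxt) in enumerate(zip(years, years[1:]), start=1):
        if nxt - o != PERIODS[planet]:
            return n, nxt - o
    return None

def h1(planet, i):
    """(H1): F occurs in year i iff i is a multiple of the planet's period."""
    return i % PERIODS[planet] == 0

def h2(planet, i):
    """(H2): as (H1), except that no F occurs in multiples of 60."""
    return i % PERIODS[planet] == 0 and i % 60 != 0

def first_false_positive(hypothesis, planet, last_observed=118):
    """First observed year in which the hypothesis predicts F but none is seen."""
    seen = set(observed_events(planet, last_observed))
    for i in range(1, last_observed + 1):
        if hypothesis(planet, i) and i not in seen:
            return i
    return None

for p in PERIODS:
    print(p, first_irregular_gap(p),
          first_false_positive(h1, p), first_false_positive(h2, p))
```

Under these assumptions the irregular gaps come out as the 19th event followed by a gap of 5 for X, the 14th followed by 7 for Y, and the 11th followed by 9 for Z, matching (NUF'). The first falsely predicted F is observed year 60 for (H1) on every planet, and observed years 63, 64 and 65 for (H2) on X, Y and Z, as note 14 says.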
REFERENCES
Akaike, H.: 1973, 'Information Theory and an Extension of the Maximum Likelihood Principle', in B. Petrov and F. Csaki (eds.), Second International Symposium on Information Theory, Akademiai Kiado, Budapest, pp. 267-281.
Brandt, R. and J. Kim: 1967, 'The Logic of the Identity Theory', Journal of Philosophy 64, 515-537.
Carnap, R.: 1950, 'Empiricism, Semantics, and Ontology', Revue Internationale de Philosophie 4, 20-40. Reprinted in Meaning and Necessity (University of Chicago Press, Chicago, 1956).
Chaitin, G.: 1975, 'Randomness and Mathematical Proof', Scientific American 232, 47-52.
Earman, J.: 1993, 'Underdetermination, Realism, and Reason', in H. Wettstein (ed.), Midwest Studies in Philosophy, University of Notre Dame Press, Notre Dame, pp. 19-38.
Enc, B.: 1986, 'Essentialism without Individual Essences: Causation, Kinds, Supervenience, and Restricted Identities', in P. French et al. (eds.), Midwest Studies in Philosophy 11, 403-427.
Fisher, R.: 1925, Statistical Methods for Research Workers, Oliver and Boyd, Edinburgh.
Fodor, J.: 1975, The Language of Thought, Thomas Crowell, New York.
Forster, M.: 1988, 'Unification, Explanation, and the Composition of Causes in Newtonian Mechanics', Studies in the History and Philosophy of Science, 55-101.
Forster, M. and Sober, E.: 1994, 'How to Tell When Simpler, More Unified, or Less ad hoc Theories Will Provide More Accurate Predictions', British Journal for the Philosophy of Science 45, 1-36.
Hacking, I.: 1965, 'Salmon's Vindication', Philosophy of Science 32, 269-271.
Kim, J.: 1984, 'Epiphenomenal and Supervenient Causation', in P. French, T. Uehling and H. Wettstein (eds.), Midwest Studies in Philosophy, vol. 9, University of Minnesota Press, Minneapolis, pp. 257-270.
Kuhn, T.: 1957, The Copernican Revolution, Harvard University Press, Cambridge, Mass.
Kuhn, T.: 1970, The Structure of Scientific Revolutions, University of Chicago Press, Chicago.
Putnam, H.: 1975, 'Mathematics, Matter, and Method', Philosophical Papers, volume I, Cambridge University Press, Cambridge.
Quine, W.: 1953, 'Two Dogmas of Empiricism', in From a Logical Point of View, Harvard University Press, Cambridge, Mass., pp. 20-46.
Reichenbach, H.: 1938, Experience and Prediction, University of Chicago Press, Chicago.
Sakamoto, Y., M. Ishiguro, and G. Kitagawa: 1986, Akaike Information Criterion Statistics, Kluwer Publishers, Dordrecht.
Shoemaker, S.: 1969, 'Time Without Change', Journal of Philosophy 66, 363-381. Reprinted in Identity, Cause, and Mind, Cambridge University Press, Cambridge, 1989.
Smart, J.: 1959, 'Sensations and Brain Processes', Philosophical Review 68, 141-156.
Sober, E.: 1988a, 'Likelihood and Convergence', Philosophy of Science 55, 228-237.
Sober, E.: 1988b, Reconstructing the Past: Parsimony, Evolution, and Inference, MIT Press, Cambridge, Mass.
Sober, E.: 1990a, 'Contrastive Empiricism', in W. Savage (ed.), Scientific Theories, University of Minnesota Press, Minneapolis, pp. 392-412. Reprinted in E. Sober, From a Biological Point of View, Cambridge University Press, Cambridge, 1994.
Sober, E.: 1990b, 'Let's Razor Ockham's Razor', in D. Knowles (ed.), Explanation and Its Limits, Cambridge University Press, Cambridge, pp. 73-94. Reprinted in E. Sober, From a Biological Point of View, Cambridge University Press, Cambridge, 1994.
Sober, E.: 1993a, 'Epistemology for Empiricists', in H. Wettstein (ed.), Midwest Studies in Philosophy, University of Notre Dame Press, Notre Dame, pp. 39-61.
Sober, E.: 1993b, Philosophy of Biology, Westview Press, Boulder, Colorado.
Sober, E.: 1993c, 'Mathematics and Indispensability', Philosophical Review 102, 35-58.
Van Fraassen, B.: 1980, The Scientific Image, Oxford University Press, Oxford.

Department of Philosophy
University of Wisconsin, Madison
5185 Helen C. White Hall
600 North Park Street
Madison, WI 53706-1475
U.S.A.

Manuscript submitted February 13, 1995
Final version received September 26, 1995
