ELLIOTT SOBER

PARSIMONY AND PREDICTIVE EQUIVALENCE

ABSTRACT. If a parsimony criterion may be used to choose between theories that make different predictions, may the same criterion be used to choose between theories that are predictively equivalent? The work of the statistician H. Akaike (1973) is discussed in connection with this question. The results are applied to two examples in which parsimony has been invoked to choose between philosophical theories: Shoemaker's (1969) discussion of the possibility of time without change and the discussion by Smart (1959) and Brandt and Kim (1967) of mind/body dualism and the identity theory.

Ockham's razor, the principle of parsimony, has been invoked to help solve scientific problems; it also has been used to evaluate philosophical theories. How are these two applications of the principle related? This contrast between scientific and philosophical arguments is related (perhaps imperfectly) to a second distinction; sometimes the principle of parsimony is used to choose between theories that make different predictions; sometimes it is used to discriminate between theories that are predictively equivalent. Does the rationale for one use of the principle provide a rationale for the other?

Erkenntnis 44: 167-197, 1996.
© 1996 Kluwer Academic Publishers. Printed in the Netherlands.

1. REICHENBACH'S THESIS

In Experience and Prediction, Hans Reichenbach argues that a difference in simplicity among competing hypotheses can have two quite different sorts of significance. When two theories fit the available data equally well but make different predictions about what new data will look like, the theories may differ in their inductive simplicity; if so, this difference counts as a reason to think that the simpler theory is more plausible. However, when two theories are predictively equivalent, agreeing not just about the extant data but about all possible observations, they can differ only in their descriptive simplicity. In this instance, the difference in simplicity is merely aesthetic or pragmatic; here it would be wrong to think that a difference in simplicity is a basis on which to attribute different truth values.

Reichenbach's discussion anchors this distinction to a view of theories that few would now accept. He says that theories that are predictively equivalent are "logically equivalent"; they differ only verbally, not in the substance of what they say. Reichenbach thinks that predictively equivalent theories are related to each other as different systems of measurement are related. Even if the metric system is "simpler" than the system of inches and feet, it would be foolish to conclude that "the box is 25.4 centimeters wide" and "the box is 10 inches wide" could differ in their plausibility.
Reichenbach says that he wrote Experience and Prediction as a critique of logical positivism. Yet, this pronouncement about predictive equivalence is echt positivism. Those of us who think that predictively equivalent claims can be incompatible with each other will want to set this idea to one side. But even if we do so, Reichenbach's claim about two types of simplicity remains. The question, of course, is not whether the term "simplicity" is ambiguous. Surely the word "simplicity" is nothing like the word "bank". Rather, what I'll call Reichenbach's thesis asserts that a difference in simplicity is grounds for assigning different truth values in one circumstance, but not in the other.
To evaluate this thesis, we first would have to understand what justifies the use of a simplicity criterion in the case of predictively non-equivalent theories. We then would have to determine whether that rationale transfers to the case of theories that are predictively equivalent. Those inclined to view the "principle of simplicity" as a sui generis constituent of "rationality" may think this problem has an easy solution. But if Russell's remark about the advantages of theft over honest toil applies here, the conclusion to draw is that this approach short-circuits the problem; it does not solve it.

Another approach to Reichenbach's thesis also misses the mark, but in a way that is more subtle. A number of epistemological positions, which might loosely be termed "empiricist", end up denying that objective reasons can be given for choosing between empirically equivalent theories.1 Even if an epistemology of this sort turns out to be plausible, it may or may not deliver a full assessment of Reichenbach's thesis. For what is wanted is not just a verdict on the role of simplicity considerations when theories are empirically equivalent, but an understanding of whether and why simplicity is relevant when theories are not empirically equivalent. Only then will we be able to see if an inferential principle that makes good sense in one circumstance leads to nonsense in another.

Reichenbach faced head on the problem of justifying his thesis about the principle of simplicity. His idea was that repeated applications of simplicity on data sets of ever increasing size will eventually converge to the truth, if there is a truth on which inference could converge. Reichenbach illustrated this general line of argument by discussing the curve-fitting problem; the accompanying figure comes from page 375 of Experience and Prediction.

Figure 1. Reichenbach's (1938) illustration of the curve-fitting problem.

Imagine a set of data points; the goal is to choose the best curve. Many curves pass through the data points; some are smooth while others are bumpy. Reichenbach argues that the simple curve depicted in this figure is preferable over the curve that is more complex. By simplicity, Reichenbach meant a curve obtained by connecting the data points by straight lines and then smoothing the curve to remove discontinuities. Reichenbach says that simplicity should guide our choice of curve because this method will lead to the true curve in the infinite limit. As we examine more and more data points, in each instance selecting the simplest curve that perfectly fits the data, we eventually will recover the truth (if truth there be).
The standard objection to this justification is that other principles besides the principle of simplicity converge on the truth in the infinite limit. A procedure that introduces crazy bumps into the curves it postulates, but which diminishes their magnitude as the size of the data grows, will agree with the principle of simplicity when the data set is infinite. However, for finite data sets, a procedure of this sort will disagree with the principle of simplicity as to which curve is best. In short, convergence in the limit is not a sufficient condition for justifying the principle (Hacking 1965).
There is a second and less familiar objection to Reichenbach's argument. It is possible to show that quite sensible inference procedures sometimes violate the requirement of convergence in the limit.2 A method is convergent only if the method is certain to yield the truth when applied to an infinite data set. If we abandon the demand for certainty in the face of finite data, why should we impose it in the hypothetical circumstance in which the data are infinite? In my view, convergence is not a necessary condition for a principle of inference to be a reasonable one to use.
There is a third objection that bears mentioning. The two curves that Reichenbach considered in the accompanying figure both pass through the data points exactly. However, in the real world, observation is always subject to error. Scientists recognize this; they compute a curve's sum-of-squares (SOS). For each data point (x, y_d), measure how far it is from the point (x, y_c) on the curve; then square this distance and sum the squared distances for the entire data set. The fact that a curve's SOS value is greater than zero hardly disqualifies it from scientific consideration.
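The sum-of-squares computation just described can be sketched in a few lines. This is only an illustration: the function name `sum_of_squares`, the sample data, and the candidate line y = 1 + 2x are assumptions made for the example, not anything drawn from the text.

```python
def sum_of_squares(curve, data):
    """Return the SOS of `curve` relative to `data`.

    `curve` maps x to the y-value on the curve; `data` is a list of
    (x, observed_y) pairs. Each vertical distance is squared and summed.
    """
    return sum((observed_y - curve(x)) ** 2 for x, observed_y in data)


# Hypothetical observations scattered around the line y = 1 + 2x.
data = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8)]


def line(x):
    # Illustrative candidate curve (an assumption of this sketch).
    return 1.0 + 2.0 * x


sos_line = sum_of_squares(line, data)
print(sos_line)
```

The line misses every data point, so its SOS is greater than zero, yet it is plainly a reasonable candidate for data of this kind.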
That Reichenbach's argument ignores the impact of error is no idle criticism; the reason is that simplicity and goodness-of-fit are usually in conflict. A simple curve will usually fail to pass through the data points exactly; a sufficiently complex curve can always be made to do so. This means that the justification for using a simplicity criterion in curve-fitting problems must include an account of how simplicity and goodness-of-fit should be traded off against each other. Which rate of exchange is the right one? Reichenbach's treatment of the problem glides over this matter by assuming that the "method of simplicity" selects curves whose SOS values are zero.

Reichenbach, I conclude, did not succeed in justifying the thesis I have named for him. But his thesis lives on as an important claim with which to reckon. As long as the rationale for using simplicity considerations to evaluate predictively nonequivalent theories remains a mystery, Reichenbach's thesis should also remain a puzzlement. It cannot be dismissed as a mere residue of positivism.

In this paper, I'm going to outline a solution to the problem of understanding why simplicity is a ground for assigning different truth values in the case of predictively nonequivalent theories. My focus will be on the curve-fitting problem; this doesn't cover all circumstances in which simplicity is relevant to choosing between empirically nonequivalent theories, but it is certainly a very central case.3 We will see that this treatment of the role of simplicity considerations in the curve-fitting problem provides no rationale whatever for choosing between theories that are predictively equivalent. This doesn't decisively prove that simplicity differences count for nothing in the case of predictively equivalent theories. However, it does lend support to that epistemological conclusion. To use the principle of simplicity in one context because it makes good sense in the other is to commit an epistemological equivocation.

Besides discussing Reichenbach's thesis, I also want to consider the role of parsimony considerations in philosophical theorizing. The idea has gained currency that the principle of parsimony can be invoked to evaluate philosophical theories because it is a legitimate criterion in scientific inference. This raises the question of whether philosophical use of a parsimony principle is related only metaphorically to the use made of that principle in science. So, after outlining a solution to the problem of understanding why simplicity matters in curve-fitting problems, I'll discuss parsimony arguments that have been made in two quite different philosophical contexts.

2. AKAIKE'S THEOREM

The statistician H. Akaike and his school have developed a set of ideas concerning how the predictive accuracy of a family of curves may be estimated (see Akaike 1973, Sakamoto et al. 1986, Forster and Sober 1994). Their theorems show how simplicity (as measured by the number of adjustable parameters in an equation) and goodness-of-fit are relevant to estimating how predictively accurate an equation is.

To explain the idea of predictive accuracy, we first must be careful to distinguish a family of curves from the specific curves that are members of that family. For example, consider the infinite family of straight lines in the x-y plane. These all have the form:

(LIN) y = a + bx.

In this equation, a and b are adjustable parameters. Once values are fixed for these parameters (i.e., once they are adjusted), a specific straight line is obtained.
Scientists use families of curves to predict new data from old data. The process comes in two steps. First, they use the available data to obtain the best-fitting member of the family (i.e., the member of the family with the smallest SOS score). Then, this best-fitting member of the family is used to predict what new data will look like. The question is how well the curve in the family that best fits the old data will do in fitting the new data. A family might do well in this two stage prediction task on one occasion, but not so well on another. Intuitively speaking, the predictive accuracy of a family is how well it would perform on average, were this two step process repeated again and again.
To make this idea more precise, let us begin by considering how the old data are obtained. Our ultimate goal is to discover what the true relationship is between the independent variable x and the dependent variable y; just to fix ideas, imagine that x is the temperature of the gas inside a kettle and y is the pressure that the gas exerts on the sides of that rigid chamber. We heat the kettle to different temperatures and observe what pressure the kettle then experiences. We thereby obtain several observations, each of which can be represented as a pair of numbers, a point (x, y) in the x-y plane.

This set of data points is generated by the true (but unknown) curve; given an input value for x, the observed value for y is obtained because of the true equation that links x to y. However, there is a second factor that also influences the observed value obtained for y. For example, even if the true curve happens to be a straight line, the data points will almost certainly fail to be exactly collinear. The reason is that observation is always subject to error, at least to some degree. In the kettle example, there is some true relation between temperature and pressure, but the thermometer and the pressure gauge don't always report values that are perfectly accurate. The (x, y) values that comprise the data don't necessarily represent the true pressure associated with a given temperature value; they represent the observed pressure gauge reading associated with a given thermometer reading.
Given a data set thus obtained, which family of curves should be used to predict new data in the way sketched above? For example, should (LIN) be used, or would it be better to use

(PAR) y = a + bx + cx^2,

the family of parabolic curves? Because (LIN) is a special case of (PAR) (obtained by setting c = 0), we know in advance that (PAR) will fit the data at least as well as (LIN) will. However, this does not guarantee that (PAR) will do a better job at predicting new data. If the true relation of temperature and pressure is linear, (PAR) will probably do worse than (LIN) in this prediction task. This is because (PAR) will "over-fit" the data. (PAR) will interpret the data's departure from linearity as an indication that the true relation of x and y is genuinely nonlinear; (LIN), on the other hand, will interpret these deviations as due to error. Over-fitting, so to speak, is the mistake of confusing signal and noise. How might this mistake be avoided?
Let us pose this problem more generally. As noted earlier, given a body of data, any desired level of goodness-of-fit may be obtained by constructing a curve that is sufficiently complex. Simpler hypotheses often do worse at fitting the data at hand, but do a better job of predicting new data. The predictive accuracy of a family of curves apparently is influenced by how simple it is (i.e., by how many adjustable parameters it contains). The question is whether this is just a brute fact or can be understood in some general and mathematical way.

We now are in a position to define the predictive accuracy of a family F of curves. For convenience, we will do this by characterizing the concept of predictive inaccuracy. Let Best-fit(F, D) denote the member of family F that best fits data set D. Let SOS(C, D) denote the sum-of-squares that curve C has with respect to data set D:

Predictive inaccuracy of family F =df Average-SOS[Best-fit(F, D1), D2].

Here D1 is a body of old data, D2 is a body of new data, and the average is taken over repetitions of the two-step process described above. When the quantity on the right side of this equation is large, the family is not very good at predicting new data by fitting itself to old data; the family is predictively inaccurate.4
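A small simulation may help make the idea of average performance concrete. The sketch below repeatedly draws an old and a new data set, fits a family to the old one, and scores the fitted curve on the new one. Everything specific in it is an assumption made for illustration: the true curve y = 1 + 2x, Gaussian errors with standard deviation 1, the ten x-values, the number of trials, and the helper names `fit_polynomial`, `sos`, `sample`, and `average_sos`.

```python
import random


def fit_polynomial(data, degree):
    """Least-squares coefficients [c0, c1, ...] of y = c0 + c1*x + ...

    Solves the normal equations by Gaussian elimination with pivoting.
    """
    n = degree + 1
    a = [[sum(x ** (i + j) for x, _ in data) for j in range(n)] for i in range(n)]
    rhs = [sum(y * x ** i for x, y in data) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            rhs[row] -= factor * rhs[col]
    coeffs = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (rhs[row] - tail) / a[row][row]
    return coeffs


def sos(coeffs, data):
    """Sum of squared vertical distances between the curve and the data."""
    return sum(
        (y - sum(c * x ** i for i, c in enumerate(coeffs))) ** 2 for x, y in data
    )


def sample(xs, rng, sigma):
    """One noisy data set from the assumed true curve y = 1 + 2x."""
    return [(x, 1.0 + 2.0 * x + rng.gauss(0.0, sigma)) for x in xs]


def average_sos(degree, xs, sigma, trials, seed):
    """Average SOS on the old data (fit) and on the new data (prediction)."""
    rng = random.Random(seed)
    total_fit = total_pred = 0.0
    for _ in range(trials):
        old, new = sample(xs, rng, sigma), sample(xs, rng, sigma)
        best = fit_polynomial(old, degree)  # step 1: fit to old data
        total_fit += sos(best, old)
        total_pred += sos(best, new)        # step 2: score on new data
    return total_fit / trials, total_pred / trials


xs = [i / 2 for i in range(10)]
lin_fit, lin_pred = average_sos(1, xs, 1.0, 2000, seed=0)  # (LIN)
par_fit, par_pred = average_sos(2, xs, 1.0, 2000, seed=0)  # (PAR)
print(lin_fit, lin_pred, par_fit, par_pred)
```

Because both calls use the same seed, the two families are scored on identical data sets. Under these assumptions one would expect (PAR) to fit the old data at least as well as (LIN) while doing worse on average at predicting the new data, which is the over-fitting pattern described earlier.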

We so far have defined predictive accuracy in terms of small SOS, but the idea can be stated more generally. If we adopt the standard assumption that errors are symmetrically distributed around a curve, with large errors being less probable than small ones, the SOS value of the best fitting curve in a family F has a special meaning. The member of F that has the smallest SOS value is the member of the family that assigns to the data the highest probability. This best fitting curve is the likeliest member of the family, in the technical sense of likelihood introduced by R. A. Fisher (1925). Best-fit(F, D) is the hypothesis H in F that maximizes the quantity Pr(D | H). Predictively accurate families are families whose likeliest members, relative to old data, also have high likelihoods relative to new data.
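The link between smallest SOS and highest likelihood can be made explicit under one assumption that is stronger than anything stated in the text: that the errors are independent and normally distributed around the curve with a common variance \sigma^2. Writing H(x_i) for the value curve H assigns at x_i, and reading Pr as a probability density, a sketch of the derivation for n data points is:

```latex
\Pr(D \mid H)
  = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^{2}}}
    \exp\!\left( -\frac{\bigl(y_i - H(x_i)\bigr)^{2}}{2\sigma^{2}} \right)
\quad\Longrightarrow\quad
\log \Pr(D \mid H)
  = -\frac{n}{2}\log\bigl(2\pi\sigma^{2}\bigr)
    - \frac{\mathrm{SOS}(H, D)}{2\sigma^{2}} .
```

The first term on the right does not depend on H, so on this assumption the member of F with the smallest SOS is exactly the member with the greatest likelihood.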

Predictive accuracy is obviously a desirable feature of families of curves; we want to use families that, when fitted to old data, will do a good job at predicting new data. However, for all that, predictive accuracy seems to be epistemologically inaccessible. It seems that we can't tell, from the data at hand, how predictively accurate a family of curves will be. To be sure, it is easy to determine how well a family fits the present data; what seems inaccessible is how well the family will do in predicting new data.

Akaike's remarkable theorem shows that predictive accuracy is, in fact, epistemologically accessible. The theorem says that

An unbiased estimate of the predictive inaccuracy of family F, given data set D, is provided by the quantity

SOS[Best-fit(F, D)] + 2kσ^2 + constant.

Here k is the number of adjustable parameters in F and σ^2 is the error variance, the degree of dispersion around the true curve that we expect observations to have. The constant third term in Akaike's theorem disappears when hypotheses are compared with each other, and so may be ignored. Notice that there are two properties of a family that can lead to the conclusion that it will be predictively inaccurate. First, its best-fitting member may fit the available data poorly (i.e., have a high SOS score). Second, the family may have a large number of adjustable parameters. This second term in Akaike's theorem gives simplicity its due; the complexity of a family is measured by how many adjustable parameters it contains.
It is important to see that the number of adjustable parameters is not a syntactic feature of an equation. Although the equations "y = ax + bx" and "y = ax + bz" may each seem to contain two adjustable parameters (a and b), this is not so. The former equation can be reparameterized; let a' = a + b, in which case the first equation can be restated as "y = a'x". For this reason, the first equation in fact contains one adjustable parameter, while the second contains two. If you like, think of the number of parameters in a family as the number of quantities whose values need to be fixed for the family to make predictions about data (given standard assumptions about error).
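The point about reparameterization can be checked mechanically. In the sketch below (the function names and the small grid of test inputs are assumptions made for illustration), two different settings of (a, b) that share the same sum a + b yield the same predictions everywhere for "y = ax + bx", but not for "y = ax + bz".

```python
def family_one(a, b):
    """A member of the family y = ax + bx (z is accepted but ignored)."""
    return lambda x, z: a * x + b * x


def family_two(a, b):
    """A member of the family y = ax + bz."""
    return lambda x, z: a * x + b * z


# A small grid of (x, z) inputs on which predictions are compared.
inputs = [(x, z) for x in range(-2, 3) for z in range(-2, 3)]


def same_predictions(f, g):
    """True if the two curves agree at every input on the grid."""
    return all(f(x, z) == g(x, z) for x, z in inputs)


# (a, b) = (1, 2) and (2, 1) both have a + b = 3.
print(same_predictions(family_one(1, 2), family_one(2, 1)))  # True
print(same_predictions(family_two(1, 2), family_two(2, 1)))  # False
```

In the first family only the single quantity a' = a + b needs to be fixed before predictions follow; in the second, a and b must each be fixed.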

The second term in Akaike's theorem, besides adverting to the number of adjustable parameters, also mentions the error variance. When this variance is large, this second term plays a larger role in estimating the family's predictive inaccuracy; when observation is largely free of error, the second term makes only a negligible contribution. Simplicity cannot matter when observation is error free; it matters more and more as the data become noisier.

Let us apply Akaike's theorem to (LIN) and (PAR). Suppose the data at hand fall fairly tightly around a straight line. In this case, the best fitting straight line will be very close to the best fitting parabola. So Best-fit(LIN, D) and Best-fit(PAR, D) will have almost the same SOS values. In this circumstance, Akaike's theorem says that the family with the smaller number of adjustable parameters is the one we should estimate to be more predictively accurate. A simpler family is preferable if it fits the data about as well as a more complex family. Akaike's theorem describes how much improvement in goodness-of-fit a more complicated family must provide for it to make sense to prefer the complex family.5
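The comparison of (LIN) and (PAR) can be put as a short calculation using the quantity SOS[Best-fit(F, D)] + 2kσ^2, with the constant term dropped because it cancels in comparisons. The sketch assumes that σ^2 is known and that k = 2 for (LIN) and k = 3 for (PAR); the function names, the sample SOS values, and the choice to break ties in favour of (LIN) are illustrative assumptions.

```python
def akaike_estimate(sos_best_fit, k, sigma_sq):
    """Estimated predictive inaccuracy: SOS + 2*k*sigma^2 (constant omitted)."""
    return sos_best_fit + 2 * k * sigma_sq


def preferred_family(sos_lin, sos_par, sigma_sq):
    """Return the family with the lower estimated predictive inaccuracy."""
    est_lin = akaike_estimate(sos_lin, 2, sigma_sq)  # (LIN): parameters a, b
    est_par = akaike_estimate(sos_par, 3, sigma_sq)  # (PAR): parameters a, b, c
    return "LIN" if est_lin <= est_par else "PAR"


# (PAR) fits only slightly better: 14.0 versus 15.5, so (LIN) is preferred.
print(preferred_family(10.0, 9.5, 1.0))
# (PAR) fits much better: 14.0 versus 13.0, so (PAR) is preferred.
print(preferred_family(10.0, 7.0, 1.0))
```

On these assumptions, the extra parameter in (PAR) pays its way only when it lowers the SOS by more than 2σ^2, which is one way to read the "rate of exchange" between simplicity and goodness-of-fit.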
Akaike's theorem is a theorem, so it is essential to ask what the assumptions are from which it derives. Akaike assumes that the true curve, whatever it is, remains the same for both the old and new data sets considered in the definition of predictive accuracy. He also assumes that the likelihood function is "asymptotically normal". And finally, he assumes that the sample size is large, in the sense that enough data are available so that the value of each parameter can be estimated.
As noted before, Akaike's theorem identifies an unbiased estimate of a family's predictive inaccuracy.6 This leaves open the possibility that individual estimates may stray quite considerably from this true value. The theorem does not say whether there are other unbiased estimators. In addition, there are other desirable statistical properties of an estimator besides unbiasedness, so there is a genuine question as to how various optimality properties ought to be traded off against each other. However, the fact that important details remain unsettled should not obscure the fact that Akaike's approach has made significant headway in the task of explaining the role of simplicity in hypothesis evaluation.
Although Akaike's theorem applies directly to curve-fitting, the illumination it provides is more general. The theorem explains why a unified theory is sometimes preferable to a disunified theory; it also shows why models that postulate fewer causes are sometimes preferable to models that postulate more (Forster and Sober 1994). Nor should the surface appearance of the curve-fitting problem lead one to think that the Akaike format applies to inductions over observational regularities and not to abductions that postulate unobservable mechanisms. This is a distinction without a difference in the Akaike framework.

Akaike's theorem addresses the general problem of "model selection", meaning the problem of evaluating propositions that contain adjustable parameters. The focus on the predictive accuracy of models in no way limits the theories that can be considered to ones expressed in some sort of "observation language". The quantities represented in (LIN) and (PAR) may be as "theoretical" as you please.

However, it remains true that each parameter in a family of curves that is to be treated within the Akaike framework must be such that its maximum likelihood value can be estimated from the data. A family of curves that violates this requirement is called "unidentifiable".

So as to give the reader more of a feel for how the Akaike framework works, let's now turn to two famous controversies in the history of physics.
The Copernican and Ptolemaic systems fit the observations then available concerning the relative positions of heavenly bodies about equally well. However, the Ptolemaic model included a far larger number of adjustable parameters; these represent the "epicycles" that made Ptolemaic astronomy the very paradigm of an unparsimonious theory. Although many philosophers (notably Kuhn 1957, p. 181) have claimed that the virtues of the Copernican hypothesis are purely aesthetic, the Akaike framework offers a much more down-to-earth explanation of why the Copernican model is preferable; its estimated predictive accuracy is much higher.

Contrast this example with the controversy that arose in connection with Newton's postulate of absolute space. Leibniz and many others took this to be a defective element in Newton's model; absolute space seems to be a perfect example of an unparsimonious postulate. However, this example is not analyzable in the Akaike framework. There is no way to estimate the value of a parameter that represents the velocity of a physical object relative to absolute space. It isn't that the Akaike framework applies and tells us what is wrong with Newton's model; the framework simply does not apply at all, because the Newtonian model is not identifiable.
At first glance, it might seem that parsimony considerations explain why Copernican astronomy should be preferred over the Ptolemaic system in the same sense that parsimony considerations explain why a physics that does without absolute space is better than one that includes that postulate. The Akaike approach strongly suggests that this intuitive judgment involves an equivocation.

3. SOME PHILOSOPHICAL PRELIMINARIES

Since I'll be using the concept of "predictive equivalence" throughout this paper, I should say something about what it means. This is a big job, not least because it requires a treatment of the concept of observation (on which see Sober 1990a, 1993b, 1993c). However, a few remarks are worth making here, incomplete though they must be.

There is first of all the familiar Duhemian point that the predictive equivalence of two theories must be gauged against a set of background assumptions that allows the theories to make contact with observations. Typically, it isn't theories by themselves that make predictions, but theories when supplemented with auxiliary assumptions. This means that the primary concept of predictive equivalence should be not a two-place equivalence relation, but a three-place relation of equivalence relative to a set of antecedently accepted background assumptions.
Far more important, and less often recognized, is the idea that theories, even when supplemented with background assumptions, do not have deductive implications about observations; rather, they assign probabilities to different possible observational outcomes. This means that we should not understand predictive equivalence in terms of what theories entail about observations. Rather, predictive equivalence should be understood in terms of identity of probability distributions. It is sometimes thought that this "probabilistic turn" is appropriate for statistical theories such as population genetics and quantum mechanics, but not for deterministic theories such as relativity theory and Newtonian mechanics. This is entirely wrong! When observation is subject to error, we must model even deterministic theories as making only probabilistic contact with observations (Forster 1988).
Although my discussion in what follows will require the distinction between what is observed and what is not observed but only inferred, the concept of observation I'll deploy is not a rigid or absolute one. Whether a given statement is an observation report, or formulates a conjecture under test, often depends on the problem at hand. In addition, observation statements, as I'll use the term, often employ theoretical concepts and their confirmation and disconfirmation often depends on the use of instrumentation and background theories. Roughly, the idea is this: When the question is raised about which of two theories is more plausible, an "observation statement" will describe a detectable feature of the environment about which it is possible to make a reasonable judgment without already having formed an opinion as to which theory is true. It is in this sense that observations are relatively theory-neutral, not absolutely so (Sober 1990).

Despite widespread skepticism among philosophers about the "observation/theoretical distinction", I hope these few words show that the concept of observation I'll use is fairly innocuous. It is a routine feature of scientific testing to ask what types of information constitute the data; it also is routine for scientists to distinguish what is observed from what is inferred on the basis of observation. I will be content if the reader allows that these practices make good scientific sense, even if philosophers have not been able to provide a completely adequate analysis of why this is so.

In light of my brief exposition of Akaike's theorem in the previous section, it is possible to provide a streamlined version of the kind of argument I'll construct in what follows. The whole point of Akaike's theorem is to estimate predictive accuracy. What, then, should we expect when we apply the Akaike framework to two theories that are known at the outset to be predictively equivalent? There are two possibilities. The first is that their Akaike estimates of predictive accuracy turn out to be the same; the second is that they do not. In the first instance, we find that the estimation procedure merely reinforces what we already knew; in the second, we find that the procedure has provided us with misleading estimates, which we then discard. In short, we know in advance that the Akaike framework will never give us a justification for choosing between predictively equivalent theories.
The question may then be raised of why one should work through the details of the philosophical examples I'll present, since the "take home message" is already apparent at the outset. The main reason is that attention to these details illuminates both the philosophical theories and the intuitions about simplicity that have been cited in their defence. As we will see, some applications of a simplicity criterion in philosophy are better grounded than others; without the Akaike framework, it is quite easy to think that they are all of a piece.

4. SHOEMAKER'S EXAMPLE OF TIME WITHOUT CHANGE

Can there be time without change? That is, can time pass without any changes occurring, other than the mere lapsing of time itself? If the external world is frozen for a while and if the thought processes of perceivers are likewise frozen, none of them will experience the passage of time during that frozen moment. In addition, one might imagine that a moment of universal arrest will leave no trace that later perceivers will be able to detect. This suggests that universal freezes are not the sorts of events for which observational evidence could be mustered. And from this one might conclude that it could never be reasonable to infer that universal freezes occur.

Sydney Shoemaker (1969) has invented a clever example that seems to show that this line of reasoning is mistaken. He asks us to imagine a universe composed of three planets, each inhabited by intelligent beings. Shoemaker stipulates that no planet will be able to detect that it is freezing while this is happening. However, each planet can observe the other two.7 For convenience, we may imagine that the three planets use the same calendar (with years numbered i = 1, 2, 3, ...) and that they start observing each other in year 1. In year 3, planets Y and Z observe that X has frozen. At the end of that year, Y and Z inform X that this has happened; X's calendars have been frozen, so the people on X must correct their calendars to take account of what has happened to them. In year 4, X and Z observe that Y has frozen, so at the end of that year, Y must similarly be brought up to date. And in year 5, X and Y observe that Z has frozen. The pattern of freezes is displayed in the following table. During a given year, there is a freeze on the planet or planets marked "F" and the unmarked planet or planets observe what has happened:

(1)   Year  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 ... 59
      X           F        F        F        F        F ...
      Y              F           F           F          ...
      Z                 F              F              F ...

What we see in the data so far is that each planet has its own fixed periodicity; X freezes every 3 years, Y every 4, and Z every 5. This pattern holds through year 59.

179

Although the events recorded in data set (1) are uncontroversial, what happens subsequently is more subject to interpretation. I'll begin by describing what the people on the planets observe. They note that when their calendars read "60", no one is frozen. However, there is subsequently a shift in the timing of the regularities in play. This is how they record their observations:
(2)   ... 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 ...
X     ...        F              F        F        F    ...
Y     ...     F                    F           F       ...
Z     ...  F                          F              F ...

After the hiatus around year 60, the familiar 3, 4, and 5 year cycles begin again. They continue with perfect regularity until the calendar reads 119, at which point there is again an interruption:

(3)   ... 114 115 116 117 118 119 120 121 122 123 ...
X     ...           F                   F         ...
Y     ...       F                           F     ...
Z     ...   F                                   F ...
Then the 3, 4, and 5 year cycles begin again, but once again there is a hiatus after a certain number of repetitions. We may imagine that the inhabitants experience this pattern numerous times. They have lots of data. If we take this data at face value, we will infer the following general pattern:
(NUF) For all observable years o,
      Planet X freezes in year o iff mod[(o + c)/3] = 0,
      Planet Y freezes in year o iff mod[(o + c)/4] = 0,
      Planet Z freezes in year o iff mod[(o + c)/5] = 0,
      where c is the largest integer such that 1 + 59c ≤ o.

Here mod(n/m) is the remainder of dividing n by m. (NUF) entails that there are no universal freezes in what I am calling "the observable years". These observable years are the intervals marked off by the calendars that the people on the planets use, once they are corrected for the local freezes that occur on one planet and which are observable by the people on another.
Although (NUF) perfectly fits the available data, it postulates a pattern for each planet that is somewhat complicated. Each planet freezes with a regular period for a certain number of repetitions, then suffers a slightly longer sequence of years in which there is no freeze, after which it resumes its cycle of periodic freezes.

Shoemaker asks us to entertain an alternative theory concerning what is happening. Suppose there is a universal freeze that occurs right after the observable year 59 and that lasts a single year; suppose the same thing happens immediately after the observable year 118. These years in which universal freezes allegedly occur are "hidden", in the sense that no one in the three planets could experience them while they are happening, nor could they be detected after the fact by observing traces that the universal freezes leave. The postulate of these hidden years gives rise to a new, "augmented" calendar, in which the observable years are supplemented by hidden years. Here is how the two calendars are related:

observable   1   2  ...  58  59       60  61  ... 117 118      119 120
augmented    1   2  ...  58  59  60   61  62  ... 118 119 120  121 122

If O is the set of observable years and A is the set of augmented years, we can describe A as a function of O. Given a value o ∈ O, we can construct the value a ∈ A for the same year as follows:

a = o        for 1 ≤ o ≤ 59
a = o + 1    for 60 ≤ o ≤ 118
a = o + 2    for 119 ≤ o ≤ 177

In other words, a = o + c, where c is the largest integer such that 1 + 59c ≤ o. Note that each observable year is numbered in the augmented calendar, but the converse isn't true.

Given this explanation of what the augmented calendar amounts to, we can now state the universal freeze hypothesis in terms of it:

(UF) For all augmented years a,
     Planet X freezes in year a iff mod(a/3) = 0,
     Planet Y freezes in year a iff mod(a/4) = 0,
     Planet Z freezes in year a iff mod(a/5) = 0.

(UF) agrees with the observations recorded in data set (2), but views the observable calendar it uses as incomplete. (UF) supplements (2), arriving at the following pattern for what supposedly happens in the augmented calendar:

(2a)  ... 55 56 57 58 59 60 61 62 63 64 65 ...
X     ...        F        F        F       ...
Y     ...     F           F           F    ...
Z     ...  F              F              F ...

Although (UF) may seem to be simpler than (NUF), it is well to remember that they are formulated in terms of two quite different calendars. Whereas (NUF) postulates patterns with interruptions in the observable calendar, (UF) postulates uninterrupted sequences containing fixed periods in the augmented calendar.8 In particular, (UF) can be viewed as a conjunction, one of whose conjuncts is (NUF) and the other is a postulate about hidden years:

(H) Immediately after each observable year o such that mod(o/59) = 0, there is a hidden year in which there is a universal freeze.

Since (UF) is equivalent to (NUF) & (H), we must be careful that our assessment of the simplicity of (UF) also applies to (NUF) & (H).9 It is important not to be misled by the syntactic simplicity of (UF); since (UF) and (NUF) & (H) say the same thing, they must be equally plausible.
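The relation between the two calendars, and the claim that (UF) and (NUF) agree about every observable year, can be checked mechanically. The following Python sketch is offered only as an illustration; the function names are hypothetical labels introduced here, and the periods 3, 4, 5 and the block length 59 are simply the values used in Shoemaker's example.

```python
# Illustrative sketch: the observable and augmented calendars of the example.
# All names below are hypothetical; the numbers 3, 4, 5 and 59 come from the text.

PERIODS = {"X": 3, "Y": 4, "Z": 5}


def hidden_years_before(o: int) -> int:
    """Largest integer c such that 1 + 59*c <= o, for an observable year o >= 1."""
    return (o - 1) // 59


def augmented_year(o: int) -> int:
    """Translate an observable year o into its augmented-calendar number a = o + c."""
    return o + hidden_years_before(o)


def nuf_freezes(o: int) -> set:
    """Planets that (NUF) says are frozen in observable year o."""
    c = hidden_years_before(o)
    return {planet for planet, m in PERIODS.items() if (o + c) % m == 0}


def uf_freezes(a: int) -> set:
    """Planets that (UF) says are frozen in augmented year a."""
    return {planet for planet, m in PERIODS.items() if a % m == 0}


if __name__ == "__main__":
    observable = range(1, 178)
    # The two hypotheses agree about every observable year.
    print(all(nuf_freezes(o) == uf_freezes(augmented_year(o)) for o in observable))
    # Augmented years with no observable counterpart are the hidden years.
    hidden = sorted(set(range(1, 180)) - {augmented_year(o) for o in observable})
    print(hidden, [uf_freezes(a) for a in hidden])
```

Run as a script, the check reports that the two hypotheses never disagree about an observable year, and that the only augmented years lacking an observable counterpart are those in which (UF) places a freeze on all three planets.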
Shoemaker argues that (UF) should be preferred to (NUF) on grounds of simplicity. We now need to assess what the Akaike framework says about these two hypotheses. To do this, we must ask what the families are from which (UF) and (NUF) are obtained. Each of these families must contain adjustable parameters whose values are estimated from the data. At this point, however, we face a problem that is quite general in the Akaike framework: a hypothesis that contains no adjustable parameters is a member of many different families. For example, "y = 3 + 2x" can be obtained from the family (LIN) and also from (PAR); indeed, it also is a member of the family "y = (a + 1) + ax", which contains just one adjustable parameter. The syntactic form of a hypothesis does not tell us which family we should consider.
Two ideas should guide our choice, however.10 First, the context of inquiry tells us something about the families we should associate with (NUF) and (UF). In the Shoemaker problem, the families should endorse or deny the existence of universal freezes, it being left to the data to decide what specific patterns are asserted to obtain. The second piece of guidance is that it must be possible for the adjustable parameters to be estimated from the data. Let us begin by considering the following two families:

FAM(NUF) For all observable years o,
         Planet X freezes in year o iff mod[(o + c)/x] = 0,
         Planet Y freezes in year o iff mod[(o + c)/y] = 0,
         Planet Z freezes in year o iff mod[(o + c)/z] = 0,
         where c is the largest integer such that c[LCM(x, y, z) - 1] < o.

FAM(UF)  FAM(NUF) & Immediately after each observable year o such that mod{o/[LCM(x, y, z) - 1]} = 0, there exists a hidden year in which there is a universal freeze.

LCM(x, y, z) is the least common multiple of the three numbers listed.11 I've formulated the two families in terms of the observable calendar for two reasons. First, that is how the data are described, and parameters must be estimated from the data. Second, it is hard to see how (NUF) could even be described in the augmented calendar. (NUF) denies that there is such a thing as the hidden years postulated by the augmented calendar; this hypothesis cannot be described as denying that there are universal freezes in hidden years or as remaining agnostic about what happens during hidden years.
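To see that the two families make the same claims about the observable years whatever values the parameters x, y, and z take, one can compare them computationally. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the periods are assumed to be integers greater than 1 (so that LCM(x, y, z) - 1 is positive), and the projection of FAM(UF) onto the observable calendar is implemented by generating the uninterrupted augmented pattern and skipping each year in which all three planets freeze.

```python
# Illustrative sketch: FAM(NUF) and the observable content of FAM(UF)
# coincide for any choice of the adjustable parameters x, y, z.
# Function names are hypothetical; periods are assumed to be integers > 1.
from functools import reduce
from math import gcd


def lcm(*nums: int) -> int:
    """Least common multiple of the numbers listed."""
    return reduce(lambda a, b: a * b // gcd(a, b), nums)


def fam_nuf(x: int, y: int, z: int, n_years: int) -> list:
    """Freeze pattern (X, Y, Z) for observable years 1..n_years under FAM(NUF)."""
    block = lcm(x, y, z) - 1
    rows = []
    for o in range(1, n_years + 1):
        c = (o - 1) // block  # largest integer c with c * block < o
        rows.append(tuple((o + c) % m == 0 for m in (x, y, z)))
    return rows


def fam_uf_observable(x: int, y: int, z: int, n_years: int) -> list:
    """What FAM(UF) implies for observable years: uninterrupted cycles in the
    augmented calendar, with each universal-freeze year dropped as hidden."""
    period = lcm(x, y, z)
    rows, a = [], 0
    while len(rows) < n_years:
        a += 1
        if a % period == 0:  # universal freeze: no one observes this year
            continue
        rows.append(tuple(a % m == 0 for m in (x, y, z)))
    return rows


if __name__ == "__main__":
    for params in [(3, 4, 5), (2, 3, 7), (4, 6, 10)]:
        print(params, fam_nuf(*params, 500) == fam_uf_observable(*params, 500))
```

In both functions the only quantities left to be fixed by the data are x, y, and z, which is the sense in which the two families share their adjustable parameters.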

The fundamental point is that FAM(NUF) and FAM(UF) have precisely the same number of adjustable parameters. The Akaike framework concludes that they have the same degree of estimated predictive accuracy.12 Other specifications of the two families from which (UF) and (NUF) are obtained might be considered. For example, one might associate with (NUF) a family in which there are three adjustable parameters for each planet; for each planet, we must estimate the parameters that occur in "the planet freezes every a years for b repetitions, after which there is a hiatus of c years, at which point the cycle recurs". There is nothing wrong with this nine parameter family. The point I would make, however, is that a nine parameter family can now be associated with (UF).13
Shoemaker judges the universal freeze hypothesis to be simpler than its competitor. This turns out not to be true according to the present analysis. Shoemaker's judgment, I concede, is an intuitive one. How, then, are we to account for the fact that intuitions are misleading in this case?14

In part, our intuitions rest on something that is true: (UF) postulates a pattern in the augmented calendar that is simpler than the pattern that (NUF) postulates in the observable calendar. Constant periodicity is a simpler pattern than a two-phase pattern of constant periods with interruptions. The theory of algorithmic simplicity (Chaitin 1975) underwrites this intuition; given a canonical language, the abstract sequence of events postulated by (UF) in the augmented calendar can be specified by a shorter algorithm than any algorithm that generates the pattern that (NUF) says obtains in the observable calendar.
All this is true, but entirely irrelevant as far as the Akaike approach is concerned. What is relevant is not the two abstract patterns, but the number of parameters in each family of hypotheses whose values must be estimated from the observations. This is why the calendar of observable years is fundamental. The observable calendar and the augmented calendar have quite different epistemological standings, and this asymmetry is reflected in the analysis. It isn't that Akaike pays no attention to theoretical simplicity; the point is that the simplicity of a theory must be judged by seeing how the theory makes contact with observations. One of the beauties of Shoemaker's example is that it illustrates an important difference between "simplicity of abstract pattern" (algorithmic simplicity) and the simplicity captured by the Akaike framework. Besides the fact that these two approaches yield different analyses of the problem at hand, there is an additional difference that is quite fundamental: Whereas the Akaike approach explains why simplicity (as measured by the number of adjustable parameters) is epistemically relevant, no comparable explanation has ever been developed for the theory of algorithmic simplicity.
To explain the intuition that the universal freeze hypothesis is simpler than the hypothesis that denies that there are universal freezes, I have focused on the abstract pattern of freezes that each hypothesis puts forward with respect to its own proprietary calendar. However, there is something else on which this intuition depends. One must ignore (or discount) the fact that the universal freeze hypothesis postulates something whose existence the competing hypothesis denies; the universal freeze hypothesis asserts that there are such things as hidden years. Those inclined to dismiss the Akaike framework and insist on the authority of their intuitions in this case should ask themselves this: The universal freeze hypothesis postulates a simpler abstract pattern, but a less parsimonious ontology, than the alternative hypothesis. Why does pattern take precedence over ontology when the overall simplicity of the two theories is compared?15

Given that Akaike's theorem concerns the estimate of predictive accuracy, it is no surprise that the theorem should fail to distinguish between two predictively equivalent theories. It is nonetheless instructive to see why this is so in the case of Shoemaker's problem. The hypotheses (UF) and (NUF) are each obtained from families containing adjustable parameters whose values are estimated from the data. The important point about this estimation process is that the same information in the data is consulted to construct each hypothesis. It is for this reason that the number of adjustable parameters in the two families must be the same; they are the same in number because the adjustable parameters in the two families are in fact identical.

5. THE MIND-BODY PROBLEM

Mind-body dualism is perfectly compatible with the existence of regularities connecting mental and physical event types. When people take aspirin, their headaches usually disappear; when they want to take off their eyeglasses, their arms usually move to do so. J. J. C. Smart (1959) and Brandt and Kim (1967) go further; they maintain that dualism is quite consistent with the existence of perfect correlations between the mental and the physical.

If such perfect correlations obtain, how are we to decide whether dualism is more or less plausible than the identity theory? To use an example that was popular in the 1960's, why say that the property of being in pain is identical with the property of having one's c-fibres fire, if we observe that one of these properties is instantiated in a person at a time if and only if the other is instantiated in that person at the same time? Why not say, instead, that these are distinct though perfectly correlated properties?

Smart, Brandt, and Kim point out that the ontology postulated by the identity theory is more parsimonious than that demanded by dualism. If a mental property and a physical property are identical, then there is just one property; but if they are distinct, there are two. This seems to provide a parsimony argument for choosing the identity theory over dualism.
To evaluate the identity theory and dualism from the point of view of the Akaike framework, we must associate with each theory a family of hypotheses in which there are adjustable parameters whose values can be estimated from observations. Since Smart, Brandt, and Kim focus on what the identity theory and dualism say about the correlation of mental and physical properties, we will begin with this idea, even though there is more to the theories than this.
Let's consider two properties: a physical property P and a mental property M. With dichotomous characters such as these, the standard measure of their degree of association is their covariance; Cov(P, M) = Pr(P & M) - Pr(P) Pr(M). If P occurs when and only when M does, their covariance will be positive and will take the value Pr(P) Pr(-P), which is maximal when Pr(P) = Pr(-P) = 0.5 and declines as Pr(P) becomes more extreme. Note that when properties are "perfectly correlated", their covariance isn't a constant, but varies as a function of Pr(P). On the other hand, if the occurrence of one property is probabilistically independent of the occurrence of the other, the covariance will be zero, independent of the value of Pr(P).

It doesn't make much sense to view covariance as a simple and unanalyzable feature of the mental and physical properties under study. Rather, their degree of covariance is the upshot of a complete specification of values for the four conjoint probabilities Pr(P & M), Pr(P & -M), Pr(-P & M), Pr(-P & -M). Once these are specified, the degree of association between the two properties is a consequence.
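The dependence of covariance on the four conjoint probabilities can be made concrete with a short computation. The Python sketch below is purely illustrative; the function name and the particular probability values are assumptions introduced here, not figures from the discussion.

```python
# Illustrative sketch: covariance of two dichotomous properties P and M,
# computed from the four conjoint probabilities. Names and values are hypothetical.

def covariance(p_and_m: float, p_and_not_m: float,
               not_p_and_m: float, not_p_and_not_m: float) -> float:
    """Cov(P, M) = Pr(P & M) - Pr(P) Pr(M)."""
    total = p_and_m + p_and_not_m + not_p_and_m + not_p_and_not_m
    if abs(total - 1.0) > 1e-9:
        raise ValueError("the four conjoint probabilities must sum to 1")
    pr_p = p_and_m + p_and_not_m
    pr_m = p_and_m + not_p_and_m
    return p_and_m - pr_p * pr_m


if __name__ == "__main__":
    # Perfect correlation: the covariance equals Pr(P) Pr(-P) and varies with Pr(P).
    for p in (0.5, 0.7, 0.9):
        print(p, covariance(p, 0.0, 0.0, 1.0 - p))
    # Independence with Pr(P) = 0.3 and Pr(M) = 0.6: the covariance is zero.
    print(covariance(0.18, 0.12, 0.42, 0.28))
```

The perfectly correlated cases give different covariances for different values of Pr(P), while the independent case gives zero (up to floating-point rounding) whatever the marginal probabilities are.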

What does the identity theory say about the probabilities of these conjoint events? Since it maintains that P and M are one and the same property, the identity theory asserts that Pr(P & -M) = Pr(-P & M) = 0, and that Pr(P & M) and Pr(-P & -M) may take any pair of values that sum to one. Thus, the identity theory endorses a model that contains a single adjustable parameter:

(Ident) Pr(P & M) = p and Pr(-P & -M) = 1 - p.

Dualism, on the other hand, is compatible with the four conjoint probabilities having any values at all, as long as they sum to 1. The properties P and M may be independent of each other; they also may show any degree of positive (or negative) association. As a result, dualism deploys a model that contains three adjustable parameters:

(Dual) Pr(P & M) = p1, Pr(P & -M) = p2, Pr(-P & M) = p3, and Pr(-P & -M) = 1 - p1 - p2 - p3.

Let's now imagine an experiment that assembles empirical frequencies from which maximum likelihood estimates of the parameters in (Ident) and (Dual) may be obtained. This experiment will mimic the structure of the simple inference problem concerning the kettle's temperature and pressure that was discussed at the beginning of this paper. Suppose the subjects in a psychology experiment wear monitors for some length of time. The monitors record when subjects say "ouch"; they also detect c-fibre firings and record when those events take place. I'll assume that both indicators are subject to error. Utterances of "ouch" need not be perfectly associated with pains and neurological detector readings need not always coincide with the occurrence of c-fibre firings.

This experiment will yield data that describe how often subjects say "ouch" and how often the neurological detector says "c-fibres are firing". For example, one might obtain frequency data like the following:

                                  Subject says "ouch"
                                    yes         no
c-fibre meter says         yes
"firing"                   no

If observational error is possible, then all four possible observations will occur, at least sometimes, if the data set is sufficiently large. The identity theory must interpret all observations on this table's anti-diagonal as erroneous; dualism need not, though it will interpret all observations as the result of an interaction between the individual's underlying physical and mental characteristics on the one hand and various causes of observational error on the other.


To describe how (Ident) and (Dual) make contact with these observations, we must supplement each of these theories with a model of observational error. A reasonably general model of error will contain four parameters:

(Error) Pr(Subject says "Ouch" | Pain) = e1
        Pr(Subject does not say "Ouch" | No Pain) = e2
        Pr(Meter says "c-fibres firing" | c-fibres firing) = e3
        Pr(Meter says "no c-fibres firing" | No c-fibres firing) = e4
If we assume that the way "ouches" indicate pain states is probabilistically independent of the way meter readings indicate c-fibre firings, then we can state the probability of the conjoint observations (±Ouch & ±Meter says "c-fibres firing"), conditional on the conjoint underlying states (±Pain & ±c-fibres are firing), as products of the above probabilities and their complements.16
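How a model of the underlying states combines with the error model to yield probabilities for the four conjoint observations can be sketched in a few lines of Python. This is an illustration only: the function names, the dictionary representation, and the numerical values are assumptions introduced here, and the independence assumption just stated is built in by multiplying the "ouch" and meter probabilities.

```python
# Illustrative sketch: probabilities of the four conjoint observations, given a
# distribution over underlying states and the four error parameters e1..e4.
# States are keyed as (pain, c_fibres_firing); observations as (says_ouch, meter_fires).
# All names and numbers are hypothetical.

def ident_states(p: float) -> dict:
    """(Ident): only the two 'diagonal' states have non-zero probability."""
    return {(True, True): p, (False, False): 1.0 - p}


def dual_states(p1: float, p2: float, p3: float) -> dict:
    """(Dual): Pr(P & M) = p1, Pr(P & -M) = p2, Pr(-P & M) = p3."""
    return {(True, True): p1, (False, True): p2,
            (True, False): p3, (False, False): 1.0 - p1 - p2 - p3}


def observation_probs(states: dict, e1: float, e2: float,
                      e3: float, e4: float) -> dict:
    """Sum, over underlying states, of Pr(state) times the product of the
    relevant error probabilities (or their complements)."""
    def pr_ouch(says: bool, pain: bool) -> float:
        p_yes = e1 if pain else 1.0 - e2
        return p_yes if says else 1.0 - p_yes

    def pr_meter(fires: bool, firing: bool) -> float:
        p_yes = e3 if firing else 1.0 - e4
        return p_yes if fires else 1.0 - p_yes

    return {
        (says, fires): sum(pr * pr_ouch(says, pain) * pr_meter(fires, firing)
                           for (pain, firing), pr in states.items())
        for says in (True, False) for fires in (True, False)
    }


if __name__ == "__main__":
    print(observation_probs(ident_states(0.4), 0.9, 0.9, 0.9, 0.9))
    print(observation_probs(ident_states(0.4), 1.0, 1.0, 1.0, 1.0))
```

With error-free indicators (all four parameters equal to 1), (Ident) assigns probability zero to the anti-diagonal observations; with imperfect indicators it assigns them positive probability, which is why such observations must be read as errors on the identity theory.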
In the Akaike framework, error probabilities are either specified in advance of the experiment or are estimated from the data, in which case they represent further adjustable parameters in the models under evaluation. In the present experiment, only the former option is viable. The observations given in the previous 2x2 table tell us the four frequencies of [±Ouch & ±Meter says "c-fibres firing"]. These must sum to unity. This means that there are three independent frequencies that we know from the observations. If we treat the parameters in (Error) as unknowns that must be estimated from the data, then neither (Ident) nor (Dual) is identifiable. (Ident & Error) contains five adjustable parameters, whereas (Dual & Error) contains seven. There are infinitely many assignments of values to these parameters that will maximize the probabilities of the observations.
If this were all one could say about the occurrence of error in the experiment, then the Akaike framework would not underwrite the parsimony argument in favor of the identity theory that Smart, Brandt, and Kim put forward.

However, it is not absurd to imagine that we might possess independent evidence that allows us to assign values to the parameters in (Error). Perhaps we have observed in other settings how often pains are associated with utterances of "ouch"; it is similarly quite possible that we should have neurological evidence concerning how sensitive the meter is to c-fibre firings. If so, we can assign values to the parameters in (Error), and so this model of error will contain no adjustable parameters. The observed conjoint frequencies of "ouches" and meter readings now can be used to find the maximum likelihood estimates of the parameters in (Ident) and (Dual).
Because (Dual) contains more parameters than (Ident), the former model will fit the data at least as well as the latter. (Dual) will almost always do better; however, for some specific arrays of empirical frequencies, the two models will tie. (Dual) tends to beat (Ident) when it comes to goodness-of-fit, just as (PAR) tends to beat (LIN).


In any event, the greater goodness-of-fit of (Dual) is just one of the factors that is relevant in the Akaike framework. The other is simplicity as measured by the number of adjustable parameters. It is here that (Ident) has an advantage over (Dual), since the former has one adjustable parameter while the latter has three. If the data are such that the two models fit the data about equally well, we should prefer (Ident) over (Dual) on grounds of simplicity, just as Smart, Brandt, and Kim maintain.

This would be the end of the story if the identity theory and dualism did nothing more than make claims about how mental and physical properties co-occur. But there is more to both theories than this. I next want to consider what the identity theory and dualism say about the explanation and causation of behavior.

Dualists often maintain not just that some organisms have irreducibly mental properties, but that those properties explain and cause behavior in a way that the purely physical properties of organisms cannot. Even if the physical characteristics of an organism were fully specified, this would not provide a complete account of why the organism behaves as it does. Parallel remarks pertain to the issue of prediction; for the dualist, the physical traits of an organism provide, at best, an incomplete basis on which to predict what the organism will do. To be sure, it is possible to formulate dualism in such a way that mental properties have no causal efficacy or explanatory power of their own. But historically, a good part of the interest in dualism has centered on its claim concerning the irreducible causal and explanatory importance of the mental.17

If we understand dualism in this way, what are we to make of the identity theory? If a mental property and a physical property are identical, then the causal efficacy of the one just is the causal efficacy of the other. I see nothing to fault in the idea that names for the same property are intersubstitutable in causal contexts, salva veritate. Explanation is perhaps a less straightforward matter, since explanation has to do with gains in understanding, and so may involve subjective elements.18 In any case, I'll set to one side the question of what the identity theory should say about explanation, and concentrate on what it says about causation.
Before addressing the identity theory and dualism directly, I want to make a general point: the Akaike framework explains why paucity of postulated causes enhances a theory's estimated predictive accuracy (Forster and Sober 1994). To see why, let's consider a simple example. Suppose B is a dichotomous effect and P and M are two putative dichotomous causes. The first model says that P is the only cause that makes a difference in whether B occurs. The second model says that both P and M may be causally relevant:

(One Cause)  Pr(B | P & M) = Pr(B | P & -M) = Pr(B | P) = a
             Pr(B | -P & M) = Pr(B | -P & -M) = Pr(B | -P) = c

(Two Causes) Pr(B | P & M) = a
             Pr(B | P & -M) = b
             Pr(B | -P & M) = c
             Pr(B | -P & -M) = d

Notice that (One Cause) has two adjustable parameters (a, c) while (Two Causes) has four (a, b, c, d).

To choose between these two families of models, one might run an experiment in which a large number of individuals are placed in each of four "treatment cells". The results of the experiment could be displayed in a 2-by-2 table in which cell entries indicate the fraction of individuals in each treatment who exhibit the effect B:

If we use the empirical frequencies (w, x, y, z) to estimate the parameters in the two models, it must emerge that the likeliest member of (Two Causes) will fit the data at least as well as the likeliest member of (One Cause). The two models will show the same degree of goodness-of-fit only when w = y and x = z; if the four frequencies differ even slightly, (Two Causes) will fit the data better.

The Akaike framework tells us how to evaluate these two families of models, i.e., how to bring together the conflicting considerations of simplicity and goodness-of-fit. If w and y are close, as are x and z, then the slightly better goodness-of-fit of (Two Causes) will not compensate for the fact that this model is less parsimonious. In this case, we choose the simpler model because it has a higher estimated predictive accuracy. However, if w, x, y and z are all very different, then we should sacrifice parsimony and prefer the more complex model.19
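The trade-off between goodness-of-fit and the number of adjustable parameters can be illustrated numerically. The Python sketch below is an illustration under stated assumptions: it scores each family by the standard AIC formula, 2k - 2 ln L (lower is better), where k is the number of adjustable parameters and L the likelihood of the best-fitting member; the cell counts are invented for the purpose; and the names are hypothetical. The maximum likelihood estimate of a parameter shared by several treatment cells is the pooled frequency of B in those cells.

```python
# Illustrative sketch: comparing (One Cause) and (Two Causes) by an AIC score.
# Cell counts are invented; all names are hypothetical.
from math import log

ONE_CAUSE = [["P&M", "P&-M"], ["-P&M", "-P&-M"]]        # parameters a, c
TWO_CAUSES = [["P&M"], ["P&-M"], ["-P&M"], ["-P&-M"]]   # parameters a, b, c, d


def binom_loglik(k: int, n: int, p: float) -> float:
    """Log-likelihood of k occurrences of B among n individuals
    (the binomial coefficient is omitted; it is the same for both models)."""
    ll = 0.0
    if k > 0:
        ll += k * log(p)
    if n - k > 0:
        ll += (n - k) * log(1.0 - p)
    return ll


def fit(cells: dict, groups: list) -> tuple:
    """Return (maximized log-likelihood, AIC) for a model in which the cells
    in each group share a single adjustable parameter."""
    loglik = 0.0
    for group in groups:
        k_total = sum(cells[c][0] for c in group)
        n_total = sum(cells[c][1] for c in group)
        p_hat = k_total / n_total  # pooled frequency is the ML estimate
        loglik += sum(binom_loglik(cells[c][0], cells[c][1], p_hat) for c in group)
    return loglik, 2 * len(groups) - 2 * loglik


if __name__ == "__main__":
    # Each cell maps to (number exhibiting B, number of individuals in the cell).
    close = {"P&M": (50, 100), "P&-M": (52, 100), "-P&M": (20, 100), "-P&-M": (19, 100)}
    far = {"P&M": (80, 100), "P&-M": (40, 100), "-P&M": (30, 100), "-P&-M": (10, 100)}
    for name, data in (("close", close), ("far", far)):
        print(name, "one cause:", fit(data, ONE_CAUSE), "two causes:", fit(data, TWO_CAUSES))
```

In the first data set the frequencies within each level of P are nearly equal, so the small gain in fit from the two extra parameters does not offset their cost; in the second the frequencies differ widely and the more complex family scores better.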

How does this comparison of (One Cause) and (Two Causes) apply to the identity theory and dualism? Half this problem is straightforward; dualism's claim about the possible efficacies of mental and physical properties is captured by (Two Causes). Representing the identity theory, however, requires that we fine-tune (One Cause) a bit. The identity theory entails that Pr(B | P & -M) and Pr(B | -P & M) are both not defined, because the hypothesis says that Pr(P & -M) = Pr(-P & M) = 0. This means that the identity theory's claim about the causal efficacies of P and M is not perspicuously represented by (One Cause), but by the following family:

(Ident-2) Pr(B | P & M) = Pr(B | P) = Pr(B | M) = a,
          Pr(B | -P & -M) = Pr(B | -P) = Pr(B | -M) = d,
          and Pr(B | P & -M) and Pr(B | -P & M) are both not defined.
Note that (Ident-2) has two adjustable parameters whereas (Two Causes) has four.
I so far have provided two treatments of dualism and the identity theory. The first has them postulating different models of the probabilities with which mental and physical properties tend to occur; the second has them endorsing different models of how mental and physical properties confer probabilities on behavior. These separate treatments may be combined into a single formulation in which each theory describes both the probabilities of causes and their efficacies. The identity theory endorses a model with three adjustable parameters - two representing the efficacies of a single mental/physical cause [Pr(B | P & M), Pr(B | -P & -M)], and one representing the probability of that cause [Pr(P & M)]. Dualism, on the other hand, puts forward a model with seven independent parameters - four for the efficacies of different combinations of causes [Pr(B | ±P & ±M)] and three for the probabilities with which these combinations occur. When these models fit the data about equally well, the Akaike framework favors the model associated with the identity theory.20

Let us now return to the arguments advanced by Smart (1959) and Brandt and Kim (1967). They agree that parsimony is the main advantage that the identity theory can claim over dualism, but they disagree about the significance that this consideration should be assigned. Smart (pp. 155-6) says that the choice between the identity theory and dualism resembles the choice between evolutionary theory, with its postulate of an ancient earth in which successive layers of fossils are gradually deposited, and the hypothesis "that the universe just began in 4004 BC ... with sediment in the rivers, eroded cliffs, fossils in the rocks, and so on". Brandt and Kim (p. 533) agree that if the identity theory and dualism were related in this way, then "there would be no question whether a rational person must accept" the identity theory. But they deny that this is so. Brandt and Kim suggest that the identity theory doesn't explain observations in the same sense that evolutionary theory and other scientific theories do. They conclude (pp. 533-4) that Smart's analogy between the identity theory and the theory of evolution "is pernicious in that it lends, or at least tends to lend, a false air of scientific respectability to what is essentially a philosophical and speculative interpretation" of perfect correlations between the mental and the physical.

A natural interpretation of Smart's argument is that he denies Reichenbach's thesis; he thinks that parsimony can be a reason for assigning different truth values even when theories are predictively equivalent. Interpreting Brandt and Kim is a little less straightforward. If they regard parsimony as having probative force in the mind/body problem, though of an attenuated sort, then they also are flying in the face of Reichenbach's thesis, though perhaps with a bit less bravado than Smart displays. On the other hand, if their remarks about the "speculative", "metaphysical", and "philosophical" character of this problem are understood as marking an endorsement of Reichenbach's position, then their point about the greater parsimony of the identity theory provides no reason at all for thinking that the identity theory is true and dualism is false.

It should be clear that the analysis of the dispute between the identity theory and dualism that I have offered goes contrary to what Smart, Brandt, and Kim claim; their disagreement assumes that the identity theory and dualism make the same predictions about all possible observations. They then try to assess how strong a reason parsimony provides for choosing one theory over the other. In contrast, the Akaike framework tells us that if observations are error free, simplicity plays no role in estimating a theory's predictive accuracy. I have argued that if observational error is possible, the identity theory and dualism are not predictively equivalent, and that the identity theory's simplicity is a feature in its favor.

6. CONCLUSION

I have defended quite different assessments of the two main examples discussed in this paper. Shoemaker's discussion of time without change and the Smart, Brandt and Kim discussion of the mind/body problem both are supposed to present a pair of theories that are predictively equivalent; in both cases, parsimony is cited as a reason for preferring one theory over the other. I have not contested Shoemaker's claim that his two theories are predictively equivalent, but I have argued that parsimony, understood in terms of the Akaike framework, provides no reason to prefer one theory over the other. In contrast, I have tried to show how the parsimony argument for the identity theory can be justified within the Akaike framework, but I have argued that the two theories are not predictively equivalent.

Besides discussing these examples, I have advanced two more general epistemological conclusions; however, one of these is more compelling than the other. The first is that the rationale for using simplicity in the curve fitting problem does not carry over as a justification for using simplicity to discriminate between predictively equivalent theories. The second conclusion is that a difference in simplicity between predictively equivalent theories counts as a merely aesthetic or pragmatic consideration; it is not a ground for thinking that one theory is true and the other is false. I suppose it is possible to accept the first of these conclusions but not the second. I see no reason to do this, but those inclined to extract this lesson should do so with their eyes open. Vague pronouncements about the scientific value of simplicity, parsimony, unification, and explanatory power are insufficient.

The idea that scientific questions and philosophical questions are different in kind has taken a real beating over the last forty years or so. Quine's whole philosophy was devoted to undermining the distinction. Our web of belief is to be judged holistically by the twin criteria of conceptual economy and empirical adequacy (Quine 1953); to think that scientific propositions answer to one set of standards while philosophical propositions answer to another is to be trapped by an untenable dualism. Kuhn's (1970) work in the history of science has defended a similar picture. For Kuhn, science is unavoidably saturated with philosophical presuppositions; perhaps the principal impulse in Kuhn's view of science has been to downplay the idea that theory change is or can be driven by unbiased observation. For example, as noted earlier, Kuhn claimed that the choice of Copernican over Ptolemaic astronomy was based on aesthetic preference and nothing else. Scientists prefer fewer epicycles just as philosophers have a taste for desert landscapes. The principle of parsimony thus appears to be univocal; it seems to play a fundamental role in both science and philosophy.

This blurring of the boundary between science and philosophy has largely replaced the crisp contrasts posited by empiricism and positivism, but for reasons that bear rethinking. In "Empiricism, Semantics, and Ontology", Carnap distinguished internal and external questions. In Experience and Prediction, Reichenbach distinguished two types of simplicity. Carnap and Reichenbach drew these distinctions in order to defend a thesis that is essentially epistemological: that there is no way to tell which of two theories is true if the theories are predictively equivalent. However, in both cases, this epistemological point was anchored to faulty ideas in the philosophy of language. Carnap thought that internal and external questions can be separated by a syntactic test; Reichenbach, that predictively equivalent theories are in fact synonymous. It is a point of the first importance that the epistemological claim can be detached from the accompanying linguistic formulations.

The Akaike framework resurrects some epistemological ideas that are often viewed as elements of a positivism that is best forgotten. To use the Akaike framework, it is essential to distinguish parameters in a model that can be estimated from observations and parameters that cannot. In addition, the fundamental goal is to discover models that are predictively accurate; this differs from the goal of finding models that are true. As noted before, it is quite possible for (LIN) to be more predictively accurate than (PAR). However, this cannot be interpreted as evidence that (LIN) is true and (PAR) is false; after all, (LIN) entails (PAR). Although (PAR) must have a higher probability of being true than (LIN) does, no matter what the observations say, (LIN) can nonetheless have a higher degree of predictive accuracy. Akaike's framework helps make sense of the pervasive scientific goal of finding simplified models that are empirically useful; these models are known to be false, but that does not bar them from making reasonably accurate predictions.

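The contrast between predictive accuracy and truth can be illustrated with a toy calculation. The Python sketch below is an illustration under stated assumptions, not a reconstruction of Akaike's theorem. It assumes the estimate of predictive inaccuracy has the form SOS + 2kσ² (k adjustable parameters, error variance σ²), and that the true curve has a slight parabolic bend contributing `bend` of squared misfit per datum to the best-fitting straight line. The functions `akaike_score`, `expected_scores` and `preferred`, and all the numbers, are hypothetical.

```python
def akaike_score(sos: float, k: int, sigma2: float) -> float:
    """Assumed form of the estimate: sum of squares plus a penalty of 2*k*sigma2."""
    return sos + 2 * k * sigma2


def expected_scores(n: int, bend: float, sigma2: float) -> dict:
    """Expected scores for (LIN) (k=2) and (PAR) (k=3) on n data points.

    Assumption: the best-fitting member of a family with k parameters leaves
    about (n - k) * sigma2 of noise unexplained; (LIN) additionally misses
    the parabolic bend, which adds n * bend to its sum of squares.
    """
    sos_lin = (n - 2) * sigma2 + n * bend
    sos_par = (n - 3) * sigma2
    return {
        "LIN": akaike_score(sos_lin, 2, sigma2),
        "PAR": akaike_score(sos_par, 3, sigma2),
    }


def preferred(n: int, bend: float = 0.01, sigma2: float = 1.0) -> str:
    """Return the family with the lower estimated predictive inaccuracy."""
    scores = expected_scores(n, bend, sigma2)
    return min(scores, key=scores.get)


if __name__ == "__main__":
    for n in (10, 50, 500, 1000):
        print(n, preferred(n), expected_scores(n, 0.01, 1.0))
```

Under these made-up numbers the score difference between (LIN) and (PAR) is n·bend − σ², so the simpler family is favored for small data sets even though, by construction, it is false. Once n·bend exceeds σ² the fit term outweighs the penalty and (PAR) is favored.
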
Although these elements in the Akaike approach do not accord well with much that has been said in favor of scientific realism, the approach is equally at odds with some forms of empiricism. There is no distinction between theories that are strictly about observables and theories that are not. For Akaike, evidence comes from observation, but the content expressed by theories may concern matters that are not directly observable.

should have an
Carnap and Reichenbach
thought that their epistemology
impact on the conduct
in pseudo-problems;

of philosophy.

Philosophy,
they believed,
allow us
epistemological
critique would
are.
to see these problems
for what they
The subject of the present paper
more modest.
has been more
is accordingly
limited, and my conclusion
and
when
in terms of the
understood
considerations,
Parsimony
simplicity

important
was mired

Akaike

framework,

do not license

the parsimony
constructed.

and simplicity arguments


It isn't that philosophical

that philosophers
have sometimes
are always misguided;
as we have seen, the identity
of parsimony
can
over
a
be
in
defended
dualism
way that fits fairly well into the
theory
are more alien
Akaike
framework. However,
other appeals to parsimony
uses

to the Akaike

outlook. Perhaps philosophy


and science
than a shared vocabulary
sometimes
suggests.

are more

dissimilar

NOTES
* I am grateful to Martin Barrett, Ellery Eells, Malcolm Forster, Gregory Mougin, and Denis Walsh for comments on earlier drafts. My thanks also to members of the philosophy departments at London School of Economics and Wayne State University for useful discussion.
1 Examples would include the constructive empiricism of Van Fraassen (1980), the contrastive empiricism of Sober (1990a, 1993a, 1993b), and the gentle empiricism of Earman (1993).
2 I argue this point in connection with maximum likelihood estimation in Sober (1988a, 1988b).
3 In Sober (1990b), I discuss examples in which differences in simplicity or parsimony reflect differences in likelihood or differences in prior probabilities. From a Bayesian point of view, these considerations must exhaust the relevance of simplicity. The approach to the curve-fitting problem I'll outline in the next section fits neatly into neither of these two Bayesian formats.
4 We assume here that the old and new data sets contain the same number of observations, so that what is defined above is predictive accuracy with respect to data sets of size n. Without this restriction, the definition should be given in terms of predictive accuracy per datum.

5 Akaike's theorem also reflects the fact that the amount of data is relevant to deciding how much weight simplicity deserves. If there is a very slight parabolic bend in the data, it may make sense to favor (LIN) when the data set is relatively small; however, if the data set is quite large, (PAR) may be preferable. Since a curve's SOS almost inevitably goes up as the number of data points is increased, this means that the Akaike estimate of predictive inaccuracy is determined more and more by goodness-of-fit considerations, and less and less by simplicity, as the data set increases in size.
6 To understand the idea of an unbiased estimate, consider an object that weighs n ounces. A spring balance provides an unbiased estimate of the object's weight if repeatedly weighing the object would yield an average reading of n ounces; individual measurements may of course deviate from that average value.
7 Shoemaker does not explore the question of what the physical laws are that permit the events he goes on to describe. Whether these events are consistent with current physics is an interesting question, but it is not the only way to think about the problem at hand. In effect, Shoemaker sets current physics to one side and asks us to consider a set of observations and how we would theorize about them. Shoemaker's question, therefore, is whether a universal freeze is epistemically possible, not whether it is physically possible.
8 Scientists often prefer to express their generalizations in a format that doesn't depend essentially on a unit of measure. The fact that X's first freeze happened to occur in a year to which the calendar of observed years assigns the number 3 is beside the point. We might prefer to consider "coordinate free" representations of the two hypotheses:

(UF') For all augmented years a,
    If planet X freezes in year a, then it freezes in year a + 3,
    If planet Y freezes in year a, then it freezes in year a + 4,
    If planet Z freezes in year a, then it freezes in year a + 5.

(NUF') For all observed years o,
    If planet X freezes in year o, then it freezes in year o + 3, unless o is the 19th in a series of freezes every 3 years, in which case X next freezes in year o + 5.
    If planet Y freezes in year o, then it freezes in year o + 4, unless o is the 14th in a series of freezes every 4 years, in which case Y next freezes in year o + 7.
    If planet Z freezes in year o, then it freezes in year o + 5, unless o is the 11th in a series of freezes every 5 years, in which case Z next freezes in year o + 9.

The conclusion I'll reach concerning (UF) and (NUF) would not be affected if these formulations were used instead.
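The claim that the coordinate-free versions of the two hypotheses agree about what is observed can be checked by brute force. The Python sketch below is an illustration only and rests on three assumptions: each planet's first freeze falls in the year equal to its period, a universal freeze occupies exactly one augmented year that no observer records, and the exception clause in (NUF') recurs after every full series of freezes. The names `observed_freezes_uf` and `observed_freezes_nuf` are hypothetical.

```python
PERIODS = {"X": 3, "Y": 4, "Z": 5}
CYCLE = 60  # augmented years between universal freezes under (UF')


def observed_freezes_uf(period: int, last_augmented_year: int) -> list:
    """(UF'): a freeze every `period` augmented years.

    Augmented years divisible by 60 are universal freezes that nobody
    records, so the observed calendar skips them.
    """
    observed = []
    for a in range(period, last_augmented_year + 1, period):
        if a % CYCLE == 0:
            continue  # universal freeze: absent from the observed calendar
        observed.append(a - a // CYCLE)  # subtract the unrecorded years so far
    return observed


def observed_freezes_nuf(period: int, count: int) -> list:
    """(NUF'): step by `period` observed years, with one longer gap after
    the 19th (X), 14th (Y) or 11th (Z) freeze of each series."""
    series_len = CYCLE // period - 1  # 19, 14, 11
    long_gap = 2 * period - 1         # 5, 7, 9
    years, year = [], period
    for n in range(1, count + 1):
        years.append(year)
        year += long_gap if n % series_len == 0 else period
    return years


if __name__ == "__main__":
    for planet, period in PERIODS.items():
        uf = observed_freezes_uf(period, 240)
        nuf = observed_freezes_nuf(period, len(uf))
        print(planet, uf == nuf)
```

On these assumptions the two functions yield the same observed freeze years for every planet, which is what predictive equivalence amounts to in this example.
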
9 I assume that equivalent hypotheses cannot differ in their epistemic status. This is an assumption common to all theories in which probability is the basic confirmation concept.
10 I omit here discussion of the important "error theorem" discussed in Forster and Sober (1994). Estimating the predictive accuracy of lower dimensional families is often more subject to error than estimating the accuracy of families with higher dimensionality.
11 In setting up this problem, there is no reason to assume in advance that the periodicities must be integer valued. So let's allow that they may have up to, say, a hundred decimal places.
12 In describing Shoemaker's example, I neglected to say anything about whether observations are subject to error. It would be easy to introduce this idea; just assume that inhabitants of one planet can make mistakes in observing what happens on others. However, the conclusion that the two families have the same number of parameters makes this detail superfluous. Regardless of whether there is error or not, the two families must have the same Akaike estimate of predictive accuracy.

13 Of course, it is easy enough to associate (UF) with a family that has more adjustable parameters than the one associated with (NUF). What I doubt is that there is any reason to associate (UF) with a family that has fewer parameters than the one we should associate with (NUF).
14 One possibility is that our intuitions concerning (UF) and (NUF) are influenced by thinking about a quite different inference problem. Let us forget about the epistemological consequences of local and universal freezes and imagine that inhabitants of the planets see that various planets have property "F" (e.g., the occurrence of snow storms) periodically in the years 1-59. Given the data presented in (1), how should these people choose between the following hypotheses?

(H1) For all years i,
    Planet X has F in year i iff mod(i/3) = 0,
    Planet Y has F in year i iff mod(i/4) = 0,
    Planet Z has F in year i iff mod(i/5) = 0.

(H2) For all years i,
    Planet X has F in year i iff mod(i/3) = 0 and mod(i/60) ≠ 0,
    Planet Y has F in year i iff mod(i/4) = 0 and mod(i/60) ≠ 0,
    Planet Z has F in year i iff mod(i/5) = 0 and mod(i/60) ≠ 0.

Notice that neither of these hypotheses mentions the calendar of augmented years. Presumably, the "years" they talk about concern what is recorded in the observed calendar. If so, (H1) and (H2) are not predictively equivalent; they disagree about what will happen in year 60. And as data set (2) shows, both hypotheses are about to make false predictions. (H1) falsely predicts a universal F in the observed year 60; (H2) falsely predicts that the three planets will have F in the observed years 63, 64, and 65, respectively. We should not confuse Shoemaker's problem with the problem of choosing between (H1) and (H2).
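That (H1) and (H2) agree on the years 1-59 and first part ways in year 60 can be verified mechanically. The Python sketch below is only an illustration. It reads "mod(i/3) = 0" as "i is divisible by 3", and the names `PERIODS`, `h1`, `h2` and `first_disagreement` are hypothetical helpers introduced for the example.

```python
PERIODS = {"X": 3, "Y": 4, "Z": 5}


def h1(period: int, year: int) -> bool:
    """(H1): the planet has F in every year divisible by its period."""
    return year % period == 0


def h2(period: int, year: int) -> bool:
    """(H2): as (H1), except that no planet has F in years divisible by 60."""
    return year % period == 0 and year % 60 != 0


def first_disagreement(max_year: int = 120):
    """Return the first year in which the hypotheses differ for some planet,
    or None if they agree on every year up to max_year."""
    for year in range(1, max_year + 1):
        for period in PERIODS.values():
            if h1(period, year) != h2(period, year):
                return year
    return None


if __name__ == "__main__":
    print(first_disagreement())
```

Because the hypotheses differ about year 60, data from that year bear on the choice between them, which is why this inference problem is not the one Shoemaker poses.
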
15 I am grateful to Denis Walsh for drawing my attention to this question.
16 It is a substantive assumption that ouches and meter readings are probabilistically independent, conditional on the underlying state of pains and c-fibre firings. A model of error in which this is not assumed is possible, and it will contain more parameters than (Error) does. Adopting this more complex model would not affect the points I'll argue for in what follows.
17 Although this view about causal efficacy captures something of the spirit of Cartesian dualism, it does not correspond to the type of property dualism defended, for example, by Kim (1984). Kim holds that the mental and physical properties of a person are distinct, that each is a cause of behavior, but that mental properties do not make an irreducible contribution. M is a cause of B, even though M makes no difference to the occurrence of B, once one holds fixed the physical properties P that the individual possesses.
18 In this connection, it is worth considering Enç's (1986) idea that theoretical identity statements often involve an explanatory asymmetry. Being made of H2O explains why something is made of water, but not conversely. Having one's c-fibers fire explains why one is in pain, but not conversely. And so on.
19 This decision will depend, not just on the spread among the four empirical frequencies, but on the amount of data, as pointed out in footnote 5.
20 The same style of argument can be developed to permit dualism to be compared with an anti-reductive physicalism in which the mental supervenes on the physical (Fodor 1975; Kim 1984). For example, suppose that a mental property M is "multiply realizable" by different physical properties P1, P2, ..., Pn; in the biological species S1, organisms have M by having physical property P1, in species S2, organisms have M by having P2, and so on. In this case, a pair of "local" models may be specified for each biological species: one of them dualistic, the other physicalistic. Global anti-reductive physicalism is thus represented as a set of local reductive identity theories.

REFERENCES
Akaike, H.: 1973, 'Information Theory and an Extension of the Maximum Likelihood Principle', in B. Petrov and F. Csaki (eds.), Second International Symposium on Information Theory, Akademiai Kiado, Budapest, pp. 267-281.
Brandt, R. and J. Kim: 1967, 'The Logic of the Identity Theory', Journal of Philosophy 64, 515-537.
Carnap, R.: 1950, 'Empiricism, Semantics, and Ontology', Revue Internationale de Philosophie 4, 20-40. Reprinted in Meaning and Necessity (University of Chicago Press, Chicago, 1956).
Chaitin, G.: 1975, 'Randomness and Mathematical Proof', Scientific American 232, 47-52.
Earman, J.: 1993, 'Underdetermination, Realism, and Reason', in H. Wettstein (ed.), Midwest Studies in Philosophy, University of Notre Dame Press, Notre Dame, pp. 19-38.
Enc, B.: 1986, 'Essentialism without Individual Essences: Causation, Kinds, Supervenience, and Restricted Identities', in P. French et al. (eds.), Midwest Studies in Philosophy 11, 403-427.

Fisher, R.: 1925, Statistical Methods for Research Workers, Oliver and Boyd, Edinburgh.
Fodor, J.: 1975, The Language of Thought, Thomas Crowell, New York.
Forster, M.: 1988, 'Unification, Explanation, and the Composition of Causes in Newtonian Mechanics', Studies in the History and Philosophy of Science, 55-101.
Forster, M. and Sober, E.: 1994, 'How to Tell When Simpler, More Unified, or Less ad hoc Theories Will Provide More Accurate Predictions', British Journal for the Philosophy of Science 45, 1-36.

Hacking, I.: 1965, 'Salmon's Vindication', Philosophy of Science 32, 269-271.
Kim, J.: 1984, 'Epiphenomenal and Supervenient Causation', in P. French, T. Uehling and H. Wettstein (eds.), Midwest Studies in Philosophy, vol. 9, University of Minnesota Press, Minneapolis, pp. 257-270.
Kuhn, T.: 1957, The Copernican Revolution, Harvard University Press, Cambridge, Mass.
Kuhn, T.: 1970, The Structure of Scientific Revolutions, University of Chicago Press, Chicago.
Putnam, H.: 1975, 'Mathematics, Matter, and Method', Philosophical Papers, volume I, Cambridge University Press, Cambridge.


Quine, W.: 1953, 'Two Dogmas of Empiricism', in From a Logical Point of View, Harvard University Press, Cambridge, Mass., pp. 20-46.
Reichenbach, H.: 1938, Experience and Prediction, University of Chicago Press, Chicago.
Sakamoto, Y., M. Ishiguro, and G. Kitagawa: 1986, Akaike Information Criterion Statistics, Kluwer Publishers, Dordrecht.
Shoemaker, S.: 1969, 'Time Without Change', Journal of Philosophy 66, 363-381. Reprinted in Identity, Cause, and Mind, Cambridge University Press, Cambridge, 1989.
Smart, J.: 1959, 'Sensations and Brain Processes', Philosophical Review 68, 141-156.



Sober, E.: 1988a, 'Likelihood and Convergence', Philosophy of Science 55, 228-237.
Sober, E.: 1988b, Reconstructing the Past: Parsimony, Evolution, and Inference, MIT Press, Cambridge, Mass.
Sober, E.: 1990a, 'Contrastive Empiricism', in W. Savage (ed.), Scientific Theories, University of Minnesota Press, Minneapolis, pp. 392-412. Reprinted in E. Sober, From a Biological Point of View, Cambridge University Press, Cambridge, Mass., 1994.
Sober, E.: 1990b, 'Let's Razor Ockham's Razor', in D. Knowles (ed.), Explanation and Its Limits, Cambridge University Press, Cambridge, Mass., pp. 73-94. Reprinted in E. Sober, From a Biological Point of View, Cambridge University Press, Cambridge, Mass., 1994.
Sober, E.: 1993a, 'Epistemology for Empiricists', in H. Wettstein (ed.), Midwest Studies in Philosophy, University of Notre Dame Press, Notre Dame, pp. 39-61.
Sober, E.: 1993b, 'Mathematics and Indispensability', Philosophical Review 102, 35-58.
Sober, E.: 1993c, Philosophy of Biology, Westview Press, Boulder, Colorado.
Van Fraassen, B.: 1980, The Scientific Image, Oxford University Press, Oxford.

Manuscript submitted February 13, 1995
Final version received September 26, 1995

Department of Philosophy
University of Wisconsin, Madison
5185 Helen C. White Hall
600 North Park Street
Madison, WI 53706-1475
U.S.A.
