AN OVERVIEW OF PSYCHOACOUSTICS AND AUDITORY PERCEPTION
NEAL F. VIEMEISTER
This paper surveys experimental and theoretical work on the psychology of hearing, particularly
those aspects that are, or may be, relevant to audio reproduction. The general areas considered
include: (1) auditory sensitivity and dynamic range; (2) temporal aspects of hearing; (3) frequency
and pitch perception; (4) intensity and loudness perception. Current work and directions will be
discussed and particular attention will be devoted to the neural and mechanical correlates of these
psychological phenomena.
INTRODUCTION
Psychoacoustics, broadly defined, is the study of the psychology of hearing. It is concerned with how
organisms respond behaviorally to sound. This includes research ranging from basic auditory capabilities,
such as the detection and discrimination of pure tones, to more "psychological" research on how sounds
are recognized and interpreted. Although the term was coined only recently, psychoacoustics
has a very long history and now has many manifestations:
one finds research in psychoacoustics being conducted
in psychology, physics, engineering, audiology, and physiology. There are subspecialties of clinical psychoacoustics,
animal psychoacoustics, musical psychoacoustics, speech
psychoacoustics, and, of course, the psychoacoustics of audio reproduction.
Before surveying the basic concepts in this broad area, I
would like to make some comments about how we study
hearing, specifically about our choice of stimuli and about
the methods we use. In psychoacoustics we often use "unnatural", simple stimuli such as pure tones and noise. Indeed, much of the data I will be presenting were obtained
using such stimuli. We have been criticized, particularly by
some psychologists, for concentrating on these stimuli,
stimuli that seem to have no relationship to real-life sounds.
There is some justification to this criticism -- some of us
have become so preoccupied with the psychoacoustics of
simple stimuli that we lose track of the general goal of trying to understand auditory perception. But the criticism
misses a crucial point, namely that we use simple stimuli
as tools or probes to tell us how the auditory system
works, to give us information about the basic mechanisms
of hearing. Few of us are interested in the perception of
pure tones, but it is clear that by using pure tones we have
learned a lot about how the system works and about the processing underlying the perception of complex, real-life sounds.
AES 8th INTERNATIONAL CONFERENCE
The second comment is about the methods we use to "measure" hearing, the so-called psychophysical methods. It is useful to divide these psychophysical methods into two general
categories: objective methods and subjective methods. The
distinguishing feature of an objective method is that the subject's response can be classified as being either right or wrong.
For example, we can define a certain temporal interval for the
subject, using a light, perhaps, and during this interval, the so-called observation interval, we present a signal or we do not.
The subject is to respond either "yes" or "no" to indicate
whether he or she thought a signal was presented. The response can be scored as being correct or incorrect because we
know whether or not the signal was presented. In contrast,
with a subjective technique, there is no correct or incorrect response; there is only a response. In the "method of magnitude
estimation", for example, the subject is to report a number to
indicate the loudness of a sound. We are tapping a subjective
attribute and there is no correct or incorrect response. The distinction between objective and subjective psychophysical
methods is important because there appears to be a considerable difference in the intrinsic validity of data obtained with
these methods. Specifically, the validity of data obtained using
subjective methods generally is far more questionable than
that from objective techniques. There has been continuing debate about whether the numbers that the subject reports in the
magnitude estimation procedure are pure, true, valid indications of subjective magnitude. And this is a relatively straightforward case in the sense that there is general agreement
about what "loudness" means. When one gets to less well-defined subjective attributes, like "quality", the question of validity becomes even more pressing -- is there a dimension of
quality and can we validly measure it? This is not to say that
the validity of the objective methods is beyond question. Simply because we measure a threshold using an objective technique does not mean that threshold is a valid measure of the
subject's ability to hear. It has been repeatedly demonstrated
that factors that are unrelated to hearing can affect thresholds.
These include learning and attentional effects and response biases. A fairly recent development in psychophysics has been
the application of Signal Detection Theory. Among other
things this has provided measures of performance that are
more valid in the sense that they are relatively free of contamination by non-sensory factors.
[Figure 1 plot omitted. Horizontal axis: Frequency (Hz).]
Figure 1. The lower curve is the audibility curve based upon minimum audible field measurements. This curve is a composite of data from Corso [The Experimental Psychology of Sensory Behavior, Holt, Rinehart and Winston, New York, p. 280, 1970] and data from Robinson and Dadson [Brit. J. Appl. Phys. 7, 166-181, 1956]. The upper curve is the threshold for pain or "tickle" and is based upon data from Wegel [Ann. Otol., Rhinol., Laryngol. 41, 740-779, 1932].
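To make the idea of a performance measure that is relatively free of response bias concrete, here is a minimal sketch of the standard equal-variance Gaussian model from Signal Detection Theory, applied to the yes/no task described above. The function names and the two hypothetical listeners are illustrative assumptions and do not come from the paper; the sensitivity index is computed as d' = z(H) - z(F), where H and F are the hit and false-alarm rates and z is the inverse of the standard normal distribution function.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # zero mean, unit variance


def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index d' = z(H) - z(F); both rates must lie strictly in (0, 1)."""
    z = _STD_NORMAL.inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


def criterion(hit_rate, false_alarm_rate):
    """Response bias c = -(z(H) + z(F)) / 2; positive means reluctance to say 'yes'."""
    z = _STD_NORMAL.inv_cdf
    return -0.5 * (z(hit_rate) + z(false_alarm_rate))


def yes_no_rates(sensitivity, bias):
    """Hit and false-alarm rates predicted by the equal-variance model."""
    hit = _STD_NORMAL.cdf(sensitivity / 2 - bias)
    false_alarm = _STD_NORMAL.cdf(-sensitivity / 2 - bias)
    return hit, false_alarm


# Two hypothetical listeners (assumed values) with identical sensitivity
# but a different willingness to respond "yes".
neutral = yes_no_rates(sensitivity=1.5, bias=0.0)
cautious = yes_no_rates(sensitivity=1.5, bias=0.5)

for name, (hit, fa) in {"neutral": neutral, "cautious": cautious}.items():
    print(f"{name}: H={hit:.3f} F={fa:.3f} "
          f"d'={d_prime(hit, fa):.2f} c={criterion(hit, fa):.2f}")
```

In this sketch the cautious listener produces fewer hits than the neutral one, so a threshold based on hit rate alone would differ between them, yet both yield the same d'. This is the sense in which such a measure separates sensitivity from a non-sensory factor like response bias.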
The point of all this is to be skeptical of the numbers -- they may not mean what we think they do. We are trying to
measure some aspect of the behavior of a very complex biological system. The strategies to measure behavior are deceptively simple. However, with objective psychophysical
techniques and great care in subject training, the reliability
and validity of psychophysical measurement are close to
those of physical measurement.
I. ABSOLUTE SENSITIVITY AND THE DYNAMIC
RANGE OF HEARING
A. Absolute sensitivity: the audibility curve
Figure 1 shows the familiar audibility curve for human
hearing together with one measure of the upper intensity
limit of hearing. These curves bound the so-called auditory
area. The audibility curve represents the threshold in dB
SPL for a pure tone presented in quiet as a function of frequency. The curve shown is for young adults with normal
hearing and the tones are presented via a loudspeaker -- these are "Minimum Audible Field" measurements and, unlike headphone measurements, reflect, in part, the acoustic
properties of the head, torso, and external ear. I would like
to comment on three aspects of the audibility curve.
The first is the general U-shaped form of the function.
What accounts for this? There are many factors but the
most important -- for normal hearing people -- is the
transfer function of the middle ear. The human middle ear
consists of three small bones, the ossicles, and their supporting structures. It transfers sound energy from the
eardrum to the cochlea, where the hair cell receptors are located, and serves essentially as an impedance matching device. There is a beautiful story of the evolution of the
middle ear but there is not enough time in this talk to do it
justice. The pressure transfer function of the middle ear
shows a bandpass characteristic that is roughly similar to
an inverted audibility curve.
The second aspect concerns the effects of hearing loss.
When we talk of a hearing loss we are referring to a
change in the audibility curve, specifically an elevation in
thresholds above the normal threshold. Hearing loss can be
produced in many ways -- exposure to intense sounds can
irreversibly damage hair cells, the receptor cells within the
cochlea, as can exposure to certain drugs, ototoxic drugs.
High frequencies are generally more susceptible to damage
and as we age we tend to first lose our sensitivity to high
frequencies. I would like to emphasize that there generally
is more to hearing loss than a simple reduction in sensitivity. In a frequency region of hearing loss there often are additional changes that can affect perception. Thus, we generally cannot restore normal perception by restoring
normal sensitivity using a hearing aid or using equalizers
or tone controls on an audio reproduction system. The study
of the perceptual consequences of hearing loss is a very active research area of psychoacoustics and audiology.
Finally, I would like to remark on the incredible absolute
sensitivity of our auditory system. At 3 kHz, where we are
most sensitive, a sound at threshold produces a displacement of the eardrum that is about 1/100 of the diameter of a
hydrogen molecule! One can speculate on why we are so
sensitive, but I won't. A more tractable question is what determines our absolute sensitivity. One possibility is that
there is a true sensory threshold, a "barrier" in our auditory
system which requires a certain energy to be exceeded. The
notion of a true sensory threshold, as opposed to the operationally defined thresholds we usually talk about such as for
the audibility curve, is generally held in disrepute. Very
weak signals, even those below "threshold", convey some
information -- they are not filtered out by the operation of
a sensory threshold. (If there is no threshold, or limit on
perception, then the issue of subliminal perception becomes
moot.) Another possible explanation for our absolute sensitivity is that thermal agitation of the air molecules near the
eardrum -- Brownian motion -- provides a noise floor that
limits our ability to detect a tone in "quiet". This also appears not to be correct, at least for humans. The current
consensus is that our sensitivity is limited by noise, but not
Brownian noise at the eardrum. Rather it is the "noise" that
is characteristic of sensory transmission. Transmission of
information through our auditory system is inherently
stochastic. For example, most auditory nerve fibers are
spontaneously active -- they show responses in the absence
of auditory stimulation. "Internal noise" such as this must
limit sensitivity, not only our absolute sensitivity but also
our sensitivity to changes in frequency and amplitude.
B. The dynamic range of hearing
The upper curve in Figure 1 shows the "threshold" for pain
and is one measure of the upper intensity limit of hearing.
It is clear from this figure that the dynamic range of hearing, the range between threshold and pain, is enormous. At
3 kHz the dynamic range is about 120 dB. Of course, the
"every day" dynamic range, the one in realistic listening
situations, is smaller because the lower limit, the audibility
curve, will be raised by ambient noise. Nevertheless, we
can hear over a remarkable range, particularly remarkable
and wonderful because the "front end" of the auditory system is essentially mechanical. It is also remarkable because
this dynamic range is available to us almost instantly: after
listening to a very loud sound, e.g. a pure tone at 115 dB,
our threshold has recovered to the quiet threshold within
about 500 ms. This and other evidence indicates that the
auditory system does not appear to maintain its dynamic
range by using a gain-control mechanism -- one which adjusts a limited dynamic range to an operating point determined by the ambient level. This is the "trick" used by the
visual system -- the system adapts to the ambient illumination and works around that point. The auditory system
does not appear to work that way. How does the auditory
system maintain its dynamic range? This is a fundamentally important question because it concerns the question of
how auditory information is coded and processed, something we must know if we are to claim that we understand
hearing. At the level of the auditory nerve we know that
somehow information over a 120 dB range is represented
in the activity of a population of nerve fibers each of which
has a typical dynamic range of only 30-40 dB. At present
the best hypothesis is that sub-populations of nerve fibers
convey information over different intensity ranges. At levels above about 40 dB only a small population, about 15-20% of the 30000 nerve fibers, is coding the sounds we
communicate with.
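The size of these numbers is easier to appreciate on a linear scale. The following is a small sketch, using only the standard decibel definitions, that converts the 120 dB dynamic range and the 30-40 dB range of a single fiber into intensity and pressure ratios, and evaluates the 15-20% figure quoted above. The function and constant names are illustrative assumptions.

```python
def intensity_ratio(level_difference_db):
    """Ratio of sound intensities for a level difference in dB."""
    return 10.0 ** (level_difference_db / 10.0)


def pressure_ratio(level_difference_db):
    """Ratio of sound pressures (amplitudes) for a level difference in dB."""
    return 10.0 ** (level_difference_db / 20.0)


DYNAMIC_RANGE_DB = 120.0          # threshold to pain at 3 kHz, from the text
FIBER_RANGE_DB = (30.0, 40.0)     # typical single-fiber range, from the text
TOTAL_FIBERS = 30000              # auditory nerve fibers, from the text
HIGH_LEVEL_FRACTION = (0.15, 0.20)

print(f"whole system: intensity x{intensity_ratio(DYNAMIC_RANGE_DB):.0e}, "
      f"pressure x{pressure_ratio(DYNAMIC_RANGE_DB):.0e}")

for fiber_db in FIBER_RANGE_DB:
    print(f"single fiber ({fiber_db:.0f} dB): "
          f"intensity x{intensity_ratio(fiber_db):.0e}")

# Number of fibers in the small population said to code levels above about 40 dB.
high_level_fibers = tuple(round(TOTAL_FIBERS * f) for f in HIGH_LEVEL_FRACTION)
print(f"fibers coding higher levels: {high_level_fibers[0]} to {high_level_fibers[1]}")
```

A 120 dB range corresponds to a factor of a million in pressure and a million million in intensity, whereas a single fiber spans at most a factor of about ten thousand in intensity, which is why some division of labor across fibers is hypothesized.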
II. TEMPORAL ASPECTS OF HEARING
A. Temporal resolution
The auditory system is extremely fast, at least when
compared to other sensory systems. By this I mean that we
can detect or resolve relatively brief changes in sounds.
For example, it has been shown that a periodically interrupted white noise sounds continuous -- the interruptions
are no longer audible -- only when the interruption rate is
more than 2-4 thousand interruptions per second. Another
way of looking at this is that we can resolve or "follow"
very brief intensity changes. In contrast, an interrupted
light appears continuous once the interruption rate exceeds
only about 60 interruptions per second. In this sense the
auditory system is almost two orders of magnitude faster
than the visual system.
There are many manifestations of the fine temporal resolution of hearing. The threshold for detecting a gap in a
continuous sound is about 2 msec, we can detect amplitude
modulation up to modulation frequencies of about 2 kHz,
and, as mentioned above, we recover from exposure to intense sounds quite quickly. The auditory system seems to
have been "built" to process rapidly changing dynamic signals. It is not surprising that the rapid dynamic changes
present in speech and in music are perceptually important.
Clearly, the information transmitted via the auditory
...
[Figure 3 plot omitted. Horizontal axis: Frequency (Hz).]
Figure 3. Psychophysical tuning curves for three signal frequencies. The ordinate is the level of the masker that is necessary to just mask the signal. Data from Wightman et al. [In
Psychophysics and Physiology of Hearing. Evans, E. F. and
Wilson, J. P. (eds.). Academic Press, London. 1977].
shown that the 200 Hz pitch does not result from the generation of a 200 Hz distortion product in the peripheral auditory system. Various models have been proposed to account
for the basic phenomenon of residue pitch and of the many
related phenomena. These models generally propose fairly
extensive central processing and can involve cognitive,
learning-related factors. The models have evolved to the
point that many can accurately predict the pitch (or pitches)
of very complex sounds such as bell-strikes.
IV. INTENSITY PERCEPTION AND LOUDNESS
A. Intensity discrimination.
Under optimal circumstances a 1 dB change in sound
intensity can be detected. That is, we can just detect a 1 dB
intensity difference between two bursts of sound, we can
detect a 1 dB increment in a continuous sound, and we can
detect a 1 dB "bump" in the spectrum of a spectrally flat
sound. This is true, approximately, over a very wide range
of sound levels. Thus, for example, we can just detect the
intensity difference between a 20 and a 21 dB sound and
between a 120 and 121 dB sound. There are several aspects
of this that deserve comment. First of all, the fact that relatively small intensity differences can be detected over such
a wide range -- a range of over 100 dB -- is another manifestation of the remarkable dynamic range of the auditory
system. Secondly, it should be emphasized that the decibel
is a relative measure: a 1 dB change at 120 dB is a much
larger absolute intensity change than a 1 dB change at 20
dB. A 1 dB change corresponds to a change of about 26%.
Constant relative intensity changes are just detectable. This
fact is known as Weber's Law. Specifically, Weber's Law
states that ΔI = kI, where ΔI is the absolute intensity
change that is just detectable, I is the absolute intensity
of the reference, and k is a constant. Weber's Law is one of the great "laws" of
experimental psychology and dates back to the work of
Weber and Fechner in the early 1800's. Weber's law holds,
at least approximately, for a wide variety of auditory stimuli and also holds for intensity discrimination in most of
the other senses. Weber's Law, or a version of it, also holds
for detecting signals in noise: this version states that a constant signal-to-noise ratio yields constant detectability and is
one reason why we often use this specification. The fundamental question is why does Weber's Law hold? Why are
relative intensity changes so important in hearing? What is
it about auditory processing that makes relative changes
important? We have theories, of course, including that the
auditory system employs logarithmic compression, but
none has proven completely satisfactory.
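The arithmetic behind the 26% figure and the statement that the decibel is a relative measure can be checked with a short sketch. It assumes, for illustration only, the conventional reference intensity of 1e-12 W/m^2 for converting levels to absolute intensities; the function names are likewise hypothetical. A just-detectable change of 1 dB gives a Weber fraction k = 10^(1/10) - 1.

```python
REFERENCE_INTENSITY = 1e-12  # W/m^2; conventional reference, assumed here


def weber_fraction(just_detectable_db=1.0):
    """Relative intensity change k = (delta I) / I for a given dB change."""
    return 10.0 ** (just_detectable_db / 10.0) - 1.0


def intensity(level_db):
    """Absolute intensity in W/m^2 for a level in dB re the assumed reference."""
    return REFERENCE_INTENSITY * 10.0 ** (level_db / 10.0)


def just_detectable_increment(level_db, just_detectable_db=1.0):
    """Absolute increment delta I = k * I predicted by Weber's Law."""
    return weber_fraction(just_detectable_db) * intensity(level_db)


k = weber_fraction()
print(f"Weber fraction for 1 dB: {k:.4f}")

for level in (20.0, 120.0):
    print(f"{level:5.0f} dB: I = {intensity(level):.3e} W/m^2, "
          f"delta I = {just_detectable_increment(level):.3e} W/m^2")

ratio = just_detectable_increment(120.0) / just_detectable_increment(20.0)
print(f"delta I at 120 dB is {ratio:.0e} times delta I at 20 dB")
```

The relative change is the same 26% at both levels, while the absolute increment at 120 dB is ten orders of magnitude larger than at 20 dB.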
Finally, I would like to put intensity discrimination in the
broader context of how complex sounds are processed and
ultimately perceived. Intensity discrimination tells us
something about how changes in amplitude or intensity
that occur within a limited spectral region are detected and
processed. More generally, it has given us valuable hints
about how the spectral characteristics of a sound might be
coded, particularly at the level of the auditory nerve. A recent and exciting development in psychoacoustics addresses the closely related problem of how we discriminate and
perceive spectral shape or spectral "profiles". The important difference is that in this research the subjects must
make a comparison across frequency, not just what happens within a single frequency region. It seems clear that
subjects can do this quite well. It is also clear that such capability is crucial for real-world auditory perception.
B. Loudness
Loudness is, of course, one of the fundamental attributes of
auditory perception. It is the subjective magnitude of
sound. It is, like pitch, not a physical property of sound. At
the risk of belaboring the obvious: it is almost always incorrect to say that: "the loudness of the sound was 90 dB
SPL". The 90 dB SPL is a physical measurement and is
only indirectly related to the loudness of the sound. A 90
dB sound could be, depending on its spectrum, loud or
quite soft.
I will not attempt a thorough review of loudness but will
mention several highlights. As you are well aware, "equal
loudness contours" have been measured for tones and for
narrow bands of noise. These measurements are based
upon loudness matches and from these measurements we
can determine the "loudness level" (in phons) of a sound.
When we say that the loudness level of a sound is 50 phons
we mean that it is judged equal in loudness to a 1 kHz tone
presented at 50 dB SPL. The growth of loudness with intensity has been extensively studied, typically using magnitude estimation procedures, and we know that for sounds
above threshold a 10 dB increase in level will produce approximately a doubling of loudness. Finally, there are several fairly successful schemes for calculating the loudness
of complex sounds.
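The rule that a 10 dB increase roughly doubles loudness can be written as a simple relation. The sketch below is an idealization of that approximate rule, not one of the loudness calculation schemes mentioned above, and its function names are illustrative assumptions. Because 10 dB is a tenfold increase in intensity, the rule implies that loudness grows as intensity raised to the power log10(2), about 0.3.

```python
import math


def loudness_ratio(level_change_db):
    """Loudness ratio implied by 'each 10 dB doubles loudness'."""
    return 2.0 ** (level_change_db / 10.0)


def implied_power_law_exponent():
    """Exponent p such that loudness is proportional to intensity ** p."""
    return math.log10(2.0)


for change in (10.0, 20.0, 30.0):
    print(f"+{change:.0f} dB -> loudness x{loudness_ratio(change):.0f}")

p = implied_power_law_exponent()
print(f"implied exponent on intensity: {p:.3f}")

# Consistency check: the power law reproduces the doubling rule.
intensity_factor = 10.0 ** (20.0 / 10.0)   # a 20 dB increase in level
print(f"power-law prediction for +20 dB: x{intensity_factor ** p:.2f}")
```

The compressive exponent illustrates why a very large physical range maps onto a much smaller range of subjective magnitude.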
I am minimizing a discussion of loudness because in my
opinion loudness is not particularly important in hearing.
While it is a primary auditory attribute, loudness, in itself,
is not important for auditory communication, speech and
music included. It is important if a sound is too loud or too
soft, but within this vast range we can communicate about
equally effectively regardless of loudness. What is important, crucially important, for auditory communication are
the intensity changes that occur over frequency and over
time. This is where the information is and we must understand how these changes are processed if we are to understand auditory perception. Loudness has little, if anything,
to do with it. Yes, dynamics are important in music, at least
certain types of music, but far more important are the spectral shapes of the sounds and their temporal characteristics.
V. SUMMARY AND CONCLUSIONS
In psychoacoustics we are concerned with the behavior of a
very complex system and, despite the stories I've told you,
there are many potential pitfalls in trying to measure hearing
and in drawing valid conclusions from our measurements. I
discussed the distinction between objective and subjective
psychophysical measurements. The question of validity is
not as pressing with objective methods, and the data can be
much more directly related to underlying physiological processes. But, there are many aspects of perception, including
those related to the evaluation of audio reproduction devices,
that simply are not amenable to objective psychophysical
measurement. We must use subjective methods in some cases, but considerable caution should be exercised in interpreting the results of such measurements.
The dynamic range of hearing is the intensity range between absolute threshold and a somewhat arbitrary upper
limit, often taken as the "threshold" for pain. Absolute
thresholds (measured in quiet) are determined by internal
noise, by the transfer function of the acoustic system up to
the cochlea, and by many other factors. Hearing loss is defined by an elevation in absolute threshold. There generally
are perceptual consequences of hearing loss in addition to a
loss in sensitivity. Thus, simple compensation for the loss
in sensitivity generally does not restore normal hearing.
The dynamic range of hearing is spectacular and it is not
yet clear how the system maintains such a large range. This,
the so-called "dynamic range problem", is fundamental to
an understanding of how we hear. In discussing this problem, I mentioned that this huge range is available to us almost instantly -- our ears do not slowly adjust their gain to
operate over a restricted range. Clearly, audio reproduction
that does not audibly degrade the signal must somehow preserve a large dynamic range. If this is accomplished by using gain-adjustment devices, careful attention must be devoted to the temporal characteristics of those devices.
The auditory system seems to have been designed to
process rapidly changing sounds -- sounds whose amplitude and/or frequency changes over time. I distinguished
between two types of temporal resolution: within-channel
and cross-channel. Within-channel resolution reflects sensitivity to envelope changes that occur over a relatively
small portion of the spectrum, a bandwidth of the order of
20% of the center frequency. Cross-channel resolution
refers to sensitivity to temporal differences that occur over
widely spaced frequency regions. For both types of resolution, the approximate auditory time constants are about 3
ms. Phase disparities in reproduction equipment may be
audible if they exceed these times.
A fundamental fact about hearing is that the auditory system is tonotopically organized. At any given level, different
frequencies stimulate different places. This organization
begins in the cochlea and, at this level, shows a very high
degree of frequency selectivity. There are direct psychoacoustical manifestations of frequency selectivity -- the notion of critical bands, of psychophysical tuning curves, and
of auditory filtering capture these. Although it is clear that
the auditory system performs a type of frequency-to-place
analysis, it is also clear that timing information is preserved. Timing information, or phase locking, is clearly important in binaural hearing and also underlies the high degree of temporal resolution of monaural hearing. Whether it
plays a basic role in other types of auditory coding, in pitch
perception, for example, is not clear.
Pitch is a very important subjective attribute of sound,
particularly in musical perception. A long-standing issue is
how pitch is coded at the periphery -- this is the place vs.
time issue I have just mentioned. The more general issue is
how we extract the pitch of complex sounds. It is clear that
the pitch, or pitches, of such sounds is not simply determined by the physical characteristics of the sounds -- extensive "computation", perhaps including stored or learned
strategies, seems to be involved.