A Study of The Subjective Performance of Ambisonic Decoders

University of Southampton
Faculty of Engineering and the Environment

Institute of Sound and Vibration Research
A Study of the Subjective Performance of Ambisonic Decoders

by
Thomas Samuel Leach

Supervisors: Prof. Philip Nelson & Dr Filippo Fazi
A thesis submitted in partial fulfillment for the degree of Master of Science
December 2010
Abstract
This thesis is focused on the works of Michael Gerzon and more specifically the methods and hypotheses introduced by him in his 1992 Metatheory. The thesis links the works of Gerzons velocity vector and energy vector models to the binaural methods of the localization of sounds, and provides a study into the accuracy of 3 decoding methods designed on the principles of Gerzons paper, the velocity, the energy and the combined decoding methods. A computer model was created to test the effects of different ambisonic parameters and the results used to design a subjective listening test that formed the main focus for this study. The subjective listening test tried to identify the limitations of Gerzons hypotheses and to assess their validity. From the literature, computer simulations and the results of the subjective listing tests it was shown that Gerzons hypotheses were correct for the optimization of different frequency bands, however the study failed to conclude the advantages of a combined decoder over a single decoding method.
To Marissa, without you none of this would have been possible, keep looking to the stars.
Table of Contents
List of Figures List of Tables Acknowledgements Introduction Sound Localization 2.1 Introduction to Sound Localization 2.2 Terminology 2.3 Localization & Blur 2.4 Methods of Measurement 2.5 Localization In the Horizontal and Median Planes 2.5.1 Localization in the Horizontal Plane 2.5.2 Localization in the Median Plane 2.6 Binaural Cues 2.6.1 Interaural Time Difference 2.6.1.1 Envelope Interaural Time Difference 2.6.2 Interaural Intensity Difference 2.7 Cones of Confusion 2.7.1 Monaural Spectral Cues and Pinna Effect 2.7.2 Head Movement Sound Field Reproduction 3.1 Introduction 3.2 Ambisonics 3.3 Conventions 3.4 The Spherical Wave Equation 3.5 Spherical Harmonics and Higher Order Ambisonics 3.6 Decoding 3.7 Gerzons Metatheory 3.7.1 Velocity Vector Model 3.7.2 Energy Vector Model 3.8 Decoding Gains iv vi vii 1 3 3 4 5 6 8 8 9 9 10 11 12 12 13 14 15 15 16 17 18 22 26 30 31 32 33 i
3.8.1 Equivalent Panning Functions 3.9 Gerzons Vienna Decoder 3.10 Aims of Thesis Simulations 4.1 Introduction to Chapter 4.2 Ambisonic Decoder Model 4.3 Frequency Response Model 4.4 Increasing Ambisonic Order 4.5 Number of Loudspeakers Experimental Procedure 5.1 Introduction to Chapter 5.2 Introducing the Response 5.2.1 Perceptual Response 5.2.2 Affective Response 5.3 Design of Listening Test 5.3.1 Independent Variables 5.3.1.1 Signal 5.3.1.2 Signal Category 5.3.1.3 Time Domain Characteristics 5.3.1.4 Spectral Characteristics 5.3.1.5 Reproduction System 5.3.1.6 Listening Room 5.3.1.7 Calibration 5.3.1.8 Subjects 5.3.1.9 Pointing Methods 5.3.2 Dependent Variables 5.3.2.1 Measurement Scale 5.3.2.2Bias Effects 5.3.2.3 Contraction Bias 5.3.2.4 Visual Bias 5.4 Experimental Procedure 5.4.1 Experiment List 5.5 KEMAR Measurements
35 35 36 37 37 38 40 44 46 50 50 50 50 51 51 52 52 52 53 53 58 60 61 63 63 63 64 64 64 65 65 70 71 ii
Results 6.1 Listening Test Results 6.2 KEMAR Results Discussion Conclusion References Appendix A A.1 Proof of decoding from wave to matrix, Equation (3.31) A.2 Computing the energy vector Appendix B - Data Sheets Appendix C - MatLab Code Appendix D - Consent Forms Instruction Sheet Questionnaire Answer Sheet Participant Consent Form
72 72 79 82 92 94 99 99 100 102 104 108 109 110 111 112
iii
List of Figures
Figure 2.1: The median, frontal and horizontal planes of the head. Figure 2.2: Illustration of the cones of confusion. Figure 3.1: Illustration of the Bell Labs sound field reproduction method. Figure 3.2: Illustration of spherical coordinate system, relative to the Cartesian coordinate system. Figure 3.3: Plot to show spherical harmonics of the eighth order in the x,z plane, plotted looking down the x axis, the black lines indicate nodal lines along with the outline of the sphere. Figure 3.4: Illustration to show the directivity patterns of the spherical harmonics up to order n=3. Figure 3.5: Illustration of decoder style speaker gains, the large arrows indicate the magnitude of the speaker gains, while the colour indicates phase, red positive, blue negative. The velocity decoding method is shown on the left and the energy decoding method on the right. Figures 4.1 to 4.12: Decoder frequency comparison simulation. Figures 4.1 4.4 Velocity Decoding, Figures 4.5-4.8 Energy Decoding, Figures 4.9-4.12 Target Sound field for 250Hz, 700Hz, 2500Hz and 4000Hz respectively. Figure 4.13: Polar plot to show directionality of Velocity decoding (Red) and Energy decoding (Blue) for a 1st order system. Figure 4.14: Polar plots to show how directivity increases when increasing order from M=1 to M=15 for both Velocity Decoding (Red) and Energy Decoding (Blue). Figure 4.15: Comparison to show the affects of changing number of loudspeakers (L). 700Hz target wave incident at 60, reproduced using 3rd order Energy decoding. Figure 4.16: Reproduced field for a target plane wave incident at 60, using 6 loudspeakers and 3rd order Energy decoding. Figure 5.1: Time domain characteristic of the 5 Broadband GDWN bursts used in the experiments. Figure 5.2: Magnitude response of 250Hz 2nd order Butterworth filter design. 53 54 iv 48 47 45 43 41 34 25 24 17 4 13 16
Figure 5.3: Magnitude response of 700Hz 2nd order Butterworth filter design. Figure 5.4: Magnitude response of 2500Hz 2nd order Butterworth filter design Figure 5.5: Magnitude response of low and high pass 2nd order Butterworth filters, with crossover at 700Hz. Figure 5.6: Phase response of 2500Hz 2nd order Butterworth filter design. Figure 5.7: Power Spectral Density of 250Hz Stimuli. Figure 5.8: Power Spectral Density of 700Hz Stimuli. Figure 5.9: Power Spectral Density of 2.5KHz Stimuli. Figure 5.10: Power Spectral Density of the recombined Broadband Stimuli. Figure 5.11: Diagram of reproduction system, set up in the ISVR large anechoic chamber. Figure 5.12: Photograph of the large anechoic chamber at the ISVR. Figure 5.13: Screenshot of recorded calibration tones. Figure 5.14: Spectral analysis of channel 2 recorded calibration tone. Figure 6.1: Box plot to show Experiment 1 (Recombined Broadband Stimuli) Results. Figure 6.2: Box plot to show Experiment 2 (250Hz Bandpass Filtered Stimuli) Results. Figure 6.3: Box plot to show Experiment 3 (700Hz Bandpass Filtered Stimuli) Results. Figure 6.4: Box plot to show Experiment 4 (2500Hz Bandpass Filtered Stimuli) Results. Figure 6.5: Box plot to show Experiment 5 (Broadband Stimuli) Results. Figure 6.6: Graph to compare calculated interaural level differences from binaural recordings of test procedure 1, 4 and 5 measured using a KEMAR system. Figure 6.7: Graph to compare calculated interaural time differences from binaural recordings of test procedure 1, 2 and 5 measured using a KEMAR system.
54 55 55 56 56 57 57 58 60 61 62 62 77 77 78 78 79
80
80
List of Tables
Table 3.1: Spherical harmonics for zeroth and first orders. Table 5.1: Test 1 Running Order Broadband Noise, Combined Decoding. Table 5.2: Test 2 Running Order 250Hz Stimuli. Table 5.2: Test 3 Running Order 700Hz Stimuli. Table 5.2: Test 4 Running Order 2500Hz Stimuli. Table 5.2: Test 5 Running Order Broadband Stimuli. Table 6.1: Experiment 1 (Broadband Stimuli) Results. Table 6.2: Experiment 2 (250Hz Bandpass Filtered Stimuli) Results. Table 6.3: Experiment 3 (700Hz Bandpass Filtered Stimuli) Results. Table 6.4: Experiment 4 (2500Hz Bandpass Filtered Stimuli) Results. Table 6.5: Experiment 5 (Recombined Broadband Stimuli) Results. 23 66 66 67 68 69 72 73 74 75 76
vi
Acknowledgements
I would like to take the opportunity to first and foremost thank Dr. Filippo Fazi for his guidance, endless knowledge of ambisonic systems and most of all his patience. Id like to thank my parents Phil and Caroline for their support throughout the years, both emotionally and financially. Id like to thank the friends Ive met this year at the ISVR, cheers guys its been fun. Id also like to take the opportunity to thank Richard Perkins for being supportive and understanding of this dissertation whilst also trying to help me find my feet at Parsons Brinckerhoff. And finally I would like to say a big thank you to my partner Marissa, without your dedication and perseverance I know I would have never achieved this, thank you for your undying support and companionship, I love you.
vii
Equation Sectio n (Next) Equati on Chapter (Next) Sectio n 1a
Chapter 1
Introduction
Hearing, one of the five senses and one of the most important, is used by all mammals. The ability to localize sound is an important part of everyday life and the ability to identify the direction in which a sound is originating accurately is intrinsic to survival. In 1907 Lord Rayleigh was the first to theorize the physics of the perception of location. Rayleigh proposed that two binaural cues were responsible for human localization of stimuli, for low frequency sources localization was due to the interaural time difference, while for higher frequencies it was due to the interaural level difference. When designing sound field reproduction systems, designers need to take into account the way the human ear receives and the brain perceives sound. In order to recreate natural sounding events the system must be capable of reproducing accurately the natural cues that make up sound. Sound field reproduction has been around since 1881 (Rumsey, 2001) and has culminated into the multi-channel surround sound reproductions known today. The aim of all reproduction systems is to try to accurately recreate an original sound field and these current multi-channel methods use the relative intensities of the loudspeakers to recreate the original sound, but due to the scalar pressures of the speakers it is not possible to recreate the correct pressure and velocity at the listening position (Benjamin et al. 2006). Ambisonics is different; ambisonics tries to control the sound field using a regular array of speakers, and it is this technology that this thesis is based on. Invented in the 1970s ambisonics is a method of trying to capture, store and reproduce sound in each possible direction, and each possible distance from a listener (Gerzon, 1974).
Michael Gerzon, often credited for the invention of ambisonics, released in 1992 his Metatheory, or theory of theories, in which he details a method for decoding ambisonics into various methods, each relating to one part of human sound localization. It is from his paper Gerzons most important models appear, the velocity vector model and the energy vector model, both try to conserve their relative energies and are based on the workings of the interaural time difference and the interaural level difference respectively. This objective of this thesis is to test the velocity and energy vector models set out by Gerzon in order to compare the subjective accuracy of different decoding methods, it is to the authors knowledge that no subjective testing directly comparing the different decoder designs has been carried out to date, although computer simulations study has been conducted by Jerom Daniel (2001).
Equation Chap ter (Next ) Section 1C
Chapter 2
Sound Localization
2.1 Introduction to Sound Localization
In natural environments the approach of a predator, a mate, or ones prey may be conveyed by subtle fluctuations within the acoustic environment. In many instances it is likely that the early detection of an intruder is not due to a sound which is uncommon in either amplitude or frequency, but rather because it appeared at an inappropriate location within the acoustic landscape (Brown and May, 2005). Sound localization is an important part of everyday life and the ability to identify the direction in which a sound is originating accurately is an important part of survival. The factors of sound localization in animals, specifically humans and the effects on how sounds are perceived have been the subject of many studies. Moore (1977) describes localization as the judgment of direction and distance of a sound source, in order to understand how we perceive a sound one must first ask what is perception? Lungwitz (1923 in Blauert 1974) elegantly describes perception as the moment when the perceiver and the perceived encounter each other in such a way that the perceiver becomes conscious of the perceived.
2.2 Terminology
When researching sound localization there are a number of common terminologies and phrases that are used within the research community. The position of a source is normally given in the form of a set of coordinates relating to an upright listener, with the origin being that of the subjects head, which is often modeled as a solid sphere for simplicity. Three orthogonal reference planes intersect the head. The first of which is the horizontal plane, which passes through the head parallel to the ground and passes through the head inline with the openings to the ears. The second plane is the median plane, perpendicular to the horizontal plane, it runs equidistant between the ears and effectively divides the head vertically in two. The final plane of reference is the frontal plane, also perpendicular to the horizontal plane, it passes through the openings of the ears and divides the head into front and back. Figure 2.1 shows the frontal, horizontal and median planes in relation to the listener.
Figure 2.1: The median, frontal and horizontal planes of the head. Illustration adapted from (Blauert, 1974).
The direction of a sound is represented by two coordinate angles given in degrees, relating to the distance around the horizontal and median planes. Azimuth denotes the angle from the front of listener in the horizontal plane, and has the symbol !. Elevation describes the displacement of the sound around the median plane, denoted by the symbol ". In the coordinate system, (0, 0) relates to a point directly in front of the listener. Positive angles of the azimuth are to the left of the forward facing head, i.e. in an anti-clockwise direction, positive angles in the median plane start from the front and go upwards over the head. The perceived distance of a sound is denoted by the distance, r. Sound source distance is not reviewed in this current literature review, see Blauert (1974) for details on perceiving distance.
2.3 Localization & Blur

Before examining in detail the individual processes used by the human body to identify where a sound is coming from, it would be reasonable to give an overview of the capabilities of the auditory system as a whole, and how well it can accurately locate sound spatially in the horizontal and vertical planes. Localization blur is the smallest change of an event to change the location of the audible event. The just detectable change in the position of a sound source is termed the minimum audible angle (MAA), and is generally regarded as the most precise index of localization accuracy (Brown and May 2005). In the following discussion the term localization blur is the amount of displacement of the position of a sound source that is recognized by at least 50% of experimental subjects.
A range of different experiments have been used to try and assess the ability to localize sounds, many investigators have tended to measure the directionality of hearing using biologically significant stimuli, such as vocalizations (Brown et al. 1978, Gardner 1968), or more commonly using simple synthetic signals, such as pure tones (Mills 1958, Brown et al. 1978), or the more complex clicks and noise bursts (Boerger 1965a in Blauert 1974, Heffner and Heffner 1982 in Popper 2005). A different approach called sound lateralization asks subjects to make judgments about a sounds position, which is presented via headphones and are perceived to be inside the head. This method allows for a greater precision of control to the experimenter as to when the sounds reach each individual ear.
2.4 Methods of Measurement

An experiment conducted by Haustein and Schirmer (1970), tried to measure the horizontal localization and the associated blur using 900 untrained subjects. The experiment used white-noise pulses with duration of 100 ms, the subjects had to position a movable loudspeaker so that it came into alignment with a fixed source, termed the acoustical pointer. The subjects then had to displace the moveable speaker so it was at a position either directly forward, behind, to the left or to the right of them. The experimental procedure was criticized by Blauert (1974) as it was not sure whether the deviations in the results were due to subjective errors in the judgment of direction, or whether they actually reflect sound localization. The results from the Haustein and Schirmer experiments have since been repeated and similar deviations were observed by Wilkens (1972) and van de Veer (1957).
A method to explore the effects of both vertical and horizontal localization is to suspend multiple sources above a subjects head in a semi-sphere as used by Middlebrooks et al. (1989). To stop subjects simply choosing a loudspeaker they think is producing a sound, the subjects are placed in a darkened room or blindfolded, the subjects are asked to use a pointing device to indicate the perceived direction of the sound. This pointing device may just be an extension of the forearm, from which the horizontal and vertical angles can be measured. This method however may induce measuring errors, an alternative is to ask the subject to use their nose to point to the perceived location of the source, the measurements are taken using sensors attached to the subjects head as used in a study by Carlile et al. (1999). A disadvantage of large speaker arrays is the effect of mirroring; this is particularly common when using narrow-band signals. The sound event appears to be positioned not in the direction of the sound source but more or less axially symmetric through the axis of the ears (Perekalin 1930 in Blauert 1974). For example if a sound is incident at an angle of 160 in the horizontal plane, it may be perceived as coming from an angle of 20. This effect was particularly noticeable in a study by Stevens and Newman (1936), in an experiment using tone-bursts on the roof of a building, thus minimizing reflections. The study was investigating horizontal localization where subjects had to report the direction of the sound to the nearest 15, although left-right confusions were rare, when the subjects were presented with a low-frequency sound directly in front, it was often indistinguishable from its mirror image behind. These results have been conformed in a study by Sandel et al. (1955) in an anechoic chamber. A way to avoid this problem is to use a smaller, limited number of speakers from which the subject can choose, similarly the ability to move ones head also can reduce the mirroring effect.
In a study into vertical localization by Gardner (1973), an arc in the median plane of nine speakers was positioned around the subject. The problem with smaller setups are that the subjects are often asked to state which loudspeaker the sound is presented from, this may force a subject to choose one of the loudspeakers available, even if the perceived sound location falls outside the position of the loudspeakers, so this method raises questions on how reliable it would be in a real life situation where sound potentially presented from any angle. The smaller arrangement can however reduce the error in localization by having such a restrictive number of positions to choose from.
2.5 Localization In the Horizontal and Median Planes

2.5.1 Localization in the Horizontal Plane Experiments into the ability to localize sound in the horizontal plane has shown that the minimum localization blur occurs in the forward direction, with the blur increasing as the displacement from the forward direction increases to a maximum with the sound source positioned at 90 to the subject. When the source is at a right angle to the direction the subject is facing, the localization blur is between three and ten times the value of the forward facing blur (Bloch 1893 in Blauert 1974). Behind the subject the blur decreases. In the forward direction, using sinusoids Mills (1958) measured the subjective localization blur to be approximately 1.0 - 3.1, while in an study using familiar speech Gardner (1968) measured the localization blur to be 0.9.
2.5.2 Localization in the Median Plane Unlike localization in the horizontal plane where there are interaural differences, in the median plane the sound is arriving to both ears is identical. In a study by Damaske and Wagner (1969), 7 subjects were placed under an arc of speakers in the median plane, the heads of the subjects were immobilized and familiar speech was used at a level of 65 phon. The study found that when a source is directly in front of the subject the sound localization blur is approximately 9, a result repeated in a study by Blauert (1970), using 20 subjects. Damaske and Wagner found that as the median angle increased to 90, directly above the subject the blur increased to 22, before reducing as the median angle increased to a blur value of 15 behind the subject. In an earlier study by Blauert (1968), it was shown that when a source signal is used with a bandwidth less than 2/3 of an octave, neither localization nor blur could be determined in the median plane with respect to the sound source. The perceived direction of the sound event was due only to the frequency of the signal and not the direction of the sound source.
2.6 Binaural Cues

Lord Rayleighs (Strutt, 1907) duplex theory was the first analysis of the physics of the perception of location. Rayleighs theory still holds today, he noted that two binaural physical cues dominated the ability locate an incoming sound, the interaural time difference (ITD) and the interaural intensity difference (IID). These were caused due to the baffling effect of the head. Rayleigh proposed that for low frequency sources localization was due to the ITD, while for higher frequencies IID was used for the localization of a sound source. Other studies at the time include Mallock (1908) and a classic study by von Bksy (1930); Blauert (1974) contains an in-depth history of investigations into sound localization. The effects of ITD and IID in localizing sounds will now be examined in turn.
2.6.1 Interaural Time Difference When a sound is presented towards a listener at an angle not directly in front or behind, the time taken for the sound to reach each ear will be different, a maximum interaural difference will occur when the sound is either directly to the right or left of the forward facing listener assuming that the sound source has a wavelength which is equal to, or greater than the diameter of the head, around 1500Hz, so that it will diffract around the head. When a sound is in the median plane or directly in front or behind, the time taken to reach the individual ears is identical so no interaural difference occurs. Because the head is acoustically solid when a wave front reaches the head at an angle it will have to diffract around the head in order to reach the ears, this difference in path length can be modeled by assuming the head to be a rigid sphere, from this simplification Kuhn (1987) approximated the ITD as:
" 3a sin ! inc low frequencies $ $ c ITD ! # $ 2a sin ! high frequencies inc $ c %
(2.1)
Where a is the radius of the sphere used to model the head, c is the speed of sound in air and "inc is the angle between the median plane and a ray passing from the centre of the head through the source position, thus "inc = 0 is defined as straight ahead. Interaural time difference is referred to as interaural phase difference (IPD) for pure tones. One result of this effect is that subjects with larger diameter heads will experience a greater interaural time difference than a subject with a smaller head, consequently a subject with a larger head will be able to perceive finer changes in azimuth.
10
ITD are limited by frequency, when the period of the signal becomes twice the maximum possible ITD, the phase cues provided by the signal become ambiguous, at which point the ability to determine which ear is leading the phase and which is lagging becomes impossible (Moore 1977). A study by Yost et al. (1971) showed that for frequencies below 900Hz a shift in azimuth position could be detected, but no shift could be detected with signals above 1500Hz. Because of this ambiguity the information due to ITD is not enough on its own. The smallest detectable ITD is 10 s for broadband noise in the horizontal plane (Klump and Eady 1956 in Plack 2005), which corresponds to a MAA of around 1 in the horizontal plane, straight in front of the listener. 2.6.1.1 Envelope Interaural Time Difference Although the ability to localize sounds using ITD is limited to be effective at low frequencies, it is still possible for ITDs to provide useful cues for higher frequencies for more complex sources. Most natural frequencies contain envelope fluctuations, these are generally much slower in fluctuations than fluctuations in simple sounds, and so the time of arrival of the envelope features can be used to resolve ambiguities. Henning (1974) investigated the importance of envelopes, and it was found that for carrier signals of 3900Hz modulated at a frequency of 300Hz, localization due to interaural delays was as good as with a pure tone of 300Hz. It has since been shown that interaural delays can be detected with a carrier frequency as high as 10,000Hz (Bernstien and Trahiotis 2002 in Plack 2005).
11
2.6.2 Interaural Intensity Difference The second part of binaural hearing cues in Lord Rayleighs duplex theory is the interaural intensity differences, and arises due to two reasons. The first is due to sound intensity decreasing as distance increases, when a sound is presented at an angle of 90 the distance between the ears is at a maximum to the wave front, thus the level at the far ear will be reduced compared to the closest ear. Due to the relatively short distance between the ears this is a minor factor. The second reason is due to the acoustic shadowing of the head. The head acts as a barrier and as such prevents some of the energy of a sound presented from a direction reaching the other ear. Low frequencies diffract around the head so the intensity difference is less noticeable, thus IIDs are more for localizing higher frequencies. For example in a study by Moore (2003) a loudspeaker played directly to the side of the head will produce an IID of 1dB for a 200Hz signal, but an IID of as much as 20dB for a 6000Hz tone. Results reflected in a study by Harris (1972). The smallest detectable IID is around 1-2dB (Grantham 1984).
2.7 Cones of Confusion

When localizing sounds using the IID and ITD methods there will be multiple points in space at which the IID and ITDs are the same for each point, thus creating ambiguities. These points can be mapped to represent a cone, with an axis collinear with the interaural axis, this is termed the cones of confusion (Woodworth 1938). Figure 2.2 shows an illustration of the cones of confusion where a bird heard from the front can appear to be coming from the rear, as both positions have the same ILD and ITD.
12
Figure 2.2: Illustration of the cones of confusion Similarly when a sound is in the median plane the resultant IID and ITD will be zero for every angle, so there must be additional cues that help to resolve these ambiguities and to localize sounds. These additional cues are described below. 2.7.1 Monaural Spectral Cues and Pinna Effect Although we have two ears, information about sound localization can be obtained by just one ear, monaural cues to localization arise because of modifications to the sound waves due to the complex shapes of the pinna. The first attempts to explain the effects of the pinna suggested that it located sounds by the complex reflection and shadowing of rays, however this is incorrect, as the pinna have been shown to work by dispersion and the diffraction of waves (Blauert 1974). The pinna is composed of several ridges and cavities; Blauert goes on to explain how the incoming sound is modified by resonances of the cavities of the pinna. Because the cavities of the ears are small they only affect frequencies with short wave lengths, above 4000Hz or so. The precise modification of the sound waves depends on the angle at which the sound enters the ear. Pinna effects are thought to be responsible for localization of sound sources on the median plane. In a study by Gardner and Gardner (1973) the affect of removing the pinna effect dramatically reduced performance of localization in the median plane.
13
2.7.2 Head Movement Much of the ambiguity of sound localization can be resolved by moving the head, thus changing the positions and direction of the ears in the sound field. If a sound source is placed at a point with an ITD of 0 in the horizontal plane, by turning the head the sounds position can be resolved, if the head is turned to the left, the intensity will increase in the left ear if the sound is in front of the listeners original position, if the intensity decreases and increases in the right ear, then the source is behind the listener. Thurlow and Runge (1967) analyzed the effect of head movements about the three orthogonal axis of the head, these movements included tipping the head, rotating the head and pivoting the head. From their experiments they found that when head movements were allowed front-back errors reduced from 90% in fixed head experiments to 0% for all stimuli. Although there is clear evidence that head movements are involved in sound localization, it is believed their role is minor compared to other cues (Wightman and Kistler 1997). For head movements to help the sound source needs to be sustained long enough to give the listener time to respond, brief sound may be over before the listener has time to move the head, the effect of head movements is effectively nullified for short duration sounds.
14
Chapter 3
Sound Field Reproduction
3.1 Introduction
The goal of sound field reproduction is to try to accurately recreate an original sound field that has been sampled or simulated, by means of using an array of loudspeakers. One of the earliest attempts of sound field reconstruction was by Bell Laboratories in America. The Bell Labs technique for trying to recreate the wave front of a source on a stage was to arrange an array of microphones, each connected to a loudspeaker in the same order and spacing (Rumsey & McCormick 1994), as seen in figure 3.1. This technique worked well, but as each microphone was linked to its own speaker, this required a large number of microphones and loudspeakers so was not optimal. There have been several other techniques since Bells early experiments at sound field reproduction including quadraphonics, 5.1 surround sound and spaced stereo techniques, all of these methods use the relative intensities of the loudspeakers. The intensities are varied in the way that they sum in order to produce a pressure and particle velocity vector pointing the same way as the speakers. Due to the scalar pressures of the speakers it is not possible to recreate the correct pressure and velocity at the listening position (Benjamin et al. 2006). In contrast to this ambisonics uses a full array of speakers to try and control the sound field at the position of the listener to accurately recreate the original sound field.
15
Figure 3.1: Illustration of the Bell Labs sound field reproduction method. Illustration adapted from (Bates, 2009).
3.2 Ambisonics
Ambisonics was invented mainly by Michael Gerzon in the 1970s and is based on the spherical harmonic decomposition of the sound field (Gerzon 1972). The principles of ambisonics is described by Gerzon as For each possible position of a sound in space, for each possible direction and for each possible distance away from the listener, assign a particular way of storing the sound on the available channels. Different sound positions correspond to the stored sound having different relative phases and amplitudes on the various channels. To reproduce the sound, first decide on a layout of loudspeakers around the listener, and then choose what combinations of the recorded information channels, with what phases and amplitudes, are to be fed to each speaker. The apparatus that converts the information channels to speaker feed signals is called a decoder, and must be designed to ensure the best subjective approximation to the effect of the original sound field. (Gerzon, 1974)
16
One aspect that separates ambisonics from other sound field reproduction techniques it that it allows for the addition of height to the sound field in order to recreate a threedimensional sound field around a central point (Wiggins 2004). To capture the sound field, and to produce an ambisonic interpretation a minimum of 4 channels of information is required for first order models, captured using an omnidirectional microphone, and 3 figure-of-8 microphones facing the x, y and z coordinates. The resultant 4 channels of information are often entitled B-format (Daniel et al., 2003), this thesis does not try to cover the capture of ambisonic sound fields but for comprehensive information on sound field microphones and ambisonic encoding see Moreau et al. (2006).
3.3 Conventions
Ambisonics uses the spherical coordinate system for both two and three-dimensional space. The spherical coordinate system defines a point in three-dimensional (3D) space by the distance from the origin (r), the elevation from the vertical axis (!) and the azimuth from the horizontal axis ("). The spherical coordinate system is illustrated in figure 3.2.
Figure 3.2: Illustration of spherical coordinate system, relative to the Cartesian coordinate system. Illustration adapted from (Daniel, 2003).
17
The spherical coordinate system is related to Cartesian coordinate system through
x = r sin (! ) cos (" ) , y = r sin(! )sin(" ), z = r cos(! ).

(3.1)
The two-dimensional (2D) spherical coordinate system is obtained directly from the 3D system by restricting the points in the z - plane to z = 0.
3.4 The Spherical Wave Equation

The classical wave equation describes the temporal and spatial characteristics of a wave function, therefore it can be used to define an arbitrary sound field (Excell, 2003). The classical wave equation is given as
! 2 p(x,t) =
1 " 2 p(x,t) c 2 "t 2
(3.2)
where p(x,t) is the pressure, x is the position, t is the time and c is the speed of sound. The time harmonic dependence of the sound field can then be calculated to obtain the space and frequency properties from the pressure of the sound field
p(x,t) = p(x)ei! t
(3.3)
18
Equation (3.3) can then be substituted into equation (3.1) to give the Helmholtz equation (Fahy and Walker, 1998)
! 2 p(x) + k 2 p(x) = 0
where the wave number
(3.4)
k =
! 2" = c #
(3.5)
with # the angular frequency and $ the wavelength. The Helmholtz equation, (3.4), can then be rewritten in spherical coordinates using the relations in equation (3.1), to give (Williams, 1999)1
1 ! 2 !p 1 ! !p 1 !2 p 1 !2 p (r )+ 2 (sin " ) + 2 2 $ =0 r 2 !r !r r sin " !" !" r sin " !# 2 c 2 !t 2

The solution to equation (3.6) is given by the separation of variables:
(3.6)
p(r,! , " ,t) = R(r)#(! )$(" )T (t)

which leads to four ordinary differential equations (Skudrzyk in Williams, 1999)
(3.7)
d 2! + m2! = 0 2 d"
(3.8) (3.9) (3.10) (3.11)
1 d d" m2 (sin ! ) + [n(n + 1) # 2 ]" = 0 sin ! d! d! sin !

1 d 2 dR n(n + 1) (r ) + k2R ! R=0 2 r dr dr r2
1 d 2T + k 2T = 0 2 2 c dt
The majority of the content in this section has been adapted from (Williams, 1999) 19
The solutions to equations (3.8) to (3.11) can be written as linear combinations of the complex modes in the form
(1) 2 hn (kr) m Pn (cos" )e jm# , n $!, m $! and m < n (2) ! hn (kr)
(3.12)
where
(1) hn (x) = jn (x) + jnn (x) =
% ! " $ J n + 1 (x) + jN n + 1 (x) ' 2x # 2 & 2
(3.13) (3.14)
(2) hn (x) = jn (x) ! jnn (x) =
" 2x
# & % J n + 1 (x) ! jN n + 1 (x) ( $ 2 ' 2
J n (x) and jn (x) are the ordinary and spherical Bessel functions of the first kind, N n (x) and nn (x) are the ordinary and spherical Bessel functions of the second kind
(1) (2) and hn (x) and hn (x) are the spherical Hankel functions of the first and second kind.
Pnm (x) is the associated Legendre function, for each m the function forms a complete
set of orthogonal functions which obey the relation
"
!1
Pnm (x)Pnm (x)dx = '
2 (n + m)! #n 'n 2n + 1 (n ! m)!
(3.15)
From Equation (3.15) the angle functions can be combined into a single function termed a spherical harmonic Ynm defined by
Ynm (! , " ) #
(2n + 1) (n % m)! m Pn (cos! )e jm" 4$ (n + m)!
(3.16)
20
Using the spherical harmonic function we can now write any solution to equation (3.6), with e! j" t implicit, as
&
p(r,! , " , # ) = %
n=0 m=$n
% (A
$ mn n
h (kr) + Bmn jn (kr))Ynm (! , " )
(3.17)
Where the subscript n, is referred to as the order of the spherical harmonic and m the mode. The weighting coefficients Bmn associated with the spherical Bessel functions jn(kr) describe the through-going field due to sources located externally to the arbitrary sphere, whilst the weighting coefficients Amn are associated with the divergent spherical
! Hankel functions hn (kr) , and describe outgoing waves due to sources inside. From an
ambisonic perspective it is assumed a centered point of view with no internal sources, thus only the components of Bmn are considered so that the internal pressure field is in general,
&
p(r,! , " , # ) = %
n=0 m=$n
%A
mn
(# ) jn (kr)Ynm (! , " )
(3.18)
Equation (3.18) is termed the ambisonic signals (Daniel et al., 2003).
21
3.5 Spherical Harmonics and Higher Order Ambisonics

As above in equation (3.16) spherical harmonics can be defined as
(2n + 1) (n % m)! m Pn (cos! )e jm" 4$ (n + m)!
Ynm (! , " ) #
The importance of spherical harmonics is that they can be expanded in terms to give the function of any arbitrary function on the surface of a sphere f (! , " ) , thus
%
f (! , " ) = $
n=0 m=#n
$C
m nm n
Y (! , " )
(3.19)
where Cnm are a set of coefficients of a series that constitutes an infinite and countable set of complex numbers, which fully describe the sound field in the interior of the sphere. Because of the orthogonality of spherical harmonics the arbitrary constants can be found from
Cnm = $ d!Ynm (" , # )* f (" , # )

with # denoting the complex conjugate and where $ is the solid angle defined by (Williams, 1999)
2% 2%
(3.20)
& d! " & d# & sin$ d$

0 0
(3.21)
22
Table 3.1 below shows the spherical harmonics for the zero and first orders, n=0 n=1
Y00 (! , " ) =
1 4#
Y10 (! , " ) =
3 cos! 4# 3 sin ! 8# 3 sin ! 8#
Y1$1 (! , " ) = e$ j" Y11 (! , " ) = $e j"

Table 3.1: Spherical harmonics for zeroth and first orders. It is worth noting that for m = 0,
Yn0 (! , " ) =
2n + 1 Pn (cos! ) 4#
(3.22)
Figure 3.3 shows the values of the spherical harmonics, Re[Ynm (! , " )] (m = 0,1,!, 8) for n=8, projected on to the (y,z) plane, looking down at the positive x-axis.
23
Figure 3.3: Plot to show spherical harmonics of the eighth order in the x,z plane, plotted looking down the x axis, the black lines indicate nodal lines along with the outline of the sphere, taken from (Williams, 1999). From figure 3.2 it can be seen that Y80 has no longitudinal nodal lines, also Y81 has its longitudinal node on the circumference of the circle representing the outline of the sphere. When modeled as multipoles the outgoing radial functions can be characterized by sums of monopoles, used to give the directivity of the spherical harmonics for each order, figure 3.4 shows the directivity pattern of the spherical harmonics from the zeroth order to n = 3. The omni-directional pressure (W), and the first order pressure-gradients (X, Y, Z) shown in figure 3.4, are well known as B-format, termed by Michael Gerzon (1972), they are constituted from the zeroth and first order. More recently Higher Order Ambisonics (HOA) has been more frequently used. One primary reason for this is that the B-format signals have a low spatial resolution, which limits the correct sound field reconstruction to a small listening area, or sweet spot, especially for higher frequencies. The use of HOA extends the B-format to a higher resolution, which ultimately results in enlarging the reproduction area.
24
These HOA multipole models are constructed from distributions of point sources infinitesimally close to the origin, whose amplitude is equal but are opposing in phase. Multipole expansions are similar to the spherical harmonic expansion in equation (3.19), with the outward-going radial function as (Williams, 1999),
&
p(r,! , " , # ) = %
n=0 m=$n
%C
(1) mn n
h (kr)Ynm (! , " ) .
(3.23)
Figure 3.4: Illustration to show the directivity patterns of the spherical harmonics up to order n=3. Adapted from (Moreau, et al., 2006).
25
3.6 Decoding
The reproduction of a target sound field is achieved by means of reproducing the encoded ambisonic signals via an array of loudspeakers. This given array of loudspeakers (L), is positioned uniformly around the listener positioned in the centered listening position. Ambisonics makes the assumption that the loudspeakers are modeled as point sources and that they are positioned suitably far enough away from the listener as to be able to reproduce plane waves. In theory to reproduce the original sound field would require an infinite number of sources, as this is clearly not practical the reproduced field is represented by a truncated series of equation (3.18) to give (Daniel et al., 2003)
! p(r ) =
m=0
%j
jm (kr)
0 $n $ m, ! = 1
! ! BmnYmn (" , # ) .
(3.24)
As the individual loudspeakers are assumed to be producing plane waves, the most logical target sound field to model would be that of a plane wave traveling in a direction towards the listener. The target plane wave K has a pressure field at a point p defined as
p(r,! , " ) = e jK # x
! !
(3.25)
! ! where K ! x is defined as
! ! ! !r K " rx = ( K " x) c c
(3.26)
! and K ! x = cos " , remembering that = k the wave number. c
26
Since the source for our target wave is located at infinity, then the expression for the interior problem is appropriate and thus the incident pressure field can be expressed as presented in equation (3.18). The expression for the incident field is shown where the direction is given by (! K , " K ) to be 2
+
! ! jK ! x
=*
n=0 m=)n
* % 4" j Y &
n
m n
(# K , $ K ) *' j n (kr)Ynm (# , $ ) (
(3.27)
n where the terms $ 4! j nYnm (" K , # K ) *& = Am , the direction of arrival of the plane wave. % '
In order to reproduce the target field the array of loudspeakers L needs to reproduce the correct pressure and direction of the plane wave, the pressure field of the reproduced wave is defined as
L ! ! ! p( x, ! ) = # e jK L " xWL l =1
(3.28)
where WL is the vector of signals. This in essence is the decoding principle, the sound field is reproduced by simply applying real gains to the signal W. In order that the reproduced sound field is accurate to the target sound field it is necessary to
determine the values of WL with the aim of the ideal but impossible solution, p = p .
Assuming the latter to be true we can solve to give a simple expression of the ambisonic component (Daniel et al., 2003)
! ! Bnm = WLi "Ynm (# Li , $ Li )
(3.29)
The workings of this can be seen in Appendix A.1
The author would like to acknowledge Dr. Fazi, for his help in working through the solutions to the decoding process. 27
The design of ambisonic decoding works on a principle that Daniel (2001) calls re-encoding, the aim being to acoustically recompose the ambisonic components
! Bnm at the centre of the listening area. The input signals (W) are then encoded using
spherical harmonics in to a matrix (ci), thus the re-encoding principle can be written into matrix form, with C = [ c1 ! c N ] being the re-encoding matrix (Daniel et al., 2003). This gives the re-encoding principle matrix to be
! B = C!W
where
% ' ' ' ci = ' ' ' ' &
+1 Y00 (! i , "i ) ( * +1 Y11 (! i , "i ) * * #1 Y11 (! i , "i ) * * ! * $ Ymn (! i , "i ) * )
(3.30)
% ' ' " ' B=' ' ' ' &
" +1 B00 ( * " +1 B11 * * " #1 B11 * ! * * "$ Bnm * )
% W1 ' ' W2 W=' ! ' ' ! ' WN &
( * * * * * * )
(3.31)
The decoding matrix aims to derive the signals W, from the original ambisonic signals B, giving
W = D.B
(3.32)
! To ensure B = B , the matrix in equation (3.30) needs to be inverted, therefore the

decoding matrix D is defined as
D = pseudo inv(C) = CT (C ! CT )"1 .
(3.33)
28
In order for the matrix inversion to work the number of signals, and by that the number of sources, i.e. loudspeakers is needed to be equal or greater than
WS 3D = (M + 1)2
(3.34) (3.35)
WS 2 D = 2M + 1
for 3D and 2D respectively. If the correct normalizing convention is applied to the spherical harmonics3 then one can show that (Daniel et al., 2003)
1 T C . N
D=
(3.36)
In order to optimize the spatial properties of the perceived signals, individual gains (gm) can be applied to the respective ambisonic components via a diagonal matrix of gains (GM) to change the decoding matrix,
1 T C .Diag ! g0 gm # " $ N
D=
(3.37)
These decoder gains are examined, and the effects discussed later in this chapter.
See (Daniel et al., 2003) for spherical harmonic normalizing conventions, as the derivation is beyond the scope of this paper. 29
3.7 Gerzons Metatheory

In 1992 Michael Gerzon released a paper entitled a General Metatheory of Audio Localization. In his Metatheory or theory of theories, Gerzon explains how humans use a variety of audio localization methods in a hierarchal format that need to be satisfied in order to perceive a reproduced sound field that is accurate to the original. Apart from when the auditory cues contradict each other the impression of sound direction comes from a majority decision of the senses (Gerzon 1992). Gerzons theories of sound localization relate to the methods detailed in the chapter 2. For each of these theories of sound localization Gerzon goes on to suggest a hierarchal model in which he derives a localization vector for each, the direction and magnitude of the vector derived by Gerzon describe the direction and stability of the perceived sound respectively. Gerzon goes on to state that for a real single point source that the magnitude of the localization vector would be 1, for any value other than 1 then the perceived sound image will move when the listener moves their head (Gerzon 1977). The first two models in Gerzons Metatheory are the Velocity and Energy vector models, these are the simplest and probably most important of all of Gerzons models presented in his 1992 paper (Benjamin, et al., 2006). Gerzon points out that all models of auditory localization, except those for the pinna coloration and impulsive (highfrequency) interaural time delay are special cases of these two models (Gerzon, 1992). It is the Velocity and Energy models proposed by Gerzon that this thesis will focus on. Before this point ambisonics has been explained in terms of a 3D reproduction area, however this thesis is only going to examine 2-Dimensional (2D) reproduction. The 2D reproduction can be thought of as a slice through the vertical axis of the 3D reproduction sphere, the spherical harmonics which describe the surface of the sphere are now simply sine and cosines around the perimeter of the slice. From now on the models described in this thesis will use 2D notation.
30
3.7.1 Velocity Vector Model The first of Gerzons vector models is the Velocity Model and corresponds to the Makita model of localization. Makita examined how sound is localized in a stereo sound field using ITD and wave front anomalies (Makita, 1960). This first degree, first order model based on interaural time differences is used to localize low frequency sound sources, and is prevalent to sources with a frequency below 700Hz. Given a combination of plane waves (Pn , !n ) at the centre point of the reproduction area, r = 0, the velocity vector V is defined as the sum of un vectors weighted by their respective amplitude gains Gn (Daniel et al., 1998), giving
N !1
V=
"G u
n=0 N !1 n=0 n
"G
= rv # u v
(3.38)
" cos! v % where rv % 0, and u v = $ . For a single phantom source Daniel relates the # sin ! v ' &
velocity vector (V) to the B format first order wave field by

X Y , ) 2W 2W
V=(
(3.39)
from this it can be shown that rv = 1 and ! v = " , where the first order ambisonic components of a horizontal encoding of a single source (P! ,! ) are defined by
"W = P! & $ $ $ $ # X = 2P! cos(! ) ' . $ $ $ $ %Y = 2P! sin(! ) (
(3.40)
31
3.7.2 Energy Vector Model As the frequency of a target source increases the ITD, as discussed in chapter 2, becomes no longer relevant, instead the ILD is used to localize sound sources. Gerzon recognized this and based his second degree, first order model on the spatial energetic distribution of sound sources, which is loosely related to deBoers model of localization (deBoer, 1940), and the ILD. This proposed energy vector model, is similar to the velocity vector model except that the sound amplitude gain Pi from each speaker is replaced by its energy gain between 500 5000Hz. The energy vector E is defined as
N !1
Pi
. The
energy vector models are suggested by Gerzon (1992) to be best applied to frequencies
E=
"G u
n=0 N !1 n=0
2 n n 2 n
"G
= rE # u E
(3.41)
" cos! E % where 0 ! rE ! 1 and u E = $ (Daniel et al., 1998). # sin ! E ' &
Again the energy vector rE needs to equal 1 to perfectly represent the original sound source, a problem lies in the fact that if 2 or more sound sources are used at an equal distance to the listener the resultant energy vector magnitude rE is strictly less than 1 for non trivial input gains. For the energy vector to equal 1 all of the speakers would have to lie in the same direction, which is contrary to the assumption of Gerzons theorem of a regular loudspeaker layout (Gerzon, 1992).
32
3.8 Decoding Gains

As shown in Equation (3.36), the decoder matrix is defined by
1 T C . N
D=
While this decoding matrix holds true for reproduction of a target sound field, it is only accurate up to a limiting frequency, which is dependent on the size of the listening area (Daniel et al., 2003). Above this limiting frequency other decoding solutions are required, and are achieved by applying suitable gains (gm), before processing the basic decoding. Gerzons Energy Vector model as discussed previously is based around the localization principle of interaural level differences, and as the magnitude of the Energy Vector cannot equal 1, an optimizing high frequency decoding matrix is required to find a set of correcting gains that maximizes rE, this decoding style is termed max rE decoding. Given a regular polygon layout, we find the maximum value of rE to be (Daniel et al., 1998)
rEmax = cos
! 2M + 2
(3.42)
with the optimizing gains (gm) to be equal to

m" 2M + 2
gm = g0 ! cos
for m = 0,1,..., M
(3.43)
33
The value of g0 is set according to the decoding criteria, for velocity decoding g0 = 1. For energy decoding in order to preserve energy the decoding gains are set to
g0 =
N . (M + 1)
(3.44)
Figure 3.5 shows an illustration of the difference of individual speaker gains for the different decoding styles. The size of the large arrows indicate the magnitude of the gains applied to the individual speakers, while the colours, red and blue indicate a positive or negative phase respectively. The black arrow indicates the direction of the incident wave, in this illustration 60.
Figure 3.5: Illustration of decoder style speaker gains, the large arrows indicate the magnitude of the speaker gains, while the colour indicates phase, red positive, blue negative. The velocity decoding method is shown on the left and the energy decoding method on the right.
34
3.8.1 Equivalent Panning Functions By combining the encoding equation (3.29) and the decoding gain corrected matrix (3.37), one can derive an equivalent panning function G(&) as given by Daniel (2001) to be
M & 1# g0 + 2 " gm cos(m! )( % ' N$ m =1
G(! ) =
(3.45)
such that loudspeaker i is fed with the signal Wi = W .G(! ) , where the angle & is ! ! defined as the angle between the source direction ui and the loudspeaker uW to give
! ! ! = arccos(uW . ui ) .
(3.46)
3.9 Gerzons Vienna Decoder

In his Metatheory Gerzon presented the idea of a decoder that combined both the velocity and the energy decoders into one single decoder, an optimum decoder that would allow for a wider range of accurate frequency reproduction of the target sound field. The Vienna decoder works by splitting and filtering the input signals into high and low pass frequency content before applying the relevant decoding styles and recombining.
35
3.10 Aims of Thesis

The review of the current literature raises two questions, firstly are Gerzons three decoder designs, the energy decoder, the velocity decoder and a combined decoder, able to reproduce sound fields where a stimulus can be accurately localized? And secondly are the velocity and energy decoding methods limited to the working frequency ranges suggested by Gerzon? In order to test these theories decoders will need to be designed which meets Gerzons criteria laid out in his 1992 Metatheory, these decoders will then form the basis of a subjective listening experiment in order to test the claims made by Gerzon. Before a listening test is designed the first stage is to design simulations to test the reproduction capabilities of ambisonics, and specifically Gerzons ambisonic system designs.
36
Chapter 4
Simulations
4.1 Introduction to Chapter
Before Gerzons decoder designs are tested subjectively, the first logical step is to simulate the reproduced ambisonic target sound field using computer software. The aim of the simulations is to test the velocity and energy decoding methods and analyze the resultant computed sound fields for a variety of frequencies. Thus testing the limitations of the ambisonic systems with the end goal of helping to design a suitable system for a subjective listening test. The model from which the simulations are constructed and implemented was built using MathWorks MatLab, the program is described as a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numeric computation (MathWorks, 2010). The main variable aspects of the model will be the frequency of the target wave, the number of speakers used for the reproduction system and the ambisonic order of the system. The first of these variables is the most important to simulate as the resultant sound field at different frequencies will show the limitations of the velocity and energy decoding methods as theorized by Gerzon (1992). The other two variables will give an idea into the system requirements to reproduce a target sound field accurately for a listener in the design of the subjective experiments. The model will be constructed using the underlying theory laid out in the previous chapter and based upon the works of Jerom Daniel (2001). The MatLab code used for the simulations can be seen in Appendix C.
37
4.2 Ambisonic Decoder Model

The ambisonic model was computed by plotting the pressure of the sound field on a two dimensional grid, loudspeakers were placed at a radius r from the origin and positioned in an equally spaced, regular layout. The loudspeakers were modeled as point monopoles, these omni-directional pulsating spheres were also assumed to be radiating into a free space, i.e. there were no reflections present, the reason for this assumption was to simplify the model. The model is further simplified by assuming that sound travels at exactly 343ms-1 and that the sound was not attenuated by the medium it was traveling through, in this case air. The modeled ambisonic sound field can be described as the sum of the pressures from the point sources, or loudspeakers as
P = ! PL
l =1
(4.1)
whereby defining the positions of the loudspeakers in two vectors as
X L = [!]
and
(4.2) (4.3)
YL = [!] .
By using equations (4.2) and (4.3), the pressure from each loudspeaker can be defined at each point on a grid as
PL (x, y, ! ) =
e" jkR L Gm 4# R L
(4.4)
38
Where RL is a Matrix containing the distances from the loudspeakers to each point on a grid, defined as
1 2
R L = "(x ! X L ) + (y ! YL ) $ # %
2 2
(4.5)
In equation (4.4), gm is a vector of optimizing gains, used to change the style of the decoding, for velocity decoding gm=1, where as for the energy decoding method gm is defined in equation (3.45) to be
m" 2M + 2
gm = g0 ! cos
for m = 0,1,..., M
where
g0 =
N (M + 1)
39
4.3 Frequency Response Model

To test Gerzons decoder design theory the ambisonic sound field simulation model was used to compare the reproduced sound field at set frequencies noted by Gerzon in his Metatheory (Gerzon, 1992), using both the velocity and energy decoding methods. The frequencies to be compared are 250Hz, 700Hz, 2500Hz and 4000Hz, figures 4.1 to 4.4 show the velocity decoding method for the given frequencies, while figures 4.5 to 4.8 show the frequencies for the Energy decoding method and figures 4.9 to 4.12 show the target sound fields for the respective frequencies. The reproduced sound field was modeled using a fifth order ambisonic system. This was to allow for a larger reproduction area, and using 360 loudspeakers placed in a circle, one at each degree around the circle. The target sound field is a plane wave which is incident at 60. The frequencies 250Hz, 700Hz, 2500Hz and 4000Hz were chosen as Gerzon states that for 250Hz the energy decoding method is not as capable at reproducing the target sound field, similarly the velocity decoding method is not suited for reproducing 2500Hz and above, as being based on the principles of ITD the wavelengths are too short to diffract around the head. 700Hz is suggested as a transition point so both decoding methods should work equally well. 4000Hz is Gerzons suggestion for the upper limit for the ambisonic system (Gerzon, 1992).
40
(4.1)
(4.5)
(4.9)
(4.2) (4.6) (4.10) Figures 4.1 to 4.12: Decoder frequency comparison simulation. Figures 4.1 4.4 Velocity Decoding, Figures 4.5-4.8 Energy Decoding, Figures 4.9-4.12 Target Sound field for 250Hz, 700Hz, 2500Hz and 4000Hz respectively.
41
(4.3)
(4.7)
(4.11)
(4.4) (4.8) (4.12) Figures 4.1 to 4.12: Decoder frequency comparison simulation. Figures 4.1 4.4 Velocity Decoding, Figures 4.5-4.8 Energy Decoding, Figures 4.9-4.12 Target Sound field for 250Hz, 700Hz, 2500Hz and 4000Hz respectively.
42
From Figures 4.1 to 4.12 it can be noted that as the target plane wave increases in frequency the accurate area of the reproduced sound field becomes smaller and smaller. Also as the frequency increases and the wavelength becomes shorter the width of the reproduced plane wave becomes narrower. This indicates an increase in directionality, and thus an increase in the ability to be localized subjectively by an individual, albeit in a smaller sweet spot. From the simulations it can also be seen that the velocity decoding method gives a larger reproduction area than the energy decoder up to 700Hz, which is most noticeable comparing the simulations for 250Hz, figures 4.1 and 4.2. This corroborates with Gerzons hypothesis laid out in his Metatheory (Gerzon, 1992). The directionality is caused by the different decoding methods; a polar plot showing the directionality is shown in figure 4.13. At 700Hz the difference in reproduction area is less noticeable, which is the frequency area predicted to be a crossover point for the two decoders and for 2500Hz and 4000Hz the energy method gives a wider reproduction area.
Figure 4.13: Polar plot to show directionality of Velocity decoding (Red) and Energy decoding (Blue) for a 1st order system. Figure 4.13 shows that the velocity decoding method produces a hyper-cardioid polar pattern, while the energy decoding method produces a cardioid pattern for a 1st order ambisonic system. 43
At the higher frequencies it appears as if the energy decoding method reproduces a more symmetrical sound field, and while the velocity decoding simulations are more directional towards the origin the energy method is able to recreate the target wave across the whole sound field. This however may lead to front-back ambiguities for a listener using the energy decoding method as more points around the listener are symmetrical.
4.4 Increasing Ambisonic Order

To increase the subjective localization of a sound, higher order ambisonic components are used to increase the directivity of the reproduced sound field, as the orders increase the directivity narrows, as shown in figure 4.14 which shows the directivity for the velocity and energy methods.
44
M=1
M=2
M=3
M=4
M=5
M=15
Figure 4.14: Polar plots to show how directivity increases when increasing order from M=1 to M=15 for both Velocity Decoding (Red) and Energy Decoding (Blue).
45
4.5 Number of Loudspeakers

The last variable to simulate using the model is the number of loudspeakers required to accurately reproduce a target sound field for a central listener, this will give an idea into the number of loudspeakers required when designing an ambisonic system for subjectively testing source localization. The model was run using a frequency of 700Hz, a loudspeaker radius of 2m and a 3rd order energy decoding method as it has a narrower reproduction area than the velocity method. The simulations compared the reproduced sound field from 4, 6, 8, 10, 12 and 20 loudspeakers, all positioned in a regular polygon layout. The results of the model are shown in figure 4.15.
46
L=4
L=6
L=8
L=10 L=12 L=20 Figure 4.15: Comparison to show the affects of changing number of loudspeakers (L). 700Hz target wave incident at 60, reproduced using 3rd order Energy decoding.
47
From figure 4.15 it can be seen that as the number of speakers (L) is increased from 4 speakers to 8, the direction of the wave reproduced becomes the same as the target wave, which is incident at 60. 6 loudspeakers is shown to be the minimum number required to reproduce the target plane wave, however the reproduced field for L=6, shown in figure 4.15 may be misleading as both it and L=12 have a loudspeaker placed inline with the direction of the target wave, figure 4.16 shows the reproduced field for L=6 for comparison, when the target wave is incident at 40.
Figure 4.16: Reproduced field for a target plane wave incident at 60, using 6 loudspeakers and 3rd order Energy decoding.
Daniel (Daniel, et al., 1998) states that the minimum number of speakers required to accurately reproduce a target sound field relates to the ambisonic order used when decoding, giving the minimum number to be
L ! 2M + 2
(4.6)
where L is the number of loudspeakers required and M the ambisonic order used. So for the simulations in figure 4.16 that use a 3rd order decoding the minimum number of speakers required is 8.
48
Figure 4.15 shows that as the number of speakers increases beyond that of the minimum detailed in equation (4.6), the sound field becomes over developed and so only a single narrowing column is accurately reproduced in the centre of the sound field, this is most noticeable for L=20.
49
Equation Chapter (Next) Section 1C
Chapter 5
Experimental Procedure
5.1 Introduction to Chapter
Following on from the simulations, the next step is to subjectively test the ambisonic decoder designs. This chapter introduces and discusses the parameters, which influence the final design of the listening tests to be conducted. The purpose of the listening test is to provide quantifiable data, which corroborates the theory and results of simulations laid out in the earlier chapters. The experiment will subjectively test each of Gerzons decoder designs to compare the subjective accuracy of each decoder.
5.2 Introducing the Response

Listening tests are described as an arduous activity in (Bech and Zacharov, 2006), in which a subject is presented with a stimulus of sound, termed the auditory event, to which they provide an answer to the experimenter. This answer is called the response attribute and is purely subjective involving the subjects perception, opinion, emotional state and background experience. Due to the wide nature of the response it is divided into two parts, the perceptual and the affective. 5.2.1 Perceptual Response The perceptual response is composed of the events, which take place inside the mind of the listener. These events are a number of individual auditory attributes where each attribute represents a specific impression, for example the loudness of a sound. For each of these attributes it gives rise to a sensorial strength that depends on the magnitude of the stimuli and the physical characteristics of the hearing system. It is assumed that it is possible to quantify the sensorial strength using standard methods (Bech and Zacharov, 2006). 50
5.2.2 Affective Response The second step in the process of the response attribute is the formation of an overall impression of the stimuli based of a combination of individual attributes of the cognitive factors, which include the expectations of the listener, their emotional state and any previous experience with that type of stimuli. The combined impression forms the basis for an assessment of the degree of liking or disliking a sound (Bech and Zacharov, 2006). The affective response will not be considered in the design of the listening test. To the authors knowledge no previous tests have be conducted into the subjective localization of ambisonic decoders, however there have been subjective experiments into the effects of ambisonic orders (Bertet et al., 2007) and experiments into objective ambisonic localization using binaural KEMAR recordings (Carlsson, 2004).
5.3 Design of Listening Test

The design of the listening test can be separated into two main known variable groups, the independent and the dependent. An important goal for the subjective experiment is that the data is objective, meaning that another experimenter could produce a statistically similar data set and conclusions. The two variable groups will influence the degree of objectivity and reproducibility and the variables need to be carefully selected. The independent variables are those, which are defined and controlled by the experimenter, and include the selected loudspeakers, signals and the subjects to name a few. The dependent variables are not controlled by the experimenter and consist of the answers to the experiment, these can be effected by bias and response scales.
51
5.3.1 Independent Variables As mentioned above in order to produce data which can be statistically replicated at a later date and by a third party the independent variables, which the experimenter has control over need to be carefully selected, the following sections highlight and discuss each independent variable in turn. 5.3.1.1 Signal The purpose of the signal is to excite the perceptual difference between the devices under test. On the basis of the known response attributes it is possible to establish classifiers to aid the selection of a signal (Bech and Zacharov, 2006), these include the signal category, time domain characteristics and spectral characteristics. 5.3.1.2 Signal Category A signal can be classified into several different categories including, natural and synthetic sounds, music, noise and speech. The signal presented in the experiment was a Gaussian Distributed White Noise (GDWN), which is a synthetic noise, selected as it has been previously used in psychoacoustic experiments to understand localization systems (Haustein and Schirmer, 1970). It is preferred in psychoacoustics because of its relatively steep slope of its temporal envelope, and, at the same time, its relatively narrow spectral distribution (Fastl and Zwicker, 2007). As the category of the signal is uncommon to most participants, the noise was presented to the participant before each experiment to help familiarize themselves with the stimuli.
52
5.3.1.3 Time Domain Characteristics The time domain characteristics of the stimuli are shown in figure 5.1, the signal consists of GDWN bursts as was used in an experiment by Boerger (1965). Each stimuli consisted of 5 100ms bursts with 100ms of silence in between each burst, The bursts of GDWN were followed by 10 seconds of silence in which the participant was given the chance to think about the stimuli and record their answer.
Figure 5.1: Time domain characteristic of the 5 Broadband GDWN bursts used in the experiments. 5.3.1.4 Spectral Characteristics In order to test the subjective localization of different ambisonic decoders 4 different signals were used, 3 of which were bandpass filtered using a 100Hz passband width around centre frequencies of 250Hz, 700Hz and 2500Hz. The 4th and 5th signals were broadband, but signal 4 was split and passed through low and high pass filters before being recombined as in Gerzons design for the Vienna decoder (Gerzon, 1992).
53
The filters, 2nd order Butterworth filters, were designed in MatLab using the filter toolbox and the magnitude response of the filters can be seen in figures 5.2 to 5.5, figure 5.6 shows a representative phase response of one of the filters. The MatLab commands can be seen in appendix C. The 2nd order Butterworth filters were selected due to their linear phase response, which is the most important factor when selecting filters for ambisonic reproduction (Heller et. al, 1998).
Figure 5.2: Magnitude response of 250Hz 2nd order Butterworth filter design.
54
Figure 5.5: Magnitude response of low and high pass 2nd order Butterworth filters, with crossover at 700Hz.
55
Figure 5.6: Phase response of 2500Hz 2nd order Butterworth filter design. The spectral characteristics of the 250Hz, 700Hz, and 2500Hz filtered and recombined broadband Gaussian distributed white noise signals are shown in figures 5.7 to 5.10
Figure 5.7: Power Spectral Density of 250Hz Stimuli.
56
Figure 5.8: Power Spectral Density of 700Hz Stimuli.
Figure 5.9: Power Spectral Density of 2.5KHz Stimuli.
57
Figure 5.10: Power Spectral Density of the recombined Broadband Stimuli. The experimental stimuli were created using MatLab and Apples Logic Pro software before being exported as .wav files. The individual stimuli were then loaded into Adobe Audition and pre-cued before the experiment. 5.3.1.5 Reproduction System From the results of the simulations conducted in the previous chapter it was decided that a 3rd order ambisonic decoder would be used to reproduce the stimuli for the experiment, the reason behind this was that the experimenter wanted to use a higher order ambisonic system to increase the accurate reproduction area, yet not one so refined and precise that the differences between the two decoders were not noticeable. Bertets (2009) PhD thesis investigated the subjective difference between ambisonic orders, and from those results a 3rd order system was selected. Using equation (4.6) which describes the minimum number of speakers for a given order as
L ! 2M + 2
58
A 3rd order system (M=3) would require the minimum number of speakers (L) to be 8, from the simulations it was shown that this would provide a sufficiently large reproduction area in the centre of the array. The 8 speakers were set up in the corners of a regular octahedron, at a radius of 2m from the centre. The speakers used were KEF HTS3001, and each one was placed on top of a speaker stand so that the centre of each of the main drivers was 1.2m from the floor, this was measured as the height of a 5 10 humans ear when sat in the chair positioned in the centre used for the experiment. Each of the 8 speakers were fed from a separate pre-cued channel in the multi-track computer software, the stimuli on the computer were sent via a MADI output to a RME ADI 8 DS DAC converter, where each channel was inputted to a separate channel of the power amplifier. The computer, audio interface and amplifier were positioned under a desk in the corner of the anechoic chamber, before being surrounded by fiberglass wedges to try and block and absorb the noise of the computer fans and switches, thus stopping them from distracting or confusing the participant during the experiment. A diagram of the reproduction system can be seen in figure 5.11. Information on the speakers used for the experiment can be found in appendix B.
59
Fiberglass wedges
2m Computer, ADC & Amp Curtain Chair
Figure 5.11: Diagram of reproduction system, set up in the ISVR large anechoic chamber. 5.3.1.6 Listening Room The experiment was set up in the large anechoic chamber at the Institute of Sound and Vibration Research. The anechoic chamber is a 9.15m x 9.15m x 7.32m box within a box, isolated from the outside via an air gap all around. The chamber is supported on isolation mounts and the concrete walls are 0.35m thick. The internal surfaces of the chamber are covered with glass-fiber wedges that extend 0.9m into the room, resulting in a useable space of 295m3 that exhibits free-field conditions down to 80Hz (Institute of Sound and Vibration Research Consulting, 2010). Figure 5.12 shows a photograph of the anechoic chamber.
60
Figure 5.12: Photograph of the large anechoic chamber at the ISVR. The large anechoic chamber was used to remove reflections that may confuse the participant as to the direction of the stimuli. 5.3.1.7 Calibration To calibrate the system a Brel and Kjaer Type 4189 microphone was placed in the centre of the reproduction system at a height of 1.2 meters from the floor of the anechoic chamber. The microphone was placed so the diaphragm was parallel to the floor, with an omni-directional pick up pattern. A 5 second test tone of 200Hz was sent to each speaker in turn and the signal from the microphone recorded into Adobe Audition via an RME ADI 8 DS ADC audio interface. The recorded signals were placed on adjacent tracks and magnified to see the waveform as shown in figure 5.13.
61
Figure 5.13: Screenshot of recorded calibration tones As shown in figure 5.13, when magnified the individual channel waveforms line up with each other, thus showing that they are all in phase, and more importantly each speaker is equidistant from the microphone. To ensure that each speaker was working correctly the recorded signals were analysed for their spectral content, an example can be seen in figure 5.14. The individual levels of each channel were then matched so that each speaker was the same; finally the overall level was adjusted so that it did not exceed 60dB as per the health and safety requirements.
Figure 5.14: Spectral analysis of channel 2 recorded calibration tone.
62
5.3.1.8 Subjects The subjective listening test used volunteers who were recruited via a distributed email and by word of mouth. A mixture of experts, defined as those who had participated in listening tests before (Bech and Zacharov, 2006), and non-experts were selected for the experiment. The participants were screened by way of a questionnaire prior to the experiment for any condition that may affect their auditory response, although their hearing response was not measured. The 11 participants aged in range from 23 54, and were a mix of male and female. 5.3.1.9 Pointing Methods Evaluating sound localization is not an easy task, the reporting method should introduce as little bias as possible in the test to be valid. The study used a method whereby the participant needed not to indicate to a third party by pointing; instead they recorded their answer on a form provided. Other experiments such as those by (Makous & Middlebrooks, 1990) and (Bronkhorst, 1995 in Bertet et al., 2007) used a head tracker, the listener had to point to the stimuli with the head which was then recorded by a tracker to try and reduce any bias. 5.3.2 Dependent Variables The dependent variable is the answer given in response to the stimuli, the answer given by the participants depends heavily on the question being asked of them, and so it is important that the participants understand the aims of the experiment and that they are comfortable answering the question posed, what is the direction of the stimuli being presented? To help the participant understand what they are being asked to answer, they are presented with a handout, which details the experiment and the answering procedure; they are also walked through an example of the experiment. The participants were provided with an answer sheet on which to record their answers, an indirect method of evaluation, thus reducing the chance of a reporting bias due to a third party recording the answers. 63
5.3.2.1 Measurement Scale A direct scale was used to aid the participants method of evaluation to report the direction of the stimuli. The direct scale gives the participants an established scale from which to record their answers, assigning a value of the scale to a stimulus. Another reason for using a scale was to help avoid confusion or bias, as the direction of the stimuli were opposite to those on a compass, i.e. 90 was counter-clockwise, the convention in ambisonics, rather that clockwise as most of the subjects would have been familiar with. The scale used in the experiment consisted of 72 points, with a regular separation of 5 to provide a sufficient resolution; the 72 points formed a complete circle around the participant. 5.3.2.2 Bias Effects Bias effects are related to all forms of scaling of subjective impressions, and are also related to the environment where the experiment is being performed. Bias effects need to be considered carefully before performing an experiment, some bias effects are described below but a full discussion of bias effects is beyond the scope of this paper, however a comprehensive description of bias effects can be found in (Poulton, 1989). 5.3.2.3 Contraction Bias This type of bias is caused by a subjects tendency to be conservative so that large differences are underestimated and small differences are overestimated, and thus the reported range is different to the range of the stimuli (Bech and Zacharov, 2006). To minimize this bias effect the stimuli will be presented in a random order so that the participant cannot gauge their response to the stimuli on the previous answer.
64
5.3.2.4 Visual Bias In order to reduce visual biases an acoustic curtain was erected in between the participant and the loudspeaker array, this removed and visual cues for the participant to focus on and gave a surface for the stimuli scale to be located. An experiment by Toole has shown that being able to see a loudspeaker has a significant effect on the result of the experiment (Toole, 1994). Also in order to prevent bias due to fatigue and loss of concentration the comfort and duration of the listening test has to be considered. It was decided that the total experiment should take no longer than 30 minutes and that there would be a short break between each individual procedure.
5.4 Experimental Procedure

The experimental procedure consisted of 5 experiments, each with 5 directions in which stimuli were presented to the participant, these were kept constant for each experiment and each position was repeated twice giving a total of 10 stimuli for each decoder design. The five directions the stimuli were presented from were 50, 60, 125, 275 and 340. To try and reduce bias effects due to familiarity of the order of presentation of the stimuli, the procedure was randomized; the running order for each procedure can be seen in tables 5.1 5.5 below.
65
Broadband Noise Stimuli Number 1 2 3 4 5 6 7 8 9 10
Test 1 Angle (degrees) 60 340 60 50 275 125 340 50 125 275 Decoding Style Combined Combined Combined Combined Combined Combined Combined Combined Combined Combined
Table 5.1: Test 1 Running Order Broadband Noise, Combined Decoding.

250Hz Stimuli Number 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Test 2 Angle (degrees) 340 50 125 60 125 275 340 50 60 340 60 50 125 275 275 340 125 50 275 60 Decoding Style rV rE rV rV rE rE rE rV rV rV rE rV rE rV rE rE rV rE rV rE
Table 5.2: Test 2 Running Order 250Hz Stimuli. 66
700Hz Stimuli Number 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Test 3 Angle (degrees) 125 60 275 125 50 275 340 60 125 60 50 340 60 275 50 340 125 50 275 340 Decoding Style rE rV rV rE rE rV rV rV rV rE rV rV rE rE rE rE rV rV rE rE
Table 5.2: Test 3 Running Order 700Hz Stimuli.
67
2500Hz Stimuli Number 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Test 4 Angle (degrees) 275 125 50 340 275 50 60 125 50 340 60 125 340 60 275 125 340 50 275 60 Decoding Style rV rE rE rV rE rV rE rV rE rE rE rE rV rV rV rV rE rV rE rV
Table 5.2: Test 4 Running Order 2500Hz Stimuli.
68
Broadband Stimuli Number 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Test 5 Angle (degrees) 340 60 125 340 60 275 125 50 340 275 50 60 125 50 275 125 340 50 275 60 Decoding Style rE rE rE rV rV rV rE rE rV rE rV rE rV rE rV rV rE rV rE rV
Table 5.2: Test 5 Running Order Broadband Stimuli.
69
5.4.1 Experiment List Experiment 1 - tested the localization of a combined decoder using broadband white noise. Experiment 2 - band pass filtered white noise centered on 250Hz, for both the max rV and max rE decoders, giving a total of 20 stimuli. Experiment 3 - band pass filtered white noise centered on 700Hz, for both the max rV and max rE decoders, giving a total of 20 stimuli. Experiment 4 - band pass filtered white noise centered on 2.5KHz, for both the max rV and max rE decoders, giving a total of 20 stimuli. Experiment 5 - broadband white noise, for both the max rV and max rE decoders, giving a total of 20 stimuli. The results of the listening tests are detailed in the next chapter.
70
5.5 KEMAR Measurements

A G.R.A.S KEMAR dummy head and torso simulator was used to calculate the interaural level and time differences that were reproduced inside the anechoic chamber during the test procedures and the recorded data then used to compare to calculated control values. The KEMAR system is based on the worldwide average human male and female head and torso dimensions and meets the requirements of ANSI S3.36/ASA58-1985 and IEC 60959. The manikin simulates the changes that occur to sound waves as they pass a human head and torso such as the diffraction and reflection around each ear. The head and torso simulator (HATS) was set up in the centre of the reproduction area with the microphones located at the same height as the centre of the loudspeaker drivers. The test procedure was then carried out and the two signals from the binaural microphones recorded using Adobe Audition. Fact sheets for the G.R.A.S KEMAR system can be found in Appendix B
71
Equation Chapter (Next) Section 1C
Chapter 6
Results
6.1 Listening Test Results
The results of the subjective tests carried out as detailed in the previous chapter were collated into tables for each of the five experiments, and the scale response was converted into the perceived direction of the stimuli presented. The results were then sorted into decoding methods for the last 4 experiments, the subjective responses of the experiments are shown below in tables 6.1 to 6.5. Experiment 1 Results 50 60 125 275 45 40 35 200 45 45 45 235 45 45 65 235 45 50 65 250 45 55 70 250 45 55 90 255 50 55 95 255 50 55 115 265 50 60 115 265 50 60 115 270 50 65 115 270 50 65 120 270 55 65 120 270 55 65 120 275 55 65 120 275 60 70 130 275 60 70 130 275 60 75 130 275 65 80 130 275 65 85 135 280 65 90 135 295 70 110 140 295 Table 6.1: Experiment 1 (Broadband Stimuli) Results
340 310 315 320 325 325 330 330 335 335 335 335 340 340 345 345 345 345 345 350 350 360 360
72
50 40 45 45 45 50 50 55 55 55 55 55 55 55 60 60 65 65 70 90 100 105 125 73
Experiment 2 Results Velocity Decoding 60 125 275 340 55 35 255 180 60 45 260 255 60 45 260 280 60 60 265 290 60 60 265 320 60 60 270 335 60 60 270 335 65 65 270 335 65 65 270 335 65 75 270 335 65 80 270 340 65 80 270 340 65 80 270 340 65 85 270 340 65 85 275 340 70 100 275 340 75 105 275 340 75 115 275 340 85 120 275 340 95 120 280 345 100 120 285 345 110 140 285 345 Table 6.2: Experiment 2 (250Hz Bandpass Filtered Stimuli) Results
50 25 30 30 35 35 35 35 35 35 35 35 35 40 45 45 45 60 80 90 130 135 155
Experiment 2 Results Energy Decoding 60 125 275 35 25 235 35 30 245 35 30 260 40 30 265 40 30 270 40 35 275 40 35 275 40 40 280 45 40 280 45 40 280 45 40 280 45 40 285 45 45 285 50 50 285 50 55 290 50 55 290 60 65 295 75 70 295 80 110 295 105 115 300 110 115 305 135 125 305
340 180 190 240 335 340 340 340 340 340 340 345 345 345 345 345 345 345 345 350 350 350 355
50 40 45 45 45 45 50 50 50 50 50 50 50 55 55 55 55 55 60 60 65 120 130
50 35 35 35 40 40 40 40 40 45 45 45 45 45 50 50 50 50 75 90 115 135 135
340 180 185 190 195 330 335 340 340 340 340 340 340 340 345 345 345 345 345 345 345 345 345
74
50 35 35 35 35 40 45 45 45 45 50 50 50 55 55 60 60 60 60 105 120 135 140
50 30 35 35 40 45 45 45 45 50 50 50 50 55 60 65 85 85 100 125 145 150 150
340 165 170 175 180 190 200 210 320 325 335 335 335 340 340 340 340 345 345 345 345 355
75
50 40 45 45 45 45 50 50 50 50 50 50 50 50 50 50 50 50 55 55 85
Experiment 5 Results Velocity Decoding 60 125 275 340 50 55 250 190 50 60 250 315 50 75 255 315 50 115 255 325 55 120 260 325 55 120 260 335 55 120 265 335 55 120 265 335 55 120 270 340 55 125 270 345 60 125 270 345 60 135 270 345 60 135 275 345 60 135 275 345 65 135 275 345 65 135 275 345 65 135 280 345 75 140 280 345 75 140 280 345 90 145 280 350 Table 6.5: Experiment 5 (Recombined Broadband Stimuli) Results
50 30 35 35 40 40 40 40 40 45 45 45 45 45 50 50 50 50 50 65 90
Experiment 5 Results Energy Decoding 60 125 275 40 90 250 45 95 250 45 110 250 50 120 250 50 120 255 50 120 255 50 135 255 50 135 255 55 135 270 55 135 275 55 135 275 60 135 275 60 135 275 60 140 275 60 140 275 60 140 285 70 145 285 70 145 295 95 150 295 130 155 300
340 325 330 340 340 340 340 340 340 345 345 345 345 345 345 345 345 345 345 350 350
76
From the raw collated data the median and mode was calculated for each data set and using the raw data box plots were produced. The box plots for the 5 experiments are shown in figures 6.1 to 6.7. The box plots display the 25th and 75th percentiles, shown by the edges of the blue box, the median value displayed as a red line and the lines show values within 1.5 times the interquartile range of the lower quartile. Any values offside of this rage are shown as red crosses.
Figure 6.1: Box plot to show Experiment 1 (Recombined Broadband Stimuli) Results
Figure 6.2: Box plot to show Experiment 2 (250Hz Bandpass Filtered Stimuli) Results 77
Figure 6.3: Box plot to show Experiment 3 (700Hz Bandpass Filtered Stimuli) Results
Figure 6.4: Box plot to show Experiment 4 (2500Hz Bandpass Filtered Stimuli) Results
78
Figure 6.5: Box plot to show Experiment 5 (Broadband Stimuli) Results
6.2 KEMAR Results

The binaural recordings from the KEMAR system were used to calculate the interaural level and time differences experienced for each reproduction angle. Apples Logic Pro 8 software was used to measure both the ILD and ITD for the binaural recordings, not all of the recordings were analysed but instead 3 of the 5 angles of incidence were used; 50, 125 and 275. The ITD was calculated and compared over the 3 angles for test procedures 1, 2 and 5, while the ILD was calculated for test procedures 1, 4 and 5, this gave a representation of the two broadband signals and an appropriate frequency for the human localization methods proposed in the Duplex Theory of Lord Rayleigh as discussed in chapter 2. Graphs showing a comparison of the calculated ILD and ITD can be seen below in figures 6.6 and 6.7 respectively.
79
Figure 6.6: Graph to compare calculated interaural level differences from binaural recordings of test procedure 1, 4 and 5 measured using a KEMAR system.
Figure 6.7: Graph to compare calculated interaural time differences from binaural recordings of test procedure 1, 2 and 5 measured using a KEMAR system.
80
Figure 6.7 also includes a control, which is the calculated ITD for the given angles of incidence using the formula from Lord Rayleighs Duplex Theory for low frequencies, given in equation 2.1 as
" 3a sin ! inc low frequencies $ $ c ITD ! # $ 2a sin ! high frequencies inc $ c %
81
Chapter 7
Discussion
The hypothesis of this project was to study the accuracy of ambisonically reproduced sound fields using decoding principles devised by Gerzon in the early 70s. The main focus of this study was a subjective listening test, detailed in the previous chapters, which was designed to test the localization accuracy of different ambisonic decoding methods. It is in this chapter that the results of the subjective listening test, shown in Chapter 6, will be analysed and discussed, comparing and contrasting the results to the underlying principles of ambisonics and human localization principles. The results of the 5 listening tests can be collated and separated into two main areas of research that try to answer the questions raised from the literature review in the early chapters. Firstly how does the different decoding methods affect the subjective localization accuracy of the reproduced sound field for the 3 frequencies examined, furthermore are the energy and velocity decoding methods best suited for the frequencies stated by Gerzon in his Metatheory (Gerzon, 1992)? The second area of research looks at how the signal which was decoded using Gerzons (1992) Vienna decoder method, that is the signal was split into two, filtered into high and low pass content around 700Hz and then a velocity and energy decoding methods used for the low and high pass signals respectively before recombining, compares to using velocity and energy decoding methods on a broadband signal.
82
The second, third and fourth subjective listening tests were designed to compare the energy and velocity decoding methods for the frequencies of 250Hz, 700Hz and 2500Hz respectively. The results of experiment 2, the study into 250Hz were illustrated in a box plot, which can be seen in figure 6.2. Figure 6.2 shows that for all the angles of incidence presented to the subjects, the velocity decoding method produced a smaller range of answers, bar outliers, for all of the presented angles, with the exception of 125. This result corroborates with Gerzons postulation that the velocity decoding method, based on the interaural time difference, part of Lord Rayleighs Duplex theory (Strutt, 1907), is more accurate at reproducing low frequencies, suggested by Gerzon as up to a crossover area of 700Hz (Gerzon, 1992). Statistical analysis demonstrates that the median answer for stimuli with the velocity decoding method were closer to the true origin for all angles compared to stimuli with the energy decoding. For the origin angle presented to the subjects of 275 the calculated mean answer for both decoding styles equaled that of the median and modal answers, similarly for the angles of 50 and 340 the median answer equaled that of the mode for both decoding styles. The analysis also shows that the stimuli presented at 340 with the velocity decoding gave the smallest range of answers shown by the box plot, yet also the largest amount of plotted anomalies, which could indicate that the results some of the subjects experienced when localizing the stimuli were in fact at a wide range of angles and thus some subjects had trouble locating the origin of the source.
83
The angle of 275 with the velocity decoding was shown to have the smallest range of answers when outliers were included, thus presenting itself as the most accurate angle experienced by the subjects, this could be due to the fact that the localization accuracy of humans is far more effective when a stimuli is presented towards the front than from the sides or rear, as detailed in various experiments such as Bloch (1893), Mills (1958) and Gardner (1968). However if this effect helped the accuracy of localization one would expect 340 or 50 to be more accurate as they are closer to 0 relative to the subject. One possible reason for 50 not producing the smallest range in answers is the inclusion of a stimuli at 60. This may have had a bias effect on the answers if the resolution of the system was not small enough, i.e. the subject cannot discern between 60 and 50 if both reproduced stimuli appear to come from the same origin. Both sets of results for 60 and 50 provided outliers in similar positions, with a skew towards the left of the reproduced origin. The results for stimuli presented from 125 for both the velocity and energy decoding methods were the furthest from the correct answer for both modal and median results, a result which was also seen in the third study examining the decoder performance for 700Hz, the possible reasons for this will be presented later in this discussion. A box plot summarizing the results of the study into decoder performance for the localization of the 700Hz noise bursts can be seen in figure 6.3, from the box plot it can be seen that the results for the velocity decoding style give answers which are slightly more accurate than those decoded using the energy preservation method. The difference, however, is small and the results are close, with the energy method being more accurate at some frequencies. In his Metatheory Gerzon suggested that 700Hz be used as a crossing point, as both were capable of reproducing the target field, between the velocity decoding, used for lower frequencies, and the energy decoding method that was to be used for higher frequencies.
84
Although the two sets of results are close the results for the velocity-encoded stimuli were, as mentioned more accurate. Statistical analysis of the results show that using velocity decoding, for the angles of 50, 275 and 340 the median and modal answers recorded by the subjects were equal to the angles of source origins. Another observation is that for the angle of 60 the median answer for velocity decoding was 10 to the left of the origin, while the energy decoding was 10 to the right, closer to the front of the listener. The average answer given for the energy decoded 60 stimulus was however equal to that of the stimulus origin. The apparent tightening of the perceived stimulus to the front of the listener, i.e. towards 0 experienced for 60 with the energy decoding is a phenomenon that is also present in the study into 250Hz localization for the energy decoded results. One reason for this may be due to the decoding gains and the dominance of ITD at low frequencies. Specifically, ILDs are most pronounced at frequencies above approximately 1.5 kHz because it is at those frequencies that the head is large compared to the wavelength of the incoming sound producing substantial reflection (rather than total diffraction) of the incoming sound wave. Interaural timing cues, on the other hand, exist at all frequencies, but for periodic sounds they can be decoded unambiguously only for frequencies for which the maximum physically-possible ITD is less than half the period of the waveform at that frequency. Since the maximum possible ITD is about 660 s for a human head of typical size, ITDs are useful for stimulus components below about 1.5 kHz (Stern, Weng and Brown, 2006), as is the case for the two experiments.
85
The direction of the reproduced wave front is given by the vector U, defined as the mean of the loudspeaker directions weighted by the associated amplitude, as given in equation (3.38) as
N !1
U=
"G u
n=0 N !1 n=0 n
"G
This is the same as the original event, the apparent speed cU is given by Daniel et al. (1998) to be related to the modulus rU of the velocity vector U so that
c rU
cU =
where c is the "natural" sound speed (about 340 m/s). Since the two studies used frequencies low enough for the localization to be heavily dependent of the interaural time difference, the perceived ITD is given by Daniel et al. (1998) to be,
ITDreproduced = rU ITDnatural
While the energy decoding method is based on localization due to interaural level differences, the perceived stimuli in the two studies are still assessed by the subject using the ITD, due to the gains associated with the energy decoding method, as detailed in chapter 3, the rU for the stimuli using the energy decoding method will always be <1, therefore
cU > c
therefore the reproduced wave will go faster than a natural plane wave, thus the ITD will be shortened leading to a lateralization effect smaller than expected, i.e. the perceived direction shifts towards the median plane.
86
Figure 6.3 shows that for the angle of 340, with both decoding styles the stimuli were perceived and recorded as originating around 160 from the actual origin of the stimuli, shown as outliers on the box plot, this is also shown in figure 6.2. This apparent change in stimuli origin is termed the Front Back ambiguity, and relates to Woodworths cones of confusion (Woodworth, 1938). The cones of confusion are conic surfaces that extend from the ears on which every point the ILD and ITD are equal, leading to points where without head movement it is impossible to determine the direction of a sound (Woodworth, 1938). When the attention is focused to a single plane, i.e. the horizontal plane then a sound originating in front of the subject appears to come from the back, hence the name Front Back ambiguity, and is given as the angle equal to 180 minus the angle between the stimuli origin and the plane intersection axis. As with figure 6.2, figure 6.3 shows that stimuli originating from 125 produced the widest range in answers deemed to lie within 25% and 75% interquartile ranges, with the median answer for the energy decoded stimuli at 125 furthest from the actual origin. The reason for such a lack of accurate localization may be due to the subjects being asked to refrain from head movement during the reproduction of the stimuli, thus not allowing for the resolving of ambiguities that head movements can provide, a practice noted in several experiments by Stevens and Newman (1936). The practice of listening to a sound then turning to face the source and select a number to represent the origin may have added a bias to the experiment due to difficulty of describing an auditory sensation with a visual one (Wallace, 2004). However this may not be the case as in study 4, an experiment into a the localization of a higher frequency stimulus, shown in figure 6.4, although the range of answers given for stimuli originating from 125 is equally large as those given in experiments 2 and 3, the median answers for the velocity decoding methods was far closer to the actual origin, and the median answer for the energy decoding method was that of the origin, 125.
87
A possible answer for this may be that by the time the subjects were involved in study 4, an effect of training may have taken place, whereby the subjects were now more familiar with the types of sounds being presented. Also as the positions did not change in-between the individual experiments that were run sequentially, the subjects may have become familiar with the areas that the stimuli were presented from. From 6.4 it is shown that the range of answers recorded for the stimuli decoded using the energy method are greater than those of the velocity decoding method, however the median and modal answers recorded by the subjects were closer to the origin of the stimuli for the energy method than the velocity decoded stimuli. The velocity method is based on the principles of ITD, the binaural system is completely insensitive to ITD for narrow band stimuli over 1.5kHz, although it does respond to low-frequency envelopes in high-frequency stimuli (Stern, Weng and Brown, 2006). This may be a reason why it is less accurate at the higher frequencies than the energy decoder for equivalent stimuli. From figure 6.4 it can be seen that for the stimuli origin of 340, for both decoding styles the range of answers given is the largest for the study, one reason for this is that a Front Back ambiguity occurred as in the previous two studies, however it was experienced by more of the subjects so not considered to be an outlying answer in the box plot. One reason the angle may have produced more ambiguities than that of lower frequency studies is that the subjects would have been using the ILD method to localize the origin of the stimuli.
88
The ILD is a much more complicated function of frequency, even for a given source position. In a study by Wightman and Kistler (1992), found that the ILDs measured at the eardrums of subjects exhibited much more subject-to-subject variability. In a study by Carlsson (2004) it was shown that the response measured by a KEMAR set up contained spectrally different ILDs, which led to an inaccurate localization of a target sound field. The studies shown in figures 6.2 6.4 demonstrate that as the frequencies increase the median answers recorded by the subjects for both the velocity and energy decoding methods become closer to each other and closer to localizing the origin of the stimuli for all angles of incidence. This shows that as the frequency increases so does the comparative ability to localize stimuli with both methods of decoding, where as for lower frequencies, only the velocity decoding method accurately reproduces the target field allowing for the localization of the incident stimuli. The second main area of study for this dissertation is comparing the subjective localization accuracy of Gerzons proposed combined decoder with a decoder that used only the energy or velocity decoding methods. Figures 6.1 and 6.5 show box plots of the results of the broadband stimulus combined decoding study and the broadband stimulus energy and velocity decoding studies respectively. From figure 6.1 it can be seen that all the median answers recorded for the study were within 5 of the correct stimulus origin, which is the resolution of the measuring scale, apart from 125 which had a difference of 10. The studies shown in figures 6.2 6.4 looking at different frequencies had a similar result where the range of answers for 125 was the largest for the interquartile range, an effect as described previous as the result of disallowing head movements. Figure 6.5 shows a comparison of the energy and velocity decoding methods used in study 5 to examine their accuracy when reproducing a broadband Gaussian white noise pulse train. The box plot shows that the results for both decoding methods are similar, with the 50 velocity decoding stimulus having the smallest range of answers.
89
Statistical analysis of the results show that the median results from the velocity decoded stimuli are equal or closer to the angle of the incident wave than that of the energy decoding methods median answers, however the average answers given for the energy method are closer than those of the velocity method to the stimulus origin. For both decoders the angles of 275 and 340 the median answers equaled that of the modal answers. It can be seen in figure 6.5 that at the angle of 340 with the velocity decoding method a single outlier, this is the result of a Front Back ambiguity as described in previous chapters as an event which only happens at 340, leading to the conclusion that this must be the only angle of incidence in the study which lies on the cone of confusion for some subjects. Comparing the results of the two studies one can see that both the broadband and the combined results provide fairly equal answers, i.e. the combined decoder does not appear to be better than a standard one system decoder as was predicted by Gerzon in his Metatheory and as shown in research by Jerome Daniel (Daniel, 2001), a reason for this may have been due to the reproduction area being too small and so the benefits of the combined decoder were not beneficial over the single decoded stimuli, to test the validity of this claim a simple off-centre experiment could be set up to test the size of the reproduction areas. The results of the broadband stimuli produced more outliers and irregular answers, therefore it could be considered less accurate than that of the combined decoding tests, however the difference is minimal, this is shown in figure 6.6 a graph to compare the calculated ILDs from the KEMAR study. 2 of the 3 frequencies that compared the ILD for the combined decoded stimuli and the energy decoded stimuli are identical. The ITD KEMAR result for the combined study with an angle of incidence at 50 was calculated to be identical to the ideal ITD calculated according to Kuhn (1987), shown in figure 6.7.
90
Figure 6.7 shows that for the broadband study, the calculated ITD for the energy decoding method was for 2 angles greater than that of the calculated control. Similarly the velocity method ITD was calculated to be shorter for 2 of the angles, this is reflected in the results shown in figure 6.5, where the velocity decoded stimuli originate closer to the median plane.
91
Chapter 8
Conclusion
As mentioned previously, the aims of this thesis are to try and establish whether the energy and velocity models laid out by Gerzon in his 1992 Metatheory are correct, specifically whether Gerzons three decoder designs, the energy decoder, the velocity decoder and a combined decoder, are able to reproduce sound fields where a stimulus is accurately localized? And secondly are the velocity and energy decoding methods limited to the working frequency ranges suggested by Gerzon? From the literature and the simulations conducted in chapter 4, two main areas of development arose, with questions to be answered by means of a subjective listening test; Firstly how does the different decoding methods affect the subjective localization accuracy of the reproduced sound field and are the energy and velocity decoding methods best suited for the frequencies stated by Gerzon in his Metatheory? Secondly how does Gerzons Vienna decoder compare to using either velocity or energy decoding methods on a broadband signal? A analysis of the results of the subject listening tests it was shown in the previous chapter that for 250Hz the velocity decoding method was more accurate at reproducing a target sound field than the energy vector method, as was predicted by Gerzon. The accuracy of the velocity decoder was also greater than the energy decoder for 700Hz, although the performance of the two was closer than for 250Hz. It was suggested by Gerzon in his Metatheory that 700Hz be a cross over point as both decoding methods worked suitably well at that frequency, although this does hold true as is shown, the velocity decoding method would be better suited to reproducing these frequencies, and the crossover point could be reset at a higher frequency towards the limits of the ITD, around 1500Hz. An investigation into the crossover point could be conducted as a follow on to the work in this thesis using the same experimental procedures and methodology.
92
The results of the experiment investigating the 2500Hz tone bursts showed that the energy decoding method to be more accurate for localizing high frequency sounds. However the velocity decoding method was only slightly less accurate, as more high frequency content is added to the signal the energy decoding method again out performs the velocity decoding method for reproducing stimuli as is shown in figure 6.5 the broadband experiment and reflected in the KEMAR study. Comparing the results of the Vienna decoder to both the velocity and energy decoding methods shows that the combined Vienna decoder did not provide a significant improvement, with the difference minimal. However the number of outliers was reduced along with the range of answers in the interquartile range, a result reflected in the ILD and ITD calculations from the KEMAR study. All of the experiments adopted a centered listening position and as the Vienna decoder was seemed to be not significantly better at reproducing a target field than the single decoders in the centre of the listening area, a further test would be to set up an experiment to look at the reproduction of the decoder in an off-centre listening position. One advantage of the combined decoder is the increase in reproduction area, and this would provide a more comprehensive experiment comparing the 3 decoders, but was beyond the scope of this thesis. This study made use of an environment that was unfamiliar to most participants, and as such the environment may have had an effect on the subjective results, further testing could be conducted in a listening room environment with a small amount of reverberation, similar to that encountered daily to examine any changes in performance of the decoders. This test would be beneficial due to being the a more natural listening environment of complex sound reproduction systems. In summary the aims of the thesis were met and Gerzons statements evaluated and subjected to subjective and objective testing, in conclusion it was found that Gerzons hypothesis held true regarding the working frequency range and optimum use of the energy decoding method, the two methods were suitable for a cross over point of 700Hz, although this could be altered, and that the velocity decoder was optimum at a 93
frequency of 250Hz. However Gerzons hypothesis did not conclude that the velocity vector would be close to equaling the performance of the energy decoder at reproducing a frequency of 2500Hz, furthermore Gerzons idea that a combined decoder would be more accurate than a single decoder style was shown to be inconclusive from these experiments.
94
References
Abhayapala, T., 2003. Reproduction of a 3D Sound Field Using an Array of Loudspeakers. MSc Project Dissertation, Australian National University. Australia. Bates, E., 2009. The Composition and Performance of Spatial Music. PhD Thesis, Trinity College, Dublin Bamford, J.S., 1995. An analysis of ambisonic sound systems of first and second order. MSc dissertation, University of Waterloo, Ontario, Canada. Bech, S. and Zacharov, N., 2006. Perceptual Audio Evaluation. London: Wiley Benjamin, E., Lee, R. and Heller, A., 2006. Localization in Horizontal-Only Ambisonic Systems. 121st AES Conference San Francisco, USA, 5th 8th October. Bernstein, L.R. and Trahiotis, C., 2002. Enhancing sensitivity to interaural delays at high frequency. J. Acoust. Soc. Am. 112:1026-1036. von Bksy, G., 1930. On the theory of hearing. Phys. Z. 31:824-838. Bertet, S. et al. 2007. INVESTIGATION OF THE PERCEIVED SPATIAL RESOLUTION OF HIGHER ORDER AMBISONICS SOUND FIELDS: A SUBJECTIVE EVALUATION INVOLVING VIRTUAL AND REAL 3D MICROPHONES. AES 30th International Conference, Finland, 15th 17th March Bertet, S., 2009. Formats Audio 3D Hierarchiques. PhD Theses, MEGA, Lyon Blauert, J. 1968. A contribution to the theory of front-back impression in hearing. Proceedings, 6th Int. Congr. on Acoustics, Tokyo, A-3-10. Blauert, J., 1970. An experiment in directional hearing with simultaneous optical stimulation. in Blauert, J., Spatial Hearing: the psychophysics of human sound localization. London: MIT Press. Blauert, J., 1974. Spatial Hearing: the psychophysics of human sound localization. London: MIT Press. Bloch, E. 1983. Binaural Hearing. in Blauert, J., Spatial Hearing: the psychophysics of human sound localization. London: MIT Press. de Boer, K., 1940. Stereophonic Sound Production. Philips Technical Review, 5:107 144. Boerger, G. 1965. The localization of Gaussian tone bursts. Dissertation, Technische Universitat, Berlin. in Blauert, J., Spatial Hearing: the psychophysics of human sound localization. London: MIT Press. 95
Bronkhorst, A., 1995. Localization of real and virtual sound sources. J. Acoust. Soc. Am. 98(5): p. 2542-2553 in Bertet, S. et al. 2007. INVESTIGATION OF THE PERCEIVED SPATIAL RESOLUTION OF HIGHER ORDER AMBISONICS SOUND FIELDS: A SUBJECTIVE EVALUATION INVOLVING VIRTUAL AND REAL 3D MICROPHONES. AES 30th International Conference, Finland, 15th 17th March. Brown, C.H., Beecher, M.D., Moody, D.B. and Stebbins, W.C., 1978. Localization of pure tones in Old World monkeys. J. Acoust. Soc. Am. 63:1484-1492. Brown, C.H. and May, B.J., 2005. Comparative Mammilian Sound Localization. in Popper, A.N. and Fay, R.R., ed. Sound Source Localization. New York: Springer. Carlile, S., Delaney, S. and Corderoy, A. 1999. The localization of spectrally restricted sounds by human listeners. Hear. Res. 128:175-189, in Foreman, L.K. 2007. Gender differences in sound localization in the horizontal and vertical planes. MSc project dissertation, ISVR, Southampton. Carlsson, K., 2004. Objective Localization Measures in Ambisonic Surround sound. MSc Project Dissertation, Royal Institute of Technology, Stockholm. Damaske, P. and Wagener, B., 1969. Investigations of directional hearing using a dummy head. in Blauert, J., Spatial Hearing: the psychophysics of human sound localization. London: MIT Press. Daniel, J. 2001. Reprsentation de champs acoustiques, application la transmission et la reproduction de scnes sonores complexes dans un contexte multimedia. PhD Thesis, Universite Paris, Paris, France. Daniel, J., Nicol, R. and Moreau, S., 2003. Further Investigations of High Order Ambisonics and Wavefield Synthesis for Holophonic Sound Imaging. 114th AES Convention, March 2225, Amsterdam, The Netherlands. Paper 5788. Daniel, J., Rault, J. and Polack, J., 1998. Ambisonics Encoding of Other Audio Formats for Multiple Listening Conditions. 105th AES Convention, San Francisco, 26th 29th September Excell, D., 2003. Reproduction of a 3D Sound Field Using an Array of Loudspeakers. BEng Project Dissertation, Australian National University, Australia. Fahy, F. and Walker, J., 1998. Fundamentals of NOISE and VIBRATION. London: Spon Press Fastl, H. and Zwicker, E., 2007. PSYCHO-ACOUSTICS, Facts and Models, 3rd ed. Berlin: Springer-Verlag Gardner, M.B., 1968. Lateral Localization of 0 or apparent 0- oriented speech signals in anechoic space. J. Acoust. Soc. Am. 44:797-803. 96
Gardner, M.B., 1973. Some monaural and binaural facets of median plane localization. J. Acoust. Soc. Am. 54:1489-1495. Gardner, M. B. and Gardner, R. S., 1973. Problem of localization in the median plane: effect of pinnae cavity occlusion. J. Acous. Soc. Am., 53, 400- 408. Gerzon, M.A. 1972. Periphony (with-height sound reproduction). 72 AES Convention, Munich, 14 -16th March. Gerzon, M. A., 1974. Surround Sound Psychoacoustics. Wireless World, Vol. 80, pp. 483-486 Gerzon, M.A., 1977. Multi-system Ambisonic Decoder, parts 1 & 2. Wireless World, July & August. p. 43 47 & p. 63 73. Gerzon, M.A., 1992. General Methatheory of Auditory Localization. 92nd International AES Convention, Vienna, 24 27 March. Preprint 3306. Gerzon, M.A. 1998, Surround Sound Apparatus. US Patent, Number:5757927. Grantham, D.W., 1984. Interaural intensity discrimination. J. Acoust. Soc. Am. 75:1191-1194. Harris, J.D., 1972. A florilegium of experiments on directional hearing. Acta OtoLaryngol Suppl 298:1-26. in Popper, A.N. and Fay, R.R., ed. Sound Source Localization. New York: Springer. Haustein, B.G. and Schirmer, W. 1970. A measuring apparatus for the investigation of the faculty of directional localization. in Blauert, J., Spatial Hearing: the psychophysics of human sound localization. London: MIT Press. Heffner, R.S. and Heffner, H.E. 1982. Hearing in the Elephant: absolute sensitivity, frequency discrimination and sound localization. in Popper, A.N. and Fay, R.R., ed. Sound Source Localization. New York: Springer. Heller, A., Lee, R. and Benjamin, E., 2008. Is My Decoder Ambisonic? 125th AES Conference, San Francisco, 2nd 5th October Henning, G.B., 1974. Detectability of interaural delay with high frequency complex waveforms. J. Acoust. Soc. Am. 55:84-90. Klump, R.G. and Eady, H.R., 1956. Some measurements on interaural time difference thresholds. J. Acoust. Soc. Am. 28:859-860. Kuhn, G.F., 1987. Physical Acoustics and Measurements Pertaining to Hearing. in Yost, W.A, ed. Directional Hearing. New York: Springer. Lungwitz, H., 1923. The discovery of the soul. in Blauert, J., Spatial Hearing: the psychophysics of human sound localization. London: MIT Press. 97
Makita, Y. 1960. On the directional localization of sound in a stereophonic sound field. 12th Meeting of the Technical Committee of the E.B.U., Monte Carlo, October, 1960. Makous, J. C. and Middlebrooks, J. C., 1990. Two dimensional sound localization by human listeners. J. Acoust. Soc. Am. 87(5): p. 2168- 2180 Malham, D., 2003. Space in music music in space. MSc Dissertation, University of York, York. Mallock, A., 1908. Note on the sensitivity of the ear to the direction of explosive sounds. Proc. Roy. Soc. Med. 80:110. Middlebrooks, J.C., Makous, J.C. and Green, D.M., 1989. Directional sensitivity of sound-pressure levels in the human ear canal. J. Acoust. Soc. Am. 86:89-108. Mills, A.W., 1958. On the minimum audible angle. J. Acoust. Soc. Am. 30:237-246. Moore, B.C.J., 1977. Introduction to the Psychology of Hearing. London: Macmillan Press. Moore, B.C.J., 2003. An introduction to the psychology of hearing (5th ed). in Plack, C.J., 2005. The Sense of HEARING. New Jersey: LEA. Moreau, S., Daniel, J. and Bertet, S., 2006. 3D Sound Field Recording with Higher Order Ambisonics Objective Measurements and Validation of a 4th Order Spherical Microphone. 120th AES Conference, Paris, 20th 23rd May Perekalin, W.E., 1930. On acoustical orientation. in Blauert, J., Spatial Hearing: the psychophysics of human sound localization. London: MIT Press. Plack, C.J., 2005. The Sense of HEARING. New Jersey: LEA Poulton, E. C., 1989. Bias in Quantifying Judgments. Hove & London: Lawrence Erlbaum Rumsey, F. and McCormick, T., 1994. Sound & Recording an introduction. Oxford: Focal Press. Rumsey, F., 2001. Spatial Audio. London: Focal Press Sandel, T.T., et al. 1955. Localization of sound from single and paired sources. J. Acoust. Soc. Am. 27:842-852. Skudrzyk, E., 1971. Foundations of Acoustics. New York: Springer-Verlag, in Williams, E., 1999. Sound Radiation and Nearfield Acoustical Holography. London: Academic Press Stern, R. M., Wang, De L., and Brown, G., 2006. Binaural Sound Localization, 98
Chapter in Computational Auditory Scene Analysis, G. Brown and De L. Wang, Eds. New York: Wiley/IEEE Press. Stevens, S.S. and Newman, E.B., 1936. The localization of actual sources of sound. Am. J. Psychol. 48:297-306. Strutt, J.W. (Third Baron of Rayleigh), 1907. On our perception of sound direction. Philosoph. Mag. 13:214-232. Thurlow, W.R. and Runge, P.S., 1967. Effect of Induced Head Movements on Localization of Direction of sounds. J. Acoust. Soc. Am. 42:480-488. Toole, F. E., 1994. Loudspeakers and Rooms for Sound Reproduction. J. Audio. Eng. Soc., Vol. 54, 451-476 van de Veer, R.A., 1957. Some experiments on directional hearing. in Blauert, J., Spatial Hearing: the psychophysics of human sound localization. London: MIT Press. Wallace, M.T, et al., 2004. Unifying multisensory signals across time and space. Exp Brain Res. 158(2):252-8 Ward, D.B. and Abhayapala, T.D., 2001. Reproduction of a Plane-Wave Sound Field Using an Array of Loudspeakers. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, vol. 9, no. 6. Wiggins, B., 2004. An investigation into the real-time manipulation and control of three-dimensional sound fields. PhD Dissertation, University of Derby, Derby. Wightman, F.L. and Kistler, D.J., 1992. Sound Localization. in Yost, W.A., Popper, A.N. and Fay, R.R., ed. Human Psychophysics. New York: Springer. Wilkens, H. 1972. Head-related stereophony. in Blauert, J., Spatial Hearing: the psychophysics of human sound localization. London: MIT Press. Williams, E., 1999. Sound Radiation and Nearfield Acoustical Holography. London: Academic Press Woodworth, R.S. 1938. Experimental Psychology. New York: Holt. Yost, W.A., Wightman, F.L. and Green, D.M., 1971. Lateralization of filtered clicks. J. Acoust. Soc. Am. 50:1526-1531. MathWorks, 2010. MATLAB and Simulink for Technical Computing. [Online] Available at: http://www.mathworks.co.uk/ [Accessed 12th December 2010]
99
Appendix A
A.1 Proof of decoding from wave to matrix, Equation (3.31)
Assuming that p = p , ! ! p( x, ! ) = p( x, ! ),
e jK0 ! x = " e jK L ! x WL ,
Expanding both sides using equation (3.29) gives
! ! ! !
(A.1) (G.2) (A.2)
$ 4! j Y
n n, m L
m n
(" K , # K ) * jn (kr)Ynm (" , # ) =

n L m n
$ W $ 4! j Y
L =1 n, m
(" L , # L ) * J n (kr)Y (" , # )

m n
(A.3) (G.3)
Using the spherical harmonic orthogonality defined in equation (3.20), both sides are multiplied giving,
$ 4! j Y
n n,m L L=1
m n
(" K , # K ) * jn (kr) ' Ynm (" , # )Ynm ' (" , # ) * % & = '
&
$ WL $ 4! j nYnm (" L ,# L ) * J n (kr) ' Ynm (" ,# )Ynm' ' (" ,# ) * %&
n,m &
(G.4) (A.4)
All terms go to zero apart from n=n, leaving
Y (! K , " K )* = # Ynm ' (! L , " L ) *WL '

m' n' L =1
(A.5) (G.5)
# setting Ynm ' (! K , " K )* = Bnm this now equals equation (3.31). '
100
A.2 Computing the energy vector taken from (Daniel et. al., 1998)
Decoding the ambisonic field produced by a single source at ! with the correcting gains {gm, m=0, , M} implies gains Gn for a loudspeaker feeding
M , 1) g0 + 2 ( gm cos $ m (!n " # ) &. for n = 0, ... , N " 1 % '+ N* m =1
Gn =
(A.6)
The energy Vector is computed as
$ N !1 2 ' " Pn cos #n ) " P un 1 & n=0 ) E = n = 0!1 = N !1 & N !1 N 2 2 & 2 " Pn " Gn & " Pn sin #n ) ) % n=0 ( n=0 n=0
N !1 2 n
(A.7)
Developing terms
M 1 2 (g0 + 4g0 ! gm cos % m ("n # $ ) ' & ( N2 m =1
G2 = n
+4
m, p =1
!g
g p cos % m ("n # $ ) ' cos % p ("n # $ ) ') & ( & (
M 1 2 = 2 g0 + 4g0 ! gm cos % m ("n # $ ) ' & ( N m =1
(A.8)
+2
m, p =1
!g
N !1 n=0
g p cos %( m + p ) ("n # $ ) ' + cos %( m # p ) ("n # $ ) ' & ( & (
))
2 N " G = g + 2 " gm 2 n 2 0 m =1
(A.9)
101
2 N $ Gn cos !n = 2g0 g1 cos" n=0
N #1
+$
N #1
n = 0 m, p =1
$g
g p cos %( m + p + 1)!n # ( m + p )" ' & ( + cos %( m + p # 1)!n # ( m + p )" ' & ( + cos %( m # p + 1)!n # ( m # p )" ' & ( + cos %( m # p # 1)!n # ( m # p )" ' & (
(A.10)
Since 1 ! m, p ! M , the assertion 1 ! m + p 1 ! 2M + 1 is always verified. Thus, with the condition 2M+1<N and the property (A.6), the sum contribution of the first two cosine terms in (A.10) is null. The contribution of the last two terms is non-zero only for couples (m,p = m+1) and (m,p = m-1) respectively. With such simplifications, (A.10) can be expressed in the concise form
N "1 n=0
$ Gn2 cos !n =
with similar work it is shown that
N "1 n=0
2 M $ gm gm "1 cos# N m =1
(A.11)
$ Gn2 sin !n =
2 M $ gm gm "1 sin# N m =1
(A.12)
From equations (A.9), (A.11) and (A.12), the following result is given
M
E = rE u! where rE =
2 # gm gm "1
m =1
g + 2# g
2 0 m =1
(A.13)
2 m
102
Appendix B
Data Sheets
Loudspeakers KEF HTS3001SE
Model Design Drive Units HTS3001SE Satellite - Two-Way Bass reflex, Centre - Three-Way Closed box Satellite - 115mm (4.5in.) Uni-Q array with 19mm (0.75in.) aluminium HF, Centre - 115mm (4.5in.) Uni-Q array with 19mm (0.75in.) aluminium HF, 2 x 75mm (3in.) LF Satellite - 2.2kHz, Centre - 500Hz, 2.2kHz Satellite - 88dB, Centre - 90dB Satellite - 70Hz - 55kHz, Centre - 65Hz - 55kHz Satellite - 108dB, Centre - 110dB 8 Ohms Yes Satellite - 1.8litres, Centre - 2.4litres 100W Satellite - 2.0kg (4.5lbs), Centre - 2.6kg (5.8lbs) Satellite - 245 x 125 x 150 mm (9.6 x 4.9 x 5.9 in.), Centre - 130 x 300 x 185 mm (5.1 x 11.8 x 7.3 in.) High gloss black, High gloss silver
Crossover Frequencies Sensitivity (2.83V/1m) Frequency Response (+/3dB) Maximum Output Input Impedance Magnetic Shielding Internal Volume Power Handling Weight Dimensions (H x W x D) Finishes
Serial Numbers of Loudspeakers Used: o 3150481 G Port Blocked o 3150516 G Port Blocked o 3150569 G Port Blocked o 3150476 G Port Blocked o 3150460 G Port Blocked o 3150482 G Port Blocked o 3150767 G Port Blocked o 3150448 G Port Blocked 103
Calibration Equipment
Microphone Type: Brel & Kjaer 4189 Serial Number: 2370983 Preamp Type: Brel & Kjaer 2669L Serial Number: 2370041 Conditioning Amplifier Type: Brel & Kjaer 2960 Serial Number: 2165582
G.R.A.S KEMAR Equipment

Microphone Type: G.R.A.S 26AC Serial Number: Left 56807, Right 56808 Coupler Type: G.R.A.S RA0045 Serial Number: Left 58206, Right 58210 Ears Type: Left - G.R.A.S KB0061, Right KB0060 Serial Number: Left 69957, Right 69939 Power Module Type: G.R.A.S 12A Serial Number: 58675
104
Appendix C
MatLab Code
Simulations
clear all close all N=6; %Number Of Speakers ang=40;% Angle of Incident Wavefront f=700; %frequency R=2; % Speaker Radius M=3; % Order c=343; % speed of sound w=2*pi*f; % angular Frequency k=w/c; % Wavenumber gamma=ang*pi/180; dphi=(2*pi)/N; % Delta Phi, Angle Between Speakers phi=[0:dphi:2*pi-dphi]; % Vector of Speaker Positions a=phi-gamma; % Angular Difference between speaker position and wavefront %Velocity Decoding gv=1; %Max rV Gv=zeros(N,1); Gv=gv; for n=1:M Gv=Gv+gv*2*cos(a*n); end Gv=Gv/N; % Decoding Gains % Energy Decoding g=zeros(M,1); g0=sqrt(2); for m=1:M g(m)=cos(m*pi/(2*M+2)); % Max rE end Ge=zeros(N,1); Ge=g0; for n=1:M Ge=Ge+g(n)*2*cos(a*n); end Ge=Ge/N; % Decoding Gains [X,Y]=meshgrid(-R:0.01:R, -R:0.01:R); %Generate points R m x R m, resolution of 1cm same as speaker radius Xl=cos(phi)*R; % Speaker Positions X and Y Yl=sin(phi)*R; Pe=zeros(size(X)); P=zeros(size(X)); for l=1:N Rn=sqrt(((X-Xl(l)).^2)+((Y-Yl(l)).^2)); % Matrix of distance from point to speaker Pe=Pe+exp(-1j*k*Rn)./(4*pi*Rn) *Ge(l); % Energy Decoding Pressure Field P=P+exp(-1j*k*Rn)./(4*pi*Rn)*Gv(l); % Velocity Decoding Pressure Field end figure(1) pcolor(X, Y, real(P)) shading interp caxis([-1 1]*0.1) axis equal
105
hold on plot(Xl, Yl, 'o') hold off title([num2str(ang),' degrees Velocity Decoding ' Speakers ', 'Order ',num2str(M)]) figure(2) pcolor(X, Y, real(Pe)) shading interp caxis([-1 1]*0.1) axis equal hold on plot(Xl, Yl, 'o') hold off title([num2str(ang),' degrees Energy Decoding Speakers ', 'Order ',num2str(M)])
', num2str(f), ' Hz
', num2str(N),
', num2str(f), ' Hz
', num2str(N), '
figure(3) polar([phi 0],[abs(Gv) abs(Gv(1))], 'r') % add element for plotting hold on polar([phi 0],[abs(Ge) abs(Ge(1))], 'b') hold off title([num2str(ang),' degrees ', num2str(f), ' Hz ', num2str(N), ' Speakers 'Order ',num2str(M)])
',
250Hz Filter
% Butterworth Bandpass filter designed using the BUTTER function. % All frequency values are in Hz. Fs = 48000; % Sampling Frequency Fstop1 Fpass1 Fpass2 Fstop2 Astop1 Apass Astop2 = = = = = = = 150; 200; 300; 350; 60; 1; 80; % % % % % % % First Stopband Frequency First Passband Frequency Second Passband Frequency Second Stopband Frequency First Stopband Attenuation (dB) Passband Ripple (dB) Second Stopband Attenuation (dB)
% Calculate the order from the parameters using BUTTORD. [N,Fc] = buttord([Fpass1 Fpass2]/(Fs/2), [Fstop1 Fstop2]/(Fs/2), Apass, ... max(Astop1, Astop2)); % Calculate the zpk values using the BUTTER function. [z,p,k] = butter(N, Fc); % To avoid round-off errors, do not use the transfer function. Instead % get the zpk representation and convert it to second-order sections. [sos_var,g] = zp2sos(z, p, k); Hd = dfilt.df2sos(sos_var, g); fvtool(Hd)
700Hz Filter
106
% Calculate the order from the parameters using BUTTORD. [N,Fc] = buttord([Fpass1 Fpass2]/(Fs/2), [Fstop1 Fstop2]/(Fs/2), Apass, ... max(Astop1, Astop2)); % Calculate the zpk values using the BUTTER function. [z,p,k] = butter(N, Fc); % To avoid round-off errors, do not use the transfer function. Instead % get the zpk representation and convert it to second-order sections. [sos_var,g] = zp2sos(z, p, k); Hd = dfilt.df2sos(sos_var, g); fvtool(Hd)
2500Hz Filter
% Calculate the order from the parameters using BUTTORD. [N,Fc] = buttord([Fpass1 Fpass2]/(Fs/2), [Fstop1 Fstop2]/(Fs/2), Apass, ... max(Astop1, Astop2)); % Calculate the zpk values using the BUTTER function. [z,p,k] = butter(N, Fc); % To avoid round-off errors, do not use the transfer function. Instead % get the zpk representation and convert it to second-order sections. [sos_var,g] = zp2sos(z, p, k); Hd = dfilt.df2sos(sos_var, g);
Crossover Filter
close all clear all O=2; % Order Cut=700; % Cutoff Freq Fs=48000; % Sample Rate % High Pass [z,p,k]=butter(O,(Cut/(Fs/2)),'high'); % high pass [sos,g]=zp2sos(z,p,k); % convert to sos type Hd=dfilt.df2tsos(sos,g); % create dfilt object %h=fvtool(Hd); %set(h,'Analysis','freq') % Low Pass [z1,p1,k1]=butter(O,(Cut/(Fs/2)),'low'); % high pass [sos1,g1]=zp2sos(z1,p1,k1); % convert to sos type Hd1=dfilt.df2tsos(sos1,g1); % create dfilt object %h1=fvtool(Hd1); %set(h1,'Analysis','freq')
107
HD=dfilt.cascade(Hd, Hd); HD1=dfilt.cascade(Hd1, Hd1); h2=fvtool(HD, HD1); set(h2,'Analysis','freq')
108
Appendix D
Consent Forms
109
Instruction Sheet Ambisonics Sound Localization

General Information Firstly, thank you for participating in this study; you are contributing to an important part of the project. The project in which your are helping is a study into sound localization in ambisonic arrays, the main aims of the project are to explore and investigate different decoding techniques in order to better refine an ambisonic system for accurate localization for sounds in the horizontal plane. The test is structured in a way to avoid fatigue with large gaps between samples for adequate time to record your answer on the sheet provided. You will be aloud to familiarize yourself to the sounds used and also the test procedure will be demonstrated. The sound level of the reproduced sound has been calibrated so that it falls below that recommended by UK legislation, as well as being approved by the Safety and Ethic Committee of the University of Southampton. You are free to opt out of the test at any time during or after the test procedure, without reason or prejudice. Your data will not be passed on to third parties and kept securely. The Listening Test The listening test is about perception of the location of a sound source, it is important to note that there are no incorrect answers and it is not you under test, the human perception to sound is what is being tested. You will sit in a chain in the middle of the room surrounded by a circular curtain, upon the curtain is a series of numbers ranging from 1 to 72, a sound will be presented to you and you have to write down the number corresponding to the position around the circle you thought the sound came from on the answer sheet. Thank you for your time,
Thomas Leach (MSc Sound and Vibration Studies)
110
Questionnaire Ambisonics Sound Localization

Experimenter Thomas Leach (MSc Sound and Vibration Studies) Name: . Age: Please tick Yes No
Do you currently have any problems with your hearing? Have you recently suffered any colds or infections? Have you ever had any surgery to your hearing system? Have you been exposed to loud noises or music in the last 48h? Do you suffer from balance problems? Have you been involved in sound localization tests before?
Participants Signature Date Experimenters Signature ... Date
111
Answer Sheet Participants Name ... Test 1 Sample Answer Number 1 2 3 4 5 6 7 8 9 10 Test 2 Sample Answer Number 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Test 5 Sample Answer Number 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Test 3 Sample Answer Number 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Test 4 Sample Answer Number 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
112
Participant Consent Form

Ambisonics Sound Localization MSc Sound and Vibration Studies
Name: ............................................................................... Address: ................................................................. ............................................................................... ............................................................................... ...............................................................................

Please Tick
1. I agree to participate in this research. 2. This agreement is of my own free will. 3. I have had the opportunity to ask any questions about the study 4. I realise that I may withdraw from the study at any time without giving a reason and without any effect. 5. I have been given full information regarding the aims of the research. 6. All personal information provided by myself will remain confidential and no information that identifies me will be made publicly available.
Signed: ................................................. Date: .................................... (by participant) Print name: .................................................................. Signed by researcher Signed: ................................................. Date: ....................................
113

A Study of The Subjective Performance of Ambisonic Decoders - Thomas Leach

Cargado por

Información del documento

Descripción original:

Título original

Derechos de autor

Formatos disponibles

Compartir este documento

Compartir o incrustar documentos

Opciones para compartir

¿Le pareció útil este documento?

¿Este contenido es inapropiado?

Copyright:

Formatos disponibles

A Study of The Subjective Performance of Ambisonic Decoders - Thomas Leach

Cargado por

Copyright:

Formatos disponibles

University of Southampton

Faculty of Engineering and the Environment

Thomas Samuel Leach

A thesis submitted in partial fulfillment for the degree of Master of Science

72 72 79 82 92 94 99 99 100 102 104 108 109 110 111 112

Equation Sectio n (Next) Equati on Chapter (Next) Sectio n 1a

Equation Chap ter (Next ) Section 1C

2.3 Localization & Blur

2.4 Methods of Measurement

2.5 Localization In the Horizontal and Median Planes

2.6 Binaural Cues

2.7 Cones of Confusion

Equation Chap ter (Next ) Section 1C

The spherical coordinate system is related to Cartesian coordinate system through

x = r sin (! ) cos (" ) , y = r sin(! )sin(" ), z = r cos(! ).

3.4 The Spherical Wave Equation

1 " 2 p(x,t) c 2 "t 2

1 ! 2 !p 1 ! !p 1 !2 p 1 !2 p (r )+ 2 (sin " ) + 2 2 $ =0 r 2 !r !r r sin " !" !" r sin " !# 2 c 2 !t 2

p(r,! , " ,t) = R(r)#(! )$(" )T (t)

(3.8) (3.9) (3.10) (3.11)

1 d d" m2 (sin ! ) + [n(n + 1) # 2 ]" = 0 sin ! d! d! sin !

(1) 2 hn (kr) m Pn (cos" )e jm# , n $!, m $! and m < n (2) ! hn (kr)

(1) hn (x) = jn (x) + jnn (x) =

% ! " $ J n + 1 (x) + jN n + 1 (x) ' 2x # 2 & 2

(2) hn (x) = jn (x) ! jnn (x) =

# & % J n + 1 (x) ! jN n + 1 (x) ( $ 2 ' 2

Pnm (x)Pnm (x)dx = '

2 (n + m)! #n 'n 2n + 1 (n ! m)!

(2n + 1) (n % m)! m Pn (cos! )e jm" 4$ (n + m)!

h (kr) + Bmn jn (kr))Ynm (! , " )

Equation (3.18) is termed the ambisonic signals (Daniel et al., 2003).

3.5 Spherical Harmonics and Higher Order Ambisonics

Cnm = $ d!Ynm (" , # )* f (" , # )

& d! " & d# & sin$ d$

3 cos! 4# 3 sin ! 8# 3 sin ! 8#

Y1$1 (! , " ) = e$ j" Y11 (! , " ) = $e j"

! and K ! x = cos " , remembering that = k the wave number. c

The workings of this can be seen in Appendix A.1

% ' ' " ' B=' ' ' ' &

" +1 B00 ( * " +1 B11 * * " #1 B11 * ! * * "$ Bnm * )

% W1 ' ' W2 W=' ! ' ' ! ' WN &

! To ensure B = B , the matrix in equation (3.30) needs to be inverted, therefore the

D = pseudo inv(C) = CT (C ! CT )"1 .

3.7 Gerzons Metatheory

velocity vector (V) to the B format first order wave field by

"W = P! & $ $ $ $ # X = 2P! cos(! ) ' . $ $ $ $ %Y = 2P! sin(! ) (

3.8 Decoding Gains

with the optimizing gains (gm) to be equal to

3.9 Gerzons Vienna Decoder

3.10 Aims of Thesis

Equation Chap ter (Next ) Section 1C

4.2 Ambisonic Decoder Model

whereby defining the positions of the loudspeakers in two vectors as

4.3 Frequency Response Model

4.4 Increasing Ambisonic Order

4.5 Number of Loudspeakers

Equation Chapter (Next) Section 1C

5.2 Introducing the Response

5.3 Design of Listening Test

Figure 5.7: Power Spectral Density of 250Hz Stimuli.

Figure 5.8: Power Spectral Density of 700Hz Stimuli.