
ON COMPARING SPEECH QUALITY OF VARIOUS NARROW- AND WIDEBAND SPEECH CODECS

Anssi Rämö and Henri Toukomaa
Multimedia Technologies Laboratory, Nokia Research Center, Tampere, Finland
{anssi.ramo, henri.toukomaa}@nokia.com

ABSTRACT

In recent years wideband speech coding has attracted a lot of research. However, only a few papers have described the quality difference between narrowband and wideband coded speech. In this paper we describe the testing methodology and the sample processing used to evaluate these two conditions in a single test. Results from six listening tests are shown and explained in detail. In addition to the standard AMR wideband and narrowband codecs, we tested two so-called open source codecs, namely iLBC and Speex. To our knowledge, these codecs have not previously been evaluated in a formal listening test.

1 INTRODUCTION

With the introduction of wideband speech coding, the traditional target for good communication-quality speech is changing. It can no longer be said that AMR [1] 12.2k is MOS 4.3 and AMR-WB [2] 14.65k is MOS 4.5. The codecs have obtained such results in different tests, but each test has to be considered individually. The term NB in the AMR-NB codec refers to narrowband, which is generally understood as a communication link's net speech bandwidth of 300-3400 Hz. Likewise, the term WB refers to wideband speech, which represents a bandwidth of 50-7000 Hz, thus widening the bandwidth both upwards and downwards. It should be noted that the name of the Adaptive Multi-Rate standard does not contain NB. In this paper AMR-NB is used in order to make the tables easier to read.

An interesting recent development on the Internet is the open source movement, which is also making its first appearance in the area of speech coding. Two new speech codecs have become freely available on the net, namely Speex [3] and iLBC [4][5]. For example, the performance of Speex has not previously been evaluated with subjective listening tests because of the prohibitive expense. We have included these codecs in some of our tests.

The organization of this paper is as follows. Section 2 gives some information about the new open source codecs, which are meant mainly for VoIP (voice over Internet protocol). Section 3 gives a brief overview of the testing methodology used in our listening facilities. In Section 4 the listening test results are shown and some notes are given to explain them. Finally, conclusions are drawn in Section 5.

2 OPEN SOURCE CODECS

In the last couple of years VoIP has become increasingly important in IETF standardization. Music and video files are also of growing interest among Internet users. Various more or less open source codecs have already been available for audio coding (Ogg Vorbis) [6] and video coding (Theora) [7]. Thus it is no wonder that free speech codecs have also emerged. Two such codecs are Speex [3] and iLBC [5]. Nobody actually knows how free they are in terms of intellectual property rights until somebody makes a legal case out of their usage. These open source codecs have not taken part in any official standardization effort, and no rigorous listening testing has been done as in the AMR and AMR-WB standardization [8][9].

The Speex codec, developed by Jean-Marc Valin, is a highly flexible narrowband and wideband codec with many different bit rates. Currently there are efforts to port the codec to the Symbian platform, and a fixed-point solution is also available. iLBC, on the other hand, is a narrowband-only codec with two bit rates: 13.33k for 30 ms frames and 15.2k for 20 ms frames. Since the 15.2k mode is quite new, only the 13.33k mode was tested. No free version of the fixed-point implementation is available.

One specific problem was noticed concerning the open source codecs. Speex in particular seems to be under constant development, so the quality of the codec is likely to vary over time. The quality evaluations presented in this paper were done between autumn 2002 and summer 2004, so the Speex codec might have changed during that time. More generally this can be understood as an Internet phenomenon: nothing is guaranteed between versions.

3 TESTING METHODOLOGY

All test results presented in this paper were obtained in the Nokia Research Center's Listening Test Laboratory in Tampere, Finland. The laboratory has six identical, high-quality listening booths for the critical subjective evaluation of audio in a well-controlled acoustic environment. Each booth simulates the ITU-R BS.1116-1 listening room conditions as closely as possible while employing a smaller-footprint solution. More detailed information can be found in [10]. Every booth functions as an independent test station in terms of audio distribution and response collection. The booths fulfill ITU-T Recommendation P.800 for subjective listening tests [11]. Each listening booth accommodates one subject at a time with stable and comfortable air conditioning and lighting. Hoth noise at 30 dBA is played back in the booths to provide the noise floor described in the recommendation; otherwise the booths would be too silent. A control room houses all instruments for control and administration of the test and audio distribution.

0-7803-9243-4/05/$20.00 ©2005 IEEE

In all of the tests the listeners listened to four samples for each condition, but not all listeners listened to the same samples. There were four different sets of samples in every test except the open source NB test (Table 5), which had eight sets. The more different sets there are, the less the individual speech samples affect the results. This arrangement also reduces listener fatigue, because the listeners do not have to listen to the same samples all the time. The listeners heard the samples through monaural high-quality Sennheiser HD25 headsets and were asked to use their preferred ear.

Naïve listeners for these tests were recruited from various sports and hobby clubs around the city. A naïve listener is defined as a representative of the general telephone-using population without prior knowledge of the coding technology used. Listeners may not have taken part in any subjective listening test during the previous year. Instructions given before testing were minimal. Only the quality scale of Bad, Poor, Fair, Good, and Excellent was given, with the levels valued 1 to 5 respectively. In all of the tests the gender of the listeners was nominally balanced. The hearing of the listeners was tested with an audiometer to verify that all listeners had normal hearing. MNRU conditions were used in every test, and the resulting values were found to be normal.

3.1 Processing of samples

The comparison of narrowband and wideband speech signals in a single test is not a trivial task. It can be debated whether testing narrowband and wideband signals in the same test is relevant, but the processing chain used is very close to the actual implementation in mobile terminals and networks. In practice a consumer using his or her mobile phone would hear the same difference. The processing chains for wideband and narrowband coded speech signals are shown in Figures 1 and 2, respectively. All results in this paper were obtained using a nominal speech level of -26 dBov. The processing chains are the same as those used in the AMR-WB characterization tests [12]. The programs used for the processing can be obtained from [13].
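As a rough illustration of the level adjustment step, the sketch below scales 16-bit samples so that their level is -26 dBov. It is a simplification under stated assumptions: it uses the plain RMS over the whole signal, whereas P.56 measures the active speech level, and the names `level_dbov` and `scale_to_level` are hypothetical and not part of the software in [13].

```python
import math

FULL_SCALE = 32768.0  # overload point assumed for 16-bit signed samples


def level_dbov(samples):
    """Return the RMS level of the samples in dB relative to overload."""
    if not samples:
        raise ValueError("no samples")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        raise ValueError("signal is silent, level is undefined")
    return 20.0 * math.log10(rms / FULL_SCALE)


def scale_to_level(samples, target_dbov=-26.0):
    """Scale the samples so that their RMS level equals target_dbov."""
    gain = 10.0 ** ((target_dbov - level_dbov(samples)) / 20.0)
    scaled = []
    for s in samples:
        v = int(round(s * gain))
        scaled.append(max(-32768, min(32767, v)))  # clip to the 16-bit range
    return scaled


# Example: a 440 Hz test tone sampled at 16 kHz, one second long.
tone = [int(round(10000 * math.sin(2 * math.pi * 440 * n / 16000)))
        for n in range(16000)]
adjusted = scale_to_level(tone)
print(round(level_dbov(adjusted), 1))
```

A real P.56 measurement would ignore pauses in the speech, so for natural speech the gain found this way would generally differ from the one applied in the actual test processing.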

[Figure 1 block diagram. Blocks, in processing order: 16 kHz speech source file; P.341 filtering; P.56 level adjustment to -26 dBov; 16 -> 14 bit conversion; wideband encoder; wideband decoder; 16 -> 14 bit conversion; 16 kHz wideband coded file.]

Figure 1. Processing chain for testing wideband speech.
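The 16 -> 14 bit conversion in Figure 1 reduces the sample resolution before the wideband encoder. A minimal sketch follows, assuming the convention of keeping the samples left-justified in a 16-bit word and zeroing the unused least significant bits. The paper does not spell out the exact method, and `truncate_bits` is a hypothetical helper name.

```python
def truncate_bits(samples, keep_bits, word_bits=16):
    """Zero the (word_bits - keep_bits) least significant bits of each sample."""
    if not 0 < keep_bits <= word_bits:
        raise ValueError("keep_bits must be between 1 and word_bits")
    mask = ~((1 << (word_bits - keep_bits)) - 1)
    # Python integers behave like two's complement under &, so negative
    # samples are handled without any special casing.
    return [s & mask for s in samples]


samples = [7, -1, 1000, 32767, -32768]
print(truncate_bits(samples, 14))  # [4, -4, 1000, 32764, -32768]
print(truncate_bits(samples, 13))  # [0, -8, 1000, 32760, -32768]
```

With 13 kept bits the same helper gives the resolution used in the narrowband chain.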


[Figure 2 block diagram. Blocks, in processing order: 16 kHz speech source file; GSM-send characteristics; P.56 level adjustment to -26 dBov; 2:1 high quality down sampling; 16 -> 13 bit conversion; narrowband encoder; narrowband decoder; 16 -> 13 bit conversion; 2:1 high quality up sampling; 16 kHz narrowband coded file.]

Figure 2. Processing chain for testing narrowband speech.

4 LISTENING TEST RESULTS

The results from six listening tests are presented in Tables 1 to 6. The MNRU values are omitted in order to simplify the tables. The number after each codec name gives the bit rate in kbps; e.g., AMR-NB 7.95k means that the results were obtained by running the AMR codec in its 7.95 kbps mode. The next two columns contain the lower and upper 95% confidence limits for each codec's MOS value. The last column shows the average MOS value.

4.1 Narrowband versus wideband performance

Table 1. Comparison of AMR and AMR-WB performance with 32 naïve listeners.

Codec             95% lower limit   95% upper limit   Average MOS value
Direct NB         3.48              3.70              3.59
Direct WB         4.24              4.42              4.33
AMR-NB 4.75k      2.49              2.70              2.59
AMR-NB 5.90k      2.79              3.01              2.90
AMR-NB 6.70k      2.79              3.01              2.90
AMR-NB 7.40k      2.88              3.12              3.00
AMR-NB 7.95k      2.99              3.21              3.10
AMR-NB 10.2k      2.97              3.19              3.08
AMR-NB 12.2k      3.08              3.31              3.20
AMR-WB 6.60k      2.96              3.17              3.07
AMR-WB 8.85k      3.23              3.45              3.34
AMR-WB 12.65k     3.68              3.88              3.78
AMR-WB 14.25k     3.84              4.03              3.93
AMR-WB 15.85k     3.84              4.04              3.94
AMR-WB 18.25k     3.96              4.16              4.06

From Table 1 it can be seen that wideband coded speech is preferred over narrowband speech by a wide margin. All wideband modes except the lowest bit rate of 6.6k scored higher than even the best narrowband mode (AMR-NB 12.2k). To be precise, AMR 12.2k is statistically as good as AMR-WB 6.60k or 8.85k.

Table 2. Comparison of AMR and AMR-WB performance with 24 naïve listeners.

Codec             95% lower limit   95% upper limit   Average MOS value
Direct NB         3.14              3.43              3.28
Direct WB         4.32              4.55              4.44
AMR-NB 5.90k      2.05              2.34              2.19
AMR-NB 7.40k      2.41              2.72              2.57
AMR-NB 12.2k      2.78              3.07              2.92
AMR-WB 6.60k      2.25              2.56              2.40
AMR-WB 8.85k      3.30              3.60              3.45
AMR-WB 12.65k     3.78              4.04              3.91
AMR-WB 14.25k     3.86              4.14              4.00
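The confidence limits reported in the tables can be read with a short calculation. The sketch below is an illustration rather than the authors' analysis procedure: it computes a mean opinion score with a normal-approximation 95% interval (mean plus or minus 1.96 standard errors, an assumption since the paper does not state its method) and uses interval overlap as an informal indication that two conditions are statistically indistinguishable. The votes in the example are made up; the limits are taken from Table 1.

```python
import math
import statistics


def mos_with_ci(votes, z=1.96):
    """Return (lower, upper, mean) for a list of 1-5 opinion scores."""
    mean = statistics.mean(votes)
    half_width = z * statistics.stdev(votes) / math.sqrt(len(votes))
    return mean - half_width, mean + half_width, mean


def intervals_overlap(a, b):
    """True if the (lower, upper) intervals a and b share any value."""
    return max(a[0], b[0]) <= min(a[1], b[1])


# Hypothetical votes for one condition, only to show the calculation.
lower, upper, mos = mos_with_ci([3, 4, 3, 2, 4, 3, 3, 4])
print(f"{lower:.2f} {upper:.2f} {mos:.2f}")

# 95% limits from Table 1.
amr_nb_12_2 = (3.08, 3.31)
amr_wb_6_60 = (2.96, 3.17)
amr_wb_8_85 = (3.23, 3.45)
amr_wb_12_65 = (3.68, 3.88)

print(intervals_overlap(amr_nb_12_2, amr_wb_6_60))   # True
print(intervals_overlap(amr_nb_12_2, amr_wb_8_85))   # True
print(intervals_overlap(amr_nb_12_2, amr_wb_12_65))  # False
```

The first two results are consistent with the remark that AMR 12.2k is statistically as good as AMR-WB 6.60k or 8.85k, while AMR-WB 12.65k is clearly separated from it.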

Table 2 presents a slightly smaller set of conditions with the same codecs. Although the test was done in the same environment with naïve listeners, the absolute MOS values are quite different. Overall, the MOS values for the AMR-NB codec in Table 2 are lower than in Table 1. Without a wideband reference it would be strange to obtain MOS 2.92 for AMR-NB 12.2k, but with higher-quality references this is quite natural. However, the same conclusions can be drawn as in the previous test. Only the lowest AMR-WB mode was worse than the highest AMR-NB mode. Without question, wideband is strongly preferred over narrowband in both tests.

4.2 Open source codec narrowband performance

Table 3. Narrowband test with 28 naïve listeners.

Codec             95% lower limit   95% upper limit   Average MOS value
Direct            3.86              4.13              3.99
AMR-NB 4.75k      3.29              3.54              3.42
AMR-NB 5.90k      3.43              3.67              3.55
AMR-NB 7.95k      3.56              3.80              3.68
AMR-NB 12.2k      3.87              4.13              4.00
GSM FR 13k        2.91              3.16              3.04
GSM HR 5.6k       2.62              2.86              2.74
LPC-10e 2.4k      1.20              1.37              1.29
DOD MELP 2.4k     2.28              2.53              2.40
G.723.1 5.3k      3.32              3.58              3.45
G.723.1 6.3k      3.35              3.58              3.46
G.729 8.0k        3.49              3.73              3.61
G.726 ADPCM 32k   3.27              3.53              3.40
G.726 ADPCM 40k   3.45              3.72              3.58
iLBC 13.33k       3.69              3.94              3.82
Speex 6.0k        2.67              2.90              2.79
Speex 8.0k        3.04              3.27              3.15
Speex 11.0k       3.39              3.65              3.52
Speex 15.0k       3.55              3.80              3.68

From Table 3 it can be seen that the different modes of the AMR codec can compete in speech quality with Speex running at a much higher bit rate. E.g., AMR 7.95k can compete against Speex 15k, and AMR 5.9k can compete against Speex 11.0k although the bit rate of the latter is almost twice as high. The performance of iLBC 13.33k is statistically the same as that of AMR 7.95k or 12.2k. The enormous improvement of AMR-NB over the older GSM codecs (GSM full rate and GSM half rate) is also visible. As a lower reference, a couple of older codecs (LPC-10e 2.4k and MELP 2.4k) were used. Their poor values (MOS 1.29 and 2.40) indicate that they really should not be used for telecommunications.

Table 4. Narrowband test with 24 naïve listeners.

Codec             95% lower limit   95% upper limit   Average MOS value
Direct            3.89              4.21              4.05
G.726 ADPCM 32k   3.19              3.52              3.35
G.729 8k          3.55              3.93              3.74
G.729d 6.4k       3.29              3.67              3.48
GSM FR 13k        2.84              3.16              3.00
AMR-NB 6.70k      3.48              3.82              3.65
AMR-NB 10.2k      3.55              3.88              3.72
Speex 5.95k       2.43              2.76              2.59
Speex 8.35k       3.18              3.53              3.35
Speex 11.0k       3.53              3.84              3.69
Speex 15.0k       3.69              4.04              3.86

From Table 4 it can be seen that the different modes of the AMR codec can again compete in speech quality with Speex modes running at bit rates more than 50% higher, as in the case of AMR 6.7k versus Speex 11.0k.

Table 5. Narrowband tandem test with 32 naïve listeners.

Codec                  95% lower limit   95% upper limit   Average MOS value
Direct                 4.24              4.45              4.34
AMR-NB 5.15k           3.49              3.76              3.63
AMR-NB 5.15k Tandem    2.67              2.97              2.82
AMR-NB 7.95k           3.95              4.17              4.06
AMR-NB 7.95k Tandem    3.55              3.81              3.68
AMR-NB 10.2k           3.97              4.22              4.09
AMR-NB 10.2k Tandem    3.48              3.75              3.62
AMR-NB 12.2k           4.11              4.34              4.23
AMR-NB 12.2k Tandem    3.84              4.11              3.98
Speex 5.95k            2.74              3.01              2.88
Speex 5.95k Tandem     2.06              2.32              2.19
Speex 8.35k            3.23              3.49              3.36
Speex 8.35k Tandem     2.64              2.92              2.78
Speex 11.0k            3.65              3.91              3.78
Speex 11.0k Tandem     3.34              3.63              3.48
Speex 15.0k            4.13              4.37              4.25
Speex 15.0k Tandem     3.74              3.96              3.85
Speex 18.2k            4.26              4.46              4.36
Speex 18.2k Tandem     3.98              4.22              4.10
iLBC 13.33k            3.92              4.16              4.04
iLBC 13.33k Tandem     3.55              3.81              3.68
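One way to read Table 5 is to look at how much each codec loses when it is run in tandem. The sketch below subtracts the tandem MOS from the single-coding MOS for every codec, using the average MOS values of Table 5. The dictionary layout and the name `tandem_drop` are illustrative choices, not something defined in the paper.

```python
# Average MOS from Table 5: (single coding, tandem coding).
TABLE5_MOS = {
    "AMR-NB 5.15k": (3.63, 2.82),
    "AMR-NB 7.95k": (4.06, 3.68),
    "AMR-NB 10.2k": (4.09, 3.62),
    "AMR-NB 12.2k": (4.23, 3.98),
    "Speex 5.95k": (2.88, 2.19),
    "Speex 8.35k": (3.36, 2.78),
    "Speex 11.0k": (3.78, 3.48),
    "Speex 15.0k": (4.25, 3.85),
    "Speex 18.2k": (4.36, 4.10),
    "iLBC 13.33k": (4.04, 3.68),
}


def tandem_drop(table):
    """Return {codec: MOS lost in tandem}, rounded to two decimals."""
    return {codec: round(single - tandem, 2)
            for codec, (single, tandem) in table.items()}


drops = tandem_drop(TABLE5_MOS)
# List the codecs from the largest to the smallest tandem loss.
for codec, drop in sorted(drops.items(), key=lambda item: item[1], reverse=True):
    print(f"{codec:<14} {drop:.2f}")
```

With these values the largest drop is 0.81 MOS for AMR-NB 5.15k and the smallest is 0.25 MOS for AMR-NB 12.2k.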

From Table 5 it can be seen that only Speex 15k and 18.2k can compete with AMR 12.2k. The Speex modes below 15k are worse than AMR 7.95k. The performance of iLBC 13.33k is equal to that of AMR 10.2k. It can be noticed that the MOS values in this test are higher than in the other narrowband tests (Tables 3 and 4). This is because tandem cases were also included in the comparison. Naturally, tandem coding can only be worse than single coding, and the many lower-grade conditions tend to raise the scores of the relatively better ones. The tandem results also show the importance of TFO (tandem free operation) in practical networks. Especially at lower bit rates, tandem coding clearly degrades codec performance.

4.3 Open source wideband performance

Unlike narrowband speech coding, wideband speech coding has not yet entered large consumer networks. Although several high-quality wideband codecs are available, network operators have not considered the speech quality improvement commercially beneficial. VoIP networks might actually be the first larger systems to introduce wideband speech quality to a bigger audience. Unfortunately the only open source wideband codec, Speex, is currently quite immature, as can be seen from the results in Table 6.

Table 6. Wideband test with 12 naïve listeners.

Codec             95% lower limit   95% upper limit   Average MOS value
Direct            4.26              4.70              4.48
G.722 64k         3.30              3.78              3.54
G.722 56k         3.32              3.80              3.56
G.722 48k         3.06              3.61              3.33
AMR-WB 6.60k      2.26              2.74              2.50
AMR-WB 8.85k      3.21              3.75              3.48
AMR-WB 12.65k     3.58              4.00              3.79
AMR-WB 14.25k     3.69              4.23              3.96
Speex 9.80k       1.78              2.18              1.98
Speex 12.8k       2.34              2.74              2.54
Speex 16.8k       2.80              3.28              3.04
Speex 20.6k       3.32              3.80              3.56
Speex 23.8k       3.40              3.94              3.67

From Table 6 it can be seen that wideband Speex cannot compete with AMR-WB at similar bit rates. iLBC does not support wideband speech.

5 CONCLUSIONS

As can be seen, wideband speech coding gives much better quality than narrowband speech coding. The average improvement in the various tests was nearly one MOS point with clean speech. Surprisingly, wideband was preferred even at quite low bit rates (e.g. AMR-WB 8.85k vs. AMR-NB 10.2k). The old GSM standard codecs (half and full rate) also show their age in comparative listening tests. Even if networks stay with narrowband speech services for some time, updating to the latest codec improves speech quality by almost one MOS point. The new open source codecs do not perform as well as the standard AMR or ITU-T codecs at similar bit rates. In terms of speech quality iLBC is quite close to AMR-NB 7.95k/10.2k or G.729 8.0k. However, its bit rate is 13.33k or 15.2k, which is significantly higher. The Speex codec is even worse. For narrowband it needs over 11.0k to reach AMR-NB 5.9k quality, and 15 kbps is required for transparent narrowband quality. For reasonable wideband quality, Speex again requires over 20k, which is much more than the 12.65k at which AMR-WB achieves similar quality. Thus it can be said that these codecs still need a lot of maturing. The tedious testing and competition during the standardization of the standardized codecs has definitely given them a quality edge over the open source codecs.

6 REFERENCES

[1] K. Järvinen, "Standardisation of the Adaptive Multi-Rate Codec", Proc. of X European Signal Processing Conference (EUSIPCO), Tampere, Finland, Sep. 2000.
[2] B. Bessette, R. Salami, R. Lefebvre, M. Jelinek, J. Rotola-Pukkila, J. Vainio, H. Mikkola, K. Järvinen, "The adaptive multirate wideband speech codec (AMR-WB)", IEEE Trans. on Speech and Audio Processing, Volume 10, Issue 8, pp. 620-636, Nov. 2002.
[3] http://www.speex.org/
[4] S.V. Andersen, W.B. Kleijn, R. Hagen, J. Linden, M.N. Murthi, and J. Skoglund, "iLBC - a linear predictive coder with robustness to packet losses", Proc. of the IEEE Speech Coding Workshop, pp. 23-25, Oct. 2002.
[5] http://www.ilbcfreeware.org/
[6] http://www.vorbis.com/
[7] http://www.theora.org/
[8] 3GPP TR 26.975, "Performance Characterization of the AMR Speech Codec", Version 4.0.0, Mar. 2001.
[9] 3GPP TR 26.976, "Performance characterization of the Adaptive Multi-Rate Wideband (AMR-WB) speech codec".
[10] Kylliäinen, H. Helimäki, N. Zacharov and J. Cozens, "Compact high performance listening spaces", Proceedings of Euronoise, Italy, 2003.
[11] ITU-T Recommendation P.800, "Methods for subjective determination of transmission quality".
[12] "Draft AMR-WB Characterisation processing plan (WB-7c)", Version 1.0, 3GPP TSG-SA Codec Working Group TSG-S4#18, Erlangen, Germany, Sep. 2001.
[13] ITU-T Software Tool Library 2000, STL-2000 Release 3 version. http://www.itu.int/TIES/
