Está en la página 1de 22

Effects of Phoneme

Characteristics on TEO FeatureBased Automatic Stress


Detection in Speech
BY
VENKATESH SHARAN D038
RACHIT MANCHALWAR D063
RUSHABH BARBHAYA D069

Introduction

Necessity of speech detection

Applications of speech detection :

Automatic Assessment of Stress Levels of personnel in critical positions :

Pilots

Air Traffic Controllers

National Defense Personnel

The paper follows the processing of the some speech samples taken
from a U.S Army Soldier of the Month

The main focus is on the use of vowels

Continued

Reliable speech processing not possible ,compounded by the various


characteristics of the speech samples used in recognition scheme

Vowels used as token in such recognition systems

LDC(Linguistics Data Consortium) phonemes :

Consists 20 different types of vowels each having different general


characteristics

The Scheme

Since the duration, scope and breadth of detectable types of the


spoken vowels vary.

These variations can degrade stress detection performance

In addition to this phoneme durations can vary depending upon the


type of speech (Isolated word, Spontaneous, Read)

Hence to analyze and address this a test set of stressed and neutral
speech samples are used

The answer to every question s The answer to that question is no

Cont.

Neutral to Stress conditions :

The mean heart rate increased by 34.2%

Systolic blood pressure increased by 33.0%,

Diastolic blood pressure increased by 22.5%

We decide the condition (i.e. stressed or not stressed), of the speaker


by employing a weighted majority decision rule

This rule is based upon the decisions of each vowel within the sentence.
The decisions are weighted by the absolute difference between two
Hidden Markov Model (HMM) scores.

Teager Energy Operator

Most approaches to speech modeling have taken a linear plane wave


point-of-view.

Teager did extensive research on the non-linear speech modelling

This was done by devising a simple non-linear energy tracking operator

The operator models the airflow through the vocal tract, shown for
discrete-time signals as follows:

Where

is the Teager Energy Operator (TEO)

The Teager Energy Operator, Critical Band, Autocorrelation Envelope


(TEO-CB-Auto-Env) parameterization feature has been shown to reflect
variations in excitation \under stressful conditions

A speech signals fundamental frequency will change and hence the


distribution pattern of pitch harmonics across critical bands will be
different than for speech under non-stressful conditions.

This finer frequency resolution comes from the partitioning of the


entire audible frequency range into many critical bands

The TEO-CB-Auto-Env feature used in this study is extracted through a


process shown in the flow diagram and illustrated mathematically using
critical bandpass filters (BPF) as

Study addresses two issues

Effective stress classification is possible using voiced speech data from


vowels with the TEO-CB-AutoEnv where critical bands are weighted
based on neutral/stress detection performance from a development test
set.

Since the TEO-CB-AutoEnv employs an autocorrelation envelope


analysis on the input speech, it should be less sensitive to phoneme
spectral differences in stress detection than spectral features such as
MFCCs

Further analysis done only on one phoneme /OW/

Extraction and Processing

8kHz Digital data from SOM

252 samples

Hidden Markov Model (HMM) used for extraction

Manual processing front and back end

Cropping and removing /N/ from no

Manually checking for /OW/

PDF of /OW/ token

Truncation

The samples of 100% are truncated to 80%, 60%, 40% and 20%

TEO is extracted frame by frame.

Output of TEO is equally truncated.

Band weighing scheme is not used in this test.

Testing

100% vowels which were used to train in HMM are used in round robin
testing.

5 of 6 vowels token from all users are taken and used to train HMM
models

Neutral token in Neutral and Stress trained HMMs

Error Reduction

100 sample frame

40% duration 32/216 didn't have enough information.

20% duration 105/216

25 sample frame

40% duration All tokens made through

20% duration 16/216 didn`t have enough information

Only 7.4% samples weren't used

Effect on different vowels

Isolate the vowel

50ms duration

Novel scheme for improving stress


detection across vowel types with
decreased durations

Exploitation of multiple vowel types per speaker utterance is seen and


a classification is made upon a majority rule over individual decisions
on each vowel.

The block diagram for the above procedure is shown in the next slide.

The test results using the HMM training scheme employed in the
previous section, with an overall decision based on the composite
weighted majority decision rule scheme results in significant
performance gains.

The table for the average error rate is shown ahead.

Conclusion

In this paper, the effects of phoneme duration and type on automatic Teager Energy
Operator feature-based stress detection performance is seen.

It was shown that both phoneme duration and type mismatch affect the stress detection
performance.

In the case of vowel duration, shortening the vowel duration was shown to adversely
affect stress detection performance if the vowel duration is less than a threshold of about
50% of the original duration in the case of the vowel /OW/, or at about 85-128 ms.

When a speakers utterance contains multiple vowel types, using a composite decision
process by applying a weighted majority decision rule to the individual decisions with
equal (i.e. baseline) band-weighting provides relative error rate reductions of 24% and
39% in the neutral and stress conditions, respectively.

The final composite stress detection scheme achieves overall average error rates of
approximately 27% for neutral/stress detection, where the vowel durations are 50 ms.

The error rates presented may be further improved by


employing a soft decision rule algorithm, based upon a criteria
associated with the characteristics of specific vowels, such as
weighting decisions made on longer duration vowels more
heavily than those from shorter duration vowels

También podría gustarte