Effects of Phoneme Characteristics On TEO Feature-Based Automatic Stress Detection in Speech

BY
VENKATESH SHARAN D038
RACHIT MANCHALWAR D063
RUSHABH BARBHAYA D069
Introduction
Necessity of stress detection
Applications of stress detection:
Automatic Assessment of Stress Levels of personnel in critical positions :
Pilots
Air Traffic Controllers
National Defense Personnel
The paper follows the processing of speech samples taken
from U.S. Army Soldier of the Month (SOM) recordings
The main focus is on the use of vowels
Continued
Reliable speech processing is not always possible; the problem is compounded by the varying characteristics of the speech samples used in the recognition scheme
Vowels are used as tokens in such recognition systems
LDC (Linguistic Data Consortium) phonemes:
Consist of 20 different vowel types, each with different general characteristics
The Scheme
The duration, scope and breadth of detectable types of spoken vowels vary.
These variations can degrade stress detection performance.
In addition, phoneme durations can vary depending on the type of speech (isolated word, spontaneous, read).
Hence, to analyze and address this, a test set of stressed and neutral speech samples is used.
The answer to every question is "no".
Cont.
Neutral to stress conditions:
The mean heart rate increased by 34.2%
Systolic blood pressure increased by 33.0%
Diastolic blood pressure increased by 22.5%
We decide the condition of the speaker (i.e. stressed or not stressed) by employing a weighted majority decision rule.
This rule is based upon the decisions of each vowel within the sentence.
The decisions are weighted by the absolute difference between two Hidden Markov Model (HMM) scores.
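The weighted majority decision rule above can be sketched as follows. This is a minimal illustration, assuming each vowel token has already been scored by a neutral-trained and a stress-trained HMM. The function name, the use of log-likelihoods as scores, and the tie-breaking choice are assumptions made for the example, not details taken from the paper.

```python
def weighted_majority_decision(vowel_scores):
    """Classify an utterance from per-vowel HMM score pairs.

    vowel_scores: list of (neutral_score, stress_score) tuples, one per
    vowel token (assumed here to be log-likelihoods from the two HMMs).
    """
    neutral_weight = 0.0
    stress_weight = 0.0
    for neutral_score, stress_score in vowel_scores:
        # The vote of each vowel is weighted by the absolute score gap.
        weight = abs(neutral_score - stress_score)
        # The vowel votes for whichever model scored it higher.
        if stress_score > neutral_score:
            stress_weight += weight
        else:
            neutral_weight += weight
    # Ties fall to "neutral" here; that choice is an assumption.
    return "stress" if stress_weight > neutral_weight else "neutral"


# Two of three vowels lean neutral, but the third has a much larger gap.
example_scores = [(-10.0, -11.0), (-10.0, -11.0), (-20.0, -10.0)]
decision = weighted_majority_decision(example_scores)
print(decision)
```

In this example an unweighted majority would pick "neutral", while the weighted rule follows the single vowel whose two HMM scores disagree most strongly.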
Teager Energy Operator
Most approaches to speech modeling have taken a linear plane wave point of view.
Teager did extensive research on non-linear speech modelling.
This was done by devising a simple non-linear energy tracking operator.
The operator models the airflow through the vocal tract, shown for discrete-time signals as follows:
\[ \Psi[x(n)] = x^2(n) - x(n+1)\,x(n-1) \]
where \Psi[\cdot] is the Teager Energy Operator (TEO) and x(n) is the sampled speech signal.
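As a quick illustration, the standard discrete-time TEO, x(n)^2 - x(n+1) x(n-1), can be computed in a few lines. This is a sketch using NumPy; the function name and the test tone are assumptions for the example. For a pure tone A cos(Ωn), the operator output is the constant A^2 sin^2(Ω), which makes a convenient sanity check.

```python
import numpy as np


def teager_energy(x):
    """Discrete-time Teager energy: x(n)^2 - x(n+1) * x(n-1).

    The first and last samples have no two-sided neighbours, so the
    output is two samples shorter than the input.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


fs = 8000                          # sampling rate in Hz (8 kHz data)
n = np.arange(200)
amplitude, freq_hz = 0.5, 400.0    # illustrative tone parameters
omega = 2 * np.pi * freq_hz / fs   # digital frequency in rad/sample
psi = teager_energy(amplitude * np.cos(omega * n))

# For a pure tone the TEO output is constant: A^2 * sin^2(omega).
print(np.allclose(psi, (amplitude * np.sin(omega)) ** 2))
```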
The Teager Energy Operator, Critical Band, Autocorrelation Envelope (TEO-CB-Auto-Env) parameterization feature has been shown to reflect variations in excitation under stressful conditions.
Under stress, a speech signal's fundamental frequency will change, and hence the distribution pattern of pitch harmonics across critical bands will be different than for speech under non-stressful conditions.
The finer frequency resolution comes from partitioning the entire audible frequency range into many critical bands.
The TEO-CB-Auto-Env feature used in this study is extracted through the process shown in the flow diagram, expressed mathematically in terms of critical-band bandpass filters (BPF).
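The extraction process described above can be sketched roughly as: bandpass filter per critical band, TEO, framing, then an autocorrelation-based value per frame. The code below is only an illustration under several assumptions that do not come from the slides: an FFT brick-wall filter stands in for the critical-band BPFs, frames do not overlap, and the per-frame feature is the sum of the normalized autocorrelation (used here as a stand-in for the area under the autocorrelation envelope). All function names and the band edges are hypothetical.

```python
import numpy as np


def teager(x):
    # Discrete-time Teager energy: x(n)^2 - x(n+1) * x(n-1).
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def fft_bandpass(x, fs, lo, hi):
    # Assumed stand-in for a critical-band BPF: zero all FFT bins
    # outside [lo, hi) and transform back.
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def autocorr_area(frame):
    # Autocorrelation at lags 0..N-1, normalized by the lag-0 value,
    # then summed (assumed proxy for the area under the envelope).
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if r[0] <= 0.0:
        return 0.0
    return float(np.sum(r / r[0]))


def teo_cb_auto_env(x, fs, band_edges, frame_len):
    # Returns an array of shape (n_frames, n_bands).
    x = np.asarray(x, dtype=float)
    per_band = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        psi = teager(fft_bandpass(x, fs, lo, hi))
        n_frames = len(psi) // frame_len
        per_band.append([
            autocorr_area(psi[i * frame_len:(i + 1) * frame_len])
            for i in range(n_frames)
        ])
    return np.array(per_band).T


rng = np.random.default_rng(0)
signal = rng.standard_normal(802)        # synthetic stand-in for speech
edges_hz = [100, 200, 300, 400, 510]     # illustrative band edges in Hz
features = teo_cb_auto_env(signal, fs=8000, band_edges=edges_hz, frame_len=100)
print(features.shape)
```

Each row of `features` holds one value per band for one frame, which is the kind of per-frame vector an HMM could be trained on.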
Study addresses two issues
Effective stress classification is possible using voiced speech data from vowels with the TEO-CB-Auto-Env feature, where critical bands are weighted based on neutral/stress detection performance on a development test set.
Since the TEO-CB-Auto-Env feature employs an autocorrelation envelope analysis on the input speech, it should be less sensitive to phoneme spectral differences in stress detection than spectral features such as MFCCs.
Further analysis is done on only one phoneme, /OW/.
Extraction and Processing
8 kHz digital data from SOM
252 samples
Hidden Markov Model (HMM) used for extraction
Manual processing at the front and back end
Cropping and removing /N/ from "no"
Manually checking for /OW/
PDF of /OW/ token
Truncation
The full-duration (100%) samples are truncated to 80%, 60%, 40% and 20% of their duration.
TEO is extracted frame by frame.
The TEO output is truncated in the same proportions.
The band weighting scheme is not used in this test.
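The truncation step can be illustrated with a short sketch. The slides do not say which part of the token is kept, so keeping the leading portion is an assumption here, as are the function name and the 400-sample example token (50 ms at 8 kHz).

```python
def truncate_token(samples, percent):
    """Keep the leading `percent` of a token (which end is kept is assumed)."""
    keep = len(samples) * percent // 100
    return samples[:keep]


token = list(range(400))   # hypothetical 50 ms vowel token at 8 kHz
lengths = {p: len(truncate_token(token, p)) for p in (100, 80, 60, 40, 20)}
print(lengths)
```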
Testing
The full-duration (100%) vowels used to train the HMMs are also used in round-robin testing.
Five of every six vowel tokens from all speakers are taken and used to train the HMMs.
Neutral tokens in neutral-trained and stress-trained HMMs
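A round-robin split of this kind (train on five sixths of the tokens, test on the remaining sixth, then rotate) can be sketched as below. The function name and the way tokens are assigned to folds are assumptions for illustration; the slides do not specify how tokens were grouped.

```python
def round_robin_splits(tokens, n_folds=6):
    """Yield (train, test) pairs; each fold is held out exactly once."""
    # Assumed fold assignment: every n_folds-th token goes to the same fold.
    folds = [tokens[i::n_folds] for i in range(n_folds)]
    for held_out in range(n_folds):
        train = [t for i, fold in enumerate(folds) if i != held_out for t in fold]
        yield train, folds[held_out]


tokens = list(range(12))   # stand-ins for vowel tokens
splits = list(round_robin_splits(tokens))
for train, test in splits:
    print(len(train), len(test))
```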
Error Reduction
100-sample frame
40% duration: 32/216 tokens did not have enough information
20% duration: 105/216 tokens did not have enough information
25-sample frame
40% duration: all tokens made it through
20% duration: 16/216 tokens did not have enough information
Only 7.4% of the samples were not used
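The counts above can be checked with a little arithmetic, and a frame count shows one plausible reason why shorter frames lose fewer tokens. The sketch assumes non-overlapping frames and a hypothetical 80-sample truncated token; the slides do not define what "enough information" means, so the frame-count reading is only an assumption.

```python
def percent_discarded(discarded, total=216):
    """Share of tokens dropped, as a percentage rounded to one decimal."""
    return round(100.0 * discarded / total, 1)


def full_frames(n_samples, frame_len):
    """Number of complete, non-overlapping frames (framing is assumed)."""
    return n_samples // frame_len


pct_25_frame_20 = percent_discarded(16)     # 25-sample frame, 20% duration
pct_100_frame_40 = percent_discarded(32)    # 100-sample frame, 40% duration
pct_100_frame_20 = percent_discarded(105)   # 100-sample frame, 20% duration
print(pct_25_frame_20, pct_100_frame_40, pct_100_frame_20)

short_token = 80   # hypothetical truncated token length in samples
print(full_frames(short_token, 100), full_frames(short_token, 25))
```

The first figure matches the 7.4% quoted above. Under the stated assumptions, an 80-sample token yields no complete 100-sample frame but three 25-sample frames.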
Effect on different vowels
Isolate the vowel
50 ms duration
Novel scheme for improving stress detection across vowel types with decreased durations
Multiple vowel types per speaker utterance are exploited, and a classification is made by a majority rule over the individual decisions on each vowel.
The block diagram for the above procedure is shown in the next slide.
Test results using the HMM training scheme of the previous section, with the overall decision based on the composite weighted majority decision rule, show significant performance gains.
The table for the average error rate is shown ahead.
Conclusion
In this paper, the effects of phoneme duration and type on automatic Teager Energy
Operator feature-based stress detection performance are examined.
It was shown that both phoneme duration and type mismatch affect the stress detection
performance.
In the case of vowel duration, shortening the vowel duration was shown to adversely
affect stress detection performance if the vowel duration is less than a threshold of about
50% of the original duration in the case of the vowel /OW/, or at about 85-128 ms.
When a speaker's utterance contains multiple vowel types, using a composite decision
process by applying a weighted majority decision rule to the individual decisions with
equal (i.e. baseline) band-weighting provides relative error rate reductions of 24% and
39% in the neutral and stress conditions, respectively.
The final composite stress detection scheme achieves overall average error rates of
approximately 27% for neutral/stress detection, where the vowel durations are 50 ms.
The error rates presented may be further improved by employing a soft decision rule algorithm based upon a criterion associated with the characteristics of specific vowels, such as weighting decisions made on longer duration vowels more heavily than those from shorter duration vowels.
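One way such a duration-weighted soft decision could look is sketched below. This is purely illustrative: the paper only suggests the idea, so the function name and the choice to multiply each vowel's HMM score gap by its duration are assumptions, not a described method.

```python
def duration_weighted_decision(vowels):
    """Soft decision over vowels given (neutral_score, stress_score, duration_ms).

    Each vowel contributes its signed score gap scaled by its duration, so
    longer vowels count more (the weighting form is an assumption).
    """
    total = 0.0
    for neutral_score, stress_score, duration_ms in vowels:
        total += duration_ms * (stress_score - neutral_score)
    return "stress" if total > 0 else "neutral"


# A short vowel leaning towards stress is outweighed by a long neutral one.
soft_decision = duration_weighted_decision([(-50.0, -49.0, 40.0), (-60.0, -62.0, 120.0)])
print(soft_decision)
```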
