comp.speech WWW site

Welcome to the comp.speech Frequently Asked Questions WWW site. This site provides a range of
information on speech technology, including speech synthesis, speech recognition, speech coding, and
related material. The information is regularly posted to the comp.speech newsgroup as the
"comp.speech FAQ" posting. This site is mirrored at several other WWW sites around the world
(Australia, UK, Japan and USA) and the information is also available in a plain text format.

There are 250 comp.speech WWW pages and they include over 500 hyperlinks to speech technology
web sites, ftp servers, mailing lists, and newsgroups.

Contents
SpeechLinks: Speech Technology Hyperlinks Pages
Table Of Contents
List Of Software/Hardware/Resources
Update Times
Availability
Odds 'n Ends

FAQ Section 1: General Information on Speech Technology
FAQ Section 2: Signal Processing for Speech
FAQ Section 3: Speech Coding and Compression
FAQ Section 4: Natural Language Processing
FAQ Section 5: Speech Synthesis
FAQ Section 6: Speech Recognition

http://mi.eng.cam.ac.uk/comp.speech/ (1 of 3) [10/31/2003 8:41:02 AM]

Comp.Speech FTP Site

The comp.speech ftp site is an excellent repository of speech technology information, software and
resources. It contains the following (see Question 1.2 for more detail):

Archives for the comp.speech newsgroup
Speech data and phonetic dictionaries
Software for speech analysis, coding, recognition, synthesis and modelling.
Mirrors of the Simtel directories for sound and voice.

Admin
Minor changes each month. Thanks to all the companies and individuals who send in information.

Acknowledgements
Hundreds of people and companies have made contributions to the comp.speech FAQ over the last
few years - too many to name individually. Special thanks go to Tony Robinson and Kevin Lenzo
who have provided a wide range of information and assistance. Tony Robinson also maintains the
comp.speech ftp site which is an excellent resource for all people working with speech technology. I
am grateful to the people at Sydney University, Cambridge University, ATR ITL and CMU for
supporting the FAQ on their WWW sites.

Disclaimer
The comp.speech FAQ and WWW pages are provided as is without any express or implied
warranties. While every effort has been taken to ensure the accuracy of the information presented
here, the author assumes no responsibility for errors or omissions, or for damages resulting from the
use of the information contained herein.
The comp.speech FAQ and WWW pages should not be construed as representing the views or
products of my employer, Sun Microsystems, Inc.

Copyright and Reproduction
Copyright (c) 1993-6 by Andrew Hunt, all rights reserved.
The comp.speech WWW pages may not be distributed for financial gain and may not be included in
any collections or compilations without express permission from the author.
You may make links to the documents, but you may not make copies without permission of the
author.
Note: hyperlinks to the comp.speech WWW pages are encouraged.

Maintainer
The FAQ posting and the Comp.Speech WWW Site are maintained on a volunteer basis by

Andrew Hunt
Speech Applications Group, Sun Microsystems Laboratories
Two Elizabeth Drive, Chelmsford, MA, 01824-4195, USA
Ph: (508) 442 2681
andrew.hunt@east.sun.com

Last Revision: 18:40 05-Sep-1997

comp.speech FAQ/WWW Availability

Availability
comp.speech FAQ/WWW
The comp.speech FAQ is available in two forms: plain text, for posting to
the newsgroup and for distribution by ftp, and HTML, for the WWW. The text
version came first; since September 1994 both the WWW and text versions have
been supported, and the WWW version is now the master version.

WWW Availability
The WWW version of the comp.speech FAQ is mirrored at a number of web sites.

Australia: Sydney University
Britain: Cambridge University
Japan: ATR Interpreting Telecommunications Research Laboratories
USA: Carnegie Mellon University

Text Version on the WWW
The three parts of the text version are available on this WWW site:

Part 1 - General information
Part 2 - Signal Processing, Coding and NLP
Part 3 - Speech Synthesis and Speech Recognition

Text by Anonymous ftp
The text version is available by anonymous ftp from:

comp.speech ftp server
RTFM server

Text by email
Finally, the text version can be obtained by sending email to mail-server@rtfm.mit.edu with the
following line in the body of the message:

send usenet/news.answers/comp-speech-faq/*
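
As a rough illustration of what such a request looks like, the Python sketch below builds (but does not send) the message with the standard-library `email` package. The helper name `build_mailserver_request` and the sender address `you@example.org` are hypothetical placeholders, and the rtfm.mit.edu mail server dates from the 1990s, so it may no longer answer requests.

```python
from email.message import EmailMessage


def build_mailserver_request(command: str,
                             sender: str,
                             server: str = "mail-server@rtfm.mit.edu") -> EmailMessage:
    """Build a plain-text message whose body is one mail-server command line.

    Hypothetical helper: it only constructs the message; nothing is sent.
    """
    msg = EmailMessage()
    msg["From"] = sender          # placeholder address, replace with your own
    msg["To"] = server            # server address as given in the FAQ
    msg.set_content(command + "\n")  # the command goes in the body, per the FAQ
    return msg


req = build_mailserver_request("send usenet/news.answers/comp-speech-faq/*",
                               sender="you@example.org")
print(req.get_content(), end="")
```

To deliver it, the message could be handed to an SMTP relay you have access to, for example with `smtplib.SMTP(...).send_message(req)`. The FAQ only describes the body line, so no subject is set here.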

Last Revision: 03:10 01-Apr-1996


http://mi.eng.cam.ac.uk/comp.speech/text/FAQ-part1.txt

Newsgroups: comp.speech,comp.answers,news.answers
Subject: comp.speech Frequently Asked Questions - part 1/3
From: andrew.hunt@east.sun.com (Andrew Hunt)
Reply-To: andrew.hunt@east.sun.com (Andrew Hunt)
Followup-To: comp.speech
Organization: Speech Applications Group, Sun Microsystems Laboratories
Summary: Information on Speech Technology
Approved: news-answers-request@MIT.Edu

Archive-name: comp-speech-faq/part1
Last-modified: 1997/09/06
URL: http://www.speech.su.oz.au/comp.speech/

COMP.SPEECH FAQ POSTING - PART 1/3

[Note: this document has been automatically extracted from a WWW site:
http://www.speech.su.oz.au/comp.speech/
This may introduce some formatting errors.]

Comp.Speech Frequently Asked Questions

The Frequently Asked Questions (FAQ) is a regular posting to
comp.speech which attempts to answer some of the regular questions in
the comp.speech newsgroup. It covers speech synthesis, speech
recognition, speech coding and a range of related material. It
contains lists of speech technology software and hardware, including
commercial products, public domain and freeware software, plus it
contains over 500 links to speech technology sites and software.

The FAQ is not meant to discuss any topic exhaustively. It will
hopefully provide readers with pointers on where to find useful
information, especially material available on the Internet.

If you have not already read the Usenet introductory material posted
to news.announce.newusers, please do. For help with FTP (file transfer
protocol), look for the regular posting of the anonymous FTP FAQ in
comp.misc, comp.archives.admin or news.answers.

This FAQ is posted every 4 weeks to comp.speech, comp.answers and
news.answers.

It is also available on the World Wide Web:

* Australia: http://www.speech.su.oz.au/comp.speech/
* Britain: http://svr-www.eng.cam.ac.uk/comp.speech/
* Japan: http://www.itl.atr.co.jp/comp.speech/
* USA: http://www.speech.cs.cmu.edu/comp.speech/

Or by anonymous ftp from the comp.speech archive site:

* ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/FAQ-complete

Or from the news.answers ftp site (and its mirrors):

* ftp://rtfm.mit.edu/pub/usenet/comp.speech/*

Or by sending email to mail-server@rtfm.mit.edu with the following
line in the body of the message:

* send usenet/news.answers/comp-speech-faq/*

If you only have email access to the internet, then I suggest you
obtain the Internet-by-email guide. Send email to
mail-server@rtfm.mit.edu with the following line in the body of the
message:

* send usenet/news.answers/internet-services/access-via-email

Admin

Minor changes each month. Thanks to all the companies and individuals
who send in information.

Acknowledgements

Hundreds of people and companies have made contributions to the
comp.speech FAQ over the last few years - too many to name
individually. Special thanks go to Tony Robinson and Kevin Lenzo who
have provided a wide range of information and assistance. Tony
Robinson also maintains the comp.speech ftp site which is an excellent
resource for all people working with speech technology. I am grateful
to the people at Sydney University, Cambridge University, ATR ITL and
CMU for supporting the FAQ on their WWW sites.

Disclaimer

The comp.speech FAQ and WWW pages are provided as is without any
express or implied warranties. While every effort has been taken to
ensure the accuracy of the information presented here, the author
assumes no responsibility for errors or omissions, or for damages
resulting from the use of the information contained herein.
The comp.speech FAQ and WWW pages should not be construed as
representing the views or products of my employer, Sun Microsystems,
Inc.

Copyright and Reproduction

Copyright (c) 1994-6 by Andrew Hunt, all rights reserved.
The comp.speech FAQ posting may not be distributed for financial gain.
The comp.speech FAQ posting may not be included in any collections or
compilations without express permission from the author.
The comp.speech FAQ posting may be posted to any USENET newsgroup,
on-line service, or BBS as long as it is posted in its entirety with
this copyright statement, and as long as a current version is always
maintained.
[Note: hyperlinks to the comp.speech WWW pages are encouraged.]

Maintainer

The FAQ posting and the Comp.Speech WWW Site are maintained on a
volunteer basis by

Andrew Hunt
Speech Applications Group, Sun Microsystems Laboratories
Two Elizabeth Drive, Chelmsford, MA, 01824-4195, USA
Ph: (508) 442 2681 Fax: (508) 250 5067
andrew.hunt@east.sun.com

___________________________________________________________________________

comp.speech FAQ

Table of Contents

+ SpeechLinks: Speech Technology Hyperlinks Pages

* SpeechLinks: 500+ Speech Technology Links
* SpeechLinks: General Speech Technology Links
* SpeechLinks: Signal Processing for Speech
* SpeechLinks: Speech Coding
* SpeechLinks: Speech Synthesis
* SpeechLinks: Speech Recognition

+ List Of Software/Hardware

+ Update Times

+ Availability

+ Odds 'n Ends

+ FAQ Section 1: General Information on Speech Technology

* SpeechLinks: General
* Q1.1: What is comp.speech?
* Q1.2: comp.speech ftp site
* Q1.3: Common abbreviations and jargon
* Q1.4: Related newsgroups and mailing lists
* Q1.5: Associations, publications and conferences
* Q1.6: Handicap Aids
* Q1.7: Speech Databases
* Q1.8: Speech File Formats and Conversion
* Q1.9: Speech Laboratory Environments and Audio Editors
* Q1.10: Speech Research Sites
* Q1.11: Miscellaneous Software and Resources

+ FAQ Section 2: Signal Processing

* SpeechLinks: Signal Processing for Speech
* Q2.1: What sampling do I need for speech?
* Q2.2: Finding the pitch of a speech signal
* Q2.3: How do I find the start and end points of a speech
signal?
* Q2.4: Where can I find FFT software?
* Q2.5: Signal processing in speech technology
* Q2.6: Speech sampling and signal processing hardware
* Q2.7: How do I convert to/from mu-law format?
* Q2.8: Signal Processing Software

+ FAQ Section 3: Speech Coding and Compression

* SpeechLinks: Speech Coding
* Q3.1: Speech compression techniques
* Q3.2: Information on speech coding and compression
* Q3.3: Speech Compression / Coding Software

+ FAQ Section 4: Natural Language Processing

* Q4.1: NLP References and Books
* Q4.2: NLP Software

+ FAQ Section 5: Speech Synthesis

* SpeechLinks: Speech Synthesis
* Q5.1: What is speech synthesis?
* Q5.2: How can speech synthesis be performed?
* Q5.3: References/Books on Synthesis
* Q5.4: Speech Synthesis on the WWW
* Q5.5: Speech Synthesis Software/Hardware

+ FAQ Section 6: Speech Recognition

* SpeechLinks: Speech Recognition
* Q6.1: What is speech recognition?
* Q6.2: How is speech recognition performed?
* Q6.3: How can I build a simple speech recogniser?
* Q6.4: References & books on speech recognition
* Q6.5: Speech Recognition Hardware/Software
* Q6.6: Speaker Recognition (Verification and Identification)
* Q6.7: Integrated Speech Products

___________________________________________________________________________

List of Software/Hardware/Information

The comp.speech FAQ provides information on a range of software,
hardware and resources.

Q1.6: Handicap Aids

* Man-Machine Interfacing
* SpeechViewer II

Q1.7: Speech Data

* Bavarian Archive for Speech Signals
* BUPT Spoken Digit Database (Chinese)
* Center for Spoken Language Understanding (CSLU)
* Examples of IPA Symbols
* Linguistic Data Consortium (LDC)
* NOISEX
* Oxford Acoustic Phonetic Database
* Phonemic Samples
* RELATOR project
* ShATR
* University of Victoria Phonetic Database

Q1.9: Speech Processing Environments

* CSRE: Computerized Speech Research Environment
* DADiSP from DSP Development Corporation
* Entropic Signal Processing System (ESPS) and Waves
* GoldWave
* Kay Elemetrics Computer Speech Lab
* Khoros
* Matlab plus Signal Processing Toolbox
* MacSpeech Lab II
* N!Power
* OGI Speech Tools
* Ptolemy
* Quadravox Speech Processing Products - Qbox
* Speech Filing System (SFS)
* Signalyze 3.0 from InfoSignal
* SoundScope

Q1.11: Miscellaneous Software and Resources

Speech Application Interfaces

* ASAPI: Advanced Speech API (AT&T)
* SAPI: Microsoft Windows Speech API
* SRAPI: Speech Recognition API
* TAPI: Microsoft Windows Telephony API

Network "Phone" Software

* CUSeeMe
* CyberPhone
* DigiPhone
* InterFACE from Hijinx
* FAQ: How can I use the Internet as a telephone?
* Nautilus: Secure Computer Telephony
* NEVOT (1.4v) from AT&T BL
* PGPfone

http://mi.eng.cam.ac.uk/comp.speech/text/FAQ-part1.txt (4 of 50) [10/31/2003 8:41:13 AM]


http://mi.eng.cam.ac.uk/comp.speech/text/FAQ-part1.txt

* Speak Freely
* Internet Phone from VocalTec
* WebPhone
* WebTalk

Audio Processing Software

* AF version AF3R1
* Voice E-Mail from Bonzi Software
* MicNotePad Recording Software for Macs
* MixViews
* Network Audio System Release 1.1
* NIST Software - SPHERE and SCORE
* Sound Processing Kit
* TCPplay

Human Audio Perception

* Auditory Modeller 1
* Auditory Modeller 2
* Auditory Toolbox for Matlab
* Human Audio Perception Document

Dictionaries and other Lexical Tools

* BEEP dictionary
* CMU dictionary
* CUVOALD dictionary (Oxford Dictionary)
* Comprehensive Word List
* EAT: Edinburgh Associative Thesaurus
* Homophone List
* Moby Lexical Resources
* MRC Psycholinguistic Database
* WordNet
* Dictionaries on the WWW

Phonetic Fonts and Phonetic Samples

* International Phonetic Alphabet
* WWW: Phonetic Fonts and Examples Online
* Summer Institute of Linguistics IPA Fonts
* Phonetic Fonts for TeX and LaTeX
* Yamada Language Center

Very Miscellaneous Software

* The vOICe
* The Learning Company's Language Training
* Wildfire - an Electronic Assistant

Q2.6: Audio Hardware

* Macintosh Audio Hardware
* PC Audio Hardware
* Unix Audio Hardware

Q2.8: Signal Processing Software

* SigLib from Numerix Ltd.

Q3.3: Compression Software and Hardware

* 32 kbps ADPCM
* Castleton Network Systems - G.729 Voice Coder
* CELP 3.2a & LPC-10
* 8 Kbit/s CELP on the TMS320C5x family of DSP chips
* CyberVoice
* Rockwell's DigiTalk
* File format conversion
* G.711/721/723 Compression
* G.728 LD-CELP vocoder
* G.728 Compression
* GSM 06.10 Compression
* Lernout & Hauspie Speech Coding (5 products)
* Lernout & Hauspie Speech Coding SDK
* MPEG Audio
* shorten - a lossless compressor for speech signals
* Sipro Lab Telecom Inc. Coding
* Sonarc: Digital Audio Compression
* StarAudio Compressor/Player
* TrueSpeech from DSP Group
* U.S.F.S. 1016 CELP vocoder for DSP56001
* ToolVox from Voxware

Q4.2: Natural Language Processing

* Natural Language Software Registry (NLSR) - NLP Tools
* Part of Speech Tagger

Q5.5: Speech Synthesis

_Apple Macintosh_
* BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
* Infovox Product Range
* Macintosh Speech Output Applications
* Macintosh Speech Synthesis Manager
* MacYack Pro
* MBROLA: Free Speech Synthesis Project
* ProVoice Developer's Speech Toolkit from First Byte
* SENSYN speech synthesizer
* Sound Bytes Developer's Kit

_Windows (including 95, NT, 3.1)_
* AcuVoice
* AT&T Watson Speech Synthesis
* BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
* Creative TextAssist and TextAssist API
* DECtalk: Text-to-Speech from Digital
* ETI-Eloquence
* HADIFIX
* Infovox Product Range
* IPOX: All Prosodic Speech Synthesis Architecture
* Lernout and Hauspie Text-To-Speech Windows SDK
* Listen2 Text Reader
* MBROLA: Free Speech Synthesis Project
* Monologue for Windows from First Byte
* PAM - A Text-To-Speech Application
* ProVerbe Speech Engine from ELAN Informatique
* ProVoice Developer's Speech Toolkit from First Byte
* SENSYN speech synthesizer
* Sound Bytes Developer's Kit
* Tinytalk
* TruVoice from Centigram
* WinSpeech
* ZMD Speech Synthesis

_DOS_
* CSRE: Computerized Speech Research Environment
* Infovox Product Range
* MBROLA: Free Speech Synthesis Project
* ProVoice Developer's Speech Toolkit from First Byte
* SENSYN speech synthesizer
* spchsyn.exe
* Tinytalk
* ZMD Speech Synthesis

_OS/2_
* ProVerbe Speech Engine from ELAN Informatique
* ProVoice Developer's Speech Toolkit from First Byte
* Sound Bytes Developer's Kit

_Unix_
* AcuVoice
* AsTeR
* BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
* DECtalk: Text-to-Speech from Digital
* ETI-Eloquence
* Emacspeak - A Speech Output Subsystem For Emacs
* Festival Speech Synthesis System
* JSRU
* Klatt-style synthesiser
* KPE80 - A Klatt Synthesiser and Parameter Editor
* "learph": Trainable text-to-phoneme software by Antonio Lucca
* MBROLA: Free Speech Synthesis Project
* Orator from Bellcore
* ProVerbe Speech Engine from ELAN Informatique
* rsynth
* SENSYN speech synthesizer
* SGI Developers Toolbox Synthesiser
* Speak
* TrueTalk
* TruVoice from Centigram

_Integrated Circuits and Dedicated Hardware_
* Eurovocs
* Infovox Product Range
* ProVerbe Speech Engine from ELAN Informatique
* RC Systems V8600/V8601 Text to Speech synthesizers

_Other Platforms_
* BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
* TheBigMouth (NeXT)
* MBROLA: Free Speech Synthesis Project
* Narrator Translator Library (Amiga)
* Narrator (Amiga)
* TextToSpeech Kit (NeXT)
* Orator from Bellcore
* SENSYN speech synthesizer
* WreadFiles: File reader for Commodore Amiga

_Unknown_
* Lernout and Hauspie Text-To-Speech (3 products)
* Lucent Technologies Bell Labs Text-to-Speech system
* SIMTEL
* Text to Phoneme Program 1
* Text to phoneme program 2
* Text to phoneme program 3

Q6.5: Speech Recognition

_Apple Macintosh_
* Digital Dreams Speech Recognition Plug-Ins
* Dragon Dictation Products
* Macintosh Speech Recognition Manager
* PowerSecretary

_Windows (including 95, NT, 3.1)_
* AT&T Watson Speech Recognition
* Cambridge Voice for Windows
* CustomVoice and CustomTelephone: A&G Graphics Interface Inc.
* DragonDictate for Windows
* Dragon Dictation Products
* Dragon Developer Tools
* Ficomp Interpreter 6000
* IBM VoiceType Dictation and Control
* IN CUBE
* Kurzweil Speech Recognition (2 products)
* Lernout & Hauspie ASR SDK
* Listen for Windows 2.0 from Verbex Voice Systems
* Microsoft Speech Recognition
* NCC Dictate
* Phonetic Engine 500 (PE500) from Speech Systems, Inc.
* Philips Speech Recognition (2 products)
* ProNotes Voice Tools
* PureSpeech
* smARTspeak from Advanced Recognition Technologies, Inc.
* Visual Voice from Stylus Innovation
* VoiceAssist for Windows from Creative Labs, Inc.
* VoiceServer for Windows
* Whisper
* WildCard Speech Products

_DOS_
* DATAVOX - French
* Dragon Developer Tools
* Ficomp Interpreter 6000
* Jialong He's Speech Recognition Research Tool
* smARTspeak from Advanced Recognition Technologies, Inc.
* Votan VPC2100 Voice Card and VSP 1010 Speech Processor

_OS/2_
* IBM VoiceType Dictation and Control

_Unix_
* AbbotDemo
* BBN Hark Telephony Recognizer
* EARS: Single Word Recognition Package
* Ficomp Interpreter 6000
* Hidden Markov Model Toolkit (HTK) from Entropic
* IN CUBE
* Jialong He's Speech Recognition Research Tool
* Lotec Speech Recognition Package
* Myers' Hidden Markov Model software
* NICO Artificial Neural Network Toolkit
* Nuance Speech Recognition System
* PureSpeech
* recnet

_Integrated Circuits and Dedicated Hardware_
* HM2007 - Speech Recognition Chip
* OKI VRP6679 - Speech Recognition Chip
* Sensory Inc. Integrated Circuits
* Speech Commander - Verbex Voice Systems
* Voice Control Systems Recognition
* VCS 2030 & 2060 Voice Dialer

_Other Platforms_
* Simon Says (NeXT)
* Voice Command Line Interface (Amiga)
* Visus SpeechKit

_Unknown_
* Berkeley Restaurant Project (BeRP)
* Lernout & Hauspie ASR (3 products)
* Voice-Trek 2.0
* Voicetek Corp.
* Voice Processing Corporation Speech Recognition Product Line

Q6.6: Speaker Verification and Identification

* ImagineNation: Voice Activated UnLock Technology
* Jialong He's Speaker Recognition (Identification) Tool
* Keyware Biometric Security Products
* SpeakerKey Voice Verifier from ITT
* SpeakEZ Voice Print Speaker Verification
* Voice Control Systems: Speaker Verification Technology

Q6.7: Integrated Speech Products

* SpeechWorks from Applied Language Technologies, Inc.
* Nortel Speech Technology Products

___________________________________________________________________________

General Speech Technology

comp.speech FAQ Section 1

* SpeechLinks: General
* Q1.1: What is comp.speech?
* Q1.2: comp.speech ftp site
* Q1.3: Common abbreviations and jargon
* Q1.4: Related newsgroups and mailing lists
* Q1.5: Associations, publications and conferences
* Q1.6: Handicap Aids
* Q1.7: Speech Databases
* Q1.8: Speech File Formats and Conversion
* Q1.9: Speech Laboratory Environments and Audio Editors
* Q1.10: Speech Research Sites
* Q1.11: Miscellaneous Software and Resources

Q1.1: What is comp.speech?

Comp.speech is an unmoderated newsgroup for discussion of speech
technology and speech science. It covers a wide range of issues from
the application of speech technology, to research, to products and
lots more. By its nature, speech technology is an inter-disciplinary
field and the newsgroup reflects this. However, computer application
is the basic theme of the group.

Note: If you don't know what a newsgroup is, then talk to your local
system administrator about how to get access. A useful newsgroup for
beginners is news.announce.newusers. You might also find the following
documents useful.

ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/What_is_Usenet?

ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/Answers_to_Frequently_Asked_Questions_about_Usenet

ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/Rules_for_posting_to_Usenet

ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/FAQs_about_FAQs

The following is a list of some of the topics covered by comp.speech.

* Speech Recognition - discussion of methodologies, training,
techniques, results and applications. This should cover the
application of techniques including HMMs, neural-nets and so on to
the field.
* Speech Synthesis - discussion concerning theoretical and practical
issues associated with the design of speech synthesis systems.
* Speech Coding and Compression - both research and application
matters.
* Phonetic/Linguistic Issues - coverage of linguistic and phonetic
issues which are relevant to speech technology applications. Could
cover parsing, natural language processing, phonology and prosodic
work.
* Speech System Design - issues relating to the application of
speech technology to real-world problems. Includes the design of
user interfaces, the building of real-time systems and so on.
* Other matters - relevant conferences, jobs, books, software,
hardware, and products.

___________________________________________________________________________

Q1.2: comp.speech ftp site

Tony Robinson maintains the comp.speech ftp site. The ftp site is a
comprehensive repository of software and information related to speech
technology. The site is

* ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/

Comp.speech Archives

The comp.speech ftp site provides full archives of the comp.speech
newsgroup dating back to the creation of the group in 1991. The
postings are stored in the order in which they arrive. Batches of 1000
articles are grouped into gzipped tar files, and matching files listing
the subjects of the articles are also provided.

* ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/archive/
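
Once a batch has been downloaded, its contents can be inspected without unpacking it to disk. The Python sketch below uses only the standard-library `tarfile` and `email` modules to list the articles in one gzipped tar batch and pull out each article's `Subject:` header, roughly what the matching subject files provide. The one-article-per-file layout and the member names are assumptions for illustration (the actual file names on the ftp site are not given here), so the demo builds a small in-memory batch with hypothetical names instead.

```python
import email
import io
import tarfile


def make_demo_batch(articles: dict[str, bytes]) -> io.BytesIO:
    """Build an in-memory .tar.gz holding the given articles (demo data only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, raw in articles.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(raw)
            tf.addfile(info, io.BytesIO(raw))
    buf.seek(0)  # closing the TarFile leaves the BytesIO open; rewind for reading
    return buf


def list_batch(fileobj) -> list[str]:
    """Return the sorted names of the regular files (articles) in a gzipped tar batch."""
    fileobj.seek(0)
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tf:
        return sorted(m.name for m in tf.getmembers() if m.isfile())


def batch_subjects(fileobj) -> dict[str, str]:
    """Map each article in the batch to its Subject: header (empty if missing)."""
    fileobj.seek(0)
    subjects = {}
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tf:
        for member in tf.getmembers():
            if not member.isfile():
                continue
            raw = tf.extractfile(member).read()  # read the article without extracting
            msg = email.message_from_bytes(raw)  # parse the Usenet-style headers
            subjects[member.name] = msg.get("Subject", "")
    return subjects


# Hypothetical article names and contents, for illustration only.
demo = make_demo_batch({
    "comp.speech/1001": b"Subject: Pitch tracking\nFrom: a@example.org\n\nBody text\n",
    "comp.speech/1002": b"Subject: HMM toolkits\nFrom: b@example.org\n\nBody text\n",
})
print(list_batch(demo))
print(batch_subjects(demo))
```

Reading members as streams keeps a 1000-article batch from turning into 1000 small files on disk when you only want to scan the subjects; for a downloaded batch, an open file object from `open(path, "rb")` can be passed in place of `demo`.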

Software and Other Resources

The comp.speech ftp site includes a wide range of useful software and
resources. Tony has arranged it into a series of sub-directories:

/analysis : Speech analysis software
FFT code, a pitch tracker, RASTA code, and IEEE DSP code.

/auditory : Auditory model software
AIM, Auditory Toolbox and Lutear.

/coding : Speech coding software
ADPCM, CELP 3.2a, G711, G721, G723, GSM, LDCELP, LPC10,
Shorten.

/data : Repository for (small) speech-related databases
BEEP, CMUDict, Homophone list, hVd database, Peterson Barney
database

/dictionaries : Phonetic dictionaries
BEEP, CMUDict, CUVOALD, Homophone list, MRC database

/info : Key postings to comp.speech archives by subject
Lots of interesting info!

/recognition : Speech recognition software
AbbotDemo, Ears, Lotec, recnet, sound blaster recognition,
whistle

/simtel_sound : Mirror of the simtel/msdos/sound directory
Range of useful software

/simtel_voice : Mirror of the simtel/msdos/voice directory
Another range of useful software

/synthesis : Speech synthesis software
Klatt synthesis software, Klatt parameter editor and rsynth.

/tools : Miscellaneous tools
Part-of-speech tagger, OGI speech tools, sox audio file format
conversion, SPHERE software and more.

___________________________________________________________________________

Q1.3: Common abbreviations and jargon.

* ANN - Artificial Neural Network.
* ASR - Automatic Speech Recognition.
* ASSP - Acoustics, Speech and Signal Processing.
* AVIOS - American Voice I/O Society.
* CELP - Code-book Excited Linear Prediction.
* COLING - COmputational LINGuistics.
* DTW - Dynamic Time Warping.
* FAQ - Frequently Asked Questions.
* HMM - Hidden Markov Model.
* IEEE - Institute of Electrical and Electronics Engineers.
* JASA - Journal of the Acoustical Society of America.
* LPC - Linear Predictive Coding.
* LVQ - Learning Vector Quantisation.
* MFCC - Mel Frequency Cepstral Coefficients.
* NLP - Natural Language Processing.
* NN - Neural Network.
* TIMIT - A speech corpus with phoneme labels - see Q1.7.
* TTS - Text-To-Speech (i.e. speech synthesis).
* VQ - Vector Quantisation.

___________________________________________________________________________

Q1.4: Related newsgroups and mailing lists.

Newsgroups

comp.ai - Artificial Intelligence newsgroup.
Postings on general AI issues, language processing and AI
techniques. The comp.ai FAQ covers NLP, NN and other AI
information.

comp.ai.nat-lang - Natural Language Processing Group
Postings regarding Natural Language Processing. Set up to cover
a broad range of related issues and different viewpoints. A
comp.ai.nat-lang FAQ posting is available.

comp.ai.nlang-know-rep - Natural Language Knowledge Representation
Moderated group.

comp.ai.neural-nets - discussion of Neural Networks and related issues.
There are often postings on speech-related matters - phonetic
recognition, connectionist grammars and so on. A
comp.ai.neural-nets FAQ posting is available.

comp.compression - occasional articles on compression of speech.
The comp.compression FAQ has some info on audio compression
standards.

comp.dcom.telecom - Telecommunications newsgroup.
Has occasional articles on voice products.

comp.dsp - discussion of signal processing - hardware and algorithms
and more.
Has a good FAQ posting which is also available on the WWW and
by ftp (addresses below). Has a regular posting of a
comprehensive list of Audio File Formats.

+ http://www.bdti.com/faq/dsp_faq.htm
+ ftp://rtfm.mit.edu/pub/usenet/comp.dsp/

comp.multimedia - Multi-Media discussion group.
Has occasional articles on voice I/O.

sci.lang - Language.
Discussion about phonetics, phonology, grammar, etymology and
lots more. A sci.lang FAQ is available.

alt.sci.physics.acoustics
Some discussion of speech production & perception.

alt.binaries.sounds.* - posting and discussion of sound samples.

Mailing Lists

Voice-Users Mailing List
For discussion of any aspect of using voice recognition
systems:

+ Using such systems safely, without muscle or voice strain
+ Techniques for improving recognition accuracy
+ How to set up the physical voice workstation
+ Tips for effective use of voice interfaces
+ Configuration of specific systems, troubleshooting, etc.

To subscribe, fill out the web-based subscription form.
Posts to the list should go to:
voice-users@voicerecognition.com

Colibri
News about language, speech, logic and information.
Email: colibri@let.ruu.nl
WWW: http://colibri.let.ruu.nl/

ECTL - Electronic Communal Temporal Lobe
Founder & Moderator: David Leip. Moderated mailing list for
researchers with interests in computer speech interfaces. This
list serves a broad community including persons from signal
processing, AI, linguistics and human factors. To subscribe,
send your name, institute, department, daytime phone and email
address to:

+ ectl-request@snowhite.cis.uoguelph.ca

The ECTL archive site is
ftp://snowhite.cis.uoguelph.ca/pub/ectl

Prosody Mailing List
Unmoderated mailing list for discussion of prosody. The aim is
to facilitate the spread of information relating to the
research of prosody by creating a network of researchers in the
field. If you want to participate, send the following one-line
message to

+ listserv@msu.edu
+ subscribe prosody Your Name

foNETiks
A moderated monthly newsletter distributed by e-mail. It
carries job advertisements, notices of conferences, and other
news of general interest to phoneticians, speech scientists and
others. The editors are Linda Shockey and Gerry Docherty. To
subscribe send the following 1 line message to

+ mailbase@mailbase.ac.uk
+ join fonetiks your_first_name your_second_name

Digital Mobile Radio
Covers many areas, including some speech topics such as
speech coding and speech compression. Mail Peter Decker
(dec@dfv.rwth-aachen.de) to subscribe.

___________________________________________________________________________

Q1.5: Associations, Journals and Conferences

[Note: Also see the list provided in Shikano's WWW site on Speech and
Acoustics:
http://www.aist-nara.ac.jp/IS/Shikano-lab/database/internet-resource/e-www-site.html.]

Associations

Institute of Electrical and Electronics Engineers (IEEE)

* Publications: IEEE Transactions on Signal Processing, IEEE
Transactions on Speech and Audio Processing (from Jan 93), IEEE
Transactions on Acoustics, Speech, and Signal Processing (now
obsolete), and IEEE Signal Processing Magazine. (More information
on the WWW: http://www.ieee.org/sp/index.html).
* Speech-Related Conferences: ICASSP - Intl. Conf. Acoustics,
Speech, and Signal Processing. IEEE also runs speech technology
related workshops and many other conferences. (Does anyone have a
list?)
* Contact: IEEE Service Center
445 Hoes Lane, PO Box 1331, Piscataway, NJ 08855, USA
Phone: 1-800-678-IEEE or (201) 981-0060
* WWW: IEEE: http://www.ieee.org/
IEEE Signal Processing Society http://www.ieee.org/sp/index.html

The Acoustical Society of America (ASA)

* Publications: Journal of the Acoustical Society of America (JASA)
* Conferences: ASA holds four meetings a year. Information is
available on the WWW: http://asa.aip.org/meetings.html.
* Contact: ASA Office Manager,
500 Sunnyside Blvd, Woodbury, NY 11797-2999, USA
Ph: (516) 576-2360, FAX (516) 576-2377
Email: asa@aip.org
* WWW: http://asa.aip.org/

European Speech Communication Association (ESCA)

* Publications: Speech Communication
* Conferences: EUROSPEECH is held every two years. E'97 will take
place in Patras, Greece, in September 1997. ESCA organises regular
speech-related workshops: see their WWW pages for details.
* Contact: Secretariat ESCA
ICP, Universite Stendhal,
BP 25X, F38400 Grenoble Cedex 9, France
Ph: (+33).76.82.43.36 Fax (+33).76.82.43.35
Email: esca@icp.grenet.fr
* WWW: http://ophale.icp.grenet.fr/esca/esca.html

Association for Computational Linguistics (ACL)

* Publications: Computational Linguistics
* SIGPHON: Special Interest Group for Computational Phonology. The
home page is provided by the Centre for Cognitive Science at the
University of Edinburgh. A special issue on Computational
Phonology appeared in Vol 20, Num 3 of Computational Linguistics
and included an Introduction to Computational Phonology by Steven
Bird.
* Conferences: COLING is held every two years. ACL also organises a
range of workshops. See the WWW pages for details.
* Contact: P.O. Box 6090
Somerset, NJ 08875, USA
Ph: (908) 873 3893
Email: acl@bellcore.com
* WWW: http://www.cs.columbia.edu:80/~acl/

American Voice Input/Output Society (AVIOS)

* Description: AVIOS is a not-for-profit organization dedicated to
disseminating information about applications using speech
technology. It aims "to bridge the gap between emerging voice
technology and its application, by providing an interactive forum
for the technologists, students, system developers, business
managers, and users actively involved in or with an interest in
the field of voice processing."
* Publications: International Journal of Speech Technology (with
Kluwer Academic Publishers)
The Journal of the American Voice Input/Output Society was
published from 1984 to 1994.
* Conferences: The International Voice Input/Output Applications
Conference is held annually (since 1982): Sept 10-12, San Jose,
CA.
* Contact: 4010 Moorpark Avenue, Suite 105M, San Jose, CA 95117, USA
Ph: +1-408-248-1353, Fax: +1-408-248-0251
Email: avios@pilot.net
WWW: http://www.avios.com/

European Language Resources Association

* Description: The European Language Resources Association was
established in Luxembourg in February 1995 with the goal of
creating an organization to promote the creation, verification,
and distribution of language resources in Europe. A non-profit
organization, ELRA aims to serve as a focal point for
information related to language resources in Europe. It will help
users and developers of European language resources, as well as
government agencies and other interested parties, exploit language
resources for a wide variety of uses. It will also oversee the
distribution of language resources via CD-ROM and other means and
promote standards for such resources.
* More info: see the ELRA Home page for membership information,
lists of resources etc.
* Contact: K. Choukri, Executive Director ELRA
87, Avenue d'Italie, 75013 Paris, FRANCE
Ph: +33 1 45 86 53 00, Fax: +33 1 45 86 44 88
Email: elra@calvanet.calvacom.fr
WWW: http://www.icp.grenet.fr/ELRA/home.html

ASSTA: Australian Speech Science and Technology Association

* Conference: SST, the Australian conference on Speech Science and
Technology, is held every two years. SST-96 will be held in Adelaide.
* WWW: Home Page: http://cslab.anu.edu.au/~bruce/assta/
List of members: http://ciips.ee.uwa.edu.au/~roberto/assta-users/

SALT: UK Speech and Language Technology Club

* WWW home page: http://salt.essex.ac.uk/salt/

Linguistic Associations

* A comprehensive list of linguistic associations and linguistic WWW
links is available at
http://engserve.tamu.edu/files/linguistics/linguist/associations.html

Industry Publications

ASR News

* Description: Monthly newsletter covering developments in the
speech recognition and speech synthesis marketplace.
* Note: Voice Information Associates also publish "Automatic Speech
Recognition: A study of the world-wide market" (revised 1995) and
"Text-to-Speech Technology Markets: 1995-2000" (revised 1995).
* Contact: Voice Information Associates, Inc.
14 Glen Road South, P.O. Box 625, Lexington, MA 02173, USA
Ph: +1-617-861-6680, Fax: +1-617-863-8790
Email: asrnews@tiac.net
WWW: http://www.tiac.net/users/asrnews/

Voice News

* Description: Monthly newsletter reporting on voice mail, voice
response, speech recognition, speech synthesis, digital voice
record/playback and related technologies, markets and company
activities. A review copy is available on request.
* Contact: Stoneridge Technical Services
P.O. Box 1891, Rockville, MD, 20849, USA
Ph: +1-301-424-0114, Fax: +1-301-424-8971
Email: info@stoneridgetech.com
WWW: http://www.stoneridgetech.com/

Speech Recognition Update

* Description: Monthly news and analysis of speech recognition
markets, applications and technology.
A free sample copy is available by contacting TMA Associates.
* Also: TMA Associates publishes market studies, including The
Advanced Speech Technology Market: Recognition, Synthesis and
Compression (1996) and Voice ID (1996).
* Contact: TMA Associates
6021 Wish Avenue, Encino, CA 91316, USA
Ph: +1-818-708-0962, Fax: +1-818-345-2980
Email: 72162.3172@compuserve.com
WWW: http://www.tmaa.com/

Voice Technology and Services News

* Description: Follows integrated PC LAN messaging (voice, fax,
mail, video) and speech technology. It follows the merging of
computer and telephone technologies, provides insights into
business and marketing opportunities and offers executives timely
information and industry trend analysis.
* Contact: Phillips Business Information
1201 Seven Locks Rd., Potomac, Maryland, 20854, USA
Ph: 1-800-777-5006 OR +1-301-340-1520
Subscription FAX: +1-301-309-3847
Editorial FAX: +1-424-4297

Teleconnect

* Contact: +1-212-691-8215

Computer Telephony

* Contact: +1-212-691-8215

Voice Processing Magazine

* Contact: 1-800-854-3112

Speech Technology

* Description: No longer published

Technical and Research Publications

Computer Speech and Language

* Price: $US170 (Institutions), $US75 (Individuals), 4 issues per
year.
* Publisher: Academic Press Limited
24-28 Oval Road, London NW1, England
WWW: http://www.apnet.com/

Speech Communication

* Contact: ESCA (see above)
* Publisher: Elsevier Science B.V.
P.O. Box 521, 1000 AM Amsterdam, The Netherlands.
WWW: http://www.elsevier.com/

IEEE Transactions on Speech and Audio Processing,

IEEE Signal Processing Magazine,

IEEE Transactions on Acoustics, Speech, and Signal Processing: OBSOLETE

* Contact: IEEE (see above)

Free Speech Journal

* Description: A Web Journal dedicated to the state of the art in
human language technology. Past volumes, editorial and submission
information, and so on are available on the journal's WWW site.
* Contact: Editor-In-Chief: Ron Cole: cole@cse.ogi.edu
WWW: http://www.cse.ogi.edu/CSLU/fsj/html/masthead.html

Linguistics Abstracts Online

* Description: Online access to all abstracts published in
Linguistics Abstracts since 1985, plus all current material as it
becomes available. Over 250 publications are indexed. A free trial
is available.
* WWW: http://www.blackwellpublishers.co.uk/labs/

Computational Linguistics

* Contact: Published by the Association for Computational Linguistics (see above)

Journal of the Acoustical Society of America (JASA)

* Contact: Published by Acoustical Society of America (see above)

International Journal of Speech Technology (was the AVIOS Journal)

* Description: Focuses on speech technology and its applications,
and promotes research and description of all aspects of speech
input and output: applications, base technology, theory, approach,
experiment, and testing.
* Publisher: Kluwer Academic Publishers
101 Philip Drive, Norwell, MA 02061, USA
Ph: +1-617-871-6300, Fax: +1-617-871-0449
* Submissions to: International Journal of Speech Technology
Journals Editorial Office, Ms. Kelly Riddle
Kluwer Academic Publishers
(Address, phone, fax as above)
Email: krkluwer@world.std.com

Conferences

ICSLP: Intl. Conference on Spoken Language Processing

Next: 30 Nov to 4 Dec, 1998, Sydney, Australia. Held in even years.

ICASSP - Intl. Conf. Acoustics, Speech, and Signal Processing

Eurospeech

International Conference on Computational Linguistics (COLING), held every two years

International Voice Input/Output Applications Conference

SST: Australian Speech Science and Technology Conference

Also see the following lists on the WWW:

Shikano's WWW site on Speech and Acoustics

http://www.aist-nara.ac.jp/IS/Shikano-lab/database/internet-resource/e-www-site.html

Institute of Phonetic Sciences WWW list


http://fonsg3.let.uva.nl/Other_pages.html#Meetings

___________________________________________________________________________

Q1.6: Handicap Aids

The following are products and companies which support users who can
benefit from the use of speech technology in a user interface. Please
feel free to submit information on relevant products, names of
companies and links to useful information on the Internet (especially
WWW sites).
[Of course, most of the products listed in Q5.5 and Q6.5 are useful.]

* Man-Machine Interfacing
* SpeechViewer II

Man-Machine Interfacing

* Description: Offers a service designed for people with physical
challenges, and can implement a computerized voice-controlled
system adapted to each user's needs.
They have developed a free-standing microphone and signal
processing system to compensate for speech/articulation
distortions and for background noise produced by electronic
devices such as wheelchairs and respirators.
* Contact: Man-Machine Interfacing
P.O. Box 5371, Evanston, IL 60204
Ph: 1-888-425-2001, Fax: (847) 328-7975
Email: jwhite@mcs.com
WWW: http://www.speechrec.com/

SpeechViewer II

* Platform: IBM machines from Model 25 onwards.
* Description: SpeechViewer II is a speech therapy tool. It provides
graphical feedback on various speech features so that
speech-impaired individuals can improve their speech. It works with
an audio bandwidth of 7.3 kHz and thus allows the therapist to work
with sustained vowels and fricatives. A wide range of graphics is
used to provide enough variety to hold client interest. An
extensive set of statistics is gathered, which allows a therapist
to do research or keep therapy records. The speech therapy modules
are:
+ Awareness - Sound, Loudness, Pitch, Voicing Onset, Voicing
+ Skill Building - Pitch, Voicing, Phonology
+ Patterning - Pitch & Loudness - Waveform & Spectrogram,
Spectra
+ Clinical Management - Profiles, Models, Client Data
A multilingual option is available which provides support for 12
languages: Danish, Dutch, Finnish, French, German, Icelandic,
Italian, Norwegian, Portuguese, Spanish, Swedish, and UK English.
With the Multilingual Option, clinicians can use SpeechViewer II
as a training tool for English as a second language and for
foreign language training.
* Hardware: Requires an IBM M-ACPA (Multimedia-Audio Capture
Playback Adapter), which has a TI TMS320C25 DSP chip. The input
sampling rate is 44.1 kHz stereo, 88.2 kHz mono. This is a 16-bit
card. It has the following jacks: mic in, stereo line in, stereo
line out, speaker out. Note: This card is being replaced by Mwave
technology. For more info on Mwave, contact Texas Instruments.
* Price:
+ The software is $2130 list, $1491 educational, part number
92F2066.
+ The M-ACPA is $370 list, $222 educational, part number
92F3378.
+ The MicroChannel adapter part number is 92F3379 (same price).
* Contact: IBM Special Needs Information
1000 N. W. 51st Street, Internal Zip 5432, Boca Raton, Florida
33431, USA
Ph: 1-800-426-4832, TDD: 1-800-426-4833, Fax: 1-407-982-6059
Email: IBM_SPEC_NEEDS_INFO@vnet.ibm.com
WWW: http://www.austin.ibm.com/pspinfo/snsspv2.html

___________________________________________________________________________

Q1.7: Speech databases

A wide range of speech databases have been collected. These databases
are primarily for the development of speech synthesis/recognition and
for linguistic research.

Some databases are free but most are not. The databases normally
require lots of storage space (100's of MBytes is not unusual). Do not
expect to be able to ftp large amounts of speech data.
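As a rough guide to why, uncompressed PCM speech needs about sample
rate x bytes per sample x channels x duration bytes. The sketch below
is illustrative only: the function name pcm_bytes and the 10-hour,
16 kHz, 16-bit mono corpus are hypothetical, and real databases differ
in sampling rate, resolution, compression and length.

```python
def pcm_bytes(sample_rate_hz: int, bits_per_sample: int, channels: int, seconds: float) -> int:
    """Approximate size in bytes of uncompressed linear PCM audio (file headers ignored)."""
    # Assumes samples are stored in whole bytes, so e.g. 14-bit samples occupy 2 bytes.
    bytes_per_sample = (bits_per_sample + 7) // 8
    return int(sample_rate_hz * bytes_per_sample * channels * seconds)


# Hypothetical corpus: 10 hours of 16 kHz, 16-bit, mono read speech.
size = pcm_bytes(16_000, 16, 1, 10 * 3600)
print(f"{size / 1e6:.0f} MB")  # 1152 MB
```

Even this modest hypothetical corpus exceeds a gigabyte before any
labels or transcriptions are added.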

In addition to the descriptions of speech databases and speech
database providers below, information can be obtained from:

LDC: Linguistic Data Consortium

Provides a very wide range of speech and text data to research
and commercial users: see below.

COCOSDA Home Page: http://www.itl.atr.co.jp/cocosda/

The International Committee for the Co-ordination and
Standardisation of Speech Databases and Assessment Techniques
for Speech Input/Output.

Shikano's WWW site on Speech and Acoustics

http://www.aist-nara.ac.jp/IS/Shikano-lab/database/internet-resource/e-www-site.html

RELATOR Project
European resource initiative: see below.

The following speech data resources are described in the FAQ.

* Bavarian Archive for Speech Signals
* BUPT Spoken Digit Database (Chinese)
* Center for Spoken Language Understanding (CSLU)
* Examples of IPA Symbols
* Linguistic Data Consortium (LDC)
* NOISEX
* Oxford Acoustic Phonetic Database
* Phonemic Samples
* RELATOR project
* ShATR
* University of Victoria Phonetic Database

Bavarian Archive for Speech Signals

* Description: The Bavarian Archive for Speech Signals (BAS) was
founded in January 1995 as an initiative of the Institute of
Phonetics at the University of Munich, Germany. The BAS will
develop, validate, administer and disseminate corpora of spoken
German to the speech community as well as to the speech engineering
industry. Presently the following German speech corpora are
available on ISO 9660 CDROM:

Siemens 1000 - SI1000
5 CDROMs, newspaper corpus, read speech, 10 speakers x
1000 utterances

Siemens 100 - SI100
7 CDROMs, read speech, 101 speakers x 100 sentences

PhonDat 1 - PD1
6 CDROMs, new edition in preparation, read speech, 201
speakers x 450+ sentences

PhonDat 2 - PD2
1 CDROM, read speech, 2nd edition, 16 speakers x 200
sentences, various labelled information

Verbmobil
Spontaneous speech recorded in a dialog task (appointment
scheduling). More information on the VERBMOBIL project:
http://www.dfki.uni-sb.de/verbmobil/

Corpora in Preparation

PhonDat I - PD1: 2nd extended edition (Jul 1995)

Strange Corpora - SC
Reference corpora that reflect certain well-known
problems in speech processing, like accents, repair,
breaks, hesitations, repetitions, extreme F0, background
noise, pathological speech, speaker adaptation. The first
SC corpus (SC1 Accents) will be edited in Jul 1995.

BAS Edition of Verbmobil Corpora - VM: 2nd extended edition

Articulatory data - AD: EMA data of speakers of SI1000 corpus

ERBA: 10000 utterances from a train inquiry task

* Misc: BAS is currently developing tools for the automatic
annotation and segmentation of very large speech corpora. This
includes the automatic detection of pronunciation variants, a
statistically based alignment and a rule-based refinement of the
outcome. The BAS seeks to cooperate with public institutions as
well as with industrial partners to further develop new German
speech databases. BAS can be a platform to re-distribute existing
German speech corpora.
* Contact and More Information: The BAS is located at the University
of Munich, Germany.
BAS c/o Institut fuer Phonetik
Schellingstr. 3/II
80799 Muenchen, Germany
Ph: +49-89-21802758, Fax: +49-89-2800362
Email: bas@sun1.phonetik.uni-muenchen.de
WWW: http://www.phonetik.uni-muenchen.de/BASSeng.html

BUPT Spoken Digit Database (Chinese)

* Vocabulary: {0, 1/yi/, 2, 3, 4, 5, 6, 7, 8, 9, 1/yao/, /dui/,
/cuo/}, 13 words in total.
* Size: 1202 speakers in total, 789 males and 413 females. Each
speaker utters each word 2 times, for a total of 31252 utterances.
* Format: 8000 Hz, 14-bit sampling. One utterance per file.
* Contact:

GLuck Co.
195 Berlioz 1C, Nun's Island
Verdun H3E 1C1, Canada
e-mail: weigang@zaphod.math.mcgill.ca
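The BUPT figures are internally consistent: 789 + 413 = 1202 speakers,
and 1202 speakers x 13 words x 2 repetitions = 31252 utterances. A
trivial check, using only the numbers quoted in the entry (the
variable names are illustrative):

```python
# Figures quoted in the BUPT entry.
male_speakers, female_speakers = 789, 413
vocabulary_size = 13  # 0-9 (with 1 read as /yi/), plus 1 read as /yao/, /dui/ and /cuo/
repetitions = 2       # each speaker utters each word twice

total_speakers = male_speakers + female_speakers
total_utterances = total_speakers * vocabulary_size * repetitions
print(total_speakers, total_utterances)  # 1202 31252
```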

Center for Spoken Language Understanding (CSLU)

* The ISOLET speech database of spoken letters of the English
alphabet. The speech is high quality (16 kHz with a noise
cancelling microphone): 150 speakers each said the 26 letters of
the English alphabet twice, in random order. The ISOLET database
can be purchased for $100 by sending an email request to
vincew@cse.ogi.edu. (This covers handling, shipping and medium
costs.) The database comes with a technical report describing the
data.
* CSLU has a telephone speech corpus of 1000 recitations of the
English alphabet: callers recite the alphabet with brief pauses
between letters. This database is available to not-for-profit
institutions for $100. The database is described in the
proceedings of the International Conference on Spoken Language
Processing.
+ Contact vincew@cse.ogi.edu if interested.
* CSLU has released for universities its Continuous English Speech
Corpus. The corpus contains recorded speech from 690 different
speakers, with label files at various levels, including word-level
and phonetic labels. The data were collected as part of the
OGI Multi-language telephone corpus. CSLU provides speech corpora
to all universities without charge. To order a corpus, print the
license agreement/order form, complete it, and fax it to the CSLU.
A description of the corpora and an order form are available:

http://www.cse.ogi.edu/CSLU/
ftp://speech.cse.ogi.edu/pub/releases

* Contact: Mike Noel: noel@cse.ogi.edu

Examples of IPA Symbols

UCLA Sounds of the World's Languages

* Description: The UCLA Sounds of the World's Languages are
available for Macintosh users (no DOS-based system is currently
available). The sounds are stored in a Hypercard database
developed at the UCLA Phonetics Laboratory. The aim is to
illustrate and teach about the range of sounds used in human
languages with material on more than 80 languages. The set
demonstrates particular highlights of the sound systems focusing
especially on rarer sounds that students may not otherwise have a
chance to hear from a native speaker. The recordings are based on
the archives of recordings collected at UCLA, with additional
contributions from outside collaborators. All the languages can be
accessed from the list of language names, or by clicking on the
language name in a set of maps. Support for part of this work was
provided by NSF. The database currently includes examples of
languages from Agul and Akan to Zulu.
* Availability: 15 DSDD disks, requiring about 35 MB of disk space
when expanded. Available for $50 (individuals) or $100
(institutions). Prepayment in US dollars (checks or international
money orders payable to "UC Regents") must accompany all orders.
* Contact: The UCLA Phonetics Laboratory
Linguistics Department, UCLA, Los Angeles, CA 90095-1543
Tel: (310) 825-1254
E-mail: oldfogey@ucla.edu

John Esling's "IPA Labels"

* Description: A HyperCard stack which is available for free or for
a nominal fee.
* Contact: John Esling can be reached by email: pdb@uvvm.uvic.ca.

Linguistic Data Consortium (LDC)

The LDC was established to broaden the collection and distribution of
speech and natural language databases for the purposes of research
and technology development in automatic speech recognition, natural
language processing and other areas where large amounts of linguistic
data are needed. Detailed information on the LDC is now available on
the WWW: http://www.ldc.upenn.edu/. The LDC WWW server provides
information on membership agreements, license agreements, and
summaries of speech and text corpora available.

Speech Corpora

* TIMIT Acoustic-Phonetic Continuous Speech Corpora and NYNEX
Telephone Version of TIMIT Corpus (NTIMIT)
* Resource Management Corpora
* Air Travel Information System (ATIS) Corpora (multiple)
* ARPA Continuous Speech Recognition Corpora (WSJ etc)
* Switchboard Corpus of Recorded Telephone Conversations and
Switchboard Corpus Excerpts (Credit Card Conversations)
* Texas Instruments 46-Word Speaker-Dependent Isolated Word Corpus
(TI46)
* Texas Instruments Speaker-Independent Connected-Digit Corpus
(TIDIGITS)
* Road Rally Conversational Speech Corpus
* HCRC Map Task Corpus
* Air Traffic Control Corpus (ATC0)
* SPIDRE Speaker Identification Corpus
* YOHO Speaker Verification Corpus
* OGI Multi-Language Corpus and OGI Spelled and Spoken Telephone
Corpus
* BRAMSHILL
* MACROPHONE
* King Corpus for Speaker Verification Research
* WSJCAM0: Cambridge Read News Corpus
* TRAINS Spoken dialog corpus
* NYNEX PhoneBook Database
* Frontiers in Speech Processing

Text Corpora

* Association for Computational Linguistics Data Collection
Initiative (ACL/DCI)
* The Penn Treebank Project - Release 2
* TIPSTER Information Retrieval Text Research Collection
* United Nations Parallel Text Corpus (English, French, Spanish)
* Japanese Language Financial News
* European Corpus Initiative-1

Lexical Databases

* CELEX Lexical Database
* COMLEX: COMmon LEXical Database of English (English syntax and
pronunciation)

Contact information:

Linguistic Data Consortium
3615 Market Street, Suite 200, Philadelphia, PA, 19104-2608, USA.
Phone: +1 (215) 898-0464 Fax: +1 (215) 573-2175
e-mail: ldc@ldc.upenn.edu
WWW: http://www.ldc.upenn.edu/

NOISEX-92

* Description: Database of recordings of various noises, available
on 2 CDROMs. Some material from the same source is available by
anonymous ftp in the IEEE's Signal Processing Information Base.
The samples include:
+ Voice babble
+ Factory noise
+ HF radio channel noise, pink noise, white noise
+ Various military noises; fighter jets (Buccaneer, F16),
destroyer noises (engine room, operations room), tank noise
(Leopard, M109), machine gun
+ Volvo 340
* Availability 1: The cost of this database is 135 Pounds Sterling
for the set of two CD-ROMs. Send payment with order to:
The Speech Research Unit,
Ex1, DRA Malvern, St.Andrew's Road,
Malvern, Worcestershire, WR14 3PS, UK
Tel +44-684-894074 Fax +44-684-894384
Note: The supply of CD-ROMs is limited, so please check that they
are still available before placing an order. The only acceptable
methods of payment are cheques (from the UK only) or bank drafts
in Pounds Sterling drawn on a UK bank. They should be made payable
to: Public Sub Account HMG 4768.
* Availability 2: Information on how to obtain a copy of the NATO
RSG.10 NOISE-ROM-0 can be obtained from the DRA Speech Research
Unit (address above) or from:
Dr. Herman Steeneken,
TNO Institute for Perception,
P.O. Box 23, 3769 ZG Soesterberg,
The Netherlands.
* Availability 3 (WWW): Examples of the NOISEX database are
available on the Rice University Digital Signal Processing (DSP)
group home page. (Note: the files are large, >20 MB.)
http://spib.rice.edu/spib/select_noise.html

Oxford Acoustic Phonetic Database

* Available on compact disc, from J. Pickering and B. Rosner. It
contains data on vowel-consonant and consonant-vowel combinations
in both stressed and unstressed locations. The languages covered
include French, German, Hungarian, Italian, Japanese, British
English, Spanish and English. For further information write to:

Electronic Publishing, Oxford University Press, Walton Street,
Oxford OX2 6DP, UK.
The ISBN is 0-19-268086-2.
* Contact:

Prof. B. Rosner
Dept. of Experimental Psychology
South Parks Rd, Oxford, OX1 3UD, UK
email: burton.rosner@wolfson.ox.ac.uk

Phonemic Samples

* Some basic data. The following ftp sites have samples of English
phonemes (American accent, I believe) in Sun audio format files.
See Question 1.8 for information on audio file formats.

ftp://sounds.sdsu.edu/.1/phonemes: This ftp site appears to be
obsolete. Does anyone know a new address?

ftp://phloem.uoregon.edu/pub/Sun4/lib/phonemes: There appears
to be some config problem with this ftp server.

ftp://sunsite.unc.edu/pub/multimedia/sun-sounds/phonemes
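Sun audio (.au) files begin with six big-endian 32-bit unsigned
integers: the magic number ".snd", the offset to the audio data, the
data size in bytes (0xFFFFFFFF if unknown), an encoding code (for
example 1 = 8-bit mu-law, 2 = 8-bit linear PCM, 3 = 16-bit linear PCM,
27 = 8-bit A-law), the sample rate and the number of channels. The
sketch below reads only that fixed header; the names AuHeader and
parse_au_header are hypothetical, and any annotation bytes between the
header and data_offset are simply skipped. (Python's old sunau module
did this, but it was removed in Python 3.13.)

```python
import struct
from typing import NamedTuple

AU_MAGIC = 0x2E736E64  # the ASCII bytes ".snd"
AU_ENCODINGS = {1: "8-bit mu-law", 2: "8-bit linear PCM", 3: "16-bit linear PCM", 27: "8-bit A-law"}


class AuHeader(NamedTuple):
    data_offset: int  # byte offset of the first audio sample
    data_size: int    # 0xFFFFFFFF means "unknown"
    encoding: int     # see AU_ENCODINGS
    sample_rate: int  # samples per second
    channels: int


def parse_au_header(raw: bytes) -> AuHeader:
    """Parse the fixed 24-byte header of a Sun .au file (all fields big-endian uint32)."""
    if len(raw) < 24:
        raise ValueError("too short for a Sun .au header")
    magic, offset, size, encoding, rate, channels = struct.unpack(">6I", raw[:24])
    if magic != AU_MAGIC:
        raise ValueError("not a Sun .au file (bad magic number)")
    return AuHeader(offset, size, encoding, rate, channels)


# Synthetic header for illustration: 8 kHz, mono, mu-law, 1000 bytes of audio.
example = struct.pack(">6I", AU_MAGIC, 24, 1000, 1, 8000, 1)
hdr = parse_au_header(example)
print(hdr.sample_rate, hdr.channels, AU_ENCODINGS.get(hdr.encoding, "other"))  # 8000 1 8-bit mu-law
```

For a real file, pass in its first 24 bytes and then seek to
data_offset to read the samples.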

The RELATOR project

* Description: RELATOR is a European-wide consortium of researchers
who, with the support of the European Commission, are striving to
establish a European repository of linguistic resources.
Linguistic resources comprise a variety of spoken and written
language materials, including lexicons, grammars, corpora, and
spoken language databases. RELATOR will ensure that the
requirements of the European language processing community receive
attention.
The RELATOR WWW pages provide information on the consortium. The
languages currently covered by the RELATOR consortium include
Danish, Dutch, English, French, German, Greek, Italian,
Portuguese, Spanish plus multilingual resources. The resources
include both text and speech.
* WWW: http://cristal.icp.grenet.fr/Relator/homepage.html

ShATR

* Description: Multi-simultaneous-speaker corpus available on one
CDROM. This specialised corpus is primarily intended to provide
acoustic material for studies in auditory scene analysis. However,
many researchers in the speech sciences, ranging from acoustics to
discourse analysis, may find it a valuable source of information.
The corpus has been transcribed and aligned at four different
levels of analysis. An overlap analysis between the individual
speaker channels and word counts are available. There is also a
general tool for accessing concurrent events in transcribed
multi-sound-source databases.
* Cost: 30 Pounds Sterling for one CD-ROM. Availability, licensing
and ordering information is provided on ShATR's home page.
* Examples: Samples of the ShATR database are available on ShATR's
home page and by anonymous ftp
ftp://ftp.dcs.shef.ac.uk/share/spandh/ShATR/
* Contact: Speech and Hearing Research Group
Department of Computer Science, University of Sheffield
Regents Court, 211 Portobello Street, Sheffield S1 4DP, U.K.
WWW:
http://www.dcs.shef.ac.uk/research/groups/spandh/pr/ShATR/ShATR.html

University of Victoria Phonetic Database

* Platform: Computerized Speech Lab CSL4300, MultiSpeech on Winxx or
Win95 with any multimedia card, or a SoundBlaster16 option with
support from the PDBAUDIO program.
* Description: Phonetic database consisting of proprietary-format
digitized speech samples from 45 world languages on CDROM. The
CDROM is supported by hardcopy documentation containing the
phonetic inventory of each language, and the transcription and
orthography of each digitized speech sample. The PDB depicts and
compares the sounds, symbols and conventions of transcription
used by these languages. More information is available from the
STR web site.
* Contact: Speech Technology Research Ltd.,
Suite B - 1623 McKenzie Avenue, Victoria, B.C. V8N 1A6, Canada
Ph: +1-250-477-0544
Email: products@speechtech.com
WWW: http://www.speechtech.com/home/speechtech/

___________________________________________________________________________

Q1.8: Speech File Formats and Conversion

Q2.7 of this FAQ has information on mu-law coding.

A very good and very comprehensive list of audio file formats is
prepared by Guido van Rossum. The list is posted regularly to comp.dsp
and alt.binaries.sounds.misc, amongst others. It includes information
on sampling rates, hardware, compression techniques, file format
definitions, format conversion, standards, programming hints and lots
more. It is also available by ftp from

WWW: ftp://ftp.cwi.nl/pub/audio/index.html

Text: ftp://ftp.cwi.nl/pub/audio/AudioFormats.part1,2

A useful source of software (Sox, ulaw conversion, SoundKit etc.) is:

http://peace.wit.com/sounds/SoundConversion/
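A conversion that comes up constantly with 8 kHz telephone-quality
speech (for instance .au files with encoding 1) is expanding 8-bit
mu-law codes to 16-bit linear PCM. Each mu-law byte is stored
bit-inverted and holds a sign bit, a 3-bit segment (exponent) and a
4-bit mantissa. The sketch below follows the bias-of-132 (0x84) scheme
used in widely circulated public-domain G.711 code; the function names
are hypothetical, and tools such as Sox will do this conversion for
you.

```python
def ulaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 mu-law code to a 16-bit signed linear sample."""
    u = ~code & 0xFF            # mu-law bytes are stored with all bits inverted
    sign = u & 0x80             # top bit set means a negative sample
    exponent = (u >> 4) & 0x07  # 3-bit segment number
    mantissa = u & 0x0F         # 4-bit step within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84  # 0x84 = 132, the encoder bias
    return -magnitude if sign else magnitude


def ulaw_bytes_to_linear(data: bytes) -> list[int]:
    """Decode a buffer of mu-law bytes, e.g. the data section of an 8 kHz .au file."""
    return [ulaw_to_linear(b) for b in data]


print(ulaw_bytes_to_linear(bytes([0xFF, 0x80, 0x00])))  # [0, 32124, -32124]
```

The decoded values span -32124 to +32124, and mu-law has two codes
for zero (0xFF and 0x7F).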

___________________________________________________________________________

Q1.9: Speech Laboratory Environments and Audio Editors

First, what is a Speech Laboratory Environment? A speech lab is a
software package which provides the capability of recording, playing,
analysing, processing, displaying and storing speech. Your computer
will require audio input/output capability. The different packages
vary greatly in features and capability, so it is best to know what
you want before you start looking around.

Most general-purpose audio editing packages will be able to process
speech but do not necessarily have specialised capabilities for
speech (e.g. formant analysis).

The following article provides a good survey.

* Read, C., Buder, E., & Kent, R. "Speech Analysis Systems: An
Evaluation" Journal of Speech and Hearing Research, pp 314-332,
April 1992.

The following is a list of the speech labs described in the FAQ.

* CSRE: Computerized Speech Research Environment
* DADiSP from DSP Development Corporation
* Entropic Signal Processing System (ESPS) and Waves
* GoldWave
* Kay Elemetrics Computer Speech Lab
* Khoros
* Matlab plus Signal Processing Toolbox
* MacSpeech Lab II
* N!Power
* OGI Speech Tools
* Ptolemy
* Quadravox Speech Processing Products - Qbox
* Speech Filing System (SFS)
* Signalyze 3.0 from InfoSignal
* SoundScope

CSRE: Computerized Speech Research Environment

* Platform: DOS
* Description: CSRE (pronounced "Caesar") is a speech processing
system for the PC. It provides
+ Signal recording and playback
+ Signal editing
+ Pitch, spectral and formant analysis
+ Speech synthesis with an implementation of the Klatt-1980
parametric speech synthesizer
* Requirements: PC compatible (80486DX), 1 Meg RAM (recommend 4 Meg),
DOS 3.2 (recommend 6.22), VGA graphics (640x480; 16 colors), 30 Meg
of hard disk space (5 Meg for CSRE plus space for audio
recordings), and a supported audio card.
* Cost: See AVAAZ WWW Pages
* Contact: AVAAZ Innovations Inc.
P.O.Box 8040, 1225 Wonderland Rd. N, London, Ontario, CANADA, N6G
2B0
Ph: +1-519-472-7944, Fax: +1-519-472-7814
Email: info@avaaz.com
WWW: http://www.icis.on.ca/homepages/avaaz/
* Note: See also the CSRE entry in Q5.5 on speech synthesisers.
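
CSRE's synthesizer is described as an implementation of the Klatt (1980) parametric synthesizer. The basic building block of that design is a second-order digital resonator, typically one per formant. The following is a minimal sketch of such a resonator. It uses the standard Klatt coefficient formulas, where A = 1 - B - C gives unity gain at 0 Hz. The class and its names are illustrative and are not taken from CSRE.

```python
import math
from typing import List, Sequence


class Resonator:
    """Second-order digital resonator in the style of Klatt (1980).

    y[n] = A*x[n] + B*y[n-1] + C*y[n-2]
    """

    def __init__(self, freq_hz: float, bw_hz: float, fs_hz: float) -> None:
        t = 1.0 / fs_hz
        self.c = -math.exp(-2.0 * math.pi * bw_hz * t)
        self.b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
        self.a = 1.0 - self.b - self.c  # normalizes the gain at 0 Hz to 1
        self.y1 = 0.0  # y[n-1]
        self.y2 = 0.0  # y[n-2]

    def step(self, x: float) -> float:
        """Filter one input sample."""
        y = self.a * x + self.b * self.y1 + self.c * self.y2
        self.y2, self.y1 = self.y1, y
        return y

    def process(self, xs: Sequence[float]) -> List[float]:
        """Filter a whole sequence, keeping state between samples."""
        return [self.step(x) for x in xs]
```

A crude vowel-like filter can be built by cascading several resonators, one per formant with user-chosen frequencies and bandwidths, and driving the cascade with an impulse train at the desired pitch.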

DADiSP from DSP Development Corporation

* Platform: Windows and various Unix
* Description: DADiSP is designed for scientists and engineers to
collect, analyze, and display scientific and technical data.
Packages available include AdvDSP, Controls, DADiMP, Filters,
GPIBLab, NeuralNet, and Stats.
A description of the application of DADiSP to speech processing is
provided on the DSP Development Corporation WWW site.
Detailed product information is available on the DSP Development
Corporation WWW site and by filling out a WWW form.
* Cost: Unknown
* Availability: See the DSP Development Corporation WWW site
A free, fully featured demo of DADiSP 4.0 is available from the
DSP Development Corporation WWW site and can be mailed on floppy
disk.
A special Student Edition of DADiSP is available for free.
* Contact: DSP Development Corporation
One Kendall Square, Cambridge, MA 02139, USA
Ph: (617) 577-1133 Fax: (617) 577-8211
EMail: info@dadisp.com
WWW: http://www.dadisp.com/

Entropic Signal Processing System (ESPS) and Waves

* Platform: Range of Unix platforms.
* Description: ESPS is a comprehensive set of speech
analysis/processing tools for the UNIX environment. The package
includes UNIX commands and a comprehensive C library (which can
be accessed from other languages). Waves is a graphical front-end
for speech processing. Speech waveforms, spectrograms, pitch
traces, etc. can be displayed, edited and processed in X Windows and
OpenWindows (versions 2 & 3). Waves also includes a signal
labelling utility which provides multiple feature labelling and
useful features for fast labelling of large speech databases.
Other Entropic products are HTK (see Q6.5) and TrueTalk (see
Q5.5).
* Misc: A more detailed description is provided on the Entropic WWW
pages (http://www.entropic.com/esps.html).
* Cost: On request.
* Contact: Entropic Research Laboratory, Washington Research Laboratory
600 Pennsylvania Ave, S.E. Suite 202, Washington, D.C. 20003
(202) 547-1420
email: info@entropic.com
WWW: http://www.entropic.com/

GoldWave

* Platform: Windows
* Description: GoldWave is a digital audio editor for Microsoft
Windows. It features realtime amplitude/spectrum oscilloscopes,
large file editing, effects, and support for a wide variety of
sound formats.
+ Editing of multiple waveforms and large waveforms
+ Realtime amplitude/spectrum oscilloscopes
+ Resizable device controls window for accessing audio devices
+ Realtime fast forward and rewind playback
+ Effects: distortion, Doppler, echo, filter, mechanize,
offset, pan, volume shaping, invert, resample, transpose, etc.
+ Multiple file formats and conversions: .WAV, .AU, .IFF, .VOC,
.SND, .MAT, .AIFF, and raw data
+ CD-ROM controls window
More information is available on the GoldWave home page.
* Cost: Shareware
* Availability: Through the GoldWave home page:
http://web.cs.mun.ca/~chris3/goldwave/goldwave.html
* Contact: Chris Craig: chris3@cs.mun.ca

Kay Elemetrics CSL (Computer Speech Lab) 4300

* Platform: Minimum IBM PC-AT compatible with extended memory (min
2MB) and at least VGA graphics. More powerful machines are
preferable.
* Description: Speech analysis package, with optional separate LPC
program for analysis/synthesis. Uses its own file format for data,
but has some ability to export data as ASCII. The main
editing/analysis program (but not the LPC part) has its own macro
language, making it easy to perform repetitive tasks.
Options - more information on the Kay Elemetrics Corp. WWW site:
+ Multi-Dimensional Voice Program (MDVP)
+ Voice Range Profile (Phonetograph)
+ Real-Time Spectrogram
+ Sona-Match
+ Palatometer Database
+ IPA Transcription Tutorial
+ Delayed Auditory Feedback (DAF)
+ Disordered Voice Database
+ Auditory Perception Program and Database
+ Motor Speech Profile Program
+ CSL-Pitch
+ Real-Time EGG Processing
+ Signal Enhancement in Noise Program
+ Synthesis Program
+ DAT Interface and Four Channel Input
+ Phonetic Database
+ Direct-to-Disk Program
+ Programmers Kit
+ Condenser Microphone
+ Multi-Speech
* Cost: Contact Kay Elemetrics Corp.
* Contact: Kay Elemetrics Corp.
2 Bridgewater Lane, Lincoln Park, NJ 07035, USA
Ph: +1-201-628-6200, Fax: +1-201-628-6363
Toll free tel. 1-800-289-5297
[WWW: http://www.kayelemetrics.com/ - available soon]

Khoros

* Platform: Any Unix - source code available.
* Description: Khoros is a technical computing environment for image
and signal processing, visual programming and software
development.
* Price: On request.
* Availability: Khoral Research Inc.
6001 Indian School Rd. NE Suite 200, Albuquerque, NM 87110, USA
Ph: (505)837-6500, Fax: (505) 881-3842
Email: info@khoral.com
ftp: ftp://ftp.khoral.com/
WWW: http://www.khoral.com/

Matlab plus Signal Processing Toolbox

* Platform: Wide range
* Description: Matlab (MATrix LABoratory) is a technical computing
environment for numerical computation and visualization based on a
matrix oriented, interpreted programming language. The programming
environment provides support for the development of customized
operations, along with debugging facilities and a graphical user
interface toolkit. Audio output is provided.
A specialised Signal Processing Toolbox is available which
provides many functions which are useful for speech analysis. It
includes filter design, spectral estimation, statistical signal
processing, waveform generation, and signal and spectrogram
display.
A specialised Auditory Toolbox is available which contains
functions useful to people interested in auditory/cochlear models.
A more detailed description is given in Q1.11.
* Price: On request.
* Contact: The Math Works Inc. 24 Prime Park Way, Natick, MA
01760-1500 USA
Ph: 1-508-653 1415 Fax: 1-508-653 6284
Email: info@mathworks.com
ftp: ftp://ftp.mathworks.com
WWW: http://www.mathworks.com/

MacSpeech Lab II (MSL II)

* Platform: Macintosh
* Description: A sound acquisition and analysis package for Macs. MSL II
delivers the most common functions for speech analysis (FFTs,
LPCs, f0 extraction, etc.) & produces grayscale spectrographic
displays. Can be used for various speech technology and phonetic
training tasks.
* Hardware: Requires MacADIOS ("Macintosh Analog/Digital
Input/Output System") hardware for speech I/O at 12/16 bits.
* Misc: Software no longer updated by GW Instruments; MSL
soft/hardware will not perform input/output on Quadras, for
example, though analysis seems fine. Known to operate properly on
systems as high as the IIcx and IIfx.
* Availability: MSL has been replaced by SoundScope; see the
SoundScope entry for more detail.
* Contact: GW Instruments
35 Medford Street, Somerville, MA 02143, USA
Phone: (617) 625-4096 Fax: (617) 625-1322

N!Power

* Platform: SUN, DEC and HP workstations.
* Description: An object-oriented software package with a Motif GUI
and a range of functionality for data analysis/editing,
signal analysis, speech processing, real-time A/D and D/A, and
2D/3D interactive graphics. N!Power replaces ILS.
N!Power can provide a Block Diagram user interface, menus,
pop-ups, and a high-level IEEE standard symbolic scripting
language. You can customize the blocks, menus and pop-ups with
mouse point-and-click operations.
* Contact: Signal Technology, Inc.
104 W. Anapamu, Suite J, Santa Barbara, CA 93101-3126
Phone: +1-805-899-8300, Fax: +1-805-899-4344
Email: stisales@signal.com
WWW: http://www.silcom.com/~stilarry/

OGI Speech Tools

* Developers: the Center for Spoken Language Understanding
(CSLU) at the Oregon Graduate Institute of Science and Technology
(Portland, Oregon)
* Platform: Unix
* Description: The OGI Speech Tools include:
+ An X Windows display tool (LYRE) for the time-synchronous display
of (a) the speech signal, (b) spectrograms, (c) phoneme labels,
and other information.
+ A Neural Network (NOPT) training package.
+ A set of C library routines (LIBNSPEECH) for the manipulation
of speech data, including: (a) PLP analysis, (b) RASTA-PLP
analysis, (c) linear predictive coding, (d) mel cepstrum coding,
(e) fast Fourier transform.
+ A set of utilities for converting file formats such as ADC,
NIST, mu-law, binary files, and ascii. Includes filtering.
+ A database utility (find_phone) to automate speech database
related enquiries. It allows the user to specify a particular
label or set of labels in a given context, display all
occurrences of the label, and relabel the occurrences if
desired.
+ A vector quantizer based on the Linde-Buzo-Gray (LBG)
algorithm.
+ A set of Perl scripts which have been used mainly to automate
the use of the OGI Speech Tools.
+ Man pages for all routines and programs developed, as well as
a user manual in both PostScript and TeX format.
* Misc: Software is written in ANSI C.
* Contact: Email: tools@cse.ogi.edu
WWW: http://www.cse.ogi.edu/CSLU/
ftp: ftp://speech.cse.ogi.edu/pub/tools/
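
The LBG algorithm listed among the OGI tools designs a vector-quantizer codebook by repeatedly splitting codewords and refining them with nearest-neighbour (k-means style) iterations. The following is a minimal pure-Python sketch. It assumes a squared Euclidean distortion measure, an additive +/-eps splitting perturbation, and a codebook size that is a power of two. The function name and parameters are hypothetical and are not the interface of the OGI package.

```python
from typing import List, Sequence

Vector = List[float]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (the distortion measure assumed here)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _centroid(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty set of vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def lbg(data: Sequence[Sequence[float]], size: int, eps: float = 0.01,
        tol: float = 1e-6, max_iter: int = 100) -> List[Vector]:
    """Design a codebook with `size` codewords (a power of two) by LBG splitting."""
    if size < 1 or size & (size - 1):
        raise ValueError("size must be a power of two")
    codebook = [_centroid(data)]
    while len(codebook) < size:
        # Split every codeword into two copies perturbed by +eps and -eps.
        codebook = [[c + s for c in cw] for cw in codebook for s in (eps, -eps)]
        prev = float("inf")
        for _ in range(max_iter):
            # Nearest-neighbour partition and its total distortion.
            cells: List[List[Sequence[float]]] = [[] for _ in codebook]
            total = 0.0
            for v in data:
                k = min(range(len(codebook)), key=lambda j: _dist2(v, codebook[j]))
                cells[k].append(v)
                total += _dist2(v, codebook[k])
            # Centroid update; an empty cell keeps its previous codeword.
            codebook = [_centroid(cell) if cell else cw
                        for cell, cw in zip(cells, codebook)]
            # Stop when the relative drop in distortion becomes small.
            if prev - total <= tol * total:
                break
            prev = total
    return codebook
```

For speech, each training vector would typically be one frame of acoustic features, and the codebook index then replaces the frame in later processing.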

Ptolemy

* Platform: Sun SPARC, DecStation (MIPS), HP (hppa).
* Description: Ptolemy provides a highly flexible foundation for the
specification, simulation, and rapid prototyping of systems. It is
an object oriented framework within which diverse models of
computation can co-exist and interact. Ptolemy can be used to
model entire systems.
Ptolemy has been used for a broad range of applications including
signal processing, telecommunications, parallel processing,
wireless communications, network design, radio astronomy, real
time systems, and hardware/software co-design. Ptolemy has also
been used as a lab for signal processing and communications
courses. Ptolemy has been developed at UC Berkeley over the past 3
years. Further information, including papers and the complete
release notes, is available from the FTP site.
* Cost: Free
* Availability: The source code, binaries, and documentation are
available by anonymous ftp from

ftp://ptolemy.berkeley.edu/pub/README

Quadravox Speech Processing Products - Qbox

* Platform: Windows 3.1, Windows 95
* Description: Qbox comprises a Windows-based LPC-12 analysis and
editing system and a parallel-port driven programmer for
one-time-programmable TI TSP50P11 synthesis chips. The analysis
software uses standard 11025 Hz, 16-bit mono .wav files for
input and allows graphical editing of the coded pitch, gain, and
reflection coefficients. It can also be used to define
concatenation sequences of individual phrases. Data rates depend
on the original sound, but are typically below 2000 bits/s. The
processed data can then be merged with synthesis and control
routines and programmed into the TI synthesizer. The
Quadravox-developed synthesis routine accepts run-time
modifications of pitch and frame-length (speed), as well as
externally defined concatenation sequences. The synthesis chip
interface can be defined as a matrixed-keyboard drive, a simple
parallel control, or a serial bus control supporting up to 31
individually addressed devices and modules.
* Cost: $90-$150 depending on options selected.
* Contact: Quadravox, Inc.
1701 N. Greenville Ave., Suite 608, Richardson, TX, 75081 USA
Ph: 214-669-4002
Email: info@quadravox.com
WWW: http://www.quadravox.com/
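
Qbox edits the pitch, gain and reflection coefficients produced by an LPC-12 analysis. The following sketch illustrates where reflection coefficients come from. It computes the autocorrelation of one analysis frame and runs the Levinson-Durbin recursion, assuming the predictor convention x[n] ~ sum_k a[k] x[n-k]. Sign conventions for reflection coefficients differ between texts and chips, and a synthesis chip would also quantize them. This is therefore a hedged illustration, not Quadravox's or TI's code, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def autocorrelation(frame: Sequence[float], order: int) -> List[float]:
    """r[lag] = sum_n frame[n] * frame[n + lag] for lag = 0..order."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(order + 1)]


def levinson_durbin(r: Sequence[float], order: int
                    ) -> Tuple[List[float], List[float], float]:
    """Solve the LPC normal equations for the predictor x[n] ~ sum_k a[k] x[n-k].

    Returns (a[1..order], reflection coefficients k[1..order],
    final prediction-error energy).
    """
    if r[0] <= 0.0:
        raise ValueError("frame has zero energy")
    a = [0.0] * (order + 1)  # a[0] is unused
    refl: List[float] = []
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        refl.append(k)
        # Update the predictor coefficients from order i-1 to order i.
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], refl, err
```

For a 12th-order analysis, one would call levinson_durbin(autocorrelation(frame, 12), 12) on each windowed frame. Reflection coefficients computed this way from a non-silent frame should all have magnitude below 1, which is what keeps the all-pole synthesis filter stable.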

Speech Filing System (SFS)

* Platform: Unix and DOS
* Description: SFS provides a computing environment for conducting
speech research. It comprises software tools, file and data
formats, subroutine libraries, graphics, standards and special
programming languages. It performs standard operations such as
recording, replay, waveform editing and labelling, spectrographic
and formant analysis and fundamental frequency estimation. For
more information, see
ftp://ftp.phon.ucl.ac.uk/pub/sfs/README
* Misc: SFS is copyrighted by University College London, but is
currently supplied free of charge to research establishments for
non-profit use.
* Availability: SFS source code is available by anonymous FTP from:
ftp://ftp.phon.ucl.ac.uk/pub/sfs/
* Contact: Mark Huckvale
University College London, Gower Street, London WC1E 6BT, UK
Email: SFS@phonetics.ucl.ac.uk
ftp: ftp://ftp.phon.ucl.ac.uk/pub/sfs/
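
SFS, like several packages in this list, offers fundamental frequency estimation. One of the simplest approaches picks the lag with the highest normalized autocorrelation within a plausible pitch range. The following sketch assumes a single analysis frame, a 60-400 Hz search range and a voicing threshold of 0.3, all of which are illustrative choices. The names estimate_f0 and sine_frame are hypothetical. This is not the algorithm SFS uses, and practical pitch trackers add refinements (e.g. against octave errors) that are omitted here.

```python
import math
from typing import List, Optional, Sequence


def estimate_f0(frame: Sequence[float], fs: float, fmin: float = 60.0,
                fmax: float = 400.0, voicing_threshold: float = 0.3
                ) -> Optional[float]:
    """Autocorrelation F0 estimate in Hz, or None if the frame seems unvoiced."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove any DC offset
    energy = sum(s * s for s in x)         # autocorrelation at lag 0
    if energy == 0.0:
        return None
    lag_min = max(1, int(fs / fmax))       # shortest period considered
    lag_max = min(n - 1, int(fs / fmin))   # longest period considered
    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag == 0 or best_r < voicing_threshold:
        return None
    return fs / best_lag


def sine_frame(f0: float, fs: float, n: int) -> List[float]:
    """Synthetic test frame: a pure tone at f0 Hz."""
    return [math.sin(2.0 * math.pi * f0 * i / fs) for i in range(n)]
```

Because the normalized autocorrelation also peaks at multiples of the period, this simple estimator can report half or double the true F0 on real speech.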

Signalyze 3.0 from InfoSignal

* Platform: Macintosh
* Description: Signalyze is an interactive program for the analysis
of speech and other acoustic material. Signalyze's basic concept
revolves around the display of up to 100 signals in HyperCard
fashion. The program offers a range of signal editing features,
spectral analysis tools, manual scoring tools, pitch extraction
routines, signal manipulation tools, and extensive input-output
capacity. It also has a range of capabilities for creating,
editing and manipulating label files with flexibility in labelling
format.
Signalyze handles the following file formats: Signalyze, MacSpeech
Lab, AudioMedia, SoundDesigner II, SoundEdit/MacRecorder,
SoundWave, sound resource formats, and ASCII-text.
Sound I/O: direct sound input via Apple 8- or 16-bit sound input;
sound output via Macintosh 8- or 16-bit sound.
* Compatibility: MacPlus and higher. Takes advantage of large
screens, multiple screens and 16/256 color/grayscales. System 7.0
compatible. Runs in background with adjustable priority.
* Misc: Manuals and tutorials included (250 pp.). Program is
switchable to English, French, and German. For more information
and demo:
WWW: http://www.agoralang.com:2410/pubdirsoftware.html
WWW: http://www.agoralang.com:2410/signalyze.html
Gopher: gopher://uldns1.unil.ch:70/11/unilgophers/gopher_lett/LAIP
* Cost: Individual license US$450, departmental license US$750,
organisational license US$1250, plus shipping. Upgrades from
version 2.0 are available.
* Contact: The Americas: Network Technology Corporation
91 Baldwin St., Charlestown, MA 02129, USA
Phone: +1-617-241-9205, Fax: +1-617-241-5064
---
Elsewhere: InfoSignal Inc.
C.P. 73, 1015 LAUSANNE, Switzerland,
Fax: +41 21 691-1372,
Email: 76357.1213@COMPUSERVE.COM

SoundScope

* Platform: Macintosh: 68K and PowerPC native
* Description: The SoundScope product family is used primarily in
speech teaching & research, with some applications in animal
sounds, forensics, and general acoustic analysis. It can record,
view, analyze, play, copy, paste, store and print sound waveforms.
Analysis functions include spectrogram, fundamental frequency
(Fo), Linear Predictive Coding (LPC) including formant tracking,
LPC residual, jitter (pitch perturbation), shimmer (amplitude
perturbation), HNR, frequency spectrum, spectral slice, envelope,
energy and zero crossing. Includes limited built-in filtering,
runs any filter created with WLFDAP. An integrated text editor
stores notes and calculation results. SoundScope lets you design
your own custom "instrument" screen, tasks (macros) and menus.
Supplied instruments include 1 channel analyser (dual snap, dual
time, spectrogram, spectrum), 2 channel analyser, segment
analyser, multi-channel recorder, etc.
* Note: Supersedes MacSpeech Lab II.
* Price: $490 to $4990, less educational discount
* Availability: In North America, directly from GW Instruments.
Contact the company for international distributors.
* Contact: GW Instruments
35 Medford Street, Somerville, MA 02143, USA
Ph: +1-617-625-4096, Fax: +1-617-625-1322
Email: info@gwinst.com
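
SoundScope reports jitter (pitch perturbation) and shimmer (amplitude perturbation). Packages define these measures in slightly different ways. One common "local" definition is the mean absolute difference between consecutive cycles divided by the mean over all cycles, expressed as a percentage. The following is a minimal sketch, assuming the per-cycle periods and peak amplitudes have already been extracted by a pitch tracker. The function names are hypothetical, and this is not necessarily SoundScope's exact formula.

```python
from typing import Sequence


def local_perturbation(values: Sequence[float]) -> float:
    """Mean absolute difference of consecutive values relative to their mean, in %."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


def jitter_percent(periods_s: Sequence[float]) -> float:
    """Local jitter from a sequence of pitch periods (seconds)."""
    return local_perturbation(periods_s)


def shimmer_percent(peak_amplitudes: Sequence[float]) -> float:
    """Local shimmer from a sequence of per-cycle peak amplitudes."""
    return local_perturbation(peak_amplitudes)
```

For example, periods alternating between 10 ms and 11 ms give a local jitter of about 9.5%, while a perfectly steady synthetic vowel gives 0% for both measures.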

___________________________________________________________________________

Q1.10: Speech Research Sites

Rather than trying to list the places around the world that perform
speech research, this FAQ lists sites on the WWW where other
comprehensive lists are maintained. Try the following:

Shikano's WWW site on Speech and Acoustics
http://www.aist-nara.ac.jp/IS/Shikano-lab/database/internet-resource/e-www-site.html
Lists of speech research sites by country. Currently includes
around 100 sites. The list of Japanese sites is particularly
comprehensive.

Mambo Speech Research List
http://mambo.ucsc.edu/psl/speech.html
Lists about 50 speech research sites and related information
sources. Very nice presentation!

ESCA: European Speech Communication Association
http://ophale.icp.grenet.fr/esca/labos.html
Links to around 15 European speech research sites and around 15
related sources of information.

Institute for Perception Research: Speech on the Web
http://www.tue.nl/ipo/hearing/webspeak.htm
Jan Roelof de Pijper at the Institute for Perception Research
has a long list of research sites plus links to lots of other
speech material on the WWW.

Russ Wilcox's list of Commercial Speech Recognition
http://www.tiac.net/users/rwilcox/speech.html
Links to information on speech technology vendors, speech
research labs, speech resources, on-line demos and more.

Speech Groups List: Leeds University Cognitive Psychology Research Group
http://lethe.leeds.ac.uk/research/cogn/speechlab/other.html
List of about 25 research sites.

Institute of Phonetic Sciences, Amsterdam
http://fonsg3.let.uva.nl/Other_pages.html#Phonetics
Good list of European sites.

Speech and Hearing Research Group, University of Sheffield, UK
http://www.dcs.shef.ac.uk/research/groups/spandh/world/misclinks.html
Links to sites in the UK, USA, Europe and the rest of the
world.

Duncan M. Forrest's Speech Recognition Resource List
http://www.skye.co.za/dmf/speech/

Most speech research sites have links to other speech research sites
somewhere in their WWW pages.

___________________________________________________________________________

Q1.11: Miscellaneous Software and Resources

Speech Interface Standards: APIs etc

* ASAPI: Advanced Speech API (AT&T)
* SAPI: Microsoft Windows Speech API
* SRAPI: Speech Recognition API
* TAPI: Microsoft Windows Telephony API

Network "Phone" Software

* CUSeeMe
* CyberPhone
* DigiPhone
* InterFACE from Hijinx
* FAQ: How can I use the Internet as a telephone?
* Nautilus: Secure Computer Telephony
* NEVOT (1.4v) from AT&T BL
* PGPfone
* Speak Freely
* Internet Phone from VocalTec
* WebPhone
* WebTalk

Audio Processing Software

* AF version AF3R1
* Voice E-Mail from Bonzi Software
* MicNotePad Recording Software for Macs
* MixViews
* Network Audio System Release 1.1
* NIST Software - SPHERE and SCORE
* Sound Processing Kit
* TCPplay

Human Audio Perception

Other useful information on Auditory Modeling can be found in

Malcolm Slaney's home page
http://www.interval.com/~malcolm/

Martin Cooke's home page
Speech and Hearing Research Group, Dept of Computer Science,
University of Sheffield, UK.
http://www.dcs.shef.ac.uk/~martin/

* Auditory Modeller 1
* Auditory Modeller 2
* Auditory Toolbox for Matlab
* Human Audio Perception Document

Dictionaries and other Lexical Tools

* BEEP dictionary
* CMU dictionary
* CUVOLAD dictionary (Oxford Dictionary)
* Comprehensive Word List
* EAT: Edinburgh Associative Thesaurus
* Homophone List
* Moby Lexical Resources
* MRC Psycholinguistic Database
* WordNet
* Dictionaries on the WWW

Phonetic Fonts and Phonetic Samples

* International Phonetic Alphabet
* WWW: Phonetic Fonts and Examples Online
* Summer Institute of Linguistics IPA Fonts
* Phonetic Fonts for TeX and LaTeX
* Yamada Language Center

Subjective Evaluation of Speech Quality

Dynastat, Inc.
Speech Intelligibility Testing with Diagnostic Rhyme Test
(DRT), Modified Rhyme Test (MRT), Phonetically Balanced Word
Lists (PB), Diagnostic Medial Consonant Test (DMCT), Diagnostic
Alliteration Test (DALT), ICAO Spelling Alphabet Test (SpAT)
Speech Quality (Acceptability) Evaluation with Diagnostic
Acceptability Measure (DAM), Mean Opinion Score (MOS),
Degradation Mean Opinion Score (DMOS)
Contact: Dynastat, Inc.
2704 Rio Grande, Suite 4, Austin, TX 78705, USA
Ph: +1-512-476-4797, Fax: +1-512-472-2883
Email: sharpley@dynastat.com
WWW: http://www.bga.com/dynastat/

ANSI S3.2-1989: American National Standard for Measuring the
Intelligibility of Speech Over Communication Systems
Available from American National Standards Institute (ANSI)
Ph: +1-212-642-4900, Fax: +1-212-398-0023
WWW: http://www.ansi.org/

Louis Pols' List of References on Synthesis Development And Assessment

700 references:
http://www.itl.atr.co.jp/cocosda/output/synth.refs

Very Miscellaneous

* The vOICe
* The Learning Company's Language Training
* Wildfire - an Electronic Assistant

ASAPI: Advanced Speech API (AT&T)

* Description: The AT&T ASAPI Specification is an open,
cross-platform, easy-to-use speech API that can support speech
engines from AT&T and other vendors. ASAPI does not replace the
Microsoft Speech API, but it provides extensions and enhancements
to the Microsoft SAPI Specification including support for
SAPI-compatible applications.
The ASAPI Specification defines two types of interfaces. The
"ASAPI Extensions" interface provides extensions to the
MS-SAPI interface as well as C++ class encapsulation of SAPI
functionality. The "Visual ASAPI" interface provides an even
higher-level abstraction of SAPI/ASAPI low-level functionality
such that application developers can quickly and easily embed
speech technology into existing or new applications. Special
Purpose Recognizers are examples of Visual ASAPI interfaces which
integrate lower-level functionality that an application developer
can access via a simple interface.
* More information: Contact Jose Garcia at AT&T on (908) 957-5457 or
by email: jrg@att.com. For more information on the WATSON Speech
Engine which supports ASAPI and news about ASAPI please visit the
AT&T Advanced Speech Products Group home page or call
1-800-5-WATSON.

SAPI: Microsoft Windows Speech API

* Platform: Windows 95 and Windows NT 3.51
* Description: The Microsoft Speech API provides applications with
the ability to incorporate speech recognition (command & control or
dictation) or text-to-speech, using either C/C++ or Visual Basic.
SAPI follows the OLE Component Object Model (COM) architecture. It
is supported by many major speech technology vendors. The major
interfaces are:
+ Voice Commands: high level speech recognition API for command
and control.
+ Voice Text: simple high level text-to-speech API.
+ Speech Recognition: provides detailed control of a speech
recognition engine for both command-and-control and
dictation.
+ Text-to-Speech: provides detailed interface to a
text-to-speech engine for control of playback, speaking
style, voice quality etc.
+ Multimedia Audio Objects: audio I/O for microphones,
headphones, speakers, telephone lines, files etc.
* Availability: Download Microsoft's latest speech technology,
including the Microsoft Speech SDK, command and control

recognition, the Microsoft dictation research demonstration and
text-to-speech.
* More information: Email: MSSpeech@Microsoft.Com
WWW: The Microsoft Speech API
WWW: An Overview of the Microsoft Speech API
Documentation included with the Microsoft SDK.
* See also: TAPI: Microsoft Windows Telephony API

SRAPI: Speech Recognition API

* Platform: Various
* Description: The SRAPI provides support for speech recognition,
text-to-speech and other media playback. The SRAPI Committee is a
nonprofit Utah corporation with the goal of providing solutions
for interaction of speech technology with applications.
Core members include: Novell, Inc., Dragon Systems, IBM, Kurzweil
AI, Intel, and Philips Dictation Systems. Additional contributing
members include Articulate Systems, DEC, Kolvox Communications,
Lernout and Hauspie, Syracuse Language Systems, Voice Control
Systems, Corel, Verbex and Voice Processing Corporation.
* More information: WWW: http://www.srapi.com/
Email: For more information on the SRAPI Developer CD, send email
to srapi@srapi.com with Subject "SRAPI CD Info".

TAPI: Microsoft Windows Telephony API

* Description: TAPI allows applications to support telephone
communication. TAPI facilities include:
+ Connecting directly to a telephone network.
+ Automatic phone dialing.
+ Transmission of data (files, faxes, electronic mail).
+ Access to data (news, information services).
+ Conference calling.
+ Voice mail.
+ Caller identification.
+ Control of a remote computer.
+ Collaborative computing over telephone lines.
Windows 95 comes with a telephony application, DIALER.EXE, that
can dial voice calls, act as a proxy for applications making
simple telephony requests, and maintain a call log.
* More information: The Win32 Software Development Kit (SDK)
contains documentation, tools, and sample code for TAPI including
the Microsoft Telephony Programmer's Reference and the Microsoft
Telephony Service Provider Interface (TSPI) for Telephony.
WWW: Tapping in TAPI, TAPI White Paper
* See also: SAPI: Microsoft Speech API

CUSeeMe

* Platform: Macintosh and Windows
* Description: Cornell University software for audio and video
conferencing over the Internet.
* Requirements: Macintosh to RECEIVE video:
+ Macintosh platform with a 68020 processor or higher
+ System 7 or higher operating system
+ Minimum 16-level-grayscale (e.g. color)
+ IP network connection and MacTCP
+ Apple's QuickTime, to receive slides with SlideWindow
Macintosh to SEND video:
+ All the above plus
+ QuickTime installed
+ video digitizer (with vdig software) and camera
For Windows:
+ Video receive only 386SX, Video send & receive 386DX, Video
receive w/Audio 486SX, Video send & receive w/Audio 486DX
+ Windows 3.1 or higher running in Enhanced Mode.
+ Winsock
+ 256 color (8 bit) video driver
+ Video camera and a video capture board that supports
Microsoft Video For Windows
+ For audio: Windows Sound board that conforms to the Windows
MultiMedia Specification, speakers and a microphone
* Availability: Mac: http://cu-seeme.cornell.edu/get_cuseeme.html
Windows: http://cu-seeme.cornell.edu/PC.CU-SeeMeCurrent.html
* More information: http://cu-seeme.cornell.edu/

CyberPhone

* Platform: Sun Workstations running Solaris 2.x (SunOS 5.x)
* Description: Provides voice communications over the internet. Has
a graphical user interface and requires no additional hardware. An
optional centralized server system is available to make finding
and connecting to other users easier.
* Availability: a free demonstration is available by anonymous ftp

ftp://magenta.com/pub/cyberphone

* Contact: Email: cyberphone@magenta.com. More information is
available on the WWW: http://magenta.com/cyberphone/.

DigiPhone

* Platform: Macintosh, Windows 3.1 and Windows 95
* Description: DigiPhone provides two-way phone conversations by
dialing direct and over the Internet. Includes encryption for
privacy, caller ID, call screening, call timer, adjustable sound
and compression quality, messaging, and access to the Global
Directory providing a database of DigiPhone users.
+ DigiPhone v1.03: provides the standard features listed above.
+ DigiPhone Deluxe: provides the standard features of DigiPhone
v1.03 and adds conference calling, mute, speed dial, call
recording and playback, voice effects, customizations, and
internet tools.
+ DigiPhone for Mac: provides the standard features listed
above, plus cross-platform compatibility and mute.
* Requirements: DigiPhone v1.03 requires 386DX/33 or faster, 4MB
RAM, 9,600 bps modem, Sound Blaster 16 card (or any compatible
half or full duplex card), and a local internet connection with
SLIP or PPP. [Recommend 486DX/33 and 14,400 bps modem]
DigiPhone Deluxe has the same requirements as v1.03 but requires
486DX/33 or faster.
DigiPhone for Mac requires a 68030 33MHz, 68040 25MHz or PowerPC,
4 MB RAM, System 7.x, 14,400 bps modem or better, Sound Manager
3.x for System 7, microphone and speakers, MacTCP or Open
Transport and a local internet connection with SLIP or PPP.
* Price and Availability: Contact Third Planet Publishing for
pricing. Trial software is available from Third Planet Publishing.
Orders and Upgrades can be made on the Web. Also available through
many retailers.
* Contact: Third Planet Publishing, Inc.
17770 Preston Rd, Dallas, Texas 75252, USA
Ph: +1-972-733-3005, Fax: +1-972-380-8712
Email: 3pp@planeteers.com
WWW: http://www.planeteers.com/

InterFACE from Hijinx

* Platform: Windows
* Description: InterFACE provides voice communication on the
Internet through IRC (Internet Relay Chat) services.
* Requirements: Recommend a 486DX, 8 Meg RAM, Windows, VGA monitor and
a 16 bit sound card.
* Availability: Available on CD only for US$60.00, which includes
postage and handling.
Demo versions available from the HiJiNX WWW site.
* Contact: HiJiNX, Brisbane, Australia
Email: jester@hijinx.com.au
WWW: http://www.hijinx.com.au/

FAQ: How can I use the Internet as a telephone?

* Description: Kevin M. Savetz and Andrew Sears have prepared an FAQ
document titled _FAQ: How can I use the Internet as a telephone?_
The current document has the following sections:
+ Can I use the Internet as a telephone?
+ What do I need to call others on the Internet?
+ How does it work?
+ How do I make calls using a modem?
+ Is the sound quality as good as a regular telephone?
+ Is there a noticeable delay in hearing the other user?
+ What is the difference between full duplex and half duplex?
+ What is multicasting?
+ Can I talk to users of other phone software?
+ What software is available?
The section on available software covers the following:
+ Mac: Maven, NetPhone, CU-Seeme, PGPfone
+ Windows: Speak Freely, CU-Seeme, Internet Phone, Digiphone,
Internet Voice Chat, Internet Global Phone, Web Phone
+ UNIX: Speak Freely, nevot, vat, mtalk, ztalk
* Availability:

By Email
Mail voice-faq-request@northcoast.com
with "Subject: archive"
and "Body: send voice-faq"

FTP
ftp://rtfm.mit.edu/pub/usenet/alt.internet.services/FAQ:_How_can_I_use_the_Internet_as_a_telephone?

WWW:
http://rpcp.mit.edu/~asears/voice-faq.html

* Contact: Andrew Sears: asears@mit.edu
Kevin Savetz: savetz@northcoast.com

Nautilus: Secure Computer Telephony

* Platform: DOS, Linux, SunOS, Solaris.
* Description: Nautilus is software which allows two users to hold a
secure conversation over either ordinary phone lines or a computer
network. Nautilus uses your computer's audio hardware
to digitize and play back your speech using speech compression
algorithms built into the program. It encrypts the compressed
speech using your choice of the Blowfish, Triple DES, or IDEA
block ciphers, and transmits the encrypted packets over the
internet or your modem to another computer. At the other end, the
process is reversed. Nautilus operates in half duplex mode like a
speakerphone -- only one person can talk at a time. Either user
can hit a key to switch between talking and listening. Audio
quality ranges from fair to very good depending on which of the
four speech coders is selected. The Nautilus WWW page provides
more detailed information.
* Requirements: Nautilus runs on IBM PC-compatible computers
(386DX25 or faster) under MSDOS or Linux as well as audio-capable
Sun workstations running SunOS or Solaris. The MSDOS version of
Nautilus requires a Soundblaster compatible sound card and
currently only runs over ordinary phone lines with a modem. To use
Nautilus over ordinary telephone lines, a modem capable of
connecting at 4800 bps or faster is required.
* Availability: Nautilus is available in three different formats. As
a DOS executable, it is available as an archive in zip format
along with its associated documentation. In source format, it is
available as either a zipped archive or a gzip-compressed tar
archive.
Nautilus is distributed freely (subject to US export restrictions)
with full source code. This ensures that its security can be
independently examined and verified. Follow the instructions in
the following README files to obtain Nautilus.
+ ftp://ftp.csn.org/mpj/README
+ ftp://ripem.msu.edu/pub/crypt/README
* More information: WWW: http://www.lila.com/nautilus/
* Contacts: The Nautilus development team includes Bill Dorsey, Paul
Rubin, Andy Fingerhut, Paul Kronenwetter, Bill Soley, and Pat
Mullarky. To contact the developers, send email to
nautilus@lila.com.

NEVOT (1.4v) from AT&T BL

* Platforms: Sun Sparc Station (SunOS 4.1.x) and Silicon Graphics
* Description: Audio-conferencing tool which supports both
point-to-point and broadcasting of audio using multicast IP. Audio
encodings:
+ PCM 64 kb/s: 8-bit mu-law encoded, 8 kHz sampling (G.711)
+ ADPCM 32 kb/s [Sun only] (G.721)
+ DVI ADPCM 32 kb/s
+ ADPCM 24 kb/s [Sun only] (G.723)
+ CELP 4.8 kb/s
+ LPC 2.4 kb/s
* Availability: by anonymous ftp from

ftp://gaia.cs.umass.edu/pub/hgschulz/nevot

* Contact: Henning Schulzrinne (hgs@researh.att.com)

PGPfone

* Platform: Macintosh and Windows
* Description: Pretty Good Privacy Phone is free secure audio
connection software for the internet. It uses speech compression
and strong cryptography protocols to give you the ability to have
a real-time secure telephone conversation via a modem-to-modem
connection.
* Requirements (Mac): Fast modem: at least 14.4 Kbps V.32bis (28.8
Kbps V.34 recommended). An Apple Macintosh with at least a 25MHz
68LC040 processor (PowerPC recommended), running System 7.1 or
above, Thread Manager 2.0.1, ThreadsLib 2.1.2, and Sound Manager
3.0. (These are available from Apple's FTP sites.)
* Requirements (Windows): Fast modem: at least 14.4 Kbps V.32bis
(28.8 Kbps V.34 recommended). A multimedia PC running Windows 95
or NT, with at least a 66 MHz 486 CPU (Pentium recommended), sound
card, microphone, and speakers or headphones.
* Contact: Jeffrey I. Schiller
Email: jis@mit.edu
WWW: http://web.mit.edu/network/pgpfone/

Speak Freely

* Platform: Windows and Unix
* Description: Free "Internet Phone" software supporting voice mail,
multicasting, encryption and several coding methods. Includes 4
forms of data compression and encryption with DES, IDEA and PGP.
The Windows and Unix versions are compatible. You can designate a
bitmap file to be sent to users who connect so they can see who
they're talking to. The Unix version does not have the graphical
user interface of the Windows edition, but supports all its
compression and encryption modes.
* More information:
http://www.fourmilab.ch/netfone/windows/speak_freely.html

Internet Phone from VocalTec

* Platforms: IBM PC compatible
* Description: Supports real-time conversations with Internet users
by compressing speech. Voice-activation feature and interactive
display. Features a graphical interface and on-line help. Up-to-date
listing of all on-line users running Internet Phone. Join or
create topics for conversation with people from all over the
globe. Supports private topics for private conversations with
family or with business associates.
* Requirements: 486SX PC, 25 MHz, 8MB RAM (recommended)
An Internet Winsock 1.1 compatible TCP/IP connection (minimum
connection: a 14,400 baud modem SLIP/PPP connection)
Windows 3.1
Windows-compatible sound card
* Cost: $US59 + shipping. You can order on the internet:
http://www.vocaltec.com/order.html
* More Information: WWW: http://www.vocaltec.com/
* Availability:

Demo version:
ftp://ftp.vocaltec.com/pub/iphone09.exe

* Contact: VocalTec Inc.
157 Veterans Drive, Northvale, NJ 07647
Tel: 201-768-9400 Fax: 201-768-8893
E-mail: info@vocaltec.com

WebPhone

* Platform: Windows
* Description: WebPhone provides telephone-quality, real-time, full
duplex, encrypted, point-to-point voice communication over the
Internet and other TCP/IP based networks. (More detail is provided on
the NetSpeak WWW pages.)
* Requirements: 80486DX-33 MHz running Windows 3.1 or higher, 4 MB
of RAM, MCI compliant sound card, Winsock 1.1 compliant stack,
14.4Kbps modem, VGA card capable of displaying 256 colors. A
full-duplex sound card is required for full-duplex operation.
* Price: $49.95 (US)
* Availability: via the WWW: http://www.netspeak.com/getphone.html
* Contact: NetSpeak Corporation
902 Clint Moore Rd., Boca Raton, Fl. 33487, USA
Ph: +1-407-997-4001, Fax: +1-407-997-2401
Email: info@netspeak.com
WWW: http://www.netspeak.com/

WebTalk

* Platform: Windows 3.1/95
* Description: Full-duplex or half-duplex, telephone-quality voice;
supports many commercial web browsers.
* Contact: Quarterdeck Corporation
13160 Mindanao Way, 3rd Floor, Marina Del Rey, CA 90292-9705, USA
Ph: +1-310-309-3700, Fax: +1-310-309-4217
Email: info@quarterdeck.com
WWW: http://www.quarterdeck.com/

AF version AF3R1

* Platforms: DEC workstations (Alpha and MIPS), SparcStation, SGI
* Description: The AF System is a device-independent,
network-transparent system including client applications and audio
servers. With AF, multiple audio applications can run
simultaneously, sharing access to the actual audio hardware.
The AF3R1 distribution of AF includes server support for Digital
RISC systems running Ultrix, Digital Alpha AXP systems running
OSF/1, SGI Indigo running IRIX 4.0.5, Sun Microsystems
SPARCstations running SunOS 4.1.3, and Sun Microsystems
SPARCstations running Solaris 2.3. The servers support audio
hardware ranging from the built-in CODEC audio on SPARCstations
and Personal DECstations to 48 kHz stereo audio using the DECaudio
TURBOchannel module or the SPARCstation DBRI interface.
* Availability: The source kit is distributed by anonymous ftp from

ftp://crl.dec.com/pub/DEC/AF

WWW:
http://www.research.digital.com/CRL/projects/AF/home.html

* Contact: af-request@crl.dec.com

Voice E-Mail from Bonzi Software

* Description: Voice E-Mail is an extension to regular e-mail which
allows recorded voice messages to be transmitted in the same way
as normal text messages. Voice E-Mail is available in several
forms: Voice E-Mail 3.0 for WinCIM, Voice E-Mail 3.0 for America
Online, Voice E-Mail 3.0 for Eudora, and Voice E-Mail 3.0 for
Netscape. Voice E-Mail uses digital audio and image compression
technology to compress messages before transferring them through
CompuServe, America Online, and the Internet.
* Availability: Go to the Bonzi home page (http://www.bonzi.com/)
and follow the links to the Internet Shopping Network's
"Downloadable Software Division."
* Further Information: Bonzi Software
WWW: http://www.bonzi.com/
Email: info@bonzi.com
Fax 805-238-5798

MicNotePad Recording Software for Macs

* Platforms: Macintosh
* Description: MicNotePad is an audio recording tool designed to
improve dictation (a digital replacement for the old-style
mechanical tape systems used by typists). It uses the built-in
microphone or sound input port and the hard disk to record
conversations or speech of arbitrary length. Speech compression
techniques are used to reduce the disk-space. Once it is recorded,
single keystrokes control playback while you type in your word
processor.
* Contact: Nirvana Research
WWW: http://moof.com/nirvana/
Email: nirvana@got.net

MixViews

* Description: A Unix/X sound editor. Does waveform play/record and
cut/splice. Has various filters, handles native file formats, and
provides FFT, LPC and more.
* Availability: by anonymous ftp, including SunOS 4 and IRIX 5
binaries.

ftp://foxtrot.ccmrc.ucsb.edu/pub/MixViews

Network Audio System Release 1.1

* Platforms: Various (includes SunOS, Solaris, SGI)
* Description: A device-independent mechanism for transferring,
playing and recording audio signals over a network. Has a range of
features suited to networks.
* Cost: Free
* Availability: By anonymous ftp from

ftp://ftp.x.org:/contrib/audio/nas/netaudio-1.2.tar.gz

Also available in the same directory are document files and some
sample sounds.

NIST SPeech HEader REsources Package (SPHERE)

* Description: Standard speech header software from the National
Institute of Standards & Technology (NIST). SPHERE headers
represent information about sample frequency, sample format, etc.
* Availability: By anonymous ftp from

Readme File
ftp://jaguar.ncsl.nist.gov/pub/sphere.README

Source Code
ftp://jaguar.ncsl.nist.gov/pub/sphere_2.5.tar.Z
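To show what a SPHERE header looks like, here is a minimal Python sketch of a header reader. It assumes the common NIST_1A layout: an ASCII header whose first line is NIST_1A, whose second line gives the header size in bytes (often 1024), followed by one "name -type value" field per line (-i integer, -r real, -sN string of N characters) and a closing end_head line. The function name is hypothetical and this is not the API of the NIST package; for real files (for example, ones with compressed sample data) the SPHERE software itself should be used.

```python
def parse_sphere_header(data: bytes) -> dict:
    """Parse the fields of a NIST_1A SPHERE header (illustrative sketch)."""
    lines = data.split(b"\n")
    if lines[0].strip() != b"NIST_1A":
        raise ValueError("not a NIST_1A SPHERE header")
    header_size = int(lines[1].decode("ascii").strip())  # header length in bytes
    text = data[:header_size].decode("ascii", errors="replace")

    fields = {}
    for line in text.split("\n")[2:]:
        line = line.strip()
        if line == "end_head":          # marks the end of the field list
            break
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue                    # skip blank or malformed lines
        name, type_code, value = parts
        if type_code == "-i":
            fields[name] = int(value)
        elif type_code == "-r":
            fields[name] = float(value)
        elif type_code.startswith("-s"):
            fields[name] = value[: int(type_code[2:])]   # -sN: N characters
    return fields


# Usage: build a small 1024-byte header in the assumed layout and read it back.
header = (
    "NIST_1A\n   1024\n"
    "sample_rate -i 16000\n"
    "channel_count -i 1\n"
    "sample_n_bytes -i 2\n"
    "sample_coding -s3 pcm\n"
    "end_head\n"
).encode("ascii").ljust(1024, b" ")
print(parse_sphere_header(header))
```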

NIST Speech Recognition Scoring Package (SCORE)

* Description: Software for scoring the results of speech recognition
systems, from the National Institute of Standards & Technology
(NIST).
* Availability: By anonymous ftp from

README File
ftp://jaguar.ncsl.nist.gov/pub/score.README

Source Code
ftp://jaguar.ncsl.nist.gov/pub/score_3.6.2.tar.Z
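The central figure that recognition scoring software reports is the word error rate: the reference transcription and the recognizer output are aligned by dynamic programming, and substitutions, deletions and insertions are counted. The Python sketch below uses equal unit costs for all three error types. The NIST package has its own alignment weights and reporting options, so its numbers may differ in detail from this simplified version.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                              # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                              # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + sub) # match or substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


# Usage: one deleted word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```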

Sound Processing Kit

* Platforms: UNIX
* Description: Sound Processing Kit (SPKit) is an object-oriented
class library for audio signal processing. SPKit includes classes
for various signal processing tasks and a way of implementing
sound processing algorithms in a simple object-oriented manner.
Sound Processing Kit is implemented in C++ and is designed to be
portable. The current version requires a bare-bones C++ 2.0
compatible compiler (templates and exceptions are not needed).
ANSI C standard libraries are required. SPKit includes classes for
+ Sound input and output
+ Basic signal processing
+ Dynamics processing (compressor, gating etc)
+ Filtering
+ Delay and reverberation
+ Distortion
+ Signal routing
* Availability:

Full documentation on the WWW:
http://www.music.helsinki.fi/research/spkit/documentation/SPKit.html

Software distribution:
http://www.music.helsinki.fi/research/spkit/distribution/spkit.tar.Z

* Contact: Kai Lassfolk
University of Helsinki Music Research Laboratory
Email: spkit@elisir.helsinki.fi
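The object-oriented style described for SPKit, in which small processing units are chained together, can be sketched in Python as follows. This is not SPKit's C++ interface: the classes Processor, Gain and FeedbackDelay are hypothetical names chosen for illustration. The delay unit is a simple feedback comb filter, a common building block for echo and reverberation effects.

```python
class Processor:
    """Base unit: pulls samples from an upstream iterable and processes each one."""

    def __init__(self, source):
        self.source = source

    def __iter__(self):
        for sample in self.source:
            yield self.process(sample)

    def process(self, sample):
        raise NotImplementedError


class Gain(Processor):
    """Scales every sample by a constant factor."""

    def __init__(self, source, gain):
        super().__init__(source)
        self.gain = gain

    def process(self, sample):
        return self.gain * sample


class FeedbackDelay(Processor):
    """Feedback comb filter: y[n] = x[n] + feedback * y[n - delay]."""

    def __init__(self, source, delay, feedback):
        super().__init__(source)
        self.history = [0.0] * delay      # circular buffer of past outputs
        self.pos = 0
        self.feedback = feedback

    def process(self, sample):
        out = sample + self.feedback * self.history[self.pos]  # uses y[n - delay]
        self.history[self.pos] = out
        self.pos = (self.pos + 1) % len(self.history)
        return out


# Usage: chain a gain stage into an echo, then feed in a unit impulse.
impulse = [1, 0, 0, 0, 0, 0, 0]
chain = FeedbackDelay(Gain(impulse, 0.5), delay=3, feedback=0.5)
print(list(chain))   # [0.5, 0.0, 0.0, 0.25, 0.0, 0.0, 0.125]
```

Because each unit only needs an iterable upstream, new effects can be added by subclassing Processor and overriding process().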

TCPplay

* Description: TCPPlay lets you use your Mac as an audio server for
your Unix box. Provided with source code. Written by Bill
Stafford, Rich Tsoi and Malcolm Slaney.
* Availability: Anonymous ftp from
ftp://ftp.apple.com/pub/malcolm/TcpPlay.sit.hqx
ftp://worldserver.com/pub/malcolm/TcpPlay.sit.hqx

Auditory Modeller 1

* Description: John Holdsworth's implementation of a gammatone
filter bank and Roy Patterson's spiral model, in C (with X-window
display).
* Availability: By anonymous ftp from

ftp://ftp.mrc-apu.cam.ac.uk/pub/aim
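A gammatone filter has the impulse response g(t) = t^(n-1) exp(-2 pi b t) cos(2 pi fc t), usually with order n = 4 and a bandwidth b tied to the equivalent rectangular bandwidth (ERB) of the auditory filter at the centre frequency fc. The Python sketch below assumes the widely used choices b = 1.019 x ERB(fc) and ERB(f) = 24.7 (4.37 f/1000 + 1) Hz (Glasberg and Moore). It is a slow, direct-convolution illustration of a gammatone filter bank, not Holdsworth's C implementation, which may well use a more efficient recursive form.

```python
import math

def erb_hz(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore formula)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def gammatone_ir(fc_hz, sample_rate_hz, duration_s=0.025, order=4):
    """Sampled gammatone impulse response, normalised to a peak of 1."""
    b = 1.019 * erb_hz(fc_hz)                    # bandwidth parameter in Hz
    ir = []
    for k in range(int(round(duration_s * sample_rate_hz))):
        t = k / sample_rate_hz
        envelope = t ** (order - 1) * math.exp(-2 * math.pi * b * t)
        ir.append(envelope * math.cos(2 * math.pi * fc_hz * t))
    peak = max(abs(v) for v in ir)
    return [v / peak for v in ir]

def gammatone_bank(signal, centre_freqs_hz, sample_rate_hz):
    """Filter the signal through one gammatone channel per centre frequency."""
    outputs = []
    for fc in centre_freqs_hz:
        h = gammatone_ir(fc, sample_rate_hz)
        y = [sum(h[j] * signal[n - j] for j in range(min(n + 1, len(h))))
             for n in range(len(signal))]        # direct-form convolution
        outputs.append(y)
    return outputs


# Usage: a 1 kHz tone through channels centred at 250 Hz, 1 kHz and 4 kHz.
rate = 16000
tone = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(400)]
channels = gammatone_bank(tone, [250, 1000, 4000], rate)
energies = [sum(v * v for v in y) for y in channels]
print(energies)   # the 1 kHz channel should carry by far the most energy
```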

Auditory Modeller 2

* Description: Lowel O'Mard's implementation of peripheral filtering,
Ray Meddis's hair cell model and other components in C (as a library
of routines).
* Availability: By anonymous ftp from

ftp://suna.lut.ac.uk/public/hulpo/lutear

Auditory Toolbox for Matlab

* Description: This toolbox provides extensions to Matlab which are
useful to people interested in auditory/cochlear modeling. [Matlab
is described in the previous section.] This toolbox has been
tested on both Macintosh and Unix computers. It includes the
following major models:
+ Lyon's Passive Long Wave Cochlear Model (our conventional
model)
+ Patterson-Holdsworth ERB Filter bank with Meddis Hair cell
+ Seneff's Auditory Model (Stages I and II)
+ MFCC (Mel-scale frequency cepstral coefficients from the ASR
world)
+ Spectrogram
+ Correlogram generation and pitch modeling
+ Simple vowel synthesis
* Availability: From Malcolm Slaney's home page and by anonymous FTP:
ftp://ftp.apple.com/pub/malcolm
The following files are available:
+ AuditoryToolbox.mif.Z
+ AuditoryToolbox.psc.Z
+ AuditoryToolbox.sea.hqx
+ AuditoryToolbox.tar
+ AuditoryToolbox.tar.Z
The ".mif.Z" file is a Unix compressed version of the FrameMaker
documentation. The ".psc.Z" file is a Unix compressed version of
the Postscript documentation. The ".tar" and ".tar.Z" files are
Unix TAR archives containing all of the m-functions and C-MEX
source code. Finally, the ".sea.hqx" file is a Macintosh
self-extracting archive that has been encoded using BinHex. There
are precompiled versions of the three MEX functions for the
Macintosh.
* Misc: Our lawyers ask us to remind you that there is no warranty.
We've done some testing but we undoubtedly missed things.
* Contact: Malcolm Slaney, Interval Research.
Email: malcolm@interval.com
WWW: http://www.interval.com/~malcolm/
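As background to the MFCC item in the list above: MFCCs are usually computed by passing the power spectrum through a bank of triangular filters spaced evenly on the mel scale, taking the logarithm of each filter's energy, and applying a discrete cosine transform. The Python sketch below shows only the mel warping and the final DCT step. It uses the common formula mel(f) = 2595 log10(1 + f/700); the toolbox's own MFCC routine may space its filters differently, so treat this as an illustration of the idea rather than a reimplementation of the toolbox.

```python
import math

def hz_to_mel(f_hz):
    """Mel value for a frequency in Hz (2595 * log10(1 + f/700) form)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_centre_freqs(n_filters, low_hz, high_hz):
    """Filter centre frequencies spaced evenly on the mel scale (edges excluded)."""
    lo, hi = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]

def cepstrum(log_energies, n_coeffs):
    """DCT-II of log filter-bank energies, giving cepstral coefficients."""
    n = len(log_energies)
    return [sum(e * math.cos(math.pi * k * (j + 0.5) / n)
                for j, e in enumerate(log_energies))
            for k in range(n_coeffs)]


# Usage: 20 centre frequencies between 0 and 8 kHz; note the wider spacing
# at high frequencies.
centres = mel_centre_freqs(20, 0, 8000)
print([round(f) for f in centres])
```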

Human Audio Perception Document

* Description: Document prepared by Argiris Kranidiotis on the human
audio perception system. It lists a number of references, gives
plenty of numbers and some equations.
* Availability: by anonymous ftp from the comp.speech archive site

ftp://svr-ftp.eng.cam.ac.uk/comp.speech/info/HumanAudioPerception

* Contact: Argiris A. Kranidiotis
University of Athens, Informatics Department
email: akra@zeus.di.uoa.ariadne-t.gr

BEEP dictionary

* Description: Phonemic transcriptions of over 250,000 English
words (British English pronunciations).
* Availability: By anonymous ftp:

BEEP dictionary README file
svr-ftp.eng.cam.ac.uk/comp.speech/dictionaries/beep-0.7.README

BEEP Dictionary (1.1M)
svr-ftp.eng.cam.ac.uk/comp.speech/dictionaries/beep.tar.gz

CMU dictionary

* Description: Phonemic transcriptions of 100,000 words with
American English pronunciation.
* Availability - WWW: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
* Availability - ftp: By anonymous ftp from the directory

ftp://ftp.cs.cmu.edu/project/fgdata/dict/

with the files README, cmudict.0.2.Z, cmulex.0.1.Z, phoneset.0.1
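Pronouncing dictionaries like this are plain text and easy to load. The Python sketch below assumes the layout used by later CMU dictionary releases: one entry per line, the word followed by whitespace-separated phones (vowels carrying stress digits), alternative pronunciations marked as WORD(1), and comment lines beginning with ";;;". The 0.2 release listed above may differ in detail. Inverting the resulting table groups words that share a transcription, which is one way to build a homophone list.

```python
import re
from collections import defaultdict

def parse_cmudict(lines):
    """Map each word to its list of pronunciations (each a list of phones)."""
    prons = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue                                  # skip blanks and comments
        word, *phones = line.split()
        word = re.sub(r"\(\d+\)$", "", word)          # "READ(1)" -> "READ"
        prons[word].append(phones)
    return dict(prons)

def homophones(prons):
    """Groups (sorted lists) of two or more words with an identical transcription."""
    by_pron = defaultdict(set)
    for word, variants in prons.items():
        for phones in variants:
            by_pron[" ".join(phones)].add(word)
    return [sorted(words) for words in by_pron.values() if len(words) > 1]


# Usage with a few hand-written lines in the assumed format.
SAMPLE = [
    ";;; comment line",
    "READ  R EH1 D",
    "READ(1)  R IY1 D",
    "RED  R EH1 D",
    "REED  R IY1 D",
]
print(homophones(parse_cmudict(SAMPLE)))   # [['READ', 'RED'], ['READ', 'REED']]
```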

CUVOLAD dictionary (Oxford Dictionary)

* Description: Computer Usable Version of the Oxford Advanced
Learner's Dictionary containing 70,000+ entries. Has British
English pronunciations and parts of speech.
* Availability: Anonymous ftp
ftp://ota.ox.ac.uk/pub/ota/public/dicts/710/
Documentation:
ftp://ota.ox.ac.uk/pub/ota/public/dicts/710/text710.doc

Comprehensive Word List

* Description: A comprehensive word list which should contain most
common American words, abbreviations, hyphenations, and even
incorrect spellings. The word lists were compiled from a number of
sources: commercial news services, UseNet news postings, existing
dictionaries, name lists, company lists, UNIX man pages, Project
Gutenberg's e-texts, Project WordNet, received mailings, etc. The
current size is 460,000 words.
* Availability: anonymous ftp
ftp://wocket.vantage.gte.com/pub/standard_dictionary
Note 1: There seems to be some sort of network problem reaching
the server.
Note 2: There is a README file which explains the file formats.

EAT: Edinburgh Associative Thesaurus

* Description: A set of word association norms showing the counts of
word associations as collected from subjects.
* Availability: Source and WWW interactive versions

Interactive version
Provided by Computing and Information Systems Department
(CISD) of Rutherford Appleton Laboratory, UK
http://www.cis.rl.ac.uk/proj/psych/eat.html

Set of word association norms
ftp directory, 6 MB
http://www.cis.rl.ac.uk/proj/psych/eat/eat/

Homophone List

* A list of homophones in General American English is available by
anonymous FTP from the comp.speech archive site:

ftp://svr-ftp.eng.cam.ac.uk/comp.speech/dictionaries/homophones-1.01.txt

Moby Lexical Resources

* Description: A set of lexical resources compiled by Grady Ward,
3449 Martha Ct., Arcata, CA 95521-4884, USA
Email: grady@netcom.com OR grady@northcoast.com
* Availability: Mirrored by Malcolm Crawford
(m.crawford@dcs.shef.ac.uk) at the Institute for Language Speech
and Hearing, the University of Sheffield.
WWW: http://www.dcs.shef.ac.uk/research/ilash/Moby/
FTP: ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/
* Contents:

Moby Hyphenator: mhyph.tar.Z
185,000 entries fully hyphenated. 980 kB.

Moby Language: mlang.tar.Z
Word lists in five major languages. 2.3 MB.

Moby Part-of-Speech: mpos.tar.Z
230,000 entries with part(s) of speech listed in priority
order. 1.2 MB.

Moby Pronunciator: mpron.tar.Z
175,000 entries fully coded in the International Phonetic
Alphabet. 3.1 MB.

Moby Shakespeare: mshak.tar.Z
The complete unabridged works of Shakespeare. 2.3 MB.

Moby Thesaurus: mthes.tar.Z
30,000 root words, 2.5 million synonyms and related
words. 12 MB.

Moby Words: mwords.tar.Z
610,000+ words and phrases. 4.0 MB.

MRC Psycholinguistic Database

* Description: A machine-usable dictionary containing over 150,000
words with up to 26 linguistic and psycholinguistic attributes for
each (e.g. pronunciation, part of speech, word frequency). The MRC
Psycholinguistic Database was the basis for the "Oxford
Psycholinguistic Database" available for Apple Macs from Oxford
University Press.
* Availability: Several versions with different formats:

Interactive Version of MRC Psycholinguistic Database
Produces lists of words meeting user-definable selection
criteria. Provided by the Dept. of Psychology, University
of Western Australia.
http://www.psy.uwa.edu.au/uwa_mrc.htm

ftp'able MRC Psycholinguistic Database
Approximately 12M of data.
ftp://ota.ox.ac.uk/pub/ota/public/dicts/1054/
README:
ftp://ota.ox.ac.uk/pub/ota/public/dicts/1054/readme.
Information: ftp://ota.ox.ac.uk/pub/ota/public/dicts/info

WordNet

* Description: WordNet is an on-line lexical reference system in
which English nouns, verbs, adjectives and adverbs are organized
into synonym sets, each representing one underlying lexical
concept. Different relations link the synonym sets.
WordNet was developed in the Cognitive Science Laboratory at
Princeton University under the direction of Professor George
Miller.
* Availability:

WWW Interface
http://www.cogsci.princeton.edu/~wn/w3wn.html

Source Distributions
Unix (9.1MB), PC (5.8MB), Macintosh (7.5MB), Prolog
(database only, 4.2MB).
ftp://clarity.princeton.edu/pub/wordnet/

Extended interfaces developed by WordNet users (for X, Lisp, etc.)
are listed on the WordNet home page.
* Further information: Email: wordnet@princeton.edu
WWW: WordNet home page: http://www.cogsci.princeton.edu/~wn/
README: ftp://clarity.princeton.edu/pub/wordnet/README
Publications: ftp://clarity.princeton.edu/pub/wordnet/5papers.ps

Dictionaries on the WWW

For a while, there was a range of dictionaries and other lexical
resources on the WWW and elsewhere on the Internet. However, for
copyright reasons, fewer sites are publishing dictionary information.
When last checked, the following sites provided dictionaries or links
to dictionaries on the net:

CMU Dictionary
http://www.speech.cs.cmu.edu/cgi-bin/cmudict

Institute of Phonetic Sciences, Amsterdam
Electronic dictionaries, including French, Norwegian, Swahili
and English.
http://fonsg3.let.uva.nl/Other_pages.html

1913 Webster's Revised Unabridged Dictionary
Available as a searchable HTML form at the University of
Chicago ARTFL project site, and as a tagged working file and
downloadable version (45MB) of the HTML at Project Gutenberg.

Martin Ramsch's Englisch-Wörterbücher aller Art (English dictionaries of all kinds)
Lists of on-line dictionaries, translation dictionaries,
technical dictionaries, etc.
http://www.uni-passau.de/forwiss/mitarbeiter/freie/ramsch/englisch.html

Galaxy's list of dictionaries etc.
A comprehensive list of dictionaries, acronym lists,
translation resources, and a thesaurus.
http://galaxy.einet.net/galaxy/Reference-and-Interdisciplinary-Information/Dictionaries-etc.html

Webster's dictionary online
http://c.gp.cs.cmu.edu:5103/prog/webster

International Phonetic Alphabet

* Description: The International Phonetic Association
(http://www.arts.gla.ac.uk/IPA/ipa.html) defines the International
Phonetic Alphabet. It is a standard set of symbols for
transcribing the sounds of spoken languages. The full chart of IPA
symbols is published on the International Phonetic Association WWW
site. Also provided are charts for consonants, vowels, tones and
accents, suprasegmentals, diacritics and other symbols. A cassette
of sounds is available: see
http://www.phon.ucl.ac.uk/home/wells/cassette.htm

WWW: Phonetic Fonts and Examples Online

George L. Dillon's list of phonetic resources
http://weber.u.washington.edu/~dillon/PhonResources.html

Vowel sounds of American English
Examples of standard American vowels along with the IPA
phonetic symbols and links to recordings.
http://weber.u.washington.edu/~dillon/vowels.html

Consonant sounds of English
Examples of consonants along with the IPA phonetic
symbols and links to recordings.
http://weber.u.washington.edu/~dillon/consonants.html

Vowel Quadrilaterals for American and British English
Charts and audio.
http://weber.u.washington.edu/~dillon/newstart.html

IPA-ASCII
A scheme for representing IPA transcriptions in ASCII for
use in Usenet articles and email.
http://weber.u.washington.edu/~dillon/ipaascii.html

Some things about studying Speech
Information on speech physiology, acoustic phonetics, speech
perception, speech recognition and voice recognition.
http://www.ccp.uchicago.edu/grad/Francis_Alex/speech.html

Summer Institute of Linguistics IPA Fonts

* Platform: Apple Macintosh and Microsoft Windows
* Description: International Phonetic Alphabet (IPA) fonts are
available as freeware from the Summer Institute of Linguistics
(SIL). The SIL Encore IPA Fonts are a set of scalable IPA fonts
containing the full International Phonetic Alphabet with 1990 Kiel
revisions. Three typefaces are included: SIL Doulos (similar to
Times), SIL Sophia (similar to Helvetica), and SIL Manuscript
(monowidth). Each font contains all the standard IPA discrete
characters and non-spacing diacritics as well as some
suprasegmental and punctuation marks. Each font comes in both
PostScript Type 1 and TrueType formats.
* Availability: Via the WWW and Gopher:
+ WWW: http://www.sil.org/
+ Gopher:
gopher://gopher.sil.org/11/gopher_root/computing/software/fonts/
+ Ftp for Windows: ftp://ftp.sil.org/fonts/win/silip12a.exe
+ Ftp for Mac: ftp://ftp.sil.org/fonts/mac/silipa12.sea_hqx
Also available through the SIL email server. Send either of the
following commands to MAILSERV@sil.org.

Windows:
SEND/MODE=BLOCK/ENCODING=UUENCODE
[FTP.FONTS.WIN]SILIP12A.EXE

Mac:
SEND [FTP.FONTS.MAC]SILIPA12.SEA_HQX

Finally, they are available on diskette from the address below for
$US5 to cover the cost of shipping.
* Contact: International Academic Bookstore
Summer Institute of Linguistics
7500 W. Camp Wisdom Road, Dallas, TX 75236 U.S.A.
Ph: 214-709-2404, Fax: 214-709-2433
e-mail: academic.books@sil.org
WWW: http://www.sil.org/

Phonetic Fonts for TeX and LaTeX

Linguistics/TeX mailing list
ling-tex@ifi.uio.no
Subscription method unknown.

TIPA
Created by Rei Fukui: fkr@tooyoo1.l.u-tokyo.ac.jp.
Source: ftp://tooyoo.L.u-tokyo.ac.jp/pub/TeX/tipa/
Postscript manual:
ftp://tooyoo.L.u-tokyo.ac.jp/pub/TeX/tipa/tipaman.ps
Compressed postscript manual:
ftp://tooyoo.L.u-tokyo.ac.jp/pub/TeX/tipa/tipaman.ps

WSUIPA: Washington State University International Phonetic Alphabet fonts
A basic WSUIPA font contains 128 phonetic characters and/or
diacritics in five different point sizes (8, 9, 10, 11 and 12)
and in three typefaces (roman, slanted and bold extended). Each
size and typeface includes a TFM (TeX Font Metric) file and its
related GF, PK or PXL file. A macro package and manual are
provided. Apparently LaTeX 2.09 compatible - not LaTeX 2e
compliant.
Available from ftp://ftp.wustl.edu/packages/TeX/fonts/wsuipa/
OR from CTAN-ftp-archives: e.g.
ftp://ftp.digital.com/pub/text/TeX/fonts/wsuipa/

Yamada Language Center

* Platform: Apple Macintosh and Microsoft Windows
* Description: The Yamada Language Center maintains an archive of
fonts to assist users who wish to display or type non-English
text on their computers. Their WWW and ftp sites include five
International Phonetic Alphabet (or near-IPA) fonts. They also
have fonts for over 40 languages (American Sign Language, Arabic,
Armenian, Bengali, Burmese, Celtic, Cherokee, etc.).
* Availability:

WWW Font List
http://babel.uoregon.edu/yamada/fonts.html

Windows Fonts
http://babel.uoregon.edu/yamada/winfonts.html

IPA Fonts
http://babel.uoregon.edu/yamada/fonts/phonetic.html

ftp site
ftp://yftp@www-vms.uoregon.edu/fonts/

* Contact: Yamada Language Center, University of Oregon

The vOICe

* Description: Peter Meijer's Java applet/application for sound
analysis and synthesis.
+ Platform: All (where Java VM available)
+ Interactive spectrographic synthesis: draw your own sound
+ Image sonification
+ Mathematical function sonification
+ Spectrographic sound analysis (Fourier, spectrogram)
+ Vision substitution research
* Contact: Peter Meijer

The Learning Company's Language Training

* Platform: Windows and Macintosh
* Description: Foreign-language training software for Spanish,
French, German, Italian, Japanese, and English. In the Windows
version for English, speech-recognition technology is used to help
users improve their accents.
* Contact: The Learning Company
Ph: (800) 852-2255
Email: webmaster@learningco.com
WWW: http://www.learningco.Inter.net/foreign.html

Wildfire - an Electronic Assistant

* Platform: ?
* Description: Wildfire is a phone-based electronic assistant.
Functions include:
+ Screens, routes, and announces incoming calls.
+ Contact list with voice dialing.
+ Schedules and reminders for follow-up calls and action items.
+ Messaging and advanced voicemail features.
* Contact: Wildfire Communications, Inc.
20 Maguire Road, Lexington, MA 02173 USA
Ph: +1-617-674-1500, Fax: 617-674-1501
Demo line: 1-800-WILDFIRE
Email: info@wildfire.com
WWW: http://www.wildfire.com/

___________________________________________________________________________

Copyright (c) 1993-6 by Andrew Hunt, all rights reserved.


This FAQ may be posted to any USENET newsgroup, on-line service, or BBS as
long as it is posted in its entirety and includes this copyright statement.
This FAQ may not be distributed for financial gain.
This FAQ may not be included in any collections or compilations
without express permission from the author.

---

Andrew Hunt
Speech Applications Group
Sun Microsystems Laboratories Ph: (508) 442-2681
2 Elizabeth Drive, MS UCHL03-207 Fax: (508) 250-5067
Chelmsford, MA 01824, USA Email: andrew.hunt@east.sun.com

http://mi.eng.cam.ac.uk/comp.speech/text/FAQ-part2.txt

Newsgroups: comp.speech,comp.answers,news.answers
Subject: comp.speech Frequently Asked Questions - part 2/3
From: andrew.hunt@east.sun.com (Andrew Hunt)
Reply-To: andrew.hunt@east.sun.com (Andrew Hunt)
Followup-To: comp.speech
Organization: Speech Applications Group, Sun Microsystems Laboratories
Summary: Information on Speech Technology
Approved: news-answers-request@MIT.Edu

Archive-name: comp-speech-faq/part2
Last-modified: 1997/09/06
URL: http://www.speech.su.oz.au/comp.speech/

COMP.SPEECH FAQ POSTING - PART 2/3

[Note: this document has been automatically extracted from a WWW site:
http://www.speech.su.oz.au/comp.speech/
This may introduce some formatting errors.]

Signal Processing for Speech

comp.speech FAQ Section 2

* SpeechLinks: Signal Processing for Speech


* Q2.1: What sampling do I need for speech?
* Q2.2: Finding the pitch of a speech signal
* Q2.3: How do I find the start and end points of a speech
signal?
* Q2.4: Where can I find FFT software?
* Q2.5: Signal processing in speech technology
* Q2.6: Speech sampling and signal processing hardware
* Q2.7: How do I convert to/from mu-law format?
* Q2.8: Signal Processing Software

___________________________________________________________________________

Q2.1: What sampling do I need for speech?

For recorded speech to be understood by humans you need a sampling
rate of at least 8 kHz and at least 8-bit samples. This produces
poor-quality speech, but it can be understood.

Improvements can be achieved by increasing the number of bits per
sample to 12 or 16 bits, or by using a non-linear encoding
technique such as mu-law or A-law (see Q2.7). This improves the
"signal-to-noise" ratio; the non-linear encodings do so mainly for
quiet sounds, by using finer quantization steps near zero.

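The benefit of mu-law for quiet sounds can be seen with the continuous mu-law curve, F(x) = sgn(x) ln(1 + mu |x|) / ln(1 + mu) with mu = 255. The Python sketch below compares the error from quantizing a quiet sample directly to 8 bits with the error from quantizing it to 8 bits after mu-law compression. The real G.711 coder uses a segmented, piecewise-linear approximation of this curve, so its actual codes differ slightly from this sketch; the helper names are illustrative.

```python
import math

MU = 255.0  # mu value used by North American and Japanese mu-law

def mu_law_compress(x):
    """Continuous mu-law curve: maps x in [-1, 1] to [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def quantize(v, bits=8):
    """Uniform quantizer for v in [-1, 1] with 2**bits levels."""
    levels = 2 ** (bits - 1)
    code = max(-levels, min(levels - 1, round(v * levels)))
    return code / levels


x = 0.01                                          # a quiet sample
linear_error = abs(quantize(x) - x)
mu_law_error = abs(mu_law_expand(quantize(mu_law_compress(x))) - x)
print(f"8-bit linear error: {linear_error:.6f}")
print(f"8-bit mu-law error: {mu_law_error:.6f}")
```

For loud samples the comparison reverses slightly, which is the trade-off mu-law makes: finer steps near zero, coarser steps near full scale.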
Increasing the sampling rate above 8 kHz, say to 10 kHz, 16 kHz or
20 kHz, improves the frequency response: a sampled signal can only
represent frequencies below half the sampling rate (the Nyquist
frequency), so the higher the sampling frequency, the better the
high-frequency content will be. A 16 kHz sampling rate is a
reasonable target for high-quality speech recording and playback.
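These trade-offs can be put into numbers. The Python sketch below computes, for a given sampling rate and sample size, the highest representable frequency (half the sampling rate), the uncompressed data rate, and the textbook quantization signal-to-noise ratio of about 6.02N + 1.76 dB for an N-bit uniform quantizer driven by a full-scale sine wave. Real recordings fall short of that ideal figure, so treat it as an upper bound; the function name is illustrative.

```python
def sampling_summary(sample_rate_hz, bits):
    """Return (Nyquist frequency in Hz, bit rate in kb/s, ideal SQNR in dB)."""
    nyquist_hz = sample_rate_hz / 2
    bit_rate_kbps = sample_rate_hz * bits / 1000
    sqnr_db = 6.02 * bits + 1.76        # full-scale sine, uniform quantizer
    return nyquist_hz, bit_rate_kbps, sqnr_db


for rate, bits in [(8000, 8), (16000, 16)]:
    nyq, kbps, snr = sampling_summary(rate, bits)
    print(f"{rate} Hz, {bits}-bit: up to {nyq:.0f} Hz, {kbps:.0f} kb/s, ~{snr:.1f} dB")
# 8000 Hz, 8-bit: up to 4000 Hz, 64 kb/s, ~49.9 dB
# 16000 Hz, 16-bit: up to 8000 Hz, 256 kb/s, ~98.1 dB
```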

When doing speech recognition you need to remember that your
computer is not as good as your ear, so it will have more trouble
with poor-quality sound. The choice of an appropriate sampling setup
depends very much on the speech recognition task and the amount of
computer power available.

___________________________________________________________________________

Q2.2: Finding the pitch of a speech signal

This topic comes up regularly in the comp.dsp newsgroup.

http://mi.eng.cam.ac.uk/comp.speech/text/FAQ-part2.txt (1 of 23) [10/31/2003 8:41:18 AM]

Question 2.5 of the FAQ posting for comp.dsp gives a comprehensive
list of references on the definition, perception and processing of
pitch. The comp.dsp FAQ is posted regularly to the comp.dsp
newsgroup, and is also available by ftp and on the WWW:

* http://www.bdti.com/faq/dsp_faq.htm
* ftp://rtfm.mit.edu/pub/usenet/comp.dsp/

The following provide pitch tracking software:

* Most of the speech processing environments listed in Q1.9,
including CSRE, ESPS, Kay Elemetrics Computer Speech Lab, OGI
Speech Tools, Speech Filing System, Signalyze, Soundscope.

___________________________________________________________________________

Q2.3: Finding start and end points of a speech signal

End-point detection algorithms identify sections in an incoming audio
signal that contain speech. Accurate end-pointing is a non-trivial
task; however, reasonable behaviour can be obtained for inputs which
contain only speech surrounded by silence (no other noises). Typical
algorithms look at the energy or amplitude of the incoming signal and
at the rate of "zero-crossings". A zero-crossing is where the audio
signal changes from positive to negative or vice versa. When the
energy and zero-crossings are at certain levels, it is reasonable to
guess that there is speech. More detailed descriptions are provided in
the papers cited below and in the documentation for the following
software.
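A minimal sketch of the energy/zero-crossing approach described
above, loosely modelled on the Rabiner and Sambur idea but much
simplified: frames whose mean absolute amplitude exceeds a threshold
are taken as speech, and the detected region is then extended outwards
over a few neighbouring frames with a high zero-crossing count, since
weak fricatives such as /f/ are quiet but cross zero often. The
function name find_endpoints, the frame length and the thresholds
(energy_thr, zc_thr, max_extend) are hypothetical parameters that must
be tuned to the recording conditions; published algorithms typically
estimate such thresholds from a stretch of leading silence.

```c
#include <stdlib.h>

/* Number of sign changes between consecutive samples in x[0..n-1]. */
static int zero_crossings(const short *x, int n)
{
    int count = 0;
    for (int i = 1; i < n; i++)
        if ((x[i - 1] >= 0) != (x[i] >= 0))
            count++;
    return count;
}

/* Mean absolute amplitude of x[0..n-1]: a cheap stand-in for energy. */
static long mean_abs(const short *x, int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += labs((long)x[i]);
    return sum / n;
}

/*
 * Find the speech region in x[0..len-1], working in frames of `frame`
 * samples. Returns 1 and sets [*start, *end) in samples if any frame
 * exceeds energy_thr, otherwise returns 0.
 */
int find_endpoints(const short *x, int len, int frame,
                   long energy_thr, int zc_thr, int max_extend,
                   int *start, int *end)
{
    int nframes = len / frame;
    int first = -1, last = -1;

    /* Pass 1: first and last frames that are clearly loud enough. */
    for (int f = 0; f < nframes; f++) {
        if (mean_abs(x + f * frame, frame) > energy_thr) {
            if (first < 0)
                first = f;
            last = f;
        }
    }
    if (first < 0)
        return 0;

    /* Pass 2: extend over quiet but "busy" frames (weak fricatives),
       at most max_extend frames in each direction. */
    for (int k = 0; k < max_extend && first > 0 &&
         zero_crossings(x + (first - 1) * frame, frame) > zc_thr; k++)
        first--;
    for (int k = 0; k < max_extend && last < nframes - 1 &&
         zero_crossings(x + (last + 1) * frame, frame) > zc_thr; k++)
        last++;

    *start = first * frame;
    *end = (last + 1) * frame;
    return 1;
}
```

Broadband background noise also has a high zero-crossing rate, which
is why the extension step in this sketch is capped at max_extend
frames rather than running until the zero-crossing rate drops.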

End-point detection software is available from:

* ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/tools/ep.1.0.tar.gz
* ftp://ftp.isip.msstate.edu/pub/software/signal_detector/sigd_v2.2.tar.gz

Plenty of research papers have been presented on end-pointing. Try the
following:

* Rabiner, L.R. and Sambur, M.R., "An Algorithm for Determining the
Endpoints of Isolated Utterances," Bell System Technical Journal,
Vol. 54, No. 2, pp. 297-315, 1975.
* Drago, P.G. et al., "Digital Dynamic Speech Detectors," IEEE Trans.
on Communications, Vol. 26, No. 1, pp. 140-145, Jan. 1978.
* Newman, W.C., "Detecting Speech with an Adaptive Neural Network,"
Electronic Design, 22 March 1990.
* Taboada, J. et al., "Explicit Estimation of Speech Boundaries," IEE
Proc. Sci. Meas. Technol., Vol. 141, No. 3, pp. 153-159, May 1994.

___________________________________________________________________________

Q2.4: FFT Software

* Comprehensive list of FFT software
Links to over 65 different pieces of one-dimensional FFT code.
http://tjev.tel.etf.hr/josip/DSP/fft.html

* FFT software including optimised FFT routines and mixed-radix
algorithms
ftp://usc.edu/pub/C-numanal/fft-stuff.tar.gz
OR,
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/analysis/fft-stuff.tar.gz

* mixfft03.zip: C source for a very fast arbitrary-N FFT routine
The C source is shareware: read the text file included in the
package before using the FFT routine commercially.
Jens J. Nielsen: jnielsen@internet.dk
Available from
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/analysis/mixfft03.zip
OR ftp://ftp.coast.net/simtel/msdos/c/mixfft03.zip

* FFTW
FFTW is a C subroutine library for computing the FFT in one or
more dimensions. It is not limited to sizes that are powers of
two, and includes real-complex and parallel transforms.
Also on the FFTW web site are benchmarks comparing the
performance and accuracy of many public-domain FFT
implementations on a variety of platforms, as well as links to
other sources of FFT code and information.
Available from http://theory.lcs.mit.edu/~fftw
Developed by Matteo Frigo and Steven G. Johnson:
fftw@theory.lcs.mit.edu

___________________________________________________________________________

Q2.5: Signal processing in speech technology

This question is far too big to be answered in a FAQ posting. Here are
some WWW resources and books which cover the area well.

Tony Robinson's Course Notes

Dr. Tony Robinson of the Engineering Dept of Cambridge University has
put his Speech Analysis course notes on the web. The base page is
http://svr-www.eng.cam.ac.uk/~ajr/SA95/. There is information on the
following:

* Sampling theory
* Filter bank analysis
* Short-term Fourier analysis
* Linear prediction analysis
* Formant analysis and voicing analysis
* Speech coding
* and more....

Joseph Picone's Course Notes

Joseph Picone of the Institute for Signal and Information Processing
(ISIP) at Mississippi State University has put two sets of course
notes on the web:

EE 4773/6773: Digital Signal Processing
The course covers sampling, frequency analysis, z-transforms,
filter design and more. The WWW site provides the syllabus,
assignments, some source code data, exams, homework and
solutions, lecture notes and more.

EE 8993: Fundamentals of Speech Recognition
The course covers background probability and
phonetics/acoustics, speech signal analysis, dynamic
programming, dynamic time warping, hidden Markov modelling,
language modelling, neural networks, etc. The WWW site
provides the syllabus and lecture notes.

Signal Processing Home page

The Signal Processing Home page has information on a range of DSP
issues. It includes references to a range of software and much more.
http://tjev.tel.etf.hr/josip/DSP/sigproc.html

Books and other References

There are many good books which discuss signal processing for speech:

* Digital Processing of Speech Signals; L. R. Rabiner, R. W. Schafer;
Prentice-Hall, Englewood Cliffs and London, 1978.
* Voice and Speech Processing; T. W. Parsons; McGraw-Hill, New York,
1986.
* Computer Speech Processing; eds. Frank Fallside, William A. Woods;
Prentice-Hall, Englewood Cliffs, 1985.
* Digital Speech Processing: Speech Coding, Synthesis, and
Recognition; ed. A. Nejat Ince; Kluwer Academic Publishers, Boston,
1992.
* Speech Science and Technology; ed. Shuzo Saito; Ohmsha, Tokyo, 1992.
* Speech Analysis; eds. Ronald W. Schafer, John D. Markel; IEEE Press,
New York, 1979.
* Applied Speech Technology; eds. Ann Syrdal (AT&T Bell Labs,
Holmdel, New Jersey), Raymond Bennett (Ameritech, Hoffman Estates,
Illinois) and Steven Greenspan (AT&T Bell Labs, Murray Hill, New
Jersey); CRC Press.
* Speech Communication: Human and Machine; Douglas O'Shaughnessy;
Addison-Wesley Series in Electrical Engineering: Digital Signal
Processing, 1987.
* Discrete-Time Processing of Speech Signals; John R. Deller, John G.
Proakis, John H. L. Hansen; Macmillan, 1993.
* Signal Processing of Speech; F. J. Owens; Macmillan, 1993.

___________________________________________________________________________

Q2.6: Speech sampling and signal processing hardware

In addition to the following information, have a look at the Audio
File Format document prepared by Guido van Rossum (see details in
Section 1.8).

Information is included on hardware for the following systems:

* Macintosh Audio Hardware
* PC Audio Hardware
* Unix Audio Hardware

Can anyone provide information for SGI, NeXT, other UNIX hardware and
any other PC soundcards?

Macintosh Audio Hardware - an overview

* Description: ALL Macintosh computers can play back sounds at any
sample rate (sample rate conversion is done in software). Older
machines have 8-bit stereo output (the hardware runs at 22254
samples/second). The newer machines have 16-bit stereo hardware
running at 44100 samples/second.
Most of the recent Macintosh computers come with sound input
hardware. There are probably exceptions, but the older machines and
some of the current low-end machines have 8-bit (linear) mono
hardware running at 22254.54 samples/second. All of the PowerPC,
AV, and 500 series notebook computers come with 16-bit 44.1 kHz
stereo sampling hardware. They can also record at 22050
samples/second. The Sound Manager implements an AGC (Automatic
Gain Control) function for the 8-bit hardware; the drivers have a
switch to turn off the AGC.
There are a number of DSP vendors that support high quality audio.
Generally this means quieter analog sections and more I/O formats
(AES/EBU, for example). Try DigiDesign and Spectral Innovations.
The software drivers for sound are described in "Inside Macintosh:
Sound". If you want to see some sample code, check out the sources
for the Matlab "Sound and Image Toolbox". They can be found at
ftp://ftp.apple.com/pub/malcolm/SoundAndImageToolbox.cpt.hqx
Routines that play and record sounds using the toolbox are
included (and interfaced to Matlab).

PC Audio Hardware

Note: new soundcards are becoming available all the time, so the
information below is definitely not up to date. Check out the
following newsgroups for up-to-date information.

* comp.sys.ibm.pc.soundcard
* comp.sys.ibm.pc.soundcard.GUS
* comp.sys.ibm.pc.soundcard.advocacy
* comp.sys.ibm.pc.soundcard.games
* comp.sys.ibm.pc.soundcard.misc
* comp.sys.ibm.pc.soundcard.music
* comp.sys.ibm.pc.soundcard.tech

The Soundcard WWW Site is an excellent source of information:

* http://www.wi.leidenuniv.nl/audio/

A good source of programs and information for soundcards is SimTel:

* http://www.acs.oakland.edu/oak/SimTel/win3/sound.html

Additional information on PC soundcards is provided by the FAQ
postings for the comp.sys.ibm.pc.soundcard.misc newsgroup. These are
available by anonymous ftp from:
ftp://rtfm.mit.edu/pub/usenet/comp.sys.ibm.pc.soundcard.misc/

* Aria Soundcard FAQ
* Aria Soundcard Support List
* MIDI files software archives on the Internet
* Turtle Beach sound cards FAQ

Unix Audio Hardware

Could someone please provide information on the audio capabilities of
other Unix platforms?

Sun standard audio port: SPARC I & II

* Input and Output: 1 channel, 8-bit mu-law encoded, 8 kHz sample
rate. This provides telephone quality sampling.

Sun DBRI audio port (SPARC 10 & 20)

* Input and Output: Stereo (2 channels), 16-bit linear sampling.
Multiple sample rates (48000, 44100, 37800, 32000, 22050, 18900,
16000, 11025, 9600, 8000 Hz)

Silicon Graphics Audio

The Silicon Graphics audio Frequently Asked Questions (FAQ) is the
best place to get information on SGI audio capabilities and
programming. It provides information on connecting the audio output,
using the DSP capabilities, controlling the audio output, programming,
useful software and more. It is available from:

* WWW: http://www-viz.tamu.edu/~sgi-faq/faq/html/audio/
* News: comp.sys.sgi.misc
* Ftp: ftp://viz.tamu.edu/pub/sgi/faq/

Ariel Signal Processors

* Platform: Various
* Description: A range of signal I/O, A/D, D/A and DSP products are
available. There are too many to list.
* Contact: Ariel Corp.
433 River Road, Highland Park, NJ 08904.
Ph: 908-249-2900 Fax: 908-249-2123 DSP BBS: 908-249-2124

___________________________________________________________________________

Q2.7: How do I convert to/from mu-law format?

Mu-law coding is a form of compression for audio signals, including
speech. It is widely used in the telecommunications field because it
improves the signal-to-noise ratio without increasing the amount of
data. Typically, mu-law compressed speech is carried in 8-bit samples.
It is a companding technique: quantisation levels are closely spaced
for small amplitudes and widely spaced for large ones, so it carries
more information about smaller signals than about larger signals.
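To make "more information about smaller signals" concrete, the sketch
below computes the step sizes of the standard 8-segment mu-law
quantiser. It only restates the arithmetic of the ulaw2linear()
routine given later in this answer: segment e (0 to 7) starts at
(132 << e) - 132 on the 16-bit linear scale, and its 16 levels are
spaced 2^(e+3) apart. The helper function names are illustrative.

```c
#include <stdio.h>

/* Decoded magnitude of the first level in mu-law segment e (0..7). */
static int ulaw_segment_start(int e)
{
    return (132 << e) - 132;
}

/* Spacing between adjacent levels within segment e (0..7). */
static int ulaw_segment_step(int e)
{
    return 1 << (e + 3);
}

/* Print the magnitude range and step size of each of the 8 segments. */
static void print_ulaw_segments(void)
{
    for (int e = 0; e < 8; e++) {
        int lo = ulaw_segment_start(e);
        int hi = lo + 15 * ulaw_segment_step(e);   /* mantissa 0..15 */
        printf("segment %d: %5d .. %5d, step %4d\n",
               e, lo, hi, ulaw_segment_step(e));
    }
}
```

Calling print_ulaw_segments() shows, for example, segment 0 covering
0 to 120 in steps of 8 and segment 7 covering 16764 to 32124 in steps
of 1024, so a quiet signal is quantised 128 times more finely than a
loud one.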

On Sun SPARC systems, have a look in the directory /usr/demo/SOUND,
which includes table lookup macros for ulaw conversions. [Note,
however, that not all systems will have /usr/demo/SOUND installed, as
it is optional; see your system administrator if it is missing.]

Alternatively, here is some sample conversion code in C.

/**
** Signal conversion routines for use with Sun4/60 audio chip
**/

#include <stdio.h>

unsigned char linear2ulaw(int sample);
int ulaw2linear(unsigned char ulawbyte);

/*
** This routine converts from linear to ulaw
**
** Craig Reese: IDA/Supercomputing Research Center
** Joe Campbell: Department of Defense
** 29 September 1989
**
** References:
** 1) CCITT Recommendation G.711 (very difficult to follow)
** 2) "A New Digital Technique for Implementation of Any
** Continuous PCM Companding Law," Villeret, Michel,
** et al. 1973 IEEE Int. Conf. on Communications, Vol 1,
** 1973, pg. 11.12-11.17
** 3) MIL-STD-188-113,"Interoperability and Performance Standards
** for Analog-to_Digital Conversion Techniques,"
** 17 February 1987
**
** Input: Signed 16 bit linear sample
** Output: 8 bit ulaw sample
*/

#define ZEROTRAP /* turn on the trap as per the MIL-STD */
#define BIAS 0x84 /* define the add-in bias for 16 bit samples */
#define CLIP 32635

unsigned char
linear2ulaw(int sample)
{
    /* exp_lut[n] is the position of the highest set bit of n
       (0 for n = 0 or 1), i.e. the segment number (exponent). */
    static const int exp_lut[256] = {0,0,1,1,2,2,2,2,3,3,3,3,3,3,3,3,
                                     4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
                                     5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
                                     5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
                                     6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
                                     6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
                                     6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
                                     6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
                                     7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                                     7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                                     7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                                     7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                                     7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                                     7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                                     7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                                     7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7};
    int sign, exponent, mantissa;
    unsigned char ulawbyte;

    /* Get the sample into sign-magnitude.
       (Assumes an arithmetic right shift for negative samples.) */
    sign = (sample >> 8) & 0x80;          /* set aside the sign */
    if (sign != 0) sample = -sample;      /* get magnitude */
    if (sample > CLIP) sample = CLIP;     /* clip the magnitude */

    /* Convert from 16 bit linear to ulaw. */
    sample = sample + BIAS;
    exponent = exp_lut[(sample >> 7) & 0xFF];
    mantissa = (sample >> (exponent + 3)) & 0x0F;
    /* ulaw bytes are transmitted with all bits inverted */
    ulawbyte = (unsigned char)~(sign | (exponent << 4) | mantissa);
#ifdef ZEROTRAP
    if (ulawbyte == 0) ulawbyte = 0x02;   /* optional CCITT trap */
#endif

    return ulawbyte;
}

/*
** This routine converts from ulaw to 16 bit linear.
**
** Craig Reese: IDA/Supercomputing Research Center
** 29 September 1989
**
** References:
** 1) CCITT Recommendation G.711 (very difficult to follow)
** 2) MIL-STD-188-113,"Interoperability and Performance Standards
** for Analog-to_Digital Conversion Techniques,"
** 17 February 1987
**
** Input: 8 bit ulaw sample
** Output: signed 16 bit linear sample
*/

int
ulaw2linear(unsigned char ulawbyte)
{
    /* exp_lut[e] = (132 << e) - 132: start of segment e, bias removed */
    static const int exp_lut[8] = {0,132,396,924,1980,4092,8316,16764};
    int sign, exponent, mantissa, sample;

    ulawbyte = (unsigned char)~ulawbyte;  /* undo the bit inversion */
    sign = (ulawbyte & 0x80);
    exponent = (ulawbyte >> 4) & 0x07;
    mantissa = ulawbyte & 0x0F;
    sample = exp_lut[exponent] + (mantissa << (exponent + 3));
    if (sign != 0) sample = -sample;

    return sample;
}

___________________________________________________________________________

Q2.8: Signal Processing Software

[Note: Question 1.9 lists speech laboratory environments and audio
editors, many of which provide basic and advanced signal processing
capabilities.]

Signal Processing Products

* SigLib from Numerix Ltd.

On the Web

The following sites provide lists of useful DSP software. Not all the
software is directly applicable to speech processing.

comp.dsp FAQ
http://www.bdti.com/faq/dsp_faq.htm

DSP Internet Resources
http://www.eg3.com/
http://www.eg3.com/dsp.htm

Poynton's Digital Signal Processing Resource List
http://www.inforamp.net/~poynton/Poynton-dsp.html

WWW Pages Relating to Sound Computation
http://datura.cerl.uiuc.edu/netstuff/sigsoundLinks.html

Yahoo - Signal and Image Processing
http://www.yahoo.com/Science/Engineering/Electrical_Engineering/Signal_and_Image_Processing/

Sound Related Resources
http://pscinfo.psc.edu/~geigel/menus/sound.html

SPLIB: Signal Processing url LIBrary
http://jazz.rice.edu/splib/

Wavelet's Home Page
http://www.mat.sbg.ac.at/~uhl/wav.html

SigLib from Numerix Ltd.

* Platform: Windows, Unix and all major DSPs
* Description: SigLib is an ANSI C source DSP library and includes
functions for the following areas: spectrum analysis, windowing,
filtering (fixed and adaptive coefficient), convolution,
correlation, covariance, signal generation, statistical analysis,
regression analysis, communications and modulation, digital
effects, vector processing, control, graphics and file I/O.
Detailed product information and a description of the application
of SigLib to speech processing are provided on the Numerix Ltd. WWW
site.
* Availability: A free demonstration of SigLib V2.0 is available
from the Numerix Ltd. WWW site. An educational discount is available
for SigLib.
* Contact: Numerix Ltd.,
157 Sileby Road, Barrow-on-Soar, Leics, LE12 8LW, UK.
Phone/Fax: +44 (0)1509 413195
Email: numerix@numerix.co.uk
WWW: http://www.numerix.co.uk/

___________________________________________________________________________

Speech Coding and Compression

comp.speech FAQ Section 3

* SpeechLinks: Speech Coding


* Q3.1: Speech compression techniques
* Q3.2: Information on speech coding and compression
* Q3.3: Speech Compression / Coding Software

___________________________________________________________________________

Q3.1: Speech compression techniques

Provided by Tony Robinson:

The aim of speech compression is to produce a compact representation
of speech sounds such that when reconstructed it is perceived to be
close to the original. The two main measures of closeness are
intelligibility and naturalness.

The standard reference point is toll quality speech, which is what
would be expected over a telephone line: for example, speech coded at
8 kHz using 8-bit ulaw coding with a maximum frequency of about
3.3 kHz. This is a bit rate of 64 kbps, and as such represents a
compressed form of (say) 16-bit, 16 kHz speech, which is the standard
in speech recognition work.

ulaw coding does not exploit the (normally large) sample-to-sample
correlations found in speech. ADPCM, the next family of speech coding
techniques, does exploit this redundancy by using a simple linear
filter to predict the next sample of speech. The resulting prediction
error is typically quantised to 4 bits, giving a bit rate of 32 kbps
(see, for example, the software in Q3.3: 32 kbps ADPCM,
G.711/721/723 Compression, shorten). The advantages of ADPCM are that
it is simple to implement and has very low delay.
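The predict-and-quantise idea can be sketched in a few lines of C.
This is not G.721, G.726 or IMA ADPCM: it is a toy coder assuming a
fixed first-order predictor (the previous reconstructed sample) and a
hypothetical step-size table (step_mult) in the style of Jayant's
adaptive quantiser, with each prediction error rounded to a signed
4-bit code (-8 to 7). At 8 kHz, 4 bits per sample gives the 32 kbps
quoted above. All names and constants here are illustrative.

```c
#include <stdlib.h>

/* Coder state; the encoder and the decoder each keep their own copy. */
typedef struct {
    int pred;   /* last reconstructed sample, used as the prediction */
    int step;   /* current quantiser step size */
} adpcm_state;

/* Hypothetical step multipliers (in 1/16ths), indexed by |code|:
   small codes shrink the step, large codes grow it. */
static const int step_mult[9] = {12, 13, 14, 16, 18, 20, 24, 28, 32};

#define STEP_MIN 8
#define STEP_MAX 8192

void adpcm_init(adpcm_state *s)
{
    s->pred = 0;
    s->step = 16;
}

/* Shared by encoder and decoder: reconstruct the sample, adapt the step. */
static void adpcm_update(adpcm_state *s, int code)
{
    int recon = s->pred + code * s->step;
    if (recon > 32767) recon = 32767;
    if (recon < -32768) recon = -32768;
    s->pred = recon;

    s->step = s->step * step_mult[abs(code)] / 16;
    if (s->step < STEP_MIN) s->step = STEP_MIN;
    if (s->step > STEP_MAX) s->step = STEP_MAX;
}

/* Encode one 16-bit sample as a signed 4-bit code in -8..7. */
int adpcm_encode_sample(adpcm_state *s, int x)
{
    int e = x - s->pred;                       /* prediction error */
    int half = s->step / 2;
    int code = (e >= 0) ? (e + half) / s->step
                        : -((-e + half) / s->step); /* round to nearest */
    if (code > 7) code = 7;
    if (code < -8) code = -8;
    adpcm_update(s, code);
    return code;
}

/* Decode one 4-bit code back to a 16-bit sample. */
int adpcm_decode_sample(adpcm_state *s, int code)
{
    adpcm_update(s, code);
    return s->pred;
}
```

The key design point is that the encoder updates its prediction from
the reconstructed signal rather than from the input, exactly as the
decoder does, so the two stay in step without any side information.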

To obtain more compression, specific properties of the speech signal
must be modelled. The main assumption is known as the source-filter
model of speech production. This assumes that a source (voicing or
fricative excitation) is passed through a filter (the vocal tract
response) to produce the speech. The simplest implementation of this
is known as an LPC synthesiser (e.g. LPC10e). At every frame the
speech is analysed to compute the filter coefficients, the energy of
the excitation, a voicing decision, and a pitch value if voiced. At
the decoder a regular set of pulses for voiced speech or white noise
for unvoiced speech is passed through the linear filter and multiplied
by the gain to produce the speech. This is a very efficient system and
typically produces speech coded at 1200-2400 bps. With clever acoustic
vector prediction this can be reduced to 300-600 bps. The
disadvantages are a loss of naturalness over most of the speech and
occasionally a loss of intelligibility.
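The decoder side of this scheme can be sketched as an all-pole filter
driven either by a pulse train or by noise. The sketch assumes the
common sign convention in which the predictor coefficients a_1..a_p
give the synthesis filter 1/A(z) with A(z) = 1 - sum_k a_k z^-k, so
each output sample is the scaled excitation plus a weighted sum of the
previous p outputs. Coefficient interpolation between frames and the
excitation shaping of a real LPC-10e decoder are omitted; the names
(lpc_synthesise, lpc_synth_state) and the use of rand() for the noise
source are illustrative choices.

```c
#include <stdlib.h>

#define LPC_MAX_ORDER 16

typedef struct {
    double hist[LPC_MAX_ORDER];  /* previous outputs, hist[0] = y[n-1] */
    int pitch_phase;             /* samples since the last pitch pulse */
} lpc_synth_state;

/*
 * Synthesise n samples of one frame.
 *   a[0..p-1] : predictor coefficients a_1..a_p (p <= LPC_MAX_ORDER)
 *   gain      : excitation gain
 *   pitch     : pitch period in samples, or 0 for an unvoiced frame
 */
void lpc_synthesise(lpc_synth_state *s, const double *a, int p,
                    double gain, int pitch, double *out, int n)
{
    for (int i = 0; i < n; i++) {
        double e;
        if (pitch > 0) {
            /* voiced: one unit pulse every `pitch` samples */
            e = (s->pitch_phase == 0) ? 1.0 : 0.0;
            s->pitch_phase = (s->pitch_phase + 1) % pitch;
        } else {
            /* unvoiced: zero-mean white noise in [-1, 1] */
            e = 2.0 * rand() / (double)RAND_MAX - 1.0;
        }

        /* all-pole filter: y[n] = gain*e[n] + sum_k a_k * y[n-k] */
        double y = gain * e;
        for (int k = 0; k < p; k++)
            y += a[k] * s->hist[k];

        /* shift the output history by one sample */
        for (int k = p - 1; k > 0; k--)
            s->hist[k] = s->hist[k - 1];
        if (p > 0)
            s->hist[0] = y;

        out[i] = y;
    }
}
```

With a single coefficient a_1 = 0.5 and a pitch period of 4 samples,
each pulse produces a decaying response (1, 0.5, 0.25, 0.125) on which
the next pulse is superimposed.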

The CELP family of coders compensates for the lack of quality of the
simple LPC model by using more information in the excitation. Each
vector in a codebook of excitation vectors is tried, and the index of
the one that best matches the original speech is transmitted. This
increases the bit rate to typically 4800-9600 bps. Most speech coding
research is currently directed towards CELP coders (see, for example,
CELP 3.2a, a TMS implementation, a G.728 LD-CELP vocoder, and the
L&H implementation).
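The codebook search can be sketched as follows. Each candidate
excitation vector is passed through an all-pole synthesis filter
(assumed here to be y[n] = x[n] + sum_k a_k y[n-k], starting from zero
state), the least-squares gain g = <t,y>/<y,y> is computed against the
target t, and the candidate with the largest <t,y>^2/<y,y>, which is
the one with the smallest squared error, is kept. Real coders such as
FS-1016 also apply a perceptual weighting filter, remove the filter
memory's contribution from the target, search an adaptive (pitch)
codebook and quantise the gain; all of that is left out, and the
names and sizes here are assumptions.

```c
#define CELP_MAX_LEN 64

/* Zero-state all-pole filtering: y[n] = x[n] + sum_{k=1..p} a_k y[n-k]. */
static void synth_filter(const double *a, int p, const double *x,
                         double *y, int n)
{
    for (int i = 0; i < n; i++) {
        double acc = x[i];
        for (int k = 1; k <= p && k <= i; k++)
            acc += a[k - 1] * y[i - k];
        y[i] = acc;
    }
}

/*
 * Return the index of the codebook vector whose filtered, optimally
 * scaled version best matches target[0..n-1] in the least-squares
 * sense, and store that scale factor in *gain. codebook holds ncb
 * vectors of n samples each, row by row; n <= CELP_MAX_LEN.
 * Returns -1 if every candidate is all-zero.
 */
int celp_search(const double *codebook, int ncb, int n,
                const double *a, int p, const double *target,
                double *gain)
{
    double y[CELP_MAX_LEN];
    int best = -1;
    double best_score = -1.0;

    *gain = 0.0;
    for (int c = 0; c < ncb; c++) {
        synth_filter(a, p, codebook + c * n, y, n);

        double ty = 0.0, yy = 0.0;
        for (int i = 0; i < n; i++) {
            ty += target[i] * y[i];
            yy += y[i] * y[i];
        }
        if (yy <= 0.0)
            continue;                 /* all-zero candidate: skip */

        /* error = |t|^2 - ty^2/yy, so maximising ty^2/yy minimises it */
        double score = ty * ty / yy;
        if (score > best_score) {
            best_score = score;
            best = c;
            *gain = ty / yy;          /* least-squares optimal gain */
        }
    }
    return best;
}
```

Only the winning index and its gain need to be transmitted; the
decoder holds the same codebook and filter, so it can regenerate the
excitation from the index alone.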

___________________________________________________________________________

Q3.2: Information on speech coding and compression

Reference Books

The following books cover speech coding/compression.

* Douglas O'Shaughnessy, Speech Communication: Human and Machine,
Addison Wesley series in Electrical Engineering: Digital Signal
Processing, 1987.
* Bishnu Atal, in F. Fallside and W. Woods (eds.), Computer Speech
Processing, Prentice-Hall International, London, 1985.
* N. S. Jayant and P. Noll, Digital Coding of Waveforms, Prentice
Hall, 1984, ISBN 0-13-211913-7.
* W.B. Kleijn and K.K. Paliwal (Eds.), Speech Coding and Synthesis,
Elsevier, Amsterdam, 1995.
Contents, preface etc. on the WWW:
http://www.elsevier.nl/section/engtech/scs/menu.htm
* Thomas P. Barnwell, Kambiz Nayebi and Craig H Richardson, Speech
Coding: A Computer Laboratory Textbook, John Wiley and Sons Inc,
1996.
* Schuyler R Quackenbush, Tom P Barnwell III, Mark A Clements,
Objective Measures of Speech Quality, Prentice-Hall, 1988.

There are also good tutorial articles:

* Makhoul, J., "Linear Prediction: A Tutorial Review," Proc. of the
IEEE, Vol. 63, pp. 561-580, 1975.

On the WWW

comp.compression FAQ
Includes a few questions and answers on the compression of
speech.
ftp://rtfm.mit.edu/pub/usenet/comp.compression/

Tony Robinson's Speech Analysis Course
A complete course on speech analysis, including some stuff on
speech coding.
http://svr-www.eng.cam.ac.uk/~ajr/SA95/
http://svr-www.eng.cam.ac.uk/~ajr/SA95/node78.html

ITU Coding Standards
Members of the ITU (International Telecommunications Union) can
obtain copies of the Series G Recommendations (including
G.711/721/723/728) from the ITU WWW site (http://www.itu.ch/)
and from http://www.itu.ch/itudoc/itu-t/rec/g/g700-799.html.

Jason Woodard's Speech Coding Page
Introduction to speech coding plus information on a series of
speech coding standards.
http://www-mobile.ecs.soton.ac.uk/speech_codecs/index.html

WWW Searchable Online Bibliography for Phonetics and Speech Technology
Over 8000 entries provided by the Institut fur Phonetik at Johann
Wolfgang Goethe-Universitat Frankfurt.
http://www.uni-frankfurt.de/~ifb/bib_engl.html

Ciaran McElroy's Speech Coding Page
Introduction to many types of speech coding.
http://wwwdsp.ucd.ie/speech/tutorial/speech_coding/speech_tut.html

Examples of speech coding

Nam Phamdo's Speech Coding Demonstration
Examples of ADPCM, LD-CELP, CELP, LPC10 and CELP coding and
coding over a noisy channel.
http://admii.arl.mil/~fsbrn/phamdo/speech_demo.html

Phil Karn's Digital/Analog Voice Demo
Examples of several speech coding systems.
http://www.qualcomm.com/people/pkarn/voicedemo/

___________________________________________________________________________

Q3.3: Speech Compression / Coding Software

The following speech compression software is described in the FAQ.

* 32 kbps ADPCM
* Castleton Network Systems - G.729 Voice Coder
* CELP 3.2a & LPC-10
* 8 Kbit/s CELP on the TMS320C5x family of DSP chips
* CyberVoice
* Rockwell's DigiTalk
* File format conversion
* G.711/721/723 Compression
* G.728 LD-CELP vocoder
* G.728 Compression
* GSM 06.10 Compression
* Lernout & Hauspie Speech Coding (5 products)
* Lernout & Hauspie Speech Coding SDK
* MPEG Audio
* shorten - a lossless compressor for speech signals
* Sipro Lab Telecom Inc. Coding
* Sonarc: Digital Audio Compression
* StarAudio Compressor/Player
* TrueSpeech from DSP Group
* U.S.F.S. 1016 CELP vocoder for DSP56001
* ToolVox from Voxware

32 kbps ADPCM

* Platform: SGI and Sun Sparcs
* Description: 32 kbps ADPCM C source code (G.721 compatibility is
uncertain)
* Contact: Jack Jansen
* Availability: http://www.cwi.nl/ftp/audio/adpcm.shar

Castleton Network Systems - G.729 Voice Coder

* Platform: TI TMS320C5x DSP
* Description: G.729, also called CS-ACELP (Conjugate-Structure
Algebraic Code Excited Linear Prediction), is a state-of-the-art
ITU (International Telecommunications Union) voice compression
standard that can be used in a wide range of applications
including wireless communications, digital satellite systems,
packetized speech and digital leased lines. G.729 compresses speech
to 8000 bits/s at toll quality (equivalent to G.726 32 kbit/s ADPCM
under clean channel conditions). G.729 also has lower complexity and
a lower bit rate than G.728.
The Castleton G.729 implementation provides a bit-exact
implementation of the ITU standard on a single TI TMS320C5x DSP.
The software is C callable and fully re-entrant, which allows easy
interfacing and multi-channel capability. The encoder and decoder
are fully independent; therefore, a DSP device can run a number of
full-duplex or half-duplex channels. The coder and the decoder are
able to operate under a real-time task switching kernel.
* Cost and Availability: Contact Castleton Network Systems.
* Contact: Castleton Network Systems Corporation
350 Terry Fox Drive, Kanata, Ontario, Canada K2K 2W5
Ph: 613-591-8786, Fax: 613-591-8783
Email: inquire@castleton.com
WWW: http://www.castleton.com/

CELP 3.2a & LPC-10

* Platform: Sun (the makefiles and source can be modified for other
platforms)
* Description: CELP is a lossy compression technique. This package
contains Fortran and C simulation source code for the US Department
of Defense's Federal-Standard-1016 based 4800 bps code excited linear
prediction voice coder, version 3.2a (CELP 3.2a).
* Availability: By anonymous ftp from:
ftp://ftp.super.org/pub/speech/celp_3.2a.tar.Z
Or from the comp.speech ftp server
ftp://svr-ftp.eng.cam.ac.uk/comp.speech/coding/celp_3.2a.tar.Z
ftp://svr-ftp.eng.cam.ac.uk/comp.speech/coding/celp_3.2a.tar.gz
LPC-10 Fortran source code is also available:
ftp://ftp.super.org/pub/speech/lpc10-1.0.tar.gz
Here is a modified LPC-10 release that includes ANSI C source:
http://www.arl.wustl.edu/~jaf/lpc/
* Documentation: The following articles describe the
Federal-Standard-1016 4.8-kbps CELP coder:
+ Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C.
Welch, "The Federal Standard 1016 4800 bps CELP Voice Coder,"
Digital Signal Processing, Academic Press, 1991, Vol. 1, No.
3, p. 145-155.
+ Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C.
Welch, "The DoD 4.8 kbps Standard (Proposed Federal Standard
1016)," in Advances in Speech Coding, ed. Atal, Cuperman and
Gersho, Kluwer Academic Publishers, 1991, Chapter 12, p.
121-133.
The U.S. DoD's Federal-Standard-1015/NATO-STANAG-4198 based 2400
bps linear prediction coder (LPC-10) was republished as a Federal
Information Processing Standards Publication 137 (FIPS Pub 137).
It is described in:
+ Thomas E. Tremain, "The Government Standard Linear Predictive
Coding Algorithm: LPC-10," Speech Technology Magazine, April
1982, p. 40-49.
There is also a section about FS-1015 in the book:
+ Panos E. Papamichalis, Practical Approaches to Speech Coding,
Prentice-Hall, 1987.
The voicing classifier used in the enhanced LPC-10 (LPC-10e) is
described in:
+ Campbell, Joseph P., Jr. and T. E. Tremain, "Voiced/Unvoiced
Classification of Speech with Applications to the U.S.
Government LPC-10E Algorithm," Proceedings of the IEEE Intl.
Conf. on Acoustics, Speech, and Signal Processing, 1986, p.
473-6.
* Vendors:
Realtime DSP code for FS-1015 and FS-1016 is sold by:
+ John DellaMorte, DSP Software Engineering
165 Middlesex Tpk, Suite 206, Bedford, MA 01730, USA
Ph: 1-617-275-3733 Fax: 1-617-275-4323
Email: dspse.bedford@channel1.com
DSP Software Engineering's FS-1016 code can run on DSP
Research's Tiger 30 (a PC board with a TMS320C3x and analog
interface suited to development work).
+ DSP Research
1095 E. Duane Ave, Sunnyvale, CA 94086, USA
Ph: (408)773-1042 Fax: (408)736-3451

8 Kbit/s CELP on the TMS320C5x family of DSP chips

* Description: For low bandwidth transmission of voice, compact
voice storage for archival purposes, low-cost digital answering
machines and efficient storage for voice mail. Features:
+ near toll quality at 8 Kb/s.
+ Variable rate option with 1 Kb/s silence encoding.
+ Implemented on a fixed-point processor for lower system cost.
+ Attractive licensing scheme.
+ Future availability of 4 Kb/s.
+ Custom rates possible.
Capacity:
+ Two half-duplex or one full duplex channels on the 20 MIPS
'C5x (at 95% and 55% CPU utilization respectively).
+ Two full duplex channels on the 28.6 MIPS 'C5x (at 77% CPU
utilization).
+ Requires 9 K-words program memory and 3 K-words data memory.
+ Decoding in real-time on a 486 class CPU.
* Contact:
CVI Inc.
443 Vienna Cres. North Vancouver, BC, Canada V7N 3B3
Tel: (604) 987 1719 Fax: (604) 986 8139
Email: cvi@extropia.wimsey.com

CyberVoice

* Description: Cybernetics InfoTech, Inc. offers the following
products:
+ Telephone voice compression at 1.2, 2.4, 4.8 and 6.0 kbit/s
with good-communications-quality to near-toll-quality coded
voice;
+ Wideband voice (7 kHz bandwidth) compression at 16 kbit/s
with near-original-quality coded voice;
+ Internet Voice E-mail software with voice editing,
high-quality low-data-rate voice compression, fast/slow voice
playback, and more.
* Availability: C code and Windows .DLL for telephone voice
compression and wideband voice compression are available for
licensing.
Real-time DSP codes are under development.
Voice E-mail software is available for purchase and download from
the CyberVoice home page.
* Contact: Cybernetics InfoTech, Inc.
2 Professional Dr., #228, Gaithersburg, MD 20879
WWW: http://www.cybit.com/
E-mail: info@cybit.com
Fax: 301-590-0359

Rockwell's DigiTalk

* Description: The DigiTalk coder operates at a sampling rate of
8 kHz and transmits 223 bits of coded speech every 26 ms, giving an
overall bit rate of 8.577 kbps. The algorithm is based on
analysis-by-synthesis predictive coding with vector-coded
excitation, in which the excitation signal is optimized by
minimizing the perceptually weighted error between the original
and synthesized speech. More information and results of perceptual
tests are available on the WWW.
* Availability: See the WWW page:
http://www.nb.rockwell.com/ref/digitalk/

File format conversion

* Platform: SUN OS?
* Description: Conversion utility able to encode and decode between
the following formats: G.723, G.721, A-law, u-law and linear.
* Availability: By anonymous ftp from
ftp://ftp.cwi.nl/pub/audio/ccitt-adpcm.tar.Z

G.711/721/723 Compression

* Description:
+ G.711 : CCITT u-law and A-law compression
+ G.721 : CCITT 32 kbps ADPCM coder
+ G.723 : CCITT 24 kbps and 40 kbps ADPCM coders
* Availability: By email to itudoc@itu.ch, with
GET ITU-3022
as the *only* line in the body of the message.
It is also available by anonymous ftp from:

ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/G711_G721_G723.tar.Z
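To illustrate what G.711 u-law coding does, here is a minimal Python sketch of a u-law encoder and decoder. It follows the segment-plus-step structure of the widely copied public-domain reference routines, but it is an illustration, not code taken from the packages above; the function names are hypothetical, the input is assumed to be 16-bit signed linear PCM, and A-law is not shown. Each 8-bit code word holds a sign bit, a 3-bit segment number and a 4-bit step within the segment, so 8,000 samples/s give 64 kbit/s.

```python
SEG_END = [0xFF, 0x1FF, 0x3FF, 0x7FF, 0xFFF, 0x1FFF, 0x3FFF, 0x7FFF]
BIAS = 0x84  # 132: shifts magnitudes so segment boundaries fall on powers of two


def linear_to_ulaw(sample: int) -> int:
    """Encode one 16-bit signed linear sample as an 8-bit u-law code word."""
    if sample < 0:
        magnitude = BIAS - sample
        mask = 0x7F  # negative samples end up with the sign bit clear
    else:
        magnitude = sample + BIAS
        mask = 0xFF  # non-negative samples end up with the sign bit set
    # Segment (chord) number: index of the first boundary >= magnitude.
    seg = next((i for i, end in enumerate(SEG_END) if magnitude <= end), 8)
    if seg >= 8:  # beyond the last segment: clip to the largest code
        return 0x7F ^ mask
    # 3-bit segment number + 4-bit step inside the segment. XOR with mask
    # inverts the 7 magnitude bits and, for non-negative samples, sets the sign bit.
    code = (seg << 4) | ((magnitude >> (seg + 3)) & 0x0F)
    return code ^ mask


def ulaw_to_linear(code: int) -> int:
    """Decode an 8-bit u-law code word back to a 16-bit signed linear sample."""
    code = ~code & 0xFF                 # undo the bit inversion
    t = ((code & 0x0F) << 3) + BIAS     # step within the segment, plus bias
    t <<= (code & 0x70) >> 4            # scale by the segment number
    return BIAS - t if code & 0x80 else t - BIAS


for x in (0, 1000, -1000, 32767):
    c = linear_to_ulaw(x)
    print(f"{x:6d} -> 0x{c:02X} -> {ulaw_to_linear(c)}")
```

Because the step size doubles from one segment to the next, the reconstruction error grows with the signal level, which keeps the signal-to-quantization-noise ratio roughly constant over a wide dynamic range.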

G.728 LD-CELP vocoder

* Platform: Analog Devices ADSP-2171
* Description: Real-time, full-duplex G.728 LD-CELP vocoder that
runs on a single Analog Devices ADSP-2171. Source and object code
available for a one-time license fee.
* Contact:

Cole Erskine
Analogical Systems
299 California Avenue, Suite 120
Palo Alto, CA 94306, USA
Tel:(415) 323-3232 FAX:(415) 323-4222
email: cole@analogical.com

G.728 Compression

* Description: G.728 low-delay CELP package written by Alex Zatsman
of Analog Devices, Inc.
* Availability: By anonymous ftp from

ftp://dspsun.eas.asu.edu/pub/speech/ldcelp.tgz

GSM 06.10 Compression

* Platform: Unix; faster than real time on most Sun SPARCstations
* Description: GSM 06.10 is a standardized lossy speech compression
algorithm employed by most European wireless telephones. It uses
RPE/LTP (regular pulse excitation/long-term prediction) coding to
compress frames of 160 13-bit samples (8 kHz sampling rate, i.e. a
frame rate of 50 Hz) into 260 bits.
* Contact: GSM 06.10 support and implementation
_jutta@cs.tu-berlin.de_, cabo@cs.tu-berlin.de
* Availability: The following configurations are available by
anonymous ftp:

gzip-compressed archive from Germany:

ftp://ftp.cs.tu-berlin.de/pub/local/kbs/tubmik/gsm/gsm-1.0.7.tar.gz

MS-DOS (zip) archive from Germany:

ftp://ftp.cs.tu-berlin.de/pub/local/kbs/tubmik/gsm/ddj/gsm-107.zip

MS-DOS (zip) archive from USA:

ftp://ftp.mv.com/pub/ddj/1194.12/gsm-105.zip

* Misc: The WWW site is

http://www.cs.tu-berlin.de/~jutta/toast.html
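The frame parameters given for GSM 06.10 determine the coder's bit rate and its compression ratio relative to the 13-bit linear input. A short Python check of that arithmetic (the variable names are illustrative):

```python
SAMPLE_RATE_HZ = 8000
SAMPLES_PER_FRAME = 160
BITS_PER_SAMPLE = 13          # linear PCM input resolution stated above
BITS_PER_CODED_FRAME = 260

frame_duration_ms = SAMPLES_PER_FRAME * 1000 / SAMPLE_RATE_HZ    # length of one frame
frame_rate_hz = SAMPLE_RATE_HZ / SAMPLES_PER_FRAME               # frames per second
coded_bit_rate = BITS_PER_CODED_FRAME * frame_rate_hz            # bits per second
input_bits_per_frame = SAMPLES_PER_FRAME * BITS_PER_SAMPLE       # uncompressed bits
compression_ratio = input_bits_per_frame / BITS_PER_CODED_FRAME

print(f"{frame_duration_ms} ms frames, {coded_bit_rate / 1000} kbit/s, "
      f"{compression_ratio}:1")
```

For comparison, the 13-bit linear input itself runs at 8,000 x 13 = 104 kbit/s.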

Lernout & Hauspie Speech and Music Coding Product Range

* Product name: L&H.smc650: 32 kbps ADPCM speech coding
+ Implementation of 32 kbps ADPCM based on the CCITT G.721 standard.
+ Estimated quality: 4.1 MOS (Mean Opinion Score)
+ Hardware Example: Analog Devices ADSP2101
+ Input / Output signal: A-Law or mu-Law PCM (64 kbps); Linear
signal with up to 16 bits per sample; 8 kHz sampling rate
* Product name: L&H.smc550: LD-CELP 16 kbps speech coding
+ Proprietary implementation of 16 kbps LD-CELP based on the CCITT
G.728 standard.
+ Estimated quality: 4.0 MOS (Mean Opinion Score)
+ Hardware Example: Motorola 5600X
+ Input / Output signal: A-Law or mu-Law PCM (64 kbps); Linear
signal with up to 16 bits per sample; 8 kHz sampling rate
* Product name: L&H.smc450: 16-17.5 kbps speech coding
+ Estimated Quality: 3.9 MOS (Mean Opinion Score)
+ Hardware Examples: Analog Devices ADSP2101, Intel 486 DX2/66
MHz
+ Input / Output Signal: A-Law or mu-Law PCM (64 kbps); Linear
signal with up to 16 bits per sample; 8 kHz sampling rate.
* Product name: L&H.smc350: 4.8-9.6 kbps speech coding
+ Proprietary CELP based software for compression rates of 4.8
kbps to 9.6 kbps
+ Estimated Quality: 3.5 MOS (Mean Opinion Score)
+ Hardware Examples: AT&T DSP32C
+ Input / Output signal: A-Law or mu-Law PCM (64 kbps); Linear
signal with up to 16 bits per sample; 8 kHz or 11.025 kHz
sampling rate.
* Product name: L&H.smc250: 2.4 kbps speech coding
+ Combination of multi-band excitation and codebook-excited
linear prediction.
+ Estimated Quality: 3.0 MOS (Mean Opinion Score).
+ Hardware Examples: Intel 486 DX2/66 MHz, Analog Devices
ADSP2101
+ Input signal: A-Law or mu-Law PCM (64 kbps); Linear signal
with 12-15 bits per sample; 8 kHz sampling rate.
+ Output signal: A-Law or mu-Law PCM (64 kbps); Linear signal
with 12-15 bits per sample; 8 kHz sampling rate.

* See also: L&H Speech Coding SDK
* More Information: On the WWW: http://www.lhs.com/coding.html
* Cost: Unknown
* Contact: Lernout and Hauspie Speech Products
20 Mall Road, 4th Floor
Burlington, MA 01803, USA
Ph: +1-617-238-0960, Fax: +1-617-238-0986
Email: sales@lhs.com
WWW: http://www.lhs.com/

Lernout & Hauspie Speech Coding SDK

* Description: Windows-based software development kit for
integrating speech coding technology with Windows-based PC
applications.
* Requirements: IBM-compatible 486 DX/33 MHz + 2 MB RAM + MS-DOS 5.0 +
MS Windows 3.1 (or higher) + Sound Blaster compatible sound board.
* See also: L&H Speech Coding Products
* More Information: On the WWW: http://www.lhs.com/coding.html
* Cost: Unknown
* Contact: Lernout and Hauspie Speech Products
20 Mall Road, 4th Floor
Burlington, MA 01803, USA
Ph: +1-617-238-0960, Fax: +1-617-238-0986
Email: sales@lhs.com
WWW: http://www.lhs.com/

MPEG Audio

MPEG (Moving Picture Experts Group) standards define methods for the
compression and transmission of digital video and audio. Detailed FAQs
and WWW sites are available for MPEG:

MPEG Pointers and Resources


http://www.mpeg.org/

FAQ by Luigi: http://www.crs4.it/~luigi/MPEG/mpegfaq.html

FAQ by Frank Gadegast


http://www.powerweb.de/mpeg/mpegfaq/

FAQ by Chad Fogg


http://www-plateau.cs.berkeley.edu/mpegfaq/MPEG-2-FAQ.html

How to Install an MPEG Audio Player for your Web Navigator


http://www.mpeg.org/index.html/MPEG-audio-player.html

MPEG Audio Software on the WWW

Audio and Music Applications for Silicon Graphics Systems


Lists 4 MPEG audio players for SGI machines.
http://reality.sgi.com/employees/cook/audio.apps/public.html

MPEG-1 Audio Layer 3 encoder, decoder and FAQ


From the Fraunhofer Institute
http://www.iis.fhg.de/departs/amm/layer3/index.html

MPEG-2 Audio FAQ from Philips


http://www.keymodules.philips.com/MD/mpeg/faqmpeg2.htm

MPEG-1 and MPEG-2 audio software


Universitaet Hannover
ftp://ftp.tnt.uni-hannover.de/pub/MPEG/audio/


MPEG-1 Audio Layer 1 & 2 encoder/decoder


Internet Underground Music Archive (IUMA)
ftp://ftp.iuma.com/audio_utils/converters/source/

Buddy Software Library: MPEG-1 Audio Layer 3 encoder and player
http://www.buddy.org/softlib.html

MPEG-1 Audio Layer 1 & 2 decoder and verifier at CCETT


ftp://ftp.ccett.fr/pub/mpeg/audio_new/

MPEG-2 Audio encoder and decoder at CCETT


ftp://ftp.ccett.fr/pub/mpeg/mpeg2/

MPEG Audio - MetaSound

* Platform: MS Windows 3.1 and Windows 95
* Description: MetaSound is a partial MPEG-1 software decoder
designed to work with hardware video decoders. It can reduce
hardware cost by eliminating the need for a hardware audio
decoder. MetaSound has so far been integrated with three
hardware video decoders. Features:
+ Performance: on a 486 DX4-100 or faster, MetaSound can deliver
FM-quality (22 kHz) sound; on a Pentium-90 or faster, it uses
40% of the CPU to deliver CD-quality (44.1 kHz) sound.
+ Portability: porting to a new hardware video decoder can take
less than one month.
+ Supported CD standards: Video CD 1.0, Video CD 2.0 and CDI.
+ User interface with a full set of functions: volume control,
stop, pause, forward, backward, mute, resume, select the
previous/next program track (Video CD 2.0), randomly select a
program track (Video CD 2.0).
+ Error recovery: can automatically skip erroneous bitstreams.
* Contact: Meta Media, Inc.
F8, #10-1, Ho-Ping East Rd. Sec. 1, Taipei, Taiwan, R.O.C.
Ph: 011-886-2-369-3330, Fax: 011-886-2-369-3331
Email: mmedia@ms4.hinet.net.tw

shorten - a lossless compressor for speech signals

* Platform: UNIX/DOS
* Description: A fast waveform coder suitable for speech and music
signals in a wide variety of file formats. The degree of
compression is adjustable from lossless to three bits per sample.
16-bit, 16 kHz speech generally compresses losslessly to about half
its original size, and 16:3 compression of CD-ROM-quality speech is
obtainable with only minor audible degradation.
* Availability: Anonymous ftp - UNIX and DOS versions

ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/shorten.tar.gz

ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/shorten.tar.Z

ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/shorten.zip
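As a rough illustration of how a lossless waveform coder of this kind can work, the Python sketch below predicts each sample from the previous two and then Rice-codes the small prediction residuals. The fixed second-order predictor, the brute-force choice of the Rice parameter `k` and the list-of-bits output are assumptions made for the example; shorten's own predictor selection, block structure and file format are not reproduced here. Because prediction and reconstruction both use integer arithmetic, decoding recovers the input exactly.

```python
import math
from typing import List


def residuals(samples: List[int]) -> List[int]:
    """Second-order fixed predictor: predict 2*x[n-1] - x[n-2]."""
    prev1 = prev2 = 0                     # assume zero history before the first sample
    out = []
    for s in samples:
        out.append(s - (2 * prev1 - prev2))
        prev2, prev1 = prev1, s
    return out


def reconstruct(res: List[int]) -> List[int]:
    """Invert residuals() exactly (integer arithmetic, so the round trip is lossless)."""
    prev1 = prev2 = 0
    out = []
    for r in res:
        s = r + 2 * prev1 - prev2
        out.append(s)
        prev2, prev1 = prev1, s
    return out


def zigzag(e: int) -> int:
    """Map signed residuals to unsigned: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ..."""
    return 2 * e if e >= 0 else -2 * e - 1


def unzigzag(u: int) -> int:
    return u >> 1 if u % 2 == 0 else -((u + 1) >> 1)


def rice_encode(values: List[int], k: int) -> List[int]:
    """Rice code: unary quotient (1s ended by a 0) followed by the k low-order bits."""
    bits: List[int] = []
    for e in values:
        u = zigzag(e)
        bits.extend([1] * (u >> k))
        bits.append(0)
        bits.extend((u >> i) & 1 for i in range(k - 1, -1, -1))
    return bits


def rice_decode(bits: List[int], k: int, count: int) -> List[int]:
    values, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == 1:             # count the unary quotient
            q += 1
            pos += 1
        pos += 1                          # skip the terminating 0
        r = 0
        for _ in range(k):                # read the k remainder bits
            r = (r << 1) | bits[pos]
            pos += 1
        values.append(unzigzag((q << k) | r))
    return values


def best_k(res: List[int]) -> int:
    """Pick the Rice parameter giving the shortest code (brute force over 0..15)."""
    return min(range(16), key=lambda k: sum((zigzag(e) >> k) + 1 + k for e in res))


# Example: 0.1 s of a 200 Hz tone as 16-bit samples at 8 kHz.
tone = [round(10000 * math.sin(2 * math.pi * 200 * n / 8000)) for n in range(800)]
res = residuals(tone)
k = best_k(res)
coded = rice_encode(res, k)
assert reconstruct(rice_decode(coded, k, len(res))) == tone
print(f"k={k}, {len(coded) / len(tone):.2f} bits/sample vs 16")
```

Lossy settings would need an extra quantization step that this sketch omits.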

Sipro Lab Telecom Inc. Coding

* Platform: Various processors
* Description: Coding software for several international standards
plus two proprietary standards.
International Standards
1. PCS 1900 (a 13 kbps codec, established as a North American
PCS standard)
2. Enhanced GSM (a 13 kbps codec)
3. G.723.1 (a dual-rate codec for the video phone market)
4. G.729 (an 8 kbps codec established as a multi-purpose
international standard)
5. G.729 Annex A (8 kbps codec made for Digital Simultaneous
Voice & Data transmission in the modem industry).

Proprietary Standards
1. ACELP 8 v2.0 codec (flexible dual rate codec equipped with a
VAD)
2. ACELP 4.8 codec
* Contact: Sipro Lab Telecom Inc.
770, Chemin Lucerne, Ville Mont-Royal (Quebec), H3R 2H6 CANADA
Ph: (514) 737-5874, Fax: (514) 737-2327
E-mail: sales@sipro.com
WWW: http://www.sipro.com/

Sonarc: Digital Audio Compression

* Platform: DOS and Windows
* Description: Sonarc provides reversible (lossless), variable-rate
compression of audio signals, with compression ratios averaging
about 2:1. It supports mono and stereo files, 8-bit and 16-bit
samples, and the WAVE and VOC formats.
* Availability: Shareware by Richard P. Sprague
Speech Compression
P.O. Box 1785, Wilsonville, OR, 97070-1785, USA
Ph: (503) 263-3102
Email: 76635.3652@compuserve.com

StarAudio Compressor/Player

* Platform: Win95
* Description: Uses a time-domain process that delivers lossless
decompressed data. Processes any .wav file containing high-quality
16-bit audio at any sampling rate. Requires no special hardware,
and decompression runs in real time on most 486s and on any
Pentium. The higher the sampling rate, the higher the compression
ratio: at least 4:1 for 11 kHz data, and usually more than 7:1
for 44 kHz data. The full bandwidth of the signal is preserved
with the default compression options; other options increase the
compression ratio further at the cost of a slight reduction in
output quality. A decompression library is available for
application development.
* Demo: Download the shareware version of the program from the STR
WWW site.
* Misc: A technical paper is available in Word 6.0 format:
ftp://ftp.speechtech.com/pub/speechtech/docs/audocw60.exe
* Contact: Speech Technology Research Ltd.,
Suite B - 1623 McKenzie Avenue, Victoria, B.C. V8N 1A6, Canada
Ph: +1-250-477-0544
Email: products@speechtech.com
WWW: http://www.speechtech.com/home/speechtech/

TrueSpeech from DSP Group

* Description: TrueSpeech is a family of speech compression and
decompression algorithms and software designed for personal
computers and personal communications devices. With compression
ratios ranging from 15:1 to 27:1, TrueSpeech reduces the storage
and transmission requirements of digital voice and can be used
to integrate personal computers with telephones. TrueSpeech can be
used in many products and applications, such as:
+ Multimedia PCs
+ Sound cards and modems
+ Computer/telephony and teleconferencing
+ Voice mail systems and PBX systems
+ Wireless/cellular applications
+ Personal digital assistants
+ Games, Education
+ Video/cable and on-line services
The TrueSpeech encoder is available for free in the Sound System
of Windows 95 and Windows NT. The DSPG WWW pages have information
on how to add TrueSpeech capability to your WWW pages.
* Contact: DSP Group, Inc.
3120 Scott Boulevard, Santa Clara, CA 95054-3317, USA
Phone: (408) 986-4300 Fax: (408) 986-4323
Email: Webster@dspg.com
WWW: http://www.dspg.com/index.html

U.S.F.S. 1016 CELP vocoder for DSP56001

* Platform: DSP56001
* Description: Real-time U.S.F.S. 1016 CELP vocoder that runs on a
single 27MHz Motorola DSP56001. Free demo software available for
PC-56 and PC-56D. Source and object code available for a one-time
license fee.
* Contact:

Cole Erskine
Analogical Systems
299 California Avenue, Suite 120
Palo Alto, CA 94306, USA
Tel:(415) 323-3232 FAX:(415) 323-4222
Email: cole@analogical.com

ToolVox from Voxware

* Platform: Windows; Mac (now in beta) and Unix versions coming soon
* Description: ToolVox is a proprietary frequency-domain speech
coder. 11 kHz speech is coded at an average rate of between 5,000
and 9,000 bps. Real-time compression algorithms are available for
2,400 bps. 22 kHz playback and an ultra-low-bit-rate 8 kHz codec
are coming soon. On playback, the time scale can be changed by a
factor of 5, pitch can be modified over a 3-octave range, and
vocal personality can be modified using a transformation function
called VoiceFonts(tm).
* Misc 1: An SDK for Windows is available.
* Misc 2: Demo software is available from the Voxware Inc WWW page:
http://www.voxware.com/
* Price: Basic toolkit is $895 US. OEM and mass distribution
licenses are separate. Ordering information is provided on the
Voxware WWW server.
* Contact:

Voxware, Inc.
Ph: (609) 497-1212 Fax: (609) 497-2490

Sales information: sales@voxware.com
WWW: http://www.voxware.com/

___________________________________________________________________________

Natural Language Processing

comp.speech FAQ Section 4

There is now a newsgroup specifically for Natural Language Processing:
comp.ai.nat-lang. A FAQ posting is available for the group:

ftp://rtfm.mit.edu/pub/usenet/comp.ai.nat-lang/Natural_Language_Processing_FAQ

There is also a lot of useful information on Natural Language
Processing in the comp.ai FAQ. That FAQ lists available software and
useful references, including a substantial list of software,
documentation and other information available by ftp.

The FAQ has information on the following:

* Q4.1: NLP References and Books
* Q4.2: NLP Software

___________________________________________________________________________

Q4.1: NLP References and Books

Take a look at the FAQ for the "comp.ai" newsgroup as it also includes
some useful references.

* James Allen: Natural Language Understanding (Benjamin/Cummings
Series in Computer Science), Menlo Park: Benjamin/Cummings
Publishing Company, 1987.
+ This book consists of four parts: syntactic processing,
semantic interpretation, context and world knowledge, and
response generation.
* G. Gazdar and C. Mellish, Natural Language Processing in Prolog,
Addison Wesley, 1989
* G. Gazdar and C. Mellish, Natural Language Processing in Lisp,
Addison Wesley, 1989
* G. Gazdar and C. Mellish, Natural Language Processing in Pop11,
Addison Wesley, 1989
+ Emphasis on parsing, especially unification-based parsing,
lots of details on the lexicon, feature propagation, etc.
Fair coverage of semantic interpretation, inference in
natural language processing, and pragmatics; much less
extensive than in Allen's book, but more formal. There are
three versions, one for each programming language listed
above, with complete code.
* Shapiro, Stuart C.: Encyclopedia of Artificial Intelligence Vol.1
and 2. New York: John Wiley & Sons, 1990.
+ There are articles on the different areas of natural language
processing which also give additional references.
* Paris, Cécile L.; Swartout, William R.; Mann, William C.: Natural
Language Generation in Artificial Intelligence and Computational
Linguistics. Boston: Kluwer Academic Publishers, 1991.
+ The book describes the most current research developments in
natural language generation and discusses all aspects of the
generation process. It comprises three sections: one on text
planning, one on lexical choice, and one on grammar.
* Readings in Natural Language Processing, ed. by B. Grosz, K. Sparck
Jones and B. Webber, Morgan Kaufmann, 1986
+ A collection of classic papers on Natural Language
Processing. Fairly complete at the time the book came out
(1986) but now seriously out of date. Still useful for ATNs,
etc.
* Klaus K. Obermeier, Natural Language Processing Technologies in
Artificial Intelligence: The Science and Industry Perspective,
Ellis Horwood Ltd, John Wiley & Sons, Chichester, England, 1989.

The following are extensive bibliographies related to NLP:

* Computational Parsing : Syntactic Analysis, Semantic Analysis,
Semantic Interpretation, Parsing Algorithms, Parsing Strategies :
BIBLIOGRAPHY, by Conrad F. Sabourin 1994, 2 volumes, 1029p, ISBN
2-921173-02-6, INFOLINGUA inc., P.O. Box 187 Snowdon, Montreal,
H3X 3T4, Canada.
* Computational Text Understanding : Natural Language Programming,
Argument Analysis : BIBLIOGRAPHY, by Conrad F. Sabourin 1994,
657p, ISBN 2-921173-06-9, INFOLINGUA inc., P.O. Box 187 Snowdon,
Montreal, H3X 3T4, Canada.
See also: http://gomer.mlink.net/infolingua.html
* Computational Text Generation : Generation from data or Linguistic
Structure, Text Planning, Sentence Generation, Explanation
Generation : BIBLIOGRAPHY, by Conrad F. Sabourin with a survey
article by Mark T. Maybury 1994, 649p, ISBN 2-921173-07-7,
INFOLINGUA inc., P.O. Box 187 Snowdon, Montreal, H3X 3T4, Canada.
See also: http://gomer.mlink.net/infolingua.html
* Natural Language Processing : Interfaces to Databases, to Expert
Systems, to Robots, to Operating Systems, and to
Question-Answering Systems : BIBLIOGRAPHY, by Conrad F. Sabourin,
1994, 2 volumes, 847p, ISBN 2-921173-08-5 INFOLINGUA inc., P.O.
Box 187 Snowdon, Montreal, H3X 3T4, Canada
See also: http://gomer.mlink.net/infolingua.html

Journals

The major journals of the field are

* Computational Linguistics and _Cognitive Science_ for the
artificial intelligence aspects,
* Cognition for the psychological aspects,
* Language and _Linguistics and Philosophy_ and Linguistic Inquiry
for the linguistic aspects.
* Artificial Intelligence occasionally has papers on natural
language processing.

Conferences

The major NLP conferences are

* ACL: held annually
* COLING: held biennially (every two years)

Most AI conferences have an NLP track; AAAI, ECAI, IJCAI and the
Cognitive Science Society conferences are usually interesting for NLP.
CUNY is an important psycholinguistic conference. Other conferences
include NELS, the conference of the Chicago Linguistic Society (CLS),
WCCFL, LSA, the Amsterdam Colloquium, and SALT.

___________________________________________________________________________

Q4.2: NLP Software

Natural Language Software Registry (NLSR) - NLP Tools

* The Natural Language Software Registry is available from the
German Research Institute for Artificial Intelligence (DFKI) in
Saarbrucken. Its purpose is to facilitate the exchange and
evaluation of natural language processing software within the
research community. To this end, the NLSR catalogs natural
language software projects, both commercial and non-commercial.
The new, updated and enlarged version contains more than 100
descriptions of natural language processing software. Registry
listings include:
+ speech signal processors, such as the Computerized Speech Lab
(Kay Elemetrics)
+ morphological analyzers, such as PC-KIMMO (Summer Institute
for Linguistics)
+ parsers, such as Alveytools (University of Edinburgh)
+ semantic and pragmatic analyzers, such as NLL (University of
the Saarland, Germany)
+ generation programs, such as FUF (Ben Gurion University of
the Negev)
+ knowledge representation systems, such as Rhet (University of
Rochester)
+ multicomponent systems, such as ELU (ISSCO), PENMAN (ISI),
Pundit (UNISYS) and SNePS (SUNY Buffalo)
+ NLP-Tools, such as GULP (University of Georgia) or Linguist
(Kansai Research Laboratory)
+ applications programs (misc.)
* If you have developed a piece of software for natural language
processing that other researchers might find useful, you can
include it by returning the questionnaire available from the
sources below.
* ftp://ftp.dfki.uni-sb.de/pub/registry
* e-mail: registry@dfki.uni-sb.de
* Natural Language Software Registry
Deutsches Forschungsinstitut fuer Kuenstliche Intelligenz (DFKI)
Stuhlsatzenhausweg 3
D-66123 Saarbruecken
Germany
* Other ftp sites are

ftp://crlftp.nmsu.edu/pub/non-lexical/NL_Software_Registy

ftp://dri.cornell.edu/pub/Natural_Language_Software_Registry

Part of Speech Tagger

* Description: A rule-based part of speech tagger developed by Eric
Brill.
* Availability: The tagger software, about 10 descriptive papers and
related data are available by anonymous ftp from
ftp://ftp.cs.jhu.edu/pub/brill/
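Brill's approach first gives each word an initial tag (typically its most frequent tag) and then applies an ordered list of learned transformation rules. The Python sketch below shows only the rule-application step for one hypothetical rule template, "change tag A to tag B when the previous word is tagged C"; the tag names and the example rule are illustrative and are not taken from the distributed tagger.

```python
from typing import List, Tuple

# Hypothetical rule format: (from_tag, to_tag, previous_tag), read as
# "change from_tag to to_tag when the preceding word is tagged previous_tag".
Rule = Tuple[str, str, str]


def apply_rules(tags: List[str], rules: List[Rule]) -> List[str]:
    """Apply an ordered list of transformation rules to an initial tagging."""
    tags = list(tags)                          # leave the caller's list unchanged
    for from_tag, to_tag, prev_tag in rules:   # rules are applied in their given order
        for i in range(1, len(tags)):          # scan the sentence left to right
            if tags[i] == from_tag and tags[i - 1] == prev_tag:
                tags[i] = to_tag
    return tags


words = ["expected", "to", "race", "tomorrow"]
initial = ["VBN", "TO", "NN", "NN"]            # e.g. each word's most frequent tag
print(list(zip(words, apply_rules(initial, [("NN", "VB", "TO")]))))
```

Here the single rule retags "race" as a verb after "to", while "tomorrow" keeps its noun tag because its left neighbour is no longer tagged TO.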

___________________________________________________________________________

Copyright (c) 1993-6 by Andrew Hunt, all rights reserved.


This FAQ may be posted to any USENET newsgroup, on-line service, or BBS as
long as it is posted in its entirety and includes this copyright statement.
This FAQ may not be distributed for financial gain.
This FAQ may not be included in any collections or compilations
without express permission from the author.

---

Andrew Hunt
Speech Applications Group
Sun Microsystems Laboratories
2 Elizabeth Drive, MS UCHL03-207
Chelmsford, MA 01824, USA
Ph: (508) 442-2681, Fax: (508) 250-5067
Email: andrew.hunt@east.sun.com


http://mi.eng.cam.ac.uk/comp.speech/text/FAQ-part3.txt

Newsgroups: comp.speech,comp.answers,news.answers
Subject: comp.speech Frequently Asked Questions - part 3/3
From: andrew.hunt@east.sun.com (Andrew Hunt)
Reply-To: andrew.hunt@east.sun.com (Andrew Hunt)
Followup-To: comp.speech
Organization: Speech Applications Group, Sun Microsystems Laboratories
Summary: Information on Speech Technology
Approved: news-answers-request@MIT.Edu

Archive-name: comp-speech-faq/part3
Last-modified: 1997/09/06
URL: http://www.speech.su.oz.au/comp.speech/

COMP.SPEECH FAQ POSTING - PART 3/3

[Note: this document has been automatically extracted from a WWW site:
http://www.speech.su.oz.au/comp.speech/
This may introduce some formatting errors.]

Speech Synthesis

comp.speech FAQ Section 5

* SpeechLinks: Speech Synthesis
* Q5.1: What is speech synthesis?
* Q5.2: How can speech synthesis be performed?
* Q5.3: References/Books on Synthesis
* Q5.4: Speech Synthesis on the WWW
* Q5.5: Speech Synthesis Software/Hardware

___________________________________________________________________________

Q5.1: What is speech synthesis?

Speech synthesis programs convert written input to spoken output by
automatically generating synthetic speech. Speech synthesis is often
referred to as "Text-to-Speech" conversion (TTS).

___________________________________________________________________________

Q5.2: Performing speech synthesis

There are several approaches, and the choice depends on the task. The
simplest is to record a person speaking the desired phrases. This is
useful if only a restricted set of phrases and sentences is needed,
e.g. announcements in a train station or schedule information by
phone. The quality then depends on how the recordings are made.

More sophisticated, but lower in quality, are algorithms that split
speech into smaller pieces. The smaller the units, the fewer of them
are needed, but the quality also decreases. A frequently used unit is
the phoneme, the smallest unit of sound that distinguishes one word
from another. Western European languages have about 35-50 phonemes,
depending on the language, i.e. about 35-50 single recordings. The
problem is combining them, since fluent speech requires smooth
transitions between the elements. Intelligibility is therefore lower,
but the memory required is small.

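A solution to this dilemma is to use diphones. Instead of cutting at
the transitions, the cut is made at the center of each phoneme,
leaving the transitions themselves intact. With N phonemes there are
at most N*N diphones: 20 phonemes give 400 elements (20*20), and the
35-50 phonemes mentioned above give well over 1,000, although not
every pair occurs in practice. The quality increases accordingly.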
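A minimal Python sketch of the diphone idea: it turns a phoneme sequence into diphone unit names and computes the N*N upper bound on the inventory size. The phoneme symbols, the `_` silence marker and the `a-b` naming scheme are hypothetical conventions chosen for the example, not those of any particular synthesizer.

```python
from typing import List


def to_diphones(phones: List[str], silence: str = "_") -> List[str]:
    """Split a phoneme sequence into diphone unit names.

    Each unit runs from the middle of one phoneme to the middle of the next,
    so every phoneme-to-phoneme transition stays inside a single recording.
    Silence is padded at both ends so the first and last phonemes are covered.
    """
    padded = [silence] + list(phones) + [silence]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def max_inventory(num_phonemes: int) -> int:
    """Upper bound on distinct diphones: every ordered pair of phonemes."""
    return num_phonemes * num_phonemes


print(to_diphones(["h", "@", "l", "@U"]))   # hypothetical symbols for "hello"
print(max_inventory(20), max_inventory(40))
```

A concatenative synthesizer would then fetch one recording per unit name and join the recordings at the phoneme centers.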


The longer the units, the more of them are needed, but quality
increases along with the memory required. Other widely used units
are half-syllables, syllables, words, or combinations of these, e.g.
word stems and inflectional endings.

The Museum of Speech Analysis and Synthesis has pictures of artificial
speech systems going back over 150 years and is worth a visit
(http://mambo.ucsc.edu/psl/smus/smus.html).

___________________________________________________________________________

Q5.3: References/Books on Synthesis

Books and Papers

* Thierry Dutoit, An Introduction to Text-to-Speech Synthesis,
Kluwer Academic Publishers (Dordrecht), 1997, ISBN 0-7923-4498-7,
312 pages. Volume 3 in the series on Text, Speech and Language
Technology.
* Douglas O'Shaughnessy, Speech Communication: Human and Machine,
Addison-Wesley Series in Electrical Engineering: Digital Signal
Processing, 1987.
* T.V. Raman, Auditory User Interfaces: Toward the Speaking
Computer, Kluwer Academic Publishers, Boston, ISBN 0-7923-9984-6,
August 1997, 168 pp.
* D. H. Klatt, "Review of Text-To-Speech Conversion for English",
Journal of the Acoustical Society of America (JASA), Vol. 82, pp.
737-793.
* "Talking Machines, Theories, Models and Designs" Eds, G. Bailly &
C. Benoit (Elsevier: North Holland)
* I. H. Witten. Principles of Computer Speech, London: Academic
Press, Inc., 1982.
* W.B. Kleijn and K.K. Paliwal (Eds.), Speech Coding and Synthesis,
Elsevier, Amsterdam, 1995.
Contents, preface etc on the WWW:
http://www.elsevier.nl/section/engtech/scs/menu.htm
* John Allen, Sharon Hunnicutt and Dennis H. Klatt, "From Text to
Speech: The MITalk System", Cambridge University Press, 1987.
* J.P.H. van Santen, R. W. Sproat, J. P. Olive, and J. Hirschberg,
"Progress in Speech Synthesis", Springer, 1996.

On the WWW

* Survey of the State of the Art in Human Language Technology
Report edited by Ronald A. Cole et al. with a section on
Text-to-Speech Technologies.
http://www.cse.ogi.edu/CSLU/HLTsurvey/ch5node1.html

Bibliographies and Reference Lists

* WWW-searchable online bibliography for Phonetics and Speech
Technology with more than 8000 entries. Provided by Institut fur
Phonetik at Johann Wolfgang Goethe-Universitat Frankfurt.
http://www.uni-frankfurt.de/~ifb/bib_engl.html
* Computational Speech Processing: Speech Analysis, Recognition,
Understanding, Compression, Transmission, Coding, Synthesis ; Text
to Speech Systems, Speech to Tactile Displays, Speaker
Identification, Prosody Processing : BIBLIOGRAPHY, by Conrad F.
Sabourin, 1994, 2 volumes, 1187p, ISBN 2-921173-21-2, INFOLINGUA
inc., P.O. Box 187 Snowdon, Montreal, H3X 3T4, Canada.
See also: http://gomer.mlink.net/infolingua.html

___________________________________________________________________________

Q5.4: Speech Synthesis on the WWW

Most of the following are links to WWW pages with demonstrations of
speech synthesis. Plenty more links are included in the detailed list
of speech synthesis software/hardware in Q5.5.

Speech Synthesis "Museum"


URL: http://www.cs.bham.ac.uk/~jpi/synth/museum.html
Maintained by Jon Iles (j.p.iles@cs.bham.ac.uk) at the
University of Birmingham.
Information and speech samples for

+ YorkTalk
+ Loughborough Sound Images
+ University of Birmingham - FDFS
+ Eurovocs
+ DECtalk
+ AT&T Bell Labs Synthesiser
+ S.W.A.Ll.C. - Welsh Synthesis from CSTR
+ All-Prosodic Speech Synthesis - IPOX
+ Orator from Bellcore

The Festival Speech Synthesis System


http://www.cstr.ed.ac.uk/projects/festival.html
Pre-synthesized examples in English, Welsh and Spanish, and
online demo of English.

Pavarobotti
http://www.shc.uiowa.edu/fun/pavarobotti/pavarobotti.html
WWW demo of the Pavarobotti synthesis technology developed at
the National Center for Voice and Speech
(http://www.shc.uiowa.edu/ncvs_home.html).

Say...
http://wwwtios.cs.utwente.nl/say
WWW demo of the rsynth speech synthesis software. The WWW
capability was implemented by Axel Belinfante.

Musee sonore de la synthese de la Parole en francais (sound museum of French-language speech synthesis)


http://www.icp.grenet.fr/exemples_synthese/ex.html
Speech synthesis examples from a series of French language
speech synthesisers plus links to other speech synthesis demo
pages.

+ ICP-Grenoble
+ CNET-Lannion (with TD-PSOLA)
+ KTH-Stockholm
+ Universite-Mons - several versions

Lucent Technologies Bell Labs Text-to-Speech


http://www.bell-labs.com/project/tts/
Demos and samples of the latest Lucent Technologies Bell Labs
Text-to-Speech system.

WATSON FlexTalk from AT&T Advanced Speech Products Group


http://www.att.com/aspg/demo.html
WWW interface to the WATSON FlexTalk speech synthesis
demonstration.

AT&T Bell Laboratories Voices


http://www.research.att.com/cgi-bin/cgiwrap/mjm/voices.cgi
WWW interface to the AT&T Bell Laboratories text to speech
(TTS) synthesizer

Laureate from British Telecom


http://www.labs.bt.com/innovate/speech/laureate/
Demo of the Laureate speech synthesis system - not yet
commercially available.


ORATOR from Bellcore


Online demo of the ORATOR system developed at Bellcore.
http://www.bellcore.com/ORATOR/

SVOX from TIK, ETH in Zurich


http://www.tik.ee.ethz.ch/cgi-bin/w3svox
Demo of German speech synthesis from Institut fur Technische
Informatik und Kommunikationsnetze.

Speech Synthesis Research at OGI


http://www.cse.ogi.edu/CSLU/research/TTS
Examples of diphone speech corpora and algorithms developed at
OGI for synthesis of American English and Mexican Spanish using
the Festival framework.

Lyricos
http://www.cse.ogi.edu/CSLU/research/TTS/research/sing.html
Demos of the Lyricos singing voice synthesis system.
Concatenation-based synthesis of singing voice from MIDI input.

Multi-Lingual TTS from Gerhard-Mercator University, Duisburg


http://www.fb9-ti.uni-duisburg.de/demos/speech.html
Synthesis in German, English or Japanese.

TMH: Institutionen for Taloverforing och Musikakustik, Kungliga
Tekniska Hogskolan (Department of Speech Transmission and Music
Acoustics, Royal Institute of Technology)
http://www.speech.kth.se/info/software.html
Synthesis in Swedish, Finnish, Norwegian, Icelandic, Danish,
British and American English, French, German, Italian, Spanish,
LA Spanish and Greek.

Haskins Laboratory WWW Site


http://www.haskins.yale.edu/Haskins/MISC/special.html
Examples of several types of speech synthesis. Articulatory
Synthesis by HyperASY. SineWave Synthesis. Gestural
Computational Model. Pattern Playback system of the 1940s!

BeSTspeech from Berkeley Speech Technologies, Inc., (BST)


http://www.bestspeech.com/weblang.html

Eurovocs Multilingual Speech Synthesis


http://www.elis.rug.ac.be/ELISgroups/speech/research/eurovocs.html
Based on Lernout and Hauspie technology.

HADIFIX German Speech Synthesis


http://asl1.ikp.uni-bonn.de/~tpo/Hadiq.en.html
Provided by the Institut fur Kommunikationsforschung und
Phonetik, Universitat Bonn.

Centigram's TruVoice Demo


http://www.centigram.com/centigram/TruVoice/index.html
Allows control of speech rate, pitch and other prosodic
charateristics.

MBROLA: Free Speech Synthesis Project


http://tcts.fpms.ac.be/synthesis/modelcmp.html
WWW demo of MBROLA which compares the quality of PSOLA,
MBR-PSOLA, LPC, and Hybrid Harmonic/Stochastic concatenative
synthesizers. Provided by the TCTS Lab, Faculti Polytechnique
de Mons, Belgium

Institute of Phonetic Sciences

http://fonsg3.let.uva.nl/IFA-Features.html
Links to lots of on-line speech synthesis demonstrations
provided by the Institute of Phonetic Sciences of the Faculty
of Arts of the University of Amsterdam.

Yahoo page on speech generation

http://www.yahoo.com/Science/Computer_Science/Artificial_Intelligence/Natural_Language_Processing/Speech_Generation/

___________________________________________________________________________

Q5.5: Speech Synthesis Software/Hardware

Please email any updates, corrections or additions to the following
list. The range of commercially available synthesis software is
growing rapidly, so any help in keeping it up to date will be appreciated.

Other lists of speech synthesis software on the WWW include:

Kevin Lenzo's list of Macintosh Speech Resources and Apps

http://www.cs.cmu.edu/~lenzo/mac_speech_apps.html

Speech Toys Speech Synthesis Information

http://www.speechtoys.com/spchtoys/spsyn.html

In the FAQ...

The following speech synthesis software/hardware is described in the
comp.speech FAQ.

_Apple Macintosh_
* BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
* Infovox Product Range
* Macintosh Speech Output Applications
* Macintosh Speech Synthesis Manager
* MacYack Pro
* MBROLA: Free Speech Synthesis Project
* ProVoice Developer's Speech Toolkit from First Byte
* SENSYN speech synthesizer
* Sound Bytes Developer's Kit

_Windows (including 95, NT, 3.1)_
* AcuVoice
* AT&T Watson Speech Synthesis
* BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
* Creative TextAssist and TextAssist API
* DECtalk: Text-to-Speech from Digital
* ETI-Eloquence
* HADIFIX
* Infovox Product Range
* IPOX: All Prosodic Speech Synthesis Architecture
* Lernout and Hauspie Text-To-Speech Windows SDK
* Listen2 Text Reader
* MBROLA: Free Speech Synthesis Project
* Monologue for Windows from First Byte
* PAM - A Text-To-Speech Application
* ProVerbe Speech Engine from ELAN Informatique
* ProVoice Developer's Speech Toolkit from First Byte
* SENSYN speech synthesizer
* Sound Bytes Developer's Kit
* Tinytalk
* TruVoice from Centigram
* WinSpeech
* ZMD Speech Synthesis

_DOS_
* CSRE: Computerized Speech Research Environment
* Infovox Product Range
* MBROLA: Free Speech Synthesis Project
* ProVoice Developer's Speech Toolkit from First Byte
* SENSYN speech synthesizer
* spchsyn.exe
* Tinytalk
* ZMD Speech Synthesis

_OS/2_
* ProVerbe Speech Engine from ELAN Informatique
* ProVoice Developer's Speech Toolkit from First Byte
* Sound Bytes Developer's Kit

_Unix_
* AcuVoice
* AsTeR
* BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
* DECtalk: Text-to-Speech from Digital
* ETI-Eloquence
* Emacspeak - A Speech Output Subsystem For Emacs
* Festival Speech Synthesis System
* JSRU
* Klatt-style synthesiser
* KPE80 - A Klatt Synthesiser and Parameter Editor
* "learph": Trainable text-to-phoneme software by Antonio Lucca
* MBROLA: Free Speech Synthesis Project
* Orator from Bellcore
* ProVerbe Speech Engine from ELAN Informatique
* rsynth
* SENSYN speech synthesizer
* SGI Developers Toolbox Synthesiser
* Speak
* TrueTalk
* TruVoice from Centigram

_Integrated Circuits and Dedicated Hardware_
* Eurovocs
* Infovox Product Range
* ProVerbe Speech Engine from ELAN Informatique
* RC Systems V8600/V8601 Text to Speech synthesizers

_Other Platforms_
* BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
* TheBigMouth (NeXT)
* MBROLA: Free Speech Synthesis Project
* Narrator Translator Library (Amiga)
* Narrator (Amiga)
* TextToSpeech Kit (NeXT)
* Orator from Bellcore
* SENSYN speech synthesizer
* WreadFiles: File reader for Commodore Amiga

_Unknown_
* Lernout and Hauspie Text-To-Speech (3 products)
* Lucent Technologies Bell Labs Text-to-Speech system
* SIMTEL
* Text to Phoneme Program 1
* Text to phoneme program 2
* Text to phoneme program 3

AcuVoice

* Platform: Windows, Solaris
* Description: AcuVoice is a natural-sounding text-to-speech system
built using a concatenative approach. It is currently available
with an American English male voice. Software Developer Kits are
available for the Windows Platform (32-Bit) and also for the
Solaris Platform. More information and samples are available on
the AcuVoice web site.
* Contact: AcuVoice, Inc.
84 W. Santa Clara Street, Suite 720, San Jose, CA 95113-1810
Ph: 1(408)289-1661, Fax: 1(408)289-1201
Demo: 1(408)289-1177
Email: AcuVoice1@AOL.COM
WWW: http://www.acuvoice.com/

AsTeR

* Platform: UNIX
* Description: TTS front-end program which encodes structural
information about documents in the synthesized speech. For more
information see:
http://www.research.digital.com/CRL/personal/raman/aster/aster-toplevel.html
* Operation requirements: Lisp (Lucid or clisp)
* Contact: T. V. Raman
WWW: http://www.research.digital.com/CRL/personal/raman/raman.html
Email: raman@adobe.com

AT&T Watson Speech Synthesis

* Platform: Windows 95/NT on a Pentium 75 MHz or higher
* Description: Watson is a software implementation of AT&T Bell
Laboratories voice processing technology. Watson includes BLASR
Speech Recognition (see Q6.6) and FlexTalk speech synthesis. It
requires no special hardware to run other than a standard sound
card and/or phone card. Technical details for the FlexTalk speech
synthesis include:
+ Compliant with MS Speech API.
+ Male and Female Voices available
+ 8 kHz and 11 kHz output
+ SoundBlaster compatible sound card and drivers required
+ Context sensitive abbreviation expansion
+ Accurate pronunciation of most proper names
+ Adjustable vocal tract size, speed, volume, pitch, etc.
+ American English only - other languages in development
The AT&T Advanced Speech Products Group home page provides more
detailed information including a Frequently Asked Questions list,
information for application developers on the Independent Software
Vendor (ISV) Program (including info on the SDK, licensing, and
the training program).
* Requirements: Uses 2 MB RAM, 10 MB Disk. Requires a Pentium 75 MHz
or higher (uses
* Cost and Availability: WATSON is a software-based speech platform
with a Software Developers Kit (SDK) that allows application
developers to use voice processing in their applications. It is
not available as a stand-alone product.
Licensing information (including price) is provided on the AT&T
Advanced Speech Products Group home page.
* See also: Watson BLASR speech recognition in Q6.5, Microsoft
Speech API, and Advanced Speech API.
* Contact: AT&T Advanced Speech Products Group
Suite 700, 44 East Mifflin Street, Madison, WI 53703, USA
Ph: 1-800-5-WATSON, Fax: 1-608-259-2269
Email: aspg@attmail.com
WWW: http://www.att.com/aspg/

BeSTspeech from Berkeley Speech Technologies, Inc., (BST)

* Platform: available for Macintosh, Sun, Silicon Graphics, Windows
PC and IBM RS/6000 platforms, and can be ported to others.
* Description: BeSTspeech reads ASCII text with no vocabulary limits.
Available for Dutch, English (male and female), French, German,
Italian, Portuguese, Spanish, Arabic, Cantonese, Japanese, Korean,
Malay, Mandarin and Russian.
* Availability: Berkeley Speech Technologies, Inc does not sell end
user toolkits or products.
* Contact: Berkeley Speech Technologies, Inc.
2246 Sixth Street, Berkeley, California 94710, USA
Ph: (510) 841-5083, Fax: (510) 841-5093
Email: webmaster@bst.com
WWW: http://www.bestspeech.com/index.html

TheBigMouth - a Text to Speech Program

* Platform: NeXT
* Description: Text to speech program based on concatenation of
pre-recorded speech segments.
* Availability:
ftp://ftp.cs.keio.ac.jp/pub/NeXT/source/TheBigMouth1.0.tar.Z

Creative TextAssist

* Platform: Windows
* Description: Based on DECtalk speech synthesis. A detailed
description of TextAssist is provided on the Creative WWW pages.
TextAssist TextReader provides a convenient Windows user interface
for text reading.
* Availability: Creative TextAssist is bundled with most (all?)
Creative Sound Blaster audio cards. TextAssist preview software is
available from the Creative Labs TextAssist home page.
* Contact: Creative Labs, Inc.
Address, phone, email etc unknown
WWW: http://www.creaf.com/
http://www.creaf.com/wwwnew/tech/devcnr/tassist.html

Creative TextAssist API

* Platform: Windows
* Description: The TextAssist API (TAAPI) is intended for Microsoft
Windows 3.1x and Windows 95 developers who want to develop
16-bit text-to-speech applications using Creative's
TextAssist speech engine. It supports direct control of speech
output characteristics, concurrent playback of text-to-speech and
wave files, foreign language support, speech synchronization and
exception dictionaries. It also includes a voice editing tool for
creating new custom voices, and a Visual Basic Custom Control for
high-level support in Visual Basic and other languages.
* Availability: The TextAssist API is released to registered
developers at no cost.
* Contact: WWW: http://www.creaf.com/
FAQ: http://www.creaf.com/wwwnew/tech/devcnr/tassfaq.html

CSRE: Computerized Speech Research Environment

* Platform: DOS
* Description: CSRE is a software system which includes an
implementation of the Klatt speech synthesizer. See the CSRE entry
in Q1.9 and the AVAAZ WWW pages for more detail.
* Contact: AVAAZ Innovations Inc.
P.O.Box 8040, 1225 Wonderland Rd. N, London, Ontario, CANADA, N6G 2B0
Ph: +1-519-472-7944, Fax: +1-519-472-7814
Email: info@avaaz.com
WWW: http://www.icis.on.ca/homepages/avaaz/

DECtalk Speech Synthesis

* Platform: Windows NT, Alpha with Digital UNIX and RS232 ports
* Description: Converts ordinary text into natural-sounding,
intelligible speech. Provides personalized voices, and extensive
user controls. DECtalk technology is available in the following
packaging options.
+ DECtalk PC card option: An industry-standard ISA/EISA bus
card implementation that can be integrated with any Intel 486
processor-based system running DOS or Windows. Applications
can be interfaced to the bus via a DOS Terminate and Stay
Resident (TSR) driver or a Windows Dynamic Link Library
(DLL). This option is available with an external speaker with
volume control and headphone jack.
+ DECtalk Express external package: An external, portable
package that you can plug in to any PC or serial port. The
external package includes a built-in speaker and headphone
jack, plus combined on/off and volume controls and a
rechargeable battery pack.
+ DECtalk Software solution: Software-only text to speech for
Alpha or Intel systems running Windows NT or Alpha systems
running Digital UNIX. Provides complete speech synthesis
capabilities so developers can enhance applications with
DECtalk technology. DECtalk Software output can be directed
to audio devices, into WAVE files, or into memory buffers.
* Pricing:
://www.systems.digital.com/DIcatalog/html/DECtalk-Speech-Synthesis-oi.html
* More Information:
Digital Equipment Corporation WWW pages: http://www.digital.com/
DECtalk page:
http://www.systems.digital.com/DIcatalog/html/DECtalk-Software.html
Ph: 1-800-DIGITAL
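DECtalk Software can direct its output into WAVE files or memory buffers. The sketch below shows what that packaging amounts to for 16-bit mono linear PCM, using only Python's standard `wave` module. It is a generic illustration, not the DECtalk API, and the function name `pcm16_to_wav_bytes` is hypothetical:

```python
import io
import struct
import wave


def pcm16_to_wav_bytes(samples, rate_hz):
    """Wrap 16-bit mono linear PCM samples in a WAVE (RIFF) container.

    `samples` is a sequence of ints in -32768..32767. The result is the
    complete file image, which can stay in memory or be written to disk.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)          # mono
        w.setsampwidth(2)          # 2 bytes = 16 bits per sample
        w.setframerate(rate_hz)
        # WAVE stores PCM samples little-endian, hence "<h".
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()
```

The returned bytes can be kept as an in-memory buffer or written unchanged to a `.wav` file.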

DECtalk Software

* Platform: Digital UNIX and Windows NT
* Description: DECtalk converts standard ASCII text into natural,
intelligible speech. Speech output through any audio device is
supported by Microsoft Video for Windows or Multimedia Services
for Digital UNIX. An API gives developers direct access to
text-to-speech functions. Provides nine voice personalities (4
female, 4 male, 1 child). Provides punctuation and tonal control,
supports customized pronunciation of trade jargon and acronyms.
Common programming interface works with both Alpha and Intel
platforms.
* More Information:
Digital Equipment Corporation WWW pages: http://www.digital.com/
DECtalk Software page:
http://www.systems.digital.com/DIcatalog/html/DECtalk-Software.html
WWW:
http://www.systems.digital.com/DIcatalog/html/DECtalk-Speech-Synthesis.html
Ph: 1-800-DIGITAL

ETI-Eloquence

* Platform: MS Windows (Win95, NT, 3.1), Solaris, SunOS, SGI, RS/6000
* Description: ETI-Eloquence is a software-based text-to-speech
system. It generates waveforms entirely algorithmically rather
than by concatenating recorded waveforms, for maximum flexibility
and naturalness. For instance, when the user requests a deeper
voice, the software simulates a larger vocal tract instead of
simply pitch-shifting samples. It uses high-level linguistic
parsing, which obviates the need for a huge dictionary. It handles
numbers, acronyms, currency, etc. It includes a set of annotation
symbols for placing stress on particular words, expressing
excitement/boredom, etc. It also allows phonetic input and
supports MS SAPI.
Produces male and female voices for General American English.
Dialects under development include Alabama and Brooklyn.
* Price: Flexible license agreements on application.
* Availability: Eloquent Technology, Inc.
2389 North Triphammer Road, Ithaca, NY 14850, USA
Ph: (607) 266-7025, Fax: (607) 266-7030
Email: info@eloq.com
WWW: http://www.eloq.com/
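Why a larger vocal tract sounds deeper can be seen from the textbook uniform-tube approximation: a tube closed at the glottis and open at the lips resonates at odd quarter-wavelength frequencies, F_n = (2n - 1) c / (4L), so every formant drops as the length L grows. This only illustrates the physics; it is not ETI-Eloquence's actual model, and the 350 m/s speed of sound is an assumed round figure:

```python
def tube_formants(length_m, n_formants=3, speed_of_sound=350.0):
    """Resonances (Hz) of a uniform tube closed at one end, open at the other.

    F_n = (2n - 1) * c / (4 * L): a quarter-wave resonator, the classic
    first approximation to the vocal tract for a neutral vowel.
    """
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, n_formants + 1)]


if __name__ == "__main__":
    for length_cm in (15.0, 17.5, 20.0):
        formants = tube_formants(length_cm / 100.0)
        print(f"{length_cm} cm:", [round(f) for f in formants])
```

For a 17.5 cm tract this gives about 500, 1500 and 2500 Hz; lengthening the tube to 20 cm lowers all three (to about 437.5, 1312.5 and 2187.5 Hz) without touching the pitch of the voice source.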

Emacspeak - A Speech Output Subsystem For Emacs

* Platform: UNIX, Emacs
* Description: Emacspeak is a speech output system that allows
someone who cannot see to work directly on a UNIX system.
Emacspeak is built on top of Emacs. With Emacspeak loaded, Emacs
provides spoken feedback for everything you do. Emacspeak
currently supports the new DECtalk Express speech synthesizer, as
well as older versions of the DECtalk, e.g. the MultiVoice. See the
Emacspeak WWW page, the Emacspeak FAQ or the Emacspeak
distribution for additional details.
* Requirements: Requires GNU FSF Emacs 19 (version 19.23 or later)
and TCLX 7.3B (Extended TCL) to run Emacspeak.
* Availability:

Emacspeak WWW page
http://www.research.digital.com/CRL/personal/raman/emacspeak/emacspeak.html

Emacspeak source
http://www.research.digital.com/CRL/personal/raman/emacspeak/emacspeak.tar.gz

* Contact: T. V. Raman, raman@adobe.com

Eurovocs

* Platform: Various - RS232 hardware connection
* Description: Eurovocs is a stand-alone text-to-speech synthesizer
which uses the text-to-speech technology of Lernout and Hauspie
Speech Products. Available for Dutch, French, German and American
English with other languages planned for release soon. One
Eurovocs device can support two different languages. Eurovocs can
be connected to any computer via a standard serial interface
(RS232). It supports personal dictionaries, generation of DTMF
tones, and pronunciation of special character sequences such as
digit strings, telephone numbers, date and time indications,
abbreviations, alphanumeric strings etc.
* Contact: Technologie & Revalidatie
Postbus 128, B-9000 Gent, Belgium
Ph: +32-9-264 33 97, Fax: +32-9-264 35 94
E-mail: noe@elis.rug.ac.be
WWW:
http://www.elis.rug.ac.be/ELISgroups/speech/research/eurovocs.html
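Eurovocs can generate DTMF tones. Each DTMF key is the sum of one low "row" tone and one high "column" tone taken from the standard keypad grid. The sketch below synthesizes them as floating-point samples; the 8 kHz rate, 100 ms tone length, 50 ms gap and amplitude are illustrative assumptions, not Eurovocs settings, and the function names are hypothetical:

```python
import math

ROW_HZ = (697, 770, 852, 941)        # low-group frequencies
COL_HZ = (1209, 1336, 1477, 1633)    # high-group frequencies
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key):
    """Return the (low, high) frequency pair in Hz for one keypad key."""
    if len(key) != 1:
        raise ValueError(f"expected a single key, got {key!r}")
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key)
        if col >= 0:
            return ROW_HZ[row], COL_HZ[col]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_tone(key, duration_s=0.1, rate_hz=8000, amplitude=0.5):
    """Samples of one DTMF tone; the peak is at most 2 * amplitude."""
    low, high = dtmf_frequencies(key)
    n = round(duration_s * rate_hz)
    return [amplitude * (math.sin(2 * math.pi * low * i / rate_hz)
                         + math.sin(2 * math.pi * high * i / rate_hz))
            for i in range(n)]


def dial(digits, gap_s=0.05, rate_hz=8000):
    """Concatenate the tones for a digit string, separated by silence."""
    out = []
    for key in digits:
        out += dtmf_tone(key, rate_hz=rate_hz)
        out += [0.0] * round(gap_s * rate_hz)
    return out
```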

Festival Speech Synthesis System

* Platform: General Unix (including Solaris (2.4, 2.5), SunOS, HPUX,
SGIs, Linux, Dec Alpha, FreeBSD)
* Description: Festival is a general multi-lingual speech synthesis
system developed at CSTR, University of Edinburgh. It offers a
full text-to-speech system with various APIs, as well as an
environment for development and research of speech synthesis
techniques. It is written in C++ with a Scheme-based command
interpreter for general control. Festival's home page offers
demos, the full manual and access to the download page. The
distribution includes full source and documentation, and lexicons
and speech databases for British English text to speech.
* Price: Free for non-commercial use
* Availability: by WWW or anonymous ftp:
WWW: http://www.cstr.ed.ac.uk/projects/festival/download.html
ftp: ftp://ftp.cstr.ed.ac.uk/pub/festival/1.1.1/

HADIFIX

* Platform: Windows
* Description: German speech synthesis system developed at the
Institute for Communications Research and Phonetics, University
of Bonn. Provides conversion of input text to phonemes, automatic
prediction of stress, phrasing and pitch, and speech generation by
concatenation of small units of natural speech. Demisyllables and
similar units are used; they comprise all consonants before the
vowel and the beginning of the vowel (initial demisyllable) or the
end of the vowel and the following consonants (final
demisyllable). For example, the word 'Strolch' is formed by
concatenating 'Stro' and 'olch'.
* Demo: Windows demo software available. Limited to synthesis of one
short text (text.txt) at a time. Speech format limitations too.
1.3MB file.
ftp://asl1.ikp.uni-bonn.de/pub/hadifix/hadidemo.zip
A 1993 version is available with unlimited synthesis from a string
of phonemic symbols and accent markers. 6MB file.
ftp://asl1.ikp.uni-bonn.de/pub/hadifix/hadi25.lzh
* WWW: http://asl1.ikp.uni-bonn.de/~tpo/Hadifix.en.html
* On-line demo: http://asl1.ikp.uni-bonn.de/~tpo/Hadiq.en.html
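The demisyllable split described in the HADIFIX entry can be sketched as follows. For readability this works on spelling with a hypothetical vowel letter set; HADIFIX itself works on phoneme strings, and a real system cuts the vowel at a point inside its waveform rather than duplicating the vowel symbol:

```python
GERMAN_VOWELS = set("aeiouäöüy")  # illustrative letters, not HADIFIX's phoneme set


def demisyllables(syllable, vowels=GERMAN_VOWELS):
    """Split one syllable into (initial, final) demisyllables.

    The initial demisyllable is every consonant before the vowel plus the
    vowel; the final demisyllable is the vowel plus every following
    consonant. The vowel nucleus appears in both, which is where the two
    recorded units are joined.
    """
    lower = syllable.lower()
    start = next((i for i, ch in enumerate(lower) if ch in vowels), None)
    if start is None:
        raise ValueError(f"no vowel in {syllable!r}")
    end = start
    while end + 1 < len(lower) and lower[end + 1] in vowels:
        end += 1  # treat adjacent vowel letters (e.g. "ei") as one nucleus
    return syllable[:end + 1], syllable[start:]


if __name__ == "__main__":
    print(demisyllables("Strolch"))  # ('Stro', 'olch')
```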

Infovox Product Range

* Description: Multilingual text-to-speech systems. Languages
available: American English, British English, German, French,
Spanish, Italian, Swedish, Norwegian, Icelandic, Danish and
Finnish.
* Product name: INFOVOX 500, PC BOARD
+ Product description: Half-length expansion board for IBM PC,
XT, AT, PS/2 model 30 or compatible personal computers. The
board can also be connected via the serial port. The language
and control program is either downloaded into RAM or mounted
on EPROMs.
+ Platform: DOS/Windows with IBM PC, XT, AT, PS/2 model 30 or
compatible
+ Delivered standard interface: MS DOS I/O driver
* Product name: INFOVOX 600, OEM BOARD
+ Product description: OEM board built with CMOS ICs. Language
and control program are stored in on-board fixed memory.
+ Platform: any, hardware interface: 9-pole D-SUB (RS 232-C)
300-9600 Baud.
+ Delivered standard interfaces: MS DOS I/O driver and
interface to Apple Speech manager.
* Product name: INFOVOX 700, DESKTOP UNIT
+ Product description: Desktop unit with built-in Infovox 600
to be connected to any computer or terminal via an RS 232-C
serial interface. Built-in loudspeaker and rechargeable
battery for 4 hours of use, and control knobs for continuous
control of speech volume and speed.
+ Platform: various through hardware interface
+ Delivered standard interfaces: MS DOS I/O driver and
interface to Apple Speech manager
* Product name: INFOVOX 650, OEM BOARD
+ Product description: OEM board built with CMOS ICs. Language
and control program are stored in on-board memory.
+ Platform: any, hardware interface: 9 pole D-SUB (RS 232-C)
300-9600 Baud
+ Delivered standard interfaces: MS DOS I/O driver and
interface to Apple Speech manager
* Product name: INFOVOX 750, DESKTOP UNIT
+ Product description: Desktop unit with built-in Infovox 650
to be connected to any computer or terminal via an RS 232-C
serial interface. Built-in loudspeaker and rechargeable
battery for 5 hours of use, and a control knob for continuous
control of speech volume.
+ Platform: various through hardware interface. Delivered
standard interfaces include MS DOS I/O driver and interface
to Apple Speech manager
* Product name: Infovox 210, software for Apple Macintosh
+ Product description: Software based text-to-speech
conversion. Produces 16 bit and 8 bit sound. Delivered on
3.5" diskettes with a user lexicon and complete
documentation.
+ Platform: Apple Macintosh with minimum 68030, 33 MHz
microprocessor.
+ Delivered standard interfaces: Standard interface to Apple
Speech manager
* Product name: Infovox 220, software for Microsoft Windows.
+ Product description: Software based text-to-speech
conversion. Produces 16 bit sound and conforms to Microsoft
Windows multimedia standard MCI. Delivered on 3.5" diskettes
with a user lexicon and complete documentation.
+ Platform: Windows on IBM compatible PC with minimum 486/25MHz
microprocessor.
+ Delivered standard interfaces: Standard interface to
Microsoft Windows 3.1 and sound boards supporting Microsoft
Windows multimedia driver for audio.
* Contact: Telia Promotor Infovox AB
TTS Sales Division
P.O. Box 2069, S-171 02 Solna, Sweden
Ph: +46 8 764 35 00, Fax: +46 8 735 78 76
Email: tts-sales@infovox.se
WWW: http://www.promotor.telia.se/NYA/cc/t-s/index.html

IPOX: All Prosodic Speech Synthesis Architecture

* Platform: Windows
* Description: IPOX is an experimental, all-prosodic speech
synthesizer, developed by Arthur Dirksen and John Coleman. IPOX is
freely available (after registration) for evaluation and
non-profit research purposes.
* Requirements: PC (preferably a fast 486) running Windows 3.1 or
higher. Sound output requires a 16-bit Windows-compatible sound
card.
* Availability: By WWW from
http://www.tue.nl/ipo/people/adirksen/ipox/ipox.htm

JSRU

* Platform: UNIX and PC
* Cost: 100 pounds sterling (for both academic institutions and
industry)
* Description: A C version of the JSRU system (Version 2.3) is
available. It's written in Turbo C but runs on most Unix systems
with very little modification. A Form of Agreement must be signed
to say that the software is required for research and development
only.
* Contact: Dr. E. Lewis (eric.lewis@bristol.ac.uk)

Klatt-style synthesiser

* Platform: Unix
* Cost: Free
* Description: Software posted to comp.speech in late 1992.
* Availability: By ftp from the comp.speech ftp site
+ ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/klatt.3.04.tar.gz
+ ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/klatt.3.04.tar.Z
* See also: KPE80 - A Klatt Synthesiser and Parameter Editor.

KPE80 - A Klatt Synthesiser and Parameter Editor

* Platform: Unix
* Description: The KPE80 program provides a graphical interface for
the implementation of the Klatt 1980 formant synthesiser written
by Jon Iles and Nick Ing-Simmons. It was inspired by IGE, a piece
of code written by Rob Fletcher (http://www.york.ac.uk/~rpf1/IGE.html).
* Technical Desc.: It consists of an X-Window interface and
version 3.03 of the synthesiser code. The interface allows users
to display and edit Klatt parameters using a graphical display
which includes the time-amplitude waveform of both the original
speech and its synthetic copy, and some signal analysis
facilities. Most of the work in choosing the parameter values to
produce the synthetic copy has to be done by the user. KPE will
estimate the fundamental frequency contour from an original token;
this estimate will need to be amended where errors occur. It is
possible to specify the formant trajectories with some precision
by overlaying the appropriate formant frequency parameter tracks
on the spectrogram of the target waveform. A number of facilities
exist to help in the refinement of parameter values: original and
synthetic waveforms can be compared aurally, spectrally, and
spectrographically using built-in speech analysis facilities.
* File formats: KPE will read RIFF (.wav) files and SFS files. (SFS
is a suite of speech-signal processing programs available free
from Phonetics and Linguistics, UCL.)
* Availability:

KPE for SunOS 4.1.3 (statically compiled libraries)
ftp://ftp.phon.ucl.ac.uk/pub/kpe/kpe80.sun413.tar.Z

KPE for Linux (statically compiled libraries)
ftp://ftp.phon.ucl.ac.uk/pub/kpe/kpe80.linux.tar.Z

The source code (needs gcc and SUIT to compile)
ftp://ftp.phon.ucl.ac.uk/pub/kpe/kpe80.src.tar.Z

A PostScript overview of KPE
ftp://ftp.phon.ucl.ac.uk/pub/kpe/OVERVIEW.ps

The SFS distribution
ftp://ftp.phon.ucl.ac.uk/pub/sfs/

* See also: Public domain Klatt-style speech synthesis code.
* Contact: Andrew Simpson
Department of Phonetics and Linguistics, University College London
Wolfson House, 4 Stephenson Way, London NW1 2HE
Email: a.simpson@ucl.ac.uk
WWW: http://www.phon.ucl.ac.uk/home/andrew/home.html
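The formant tracks edited in KPE80 drive second-order digital resonators, the building block of Klatt's 1980 formant synthesizer: y[n] = A x[n] + B y[n-1] + C y[n-2], with C = -exp(-2 pi BW T), B = 2 exp(-pi BW T) cos(2 pi F T) and A = 1 - B - C, where F is the formant frequency, BW its bandwidth and T the sample period. A minimal Python sketch of one resonator and a simple cascade; it is not the KPE80 or klatt.3.04 source, and the helper names are hypothetical:

```python
import math


def resonator_coeffs(freq_hz, bw_hz, rate_hz):
    """Klatt-style resonator coefficients (A, B, C) for one formant."""
    t = 1.0 / rate_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to 1
    return a, b, c


def resonate(x, freq_hz, bw_hz, rate_hz):
    """Filter samples x through y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, rate_hz)
    y1 = y2 = 0.0  # previous two outputs
    out = []
    for sample in x:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def cascade(x, formants, rate_hz):
    """Pass x through resonators in series, one per (freq, bw) pair."""
    for freq_hz, bw_hz in formants:
        x = resonate(x, freq_hz, bw_hz, rate_hz)
    return x
```

For example, `cascade(source, [(500, 60), (1500, 90), (2500, 150)], 10000)` passes a source signal through three resonators at 10 kHz; these formant and bandwidth values are assumed, roughly neutral-vowel figures, not settings taken from KPE80.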

"learph": Trainable text-to-phoneme software by Antonio Lucca

* Platform: UNIX
* Description: Experimental software which learns text to phoneme
translation from examples using decision-tree-like data
structures. It is based on the assumption that each letter can
correspond to different phoneme strings depending on the context.
* Availability: Examples and source are available on the WWW:
http://www.silab.dsi.unimi.it/~al367212/ttsdoc.html
* Contact: Antonio Lucca: toninlcc@tesi.dsi.unimi.it
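The idea that a letter maps to different phoneme strings depending on its context can be sketched with a context table that backs off from wide to narrow letter windows, which behaves like walking down a decision tree from specific to general questions. This is an illustration only, not learph's actual data structure; it assumes training words that are already aligned with one phoneme string per letter ("" for a silent letter), and `ContextG2P` is a hypothetical name:

```python
from collections import Counter, defaultdict


def context_keys(word, i, max_width):
    """(left, letter, right) windows around word[i], widest first."""
    padded = "#" * max_width + word + "#" * max_width
    j = i + max_width
    for w in range(max_width, -1, -1):
        yield padded[j - w:j], padded[j], padded[j + 1:j + 1 + w]


class ContextG2P:
    def __init__(self, max_width=2):
        self.max_width = max_width
        self.table = defaultdict(Counter)  # context -> counts of phoneme strings

    def train(self, examples):
        """examples: iterable of (word, phonemes) with one entry per letter."""
        for word, phones in examples:
            if len(word) != len(phones):
                raise ValueError(f"{word!r} is not aligned letter by letter")
            for i, phone in enumerate(phones):
                for key in context_keys(word, i, self.max_width):
                    self.table[key][phone] += 1

    def predict(self, word):
        """Use the widest context seen in training for each letter."""
        result = []
        for i in range(len(word)):
            for key in context_keys(word, i, self.max_width):
                if key in self.table:
                    result.append(self.table[key].most_common(1)[0][0])
                    break
            else:
                result.append("?")  # letter never seen in training
        return result
```

Trained on "cat", "city" and "cot", this predicts /s/ for the "c" in the unseen word "cit", because the widest matching context ("c" followed by "it") only occurred in "city".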

Lernout & Hauspie Text-to-Speech (3 products)

Lernout & Hauspie have three TTS products. The products offer similar
functionality but differ in hardware implementation and in other
details, as described below.

* L&H tts2000/T: TTS for the Telephony and Telecommunications Market
* L&H tts2000/M: TTS for the Computer and Multimedia Market
* L&H tts3000/C: TTS for the Business and Consumer Electronics
Market

* Description: Text to Speech (TTS) software based on parameterized
segment concatenation (diphones, triphones and tetraphones)
algorithms. Available for US English, German, Dutch, French,
Spanish (Castilian), Italian and Korean. General features include:
+ The control of volume, speech rate and speech pitch.
+ The use of control sequences to customize TTS output (adding
pauses, using phonetic input, etc.).
+ Switching between languages at run time.
+ A personal vocabulary editor is available for building
exception dictionaries.
+ Readout modes: letter by letter, word by word or sentence by
sentence.
+ Input formats: orthographic input, phonetic input, phonetic
input with prosodic information.
* tts2000/T
+ Output formats: 8 bit mu-law PCM, 8 bit A-law PCM, 16 bit
linear PCM.
+ Sampling Frequency: 8kHz
+ Single channel platform examples: SHARP SH7000, ARM6/ARM7,
Intel i960, TI TMS320C31, AT&T DSP3210
+ Multi-channel platform examples: TI TMS320C31, AT&T DSP3210
* tts2000/M
+ Output formats: 8/16 bit wave format, 8 bit mu-law PCM, 8 bit
A-law PCM, 16 bit linear PCM.
+ Sampling Frequency: 8/10/11.025 kHz
+ Single processor platform examples: ARM6/ARM7, Intel
386/486/Pentium, Motorola 68040
+ Two processor platform examples: {Intel 386/486/Pentium or
Motorola 68030} and {ADI ADSP21XX or Motorola 5600X or TI
TMS320C25/20C5X}
* tts3000/C
+ Output formats: 8 bit mu-law PCM, 8 bit A-law PCM, 16 bit
linear PCM.
+ Sampling Frequency: 10kHz
+ Single processor platform examples: SHARP SH7000, ARM6/ARM7,
Intel i960, TI TMS320C31, AT&T DSP3210
+ Two processor platform examples: {SHARP SH7000 or ARM6/ARM7
or Intel 386EX or Motorola 683XX} and {ADI ADSP21XX or
Motorola 5600X or TI TMS320C25/C5X or TI TSP50C10}
* See also: L&H Windows TTS SDK
* More Information: on the Lernout & Hauspie WWW pages:
http://www.lhs.com/tts.html
* Price: Unknown
* Contact: Lernout and Hauspie Speech Products
20 Mall Road, 4th Floor
Burlington, MA 01803, USA
Ph: +1-617-238-0960, Fax: +1-617-238-0986
Email: sales@lhs.com
WWW: http://www.lhs.com/
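Several of these products output 8 bit mu-law PCM, the G.711 telephony companding format: each 16-bit linear sample is stored as a sign bit, a 3-bit segment (exponent) and a 4-bit step (mantissa), so quiet samples keep finer resolution than loud ones. A minimal Python sketch of the common bias-and-clip formulation; it is a generic illustration of the format, not code from Lernout & Hauspie:

```python
BIAS = 0x84    # 132: ensures every biased magnitude has its top bit at bit 7 or above
CLIP = 32635   # largest magnitude that still fits in 15 bits after adding BIAS


def linear_to_ulaw(sample):
    """Compress one 16-bit signed linear PCM sample to an 8-bit mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent = magnitude.bit_length() - 8           # segment number, 0..7
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # G.711 transmits the bitwise complement of sign|exponent|mantissa.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(code):
    """Expand an 8-bit mu-law byte back to a 16-bit signed linear sample."""
    code = ~code & 0xFF
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    # Reconstruct at the middle of the quantization step, then remove BIAS.
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if code & 0x80 else magnitude
```

Round-tripping 1000 through the pair gives 988; the absolute error grows with amplitude, which is the point of companding.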

Lernout & Hauspie Text-to-Speech Windows SDK

* Platform: Windows
* Description: The L&H Text-to-Speech software developers kit lets
you integrate text-to-speech technology into your own new or
existing PC applications under Microsoft Windows 3.1. The
software converts written text into clear, human-sounding
synthetic speech.
* Requirements: IBM-compatible PC 386 DX/33 + 8Mb RAM + MS DOS 5.0 +
MS Windows 3.1 (or higher) + SoundBlaster compatible sound board.
* See also: L&H TTS Products
* More Information: on the Lernout & Hauspie WWW pages:
http://www.lhs.com/tts.html
* Price: Unknown
* Contact: Lernout and Hauspie Speech Products
20 Mall Road, 4th Floor
Burlington, MA 01803, USA
Ph: +1-617-238-0960, Fax: +1-617-238-0986
Email: sales@lhs.com
WWW: http://www.lhs.com/

Listen2 Text Reader

* Platform: Windows
* Description: Listen2 is a multi-voice, multi-language text reader.
Listen2 comes in two versions: an English-only version that uses
high quality male and female voices, and an International version
that can speak up to 5 different languages (English, German,
French, Spanish or Italian), all in male voices. The basic
International program comes with built-in English, and additional
language fonts can be purchased separately. The English version
comes complete.
Both programs are dynamically switchable and configurable. This
means that you can press a hot key to speed up the speech, make it
louder or quieter, etc., as it is reading a file. You can also
insert flags in text files to make it switch voices or switch
languages, depending on what version you have.
Listen2 has all the features of the JTS Reader shareware program
plus a few more. It will voice your reminder messages or
appointment list on start-up. It will also speak a reminder
message on shutting down.
* WWW: A more complete description is available on the Listen2 web
page.
* Contact: Tom Slemko: e-mail: tslemko@islandnet.com, or,
JTS Micro Consulting Ltd
10931 Lytton Road, RR#4, Ladysmith, B.C., Canada, V0R 2E0
WWW: http://www.islandnet.com/jts/

Lucent Technologies Bell Labs Text-to-Speech system

* Platform: Unknown
* Description: Lucent Technologies provides a web site with demos and
samples of their latest speech synthesis technology. The site has
interactive demos in American English, German, and Mandarin
Chinese, and the capability to adjust voice parameters on the fly.
Pre-synthesized demos for French, Italian, Russian, and Romanian
are also provided.
The site includes downloadable papers with detailed system
descriptions.
* WWW: http://www.bell-labs.com/project/tts/

Macintosh Speech Output Applications

* Platform: Macintosh
* Description: A comprehensive list of Macintosh Speech Applications
is provided by Kevin Lenzo at CMU:
http://www.cs.cmu.edu/~lenzo/mac_speech_apps.html
The Apple Speech WWW Site also has some useful information:
http://www.speech.apple.com/

Speech Manager and PlainTalk

* Platform: Macintosh
* Description: Apple's text-to-speech system extensions that enable
applications to perform text-to-speech conversion. The Speech
Manager runs on most Macs, but PlainTalk (and the high quality
voices) requires a 68020 Mac or better.
* Availability: By anonymous ftp from:
ftp://ftp.support.apple.com/pub/apple_sw_updates/US/Macintosh/System/PlainTalk 1.4.1/
This directory contains subdirectories for recent versions of
PlainTalk. The current release (PlainTalk 1.4.1) contains the
English Text-To-Speech with about a dozen voices
(English_Text-to-Speech.hqx: 5.3 MByte), Mexican Spanish
(Mexican_Spanish_TTS.hqx: 2.8 MByte), and the English Speech

http://mi.eng.cam.ac.uk/comp.speech/text/FAQ-part3.txt (16 of 69) [10/31/2003 8:41:30 AM]


http://mi.eng.cam.ac.uk/comp.speech/text/FAQ-part3.txt

Recognition software (English_Speech_Recognition.hqx: 2.3MByte).


* Cost: Free
* WWW: The latest information is available from Apple's WWW page for
speech recognition and synthesis:
http://www.speech.apple.com/
* Note 1: Check out Kevin Lenzo's list of Macintosh Speech
Applications.
* Note 2: Joshua Baer (josh@skyweyr.com) runs a mailing list for
Plaintalk. For subscription and other information visit the
Plaintalk Discussion List Home page
* Contact: Apple Computer, Inc.
1 Infinite Loop, Cupertino, CA 95014, USA
WWW: http://www.speech.apple.com/
Email: PlainTalk@atg.apple.com

MacYack Pro

* Platform: Macintosh
* Description: MacYack Pro is a commercial speech package for
Macintosh that uses the PlainTalk Text-to-Speech synthesis
software. Features include:
+ Add speech to any word processor.
+ Hear notification dialogs and other dialog boxes.
+ See and hear a customized message at startup or shutdown.
+ Hear calculations instantly.
+ Correct pronunciation errors.
+ Create custom double-clickable "speech files."
+ Have speaking alert sounds.
+ Add speech to HyperCard stacks.
+ Use AppleScript to add speech to other programs.
* Price: $29.95 for a limited time, reduced from the regular price of
$49.95. 30-day money-back guarantee.
* Contact: Scantron Quality Computers
20200 Nine Mile Rd. St. Clair Shores, MI 48080
Ph: 1-800-777-3642, Fax: 810-774-2698
E-mail: sales@sqc.com
WWW: http://www.sqc.com/
Product Info: http://www.lowtek.com/macyack/

MBROLA: Free Speech Synthesis Project

* Platform: Sun4, Sun/SunOS5.4, HP, VAX/VMS, DEC Alpha/VMS, PC/DOS,
PC/Windows 3.1, PC/Windows 95, PC/Solaris2.4, PC/Linux, SGI
INDY/IRIX, NeXT, and soon for Macintosh.
* Description: MBROLA is a high-quality, diphone-based speech
synthesizer which is available for free. It is provided by the
TCTS Lab of the Faculte Polytechnique de Mons (Belgium), which aims
to obtain a set of speech synthesizers for as many languages as
possible, free to use for non-commercial, non-military
applications.
MBROLA 2.00 takes a list of phonemes as input, together with
prosodic information (duration of phonemes and a piecewise linear
description of pitch), and produces 16-bit speech samples at the
sampling frequency of the diphone database (typically 16 kHz). (It
is therefore NOT a Text-To-Speech (TTS) synthesizer, since it does
not accept raw text as input.) Databases are now being prepared
for English, Spanish, Italian, Dutch, and Romanian. Collaborations
are welcome. More information can be found at the MBROLA project
homepage.
* Demonstration: A WWW demo of MBROLA, which compares the quality of
PSOLA, MBR-PSOLA, LPC, and Hybrid Harmonic/Stochastic
concatenative synthesizers, is available at
http://tcts.fpms.ac.be/synthesis/modelcmp.html.
* Contact: Dr Thierry Dutoit
Faculte Polytechnique de Mons, TCTS Lab,
31, bvd Dolez, B-7000 Mons, Belgium.
Ph: +32-65-374133, Fax: +32-65-374129
e-mail: mbrola@tcts.fpms.ac.be
WWW: http://tcts.fpms.ac.be/synthesis/mbrola.html
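Because MBROLA takes phonemes plus prosody rather than raw text, a separate front end has to produce its input. As we understand the plain-text ".pho" input format, each line holds a phoneme name, its duration in milliseconds, and optional pairs of (position within the phoneme as a percentage, pitch in Hz), with pitch interpolated linearly between the points. The following Python sketch assumes that layout and shows how phonemes and a piecewise linear pitch contour might be serialised. The phoneme symbols are illustrative only; check the MBROLA documentation and the chosen diphone database for the exact symbol set and file conventions.

```python
from typing import List, Sequence, Tuple

# (phoneme symbol, duration in ms, [(position as % of duration, pitch in Hz), ...])
Phone = Tuple[str, int, Sequence[Tuple[float, float]]]


def to_pho(phones: List[Phone]) -> str:
    """Render phonemes plus prosody as the lines of a .pho input file."""
    lines = []
    for symbol, duration_ms, pitch_points in phones:
        if duration_ms <= 0:
            raise ValueError(f"duration must be positive for {symbol!r}")
        fields = [symbol, str(duration_ms)]
        for position, hz in pitch_points:
            if not 0 <= position <= 100:
                raise ValueError("pitch positions are percentages (0-100)")
            fields += [f"{position:g}", f"{hz:g}"]
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


# "hello" with a falling pitch contour; the symbols are illustrative and must
# match whatever phoneme set the chosen diphone database uses.
hello = [
    ("_", 100, []),
    ("h", 60, []),
    ("@", 70, [(50, 130)]),
    ("l", 80, []),
    ("@U", 200, [(0, 125), (100, 95)]),
    ("_", 100, []),
]
print(to_pho(hello), end="")
```

The resulting text would then be passed to the synthesizer together with a diphone database; the sketch only covers producing the input.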

Monologue for Windows from First Byte

* Platform: Windows
* Description: Monologue is a software program that reads text from
the clipboard in 16- or 32-bit Windows applications. It can be
found as a bundled product with many sound cards and multimedia
general purpose computer systems. Monologue can add the element of
speech to virtually any text oriented application. Any
pronounceable combination of letters and numbers will be spoken
clearly. It can be applied to tasks such as eyes-free
proofreading, data verification (e.g. spreadsheets), reading
E-mail and more. User-changeable parameters provide control over
the sound quality by allowing for changes in pitch, and the speed
of speech. An exception dictionary saves preferred pronunciation
of words and abbreviations.
Monologue Win32 now includes support for the Microsoft SAPI.
Monologue male "SpeechFonts" are available for US English, British
English, German, French, Latin American Spanish, Italian. A US
English Female SpeechFont is also available.
For more detailed information and examples go to the First Byte
WWW pages.
* Availability: Currently bundled with many sound cards and
multimedia general purpose computer systems. For pricing,
licensing details, and release information see the First Byte WWW
pages or email info@firstbyte.davd.com.
* See also: ProVoice Developer's Speech Toolkit from First Byte
* Contact: First Byte
19840 Pioneer Ave., Torrance, CA 90503
Ph: 310-793-0610 Fax: 310-793-0611
Email: info@firstbyte.davd.com
WWW: http://www.firstbyte.davd.com/

Narrator Translator Library

* Platform: Amiga
* Description: A US English text to phoneme translator, implemented
as a resident software library, for use with the Amiga Narrator
Device. This software was supplied as a standard part of the Amiga
operating system software up to O.S version 2.04. (Translator
version 37.1, 1991) Approximately 700 translation rules are used
to create the 'ARPAbet' phonemes. This software is functional on
all current Amiga systems (O.S. 3.1).
* Availability: limited to pre-owned system software disks and
unsold O.S upgrade kits (Pre-O.S. 2.1).

Replacement Library: Translator42

* Platform: Amiga
* Description: an independent replacement for the Commodore-supplied
"translator.library" which is a part of the Narrator speech
synthesis package. It implements multi-lingual text-to-speech for
an Amiga. The translation rules for each language are defined in a
plain text 'Accent' file.
There is a provision for the selection of unique languages for
text segments by inserting in-line markup codes in the text, e.g.
"Hello there! \french{Bonjour} \deutsch{gute morgen}".
'Accent' files for American English, British English, Swedish,
Maori, Finnish, German, Icelandic, Klingon, Polish, Italian, and
Welsh are included in the archive.
* Availability: The most current version of the library (42.4)
and its source are available by anonymous ftp from Aminet:
ftp://ftp.doc.ic.ac.uk/pub/aminet/util/libs/translator42.lha
ftp://ftp.doc.ic.ac.uk/pub/aminet/dev/src/tran42src.lha
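The in-line language markup above can be illustrated with a small parser that splits text into (language, segment) pairs before each segment is handed to the matching 'Accent' rules. The Python sketch below assumes only the `\name{text}` form shown in the example, with no nesting; the default language name is a hypothetical parameter, and Translator42's own parser may differ.

```python
import re
from typing import List, Tuple

# Matches in-line markup of the form \language{text}, e.g. \french{Bonjour}.
MARKUP = re.compile(r"\\([A-Za-z]+)\{([^{}]*)\}")


def split_by_language(text: str, default: str = "english") -> List[Tuple[str, str]]:
    """Split text into (language, segment) pairs using \\lang{...} markup."""
    segments: List[Tuple[str, str]] = []
    pos = 0
    for match in MARKUP.finditer(text):
        before = text[pos:match.start()].strip()
        if before:                      # unmarked text uses the default language
            segments.append((default, before))
        segments.append((match.group(1).lower(), match.group(2)))
        pos = match.end()
    rest = text[pos:].strip()
    if rest:
        segments.append((default, rest))
    return segments


print(split_by_language(r"Hello there! \french{Bonjour} \deutsch{gute morgen}"))
```

This prints `[('english', 'Hello there!'), ('french', 'Bonjour'), ('deutsch', 'gute morgen')]`.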

Narrator

* Platform: Amiga
* Description: Formant-based speech synthesis. Includes an
English-to-phoneme translation library, and a SPEAK: pseudo-device
for speech output.
* Hardware: Standard Amiga hardware
* Availability: Part of AmigaOS
* See Also: The Narrator Translation library

TextToSpeech Kit

* Platform: NeXT Computers
* Description: The TextToSpeech Kit does unrestricted conversion of
English text to synthesized speech in real-time. The user has
control over speaking rate, median pitch, stereo balance, volume,
and intonation type. Text of any length can be spoken, and
messages can be queued up, from multiple applications if desired.
Real-time controls such as pause, continue, and erase are
included. Pronunciations are derived primarily by dictionary
look-up. The Main Dictionary has nearly 100,000 hand-edited
pronunciations which can be supplemented or overridden with the
User and Application dictionaries. A number parser handles numbers
in any form. A letter-to-sound knowledge base provides
pronunciations for words not in the Main or customized
dictionaries. Dictionary search order is under user control.
Special modes of text input are available for spelling and
emphasis of words or phrases. The actual conversion of text to
speech is done by the TextToSpeech Server. The Server runs as an
independent task in the background, and can handle up to 50 client
connections.
* Misc: The TextToSpeech Kit comes in two packages: the Developer
Kit and the User Kit. The Developer Kit enables developers to
build and test applications which incorporate text-to-speech. It
includes the TextToSpeech Server, the TextToSpeech Object, the
pronunciation editor PrEditor, several example applications,
phonetic fonts, example source code, and developer documentation.
The User Kit provides support for applications which incorporate
text-to-speech. It is a subset of the Developer Kit.
* Hardware: Uses standard NeXT Computer hardware.
* Cost:
+ TextToSpeech User Kit: $175 CDN ($145 US)
+ TextToSpeech Developer Kit: $350 CDN ($290 US)
+ Upgrade from User to Developer Kit: $175 CDN ($145 US)
* Availability: Trillium Sound Research
1500, 112 - 4th Ave. S.W., Calgary, Alberta, Canada, T2P 0H3
Tel: (403) 284-9278 Fax: (403) 282-6778
Order Desk: 1-800-L-ORATOR (US and Canada only)
Email: TTSInfo@trillium.ab.ca
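The pronunciation lookup described for the TextToSpeech Kit (Main, User and Application dictionaries searched in a user-controlled order, with letter-to-sound rules as a fallback) can be sketched as follows. This is a minimal Python illustration, not the Kit's API: the function names, dictionary contents and phoneme notation are all hypothetical, and the "letter-to-sound" step is only a placeholder that spells the word out.

```python
from typing import Callable, Dict, List, Tuple

Dictionary = Dict[str, str]


def pronounce(word: str,
              search_order: List[Tuple[str, Dictionary]],
              letter_to_sound: Callable[[str], str]) -> Tuple[str, str]:
    """Return (source, pronunciation), trying dictionaries in the given order."""
    key = word.lower()
    for name, entries in search_order:
        if key in entries:
            return name, entries[key]
    # No dictionary entry: fall back to letter-to-sound rules.
    return "letter-to-sound", letter_to_sound(key)


def spell_out(word: str) -> str:
    """Stand-in for a real letter-to-sound rule set."""
    return " ".join(word)


main_dict = {"read": "r i d", "live": "l I v"}
user_dict = {"live": "l aI v"}            # user override of a Main entry
app_dict = {"nextstep": "n E k s t s t E p"}

order = [("user", user_dict), ("application", app_dict), ("main", main_dict)]
print(pronounce("Live", order, spell_out))   # ('user', 'l aI v')
print(pronounce("read", order, spell_out))   # ('main', 'r i d')
print(pronounce("xyz", order, spell_out))    # ('letter-to-sound', 'x y z')
```

Putting the User dictionary first lets it override the Main dictionary; reordering the list is all that "search order under user control" requires in this sketch.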

Orator Text-to-Speech Synthesizer

* Platform: SUN SPARC, Decstation 5000. Written in C, and therefore
portable to other UNIX platforms. Some successful ports: HP,
RS-6000, PC-Unix [Linux].
* Description: Sophisticated speech synthesis package. Has text
preprocessing (for abbreviations, numbers), acronym rules, and
human-like spelling routines. Natural-sounding synthesis based on
demisyllable concatenation. Has high accuracy for pronunciation of
names of people, places and businesses in America; good accuracy
for English text; rules for stress and intonation marking; various
methods of user control and customization at most stages of
processing.
A new version of the ORATOR system is under development. Both
ORATOR and this new "ORATOR II" system are capable of general text
synthesis. The ORATOR II system has a more natural-sounding voice.
* Hardware: Runs on common SPARC or Decstation workstations, using
their internal audio output capability. Recommend at least 16M of
memory.
* WWW: More detailed information plus examples of ORATOR synthesis
are available on the ORATOR WWW pages:
http://www.bellcore.com/ORATOR/
* Misc 1: A free demo cassette is available.
* Misc 2: Examples of Orator are also available on the University of
Birmingham Speech Synthesis "Museum" WWW site (see Q5.4).
* Availability and Pricing: Contact Bellcore's Licensing Office
Tel: 1-800-521-CORE (521-2673)
Fax: 1-908-336-2559
Email: Anthony Lindsey: alin1@panix.com
WWW: http://www.bellcore.com/ORATOR/

PAM - A Text-To-Speech Application

* Platform: Windows
* Description: PAM is a talking personal assistant and text reader
application. It uses the ProVoice TTS package. PAM will verbally
advise about appointments and reminder messages at specified times
during the day. It can read text files, clipboard text, and text
sent in DDE messages. Using the full verbal interface, PAM can be
used by visually challenged individuals. Shareware with a
thirty-day free trial.
* Requirements: Any Windows sound card, speakers or headphones.
Minimum memory: 4 MB (8 MB recommended).
* WWW: A more complete description is available on the JTS homepage:
http://www.islandnet.com/~tslemko/
* Availability: The shareware can be downloaded by ftp from
ftp://ftp.islandnet.com/jts/pam_en3c.zip. The file size is approx.
1 MByte.
* Price: $US40 for the registered version.
* Contact: Tom Slemko: e-mail: tslemko@islandnet.com, or,
JTS Micro Consulting Ltd
10931 Lytton Road, RR#4, Ladysmith, B.C., Canada, V0R 2E0

ProVerbe Speech Engine from ELAN Informatique

* Platform: Windows 3.x, NT, 95, OS/2, Unix Solaris, Unix SCO and
hardware
* Description: The ProVerbe Speech Engine from ELAN Informatique
produces natural sounding speech from written text. Naturalness is
achieved by using the TD-PSOLA process from CNET (France
Telecom's research lab), which is based on the concatenation of
elementary speech units (including diphones). Supported languages
are British English, American English, Russian, German, French and
Spanish. For multi-channel applications Elan Informatique also
provides hardware platforms.
Elan Informatique provides an SDK reference document (sdken.doc:
WinWord6 format).
* Demo versions: Telephone demonstration: +33-561 17 67 01
Sample sound files and demonstration software available.
A CD-ROM with all these demonstrations is available by
registration.
* Contact: Elan Informatique
4 rue Jean Rodier, 31400 TOULOUSE FRANCE
Contact person: Pierre Delrat
Phone: +33-561-36-0777 Fax: +33-61-36-0770
BBS: +33-561-36-0788
E-mail: sales@elan.fr
ftp: ftp://ftp.elan.fr
WWW: http://www.elan.fr/

ProVoice Developer's Speech Toolkit from First Byte

* Platform: ProVoice Developer's Toolkits are available for DOS,
Windows 3.1, Windows 95, Windows NT, OS/2, and Macintosh.
* Description: ProVoice allows programmers to add synthesized speech
to their applications. Your program passes text strings to the
ProVoice speech engine that translates text into audible speech.
Male and/or female "SpeechFonts" are available for many languages:
English, French, German, UK British English, Italian, and Spanish.
ProVoice converts text to speech in two phases using a set of
phonetic translation and pronunciation rules. First, the software
analyzes and translates text into "sound descriptors", a phonetic
language with the pitch, duration, and amplitude codes needed to
produce stress patterns in phrases and sentences. Rules are used
to analyze words, numbers, and punctuation. The second phase
converts the intermediate phonetic language into speech signals;
algorithms shape the distinct speech signals into smooth, flowing,
continuous, clear speech. Real-time synchronization of
mouth movement and word boundaries allows animation of a graphical
talking character, or highlighting of displayed text as it is
spoken.
Necessary tools and examples are provided for programmers to
manipulate the ProVoice speech technology, including installation
instructions, extensive sample programs, and complete
documentation. In addition, sample code is provided on disk to
illustrate speech programming techniques.
* Note 1: First Byte will perform custom work for embedded systems.
* Note 2: ProVoice Windows includes support for the Microsoft SAPI.
It will speak through any Windows-supported wave audio device.
* Note 3: Distribution of ProVoice for commercial use is subject to
execution of a Commercial Product Distribution License Agreement.
* WWW: For more detailed information and examples go to the First
Byte WWW page: http://www.firstbyte.davd.com/
* See also: Monologue for Windows from First Byte
* Price and Availability: Contact First Byte
* Contact: First Byte
19840 Pioneer Ave., Torrance, CA 90503
Ph: 310-793-0610, Fax: 310-793-0611
Email: info@firstbyte.davd.com
WWW: http://www.firstbyte.davd.com/
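The word-boundary synchronization mentioned above (highlighting text or animating a mouth as it is spoken) becomes simple bookkeeping once the first phase has assigned a duration to every phoneme. The Python sketch below assumes, hypothetically, that those durations are available per word in milliseconds; ProVoice's actual synchronization interface is not described in this entry and is not modelled here.

```python
from typing import List, Optional, Tuple

# A word paired with the durations (ms) of its phonemes, as phase one might produce.
Word = Tuple[str, List[int]]
Span = Tuple[str, int, int]


def word_boundaries(words: List[Word]) -> List[Span]:
    """Return (word, start_ms, end_ms) by accumulating phoneme durations."""
    timeline: List[Span] = []
    t = 0
    for text, durations in words:
        start = t
        t += sum(durations)
        timeline.append((text, start, t))
    return timeline


def word_at(timeline: List[Span], now_ms: int) -> Optional[str]:
    """Word to highlight at playback time now_ms, or None after the last word."""
    for text, start, end in timeline:
        if start <= now_ms < end:
            return text
    return None


timeline = word_boundaries([("hello", [60, 70, 80, 200]),
                            ("world", [90, 150, 80, 70, 110])])
print(timeline)                # [('hello', 0, 410), ('world', 410, 910)]
print(word_at(timeline, 500))  # world
```

A display loop would call `word_at` with the current playback position and highlight the returned word.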

RC Systems V8600/V8601 Text to Speech synthesizers

* Platform 1: IBM PC: ISA card.
* Platform 2: Interface to PC/104 standard microcontrollers.
* Platform 3: Standalone (or embedded) hardware through RS232,
parallel printer port or processor bus.
* Description: Converts plain ASCII text to speech. Programmable
voices, pitch, rate, volume, etc. Built-in DTMF and tone
generators.
* Price: $151-$299 US (qty 1)
* Contact: RC Systems
1609 England Avenue, Everett, WA 98203, USA
Ph: (206) 355-3800 Fax: (206) 355-1098
Europe: +44181 539-0285

rsynth

* Platform: Various (including Solaris2.3, SunOS4.1.3, HPUX, SGI
Irix4.x, Linux)
* Description: Public domain text-to-speech system assembled from a
variety of sources. It supports CMU and BEEP format dictionaries
(as described in Q1.10) and now utilises stress marks in the
dictionary when synthesising intonation.
* Price: Free
* Misc: Axel Belinfante has implemented a WWW rsynth demo:
http://wwwtios.cs.utwente.nl/say.
* Availability: by anonymous ftp from
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/rsynth-2.0.tar.Z
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/rsynth-2.0.tar.gz
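In the CMU Pronouncing Dictionary format, each line holds a word followed by its ARPAbet phonemes, and every vowel carries a stress digit (1 primary, 2 secondary, 0 unstressed); alternate pronunciations are written as WORD(2), and comment lines begin with ";;;". The Python sketch below parses such a line and extracts the stress pattern that a synthesizer could use when shaping intonation. How rsynth itself maps stress to pitch is not shown, and the BEEP format is not handled.

```python
import re
from typing import List, Optional, Tuple


def parse_cmudict_line(line: str) -> Optional[Tuple[str, List[str]]]:
    """Parse 'WORD  PH1 PH2 ...' into (word, phones); None for comments or blanks."""
    line = line.strip()
    if not line or line.startswith(";;;"):
        return None
    word, *phones = line.split()
    word = re.sub(r"\(\d+\)$", "", word)   # READ(2) -> READ (alternate pronunciation)
    return word, phones


def stress_pattern(phones: List[str]) -> str:
    """Stress digits on the vowels: 1 primary, 2 secondary, 0 unstressed."""
    return "".join(p[-1] for p in phones if p[-1].isdigit())


word, phones = parse_cmudict_line("PRESIDENT  P R EH1 Z IH0 D EH2 N T")
print(word, stress_pattern(phones))   # PRESIDENT 102
```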

SENSYN speech synthesizer

* Platform: PC/DOS/Windows, Macintosh, Sun, and NeXT
* Rough Cost: $300
* Description: This formant synthesizer produces speech waveform
files based on the (Klatt) KLSYN88 synthesizer. It is intended for
laboratory and research use. Note that this is NOT a
text-to-speech synthesizer, but creates speech sounds based upon a
large number of input variables (formant frequencies, bandwidths,
glottal pulse characteristics, etc.) and would be used as part of
a TTS system. Includes full source code.
* Availability: Sensimetrics Corporation
Sidney Street, Cambridge MA 02139.
Fax: (617) 225-0470; Tel: (617) 225-2442.
Email: sensimetrics@sens.com
WWW: http://www.sens.com/
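The basic building block of a Klatt-style formant synthesizer is a second-order digital resonator. In Klatt's 1980 formulation its output is y[n] = A*x[n] + B*y[n-1] + C*y[n-2], with C = -exp(-2*pi*BW*T), B = 2*exp(-pi*BW*T)*cos(2*pi*F*T) and A = 1 - B - C, where F and BW are the formant frequency and bandwidth and T is the sample period. The Python sketch below drives a cascade of three such resonators with a bare impulse train. It is far simpler than KLSYN88 (no glottal source model, no parallel branch, no time-varying parameters), and the formant values are illustrative only.

```python
import math
from typing import List, Tuple


def resonator(signal: List[float], freq_hz: float, bw_hz: float, fs: int) -> List[float]:
    """Second-order digital resonator in the form described by Klatt (1980)."""
    T = 1.0 / fs
    C = -math.exp(-2 * math.pi * bw_hz * T)
    B = 2 * math.exp(-math.pi * bw_hz * T) * math.cos(2 * math.pi * freq_hz * T)
    A = 1 - B - C                  # normalises the gain at 0 Hz to 1
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = A * x + B * y1 + C * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def vowel(f0: float, formants: List[Tuple[float, float]],
          fs: int = 10000, dur_s: float = 0.05) -> List[float]:
    """Crude cascade formant synthesis: an impulse train through resonators in series."""
    n = int(fs * dur_s)
    period = int(fs / f0)
    source = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq, bw in formants:
        source = resonator(source, freq, bw, fs)
    return source


# Formant frequencies/bandwidths roughly typical of an adult male /a/ (illustrative).
samples = vowel(100, [(730, 90), (1090, 110), (2440, 170)])
```

Choosing A = 1 - B - C gives each resonator unity gain at 0 Hz, so cascading them does not change the overall low-frequency level.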

SGI Developers Toolbox Synthesiser

* Platform: SGI
* Description: The SGI Developer Toolbox 4.0 CDROM contains a
basic public domain text-to-speech program in the publics/speak
directory. The directory includes man pages and source.
* Availability: on the SGI Developer Toolbox 4.0 CDROM

SIMTEL

A wide range of speech-related software, Sound Blaster software and
signal processing software for PCs is available on SimTel and its
mirror sites. It can be obtained by ftp from:

ftp://ftp.coast.net/SimTel/msdos/voice/

and is now on the WWW:

http://www.acs.oakland.edu/oak/SimTel/win3/sound.html

Voicemaker

The archives include the program Voicemaker, which synthesises speech
from phonemes by "concatenation" of phonemes recorded by the user.
Voicemaker is a freeware program. It requires an IBM PC or compatible,
512KB RAM, and a Sound Blaster compatible sound card.

ftp://ftp.coast.net/SimTel/msdos/voice/vm110.zip
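Concatenating user-recorded phonemes can be sketched as looking up one recording per phoneme and joining them end to end. Voicemaker's own joining method is not documented here; the Python sketch below assumes a short linear cross-fade at each joint, a common way to soften the clicks that abrupt joins produce. The recordings dictionary and the overlap length are hypothetical.

```python
from typing import Dict, List


def crossfade_concat(units: List[List[float]], overlap: int) -> List[float]:
    """Join recorded units, linearly cross-fading `overlap` samples at each joint."""
    if not units:
        return []
    out = list(units[0])
    for unit in units[1:]:
        k = min(overlap, len(out), len(unit))
        base = len(out) - k
        for i in range(k):
            w = (i + 1) / (k + 1)   # weight of the incoming unit rises towards 1
            out[base + i] = (1 - w) * out[base + i] + w * unit[i]
        out.extend(unit[k:])
    return out


def synthesize(phonemes: List[str], recordings: Dict[str, List[float]],
               overlap: int = 32) -> List[float]:
    """Look up each phoneme's user recording and join them in order."""
    return crossfade_concat([recordings[p] for p in phonemes], overlap)
```

With an overlap of k samples, each joint shortens the output by k samples, so two recordings of 100 and 200 samples joined with the default overlap give 268 samples.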

Sound Bytes Developer's Kit

* Platform: Subroutine library for Windows, OS/2 and Macintosh
* Hardware: Windows - 16 MHz 80386 (minimum) running Windows 3.1; 4
Mb RAM with at least 1.4 Mb RAM free. Disk space 1.4 Mb.
OS/2 - 16 MHz 80386 (minimum) running OS/2 2.0 or above; 8 Mb RAM
with at least 1.4 Mb RAM free.
Mac - Any Mac with at least 2.5 Mb of RAM running 6.0.4 or higher.
Telephone compatible. Compatible with commonly used sound cards.
* Description: SBDK is a software-only sentence-level synthesizer
that converts unrestricted English text (ASCII) into synthesized
voice through diphone concatenation. SBDK utilizes parsing to
incorporate the intonational and rhythmic patterns of normal
speech. The developer's kit includes two voices, one female and
one male. The product has a 55,000-word built-in dictionary and a
tool for creating customized user dictionaries. It converts
numbers, dates, dollars, phone numbers and times to words, and has
a SoundOut facility that provides a choice of pronouncing unknown
words phonetically or spelling them out. Developers can vary voice
pitch (130-220 Hz) and rate (65-200 wpm), synchronize speech to
other events, have multiple channels of speech to the same or
different boards, etc. Speech sampling options: 8-bit linear;
8-bit companded at 11 kHz (Windows); 8-bit mu-law PCM at 8 or 11
kHz; 16-bit linear at 11 kHz.
* Cost: Sound Bytes may be licensed for internal use or resale. Site
license fee: $3750. Resale or internal runtime fees: 2% of net
sales price per runtime sold, OR $150 per telephone port, OR per-unit
pricing for internal use determined case by case.
* Misc: Demo disks are available for Windows and the Mac.
* Availability: Natural Speech Technologies, Inc.
Ph: (619) 457-2526.
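Converting numbers, dollar amounts, dates and so on into words is usually called text normalization. The Python sketch below covers only cardinal numbers below one million and simple dollar amounts, using US English conventions (no "and" after "hundred"). SBDK's own rules, and its handling of dates, times and phone numbers, are not described in the entry and are not reproduced here.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out 0 <= n < 1,000,000 in US English."""
    if not 0 <= n < 1_000_000:
        raise ValueError("this sketch only handles 0 to 999,999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        head = ONES[hundreds] + " hundred"
    else:
        thousands, rest = divmod(n, 1000)
        head = number_to_words(thousands) + " thousand"
    return head + (" " + number_to_words(rest) if rest else "")


def dollars_to_words(amount: str) -> str:
    """Expand a dollar amount such as '$29.95' or '$3,750'."""
    whole, _, frac = amount.lstrip("$").replace(",", "").partition(".")
    dollars = int(whole)
    words = number_to_words(dollars) + (" dollar" if dollars == 1 else " dollars")
    cents = int(frac.ljust(2, "0")[:2]) if frac else 0
    if cents:
        words += " and " + number_to_words(cents) + (" cent" if cents == 1 else " cents")
    return words


print(dollars_to_words("$29.95"))  # twenty-nine dollars and ninety-five cents
print(number_to_words(3750))       # three thousand seven hundred fifty
```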

spchsyn.exe

* Platform: DOS
* Availability: By anonymous ftp as a self extracting DOS archive.
ftp://evans.ee.adfa.oz.au/mirrors/tibbs/applications/spchsyn.exe
* Requirements: May require special TI product(s), but all of the
source code is included.

"Speak" - a Text to Speech Program

* Platform: Sun SPARC
* Description: Text to speech program based on concatenation of
pre-recorded speech segments. A function library can be used to
integrate speech output into other code.
* Hardware: SPARC audio I/O
* Availability: by anonymous ftp
ftp://wilma.cs.brown.edu/pub/speak.tar.Z

Text to phoneme program (1)

* Platform: unknown
* Description: Text to phoneme program. Based on Naval Research
Lab's set of text to phoneme rules.
* Availability: by anonymous ftp
ftp://shark.cse.fau.edu/pub/src/phon.tar.Z

Text to phoneme program (2)

* Platform: unknown
* Description: Text to phoneme program.
* Availability: by anonymous ftp
ftp://ftp.doc.ic.ac.uk/packages/unix-c/utils/phoneme.c.gz

Text to phoneme program (3)

* Description: A public domain version of the same Naval Research
Lab text to phoneme rules.
* Availability: By anonymous ftp
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/english2phoneme.tar.gz
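Rule-based translators such as the Naval Research Laboratory rules above (or the roughly 700 rules in the Amiga Narrator translator) scan a word from left to right and, at each position, apply the first rule whose letter pattern and left/right contexts match. The Python sketch below shows that matching loop with literal contexts only. The handful of rules is a made-up toy set for illustration, not entries from the NRL rules, which also use context classes such as "one or more vowels".

```python
from typing import Dict, List, Tuple

# (left context, letters matched, right context, output phonemes). Contexts are
# literal strings and " " marks a word boundary. These few rules are a toy set
# made up for illustration, not entries from the NRL rule set.
Rule = Tuple[str, str, str, str]

RULES: Dict[str, List[Rule]] = {
    "a": [("", "a", "t ", "AE"), ("", "a", "", "AH")],
    "c": [("", "ch", "", "CH"), ("", "c", "e", "S"), ("", "c", "i", "S"),
          ("", "c", "", "K")],
    "e": [("", "e", " ", ""), ("", "e", "", "EH")],   # word-final e is silent
    "i": [("", "i", "", "IH")],
    "t": [("", "th", "", "TH"), ("", "t", "", "T")],
    "y": [(" ", "y", "", "Y"), ("", "y", "", "IY")],  # word-initial y vs. elsewhere
}


def translate(word: str) -> List[str]:
    """Apply the first matching rule at each position, scanning left to right."""
    text = " " + word.lower() + " "   # pad so boundary contexts can match
    i = 1
    phonemes: List[str] = []
    while i < len(text) - 1:
        for left, match, right, output in RULES.get(text[i], []):
            if (text.startswith(match, i)
                    and text[:i].endswith(left)
                    and text.startswith(right, i + len(match))):
                phonemes.extend(output.split())
                i += len(match)
                break
        else:
            i += 1   # no rule for this letter: skip it silently in this toy
    return phonemes


print(translate("chat"))  # ['CH', 'AE', 'T']
print(translate("city"))  # ['S', 'IH', 'T', 'IY']
print(translate("yet"))   # ['Y', 'EH', 'T']
```

Rule order matters: more specific rules (such as "ch" before "c") must come first, because the first match wins.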

Tinytalk

* Platform: DOS / Windows (unconfirmed)
* Description: A shareware speech 'screen reader' that is used by
many blind users.
* Price: Tinytalk is now $150. There are package deals on Tinytalk
with various speech synthesizers.
* Availability: Tinytalk is available by anonymous ftp from the
following site (note: it is a busy ftp server):
ftp://ftp.netcom.com/pub/eb/ebohlman/
Files: ttexe167.zip and ttdoc167.zip (executable and
documentation)
* Contact: Eric Bohlman
OMS Development
610-B Forest Ave., Wilmette, IL 60091
Ph: (800)831-0272 Fax: 708-251-5793
Outside North America: (708)-251-5787
Email: ebohlman@netcom.com

TrueTalk

* Platform: Sun Sparcstation 1+/2/LX/5/10/20 with SunOS 4.1.3, or
SGI Indy/Indigo/Indigo2 with IRIX 5.2. More platforms in
development.
* Description: Personal TrueTalk, by Entropic Research Laboratory,
Inc., is an all-software Text-to-Speech (TTS) system designed to
voice-enable UNIX X-Windows workstations. It combines a graphical
interface with a powerful TTS engine based on technology developed
by AT&T Bell Laboratories. Features include:
+ Intelligible, prosodically natural speech.
+ Text taken from file input, highlighted X selections, the
interface scratch pad, other programs connected through a
TCP/IP socket, or Tcl/Tk applications via the Tk "send"
mechanism.
+ Stop, pause and resume while speech is in progress.
+ Visual indication of corresponding text position when paused.
+ Nine speaking voices, with Male and Female versions of each
voice.
+ Adjustable speaking rate and volume.
+ Supports drop-in text filters; "email" and "lively" examples
included.
+ Audio output through workstation headphones or speaker.
+ Complete on-line documentation, including mouse-activated
help windows.
* Misc: A more detailed description of TrueTalk is available on the
Entropic WWW server: http://www.entropic.com/truetalk.com
* Availability: You can obtain Personal TrueTalk through the
Internet. For details, see

ftp://ftp.entropic.com/pub/truetalk/README.ptt

Personal TrueTalk is available free of charge for evaluation
purposes. You can fully enable your evaluation copy at any time by
purchasing a license key from Entropic.
* Requirements: 12MB disk space, 8MB process size (24MB system RAM
recommended).
* Cost: US$495; US$395 academic
* Contact: Entropic Research Laboratory, Inc.,
Washington, D.C.
Voice: 1-800-ENTROPIC (North America), (202) 547 1420
Fax: (202) 547-6648
Email: truetalk@entropic.com
WWW: http://www.entropic.com/

TruVoice from Centigram

* Platform: Windows-NT, Windows 95, Windows 3.1 (limited release),
Sun Solaris 2.x
* Description: TruVoice, an advanced text-to-speech converter, is
available for multiple environments. TruVoice converts text into
spoken language. TruVoice adds intelligible, natural-sounding
speech to sound enabled platforms.
+ Small memory footprint (1.5 MB)
+ Advanced text pre-processing
+ No vocabulary restrictions
+ User-definable pronunciation dictionary
+ Accurately pronounces surnames and place names
+ Preprocessor provides e-mail and spreadsheet reading
capabilities and expands abbreviations.
+ Multiple languages available: American English, Latin
American Spanish, German, French, Italian
+ Flexible pitch, volume and speech rate
+ Intonation support for punctuation
+ Supports navigational capabilities such as pause, resume, and
jump forward / jump back by sentence or word boundaries
More detailed information is provided in the brochure page on the
Centigram WWW site.
A demonstration of TruVoice is available on the Centigram WWW
pages.
* Cost:
+ Windows versions are $495 for the SDK
+ Solaris versions are $995
+ Contact Centigram for other pricing.
* Contact: TruVoice Sales
Centigram Communications Corporation
91 East Tasman Drive, San Jose, CA 95134
Ph: (408) 944-0250 Fax: (408) 428-3732
Demo: 800-746 1632
Email: webmaster@centigram.com
WWW: http://www.centigram.com/

WinSpeech

* Platform: Windows
* Description: WinSpeech is a text-to-speech application that reads
text and produces speech on the audio output. Features include
basic text editing tools, speaking directly from the editing
window, a DDE server that allows other Windows applications to send
text to be spoken, a coach mode that provides audio instructions
throughout the program, and dictionary editing tools for
customizing pronunciation.
WSPLIB text-to-speech DLL is a speech functions library for
developers. More information available by email.
* Requirements: System requirements: IBM PC or compatible computer
with Windows 3.1 or higher. Sound card is recommended but not
required.
* Availability: Freeware available through the PC WholeWare WWW
page.
* Contact: PC WholeWare
33 Justin Street, Lexington, MA 02173, U.S.A.
Email: info@pcww.com
WWW: http://www.pcww.com/index.html

WreadFiles: File reader for Commodore Amiga

* Platform: Commodore Amiga
* Description: WreadFiles is a vocal text file reader program for
use on the Commodore Amiga. The text is printed to the screen and
spoken. Features include:
+ Text is read in sentences rather than lines.
+ Dynamic Speech Correction on over 4000 words or word
fragments.
+ Pronunciations for many place names, personal names, foreign
names, foreign expressions and abbreviations.
+ Run from Workbench or CLI.
+ Used with A1000 (OS 1.3), A3000 (OS 2.04-2.1), and A4000 (OS
3.0)
* Requirements: Standard Amiga Translator.library and
Narrator.device required. 2.04 versions recommended. 1 MB or more
of RAM recommended. External speakers required.
* Availability: No fee requested for non-commercial use. From:
+ GEnie: Page 555,3 File Number 24627
+ Aminet
ftp://ftp.wustl.edu/pub/aminet/util/misc/WreadFiles47.lha
* Contact: Written by Michael L. Barlow
Email: M.Barlow1@GEnie.geis.com or mbarlow@pacific.telebyte.com or
MikeB@cuix.pscu.com
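Reading "in sentences rather than lines" means undoing the hard line breaks of a text file before deciding where each utterance ends. The Python sketch below joins wrapped lines and splits after ".", "!" or "?" followed by whitespace. This is an assumed approach, not WreadFiles' documented method; it will split wrongly after abbreviations such as "Dr.", which a real reader would handle with an abbreviation list.

```python
import re
from typing import List

# Split after ., ! or ? followed by whitespace (abbreviations are not handled).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences(text: str) -> List[str]:
    """Rejoin hard-wrapped lines, then split the text into sentences."""
    joined = " ".join(line.strip() for line in text.splitlines() if line.strip())
    return [s for s in SENTENCE_END.split(joined) if s]


sample = """The text is printed to the
screen and spoken. Reading by sentence
keeps phrasing natural! Does it?"""
print(sentences(sample))
```

This yields three sentences, each of which can be sent to the synthesizer as one utterance so that intonation follows sentence structure rather than line breaks.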

ZMD Speech Synthesis

"Speaky" Speech Synthesis from ZMD

* Platform: DSP solution for platform-independent speech synthesis
implementation
* Description: "Speaky" provides a German speech synthesis system as a
DSP solution. It includes pre-processing of input ASCII text with
unlimited vocabulary, both parametric and non-parametric speech
synthesis algorithms, and prosody modelling. More detailed
information and audio samples can be found at the ZMD WWW Site.
* Contact: Zentrum Mikroelektronik Dresden GmbH
Grenzstrasse 28, D-01109 Dresden, Germany
Ph: +49-351-8822-306, Fax: +49-351-8822-337
Email: assp@zmd-gmbh.de
WWW: http://www.zmd-gmbh.de/

ZMD PCMCIA Speech Synthesis Card

* Platform: MS-DOS, Windows
* Description: Complete text-to-speech synthesis system for the
German language with unlimited vocabulary using VOICE Processor
"Speaky". The required pre-processing of the input ASCII text is
performed by a software program that is downloaded automatically
from the PCMCIA Speech Synthesis Card during the card's
initialising routine. Headphone or active loudspeaker can be
connected directly for signal output. More detailed information
and audio samples can be found at the ZMD WWW Site.
* Requirements: PC Card slot, Card & Socket Services Software
* Contact: Zentrum Mikroelektronik Dresden GmbH
Grenzstrasse 28, D-01109 Dresden, Germany
Ph: +49-351-8822-306, Fax: +49-351-8822-337
Email: assp@zmd-gmbh.de
WWW: http://www.zmd-gmbh.de/

___________________________________________________________________________

Speech Recognition

comp.speech FAQ Section 6

* SpeechLinks: Speech Recognition
* Q6.1: What is speech recognition?
* Q6.2: How is speech recognition performed?
* Q6.3: How can I build a simple speech recogniser?
* Q6.4: References & books on speech recognition
* Q6.5: Speech Recognition Hardware/Software
* Q6.6: Speaker Recognition (Verification and Identification)
* Q6.7: Integrated Speech Products

___________________________________________________________________________

Q6.1: What is speech recognition?

Automatic Speech Recognition

Automatic speech recognition is the process by which a computer maps
an acoustic speech signal to text.

Automatic speech understanding is the process by which a computer maps
an acoustic speech signal to some form of abstract meaning of the
speech.

What does speaker dependent / adaptive / independent mean?

A speaker dependent system is developed to operate for a single
speaker. These systems are usually easier to develop, cheaper to buy
and more accurate, but not as flexible as speaker adaptive or speaker
independent systems.

A speaker independent system is developed to operate for any speaker
of a particular type (e.g. American English). These systems are the
most difficult to develop and the most expensive, and their accuracy
is lower than that of speaker dependent systems. However, they are
more flexible.

A speaker adaptive system is developed to adapt its operation to the
characteristics of new speakers. Its difficulty lies somewhere
between that of speaker independent and speaker dependent systems.

What does small/medium/large/very-large vocabulary mean?

The size of the vocabulary of a speech recognition system affects the
complexity, processing requirements and accuracy of the system.
Some applications only require a few words (e.g. numbers only), others
require very large dictionaries (e.g. dictation machines). There are
no established definitions; however, the following is a rough guide:

* small vocabulary - tens of words
* medium vocabulary - hundreds of words
* large vocabulary - thousands of words
* very-large vocabulary - tens of thousands of words.

What does continuous speech or isolated-word mean?

An isolated-word system operates on single words at a time, requiring
a pause between each word. This is the simplest form of
recognition to perform because the end points are easier to find and
the pronunciation of a word tends not to affect others. Thus, because
the occurrences of words are more consistent, they are easier to
recognise.

A continuous speech system operates on speech in which words are
connected together, i.e. not separated by pauses. Continuous speech is
more difficult to handle because of a variety of effects. First, it is
difficult to find the start and end points of words. Another problem
is "coarticulation": the production of each phoneme is affected by the
production of surrounding phonemes, and similarly the start and
end of words are affected by the preceding and following words. The
recognition of continuous speech is also affected by the rate of
speech (fast speech tends to be harder).

___________________________________________________________________________

Q6.2: How is speech recognition performed?

A wide variety of techniques are used to perform speech recognition.
There are many types of speech recognition, and many levels of
speech recognition, analysis and understanding.

Typically speech recognition starts with the digital sampling of
speech. The next stage is acoustic signal processing. Most techniques
include spectral analysis, e.g. LPC analysis (Linear Predictive
Coding), MFCC (Mel Frequency Cepstral Coefficients), cochlea modelling
and many more.
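As an illustration of one of these analyses, the sketch below computes LPC coefficients for a single frame by the autocorrelation method, solving the normal equations with the Levinson-Durbin recursion. It is a minimal pure-Python sketch, not code from any particular recogniser; the function names are hypothetical, and a real front end would normally also pre-emphasise and window each frame (e.g. with a Hamming window) before this step. MFCC and cochlea modelling are not covered here.

```python
def autocorrelation(frame, order):
    """Return r[0..order], where r[k] = sum over n of frame[n] * frame[n + k]."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Returns (coeffs, err): coeffs[j - 1] is the weight a_j in the predictor
    x[n] ~ a_1 x[n-1] + ... + a_p x[n-p], and err is the residual energy.
    """
    a = [0.0] * (order + 1)          # a[0] unused; a[j] holds a_j
    err = r[0]
    for i in range(1, order + 1):
        # Reflection (PARCOR) coefficient for stage i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k           # prediction error shrinks at each stage
    return a[1:], err


def lpc(frame, order):
    """LPC analysis of one frame by the autocorrelation method."""
    return levinson_durbin(autocorrelation(frame, order), order)


# Usage: the impulse response of the all-pole filter
# x[n] = 1.2 x[n-1] - 0.5 x[n-2] should give back its own coefficients.
x = [1.0, 1.2]
for n in range(2, 200):
    x.append(1.2 * x[n - 1] - 0.5 * x[n - 2])
coeffs, residual = lpc(x, 2)
print([round(c, 3) for c in coeffs])   # [1.2, -0.5]
```

For an all-pole signal like this one, the recursion recovers the generating filter; for real speech the coefficients describe the smoothed spectral envelope of the frame.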

The next stage is recognition of phonemes, groups of phonemes and
words. This stage can be achieved by many processes such as DTW
(Dynamic Time Warping), HMM (hidden Markov modelling), NNs (Neural
Networks), expert systems and combinations of techniques. HMM-based
systems are currently the most commonly used and most successful
approach.
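Of the techniques listed, DTW is the easiest to show in a few lines. The sketch below is the textbook DTW recurrence in Python, not code from any system mentioned in this FAQ. It assumes scalar features with an absolute-difference local cost and the simple symmetric step pattern; real recognisers compare spectral feature vectors (e.g. with a Euclidean distance) and often add slope constraints. A smaller total cost means a better match, so an unknown word can be classified by the stored template with the lowest DTW cost.

```python
def dtw_distance(a, b, local_cost=lambda x, y: abs(x - y)):
    """Total cost of the cheapest monotonic alignment of sequences a and b.

    D[i][j] is the best cost of aligning a[:i] with b[:j]; each step may
    advance in a, in b, or in both (the classic symmetric step pattern).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = local_cost(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # advance in a only
                                 D[i][j - 1],      # advance in b only
                                 D[i - 1][j - 1])  # advance in both
    return D[n][m]


template = [1, 2, 3, 4, 3, 2, 1]
slow = [1, 1, 2, 2, 3, 3, 4, 4, 3, 3, 2, 2, 1, 1]   # same "word", spoken slowly
other = [4, 3, 2, 1, 2, 3, 4]                       # a different "word"
print(dtw_distance(template, slow))                                     # 0.0
print(dtw_distance(template, slow) < dtw_distance(template, other))     # True
```

The time warping absorbs the difference in speaking rate, so the slowed-down copy matches its template perfectly even though the two sequences have different lengths.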

Most systems utilise some knowledge of the language to aid the
recognition process.
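One common form of this knowledge is a statistical language model that scores how likely a word sequence is, so that acoustically similar hypotheses can be ranked. The sketch below is a minimal word-bigram model with add-one (Laplace) smoothing, written in Python for illustration only. The toy training sentences and the two competing hypotheses are invented, and real systems use far larger corpora, trigrams and better smoothing.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count contexts and word pairs, padding each sentence with <s> and </s>."""
    history, pairs = Counter(), Counter()
    vocab = {"</s>"}
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        history.update(tokens[:-1])                 # counts of each context word
        pairs.update(zip(tokens[:-1], tokens[1:]))  # counts of each word pair
        vocab.update(words)
    return history, pairs, len(vocab)


def log_prob(words, model):
    """Log probability of a word sequence under the add-one smoothed bigram model."""
    history, pairs, vocab_size = model
    tokens = ["<s>"] + words + ["</s>"]
    return sum(math.log((pairs[(prev, cur)] + 1) / (history[prev] + vocab_size))
               for prev, cur in zip(tokens[:-1], tokens[1:]))


model = train_bigram([["recognise", "speech"],
                      ["i", "recognise", "speech"],
                      ["speech", "recognition", "is", "hard"]])
# Two acoustically similar hypotheses; the language model prefers the first.
print(log_prob(["recognise", "speech"], model) >
      log_prob(["wreck", "a", "nice", "beach"], model))   # True
```

In a recogniser this score would be combined with the acoustic score of each hypothesis rather than used on its own.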

Some systems try to "understand" speech. That is, they try to convert
the words into a representation of what the speaker intended to mean
or achieve by what they said.

___________________________________________________________________________

Q6.3: How can I build a simple speech recogniser?

QUICKY RECOGNIZER sketch:

Doug Danforth provides a detailed account in article 253 in the
comp.speech archives. A summary is provided below. It is also
available by anonymous ftp:

ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/DIY_SpeechRecognition

This is a simple recognizer that should give you 85%+ recognition
accuracy. The accuracy is a function of the words you have in your
vocabulary: long, distinct words are easy; short, similar words are
hard. You can get 98%+ on the digits with this recognizer.

Overview:

* Find the beginning and end of the utterance.
* Filter the raw signal into frequency bands.
* Cut the utterance into a fixed number of segments.
* Average data for each band in each segment.
* Store this pattern with its name.
* Collect a training set of about 3 repetitions of each pattern
(word).
* Recognize an unknown utterance by comparing its pattern against all
patterns in the training set and returning the name of the pattern
closest to the unknown.

Many variations upon the theme can be made to improve the performance.
Try different filtering of the raw signal and different processing
methods.
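The overview above can be sketched in pure Python as follows. This is not Doug Danforth's code (see his article for that); it is a minimal illustration under several assumptions: an 8 kHz sample rate, five hypothetical band centre frequencies, second-order IIR band-pass filters (the RBJ "cookbook" design), eight segments per utterance, log average band power as the stored pattern, and Euclidean distance (math.dist, Python 3.8+) for the comparison. The endpointer simply keeps everything between the first and last frames whose energy is within a fixed ratio of the loudest frame, and synthetic tone bursts stand in for recorded words.

```python
import math

FS = 8000                                    # assumed sample rate (Hz)
BAND_CENTRES = [250, 500, 1000, 2000, 3000]  # hypothetical band centres (Hz)
N_SEGMENTS = 8                               # fixed number of segments


def endpoint(x, frame=80, ratio=0.05):
    """Keep the span from the first to the last frame whose energy is at
    least `ratio` times that of the loudest frame (a crude endpointer)."""
    energy = [sum(s * s for s in x[i:i + frame]) for i in range(0, len(x), frame)]
    peak = max(energy)
    loud = [k for k, e in enumerate(energy) if e >= ratio * peak]
    return x[loud[0] * frame:(loud[-1] + 1) * frame]


def bandpass(x, f0, q=2.0):
    """Second-order IIR band-pass filter centred on f0 (0 dB peak gain)."""
    w0 = 2 * math.pi * f0 / FS
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for s in x:
        out = b0 * s + b2 * x2 - a1 * y1 - a2 * y2
        y.append(out)
        x2, x1 = x1, s
        y2, y1 = y1, out
    return y


def pattern(x):
    """Endpoint, filter into bands, cut into segments, average each band."""
    x = endpoint(x)
    seg = len(x) / N_SEGMENTS
    feats = []
    for f0 in BAND_CENTRES:
        band = bandpass(x, f0)
        for k in range(N_SEGMENTS):
            chunk = band[int(k * seg):int((k + 1) * seg)] or [0.0]
            mean_power = sum(v * v for v in chunk) / len(chunk)
            feats.append(math.log10(mean_power + 1e-10))
    return feats


def recognise(unknown, training):
    """Return the name of the stored (name, pattern) pair closest to unknown."""
    u = pattern(unknown)
    return min(training, key=lambda item: math.dist(u, item[1]))[0]


def tone(freq, amp, secs=0.25):
    """Synthetic 'word': a sine burst with 50 ms of silence either side."""
    pad = [0.0] * 400
    return pad + [amp * math.sin(2 * math.pi * freq * n / FS)
                  for n in range(int(FS * secs))] + pad


# Three training repetitions of each "word" at different loudness levels.
training = [(name, pattern(tone(freq, amp)))
            for name, freq in [("low", 300), ("high", 2000)]
            for amp in (0.3, 0.5, 0.8)]
print(recognise(tone(320, 0.6), training))    # low
print(recognise(tone(1900, 0.4), training))   # high
```

Using log band power makes a change in loudness shift every feature by the same amount, which keeps it small compared with the differences between words that have different spectral shapes.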

Public Domain Recognition Software

Q6.5 contains information on public domain speech recognition software,
including the Lotec package and Myers' Hidden Markov Model software.

Discrete Hidden Markov Model Demonstration Software

Hidden Markov Models (HMMs) are widely used in speech recognition
systems. Joe Picone has put together some demonstration software for
basic discrete HMMs including Viterbi and Baum-Welch training and
evaluation, random sequence generation (generating data from a model),
and model updating (useful for incremental training). There is a
simple demo program that supports all of these modes from command line
arguments. This allows experiments to test the classic coin-toss
examples commonly described in textbooks. The code closely parallels
the following textbook:

* J.R. Deller, Jr., J.G. Proakis, and J.H.L. Hansen, Discrete-Time
Processing of Speech Signals, MacMillan, 1993, ISBN:
0-02-328301-7.

The code is written in C++ and is intended to facilitate learning and
understanding of the algorithms. The code is available on the ISIP web
site:
http://www.isip.msstate.edu/software/

Lecture notes corresponding to the examples are also available:
http://www.isip.msstate.edu/publications/1996/speech_recognition_short_course
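For readers who want to see the algorithms in miniature, the sketch below shows the forward algorithm (evaluation) and the Viterbi algorithm (decoding) for a discrete HMM, using a two-coin version of the coin-toss example. It is an independent Python illustration, not Joe Picone's C++ code, and the model parameters (a fair coin, a hypothetical coin biased towards heads, and "sticky" transitions) are made up. Viterbi works in the log domain to avoid underflow; the plain forward recursion here would need scaling for long sequences.

```python
import math

# Hypothetical two-coin model: state 0 tosses a fair coin, state 1 a coin
# biased towards heads; the hidden state rarely changes between tosses.
START = [0.5, 0.5]
TRANS = [[0.9, 0.1],
         [0.1, 0.9]]
EMIT = [{"H": 0.5, "T": 0.5},
        {"H": 0.9, "T": 0.1}]
STATES = range(len(START))


def forward(obs):
    """P(obs | model), summed over all hidden state paths."""
    alpha = [START[s] * EMIT[s][obs[0]] for s in STATES]
    for o in obs[1:]:
        alpha = [sum(alpha[p] * TRANS[p][s] for p in STATES) * EMIT[s][o]
                 for s in STATES]
    return sum(alpha)


def viterbi(obs):
    """Most likely hidden state path and its log probability."""
    score = [math.log(START[s] * EMIT[s][obs[0]]) for s in STATES]
    backptrs = []
    for o in obs[1:]:
        # Best predecessor of each state, using the previous step's scores.
        best_prev = [max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
                     for s in STATES]
        score = [score[best_prev[s]] + math.log(TRANS[best_prev[s]][s] * EMIT[s][o])
                 for s in STATES]
        backptrs.append(best_prev)
    state = max(STATES, key=lambda s: score[s])
    best_score = score[state]
    path = [state]
    for ptrs in reversed(backptrs):       # trace back from the final state
        state = ptrs[state]
        path.append(state)
    return path[::-1], best_score


print(round(forward("H"), 10))     # 0.7  (= 0.5 * 0.5 + 0.5 * 0.9)
print(viterbi("TTTT")[0])          # [0, 0, 0, 0]
print(viterbi("HHHHHHHHHH")[0])    # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```

The forward probability sums over every state path, so it is always at least as large as the probability of the single best path found by Viterbi.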

___________________________________________________________________________

Q6.4: References & books on speech recognition

* Product Reviews and Comparisons
* Using Speech Recognition: Health Issues
* On the WWW
* Technology: General and Introductory
* Technical
* Course Notes
* Bibliographies and Reference Lists

Product Reviews and Comparisons

* "Talk Show", Wayne Rash Jr., PC Magazine (USA), Dec 20, 1994.
* "Seybold Report on Desktop Publishing" published a nine-page,
head-to-head comparison of Dragon's DOS software with IBM's OS/2
software. March 7, 1994; Volume 8, Number 7; Pages 3-11;
ISSN:0889-9762; Seybold Publications, P.O. Box 644, Media, PA
19063 USA, phone (610) 565-2480.
* McGraw-Hill Inc.'s "BYTE, the Magazine of Technology Integration,"
published a two-page review of IBM's Personal Dictation System
software. May 1994; Volume ?, Number ?; Pages 145-146;
ISSN:0360-5280; Editorial, Executive, and Circulation address: One
Phoenix Mill Lane, Peterborough, NH 03458 USA, phone ?

Using Speech Recognition: Health Issues

* The National Center for Voice and Speech provides some basic
information on preserving "Vocal Health" on their WWW site:
http://www.shc.uiowa.edu/hygiene/home.html

* Voice Users Mailing List: details in Q1.4 of the FAQ.
* Typing Injury FAQ: http://www.cs.princeton.edu:80/~dwallach/tifaq/
has a range of information on typing injuries, avoiding them,
alternatives and more.
* Typing Injuries Page:
http://alumni.caltech.edu/~dank/typing-archive.html has links to
dozens of useful resources.
* Voice Problems -- Prevention and Correction: advice on preserving
your voice with specific hints for using speech recognition.
ftp://ftp.csua.berkeley.edu/pub/typing-injury/voice-problems
* "Talking to a PC May Be Hazard To Your Throat", by Julie Chao in
the Wall Street Journal.
* "Talking to Computers Has its Hazards", by Gordon Arnaut in The
Globe and Mail.

On the WWW

* Survey of the State of the Art in Human Language Technology:
Report edited by Ronald A. Cole et al. with a section on Spoken
Input Technologies.
http://www.cse.ogi.edu/CSLU/HLTsurvey/ch1node2.html

Technology: General and Introductory

Some general introduction books on speech recognition technology:

* Fundamentals of Speech Recognition; Lawrence Rabiner & Biing-Hwang
Juang; Englewood Cliffs, NJ: PTR Prentice Hall (Signal Processing
Series), c1993, ISBN 0-13-015157-2
* Speech recognition by machine; W.A. Ainsworth; London: Peregrinus
for the Institution of Electrical Engineers, c1988
* Speech synthesis and recognition; J.N. Holmes; Wokingham: Van
Nostrand Reinhold, c1988
* Speech Communication: Human and Machine, Douglas O'Shaughnessy;
Addison Wesley series in Electrical Engineering: Digital Signal
Processing, 1987.
* Electronic speech recognition: techniques, technology and
applications, edited by Geoff Bristow, London: Collins, 1986
* Readings in Speech Recognition; edited by Alex Waibel & Kai-Fu
Lee. San Mateo: Morgan Kaufmann, c1990

Technical

* Hidden Markov models for speech recognition; X.D. Huang, Y. Ariki,
M.A. Jack. Edinburgh: Edinburgh University Press, c1990
* Speech Recognition: The Complete Practical Reference Guide; T.
Schalk, P. J. Foster: Telecom Library Inc, New York; ISBN
O-9366648-39-2; 377 pages; paperback only. Covers speech
recognition in a telephony environment, for readers who wish to use
PC-based call processing hardware. Dialogic hardware is used as the
example hardware throughout.
* Automatic speech recognition: the development of the SPHINX
system; by Kai-Fu Lee; Boston; London: Kluwer Academic, c1989
* An Introduction to the Application of the Theory of Probabilistic
Functions of a Markov Process to Automatic Speech Recognition, S.
E. Levinson, L. R. Rabiner and M. M. Sondhi; in Bell Syst. Tech.
Jnl. v62(4), pp1035--1074, April 1983
* Review of Neural Networks for Speech Recognition, R. P. Lippmann;
in Neural Computation, v1(1), pp 1-38, 1989.
* Automatic Speech and Speaker Recognition: Advanced Topics, C.H.
Lee, F.K. Soong and K.K. Paliwal (Eds.), Kluwer, Boston, 1996.

Course Notes

* Joseph Picone of the Institute for Signal and Information
Processing (ISIP) at Mississippi State University has put the
course notes for "Fundamentals of Speech Recognition" on the WWW.
The course covers background probability and phonetics/acoustics,
speech signal analysis, dynamic programming, dynamic time warping,
hidden Markov modelling, language modelling, neural networks, etc.
The WWW site provides the syllabus and lecture notes.
WWW: http://www.isip.msstate.edu/publications/1996/ee_8993/

Bibliographies and Reference Lists

* WWW searchable online bibliography for Phonetics and Speech
Technology with more than 8000 entries. Provided by Institut fur
Phonetik at Johann Wolfgang Goethe-Universitat Frankfurt.
http://www.uni-frankfurt.de/~ifb/bib_engl.html
* Computational Speech Processing: Speech Analysis, Recognition,
Understanding, Compression, Transmission, Coding, Synthesis ; Text
to Speech Systems, Speech to Tactile Displays, Speaker
Identification, Prosody Processing : BIBLIOGRAPHY, by Conrad F.
Sabourin, 1994, 2 volumes, 1187p, ISBN 2-921173-21-2, INFOLINGUA
inc., P.O. Box 187 Snowdon, Montreal, H3X 3T4, Canada.
See also: http://gomer.mlink.net/infolingua.html

___________________________________________________________________________

Q6.5: Speech Recognition Hardware and Software

The number of speech recognition packages, and the information about
them, is changing rapidly. Any help with keeping this
information up to date will be appreciated.

* Products in the FAQ
* Speech Recognition Processors (ICs)
* Recognition Information on the WWW
* Speech Recognition Resellers and Value-Added Services

In the FAQ:

The following speech recognition software/hardware is described in the
comp.speech FAQ.

_Apple Macintosh_
* Digital Dreams Speech Recognition Plug-Ins
* Dragon Dictation Products
* Macintosh Speech Recognition Manager
* PowerSecretary

_Windows (including 95, NT, 3.1)_
* AT&T Watson Speech Recognition
* Cambridge Voice for Windows
* CustomVoice and CustomTelephone: A&G Graphics Interface Inc.
* DragonDictate for Windows
* Dragon Dictation Products
* Dragon Developer Tools
* Ficomp Interpreter 6000
* IBM VoiceType Dictation and Control
* IN CUBE
* Kurzweil Speech Recognition (2 products)
* Lernout & Hauspie ASR SDK
* Listen for Windows 2.0 from Verbex Voice Systems
* Microsoft Speech Recognition
* NCC Dictate
* Phonetic Engine 500 (PE500) from Speech Systems, Inc.
* Philips Speech Recognition (2 products)
* ProNotes Voice Tools
* PureSpeech
* smARTspeak from Advanced Recognition Technologies, Inc.
* Visual Voice from Stylus Innovation
* VoiceAssist for Windows from Creative Labs, Inc.
* VoiceServer for Windows
* Whisper
* WildCard Speech Products

_DOS_
* DATAVOX - French
* Dragon Developer Tools
* Ficomp Interpreter 6000
* Jialong He's Speech Recognition Research Tool
* smARTspeak from Advanced Recognition Technologies, Inc.
* Votan VPC2100 Voice Card and VSP 1010 Speech Processor

_OS/2_
* IBM VoiceType Dictation and Control

_Unix_
* AbbotDemo
* BBN Hark Telephony Recognizer
* EARS: Single Word Recognition Package
* Ficomp Interpreter 6000
* Hidden Markov Model Toolkit (HTK) from Entropic
* IN CUBE
* Jialong He's Speech Recognition Research Tool
* Lotec Speech Recognition Package
* Myers' Hidden Markov Model software
* NICO Artificial Neural Network Toolkit
* Nuance Speech Recognition System
* PureSpeech
* recnet

_Integrated Circuits and Dedicated Hardware_
* HM2007 - Speech Recognition Chip
* OKI VRP6679 - Speech Recognition Chip
* Sensory Inc. Integrated Circuits
* Speech Commander - Verbex Voice Systems
* Voice Control Systems Recognition
* VCS 2030 & 2060 Voice Dialer

_Other Platforms_
* Simon Says (NeXT)
* Voice Command Line Interface (Amiga)
* Visus SpeechKit

_Unknown_
* Berkeley Restaurant Project (BeRP)
* Lernout & Hauspie ASR (3 products)
* Voice-Trek 2.0
* Voicetek Corp.
* Voice Processing Corporation Speech Recognition Product Line

Speech Recognition Processors (ICs)

Jean-Pierre Lereboullet has put together a detailed list of Voice
Recognition Processors which covers about 15 ICs and pieces of related
hardware (including D6106, HM2007, MSM6679, RSC-164, TC8860F/64F/65F,
5A128).
The document is available on the comp.speech ftp server:
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/VoiceRecognitionProcessors

Recognition Information on the WWW

In addition to the entries on speech recognition in this FAQ, the
following WWW sites provide information on speech recognition:

Commercial Speech Recognition: Russ Wilcox of PureSpeech Inc.
http://www.tiac.net/users/rwilcox/speech.html

Macintosh Speech Resources and Apps
http://www.cs.cmu.edu/~lenzo/mac_speech_apps.html

Speech Recognition Information: 21st Century Eloquence
http://www.voicerecognition.com/

Applied Speech Technology Laboratory of CSLI at Stanford
http://csli-www.stanford.edu/users/bscott/SRTech.html

Speech Toys Speech Recognition Page
http://www.speechtoys.com/spchtoys/sprec.html

Speech recognition product lists: postings to comp.speech
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/SpeechRecognitionProducts

Search Alta Vista for Speech Recognition

Search Lycos for Speech Recognition

Yahoo pages on Speech Recognition
http://www.yahoo.com/business/corporations/computers/software/voice_recognition/
http://www.yahoo.com/Science/Computer_Science/Artificial_Intelligence/Natural_Language_Processing/Speech_Recognition/

Speech Recognition Resellers and Value-Added Services

1stVoice
2470 El Camino Real, Suite 110, Palo Alto CA 94306-1701
Ph: 415-857-1320, Fax: 415-856-6996
WWW: http://www.1stvoice.com/
Email: mail@1stvoice.com
Dragon Dictation Products

21st Century Eloquence
325-A Royal Poinciana Plaza, Palm Beach, Florida 33480, USA
Ph: 800-245-2133, Fax: 407-835-4901
WWW: http://www.voicerecognition.com/
Kurzweil, IBM VoiceType, Dragon, Kolvox

Auscript (Australia)
Suite 2, Level 3, 60-70 Elizabeth St, Sydney, NSW 2000,
Australia
Ph: +61-2-238 6565, Fax: +61-2-238 6566
WWW: http://www.auscript.com.au/
Dragon Systems

BRITE
WWW: http://www.brite.com/
Computer Telephony Integration & Interactive Voice Response

DAX Systems, Inc.
30 Chapin Road, Unit 1201, P.O. Box 778, Pine Brook, NJ 07058, USA
Ph: +1-201-227-8111, Fax: +1-201-227-8197
Email: info@daxsystems.com
WWW: http://www.daxsystems.com/
Computer Telephony and Integrated Voice Response

HealthCare Resources
1444 Aviation Blvd, #103, Redondo Beach, CA 90278, USA
Ph: +1-310-937-5156, Fax: +1-310-937-5159
EMail: Scalif@AOL.COM
Power Secretary & Dragon Dictate. Specializing in:
Medical/Dental, Motion Picture Industry, Carpal Tunnel related
and Disabled Persons.

O'Brien Resources
Ph: (540) 347-4988 (Address unknown)
Email: obrien@crosslink.net
WWW: http://www.crosslink.net/~obrien/
Kurzweil Voice Recognition Products

SCI VoiceAutomated
215 1/2 Main Street, Huntington Beach, CA 92648, USA
Ph: 800-597-6600, Ph: +1-714-969-7632, Fax: +1-714-969-0122
http://www.voiceautomated.com/
IBM VoiceType, Kurzweil Voice, DragonDictate and Philips
speech.

Synapse
3095 Kerner Blvd., Suite S, San Rafael, CA 94901, USA
Ph: (415) 455-9700, Fax: (415) 455-9801
Email: SYNAPSE_ADAPTIVE@msn.com
WWW: http://www.synapseadaptive.com/
Dragon Systems, Kurzweil and IBM products.

Talk Technology
Ph: 1-800-270-1672, Fax: 1-516-360-1213
Email: info@talktechnology.com
http://www.talktechnology.com/

Talk Technology, Inc.
Tel: +1-718-745-9199, Fax: +1-718-499-6480
Email: mnm@pipeline.com
WWW: http://www.usbusiness.com/talk/
Dragon Dictate and portable (notebook) solutions

ToppCopy Telecom
Email: ffalzett@toppcopy.com
WWW: http://www.toppcopy.com/
Philips Digital Dictation

VoiceWare Systems
230 California Street, Suite 410, San Francisco, CA 94111
Ph: (415) 433-2001, Fax: (415) 433-6909
Email: info@talk2type.com
WWW: http://www.talk2type.com/home.htm
IBM, Dragon Systems, Kurzweil Applied Intelligence, WildCard
Technologies

WorkLink
A.D.A. Solutions by WorkLink
2566-A Telegraph Avenue, Berkeley, California 94704 USA
Ph: 510-848-8363, Fax:510-848-7322
WWW: http://www.worklink.net/
Email: wayne@worklink.net
Dragon Dictation Products

AbbotDemo

* Platform: SunOS4, IRIX, Linux, HP-UX
* Description: Large vocabulary, speaker independent, continuous
automatic speech recognition system. Uses recurrent neural
networks and hidden Markov models with a 5,000 word vocabulary
(upgradable) and a trigram word grammar. Includes a front end for
waveform capture and display (including spectrogram) and a
graphical display of the phoneme representation as well as a
rewriting display of the best guess word sequence.
* Requirements: UN*X, X, 8 Mbyte free RAM, 486DX or faster
processor, 16 bit soundcard, reasonable quality microphone and a
copy of the Wall Street Journal newspaper.
* Price: Free for non-commercial use
* Availability: By anonymous ftp from
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/AbbotDemo
* Note 1: This is not a complete system for dictation.
* Note 2: At present there are no sources with this distribution.
For sources for an earlier version see the recnet entry.
* Note 3: Not supported.
* Contact: AbbotDemo@compute.demon.co.uk
Tony Robinson
Cambridge University Engineering Department
Trumpington Street, Cambridge, CB2 1PZ, UK
Tel: +44-1223-332815 Fax: +44-1223-332662

AT&T Watson Speech Recognition

* Platform: Windows 95/NT on a Pentium 75 MHz or higher
* Description: Watson is a software implementation of AT&T Bell
Laboratories voice processing technology. Watson includes BLASR
Speech Recognition and FlexTalk speech synthesis (see Q5.5). It
requires no special hardware to run other than a standard sound
card and/or phone card. Technical details for BLASR Speech
Recognition include:
+ Compliant with Microsoft Speech API and Telephone API
+ Speaker independent, continuous speech recognition
+ Fast, run-time vocabulary change
+ Open mic and telephone line environments
+ SoundBlaster compatible sound card and drivers required
+ Subword models and whole-word digit models
+ Background, silence, and filler/garbage models
+ 50 word name vocabulary or 100 word phrase real-time
recognition with 95% accuracy
+ Rejection of out-of-vocabulary words
+ American English only - other languages in development
+ Barge-in speech begin/end notification - requires hardware
echo cancellation
The AT&T Advanced Speech Products Group home page provides more
detailed information including a Frequently Asked Questions list,
information for application developers on the Independent Software
Vendor (ISV) Program (including info on the SDK, licensing, and
the training program).
* Requirements: Uses 2 MB RAM, 10 MB Disk. Requires a Pentium 75 MHz
or higher CPU (uses
* Cost and Availability: WATSON is a software-based speech platform
with a Software Developers Kit (SDK) that allows application
developers to use voice processing in their applications. It is
not available as a stand-alone product.
Licensing information (inc. price) is provided in the AT&T
Advanced Speech Products Group home page
* See also: Watson FlexTalk speech synthesis in Q5.5, Microsoft
Speech API, and Advanced Speech API.
* Contact: AT&T Advanced Speech Products Group
Suite 700, 44 East Mifflin Street, Madison, WI 53703, USA
Ph: 1-800-5-WATSON, Fax: 1-608-259-2269
Email: aspg@attmail.com
WWW: http://www.att.com/aspg/

BBN Hark Telephony Recognizer

* Platform: Available for Unix-based workstation and PC platforms
including IBM RS6000/AIX and Pentium/SCO Unix.
* Description: Large vocabulary (2,000+ words), speaker independent,
continuous ASR software. Specifically designed for large scale
telephony applications. Using a client/server architecture, all
features and capabilities are integrated in one software product
instead of on separate boards. Very memory efficient, the Hark
Telephony Recognizer runs in as little as 2MB of physical memory.
Multiple recognizers can be run on a single platform. Uses Hidden
Markov Model and phoneme-based BBN recognition algorithms. An API
is provided for integration with existing applications. A
developer's toolkit is available.
* Price and availability: Price varies depending on vocabulary size.
Version 3.0 available immediately.
* Misc: BBN Hark provides application design and human factors
consulting services. Regular monthly training classes on
developing speech-enabled applications are held at BBN Hark's
Cambridge (Mass) headquarters.
* WWW: For additional information see BBN Hark's home page.
* Contact: BBN Hark Systems
70 Fawcett Street, Cambridge, MA 02138, USA
Tel: 617-873-4636 Fax: 617-873-2473
WWW: http://www.bbn.com/bbn_hark/HarkHome.html

Berkeley Restaurant Project (BeRP)

* Description: BeRP is a test bed for a speech recognition system
being developed by the International Computer Science Institute in
Berkeley, CA. BeRP is a medium-vocabulary, speaker-independent
spontaneous continuous speech understanding system. BeRP functions
as a knowledge consultant whose domain is the restaurants in the
city of Berkeley. The system serves as a testbed for several
research projects, including robust feature extraction,
connectionist phonetic likelihood estimation, automatic induction
of multiple pronunciation lexicons, foreign accent detection and
modeling, advanced language models, and lip-reading.
* Note: As far as I know the BeRP software is in-house software -
that is, it is not made available for distribution.
* More information: http://www.icsi.berkeley.edu/real/berp.html

Cambridge Voice for Windows

* Platform: Windows
* Description: Speaker-independent recognition of continuous speech
in real time. Vocabularies can range from small to very large
(more than 60,000 word forms). Support is planned for languages
including English, Danish, Dutch, French, German, Italian,
Norwegian, Spanish, Swedish, and Japanese. The engine complies
with the Microsoft Speech API.
* Contact: Cambridge Group Research, Ltd.
Box 7290, Buffalo Grove, IL 60089
Ph: (708) 821-1040, Fax: (708) 821-1041
E-mail: 76061.3350@compuserve.com

CustomVoice and CustomTelephone: A&G Graphics Interface Inc.

* Platform: Windows
* CustomVoice: Speech recognition custom control for Visual Basic,
Visual C++, Borland C++, and other development platforms that
support *.VBX. Provides an engine/proprietary independent
development platform for speech recognition. Currently supports
ICSS, but should soon support other platforms. Includes a grammar
debugger and parser APIs to parse spoken speech into useful data
types.
Requirements: 486/DX or better PC, Windows 3.1 or Windows for
Workgroups, 8Mb RAM (minimum), SoundBlaster 16, microphone, and
mouse. Supports Visual Basic, Visual C++, Borland C++, and Delphi.
* CustomTelephone: Windows-based developers tool that allows
programmers to build speech enabled "telephony" applications via
standard custom control properties (VBX). It supports IBM
VoiceType Application Factory (VTAF), a continuous speech, speaker
independent speech recognizer, and supports voice response boards
such as Dialogic. Comes with a VB custom control, pre-built
grammar sets for common data types, an interactive grammar
debugger to identify valid speech patterns, and parser API
functions that convert recognized speech into data types supported
by VB, C++ and Delphi. Includes sample applications with source
code, and VBX, VCL and DLLs. Bundled with speech recognition
engines.
Requirements: 486/DX or better, Windows 3.1 or Windows for
Workgroups, 8Mb RAM (minimum), SoundBlaster or compatible sound
card, Dialogic D2X or D4X board, and mouse. Microphone and speaker
optional. Supports Visual Basic, Visual C++, Borland C++, and
Delphi.
* Contact: A&G Graphics Interface
51 Gore Street, Cambridge, MA 02141-1213, USA
Ph: +1-617-492-0120, Fax: +1-617-427-2133
Email: customvc@world.std.com
CompuServe: 74774,273 CompuServe (GO SPEECH)
WWW: http://www.customvoice.com/

DATAVOX - French

* Platform: PC / DOS
* Description: Continuous speech - speaker independent or dependent.
* Requirements: 2 PC format boards (RdF1000 and TdS 96/25) and an
A/D - D/A module (ASA116)
* Misc: Application software may communicate with DATAVOX through 2
types of interfaces:
+ Keyboard overlay: The application software may be used with
any PC compatible package. No specific adaptation is
necessary, you only need to define your configuration with
the application software.
+ C library: Allows a user-written program to drive the
recognition system.
DATAVOX is based on the AMADEUS speech recognition software
developed at LIMSI. It provides
+ Continuous speech recognition with 500 words speaker
dependent, 50 words speaker independent (custom-made
vocabulary).
+ Grammar of the application language (syntax acquisition,
verification and simplification software).
+ Large vocabulary: DATAVOX can recognize vocabularies of
several thousand words as long as there are no more than 500
words in the active vocabulary at any given node. It takes
less than 1 second to change syntax and vocabulary.
+ Training controlled by the system (use of co-articulation
models).
+ Response time less than 500 ms for any phrase length.
+ Synthesis (ADPCM) can be heard simultaneously while
recognition is being carried out.

* Contact: VECSYS
Le Chene rond, 91570 Bievres, France
Voice: 33 1 69 41 15 04, Fax: 33 1 69 41 24 30

Digital Dreams Speech Recognition Plug-Ins

* Platform: Apple Macintosh
* Description (General): A suite of speech plug-ins for the
interactive multimedia market which enable developers to quickly
incorporate speech recognition into their titles without having to
resort to a low-level programming language, such as C. Speech
plug-ins bridge the gap between a speech recognition API, such as
Apple's PlainTalk Speech Recognition technology, and
authoring/development environments, such as Macromedia Director or
HyperCard. Digital Dreams currently offers Macintosh speech
plug-ins for Macromedia Director and HyperCard. Support for other
environments, including AppleScript, Apple Media Tool, Authorware,
and Windows is being developed. Currently available for North
American Adult English. More information is available on the
Digital Dreams WWW site.
* ShockTalk: is a combination of Netscape, ShockWave and Speech
Recognition technologies for the Power Macintosh and Quadra AVs
that enables you to navigate web sites and hyperlinks using spoken
commands as well as create shockwave movies that respond to spoken
user interactions.
* Requirements: Power Macintosh (PowerPC w/ MacOS)
Microphone (PlainTalk compatible)
PlainTalk Speech Synthesis and PlainTalk Speech Recognition
Netscape Navigator
* Contact: Digital Dreams
4308 Harbord Drive, Oakland, CA, 94618, USA
Tel: (510) 547-6929 Fax: (510) 547-6799
email: dreams@surftalk.com
WWW: http://www.surftalk.com/
FTP: ftp://ftp.surftalk.com/

DragonDictate for Windows

* Platform: Windows
* Description: Information moved to the page on Dragon Dictation
products including DragonDictate for Windows

Dragon Dictation Products

* Dragon NaturallySpeaking
* DragonDictate for Windows
* Dragon PowerSecretary
* General Information

Dragon NaturallySpeaking

* Platform: Windows
* Description: General purpose, continuous speech dictation system.
Personal Edition has a 30,000 word active vocabulary and comes
with a 200,000+ word pronunciation dictionary; users can also add
their own words or phrases.
More information on Dragon's NaturallySpeaking web site.
* Requirements: 133 MHz Pentium, 32 MB RAM (Windows 95) or 48 MB RAM
(Windows NT 4.0), supported sound card.
* Price: see Dragon's NaturallySpeaking web site.
* Related products: see general information below
* Contact: see general information below

DragonDictate for Windows

* Platform: Windows
* Description: Speech-to-text dictation system. Discrete dictation;
continuous command/control; speaker-adaptive. Also provides mouse
movement for hands-free operation of Windows. Comes with a 120,000
word pronunciation dictionary; users can also add their own words
or phrases. Dictate directly into any application. Available in US
and UK English, French, Italian, German, Spanish, and Swedish.
Add-on vocabularies for medicine, law, business and finance,
computers and technology, journalism.
Available as DragonDictate Singles Editions (10,000 words active),
DragonDictate Personal Edition (10,000 words active),
DragonDictate Classic Edition (30,000 words active), DragonDictate
Power Edition (60,000 words active).
Includes Office97 support.
More information on the Dragon Systems web site.
* Requirements: 486/66, 7-10 MB dedicated RAM (depending on
edition), Windows 3.1x, NT 3.51, or 95.
Supported sound boards: Creative Labs Sound Blaster 16, Microsoft
Windows Sound System, IBM M-Audio Capture/Playback Adapter, many
notebooks with built-in audio.
See Dragon Systems Compatibility list for details.
* Price: Check at the Dragon Systems web site.
* Related products: see general information below
* Contact: see general information below

Dragon PowerSecretary

* Platform: Apple Macintosh
* Description: Speaker dependent/adaptive system requiring words to
be separated by short pauses. Available as PowerSecretary Power
Edition, Personal Edition, and PowerSecretary MED for Healthcare
Professionals.
Vocabulary: 30,000-60,000 words active at any one time, automatically
selected from a 120,000-word dictionary.
* Requirements: Power Macintosh 6100, 7100, 8100, Performa 6100
series, PowerBook 540, or a 68040 class Macintosh such as Quadra 660AV,
700, 800, 840AV, 900, 950, Centris 650 and 660AV.
Hard disk with at least 25 MB free.
System 7.5 or greater.
(Some systems require add-on hardware.)
* More information: PowerSecretary home page
* Related products: see general information below
* Contact: see general information below

Dragon Developer Tools

* Dragon PhoneQuery
* DragonXTools
* Dragon SpeechTool
* Dragon VoiceTools

Dragon PhoneQuery

* Platform: Windows NT
* Description: Software for building voice response systems. Callers
can ask for information using completely natural, continuous
language; hold a spoken dialog to fine-tune a request; and request
information to be faxed, sent by electronic mail, or read over the
phone using text-to-speech.
More information is available on the Dragon Systems telephony pages.
* Requirements: Pentium or Pentium Pro PC running Windows NT 4.0.
Telephone interconnect requirements vary by application.
* Related products: see general information below
* Contact: see general information below

DragonXTools

* Platform: Windows
* Description: VBX and OCX controls that allow an application to
control DragonDictate's capabilities, ranging from small
vocabulary command and control to customized large vocabulary
dictation. More information is available on the Dragon Developer
pages.
* Related products: see general information below
* Contact: see general information below

Dragon SpeechTool

* Platform: Windows
* Description: Create small, optimized vocabularies for your
speech-enabled applications, or supplement DragonDictate's
extensive built-in vocabularies with specialized terms and names.
More information is available on the Dragon Developer pages.
* Related products: see general information below
* Contact: see general information below

Dragon VoiceTools

* Platform: Windows, DOS
* Description: Integrate small-vocabulary speech recognition
directly into your DOS and Windows 3.1x applications. More
information is available on the Dragon Developer pages.
* Related products: see general information below
* Contact: see general information below

General Information

Dragon Dictation Products

* Dragon NaturallySpeaking
* DragonDictate for Windows
* Dragon PowerSecretary
* General Information

Dragon Developer Products

* Dragon PhoneQuery
* DragonXTools
* Dragon SpeechTool
* Dragon VoiceTools

Related Web Sites

* Simon Crosby's FAQ for DragonDictate

Contact:

* Dragon Systems, Inc.
320 Nevada Street, Newton, MA 02160, USA
Tel: 1-617-965-5200 or 1-800-TALK-TYP
Fax: 1-617-527-0372
Email: info@dragonsys.com
WWW: http://www.dragonsys.com/
CompuServe: GO DRAGON

EARS: Single Word Recognition Package

* Platform: Linux and other Unix systems with the Voxware sound driver
* Description: Intended as a limited, ready-to-use single word
recognizer. Its design, however, aims to make it a platform for
various speech recognition (SR) methods. EARS is designed to be a
flexible environment for recognition system components; for
example, you can combine this feature extractor with that
recognition method and this list of words. New methods for single
word recognition can be integrated easily, as EARS uses C++
abstract base classes. You first speak the words you want to be
recognized later. Your utterances can be saved to RIFF WAV files so
that you can inspect, change or delete them before they are
processed into the pattern files on which the recognizer is
finally trained. As of version 0.20, the feature extractors are
Rasta-PLP, PLP, LPC and Mel-Cepstrum. The implemented recognizers
are DTW and non-recurrent neural nets on fixed-size sound patterns.
* Requirements: Soundcard with mic
* Misc 1: The current version is an Alpha release.
* Misc 2: For more information subscribe to the EARS mailing list.
Send email to majordomo@phil.uni-sb.de with "subscribe ears-list"
in the body.
* Misc 3: Niels Thorwirth (thorwir@pi4.informatik.uni-mannheim.de)
has made changes to Version 0.14 which support the AF audio server
software (see Q1.11) and the OGI Speech Tools (see Q1.9) so that
EARS is more portable to other UNIX platforms. The changes are
available by email from Niels.
* Availability: Source and Linux binaries are available by anonymous
ftp:
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/ears-0.26.tar.gz
ftp://sunsite.unc.edu/pub/Linux/apps/sound/speech/ears-0.26.tar.gz
* Contact: Ralf W. Stephan: ralf@ark.franken.de
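EARS's recognizers include DTW matching against the word templates recorded during training. The following is a minimal Python sketch of DTW-based isolated-word recognition in general, not EARS's C++ code; the function names, the symmetric step pattern and the Euclidean frame distance are assumptions chosen for clarity.

```python
import math
from typing import Dict, Sequence

Frame = Sequence[float]


def euclidean(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(query: Sequence[Frame], template: Sequence[Frame]) -> float:
    """Accumulated frame distance along the best DTW alignment path.

    Allowed steps: (i-1, j), (i, j-1) and (i-1, j-1).
    """
    n, m = len(query), len(template)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning query[:i] with template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(query[i - 1], template[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(query: Sequence[Frame], templates: Dict[str, Sequence[Frame]]) -> str:
    """Return the word whose stored template has the smallest DTW distance."""
    return min(templates, key=lambda word: dtw_distance(query, templates[word]))
```

For example, with one-dimensional frames, `recognize([[0.0], [0.0], [1.0], [2.0], [2.0]], {"yes": [[0.0], [1.0], [2.0]], "no": [[2.0], [1.0], [0.0]]})` picks `"yes"`, because the slower utterance aligns to that template at zero cost. A practical system would usually also normalize the distance by path length and constrain how far the path may warp.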

Ficomp Interpreter 6000

* Platform: DOS, Windows 3.1, Win95, Win NT, UNIX
* Description: Ficomp Systems, Inc., is a systems integrator that
has developed commercial speaker-dependent, continuous-speech
recognition applications for use in high-noise environments on
several platforms. The applications are specialized for the finance
industry: exchange floors, banks and brokerage firms.
* Contact: Ficomp Systems, Inc.
Ph: (732) 274-2600, Fax: (732) 274-2601
117 Docks Corner Road, Dayton, NJ 08810
E-Mail: fsisales1@aol.com
WWW: http://www.ficompsystems.com/

HM2007 - Speech Recognition Chip

* Platform: Integrated circuit.
* Description: HM2007 is a 48-pin single-chip CMOS voice recognition
LSI circuit with on-chip analog front end, voice analysis,
recognition process and system control functions. A 40-word
isolated-word voice recognition system can be composed of an
external microphone, keyboard, SRAM and a few other components.
When combined with a microprocessor, an intelligent recognition
system can be built. A demo board for this chip is being
distributed by The Summa Group.
* Cost: Approx US$16 for the HM2007 and US$160 for the demo board.
* Misc: Jean-Pierre Lereboullet's document on Voice Recognition
Processors provides additional information on the HM2007.
* Producer: HUALON Microelectronic Corp. USA
Tel: (415) 288 0390 Fax: (415) 288-0399
* Distributor 1: Marywale Engineering Company
Tel: (602) 247 4451 Fax: (602) 247 6167
Email: meco@indirect.com
* Distributor 2: The Summa Group Limited
One California Street, Suite #1940,
San Francisco, CA 94111
Ph: (415) 288-0390
* Distributor 3: Images Company
39 Seneca Loop, Staten Island, NY 10314, USA
Ph: +1-718-698-8305, Fax: +1-718-982-6145
Sells single-piece quantities of the HM2007 48-pin DIP chip and the
HM2007 52-pin PLCC chip. Sells HM2007 demo kits unassembled for
$100.00 and assembled for $135.00 (using the 48-pin DIP chip).

Entropic's HTK (HMM Toolkit)

* Platform: Range of Unix platforms.
* Description: HTK is a software toolkit for building continuous
density HMM based speech recognisers. It consists of a number of
library modules and a number of tools. Functions include speech
analysis, training tools, recognition tools, results analysis, and
an interactive tool for speech labelling. Many standard forms of
continuous density HMM are possible. It can perform isolated word or
connected word speech recognition, and can model whole words or
sub-word units. It can also perform speaker verification and other
pattern recognition work using HMMs. HTK is now integrated with the
ESPS/Waves speech research environment which is described in
Section 1.9.
* Misc 1: The availability of HTK changed in early 1993 when
Entropic obtained exclusive marketing rights to HTK from the
developers at Cambridge.
* Misc 2: More detailed information on HTK is available from the
Entropic WWW server: http://www.entropic.com/htk.html
* Cost: On request.
* Contact:
Entropic Research Laboratory,
600 Pennsylvania Ave, S.E. Suite 202,
Washington, D.C. 20003, USA
Phone: (202) 547-1420.
Email: info@entropic.com
WWW: http://www.entropic.com/
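HTK builds continuous density HMMs, in which each state's output probability is a Gaussian mixture, b_j(x) = sum_k c_jk N(x; mu_jk, Sigma_jk). A minimal Python sketch of evaluating that density in the log domain is shown below, assuming diagonal covariances (a common choice in speech recognition); the function names are hypothetical and this is an illustration, not HTK's implementation.

```python
import math
from typing import Sequence


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """log N(x; mean, diag(var)) for a diagonal-covariance Gaussian."""
    d = len(x)
    log_det = sum(math.log(v) for v in var)
    mahalanobis = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + mahalanobis)


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_mixture_likelihood(x, weights, means, variances) -> float:
    """log b(x) = log sum_k c_k N(x; mu_k, diag(var_k))."""
    terms = [math.log(c) + log_gaussian_diag(x, mu, var)
             for c, mu, var in zip(weights, means, variances)]
    return log_sum_exp(terms)
```

Working with log probabilities and log-sum-exp avoids the underflow that multiplying many small densities over an utterance would otherwise cause.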

IBM VoiceType Dictation

* Platform: OS/2 and Windows


* Description: IBM VoiceType Dictation supports speech input at
70-100 words a minute and can be used to control your desktop and
applications. Isolated-word, speaker-dependent system using a
speech adapter card. Available for U.S. English, U.K. English,
French, German, Italian, Spanish and Arabic. Provided with a
general office vocabulary and support for major OS/2 and Windows
applications. Additional specialised vocabularies are available:
+ US: Legal, Emergency Medicine, Radiology and Journalism
+ UK: Legal
+ IT: Radiology
* Requirements: See
http://www.software.ibm.com/workgroup/voicetyp/vtprod13.html
* Cost: See
http://www.software.ibm.com/workgroup/voicetyp/vtordna.html
* Misc: An IBM VoiceType Dictation FAQ is supported by UltraMedia
Systems International (a distributor of IBM VoiceType):
http://www.infi.net/~ums/ibmfaq.htm
* Demo software: Available on the IBM WWW site:
http://www.software.ibm.com/workgroup/voicetyp/vtcust1.html
* Contact: US Ph: 1-800-TALK-2-ME or 1-914-766-1900.
Email: talk2me@vnet.ibm.com
WWW: http://www.software.ibm.com/workgroup/voicetyp/vtcust1.html

IBM VoiceType Control (US Only)

* Platform: OS/2 and Windows


* Description: VoiceType Control is a speech recognition navigator
that lets you control programs by speaking. VoiceType Control
converts voice commands to keystroke macros. The program provides
speaker independent, continuous speech recognition, so you do not
have to train the program for your specific speech patterns.
* Requirements: Unknown
* Cost: Unknown
* Demo software:
http://www.software.ibm.com/workgroup/voicetyp/vtcust2.html
* Contact: US Ph: 1-800-TALK-2-ME or 1-914-766-1900.
Email: talk2me@vnet.ibm.com
WWW: http://www.software.ibm.com/workgroup/voicetyp/vtcust2.html

IN CUBE

* Platform: Three versions for Windows 95, Windows NT and Sun
SPARCstations
* IN CUBE for Windows 95: Developed for general purpose Windows 95
users. It is packaged for online distribution with a full working
demo and an option to register and unlock the full product. The
system uses Command Corp's Mark II continuous speech recognition
engine and handles changeable lexicons of up to 75 commands.
+ Price: $49.95 US
+ Requirements: 386/25MHz processor or better, Microsoft
Windows 3.1 or later, Windows compatible sound card or
built-in audio, and microphone.
+ Availability: http://www.commandcorp.com/cci/win95.html
Demo mode available.
* IN CUBE Mark II Pro for Windows NT: IN CUBE is a continuous
realtime speech recognition system developed to provide a fast and
convenient means of window navigation and voice macro command
input for command intensive applications like CAD and publishing.
Speaker-dependent training and ability to add new commands and
macros.
+ Price: $495 including the PRO 8 microphone. $540 including
the MT 858 desk microphone.
+ Requirements: Windows NT, Windows NT-compatible audio board
(16-bit audio recommended).
+ Availability: http://www.commandcorp.com/cci/pront.html
Demo available.
* IN CUBE Voice Command for Sun SPARCstations: Provides continuous
real-time speech recognition for window navigation and voice
macro command input to the workstation. Speaker-dependent training
and ability to add new commands and macros.
An IN CUBE Application Programming Interface, with a library of
linkable object modules, is available for developers.
+ Price: $495 per seat. The developer's API sells for $695.
+ Requirements: SunOS 4.1.x or Solaris 2.x with OpenWindows
and Motif. Works with all audio-equipped SPARCs and clones,
from the SPARCstation 1 to the SPARCstation 20.
+ Availability: http://www.commandcorp.com/cci/in3sparc.html
A free 5 day evaluation license is available.
* Contact: Command Corp. Inc.,
3761 Venture Drive, PO Box 956099, Duluth, Georgia, 30136, USA
Ph: +1-770-813-8030
Email: in3@commandcorp.com
WWW: http://www.commandcorp.com/incube_welcome.html

Jialong He's Speech Recognition Research Tool

* Platform: SUN SPARC (SunOS), PC (MSDOS)
* Description: This is a speech recognition research tool. It
contains a feature extraction program and three speech
recognizers: a DTW recognizer, a discrete hidden Markov model (DHMM)
based recognizer, and a continuous density hidden Markov model
(CHMM) based recognizer with Gaussian mixture densities. The
utilities are grouped as:
+ feature -- extract feature vectors from a speech signal (MFCC
etc.)
+ dtwcmp -- dynamic time warping (DTW) comparison.
+ gensym -- turn vector sequences into discrete observation
symbols.
dhmm -- discrete HMM training program.
dtest -- DHMM companion test program.
+ chmm -- continuous density HMM training program.
viterbi -- CHMM companion test program.
Note: this is a research tool, not a complete speech recognition
system.
* Availability: By anonymous ftp:
MSDOS version
UK:
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spchtool.zip
Germany:
ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/spchtool.zip
Sun SPARC version, compiled with GNU C
UK:
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spch_sun_v1.tar.gz
Germany:
ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/speech_sun_v1.tar.gz
* See also: Jialong He's Speaker Recognition (Identification) Tool
* Contact: Jialong He
email: jialong@neuro.informatik.uni-ulm.de
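The gensym step turns continuous feature vectors into the discrete symbols a discrete HMM needs, which is vector quantization. A minimal Python sketch follows, assuming a squared-Euclidean nearest-codeword rule and a codebook refined with plain k-means (Lloyd) iterations; the tool's actual codebook method is not described here, and the function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_codeword(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codeword with the smallest squared Euclidean distance."""
    best_index, best_dist = 0, float("inf")
    for k, code in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vec, code))
        if dist < best_dist:
            best_index, best_dist = k, dist
    return best_index


def train_codebook(vectors: Sequence[Vector], initial: Sequence[Vector],
                   iterations: int = 10) -> List[List[float]]:
    """Refine an initial codebook with k-means (Lloyd) iterations."""
    codebook = [list(c) for c in initial]
    dim = len(codebook[0])
    for _ in range(iterations):
        sums = [[0.0] * dim for _ in codebook]
        counts = [0] * len(codebook)
        for vec in vectors:
            k = nearest_codeword(vec, codebook)
            counts[k] += 1
            for d in range(dim):
                sums[k][d] += vec[d]
        for k in range(len(codebook)):
            if counts[k]:  # an empty cell keeps its previous codeword
                codebook[k] = [s / counts[k] for s in sums[k]]
    return codebook


def to_symbols(vectors: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map each feature vector to the index of its nearest codeword."""
    return [nearest_codeword(v, codebook) for v in vectors]
```

The resulting symbol indices are what a discrete HMM trainer such as the dhmm program would consume as observations.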

Kurzweil Voice for Windows

* Platform: Windows 3.1 or later
* Description: Kurzweil Voice for Windows is a dictation product
enabling the user to create text and enter data by speaking to
Windows-based applications. The system is adaptive but requires no
initial training. Users can choose either a 30,000 or a 60,000 word
active vocabulary. Application command translation templates are
provided for popular Windows applications such as WordPerfect,
1-2-3, Organizer and Word (30+ applications are listed on the
Kurzweil WWW pages). More detailed information is available on the
Kurzweil WWW pages.
* Requirements: 486DX/33 or higher, 8 or 16 MB dedicated memory
(depending on vocabulary), 30 MB dedicated disk space, VGA or
higher, Kurzweil-supplied microphone and DSP board.
* Contact:
Kurzweil Applied Intelligence, Inc.
411 Waverley Oaks Road, Waltham, MA 02154 USA
Phone: 1-800-380-1234
Email: info@kurzweil.com
WWW: http://www.kurzweil.com/

Kurzweil Clinical Reporter

* Platform: Windows 3.1 or later


* Description: Kurzweil Clinical Reporter is a voice-activated
clinical reporting system for computer-based patient records. The
family of products includes:
+ VoiceEM for emergency medicine
+ VoiceEM/TR for triage reporting
+ VoiceRAD for diagnostic imaging and radiology
+ VoicePATH for surgical and anatomical pathology
+ VoiceMED for Primary Care for family medicine, internal
medicine and pediatrics
+ VoiceORTHO for office-based orthopaedic surgery
+ VoiceCATH for invasive cardiology
+ VoiceReport for general reporting
* More information: from the Kurzweil WWW pages:
http://www.kurzweil.com/medical/
* Contact:
Kurzweil Applied Intelligence, Inc.
411 Waverley Oaks Road, Waltham, MA 02154 USA
Phone: 1-800-380-1234
Email: info@kurzweil.com
WWW: http://www.kurzweil.com/

Lernout & Hauspie ASR 1000/T and 1000/M

[Note: L&H asr200/A is described below.]

* L&H asr1000/T: ASR for the Telephony and Telecommunications Market
* L&H asr1000/M: ASR for the Computer and Multimedia Market
* Description: Automatic speech recognition software providing
continuous speech recognition, isolated word recognition, keyword
spotting or continuous digits recognition. The engine is speaker
independent and phoneme-based, with optimization for commonly used
words. General features include:
+ Languages available: US English, German, French, Spanish
(Castilian), Dutch.
+ Available vocabulary: >100,000 words.
+ Line adaptation.
+ Rejection of out of vocabulary/grammar words.
+ N-best alternatives for isolated word recognition and keyword
spotting.
+ Push to talk.
* asr1000/T
+ Single channel platform examples: Motorola 56156, TI
TMS320C2X/C3X/C5X
+ Multi-channel platform examples: TI TMS320C3X/C5X, AT&T
DSP32C/3210, Motorola 96000
+ Input: 8 kHz telephone sampling
* asr1000/M
+ Single processor platform examples: Intel 486/Pentium
+ Input: 8 kHz telephone or 11 kHz microphone sampling
* See also: L&H ASR SDK for Windows
* More Information: on the Lernout & Hauspie WWW pages:
http://www.lhs.com/asr.html
* Cost: Unknown
* Contact: Lernout & Hauspie Speech Products
800 West Cummings Park, Suite 3100
Woburn, MA 01801, USA
Tel: (617) 238 0960
Fax: (617) 238 0986
Email: sales@lhs.com
WWW: http://www.lhs.com/

Lernout & Hauspie ASR 200/A for the Automotive and Industrial Market

* Description: Automatic speech recognition software providing
isolated word recognition, keyword spotting and (optionally)
alphabet recognition. This engine is robust, speaker independent
and word based. Other features:
+ Vocabulary: 100 words US English
+ Voice activation detection
+ Response time
+ Platform examples: Analog Devices ADSP2101/5
+ Input: 8 kHz telephone or microphone sampling
* See also: L&H ASR SDK for Windows
* More Information: on the Lernout & Hauspie WWW pages:
http://www.lhs.com/asr.html
* Cost: Unknown
* Contact: Lernout and Hauspie Speech Products
20 Mall Road, 4th Floor
Burlington, MA 01803, USA
Ph: +1-617-238-0960, Fax: +1-617-238-0986
Email: sales@lhs.com
WWW: http://www.lhs.com/

Lernout & Hauspie ASR SDK

* Platform: Windows
* Description: Windows based Software Development Kits are available
for integrating automatic speech recognition technology with
Windows based PC applications.
* Requirements: IBM-compatible 486 DX/33 MHz + 8 MB RAM + MS DOS 5.0
+ MS Windows 3.1 (or higher) + Sound Blaster compatible sound
board.
* See also: L&H ASR Products
* More Information: on the Lernout & Hauspie WWW pages:
http://www.lhs.com/asr.html
* Contact: Lernout and Hauspie Speech Products
20 Mall Road, 4th Floor
Burlington, MA 01803, USA
Ph: +1-617-238-0960, Fax: +1-617-238-0986
Email: sales@lhs.com
WWW: http://www.lhs.com/

Listen for Windows 2.0 from Verbex Voice Systems

* Platform: Windows
* Description: Listen for Windows Version 2.0 is a speaker
independent software product that provides continuous speech
recognition for Windows applications. The product works with most
industry standard sound cards and PCs with embedded audio chips.
Listen for Windows comes with over 16,000 commands in speech
interfaces for over 40 software applications, such as MS Office,
Lotus SmartSuite, Quicken, etc. The Listen Command Editor allows a
user to change or add commands to existing speech interfaces or
create new speech interfaces for most Windows applications.
More detailed information is available on the Verbex Listen for
Windows page.
Verbex also sells Verbal Advantage Voice Browser for controlling a
web browser, and Verbal Advantage DeskTop for controlling desktop
applications.
* Requirements: 486/25SX PC or higher
* Pricing and Availability: See the Verbex ordering page for pricing.
Verbex products are available over the web or can be shipped.
Microphones are available from Verbex.
* Demo: A "Freeware" demo is available from the Verbex WWW site demo
page.
* Contact: Verbex Voice Systems
1090 King Georges Post Rd., Bldg 107, Edison NJ 08837, USA
Ph: 1-800-ASK-VRBX, (908) 225-5225, Fax:(908) 225-7764
WWW: http://www.verbex.com/

Lotec Speech Recognition Package

* Platform: Sun
* Description: Public domain speech recognition software. Operates
on input in Sun audio format (.au files) and outputs word
hypotheses and time labelling data. The software includes programs
to collect speech samples, a labeller, a "featurizer" which
parameterises speech files, a word spotter and the recogniser. The
software can perform real-time recognition on a SPARCstation 10 for
small vocabularies.
* Requirements: Sun SPARC audio input, a "decent" microphone, Sun
multimedia demo software (in /usr/demo/SOUND) and X.
* Availability: By anonymous ftp
ftp://ftp.sanpo.t.u-tokyo.ac.jp/pub/nigel/lotec/lotec.tar.Z
* Contact: Nigel Ward: nigel@sanpo.t.u-tokyo.ac.jp

Macintosh Speech Recognition Manager

* Platform: Macintosh
* Description: Supports developers who wish to add speech
recognition to existing Macintosh applications. Provides speaker
independent recognition and robustness to noise. Apple's Speech
home page provides developer information and the complete speech
recognition and synthesis SDKs. The recognition SDK includes
sample code, control panels, interfaces, documentation and the
recognizer.
* Availability: under licensing conditions from the Macintosh Speech
Developer's page
http://www.speech.apple.com/speech/dev/dev.html.
* Requirements: Power Macintosh with 16-bit sound, System 7.5, and a
PlainTalk Microphone or equivalent
* Cost: Free
* See also: Macintosh Plaintalk and Speech Manager (Q5.5).
* Note: Check out Kevin Lenzo's list of Macintosh Speech
Applications.
* Contact: Apple Computer, Inc.
1 Infinite Loop, Cupertino, CA 95014, USA
WWW: http://www.speech.apple.com/
Email: PlainTalk@atg.apple.com

Microsoft Speech Recognition

Microsoft Dictation Research Demonstration

* Platform: Windows 95 or Windows NT 4.0


* Description: A free demonstration of research technology that
enables a computer to transcribe what you speak into Windows
applications such as email and word-processors. Features of the
demo software include:
+ 60,000 word vocabulary with the ability to add new words
+ High recognition accuracy
+ Works with any Windows application
+ "Dictation Pad" provides enhanced dictation features
+ "IntelliSense" converts spoken numbers and times
automatically
+ Compatible with the Microsoft Speech API
* Requirements: Windows 95 or Windows NT 4.0, Pentium 90 or better
(RISC builds are available), 16 megabytes of RAM on Windows 95,
sound card supporting 16 kHz, 16-bit input, high quality
close-talk microphone, speakers.
* Availability: Free demo software is available at:
http://www.research.microsoft.com/research/srg/install.htm
* More information: http://www.research.microsoft.com/research/srg/
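How the demo's "IntelliSense" number conversion works is not described here. As a hypothetical sketch of the simplest case only, converting a phrase of English cardinal number words into an integer could look like the following; the function and table names are invented for illustration. Times and ordinals are outside its scope, and it does not check word order, so ill-formed input such as "two three" is simply summed.

```python
from typing import List

UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def words_to_number(words: List[str]) -> int:
    """Convert English cardinal number words (up to the thousands) to an int."""
    total = 0    # value of completed "thousand" groups
    current = 0  # value accumulated since the last "thousand"
    for word in words:
        w = word.lower()
        if w == "and":
            continue  # as in "one hundred and five"
        if w in UNITS:
            current += UNITS[w]
        elif w in TENS:
            current += TENS[w]
        elif w == "hundred":
            current = (current or 1) * 100
        elif w == "thousand":
            total += (current or 1) * 1000
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current
```

For example, `words_to_number("two thousand three hundred forty five".split())` yields 2345.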

Microsoft Command and Control Engine

* Platform: Windows 95
* Description: Provides command and control speech recognition using
SAPI (the Microsoft Speech API) and "Whisper", Microsoft's speech
recognition technology. Features include:
+ Speaker independent, continuous, sub-word modeling, context
free grammars
+ Has its own letter-to-sound rules, so it can recognize any
word in a grammar.
+ North American English
+ PC microphone and telephone speech recognition with high
performance
+ Word spotting option
+ Results objects containing top-N choices, segmentation, and
confidence
+ Written to SAPI, the Microsoft Speech API.
* Requirements: Windows 95 or Windows NT 4.0, Pentium 60 or better
(RISC builds are available), 1.5 megabyte working set, 16 kHz or
8 kHz input signals, 6 megabytes on disk. Requires the Microsoft
Speech SDK.
* Availability: Free demo software is available at:
http://www.research.microsoft.com/research/srg/install.htm
* More information: http://www.research.microsoft.com/research/srg/

Myers' Hidden Markov Model software

* Platform: Unix
* Description: Hidden Markov model software for automatic speech
recognition. C++ code that implements a basic left-right hidden
Markov model and the corresponding Baum-Welch (ML) training
algorithm. It is meant as an example of the HMM algorithms described
by L. Rabiner and others. The code was built in order to learn how
HMM systems work and we are now offering it to the net so that
others can learn how to use HMMs for speech recognition. Keep in
mind that ease of understanding was our primary concern, not
efficiency. The code can be used to build an experimental speech
recognition system using "train_hmm" and "test_hmm", and can be
used in conjunction with written tutorials on HMMs to understand
how they work.
* Availability: By anonymous ftp from the comp.speech archive site.
The directory
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/
contains two files:
+ hmm.README
+ hmm-1.03.tar.gz
* Contact: Richard Myers: rmyers@isx.edu
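The forward algorithm is the core of both Baum-Welch training and likelihood scoring for an HMM. A minimal Python sketch for a discrete-observation HMM is shown below, using per-frame scaling (as described in Rabiner's tutorial) to avoid numerical underflow. This is not Myers' C++ code; the discrete observations and the argument layout (pi, A, B as nested lists) are assumptions. For a left-right model, A is simply upper triangular.

```python
import math
from typing import Sequence


def forward_log_likelihood(pi: Sequence[float],
                           A: Sequence[Sequence[float]],
                           B: Sequence[Sequence[float]],
                           obs: Sequence[int]) -> float:
    """Return log P(obs | model) using the scaled forward algorithm.

    pi[i]:   probability of starting in state i
    A[i][j]: probability of a transition from state i to state j
    B[i][k]: probability of emitting symbol k in state i
    obs:     sequence of observation symbol indices
    """
    n = len(pi)
    log_prob = 0.0
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for t in range(len(obs)):
        if t > 0:
            # alpha_t(j) = [sum_i alpha_{t-1}(i) * A[i][j]] * B[j][o_t]
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the sequence is impossible under this model
        alpha = [a / scale for a in alpha]  # normalize to avoid underflow
        log_prob += math.log(scale)  # log P(obs) is the sum of log scale factors
    return log_prob
```

For example, a two-state left-right model with `pi = [1.0, 0.0]`, `A = [[0.5, 0.5], [0.0, 1.0]]` and `B = [[1.0, 0.0], [0.0, 1.0]]` gives the sequence `[0, 1]` a probability of 0.5, since the only valid path moves from state 0 to state 1.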

NCC Dictate

* Platform: Windows
* Description: NCC Digital Dictate(TM) is an add-on, enhanced
interface for use with IBM's VoiceType(TM) Dictation for Windows
and various Windows 3.1 applications (e.g. MS Word, WordPerfect).
Digital Dictate(TM) provides faster corrections and dictation rates
and various other features. This version is not a stand-alone
product; it requires VoiceType(TM) Dictation to provide the speech
recognition engine and the Windows application. Features include:
+ Direct dictation into Windows applications with access to all
functions while dictating.
+ Versions for MS Word, WordPerfect, Ami Pro, and other Windows
applications.
+ Speech enabled editing.
+ Capability to save speaker models and defer corrections.
+ Microphone "pause and restore" functions controlled with
speech commands.
+ Add-on vocabularies for legal, medical, science and business.
+ SWITCH-IT(TM) foot pedal control or CardSwitch(TM) infrared
wireless control, available for switching between dictation and
proofing/correction modes.
* Requirements: IBM's VoiceType(TM) Dictation for Windows; a computer
system meeting VoiceType(TM) Dictation for Windows requirements;
VoiceType(TM) Dictation Adapter.
* Availability: Through computer dealerships.
* Price: $US295
* Contact: NCC Incorporated
5808 E. Turquoise, Scottsdale, AZ 85253
Ph: (602) 922-6236 Fax: (602) 596-9050

NICO Artificial Neural Network Toolkit

* Platform: UNIX (ANSI C source code)


* Description: The NICO Toolkit is an artificial neural network
toolkit specifically designed and optimized for automatic speech
recognition applications. Networks with both recurrent connections
and time-delay windows are easily constructed. The network
topology is flexible -- any number of layers is allowed and layers
can be arbitrarily connected. Tools for extracting input-features
from the speech signal are included as well as tools for computing
target values from standard phonetic label-files.
* Availability: Through the NICO homepage
(http://www.speech.kth.se/NICO/index.html)
or the download page.
* Contact: Nikko Strom, nikko@speech.kth.se
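A time-delay window lets a network unit see several neighbouring frames rather than only the current one. Purely as an illustration of that idea on the input side (not NICO's ANSI C implementation or file formats; the function name and the edge handling, which repeats the first and last frames, are assumptions), context windows can be built like this:

```python
from typing import List, Sequence


def context_windows(frames: Sequence[Sequence[float]],
                    left: int, right: int) -> List[List[float]]:
    """For each frame t, concatenate frames t-left .. t+right into one vector.

    Positions before the first or after the last frame reuse the nearest
    edge frame, so every output vector has (left + right + 1) * dim values.
    """
    n = len(frames)
    windows: List[List[float]] = []
    for t in range(n):
        window: List[float] = []
        for offset in range(-left, right + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp to a valid frame index
            window.extend(frames[idx])
        windows.append(window)
    return windows
```

For example, `context_windows([[1.0], [2.0], [3.0]], 1, 1)` gives `[[1.0, 1.0, 2.0], [1.0, 2.0, 3.0], [2.0, 3.0, 3.0]]`.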

Nuance Speech Recognition System

* Platform: UNIX-based workstations including Sun and SGI.


* Description: The Nuance Recognizer features client-server
architecture with multiple recognizers available on a single
processing platform. Primarily developed for telephony-based
applications, the system accepts speaker-independent, continuous
speech and supports very large vocabularies. Included is a
"template matching" natural language capability for identifying
the meaning of speech. A toolkit is available for use in
developing a wide variety of speech recognition applications.
* Price and availability: Contact Nuance
* Contact: Nuance Communications
333 Ravenswood Ave., Building 110, Menlo Park, CA 94025, USA
Ph: +1-415-462-8200, Fax: +1-415-462-8201
WWW: http://www.nuancecom.com/

OKI VRP6679 - Voice Recognition Processor

* Platform: Integrated circuit.
* Description: Speech recognition IC. 25 words max. Speaker
independent recognition capability. Recognition rate quoted as 97%
in a noisy environment (e.g. a car).
* Misc: Alias MSM6679
* Misc 2: More information is provided in Jean-Pierre Lereboullet's
document on Voice Recognition Processors.
* Cost: Approx US$20. Demo board $876
* Availability: OKI Semiconductor and OKI Distributors
Corporate Headquarters
785 North Mary Avenue, Sunnyvale, CA 94086-2909
Tel: (408) 720 1900, Fax: (408) 720 1918

Phonetic Engine 500 (PE500) from Speech Systems, Inc.

* Platform: Windows
* Description: Speaker independent, 40,000 word vocabulary,
continuous speech recognition for MS Windows. High-perplexity
grammars are possible. Includes noise rejection. Uses a proprietary
DSP board.
* Cost: Prices in US$ - quantity one. The PE500 SDK is $995.00
including board, microphone, and runtime software. Runtime only is
$595.00. SpeechWizard(r) adds speech input to existing Windows
applications, $295.00. Two-day training: $295.00 with purchase,
$595.00 without.
* Misc: The user defines the grammar of allowed utterances and must
write software to invoke the board driver functions that control
recognition. The user must also write software to
collect/parse/interpret the ASCII text strings returned when
recognition succeeds.
* Misc 2: SSI now offers speech application development services.
* Contact:

Speech Systems, Inc.


2945 Center Green Court South
Boulder, CO 80301-2275, USA
Tel: 303.938.1110 Fax: 303.938.1874
http://www.speechsys.com
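Because the application must interpret the ASCII strings the recognizer returns, a small dispatcher is typically needed. A hypothetical Python sketch follows; the command phrases, action names and the assumption that results arrive as plain strings such as "move up 3" are invented for illustration and are not the PE500 driver API or its grammar format.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical application commands: (regular expression, action name).
# Named groups capture the arguments an action needs.
PATTERNS = [
    (re.compile(r"^open (?P<target>\w+)$"), "open"),
    (re.compile(r"^move (?P<direction>up|down|left|right) (?P<count>\d+)$"), "move"),
    (re.compile(r"^cancel$"), "cancel"),
]


def interpret(result: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Map a recognized text string to (action, arguments), or None if unknown."""
    text = " ".join(result.lower().split())  # normalize case and whitespace
    for pattern, action in PATTERNS:
        match = pattern.match(text)
        if match:
            return action, match.groupdict()
    return None
```

Returning None for unmatched strings leaves the application free to ask the caller to repeat, rather than acting on a misrecognition.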

Philips Speech Recognition (2 products)

SpeechMagic: Dictation

* Platform: Windows 3.1 and higher
* Description: A continuous speech recognizer providing a 64,000
word vocabulary, speaker adaptation and multiple languages.
SpeechMagic is currently available for English and German.
SpeechMagic acts as a server application, processing speech input
and providing text output. It uses an add-on ISA compatible
recognition accelerator board. SpeechMagic provides a correction
editor, editing and playback of recordings, and a vocabulary
manager for entering new words, abbreviations, macros and special
transcriptions (e.g. for foreign words). Windows DDE support and a
native API are provided for integration.
* Hardware Requirements: IBM compatible personal computer (486DX/66
MHz or higher), minimum 16 MB of RAM, hard disk capacity > 500 MB,
and a Philips LFH 6210 Accelerator Board.
* More Information: For more information visit the SpeechMagic WWW
page or the Philips Speech home page.

Speech Processing System 6000s (Europe only)

* Description: Dictation of medical findings using continuous speech
recognition. Designed for German-speaking radiologists and
encompasses the complete radiology vocabulary. The report authors
use dictation stations (PCs) which are fitted with microphones. The
transcriptionists use editing stations (also PCs) which are
additionally fitted with headphones and footswitches. The SP6000s
has a single speech recognition unit serving all users, and it
offers automatic data transfer as well as the advantages of
digital dictation functions.
* More Information: For more information visit the Philips SP6000s
WWW page or the Philips Speech home page.

Dragon PowerSecretary

* Platform: Apple
* Description: Information moved to the page on Dragon Dictation
products, including Dragon PowerSecretary
(previously Articulate PowerSecretary).

ProNotes Voice Tools

* Platform: Windows
* Description: ProNotes Voice Tools are designed to bring the speech
recognition capabilities of the IBM VoiceType(TM) Dictation System
for Windows into any program without the need for the programmer
to interface directly with the speech engine at the API level.
There are five tools, as described below, which are all available
in three forms: Visual Basic(TM) Custom Controls (known as VBXs),
16-bit OLE Custom Controls, and 32-bit OLE Custom Controls. The
tools are intended for use by Windows(TM) developers working with
Windows 3.1(TM), Windows for Workgroups 3.11(TM), Windows NT 3.51
Workstation(TM), and Windows 95(TM). The custom controls can be
utilized with any application development environment which
supports the use of such controls (e.g. Visual Basic and Visual
C++).

Playback and Record
An object which allows developers to use the IBM Speech
Engine to record and play back sound files. Can be used
to add voice prompts and to allow end users to record and
play back sound files.

Voice Button
An object having standard button properties and behavior,
which can additionally be controlled by voice. The button
can also be used as a label or a 3D panel.

Dictation Window
A text box that allows free dictation, voice macro
utilization, and correction by voice. Each Dictation
Window has access to global and context sensitive
vocabularies for both command and dictation. There are
three correction modes.

Voice List Box
Has standard list box properties and behavior, but can
additionally be controlled by voice. A user can select
items by pronouncing the entry's text or the entries can
be numbered and selected accordingly.

Voice Navigator
Provides navigation by voice within an application
developed with the Voice Tools, between voice-enabled
objects described above, as well as some standard objects
found within the application.

* Requirements: Hardware: 80486/33 DX or higher, 60MB hard disk
space for IBM VoiceType Dictation software, 10MB hard disk space
for ProNotes Voice Tools, 3.5" floppy, VGA (or compatible), 16MB
RAM, IBM VoiceType Dictation adapter, microphone, and speakers.
Software: DOS version 6.0 or later, with SHARE.EXE running,
Windows 3.1 or later, IBM VoiceType Dictation software, any
programming environment or system compatible with Visual Basic or
OLE Custom Controls.
* Price: Unknown
* Contact: Pronotes, Inc.
1546 Magee Avenue, Philadelphia, PA 19149, USA
Ph: 800-70-NOTES or +1-215-533-8569, Fax: +1-215-533-1276
Email: proinfo@pronotes.com
WWW: http://www.pronotes.com/

PureSpeech 2.0 Recognition Engine

* Platform: Windows 3.1, Windows 95, Unix, Dialogic Antares DSP
* Description: Speaker-independent, continuous speech, large active
vocabulary speech recognition engine for American English, UK
English, French, German and Spanish. Permits on-the-fly additions
to the vocabulary using phonetic models and telephone or wideband
microphone input. Flexible grammar, natural language processing,
discourse models. Software only, with a small RAM/CPU footprint.
Can be used as a voice user interface (VUI) for PC software
applications. Can also be used for high-volume call center
telephony, especially in banks, finance and other specialized
applications.
A toolkit for the Dialogic Antares is available.
* Availability: PureSpeech is not available as a stand-alone
product. It is available embedded in Windows-based software or as
a toolkit.
* Contact: PureSpeech, Inc.
100 Cambridge Park Drive, Cambridge, MA 02140, USA
Ph: (617) 441-0000 Fax: (617) 441-0001
Email: amy@speech.com
WWW: http://www.speech.com/

recnet

* Platform: UNIX
* Description: Speech recognition for the speaker independent TIMIT
and Resource Management tasks. It uses recurrent networks to
estimate phone probabilities and Markov models to find the most
probable sequence of phones or words. The system is a snapshot of
evolving research code. There is no documentation other than
published research papers. The components are:
+ A preprocessor which implements many standard and many non-
standard front end processing techniques.
+ A recurrent net recogniser and parameter files
+ Two Markov model based recognisers, one for phone recognition
and one for word recognition
+ A dynamic programming scoring package. The complete system
performs competitively.
* Cost: Free
* Requirements: TIMIT and Resource Management databases
* Contact: Tony Robinson: _ajr@eng.cam.ac.uk_
* Availability: by anonymous ftp
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/recnet-1.3.tar.Z
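
recnet is described as using a recurrent network to estimate phone probabilities and Markov models to find the most probable sequence of phones or words. The standard search for that second step is the Viterbi algorithm. The sketch below is a generic log-domain Viterbi decoder, not code taken from recnet. In hybrid network/HMM systems the network posteriors are commonly divided by the phone priors before decoding; here the per-frame scores are simply assumed to be given as log values, and the two-state example data is invented.

```python
import math
from typing import List, Sequence

def viterbi(log_emit: Sequence[Sequence[float]],
            log_trans: Sequence[Sequence[float]],
            log_init: Sequence[float]) -> List[int]:
    """Return the most probable state sequence.

    log_emit[t][s]  log score of state s at frame t (e.g. a scaled network output)
    log_trans[i][j] log probability of a transition from state i to state j
    log_init[s]     log probability of starting in state s
    """
    n_states = len(log_init)
    delta = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backptrs: List[List[int]] = []
    for frame in log_emit[1:]:
        new_delta, ptrs = [], []
        for j in range(n_states):
            # Best predecessor of state j at this frame.
            best_i = max(range(n_states), key=lambda i: delta[i] + log_trans[i][j])
            ptrs.append(best_i)
            new_delta.append(delta[best_i] + log_trans[best_i][j] + frame[j])
        delta = new_delta
        backptrs.append(ptrs)
    # Trace back from the best-scoring final state.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    return path[::-1]

# Two hypothetical phone states (0 and 1) over three frames.
log = math.log
emit = [[log(0.9), log(0.1)], [log(0.8), log(0.2)], [log(0.2), log(0.8)]]
trans = [[log(0.7), log(0.3)], [log(0.3), log(0.7)]]
init = [log(0.5), log(0.5)]
print(viterbi(emit, trans, init))  # [0, 0, 1]
```

Working with log probabilities turns products into sums and avoids numerical underflow on long utterances.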

Sensory Inc. Integrated Circuits

* Platform: Integrated circuits
* Description: Sensory's low-cost, high-quality Interactive Speech
line of speech recognition ICs are designed for consumer
telephony products, portable consumer electronics, and other
consumer applications. Technologies available include speech
recognition (speaker-independent and speaker-dependent), speaker
verification, speech/music synthesis, digital record/playback, and
general product control on one chip. Development tools and
demonstration units are available. Detailed product information on
the Interactive Speech chips is available from the Sensory
Circuits WWW site.
* Contact: Sensory, Inc.
521 E. Weddell Drive, Sunnyvale, CA 94089
Ph: +1-408-744-9000, Fax: +1-408-744-1299
Email: Sales@SensoryInc.com
WWW: http://www.sensoryinc.com/

Simon Says (NeXT)

* Platform: NeXT
* Description: Provides the ability to link commands to spoken
phrases.
* Availability: By anonymous ftp.
Simon Says demo
ftp://ftp.informatik.uni-muenchen.de/pub/comp/platforms/next/Audio/audio-apps/SimonSaysDemo.1.5.1.N.b.tar.gz
Readme file
ftp://ftp.informatik.uni-muenchen.de/pub/comp/platforms/next/Audio/audio-apps/SimonSaysDemo.1.5.1.README
* Contact: Metrosoft
710 13th Street, Suite 310 X, San Diego, California 92101
Ph: 619.488.9411 Fax: 619.488.3045
Email: info@metrosoft.com [NeXTmail welcome]

smARTspeak from Advanced Recognition Technologies, Inc.

* Platform: Windows, Windows 95, DOS, and General Magic.
It also works on the following processors/microcontrollers: Intel's
80x86, Intel's 8031 and 8051, Motorola's 68000, and Hitachi's SH1,
SH3, SH8.
* Description: smARTspeak is suited to voice command and control
applications, such as voice dialing in cellular and desktop
telephones, or voice command operation in computers and multimedia
products. It uses a compact (10KB on 16-bit machines), fast,
user-dependent recognition engine.
smARTspeak can recognize any language in any accent.
ART recently completed a Software Developer Kit (SDK) for
smARTspeak, running under Windows 3.1 or higher which allows the
voice recognition engine to be used within Windows Applications.
More detailed information on smARTspeak and the SDK is available
on the ART WWW pages.
* Availability: Currently licensed to original equipment
manufacturers (OEMs), system integrators, software and application
developers, and value added resellers (VARs) who port the
technology into their products.
* Contact: Advanced Recognition Technologies, Inc.
International Office:
43 Brodezky Street, POB 39918, 61398 Tel Aviv, Israel
Ph: 972-3-642-7242, Fax: 972-3-642-5887
Email: 100274.3223@Compuserve.com
WWW: http://www.artcomp.com/
US Office:
9574 Topanga Canyon Blvd. Chatsworth, CA 91311, USA
Ph: 818-678-3999, Fax: 818-678-3994
WWW: http://www.artcomp.com/
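
The entry does not say how the user-dependent engine matches words. A classic technique for speaker-dependent isolated-word recognition is template matching with dynamic time warping (DTW): each enrolled word is stored as a sequence of feature vectors, and a new utterance is assigned to the template with the lowest warped distance. The sketch below illustrates that generic technique only. It is not ART's algorithm, and the one-dimensional "features" are placeholders for real acoustic features.

```python
from typing import Dict, Sequence

Frame = Sequence[float]

def frame_dist(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def dtw(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """Dynamic time warping cost between two feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(a[i - 1], b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or step both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def recognize(utterance: Sequence[Frame], templates: Dict[str, Sequence[Frame]]) -> str:
    """Return the enrolled word whose template is closest to the utterance."""
    return min(templates, key=lambda word: dtw(utterance, templates[word]))

# Hypothetical templates recorded by one user during enrollment.
templates = {"yes": [[1.0], [2.0], [3.0]], "no": [[3.0], [2.0], [1.0]]}
print(recognize([[1.0], [1.0], [2.0], [3.0], [3.0]], templates))  # yes
```

Because the warping absorbs differences in speaking rate, the slower five-frame utterance still matches the three-frame "yes" template exactly.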

Speech Commander - Verbex Voice Systems

* Platform: Various: external hardware with serial port connection
* Description: A hand-held (portable) device about the size of a
paperback book which provides speaker-dependent continuous speech
recognition. The active vocabulary is dependent on the model
chosen and can vary from 300 to 10,000 active words. The device
connects through a serial port, so it can be connected to a wide
range of computers. It comes with a battery pack.
* Contact: Verbex Voice Systems
1090 King Georges Post Rd., Bldg 107,
Edison NJ 08837, USA
Ph: (908) 225-5225, Fax: (908) 225-7764
Email: sales@listen.verbex.com
WWW: http://www.verbex.com/

'Speech Recognition Expert' Toolkit for Windows

* Description: Provides an object-oriented development tool designed
to rapidly build speech enabled applications without writing
source code. Currently supports IBM's VoiceType Application
Factory. Future versions to support other platforms. Includes
BlackBox library and Custom Grammar Tools.
* Requirements: Layout for Windows from Objects, Inc.
* Price: $US349 + Shipping/Handling
* Contact: Speech Technologies, Inc.
P.O. Box 3905
Naperville, IL 60567-3905
CompuServe @102147,3521
Ph: (708)983-7634

Visual Voice from Stylus Innovation

* Platform: Microsoft Windows
* Description: Visual Voice is a toolkit for building Windows-based
voice processing and telephony applications including interactive
voice response (e.g. touch-tone banking), fax-on-demand, and voice
mail. Visual Voice can be used to add voice recognition to your
telephony applications.
Voice Recognition (VR) Support for Visual Voice is exposed as a
standard VBX control and provides one or more voice recognition
"resources" to your application. Applications can dynamically
assign resources across several voice lines. Voice recognition is
either "discrete" or "continuous". Discrete recognition is
slightly more accurate and requires the speaker to pause briefly
between words. Continuous recognition provides a natural way to
enter information by speaking without pauses. Three configurations
are supported:

Software-Only Solution
The software only solution uses Telaccount's SpeechEasy
technology for discrete recognition using your PC's CPU.
A vocabulary is included with digits, basic command words
and more.

Hardware-Assisted Solution with Dialogic AEB boards
Discrete voice recognition in over 25 languages using
Dialogic D/41D voice boards and the Dialogic VR/40 board.
Vocabularies are included with digits, basic command
words, voice mail vocabulary and more.

Hardware-Assisted Solution with Dialogic PEB boards
Use the VR control with any Dialogic PEB-based voice
board, such as the D/12x or D/24x, to access voice
recognition resources from your phone lines. This
requires a Dialogic VRP board with either 1 to 4 VRM/40
modules (4 channel discrete voice recognition modules)
and/or 1 to 4 VRM/2C modules (2 channel continuous voice
recognition modules). You can have up to 4 modules on
each VRP: 4 VRM/40s for 16 channels of discrete voice
recognition; 4 VRM/2Cs for 8 channels of continuous
recognition; or a combination. Over 25 languages
supported. Includes vocabularies as described above.

* Pricing: Unknown
* Availability: From Stylus Innovations Inc. or from the
distributors listed on the Stylus WWW pages.
* Misc: More detailed technical information and slide show
demonstration software are available on the Stylus home page.
* Contact: Stylus Innovation Inc.
One Kendall Square, Building 300, Cambridge, MA 02139
Ph: (617) 621 9545, Fax: (617) 621 7862
WWW: http://www.stylus.com/
Compuserve forum: GO STYLUS
Email: info@stylus.com
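
The VRP module arithmetic above (up to four modules per VRP board, four discrete channels per VRM/40, two continuous channels per VRM/2C) can be checked with a small helper. The function name vrp_channels is hypothetical; the numbers come from the Visual Voice entry.

```python
# Channel counts per module, as stated in the Visual Voice entry.
CHANNELS_PER_MODULE = {"VRM/40": 4, "VRM/2C": 2}  # discrete, continuous
MAX_MODULES_PER_VRP = 4

def vrp_channels(n_vrm40: int, n_vrm2c: int) -> dict:
    """Recognition channels provided by one VRP board with the given modules."""
    if n_vrm40 < 0 or n_vrm2c < 0:
        raise ValueError("module counts must be non-negative")
    if n_vrm40 + n_vrm2c > MAX_MODULES_PER_VRP:
        raise ValueError("a VRP holds at most 4 modules")
    return {
        "discrete": n_vrm40 * CHANNELS_PER_MODULE["VRM/40"],
        "continuous": n_vrm2c * CHANNELS_PER_MODULE["VRM/2C"],
    }

print(vrp_channels(4, 0))  # {'discrete': 16, 'continuous': 0}
print(vrp_channels(2, 2))  # {'discrete': 8, 'continuous': 4}
```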

Voice Command Line Interface

* Platform: Amiga
* Description: VCLI will execute CLI commands, ARexx commands, or
ARexx scripts by voice command through your audio digitizer. VCLI
allows you to launch multiple applications or control any program
with an ARexx capability entirely by spoken voice command. VCLI is
fully multitasking and will run in the background, continuously
listening for your voice commands even while other programs are
running. Documentation is provided in AmigaGuide format. VCLI 6.0
runs under either Amiga DOS 2.0 or 3.0.
* Requirements: Supports the DSS8, PerfectSound 3, Sound Master,
Sound Magic, and Generic audio digitizers.
* Availability: by ftp from wuarchive.wustl.edu in the file
systems/amiga/incoming/audio/VCLI60.lha and from
amiga.physik.unizh.ch as the file pub/aminet/util/misc/VCLI60.lha
* Contact: Author's email is RHorne@cup.portal.com

Voice Control Systems Continuous Speech Recognition

* Description: Voice Control Systems (VCS) continuous speech
recognition is a proprietary phonetic recognizer based on
technology developed at VCS over the last 17 years. It is robust
for applications such as the "hands-free" automotive environment
or telephone networks, both wireless and wireline. VCS speech
recognition is used by many developers and manufacturers in
telecommunications. VCS technology is a software-based capability
which VCS has currently developed for a limited number of
processing environments. VCS offers "off-the-shelf" capabilities
for the TI-C3X and C4X DSPs with other hardware platform support
planned for the future. As a benchmark, today's VCS continuous
technology requires about half of a 33 MHz TMS320C31. VCS continuous
technology is available in cellular and wireline based libraries
for continuous digit input in approximately 15 languages. VCS
continuous recognition is a modified HMM decision strategy built
upon the foundation of the VCS phonetic "front end".
* Availability: VCS continuous technology is available today in
software form from VCS or implemented in hardware or speech
systems from VCS distributors including Dialogic Corporation,
Brite Voice, Intervoice, Periphonics, and Syntellect.
* Cost: Software royalties are volume based and range from per unit
costs of $500 per recognizer to less than $5 in large quantities.
* See also: the VCS Phonetic Dictionary Recognizer and VCS Isolated
Word Speech Recognition below, and the VCS 2030 & 2060 Voice
Dialers.
* Contact: Voice Control Systems, Inc.
14140 Midway Rd., Dallas, Tx. 75244, USA
Ph: +1-214-386-0300, Fax: +1-214-386-5555
Email: sales@vcsi.com
WWW: http://www.voicecontrol.com/

Voice Control Systems Phonetic Dictionary Recognizer

* Description: This recognizer is based upon an HMM type recognition
strategy coupled with the VCS "front end" (feature extraction
software). The HMM modeling is based upon the basic phonetic
building blocks in each language. In American English this is
approximately 43 units. The recognition vocabulary is built up by
combining these units into word models. Building words in this
way allows new recognition vocabularies to be constructed. The
phonetic assembly can also be used for "word spotting" recognition
libraries.
* Platform: This VCS recognition software runs on the TI TMS320C30
DSP. Two recognizers can operate on a single 55 MHz C30. Currently
the software may be purchased as an Enhanced Technology from VCS
to run on the Dialogic VR/160p speech recognizer board. The
hardware is purchased from Dialogic, with the "Enhanced" software
purchased from VCS. Up to four phonetic recognizers can run on a
single VR/160p, one per VRM2C (33 MHz C30 DSP) daughtercard.
* Note: This recognizer is in its late "beta" stage of development
and is available for U.S. English vocabularies. Other languages
are presently under development.
* Price: VCS software is priced at $350 per recognizer for unit
quantities with volume discounts available.
* See also: VCS Continuous Recognition above, VCS Isolated Word
Speech Recognition below, and the VCS 2030 & 2060 Voice Dialers.
* Contact: Voice Control Systems, Inc.
14140 Midway Rd., Dallas, Tx. 75244, USA
Ph: +1-214-386-0300, Fax: +1-214-386-5555
Email: sales@vcsi.com
WWW: http://www.voicecontrol.com/
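
The phonetic dictionary approach builds each word model by concatenating models of shared phonetic units, so adding a word to the vocabulary only requires its pronunciation. A minimal sketch follows, assuming a toy lexicon with ARPAbet-style symbols and three HMM states per unit. Both are assumptions: the VCS unit inventory and model topology are not published here, and word_model/shared_units are hypothetical names.

```python
from typing import Dict, List, Tuple

# Hypothetical pronunciations; VCS uses its own set of about 43 units.
LEXICON: Dict[str, List[str]] = {
    "call": ["K", "AO", "L"],
    "home": ["HH", "OW", "M"],
    "cancel": ["K", "AE", "N", "S", "AH", "L"],
}

STATES_PER_UNIT = 3  # assumption: a common left-to-right HMM topology

def word_model(word: str, lexicon: Dict[str, List[str]] = LEXICON) -> List[Tuple[str, int]]:
    """Concatenate the HMM states of each phonetic unit in the word."""
    try:
        units = lexicon[word]
    except KeyError:
        raise KeyError(f"no pronunciation for {word!r}") from None
    return [(unit, k) for unit in units for k in range(STATES_PER_UNIT)]

def shared_units(lexicon: Dict[str, List[str]] = LEXICON) -> Dict[str, List[str]]:
    """Show which words reuse each unit's models."""
    usage: Dict[str, List[str]] = {}
    for word, units in lexicon.items():
        for unit in units:
            usage.setdefault(unit, []).append(word)
    return usage

# A new word needs only a pronunciation; its unit models already exist.
extended = dict(LEXICON, callback=["K", "AO", "L", "B", "AE", "K"])
print(len(word_model("callback", extended)))  # 18 states (6 units x 3)
```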

Voice Control Systems Isolated Word Speech Recognition

* Description: Voice Control Systems (VCS) isolated word recognition
using VCS phonetic recognizer technology. It is robust in
demanding environments such as the "hands-free" automotive
environment, telephone networks, wireless or wireline.
Capabilities include speaker-independent, speaker-dependent and
speaker-adaptive recognition. Libraries are available for 45+
languages and custom vocabulary development services are
available. The technology is suited for many applications
including:
+ Desktop computing: such as keyboard accelerators
or interactive multimedia.
+ Network telephony: such as automating operator functions or
voice dialing.
+ Computer telephony: such as remote access to personal
computers.
+ Automotive accessory control: such as voice activated
cellular phones or other automotive accessories.
+ Consumer electronics: such as voice controllers for video
games or VCRs and televisions.
* Platform: Supported platforms include Intel-X86, TI-C5X, C3X, C4X
and C2X, OKI 6679, and NEC-V20 and V30; the software can also
operate on 16-bit microcontrollers.
As a benchmark, 8 recognizers can run on an Intel 486-33 DX.
* Availability: The technology is available under software licenses
direct from VCS or by purchasing hardware from an OEM. VCS OEMs
include: Dialogic, Oki Semiconductor, Intervoice, Periphonics,
etc.
* Cost: VCS isolated word recognition software is available under a
volume pricing license agreement. Small quantity royalties are in
the $500.00 per recognizer range while large (millions) quantity
royalties are less than $1.00 per recognizer.
* See also: VCS Continuous Speech Recognition and VCS Phonetic
Dictionary Recognizer above, and the VCS 2030 & 2060 Voice
Dialers.
* Contact: Voice Control Systems, Inc.
14140 Midway Rd., Dallas, Tx. 75244, USA
Ph: +1-214-386-0300, Fax: +1-214-386-5555
Email: sales@vcsi.com
WWW: http://www.voicecontrol.com/

Visus SpeechKit

* Platform: NeXT
* Description: SpeechKit is based on SPHINX, a speaker-independent
continuous speech recognition system with a vocabulary of about
1000 words, and allows you to incorporate speech recognition into
your applications. You can design your own vocabulary and grammars.
* Contact: Visus - no address or phone provided. A possible contact
is Robert Brennan at Carnegie Mellon University. email:
Robert_Brennan@cmu.edu

VCS 2060 Voice Dialer

VCS 2030 Voice Dialer

* Platform: Stand-alone hardware, TMS320C5X based with VCS phonetic
speech recognition and CELP speech compression.
* Description: The VCS 2060 is a telephone dialing system which
recognizes 50 names - and speed dials the associated telephone
number. The VCS 2030 has 20 memories. Users use
speaker-independent recognition to select the "call", "program",
or "list" menu, then place a call, enroll a new memory, or listen
to playback of entries in the phonebook. Enrollment is simple and
includes a "name tag" enrollment pass so that when one selects an
entry to call, the selection is confirmed by repeating the
memory's associated name tag, e.g. "calling Pete". The system uses
both speaker-independent and speaker-dependent technology from
Voice Control Systems, Inc.
* Installation: The VCS 2060 can be installed in series (RJ-11) with
one phone for single phone operation or installed in parallel
(RJ-31) to provide voice dialing from every phone in a house.
* Cost: Standard retail prices:
+ VCS 2030 Voice Dialer - $269.00
+ VCS 2060 Voice Dialer - $299.00
* Availability: From catalogs or direct from Voice Control Systems.
Voice Control Systems
14140 Midway Rd., Dallas, Tx. 75225, USA
Ph: 800-VCS-7525, Fax: +1-214-386-5555
Email: sales@vcsi.com
WWW: http://www.voicecontrol.com/
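
The dialer behaviour described above (a fixed number of memories, a "program" menu for enrollment, a "call" menu confirmed by playing back the name tag, and a "list" menu) can be modelled as a small phonebook class. This is a hypothetical sketch of the control logic only: recognition is assumed to have already turned the spoken name into text, the VoiceDialer class is invented for illustration, and the phone number is a made-up example.

```python
from typing import Dict, List, Optional

class VoiceDialer:
    """Toy model of the VCS 2030/2060 phonebook logic.

    Name tags arrive as already-recognized text. Capacity is
    20 memories (VCS 2030) or 50 names (VCS 2060).
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: Dict[str, str] = {}

    def program(self, name_tag: str, number: str) -> None:
        """'program' menu: enroll a new memory or overwrite an existing one."""
        if name_tag not in self.entries and len(self.entries) >= self.capacity:
            raise ValueError("phonebook is full")
        self.entries[name_tag] = number

    def call(self, name_tag: str) -> Optional[str]:
        """'call' menu: confirm with the stored name tag, then dial."""
        number = self.entries.get(name_tag)
        if number is None:
            return None  # unknown name: nothing is dialed
        return f"calling {name_tag}: dialing {number}"

    def list_entries(self) -> List[str]:
        """'list' menu: the entries that would be played back."""
        return sorted(self.entries)

dialer = VoiceDialer(capacity=20)  # VCS 2030
dialer.program("Pete", "555-0142")
print(dialer.call("Pete"))  # calling Pete: dialing 555-0142
```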

Voice-Trek 2.0

* Platform: Unknown.
* Description: VoiceTrek is primarily used by the United States
Postal Service to sort mail. Tardis Technology Inc. was created to
develop and market applications that utilize speech recognition.
They do consulting work as well as turnkey systems.
* Contact: Tardis Technology Inc., Voice Recognition Div.
6444 E. Spring St., #286, Long Beach, CA 90815-1500, USA
Phone: +1-310-497-0077, Fax: +1-310-497-0080

VoiceAssist for Windows from Creative Labs, Inc.

* Platform: Windows
* Description: Seeking a description.
* Availability: VoiceAssist preview software is available from the
Creative Labs VoiceAssist home page.
* Contact: Creative Labs, Inc.
Ph: 1-800-998-1000 (Sales)
Ph: 1-800-998-5227 (Product info and dealer referrals)
CompuServe: support forum: GO BLASTER
WWW: http://www.creaf.com/

VoiceServer for Windows

* Platform: Windows
* Description: Speaker dependent, with an independent directory for
each user. Isolated words. Up to 1000 words/user, 300
words/window. 1 word occupies 2Kb on hard disk. Can be used to
control Windows applications by issuing voice commands instead of
menu selection.
* Rough Cost: 292 Pounds(UK)
* Requirements: None
* Misc: Price includes a half-sized AT voice card (including a DSP),
software, documentation & a microphone (attachable to keyboard or
speaker). A light-weight high-spec headset is an optional extra.
* Contact: Mark Redwood
Applied Voice Technologies
26 Danbury Street, Islington,
London, UK, N1 8JU
Ph: +44 71 454 1224, Fax: +44 71 454 1225
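
The VoiceServer figures allow a quick disk-space estimate. Reading "2Kb" as 2 kilobytes (an assumption; the entry does not say bits or bytes), one user with a full 1000-word vocabulary needs 1000 x 2 KB = 2000 KB, roughly 2 MB. A small helper, with the hypothetical name disk_kb, makes the arithmetic explicit:

```python
# Figures from the VoiceServer entry; "2Kb" is read here as 2 kilobytes.
KB_PER_WORD = 2
MAX_WORDS_PER_USER = 1000

def disk_kb(users: int, words_per_user: int = MAX_WORDS_PER_USER) -> int:
    """Disk space (in KB) for the trained vocabularies of `users` users."""
    if users < 0:
        raise ValueError("users must be non-negative")
    if not 0 <= words_per_user <= MAX_WORDS_PER_USER:
        raise ValueError("VoiceServer allows at most 1000 words per user")
    return users * words_per_user * KB_PER_WORD

print(disk_kb(1))        # 2000 (KB), about 2 MB for one full vocabulary
print(disk_kb(10, 300))  # 6000 (KB)
```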

Voicetek Corp.

* Platform: Unknown.
* Description: Voicetek Corporation provides voice processing
solutions, training and consulting services and an
object-oriented, graphical Generations Platform for development of
integrated computer telephony systems.
* Contact: Voicetek Corporation
19 Alpha Road, Chelmsford, MA 01824, USA
Ph: +1-508-250-9393, Fax: +1-508-250-9378
WWW: http://www.voicetek.com/

Votan VPC2100 Voice Card and VSP 1010 Speech Processor

* Platform: DOS
* VPC2100 Voice Card: a hardware and software system based on the
TMS320C10, providing continuous speech recognition. The VPC2100
consists of a circuit board, microphone, speaker, software, and
documentation. It is designed to add voice I/O and telephone
management capabilities to the PC/AT and compatibles. Features:
+ Voice store-and-forward at 4- to 16.4-Kb/s speed
+ Speaker-independent speech recognition (0-9, YES, NO)
+ Continuous speaker-dependent speech recognition
+ Telephone interface, pulse or tone dialing, call progress,
and DTMF
+ Software for development, voice mail, telephone management,
and VoiceKey
+ High-level applications-generator software
* Votan VSP 1010 speech-processor board: can service a single voice
channel, providing recognition, voice output, and telephone
interfacing. Digital signal processing is performed by a TMS320
integrated circuit.
* Costs: Unknown
* WWW: http://www.ti.com/sc/docs/dsps/develop/3rdparty/vot.htm
* Contact: Votan Division, MOSCOM Corporation
6920 Koll Center Parkway, Suite 214, Pleasanton, CA 94566, USA
Ph: +1-510-426-5600, Fax: +1-510-426-6767

Voice Processing Corporation Speech Recognition Product Line

* Platform: Unknown.
* Description: Voice Processing Corporation (VPC) supplies automated
speech recognition systems. VPC's products are used in the
telecommunications, cellular and personal computer markets to
enable computers to understand human speech. The company's VPro
product line is sold to original equipment manufacturers (OEMs),
value added resellers (VARs), system integrators and application
developers. VPC's speech recognition systems are currently used in
applications such as voice mail, voice activated dialing,
interactive voice response, and command and control of personal
computers.
The following are descriptions of the Voice Processing
Corporation's VPro Product Line: VProContinuous, VPro/XD, VPro/RT,
VProCel, VProSpeller, VProPRL, VPro hardware platforms, and the
application Osprey.
More information is available on these products at the VPC WWW
site: http://www.vpro.com/
* VProContinuous(TM) is a speaker-independent, continuous digit
recognizer. It recognizes digit strings spoken in a continuous
manner, by any caller, without unnatural beeps or pauses.
VProContinuous uses out-of-vocabulary rejection and word spotting
technologies to reject extraneous words and phrases often spoken
by callers. The VProContinuous vocabulary consists of the words
"zero" through "nine," "yes," "no," and "oh." The product is
language-independent. American English, Australian English,
Brazilian Portuguese, Canadian French, Castilian Spanish, French,
German, Italian, Mexican Spanish, Portuguese, Swiss German and
U.K. English versions are available.
* VPro/XD(TM) is a discrete or multiword speech recognizer for
extra-demanding applications and/or vocabularies. This robust
discrete product recognizes isolated discrete utterances (words or
very short phrases). VPro/XD utilizes proprietary
out-of-vocabulary rejection and word-spotting technologies.
VPro/XD is speaker-independent and includes Talkover capability
allowing speech-interrupt over prompts. Pre-trained vocabulary
libraries are available in American English, Australian English,
Brazilian Portuguese, Canadian French, Castilian Spanish, Central
American Spanish, German, Italian, Mandarin Chinese, Mexican
Spanish, Portuguese, Swiss German and UK English. Pre-trained
vocabularies consisting of voice mail words, voice dialing words,
call control words, banking, and emergency words are available in
American English (both cellular and land-line).
* VPro/RT(TM) is a discrete speech recognizer for rapid training of
vocabularies in the field. This robust discrete product recognizes
isolated discrete utterances. Application designers and end-users
define the vocabulary of their choice and train the system in
real-time either prior to system start-up, or adapting on-the-fly
while the system is running live. Vocabularies can be subset, and
applications involving thousands of words can be developed
quickly. VPro/RT, which also supports Talkover, is suited to
speaker-dependent recognition tasks, such as the personal
directory of names in a voice-activated dialing application.
VPro/RT is also good for applications that require
speaker-independent vocabularies to be developed quickly in the
field or those that require many vocabularies. VPro/RT can also be
used as a tool for quick prototyping of applications.
* VProCel consists of speaker-independent VProContinuous, VPro/XD
and speaker-dependent VPro/RT specifically tuned for the cellular
environment. The speaker-dependent discrete feature of VProCel
allows for a user-defined 20-word personal directory, with a
one-pass enrollment whereby users need only speak their chosen
commands once. In addition, cellular-ready VPro/XD vocabularies
consisting of voice-activated dialing command words are also
available. VProCel is suited to voice-activated dialing
applications using either digit strings or a listing of words in a
personal directory.
* VProSpeller is a recognizer that can determine which name or word
is being spelled by a caller. Users may spell a string of letters
(up to 32 letters) in an uninterrupted manner (without prompts or
beeps between each letter). VProSpeller can recognize confusable
letters by conducting an automated search of a database of words
maintained by the application for the best candidates to match.
* VProPRL: Designed for customers who wish to enable VPC speech
recognition technologies on platforms other than those supported
by VPro hardware, the VProPRL is a portable recognizer library of
VProContinuous, VPro/XD and VPro/RT, which can be embedded into a
wide variety of hardware platforms. It consists of a library of
object modules which can be linked with a user application or
task.
* VPro Hardware Platforms: VPro-42, VPro-84, VPro-88 : The VPro
platforms are ISA compliant PC/AT boards. Each supports four to
eight Virtual Speech Processors (VSPs). Each VSP, depending on
load factors, can handle multiple telephone lines. Application and
host computers communicate with each of the VSPs as separate
autonomous units. VPro platforms use Texas Instruments TMS320C31
microprocessors which provide up to 133 MFLOPS of compute power.
The platforms can have up to 8 megabytes of memory shared among
all processors. In addition, each processor has 512K bytes of
local memory. Both the PEB and MVIP PCM audio buses are supported
by all VPro platforms.
* Osprey is a call management software application that performs the
kinds of telephone related activities typically done by a personal
assistant, such as answering the phone, screening callers, routing
calls, and taking and delivering messages. It is an automated
phone attendant.
* Price and availability: Contact Voice Processing Corporation
* Contact: Kelli V. Smith
Voice Processing Corporation
1 Main Street, Cambridge, MA, 02142 USA
Ph: (617)494-0100 Fax: (617)494-4970
e-mail: KSmith@vpro.com
WWW: http://www.vpro.com/
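
VProSpeller resolves confusable letters by searching the application's database for the best-matching entries. One simple way to do this, shown here as an assumed technique rather than VPC's actual method, is a weighted edit distance in which substitutions between easily confused letter names (such as B, D, P, T and V) cost less than other substitutions. The confusion groups, costs and function names below are illustrative guesses; only the 32-letter limit comes from the entry.

```python
from typing import Iterable, List, Tuple

MAX_LETTERS = 32  # VProSpeller accepts up to 32 spelled letters

# Assumed groups of acoustically confusable letter names.
CONFUSABLE = [set("BCDEGPTVZ"), set("MN"), set("FS"), set("AJK"), set("IY"), set("QU")]

def sub_cost(a: str, b: str) -> float:
    """Cost of the recognizer hearing letter a when b was spelled."""
    if a == b:
        return 0.0
    if any(a in group and b in group for group in CONFUSABLE):
        return 0.3  # cheap: these letters are easily confused
    return 1.0

def spelled_distance(heard: str, word: str) -> float:
    """Weighted edit distance between the heard letters and a candidate."""
    prev = [float(j) for j in range(len(word) + 1)]
    for i, h in enumerate(heard, 1):
        cur = [float(i)]
        for j, w in enumerate(word, 1):
            cur.append(min(prev[j] + 1.0,       # extra letter heard
                           cur[j - 1] + 1.0,    # spelled letter missed
                           prev[j - 1] + sub_cost(h, w)))
        prev = cur
    return prev[-1]

def best_candidates(heard: str, database: Iterable[str], n: int = 3) -> List[Tuple[float, str]]:
    """Rank database entries by how well they explain the heard letters."""
    if len(heard) > MAX_LETTERS:
        raise ValueError("too many letters")
    scored = [(spelled_distance(heard.upper(), entry.upper()), entry) for entry in database]
    return sorted(scored)[:n]

# "M" misheard as "N": SMITH ranks first, then SMYTH.
print(best_candidates("SNITH", ["JONES", "SMYTH", "SMITH"], n=2))
```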

Whisper

See the new page for Microsoft speech recognition software.

* Platform: Windows 95 and Windows NT 4.0
* Description: Command and control recognition.

WildCard Speech Products

* Platform: Windows 3.1 and Windows 95
* OfficeTalk for Windows: provides voice commands for dictation,
navigation, command and control, and formatting for business uses
of computers. Provides user voice access to a wide variety of
software applications in office suites from Microsoft,
Novell/WordPerfect, and Lotus. More information on the WildCard
OfficeTalk page.
* LawTalk for Windows: adds features and interfaces that meet the
specific needs of legal users. More information on the WildCard
LawTalk page.
* VoiceCompanion for the Internet: Surf the net using voice
commands. Controls browsers like Netscape and Microsoft Internet
Explorer. More information on the VoiceCompanion web page.
* VoiceCompanion - RemoteAccess: Over-the-telephone remote access to
your desktop PC, for voicemail, FAX forwarding and address book
information. More information on the VoiceCompanion web page.
* Availability: WildCard Technologies Inc.
180 West Beaver Creek Road, Richmond Hill, Ontario, Canada L4B 1B4
Phone: (905) 731-6444, Fax: (905) 731-7017
Email: sales@wildcardtech.com
WWW: http://www.wildcardtech.com/

___________________________________________________________________________

Q6.6: Speaker Recognition (Verification and Identification)

* Introduction
* In the FAQ
* On the WWW

Introduction

Speaker recognition is the process of automatically recognizing who is
speaking on the basis of individual information included in speech
signals. It can be divided into Speaker Identification and Speaker
Verification. Speaker identification determines which registered
speaker provides a given utterance from amongst a set of known
speakers. Speaker verification accepts or rejects the identity claim
of a speaker - is the speaker the person they say they are?

Speaker recognition technology makes it possible to use the speaker's
voice to control access to restricted services, for example, phone
access to banking, database services, shopping or voice mail, and
access to secure equipment.

Both technologies require users to "enroll" in the system, that is, to
give examples of their speech to a system so that it can characterise
(or learn) their voice patterns.
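As a rough illustration of the difference between the two tasks, the Python sketch below models each enrolled speaker with a single diagonal-covariance Gaussian over feature vectors (for example, cepstral frames). Identification picks the best-scoring enrolled speaker; verification compares the claimed speaker's score against a threshold. The class and function names, the single-Gaussian model and the fixed threshold are simplifying assumptions for illustration only; practical systems typically use richer models (such as Gaussian mixtures) and normalize scores against other speakers before thresholding.

```python
import math
from typing import Dict, List, Sequence

Frames = List[Sequence[float]]


class DiagGaussian:
    """One diagonal-covariance Gaussian fitted to a speaker's enrollment frames."""

    def __init__(self, frames: Frames, var_floor: float = 1e-3) -> None:
        n, dim = len(frames), len(frames[0])
        self.mean = [sum(f[d] for f in frames) / n for d in range(dim)]
        # Floor the variances so a dimension with tiny spread cannot dominate the score.
        self.var = [
            max(var_floor, sum((f[d] - self.mean[d]) ** 2 for f in frames) / n)
            for d in range(dim)
        ]

    def avg_log_likelihood(self, frames: Frames) -> float:
        """Average per-frame log-likelihood of the frames under this Gaussian."""
        total = 0.0
        for f in frames:
            for x, m, v in zip(f, self.mean, self.var):
                total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        return total / len(frames)


def enroll(data: Dict[str, Frames]) -> Dict[str, DiagGaussian]:
    """Enrollment: fit one model per registered speaker."""
    return {name: DiagGaussian(frames) for name, frames in data.items()}


def identify(models: Dict[str, DiagGaussian], frames: Frames) -> str:
    """Identification: which enrolled speaker best explains the utterance?"""
    return max(models, key=lambda name: models[name].avg_log_likelihood(frames))


def verify(models: Dict[str, DiagGaussian], claimed: str,
           frames: Frames, threshold: float) -> bool:
    """Verification: accept the identity claim if its score clears the threshold."""
    return models[claimed].avg_log_likelihood(frames) >= threshold
```

Because a raw log-likelihood also varies with the channel and the utterance, a fixed threshold like this is fragile; that is one reason verification systems often compare the claimed speaker's score with scores from other speakers.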

In the FAQ:

* ImagineNation: Voice Activated UnLock Technology
* Jialong He's Speaker Recognition (Identification) Tool
* Keyware Biometric Security Products
* SpeakerKey Voice Verifier from ITT
* SpeakEZ Voice Print Speaker Verification
* Voice Control Systems: Speaker Verification Technology

On the WWW

Survey of the State of the Art in Human Language Technology
Report edited by Ronald A. Cole et al. with a section on
Speaker Recognition.
http://www.cse.ogi.edu/CSLU/HLTsurvey/ch1node47.html

Speaker Identification And Verification: LIMSI Report
A technical description.
http://www.limsi.fr/Recherche/TLP/reco/2pg95-sv/2pg95-sv.html

Long Index of References on Automatic Speaker Verification
A list of more than 350 papers on speaker verification in text
or BibTeX format. Provided by G. Matas.
http://sig.enst.fr/~chollet/ForMehdi/SpRecV1.l_ind.html

CAVE: Caller Verification in Banking and Telecommunications
European consortium developing speaker recognition
technologies.
http://www.ptt-telecom.nl/cave/

Hangai Lab demonstrations of speaker verification and speaker
identification.
Do-it-yourself demonstrations:
http://miya8f05.ee.kagu.sut.ac.jp/study/speech/speech1.html
http://miya8f05.ee.kagu.sut.ac.jp/study/speech/speech2.html

Voice Activated UnLock Technology (VAULT): ImagineNation

* Description: Password-based voice verification technology using a
card to store voice-print data. Introductory information and the
VAULT FAQ are provided on the ImagineNation WWW pages.
* Contact: Imagine
PO Box 212, Swansea, MA 02777, USA
Ph: +1-508-678-9563
Fax: 508-678-1470
Email: feedback@ImagineNation.com
WWW: http://www.ImagineNation.com/

Jialong He's Speaker Recognition (Identification) Tool

* Platform: SUN SPARC (SunOS), PC (MSDOS)
* Description: This package contains a set of speaker recognition
research utilities, including Gaussian mixture models, a VQ codebook
design program and an MLP network. They can also be used as
general classifiers. The utilities are divided into the following
categories:
+ Feature extraction and dimensionality reduction
cepstrum -- extract features from speech signals (LPCC, MFCC,
etc.).
search -- select effective features (SFS, SBS method).
randline -- randomize a sequence, auxiliary utility.
bin2asc -- binary to ASCII, auxiliary utility.
bin2asc -- binary to ASCII, auxiliary utility.
+ MLP network
mlptrain -- MLP network training program.
mlptest -- MLP network test program.
+ VQ codebook training and test programs
lbglvq -- VQ codebook training program.
nearest -- VQ codebook test program.
+ Gaussian mixture model (GMM)
gmmtrain -- GMM training program.
gmmtest -- GMM test program.
Note: this is a research tool, not a complete speaker recognition
system.
* Availability: By anonymous ftp:

MSDOS Version
UK:
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spkrtool.zip
Germany:
ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/spkrtool.zip

Sun SPARC version, compiled with GNU C
UK:
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spkr_sun_v1.tar.gz
Germany:
ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/speaker_sun_v1.tar.gz

* See also: Jialong He's Speech Recognition Research Tool
* Contact: Jialong He
email: jialong@neuro.informatik.uni-ulm.de
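The VQ approach used by research tools like this can be sketched briefly: each speaker's enrollment frames are compressed into a small codebook, and a test utterance is assigned to the speaker whose codebook quantizes it with the lowest average distortion. The Python below is a minimal, hypothetical sketch of LBG-style codebook training (binary splitting followed by k-means refinement) and distortion-based identification. It is not the code in Jialong He's package; the function names, the additive split perturbation `eps` and the iteration count are assumptions.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def lbg_codebook(frames: List[Vector], size: int,
                 eps: float = 0.01, iters: int = 10) -> List[Vector]:
    """LBG training: start from the global centroid, split every codeword in two,
    then refine with k-means iterations. Assumes size is a power of two."""
    codebook = [_centroid(frames)]
    while len(codebook) < size:
        # Split each codeword into two slightly perturbed copies.
        codebook = [[c + s for c in cw] for cw in codebook for s in (eps, -eps)]
        for _ in range(iters):
            cells: List[List[Vector]] = [[] for _ in codebook]
            for f in frames:
                nearest = min(range(len(codebook)), key=lambda i: _dist2(codebook[i], f))
                cells[nearest].append(f)
            # Move each codeword to its cell centroid; keep it if the cell is empty.
            codebook = [_centroid(cell) if cell else cw for cw, cell in zip(codebook, cells)]
    return codebook


def avg_distortion(codebook: List[Vector], frames: List[Vector]) -> float:
    """Mean squared quantization error of the frames against a codebook."""
    return sum(min(_dist2(cw, f) for cw in codebook) for f in frames) / len(frames)


def identify(codebooks: Dict[str, List[Vector]], frames: List[Vector]) -> str:
    """Pick the speaker whose codebook quantizes the test frames with least distortion."""
    return min(codebooks, key=lambda name: avg_distortion(codebooks[name], frames))
```

Binary splitting grows the codebook gradually, so each k-means stage starts from a reasonable initialization instead of from random codewords.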

Keyware Biometric Security Products

* Description: VoiceGuardian and S2 Security Server provide
authentication and access control technologies. An online demo of
VoiceGuardian is available.
* Contact: Keyware Technologies
_USA_
Keyware Technologies
500 West Cummings Park, Suite 3600, Woburn, MA 01801, USA
Ph: (617) 933 1311, Fax: (617) 933 1554
_Belgium_
Keyware Technologies
Excelsiorlaan 28-30, 1930 Zaventem, Belgium
Ph: 32 2 721 4574, Fax: 32 2 721 5015
_Email:_ sales@keywareusa.com
_WWW:_ http://www.keywareusa.com/

SpeakerKey Voice Verifier from ITT

* Platform: Windows/Pentium and Solaris/SPARC
* Description: SpeakerKey provides over-the-phone voice
verification. It is configurable for use in a wide range of
applications.
SpeakerKey provides a Speaker Verification API (SVAPI).
SpeakerKey uses two technologies: (1) speaker-independent digit
recognition using hidden Markov models, (2) speaker verification
using "Nearest Neighbour Matching with Likelihood Ratio Scoring
and cohort speakers."
Dr. Joe Campbell maintains a SpeakerKey FAQ on the WWW. It
provides a more detailed description of SpeakerKey and discusses
several speaker verification issues:
http://www.vitro.bloomington.in.us:8080/~BC/REPORTS/SpeakerKeyFAQ.html
* Requirements: Minimum 60 MHz Pentium (with sound card) or
SPARCstation 5, plus phone line interface devices.
* Price: Evaluation kits available from $75. Developer's kits are
$1500. Run-time licenses are priced from $600 to $10,000 depending
upon the number of users and/or verifications per hour. Application
customization is available.
* Contact: ITT Industries
Fort Wayne, IN, USA
Ph: +1-219-487-6321, Fax: +1-219-487-6126
Email: speakerkey@itt.com
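The exact SpeakerKey algorithm is not described here, but the general idea of cohort-normalized scoring can be sketched: a test utterance is scored against the claimed speaker and against a set of cohort (similar-sounding) speakers, and the claim is accepted only if the claimed speaker matches noticeably better. In this hypothetical Python sketch, the average nearest-neighbour frame distance stands in for a negative log-likelihood, so the difference between the best cohort distance and the claimed-speaker distance plays the role of a log likelihood ratio. The Euclidean distance, the closest-cohort rule and the zero default threshold are all assumptions, not ITT's design.

```python
import math
from typing import List, Sequence

Frames = List[Sequence[float]]


def nn_distance(test: Frames, reference: Frames) -> float:
    """Average, over test frames, of the Euclidean distance to the nearest reference frame."""
    return sum(min(math.dist(t, r) for r in reference) for t in test) / len(test)


def cohort_score(test: Frames, claimed: Frames, cohorts: List[Frames]) -> float:
    """Distance to the closest cohort speaker minus distance to the claimed speaker.
    Positive values mean the claimed speaker matches better than any cohort speaker."""
    claimed_dist = nn_distance(test, claimed)
    cohort_dist = min(nn_distance(test, ref) for ref in cohorts)
    return cohort_dist - claimed_dist


def verify(test: Frames, claimed: Frames, cohorts: List[Frames],
           threshold: float = 0.0) -> bool:
    """Accept the identity claim when the normalized score exceeds the threshold."""
    return cohort_score(test, claimed, cohorts) > threshold
```

Normalizing against cohort speakers means that an utterance which is simply "easy" to match (clean channel, typical voice) does not automatically pass; other variants average over the whole cohort instead of taking the closest one.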

SpeakEZ Voice Print Speaker Verification

* Description: Designed to prevent cell phone theft and cloning
fraud by comparing the cellular caller's statement of a
pass-phrase to a stored digital "voice print" of the authorized
subscriber. If the caller's voice patterns do not match the stored
voice print, service will be denied or the caller will be referred
to operator assistance for further validation processing. Features
include:
+ Customer-selected password.
+ Vocabulary and language independent.
+ No special hardware required by customer.
+ Multiple delivery options.
* Contact: T-NETIX, Inc.
6675 South Kenton Street, Englewood, CO 80111 USA
Phone: (800) 352-8628, (303) 790-9111, Fax: (303) 790-9540
WWW: http://www.t-netix.com/

Voice Control Systems: Speaker Verification Technology

* Description: SpeechPrint ID technology provides
language-independent speaker verification. Features:
+ Multiple speech input formats
+ Operates over various microphones or the telephone network
+ Can be used in conjunction with discrete and continuous
recognition
+ Robust against background noise and spurious telephone
channel noise
For more information on features, hardware and software
requirements, pricing and availability, contact Voice Control
Systems, Inc. or visit the VCS WWW site or the SpeechPrint
ID WWW page.
* See also: VCS speech recognition products in Q6.5.
* Contact: Voice Control Systems, Inc.
14140 Midway Rd., Dallas, Tx. 75244, USA
Ph: +1-214-386-0300, Fax: +1-214-386-5555
Email: sales@vcsi.com
WWW: http://www.voicecontrol.com/

___________________________________________________________________________

Q6.7: Integrated Speech Products

This section lists those products which integrate different speech
technologies into a single user package. For example, speech
recognition and speech synthesis can be combined to provide a dialog
management system. Strictly speaking, this doesn't really belong
in Section 6 (Speech Recognition) but since these products all include
speech recognition, it seems a reasonable place to put it for now!

In the FAQ...

* SpeechWorks from Applied Language Technologies, Inc.
* Nortel Speech Technology Products

SpeechWorks from Applied Language Technologies, Inc.

* Description: SpeechWorks and companion products provide advanced
speech recognition technology for the telephony market.
SpeechWorks can be used by developers to "speech-enable" call
center, messaging, enhanced services, and other types of
applications. The three major system modules - SpeechWorks,
DialogModules and SpeechBuilder - are described below. More
detailed information is available from the Applied Language
Technologies home page.
ALTech develops and markets speech understanding software which
provides large vocabulary, speaker-independent, phonetic speech
recognition. ALTech's software contains a comprehensive set of
features for speech-enabling telephone-based transactions and
services. SpeechWorks is based on technology licensed from the
Spoken Language Systems Group at the Massachusetts Institute of
Technology.
* SpeechWorks: provides the core speech recognition capabilities.
Features include:
+ Phonetic segment-based, speaker-independent, large
vocabulary, continuous speech recognition
+ Real-time vocabulary generation directly from text
+ Database integration
+ "Barge-in" capability
+ Adaptive channel normalization
+ "n-best" output and associated confidence scores
+ Support for multiple languages
+ Software-only or DSP-based implementations
+ Support for multiple platforms and operating systems (e.g.,
SCO UNIX, WindowsNT, etc.)
* DialogModules: manage the "conversation" between the system and
the caller within an application. They provide high-level
application building blocks which enable developers to quickly and
easily add speech interfaces to computer telephony applications.
Each DialogModule accomplishes a particular task within an
application, ranging from "simple" tasks such as capturing a
yes/no response or a phone number, to more complex tasks such as
capturing credit card information or name and address information.

DialogModules provide "out-of-the-box" functionality. They contain
pre-built grammars, user-interface design, internal call flow and
error recovery routines, parameters for customization and a set of
C++ class libraries and C APIs.
* SpeechBuilder: provides tools for customizing the DialogModules
and for developing and maintaining applications. A GUI-based
Vocabulary Editor provides the ability to generate and maintain
vocabulary or word lists. Pronunciations can be looked up in the
built-in dictionary or generated automatically using a set of
text-to-phoneme rules.
* Product Bundles: are available which combine SpeechWorks and
multiple DialogModules into application templates for a set of
generic application categories.
+ SpeechForms: provides an interactive method for
entering data over the phone, such as ordering products,
filling out surveys and completing registration forms.
Typical applications include: order entry, reservations,
catalog and literature requests, catalog shopping,
subscriptions, change of service, claims, credit card
activation, home banking, stock transactions, and warranty
reservations.
+ SpeechQuery: used to deliver information in
response to voice requests over the phone, such as airline
information, product delivery status and retirement benefit
information. Typical applications include: order status,
product information, account balance, flight status, movie
listings, job listings, stock quotes, guide
services, classified ads, claims status, dealer locator
services, and technical support.
+ SpeechAgent: provides a set of modules for
automating telephone-based voice messaging applications, such
as integrated messaging, single-number services and
voice-dialing. Typical applications include: voice messaging,
voice dialing, auto attendant, address book access, email
access, and scheduling.
* Platform: ALTech's software can
be deployed on industry-standard hardware platforms and operating
systems including: Sun SPARC-based systems running SunOS or
Solaris, IBM RS/6000s running AIX, HP systems running HP-UX, and
486/Pentium-based PCs and servers running Windows, WindowsNT, SCO
UNIX, or Solaris. ALTech's systems are designed to run all or some
of the software on a digital signal processor.
* Availability: contact ALTech for licensing information.
* Contact: Applied Language Technologies, Inc.
215 First Street, Cambridge, MA 02142
Ph: 617-225-0012, Fax: 617-225-0322
Email: Alisa Moyer, moyer@altech.com
WWW: http://www.altech.com/
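To illustrate the kind of logic a yes/no DialogModule packages up, the sketch below combines n-best output, confidence scores and a simple error-recovery loop. It is a hypothetical Python illustration, not the ALTech API: the `recognize` callback, the accepted word lists, the 0.6 confidence threshold and the three-attempt limit are all assumptions.

```python
from typing import Callable, List, Optional, Tuple

# Each hypothesis is (text, confidence in [0, 1]); the list is ordered best first.
NBest = List[Tuple[str, float]]

YES_WORDS = {"yes", "yeah", "yep", "correct", "right"}
NO_WORDS = {"no", "nope", "wrong", "incorrect"}


def capture_yes_no(
    recognize: Callable[[str], NBest],
    min_conf: float = 0.6,
    max_tries: int = 3,
) -> Optional[bool]:
    """Prompt until a confident yes/no answer is heard, or give up after max_tries."""
    prompt = "Please say yes or no."
    for _ in range(max_tries):
        for text, conf in recognize(prompt):
            if conf < min_conf:
                break  # list is sorted, so the remaining hypotheses are less confident
            word = text.strip().lower()
            if word in YES_WORDS:
                return True
            if word in NO_WORDS:
                return False
        # Error recovery: nothing usable this turn, so re-prompt more explicitly.
        prompt = "Sorry, I didn't catch that. Please answer yes or no."
    return None  # caller decides what to do, e.g. transfer to an operator
```

Returning `None` rather than guessing leaves the fallback policy (such as transferring the caller to an operator) to the surrounding application.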

Nortel Speech Technology Products

* Nortel's AudioGram Delivery Service (ADS):
When a busy or no-answer condition is encountered, an intercept
message offers ADS, which provides a service to the calling party
by taking a message automatically. ADS records the caller's
message and attempts delivery repeatedly if needed until the
message is delivered. ADS comprises four independent
services: 0+, 1+ and Local, Intentional, and Millennium AudioGram.
ADS services utilize Nortel's Flexible Vocabulary Recognition (FVR)
voice-processing capabilities. ADS features include:
+ Cost-saving common service platform (NAV)
+ Builds upon existing network investment in toll
infrastructure capabilities of AABS (Automated Alternate
Billing Service)
+ Leverages the capabilities of existing TOPS (Traffic Operator
Position System) attendants.
More information is available on the Nortel Multimedia Network
Applications WWW page for AudioGram Delivery Service.
* Nortel's Voice-Activated Auto Attendant (VAAA):
Replaces touch tone menu with easy-to-use voice interface. Geared
to businesses and corporations to provide more effective
management of incoming customer calls. Residing on the Network
Applications Vehicle (NAV) platform, VAAA uses Flexible Vocabulary
Recognition (speaker-independent) technology to recognize spoken
words, and directs calls accordingly. Other features include:
+ Cost-saving common service platform (NAV)
+ Serves DTMF and rotary dial callers.
+ Handles incoming calls for all corporate users (Centrex, PBX,
or key systems)
More information is available on the Nortel Multimedia Network
Applications WWW page for Voice-Activated Auto Attendant.
* Nortel's Voice-Activated Dialing (VAD):
Phoneme-based speech dialing capabilities provided through
speaker-trained and speaker-independent technologies. Residing on
the Network Applications Vehicle (NAV) platform, VAD enables
subscribers to dial using speech, as well as to create and
customize personal telephone directories. Other features include:
+ Cost-saving common service platform (NAV)
+ Speech playback and Text-to-speech synthesis
+ Dual Language capability (optional)
+ Speech Recording
+ Canadian French speechware (optional, prompts and FVR)
+ Spanish speechware (optional, prompts and FVR)
+ 75-name VAD directory size
+ Word-spotting
+ DTMF tone detection
+ Directory sharing
+ Scalable service deployment
+ Talk-through
More information is available on the Nortel Multimedia Network
Applications WWW page for Voice-Activated Dialing.
* Nortel's Voice-Activated Premier Dialing (VAPD):
Enables businesses to take advantage of the public network
directories to stimulate customer calls. Residing on the Network
Applications Vehicle (NAV) platform, VAPD uses Flexible Vocabulary
Recognition (speaker-independent) technology to recognize business
names, and routes calls to the appropriate business entity. VAPD
promotes cost savings by utilizing a common service platform, the
Network Applications Vehicle (NAV). It services DTMF callers as
well as rotary dialers, and handles incoming calls for all
corporate users: Centrex, PBX, and key systems. More information
is available on the Nortel Multimedia Network Applications WWW
page for Voice-Activated Premier Dialing.
* Platform: This speech-based service operates on the Network
Applications Vehicle (NAV) platform. NAV is a multi-application,
digital signal processing platform supporting both speech- and
display-based applications. The NAV platform, which provides the speech
recognition capabilities and application logic used by NAV
features, has an open, modular hardware architecture and flexible
software design. Other features include:
+ Scalable hardware - from 24 to over 2000 ports per NAV node;
1 to 24 independent application shelves per node
+ Powerful speech processing - speaker-independent and
speaker-trained speech processing support
+ Reliability - N+1, N+M, and 2N redundancy
+ Central Management - access via graphical user interface to
remote connections
* See Also: Nortel Feature Planning Guide, reference number
50004.11; NAV Applications and Planning Guide, reference number
50118.16.
Nortel's Multimedia web pages:
http://www.nortel.com/entprods/multimedia/
* Contact: NORTEL
Multimedia Communications Systems Division
Multimedia Network Applications
1000 Park Forty Plaza
Durham, NC 27713 USA
Ph: 1-800-4NORTEL
WWW: http://www.nortel.com/entprods/multimedia/

___________________________________________________________________________

Copyright (c) 1993-6 by Andrew Hunt, all rights reserved.

This FAQ may be posted to any USENET newsgroup, on-line service, or BBS as
long as it is posted in its entirety and includes this copyright statement.
This FAQ may not be distributed for financial gain.
This FAQ may not be included in any collections or compilations
without express permission from the author.

---

Andrew Hunt
Speech Applications Group
Sun Microsystems Laboratories Ph: (508) 442-2681
2 Elizabeth Drive, MS UCHL03-207 Fax: (508) 250-5067
Chelmsford, MA 01824, USA Email: andrew.hunt@east.sun.com

SpeechLinks

SpeechLinks
Speech Technology Hyperlinks Page
comp.speech FAQ

Following is the list of all the hyperlinks from the comp.speech FAQ. This is probably the biggest list of speech technology links
available. The links are provided to WWW references, ftp sites, and newsgroups. Cross-references to the comp.speech WWW pages
are also provided.

Numbers of links:

Total Links: 639
439 links to WWW sites
178 links to ftp sites
22 links to newsgroups

SpeechLinks Pages

SpeechLinks: General Speech Technology
SpeechLinks: Signal Processing for Speech
SpeechLinks: Speech Coding
SpeechLinks: Speech Synthesis
SpeechLinks: Speech Recognition

comp.speech WWW Availability

Australia: Sydney University
Britain: Cambridge University
Japan: ATR Interpreting Telecommunications Research Laboratories
USA: Carnegie Mellon University

WWW Links
1stVoice Dragon Systems reseller
URL: http://www.1stvoice.com/
comp.speech refs: [1]
21st Century Eloquence: speech recognition reseller
URL: http://www.voicerecognition.com/
comp.speech refs: [1]
32 kbps ADPCM
URL: http://www.cwi.nl/ftp/audio/adpcm.shar
comp.speech refs: [1]
Academic Press Limited: Computer Speech and Language Journal
URL: http://www.apnet.com/
comp.speech refs: [1]

http://mi.eng.cam.ac.uk/comp.speech/SpeechLinks.html (1 of 36) [10/31/2003 8:41:40 AM]


ACELP Codecs from Sipro Lab Telecom Inc.
URL: http://www.sipro.com/acelp.html
comp.speech refs: [1]
Acuvoice, Inc. speech synthesizer
URL: http://www.acuvoice.com/
comp.speech refs: [1]
Advanced Recognition Technologies, Inc: smARTspeak
URL: http://www.artcomp.com/speak.htm
comp.speech refs: [1]
AF audio networking software
URL: http://www.research.digital.com/CRL/projects/AF/home.html
comp.speech refs: [1]
American National Standards Institute (ANSI)
URL: http://www.ansi.org/
comp.speech refs: [1]
American Voice Input/Output Society (AVIOS) home page
URL: http://www.avios.com/
comp.speech refs: [1]
An Introduction to Text-to-Speech Synthesis: Thierry Dutoit
URL: http://kapis.www.wkap.nl/kapis/CGI-BIN/WORLD/book.htm?0-7923-4498-7
comp.speech refs: [1]
Andrew Simpson's home page
URL: http://www.phon.ucl.ac.uk/home/andrew/home.html
comp.speech refs: [1]
Applied Language Technologies, Inc.: SpeechWorks
URL: http://www.altech.com/
comp.speech refs: [1]
ART: Advanced Recognition Technologies, Inc
URL: http://www.artcomp.com/
comp.speech refs: [1]
Articulate Systems PowerSecretary speech recognition
URL: http://www.artsys.com/
comp.speech refs: [1]
Association for Computational Linguistics (ACL) home page
URL: http://www.cs.columbia.edu:80/~acl/
comp.speech refs: [1]
ASSTA: Australian Speech Science and Technology home page
URL: http://cslab.anu.edu.au/~bruce/assta/
comp.speech refs: [1]
ASSTA: List of Members
URL: http://ciips.ee.uwa.edu.au/~roberto/assta-users/
comp.speech refs: [1]
AsTeR text-to-speech processing
URL: http://www.research.digital.com/CRL/personal/raman/aster/aster-toplevel.html
comp.speech refs: [1]
AT&T Advanced Speech Products Group home page
URL: http://www.att.com/aspg/
comp.speech refs: [1] - [2] - [3]
AT&T Bell Laboratories Voices
URL: http://www.research.att.com/cgi-bin/cgiwrap/mjm/voices.cgi
comp.speech refs: [1]
AT&T Watson: Engineer Training Program
URL: http://www.att.com/aspg/SSI_Class.html
comp.speech refs: [1] - [2]
AT&T Watson: Independent Software Vendor (ISV) Program
URL: http://www.att.com/aspg/ISV_program.html
comp.speech refs: [1] - [2]
AT&T Watson: Licensing Program
URL: http://www.att.com/aspg/ISV_program.html#2
comp.speech refs: [1] - [2]
AT&T Watson: Software Development Kit
URL: http://www.att.com/aspg/ISV_program.html#1
comp.speech refs: [1] - [2]
AT&T Watson Speech Applications Platform FAQ
URL: http://www.att.com/aspg/FAQ.html
comp.speech refs: [1] - [2]
Audio and Music Applications for Silicon Graphics Systems
URL: http://reality.sgi.com/employees/cook/audio.apps/public.html
comp.speech refs: [1]
Auditory Modeling information in Malcolm Slaney's home page
URL: http://www.interval.com/~malcolm/pubs.html
comp.speech refs: [1]
Auditory User Interfaces: Toward the Speaking Computer, by T.V. Raman
URL: http://cs.cornell.edu/home/raman/aui
comp.speech refs: [1]
Auscript speech technology vendor
URL: http://www.auscript.com.au/
comp.speech refs: [1]
AVAAZ Home Page
URL: http://www.icis.on.ca/homepages/avaaz/
comp.speech refs: [1] - [2]
Axel Belinfante's home page
URL: http://www.cs.utwente.nl/~belinfan/
comp.speech refs: [1] - [2]
Bavarian Archive for Speech Signals
URL: http://www.phonetik.uni-muenchen.de/BASSeng.html
comp.speech refs: [1]
BBN Hark's home page
URL: http://www.bbn.com/bbn_hark/HarkHome.html
comp.speech refs: [1]
Berkeley Restaurant Project (BeRP)
URL: http://www.icsi.berkeley.edu/real/berp.html
comp.speech refs: [1]
BeSTspeech from Berkeley Speech Technologies, Inc.
URL: http://www.bestspeech.com/index.html
comp.speech refs: [1]
Brite: Computer Telephony Integration & Interactive Voice Response
URL: http://www.brite.com/
comp.speech refs: [1]
Buddy Software Library: MPEG-1 Audio Layer 3 encoder and player
URL: http://www.buddy.org/softlib.html
comp.speech refs: [1]
Castleton Network Systems - G.729 Voice Coder
URL: http://www.castleton.com/
comp.speech refs: [1]
CAVE: Caller Verification in Banking and Telecommunications
URL: http://www.ptt-telecom.nl/cave/
comp.speech refs: [1]
Center for Spoken Language Understanding (CSLU) at the Oregon Graduate Institute of Science and Technology
URL: http://www.cse.ogi.edu/CSLU/
comp.speech refs: [1] - [2]
Centre for Cognitive Science at the University of Edinburgh
URL: http://www.cogsci.ed.ac.uk/ccs/home.html
comp.speech refs: [1]
Centre for Speech Technology Research, Edinburgh University
URL: http://www.cstr.ed.ac.uk/
comp.speech refs: [1]
Ciaran McElroy's Speech Coding Page
URL: http://wwwdsp.ucd.ie/speech/tutorial/speech_coding/speech_tut.html
comp.speech refs: [1]
CMU dictionary on the WWW
URL: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
comp.speech refs: [1] - [2]
COCOSDA Home page
URL: http://www.itl.atr.co.jp/cocosda/
comp.speech refs: [1]
Cognitive Science Laboratory at Princeton University
URL: http://www.cogsci.princeton.edu/
comp.speech refs: [1]
Colibri mailing list
URL: http://colibri.let.ruu.nl/
comp.speech refs: [1]
comp.dsp newsgroup FAQ
URL: http://www.bdti.com/faq/dsp_faq.htm
comp.speech refs: [1] - [2] - [3]
Comprehensive list of FFT software
URL: http://tjev.tel.etf.hr/josip/DSP/fft.html
comp.speech refs: [1]
Comprehensive list of WWW dictionaries, acronym lists, translation resources, and a Thesaurus.
URL: http://galaxy.einet.net/galaxy/Reference-and-Interdisciplinary-Information/Dictionaries-etc.html
comp.speech refs: [1]
comp.speech FAQ at Cambridge University (UK)
URL: http://svr-www.eng.cam.ac.uk/comp.speech/
comp.speech refs: [1] - [2] - [3] - [4]
comp.speech FAQ at CMU: USA
URL: http://www.speech.cs.cmu.edu/comp.speech/
comp.speech refs: [1] - [2] - [3] - [4]
comp.speech FAQ at Sydney University: Australia
URL: http://www.speech.su.oz.au/comp.speech/
comp.speech refs: [1] - [2] - [3] - [4]
comp.speech FAQ: ATR ITL, Japan
URL: http://www.itl.atr.co.jp/comp.speech/
comp.speech refs: [1] - [2] - [3] - [4]
Computation and Language E-Print Archive
URL: http://xxx.lanl.gov/cmp-lg/
comp.speech refs: [1]
Computational Linguistics journal home page
URL: http://www-mitpress.mit.edu/jrnls-catalog/comp-ling.html
comp.speech refs: [1]
Computational Phonology: special issue of Computational Linguistics
URL: http://mitpress.mit.edu/jrnls-catalog/comp-ling-abstracts/comp-ling20-3.html
comp.speech refs: [1]
Computing and Information Systems Department (CISD) of Rutherford Appleton Laboratory, UK
URL: http://www.cis.rl.ac.uk/index.html
comp.speech refs: [1]
Consortium for Lexical Research
URL: http://crl.nmsu.edu/clr/CLR.html
comp.speech refs: [1]
CRC Press: Scientific and Technical Publisher
URL: http://www.crcpress.com/
comp.speech refs: [1]
Creative Labs, Inc.
URL: http://www.creaf.com/
comp.speech refs: [1] - [2]
Creative Labs TextAssist
URL: http://www.creaf.com/wwwnew/products/sound/demo/tareader.html
comp.speech refs: [1]
Creative Labs VoiceAssist
URL: http://www.creaf.com/wwwnew/products/sound/demo/vassist.html
comp.speech refs: [1]
Creative TextAssist description
URL: http://www.creaf.com/wwwnew/tech/devcnr/tassist.html
comp.speech refs: [1]
Creative TextAssist FAQ
URL: http://www.creaf.com/wwwnew/tech/devcnr/tassfaq.html
comp.speech refs: [1]
CUSeeMe Audio and Video conferencing software
URL: http://cu-seeme.cornell.edu/
comp.speech refs: [1]
CUSeeMe Audio and Video conferencing software
URL: http://cu-seeme.cornell.edu/get_cuseeme.html
comp.speech refs: [1]
CUSeeMe Audio and Video conferencing software
URL: http://cu-seeme.cornell.edu/PC.CU-SeeMeCurrent.html
comp.speech refs: [1]
CustomVoice and CustomTelephone from A&G Graphics Interface Inc.
URL: http://www.customvoice.com/
comp.speech refs: [1]
CyberPhone home page
URL: http://magenta.com/cyberphone/
comp.speech refs: [1]
CyberVoice speech coding
URL: http://www.cybit.com/
comp.speech refs: [1]
DADiSP from DSP Development Corporation
URL: http://www.dadisp.com/
comp.speech refs: [1]
DADiSP from DSP Development Corporation: application to speech processing
URL: http://www.dadisp.com/ab2.htm
comp.speech refs: [1]
DADiSP from DSP Development Corporation: detailed information
URL: http://www.dadisp.com/contact.htm
comp.speech refs: [1]
DADiSP from DSP Development Corporation: free demo software
URL: http://www.dadisp.com/download.htm
comp.speech refs: [1]
DADiSP from DSP Development Corporation: free student edition
URL: http://www.dadisp.com/studntdl.htm
comp.speech refs: [1]
DAX Systems, Inc.: Computer Telephony and Integrated Voice Response
URL: http://www.daxsystems.com/
comp.speech refs: [1]
DECtalk pricing
URL: http://www.systems.digital.com/DIcatalog/html/DECtalk-Speech-Synthesis-oi.html
comp.speech refs: [1]
DECtalk software
URL: http://www.systems.digital.com/DIcatalog/html/DECtalk-Software.html
comp.speech refs: [1]
DECtalk speech synthesis
URL: http://www.systems.digital.com/DIcatalog/html/DECtalk-Speech-Synthesis.html
comp.speech refs: [1]
Demo of BeSTspeech from Berkeley Speech Technologies, Inc.
URL: http://www.bestspeech.com/weblang.html
comp.speech refs: [1]
Demo of rsynth on the WWW
URL: http://wwwtios.cs.utwente.nl/say/
comp.speech refs: [1] - [2]
Dept. of Psychology, University of Western Australia
URL: http://www.psy.uwa.edu.au/
comp.speech refs: [1]
DigiPhone availability
URL: http://www.planeteers.com/retail.htm
comp.speech refs: [1]
DigiPhone Deluxe
URL: http://www.planeteers.com/digiphon/dpsr.htm
comp.speech refs: [1]
DigiPhone for Mac
URL: http://www.planeteers.com/digifone/digimac.htm
comp.speech refs: [1]
DigiPhone for Third Planet Publishing
URL: http://www.planeteers.com/
comp.speech refs: [1]
DigiPhone Global Directory of users
URL: http://www.planeteers.com/digiphon/global.htm
comp.speech refs: [1]
DigiPhone trial download page
URL: http://www.planeteers.com/download/download.htm
comp.speech refs: [1]
DigiPhone v1.03
URL: http://www.planeteers.com/digiphon/dpjr.htm
comp.speech refs: [1]
Digital Dreams Speech Recognition Plug-Ins
URL: http://www.surftalk.com/
comp.speech refs: [1]
Digital Equipment Corporation
URL: http://www.digital.com/
comp.speech refs: [1]
Digital Signal Processing Course by Joe Picone
URL: http://www.isip.msstate.edu/publications/1995/ee_4773/
comp.speech refs: [1]
Digital Signal Processing course syllabus by Joe Picone
URL: http://www.isip.msstate.edu/publications/1995/ee_4773/SYLLABUS.ps
comp.speech refs: [1]
Digital Signal Processing (DSP) group at Rice University
URL: http://www-dsp.rice.edu/
comp.speech refs: [1]
Discrete HMM demonstration software
URL: http://www.isip.msstate.edu/software/
comp.speech refs: [1]
Dragon Developers Page
URL: http://www.dragonsys.com/marketing/dragondeveloper.html
comp.speech refs: [1]
Dragon home page
URL: http://www.dragonsys.com/
comp.speech refs: [1] - [2]
Dragon NaturallySpeaking
URL: http://www.naturallyspeaking.com/
comp.speech refs: [1]
Dragon PowerSecretary
URL: http://www.dragonsys.com/marketing/powersecretary.html
comp.speech refs: [1]
Dragon Telephony Products
URL: http://www.dragonsys.com/marketing/telephony.html
comp.speech refs: [1]
Duncan M. Forrest's Speech Recognition Resource List
URL: http://www.skye.co.za/dmf/speech/
comp.speech refs: [1]
Dynastat, Inc: Speech Intelligibility and Quality Testing
URL: http://www.bga.com/dynastat/
comp.speech refs: [1]
Edinburgh Associative Thesaurus
URL: http://www.cis.rl.ac.uk/proj/psych/eat/eat/
comp.speech refs: [1]
Edinburgh Associative Thesaurus: WWW Interactive version
URL: http://www.cis.rl.ac.uk/proj/psych/eat.html
comp.speech refs: [1]
Eg3 Communications: DSP Internet Resources
URL: http://www.eg3.com/dsp.htm
comp.speech refs: [1]
Eg3 Communications: Engineering Information Online
URL: http://www.eg3.com/
comp.speech refs: [1]
Elan Informatique demo registration
URL: http://www.elan.fr/speech/spe-LITO.htm
comp.speech refs: [1]
Elan Informatique: Proverbe demonstration software
URL: http://www.elan.fr/vocal/technical/demoSE.htm
comp.speech refs: [1]
Elan Informatique: Proverbe sample sound files
URL: http://www.elan.fr/vocal/technical/sndwave.htm
comp.speech refs: [1]
Elan Informatique: Proverbe speech synthesis
URL: http://www.elan.fr/vocal/prod-pse.htm
comp.speech refs: [1]
Elan Informatique: ProVerbe Speech Synthesis Engine
URL: http://www.elan.fr/
comp.speech refs: [1]
Eloquence speech synthesis
URL: http://www.eloq.com/
comp.speech refs: [1]
Elsevier Science: Speech Communication journal
URL: http://www.elsevier.com/
comp.speech refs: [1]
Emacspeak - A Speech Output Subsystem For Emacs
URL: http://www.research.digital.com/CRL/personal/raman/emacspeak/emacspeak.html
comp.speech refs: [1]
Emacspeak FAQ
URL: http://www.research.digital.com/CRL/personal/raman/emacspeak/faqs.html
comp.speech refs: [1]
Entropic Research Laboratory home page
URL: http://www.entropic.com/
comp.speech refs: [1] - [2] - [3]
Entropic Signal Processing System (ESPS)
URL: http://www.entropic.com/esps.html
comp.speech refs: [1]
HTK (Hidden Markov Model Toolkit)
URL: http://htk.eng.cam.ac.uk/
comp.speech refs: [1]
ESCA: European Speech Communication Association list of research sites
URL: http://ophale.icp.grenet.fr/esca/labos.html
comp.speech refs: [1]
European Language Resources Association
URL: http://www.icp.grenet.fr/ELRA/home.html
comp.speech refs: [1]
European Speech Communication Association (ESCA) home page
URL: http://ophale.icp.grenet.fr/esca/esca.html
comp.speech refs: [1]
Eurovocs speech synthesis
URL: http://www.elis.rug.ac.be/ELISgroups/speech/research/eurovocs.html
comp.speech refs: [1] - [2]
FAQ: How can I use the Internet as a telephone?
URL: http://rpcp.mit.edu/~asears/voice-faq.html
comp.speech refs: [1]
Festival Speech Synthesis System: download software
URL: http://www.cstr.ed.ac.uk/projects/festival/download.html
comp.speech refs: [1]
Festival Speech Synthesis System: home page
URL: http://www.cstr.ed.ac.uk/projects/festival.html
comp.speech refs: [1] - [2]
FFTW software
URL: http://theory.lcs.mit.edu/~fftw
comp.speech refs: [1]
Ficomp Inc. Interpreter 6000
URL: http://www.ficompsystems.com/
comp.speech refs: [1]
Free Speech Journal
URL: http://www.cse.ogi.edu/CSLU/fsj/html/masthead.html
comp.speech refs: [1]
Fundamentals of Speech Recognition Course by Joe Picone
URL: http://www.isip.msstate.edu/publications/1996/ee_8993/
comp.speech refs: [1] - [2]
Fundamentals of Speech Recognition course lecture notes by Joe Picone
URL: http://www.isip.msstate.edu/publications/1996/ee_8993/lectures/
comp.speech refs: [1] - [2]
Fundamentals of Speech Recognition course syllabus by Joe Picone
URL: http://www.isip.msstate.edu/publications/1996/ee_8993/SYLLABUS.ps
comp.speech refs: [1] - [2]
G.729 Annex A from Sipro Lab Telecom Inc
URL: http://www.sipro.com/g729a.html
comp.speech refs: [1]
George L. Dillon's Consonant sounds of English
URL: http://weber.u.washington.edu/~dillon/consonants.html
comp.speech refs: [1]
George L. Dillon's list of phonetic resources
URL: http://weber.u.washington.edu/~dillon/PhonResources.html
comp.speech refs: [1]
George L. Dillon's Vowel Quadrilaterals for American and British English
URL: http://weber.u.washington.edu/~dillon/newstart.html
comp.speech refs: [1]
George L. Dillon's Vowel sounds of American English
URL: http://weber.u.washington.edu/~dillon/vowels.html
comp.speech refs: [1]
German speech synthesis from Institut für Technische Informatik und Kommunikationsnetze
URL: http://www.tik.ee.ethz.ch/cgi-bin/w3svox
comp.speech refs: [1]
GoldWave digital audio editor for Microsoft Windows
URL: http://web.cs.mun.ca/~chris3/goldwave/goldwave.html
comp.speech refs: [1]
GSM 06.10 Compression
URL: http://www.cs.tu-berlin.de/~jutta/toast.html
comp.speech refs: [1]
Hadifix German speech synthesis
URL: http://asl1.ikp.uni-bonn.de/~tpo/Hadifix.en.html
comp.speech refs: [1]
Hadifix speech synthesis demo
URL: http://asl1.ikp.uni-bonn.de/~tpo/Hadiq.en.html
comp.speech refs: [1] - [2]
Hangai Lab: demo of speaker identification
URL: http://miya8f05.ee.kagu.sut.ac.jp/study/speech/speech1.html
comp.speech refs: [1]
Hangai Lab: demo of speaker verification
URL: http://miya8f05.ee.kagu.sut.ac.jp/study/speech/speech2.html
comp.speech refs: [1]
Hangai Lab: demos of speaker recognition
URL: http://miya8f05.ee.kagu.sut.ac.jp/index.html
comp.speech refs: [1]
Haskins Laboratory WWW Site
URL: http://www.haskins.yale.edu/Haskins/MISC/special.html
comp.speech refs: [1]
Head-Driven Phrase Structure Grammar Home Page
URL: http://julius.ling.ohio-state.edu/HPSG/Hpsg.html
comp.speech refs: [1]
How to Install an MPEG Audio Player for your Web Navigator
URL: http://www.mpeg.org/index.html/MPEG-audio-player.html
comp.speech refs: [1]
IBM VoiceType Control
URL: http://www.software.ibm.com/workgroup/voicetyp/vtcust2.html
comp.speech refs: [1]
IBM VoiceType Dictation
URL: http://www.software.ibm.com/workgroup/voicetyp/vtcust1.html
comp.speech refs: [1]
IBM VoiceType Dictation FAQ
URL: http://www.infi.net/~ums/ibmfaq.htm
comp.speech refs: [1]
IBM VoiceType Dictation from UltraMedia Systems International
URL: http://www.infi.net/~ums/
comp.speech refs: [1]
IBM VoiceType Ordering
URL: http://www.software.ibm.com/workgroup/voicetyp/vtordna.html
comp.speech refs: [1]
IBM VoiceType System Requirements
URL: http://www.software.ibm.com/workgroup/voicetyp/vtprod13.html
comp.speech refs: [1]
ICSLP '98: Sydney, Australia
URL: http://cslab.anu.edu.au/icslp98/
comp.speech refs: [1]
ICSLP'96: Philadelphia
URL: http://www.asel.udel.edu/speech/icslp.html
comp.speech refs: [1]
IEEE Home Page
URL: http://www.ieee.org/
comp.speech refs: [1]
IEEE Signal Processing Society
URL: http://www.ieee.org/sp/index.html
comp.speech refs: [1]
IGE
URL: http://www.york.ac.uk/~rpf1/IGE.html
comp.speech refs: [1]
IN CUBE for Windows 95
URL: http://www.commandcorp.com/cci/win95.html
comp.speech refs: [1]
IN CUBE from Command Corp. Inc.
URL: http://www.commandcorp.com/incube_welcome.html
comp.speech refs: [1]
IN CUBE Mark II Pro for Windows NT
URL: http://www.commandcorp.com/cci/pront.html
comp.speech refs: [1]
IN CUBE Voice Command for Sun SPARCstations
URL: http://www.commandcorp.com/cci/in3sparc.html
comp.speech refs: [1]
Infolingua Bibliographies
URL: http://gomer.mlink.net/infolingua.html
comp.speech refs: [1] - [2] - [3]
Infovox Multi-Lingual Speech Synthesis Products
URL: http://www.promotor.telia.se/NYA/cc/t-s/index.html
comp.speech refs: [1]
Institut für Phonetik at Johann Wolfgang Goethe-Universität Frankfurt
URL: http://www.uni-frankfurt.de/~ifb/ipf1.html
comp.speech refs: [1] - [2] - [3]
Institute for Communications Research and Phonetics, University of Bonn: Hadifix speech synthesis
URL: http://asl1.ikp.uni-bonn.de/Welcome.html
comp.speech refs: [1]
Institute for Language Speech and Hearing, the University of Sheffield
URL: http://www.dcs.shef.ac.uk/research/ilash/
comp.speech refs: [1]
Institute for Perception Research: Speech on the Web
URL: http://www.tue.nl/ipo/hearing/webspeak.htm
comp.speech refs: [1]
Institute for Signal and Information Processing (ISIP) at Mississippi State University
URL: http://www.isip.msstate.edu/
comp.speech refs: [1] - [2] - [3] - [4]
Institute of Phonetic Sciences, University of Amsterdam
URL: http://fonsg3.let.uva.nl/Welcome.html
comp.speech refs: [1]
InterFACE from Hijinx: Internet phone software
URL: http://www.hijinx.com.au/
comp.speech refs: [1]
International Computer Science Institute in Berkeley, CA
URL: http://www.icsi.berkeley.edu/
comp.speech refs: [1]
International Phonetic Alphabet cassette of sounds
URL: http://www.phon.ucl.ac.uk/home/wells/cassette.htm
comp.speech refs: [1]
International Phonetic Alphabet (IPA)
URL: http://www.arts.gla.ac.uk/IPA/ipachart.html
comp.speech refs: [1] - [2] - [3]
International Phonetic Alphabet (IPA) chart of symbols
URL: http://www.arts.gla.ac.uk/IPA/fullchart.html
comp.speech refs: [1]
International Phonetic Association
URL: http://www.arts.gla.ac.uk/IPA/ipa.html
comp.speech refs: [1]
International Telecommunication Union standards information
URL: http://www.itu.ch/itudoc/itu-t/rec/g/g700-799.html
comp.speech refs: [1]
International Telecommunication Union WWW site
URL: http://www.itu.ch/
comp.speech refs: [1]
Internet Phone from VocalTec
URL: http://www.vocaltec.com/
comp.speech refs: [1]
Internet Phone from VocalTec ordering information
URL: http://www.vocaltec.com/order.html
comp.speech refs: [1]
Intl. Phonetic Alphabet transcriptions in ASCII
URL: http://weber.u.washington.edu/~dillon/ipaascii.html
comp.speech refs: [1]
Introduction to Computational Phonology by Steven Bird
URL: http://www.cogsci.ed.ac.uk/phonology/comp-phon-intro.ps.Z
comp.speech refs: [1]
IPOX: An All-Prosodic Speech Synthesis Architecture
URL: http://www.tue.nl/ipo/people/adirksen/ipox/ipox.htm
comp.speech refs: [1]
Jason Woodard's Speech Coding Page
URL: http://www-mobile.ecs.soton.ac.uk/speech_codecs/index.html
comp.speech refs: [1]
Johann Wolfgang Goethe-Universität Frankfurt
URL: http://www.uni-frankfurt.de/
comp.speech refs: [1] - [2] - [3]
Jon Iles' Speech Synthesis "Museum"
URL: http://www.cs.bham.ac.uk/~jpi/synth/museum.html
comp.speech refs: [1] - [2]
Journal of the Acoustical Society of America (JASA)
URL: http://asa.aip.org/jasa.html
comp.speech refs: [1]
JTS Micro Consulting Ltd: PAM, JTS Reader and Listen2
URL: http://www.islandnet.com/jts/
comp.speech refs: [1]
Kay Elemetrics home page
URL: http://www.kayelemetrics.com/
comp.speech refs: [1]
Kevin Lenzo's home page
URL: http://www.cs.cmu.edu/afs/cs/user/lenzo/html/index.html
comp.speech refs: [1] - [2] - [3]
Kevin Lenzo's page of Speech Applications for the Macintosh
URL: http://www.cs.cmu.edu/~lenzo/mac_speech_apps.html
comp.speech refs: [1] - [2] - [3] - [4] - [5] - [6]
Keyware S2 Security Service
URL: http://www.keywareusa.com/Products/S2SecurityServer/main.html
comp.speech refs: [1]
Keyware Technologies Biometric Verification
URL: http://www.keywareusa.com/
comp.speech refs: [1]
Keyware VoiceGuardian
URL: http://www.keywareusa.com/Products/VoiceGuardian/main.html
comp.speech refs: [1]
Keyware VoiceGuardian online demo
URL: http://www.keywareusa.com/Demos/
comp.speech refs: [1]
Khoros signal and image processing environment from Khoral Research Inc.
URL: http://www.khoral.com/
comp.speech refs: [1]
Kurzweil Clinical Reporter speech recognition
URL: http://www.kurzweil.com/medical/
comp.speech refs: [1]
Kurzweil Voice for Windows: speech recognition
URL: http://www.kurzweil.com/
comp.speech refs: [1]
Laureate speech synthesis from British Telecom
URL: http://www.labs.bt.com/innovate/speech/laureate/
comp.speech refs: [1]
LawTalk from WildCard
URL: http://www.wildcardtech.com/speech/info/lawtalk.htm
comp.speech refs: [1]
Learning Company's Language Training
URL: http://www.learningco.Inter.net/foreign.html
comp.speech refs: [1]
Lernout and Hauspie home page
URL: http://www.lhs.com/
comp.speech refs: [1] - [2] - [3] - [4] - [5] - [6]
Lernout and Hauspie speech coding
URL: http://www.lhs.com/coding.html
comp.speech refs: [1] - [2]
Lernout and Hauspie speech recognition
URL: http://www.lhs.com/asr.html
comp.speech refs: [1] - [2]
Lernout and Hauspie text-to-speech
URL: http://www.lhs.com/tts.html
comp.speech refs: [1] - [2]
Linguistic associations and linguistic WWW links
URL: http://engserve.tamu.edu/files/linguistics/linguist/associations.html
comp.speech refs: [1]
Linguistic Data Consortium home page
URL: http://www.ldc.upenn.edu/
comp.speech refs: [1] - [2]
Linguistics Abstracts Online
URL: http://www.blackwellpublishers.co.uk/labs/
comp.speech refs: [1]
List of Links Relating to Sound Computation
URL: http://datura.cerl.uiuc.edu/netstuff/sigsoundLinks.html
comp.speech refs: [1]
List of online dictionaries: Institute of Phonetic Sciences
URL: http://fonsg3.let.uva.nl/Other_pages.html#Dictionaries
comp.speech refs: [1]
List of speech conferences and meetings: Institute of Phonetic Sciences
URL: http://fonsg3.let.uva.nl/Other_pages.html#Meetings
comp.speech refs: [1]
List of speech research sites: Institute of Phonetic Sciences
URL: http://fonsg3.let.uva.nl/Other_pages.html#Phonetics
comp.speech refs: [1]
Listen2 web page
URL: http://www.islandnet.com/jts/listen2.htm
comp.speech refs: [1]
Lists of References on Automatic Speaker Verification
URL: http://sig.enst.fr/~chollet/ForMehdi/SpRecV1.l_ind.html
comp.speech refs: [1]
Louis Pols's List of References on Synthesis Development And Assessment
URL: http://www.itl.atr.co.jp/cocosda/output/synth.refs
comp.speech refs: [1]
LPC-10 speech coding software
URL: http://www.arl.wustl.edu/~jaf/lpc/
comp.speech refs: [1]
Lucent Technologies Bell Labs Text-to-Speech
URL: http://www.bell-labs.com/project/tts/
comp.speech refs: [1] - [2]
Lucent Technologies Bell Labs Text-to-Speech: system description
URL: http://www.bell-labs.com/project/tts/tts-overview.html
comp.speech refs: [1]
Lyricos singing speech synthesis
URL: http://www.cse.ogi.edu/CSLU/research/TTS/research/sing.html
comp.speech refs: [1]
Macintosh Speech: Developer's Information
URL: http://www.speech.apple.com/speech/dev/dev.html
comp.speech refs: [1]
Macintosh Speech Page: Speech Manager and PlainTalk
URL: http://www.speech.apple.com/
comp.speech refs: [1] - [2] - [3] - [4]
MacYack Pro Speech Synthesis software
URL: http://www.lowtek.com/macyack/
comp.speech refs: [1]
Malcolm Crawford's home page
URL: http://www.dcs.shef.ac.uk/~malc/
comp.speech refs: [1]
Malcolm Slaney's home page
URL: http://www.interval.com/~malcolm/
comp.speech refs: [1]
Man-Machine Interfacing
URL: http://www.speechrec.com/
comp.speech refs: [1]
Martin Cooke's home page: auditory modelling and speech recognition in noise
URL: http://www.dcs.shef.ac.uk/~martin/
comp.speech refs: [1]
Martin Ramsch's Englisch-Wörterbücher aller Art (English dictionaries of all kinds)
URL: http://www.uni-passau.de/forwiss/mitarbeiter/freie/ramsch/englisch.html
comp.speech refs: [1]
Massachusetts Institute of Technology
URL: http://web.mit.edu/
comp.speech refs: [1]
Matlab plus Signal Processing Toolbox
URL: http://www.mathworks.com/
comp.speech refs: [1]
MBROLA speech synthesis demonstration
URL: http://tcts.fpms.ac.be/synthesis/modelcmp.html
comp.speech refs: [1] - [2]
MBROLA speech synthesis project home page
URL: http://tcts.fpms.ac.be/synthesis/mbrola.html
comp.speech refs: [1]
Meetings of the Acoustical Society of America (ASA)
URL: http://asa.aip.org/meetings.html
comp.speech refs: [1]
MicMac Recording Software for Macs
URL: http://moof.com/nirvana/
comp.speech refs: [1]
Microsoft Speech API
URL: http://www.microsoft.com/MEDIADEV/AUDIO/MSPEECH1.HTM
comp.speech refs: [1]
Microsoft Speech API: An Overview
URL: http://www.microsoft.com/mediadev/audio/mspover.htm
comp.speech refs: [1]
Microsoft Speech API SDK
URL: http://www.research.microsoft.com/research/srg/install.htm#SDK
comp.speech refs: [1]
Microsoft Speech SDK
URL: http://www.research.microsoft.com/research/srg/install.htm
comp.speech refs: [1] - [2]
Microsoft Speech Technology home page
URL: http://www.research.microsoft.com/research/srg/
comp.speech refs: [1]
Microsoft Telephony API
URL: http://www.microsoft.com/ntserver/communications/tapi.htm
comp.speech refs: [1]
Microsoft Telephony API White Paper
URL: http://www.microsoft.com/ntserver/communications/tapi_wp.htm
comp.speech refs: [1]
Mike Noel's home page (CSLU)
URL: http://www.cse.ogi.edu/~noel/
comp.speech refs: [1]
Mississippi State University
URL: http://www.msstate.edu/
comp.speech refs: [1] - [2]
Moby lexical resources
URL: http://www.dcs.shef.ac.uk/research/ilash/Moby/
comp.speech refs: [1]
Monologue for Windows from First Byte
URL: http://www.firstbyte.davd.com/
comp.speech refs: [1] - [2]
MPEG FAQ by Chad Fogg
URL: http://www-plateau.cs.berkeley.edu/mpegfaq/MPEG-2-FAQ.html
comp.speech refs: [1]
MPEG FAQ by Frank Gadegast
URL: http://www.powerweb.de/mpeg/mpegfaq/
comp.speech refs: [1]
MPEG FAQ by Luigi
URL: http://www.crs4.it/~luigi/MPEG/mpegfaq.html
comp.speech refs: [1]
MPEG Pointers and Resources
URL: http://www.mpeg.org/
comp.speech refs: [1]
MPEG-1 Audio Layer 3 encoder, decoder and FAQ
URL: http://www.iis.fhg.de/departs/amm/layer3/index.html
comp.speech refs: [1]
MPEG-2 Audio FAQ from Philips
URL: http://www.keymodules.philips.com/MD/mpeg/faqmpeg2.htm
comp.speech refs: [1]
MRC Psycholinguistic Database: WWW Interface
URL: http://www.psy.uwa.edu.au/uwa_mrc.htm
comp.speech refs: [1]
Multi-Lingual TTS from Gerhard-Mercator University, Duisburg
URL: http://www.fb9-ti.uni-duisburg.de/demos/speech.html
comp.speech refs: [1]
Musée sonore de la synthèse de la parole en français (Sound museum of French speech synthesis)
URL: http://ophale.icp.grenet.fr/exemples_synthese/ex.html
comp.speech refs: [1]
Museum of Speech Analysis and Synthesis
URL: http://mambo.ucsc.edu/psl/smus/smus.html
comp.speech refs: [1]
National Center for Voice and Speech
URL: http://www.shc.uiowa.edu/ncvs_home.html
comp.speech refs: [1] - [2]
National Institute of Standards and Technology (NIST)
URL: http://www.nist.gov/
comp.speech refs: [1]
Nautilus home page: Secure Computer Telephony
URL: http://www.lila.com/nautilus/
comp.speech refs: [1]
Netscape Communications Corporation
URL: http://home.netscape.com/
comp.speech refs: [1]
NetSpeak home page
URL: http://www.netspeak.com/
comp.speech refs: [1]
NICO Artificial Neural Network Toolkit
URL: http://www.speech.kth.se/NICO/index.html
comp.speech refs: [1]
NICO Artificial Neural Network Toolkit download page
URL: http://www.speech.kth.se/NICO/download.html
comp.speech refs: [1]
NOISEX-92 database
URL: http://spib.rice.edu/spib/select_noise.html
comp.speech refs: [1]
Nortel: Multimedia Network Applications
URL: http://www.nortel.com/entprods/multimedia/
comp.speech refs: [1]
Nortel: Multimedia Network Applications: AudioGram Delivery Service
URL: http://www.nortel.com/entprods/multimedia/applications/audiogrm.html
comp.speech refs: [1]
Nortel: Multimedia Network Applications: Voice-Activated Auto Attendant
URL: http://www.nortel.com/entprods/multimedia/applications/autoattd.html
comp.speech refs: [1]
Nortel: Multimedia Network Applications: Voice-Activated Dialing
URL: http://www.nortel.com/entprods/multimedia/applications/vad.html
comp.speech refs: [1]
Nortel: Multimedia Network Applications: Voice-Activated Premier Dialing
URL: http://www.nortel.com/entprods/multimedia/applications/premdial.html
comp.speech refs: [1]
Nortel: Network Applications Vehicle
URL: http://www.nortel.com/entprods/multimedia/nav.html
comp.speech refs: [1]
Nortel: Northern Telecom, provider of network voice applications
URL: http://www.nortel.com/
comp.speech refs: [1]
N!Power
URL: http://www.silcom.com/~stilarry/
comp.speech refs: [1]
Nuance Communications: Speech recognition
URL: http://www.nuance.com/
comp.speech refs: [1]
O'Brien Resources: Speech Recognition Sales
URL: http://www.crosslink.net/~obrien/
comp.speech refs: [1]
OfficeTalk and LawTalk from WildCard
URL: http://www.wildcardtech.com/
comp.speech refs: [1]
OfficeTalk from WildCard
URL: http://www.wildcardtech.com/speech/info/offtalk.htm
comp.speech refs: [1]
OGI Synthesis using Festival
URL: http://www.cse.ogi.edu/CSLU/research/TTS
comp.speech refs: [1]
Online bibliography for Phonetics and Speech Technology
URL: http://www.uni-frankfurt.de/~ifb/bib_engl.html
comp.speech refs: [1] - [2] - [3]
Online Speech Synthesis: Institute of Phonetic Sciences
URL: http://fonsg3.let.uva.nl/IFA-Features.html
comp.speech refs: [1]
Orator from Bellcore: home page
URL: http://www.bellcore.com/ORATOR/
comp.speech refs: [1] - [2]
PAM - A Text-To-Speech Application
URL: http://www.islandnet.com/~tslemko/
comp.speech refs: [1]
Pavarobotti synthesis technology from the National Center for Voice and Speech
URL: http://www.shc.uiowa.edu/fun/pavarobotti/pavarobotti.html
comp.speech refs: [1]
Peter Meijer's home page
URL: http://ourworld.compuserve.com/homepages/Peter_Meijer/
comp.speech refs: [1]
Peter Meijer's "the vOICe" Java applet/application
URL: http://ourworld.compuserve.com/homepages/Peter_Meijer/javoice.htm
comp.speech refs: [1]
PGPfone secure audio networking
URL: http://web.mit.edu/network/pgpfone/
comp.speech refs: [1]
Phil Karn's Digital/Analog Voice Demo
URL: http://www.qualcomm.com/people/pkarn/voicedemo/
comp.speech refs: [1]
Philips Speech home page
URL: http://www.speech.be.philips.com/
comp.speech refs: [1]
Philips Speech Processing System 6000s: Radiology dictation
URL: http://www.speech.be.philips.com/sp6000.htm
comp.speech refs: [1]
Philips SpeechMagic dictation system
URL: http://www.speech.be.philips.com/sp-magic.htm
comp.speech refs: [1]
PlainTalk mailing list
URL: http://cgi.skyweyr.com/Plaintalk.Home
comp.speech refs: [1] - [2]
Poynton's Digital Signal Processing Resource List
URL: http://www.inforamp.net/~poynton/Poynton-dsp.html
comp.speech refs: [1]
Project Gutenberg
URL: http://www.prairienet.org/pg/
comp.speech refs: [1]
Pronotes Speech Recognition
URL: http://www.pronotes.com/
comp.speech refs: [1]
PureSpeech, Inc. WWW Site
URL: http://www.speech.com/
comp.speech refs: [1]
Quadravox Speech Processing Products - Qbox
URL: http://www.quadravox.com/
comp.speech refs: [1]
Real-time Visual Displays for Professional Voice Development
URL: http://www.york.ac.uk/~elec10/
comp.speech refs: [1]
RELATOR project: linguistic resources
URL: http://cristal.icp.grenet.fr/Relator/homepage.html
comp.speech refs: [1] - [2]
Rockwell's DigiTalk
URL: http://www.nb.rockwell.com/ref/digitalk/
comp.speech refs: [1]
Russ Wilcox's list of Commercial Speech Recognition
URL: http://www.tiac.net/users/rwilcox/speech.html
comp.speech refs: [1] - [2]
Scantron Quality Computers: for MacYack Pro Speech Synthesis software
URL: http://www.sqc.com/
comp.speech refs: [1]
SCI VoiceAutomated: speech recognition reseller
URL: http://www.voiceautomated.com/
comp.speech refs: [1]
Search Alta Vista for speech recognition
URL: http://www.altavista.digital.com/cgi-bin/query?pg=q&what=web&fmt=.&q=%2Bspeech+%2Brecognition
comp.speech refs: [1]
Search Lycos for speech recognition
URL: http://www.lycos.com/cgi-bin/pursuit?query=speech+recognition&ab=the_catalog
comp.speech refs: [1]
Sensimetrics Corporation: SENSYN speech synthesizer
URL: http://www.sens.com/
comp.speech refs: [1]
Sensory Circuits: Integrated Circuits for Speech Synthesis, Recognition and Verification
URL: http://www.sensoryinc.com/
comp.speech refs: [1]
ShATR: A Multi-simultaneous-speaker corpus
URL: http://www.dcs.shef.ac.uk/research/groups/spandh/pr/ShATR/ShATR.html
comp.speech refs: [1]
Shikano's WWW site on Speech and Acoustics
URL: http://www.aist-nara.ac.jp/IS/Shikano-lab/database/internet-resource/e-www-site.html
comp.speech refs: [1] - [2] - [3]
Signal Processing and Interpretation Lab at Boston University
URL: http://raven.bu.edu/
comp.speech refs: [1]
Signal Processing Home page
URL: http://tjev.tel.etf.hr/josip/DSP/sigproc.html
comp.speech refs: [1]
Signalyze 3.0 from InfoSignal
URL: http://www.agoralang.com:2410/pubdirsoftware.html
comp.speech refs: [1]
Signalyze 3.0 from InfoSignal
URL: http://www.agoralang.com:2410/signalyze.html
comp.speech refs: [1]
SIGPHON Computational Phonology home page
URL: http://www.cogsci.ed.ac.uk/sigphon/
comp.speech refs: [1]
Silicon Graphics audio Frequently Asked Questions (FAQ)
URL: http://www-viz.tamu.edu/~sgi-faq/faq/html/audio/
comp.speech refs: [1]
Simon Crosby's FAQ for DragonDictate
URL: http://www.cl.cam.ac.uk/users/sac/dd-faq.html
comp.speech refs: [1] - [2]
Simon Crosby's home page (maintains an FAQ for DragonDictate)
URL: http://www.cl.cam.ac.uk/users/sac/
comp.speech refs: [1] - [2]
SimTel programs for sound and soundcards
URL: http://www.acs.oakland.edu/oak/SimTel/win3/sound.html
comp.speech refs: [1] - [2]
Sipro Lab Telecom Inc.: Speech coding technology
URL: http://www.sipro.com/
comp.speech refs: [1]
Some things about studying Speech
URL: http://www.ccp.uchicago.edu/grad/Francis_Alex/speech.html
comp.speech refs: [1]
Sound conversion software
URL: http://peace.wit.com/sounds/SoundConversion/
comp.speech refs: [1]
Sound Processing Kit
URL: http://www.music.helsinki.fi/research/spkit/documentation/SPKit.html
comp.speech refs: [1]
Sound Processing Kit software
URL: http://www.music.helsinki.fi/research/spkit/distribution/spkit.tar.Z
comp.speech refs: [1]
Sound Related Resources
URL: http://pscinfo.psc.edu/~geigel/menus/sound.html
comp.speech refs: [1]
Soundcard WWW Site
URL: http://www.wi.leidenuniv.nl/audio/
comp.speech refs: [1]
Speak Freely audio networking software
URL: http://www.fourmilab.ch/netfone/windows/speak_freely.html
comp.speech refs: [1]
Speaker Identification And Verification: LIMSI Report
URL: http://www.limsi.fr/Recherche/TLP/reco/2pg95-sv/2pg95-sv.html
comp.speech refs: [1]
SpeakerKey Speaker Verification: FAQ
URL: http://www.vitro.bloomington.in.us:8080/~BC/REPORTS/SpeakerKeyFAQ.html
comp.speech refs: [1]
Speech and Hearing Research Group, University of Sheffield
URL: http://www.dcs.shef.ac.uk/research/groups/spandh/
comp.speech refs: [1]
Speech and Hearing Research Group, University of Sheffield: Links to Speech Sites
URL: http://www.dcs.shef.ac.uk/research/groups/spandh/world/misclinks.html
comp.speech refs: [1]
Speech and Language Technology Club
URL: http://salt.essex.ac.uk/salt/
comp.speech refs: [1]
Speech Applications Project at Sun Microsystems Laboratories: SpeechActs
URL: http://www.sunlabs.com/research/speech/
comp.speech refs: [1]
Speech Coding and Synthesis Book
URL: http://www.elsevier.nl/section/engtech/scs/menu.htm
comp.speech refs: [1] - [2]
Speech Coding Demonstration
URL: http://admii.arl.mil/~fsbrn/phamdo/speech_demo.html
comp.speech refs: [1]
Speech Communication journal home page
URL: http://www.elsevier.nl:80/eee/specom/contents.html
comp.speech refs: [1]
Speech Groups List from Leeds University Cognitive Psychology Research Group
URL: http://lethe.leeds.ac.uk/research/cogn/speechlab/other.html
comp.speech refs: [1]
Speech Recognition Course Notes
URL: http://www.isip.msstate.edu/publications/1996/speech_recognition_short_course
comp.speech refs: [1]
Speech Recognition List: Applied Speech Technology Laboratory of CSLI at Stanford
URL: http://csli-www.stanford.edu/users/bscott/SRTech.html
comp.speech refs: [1]
Speech Research List
URL: http://mambo.ucsc.edu/psl/speech.html
comp.speech refs: [1]
Speech Systems Phonetic Engine speech recognition
URL: http://www.speechsys.com/
comp.speech refs: [1]
Speech Technology Research Ltd.
URL: http://www.speechtech.com/home/speechtech/
comp.speech refs: [1] - [2]
Speech Toys
URL: http://www.speechtoys.com/
comp.speech refs: [1] - [2]
Speech Toys page on Speech Recognition
URL: http://www.speechtoys.com/spchtoys/sprec.html
comp.speech refs: [1]
Speech Toys page on Speech Synthesis
URL: http://www.speechtoys.com/spchtoys/spsyn.html
comp.speech refs: [1]
SpeechPrint ID from Voice Control Systems, Inc.
URL: http://www.voicecontrol.com/speechid.html
comp.speech refs: [1]
SpeechTEK '96 Conference and Exhibition
URL: http://www.speechtek.com/
comp.speech refs: [1]
SpeechViewer II: speech therapy tool
URL: http://www.austin.ibm.com/pspinfo/snsspv2.html
comp.speech refs: [1]
SPLIB: Signal Processing URL LIBrary
URL: http://jazz.rice.edu/splib/
comp.speech refs: [1]
Spoken Language Systems Group at the Massachusetts Institute of Technology
URL: http://www.sls.lcs.mit.edu/
comp.speech refs: [1]
SRAPI: Speech Recognition API home page
URL: http://www.srapi.com/
comp.speech refs: [1]
StarAudio Compressor/Player shareware software
URL: http://www.speechtech.com/home/speechtech/loadview.html
comp.speech refs: [1]
Stoneridge Technical Services: VoiceNews newsletter
URL: http://www.stoneridgetech.com/
comp.speech refs: [1]
Summer Institute of Linguistics IPA Fonts
URL: http://www.sil.org/
comp.speech refs: [1]
Survey of the State of the Art in Human Language Technology
URL: http://www.cse.ogi.edu/CSLU/HLTsurvey/HLTsurvey.html
comp.speech refs: [1] - [2] - [3]
Survey of the State of the Art in Human Language Technology: Speaker Recognition
URL: http://www.cse.ogi.edu/CSLU/HLTsurvey/ch1node47.html
comp.speech refs: [1]
Survey of the State of the Art in Human Language Technology: Spoken Input Technologies
URL: http://www.cse.ogi.edu/CSLU/HLTsurvey/ch1node2.html
comp.speech refs: [1]
Survey of the State of the Art in Human Language Technology: Text-to-Speech Technologies
URL: http://www.cse.ogi.edu/CSLU/HLTsurvey/ch5node1.html
comp.speech refs: [1]
Synapse: speech recognition sales
URL: http://www.synapseadaptive.com/
comp.speech refs: [1]
T. V. Raman's home page
URL: http://www.research.digital.com/CRL/personal/raman/raman.html
comp.speech refs: [1]
Talk Technology, Inc.: Speech recognition reseller
URL: http://www.usbusiness.com/talk/
comp.speech refs: [1]
Talk Technology: speech recognition reseller
URL: http://www.talktechnology.com/
comp.speech refs: [1]
Talking to a PC May Be Hazard To Your Throat, by Julie Chao
URL: http://www.bilbo.com/tae/bilbo/wsj.html
comp.speech refs: [1]
Talking to Computers Has its Hazards, by Gordon Arnaut
URL: http://www.bilbo.com/tae/bilbo/globmail.html
comp.speech refs: [1]
TCTS Home Page: MBROLA speech synthesis and SPRACH speech recognition
URL: http://tcts.fpms.ac.be/
comp.speech refs: [1]
TCTS-Multitel: Speech Synthesis research group home page
URL: http://tcts.fpms.ac.be/synthesis/synthesis.html
comp.speech refs: [1]
Text, Speech and Language Technology series
URL: http://kapis.www.wkap.nl/kapis/CGI-BIN/WORLD/series.htm?TLTB
comp.speech refs: [1]
The Acoustical Society of America (ASA) home page
URL: http://asa.aip.org/
comp.speech refs: [1]
The University of Sheffield
URL: http://www.shef.ac.uk/
comp.speech refs: [1]
Thierry Dutoit's home page
URL: http://tcts.fpms.ac.be/synthesis/dutoit.html
comp.speech refs: [1]
TMA Associates: speech technology consulting
URL: http://www.tmaa.com/
comp.speech refs: [1]
TMH: Institutionen för Talöverföring och Musikakustik (Department of Speech Communication and Music Acoustics), Kungliga Tekniska Högskolan (Royal Institute of Technology)
URL: http://www.speech.kth.se/info/software.html
comp.speech refs: [1]
T-Netix speaker verification for cellular communications
URL: http://www.t-netix.com/
comp.speech refs: [1]
Tony Robinson's home page
URL: http://svr-www.eng.cam.ac.uk/~ajr/
comp.speech refs: [1] - [2] - [3] - [4] - [5]
Tony Robinson's speech analysis course
URL: http://svr-www.eng.cam.ac.uk/~ajr/SA95/
comp.speech refs: [1] - [2]
Tony Robinson's speech analysis course: Filter bank analysis
URL: http://svr-www.eng.cam.ac.uk/~ajr/SA95/node17.html
comp.speech refs: [1]
Tony Robinson's speech analysis course: Formant analysis
URL: http://svr-www.eng.cam.ac.uk/~ajr/SA95/node61.html
comp.speech refs: [1]
Tony Robinson's speech analysis course: Linear prediction analysis
URL: http://svr-www.eng.cam.ac.uk/~ajr/SA95/node38.html
comp.speech refs: [1]
Tony Robinson's speech analysis course: Sampling theory
URL: http://svr-www.eng.cam.ac.uk/~ajr/SA95/node9.html
comp.speech refs: [1]
Tony Robinson's speech analysis course: Short-term Fourier analysis
URL: http://svr-www.eng.cam.ac.uk/~ajr/SA95/node21.html
comp.speech refs: [1]
Tony Robinson's speech analysis course: Speech coding
URL: http://svr-www.eng.cam.ac.uk/~ajr/SA95/node78.html
comp.speech refs: [1] - [2]
Tony Robinson's speech analysis course: Voicing analysis
URL: http://svr-www.eng.cam.ac.uk/~ajr/SA95/node68.html
comp.speech refs: [1]
ToolVox from Voxware
URL: http://www.voxware.com/
comp.speech refs: [1]
ToppCopy Telecom: Speech recognition reseller
URL: http://www.toppcopy.com/
comp.speech refs: [1]
Trainable text-to-phoneme software by Antonio Lucca
URL: http://www.silab.dsi.unimi.it/~al367212/ttsdoc.html
comp.speech refs: [1]
TrueSpeech capability for WWW pages
URL: http://www.dspg.com/webpage.htm
comp.speech refs: [1]
TrueSpeech from DSP Group
URL: http://www.dspg.com/index.html
comp.speech refs: [1]
TrueTalk from Entropic
URL: http://www.entropic.com/truetalk.html
comp.speech refs: [1]
TruVoice speech synthesis from Centigram
URL: http://www.centigram.com/centigram/Products/Technology/Truvoice/TruVoice_Brochure.html
comp.speech refs: [1]
TruVoice speech synthesis from Centigram
URL: http://www.centigram.com/centigram/TruVoice/index.html
comp.speech refs: [1] - [2]
TruVoice speech synthesis from Centigram
URL: http://www.centigram.com/
comp.speech refs: [1]
Typing Injuries Page
URL: http://alumni.caltech.edu/~dank/typing-archive.html
comp.speech refs: [1]
Typing Injury FAQ
URL: http://www.cs.princeton.edu:80/~dwallach/tifaq/
comp.speech refs: [1]
University of Edinburgh
URL: http://www.ed.ac.uk/
comp.speech refs: [1]
University of Victoria Phonetic Database
URL: http://www.speechtech.com/home/speechtech/csl3.html
comp.speech refs: [1]
VAULT Speaker Verification
URL: http://www.ImagineNation.com/Pavilion/Vault/Vault.htm
comp.speech refs: [1]
VAULT Speaker Verification FAQ
URL: http://www.ImagineNation.com/Xanadu/Vault/Vault.htm
comp.speech refs: [1]
VAULT Speaker Verification from ImagineNation
URL: http://www.ImagineNation.com/
comp.speech refs: [1]
Verbex demonstration speech recognition software
URL: http://www.verbex.com/demo.htm
comp.speech refs: [1]
Verbex Listen for Windows
URL: http://www.verbex.com/lfwspec.htm
comp.speech refs: [1]
Verbex: Listen for Windows speech recognition
URL: http://www.verbex.com/
comp.speech refs: [1] - [2]
Verbex speech recognition ordering page
URL: http://www.verbex.com/basicord.htm
comp.speech refs: [1]
Verbex Verbal Advantage DeskTop speech recognition
URL: http://www.verbex.com/aplncher.htm
comp.speech refs: [1]
Verbex Verbal Advantage Voice Browser speech recognition
URL: http://www.verbex.com/browser.htm
comp.speech refs: [1]
Verbmobil project home page
URL: http://www.dfki.uni-sb.de/verbmobil/
comp.speech refs: [1] - [2]
Visual Voice from Stylus Innovation
URL: http://www.stylus.com/
comp.speech refs: [1]
Visual Voice from Stylus Innovation
URL: http://www.stylus.com/stylus/part.htm
comp.speech refs: [1]
VMB/60: Voice Message Bank
URL: http://www.best.com:80/~vmb60/
comp.speech refs: [1]
Vocal Health information from the National Center for Voice and Speech
URL: http://www.shc.uiowa.edu/hygiene/home.html
comp.speech refs: [1]
Voice Control Systems: Speech recognition
URL: http://www.voicecontrol.com/
comp.speech refs: [1] - [2] - [3]
Voice E-Mail from Bonzi Software
URL: http://www.bonzi.com/
comp.speech refs: [1]
Voice Information Associates, Inc.
URL: http://www.tiac.net/users/asrnews/
comp.speech refs: [1]
Voice Processing Corporation Speech Recognition Product Line
URL: http://www.vpro.com/
comp.speech refs: [1]
Voice Users mailing list
URL: http://voicerecognition.com/voice-users/
comp.speech refs: [1]
VoiceCompanion - RemoteAccess from WildCard
URL: http://www.wildcardtech.com/speech/info/vcremote.htm
comp.speech refs: [1]
VoiceCompanion for the Internet from WildCard
URL: http://www.wildcardtech.com/vcibeta/beta2.htm
comp.speech refs: [1]
Voicetek Corp.
URL: http://www.voicetek.com/
comp.speech refs: [1]
VoiceWare Systems speech recognition resellers
URL: http://www.talk2type.com/home.htm
comp.speech refs: [1]
Votan VPC2100 Voice Card and VSP 1010 Speech Processor
URL: http://www.ti.com/sc/docs/dsps/develop/3rdparty/vot.htm
comp.speech refs: [1]
WATSON FlexTalk from AT&T Advanced Speech Products Group: WWW Demonstration
URL: http://www.att.com/aspg/demo.html
comp.speech refs: [1]
Wavelet Home Page
URL: http://www.mat.sbg.ac.at/~uhl/wav.html
comp.speech refs: [1]
WebPhone
URL: http://www.netspeak.com/about.html
comp.speech refs: [1]
WebPhone availability
URL: http://www.netspeak.com/getphone.html
comp.speech refs: [1]
Webster's dictionary online
URL: http://c.gp.cs.cmu.edu:5103/prog/webster
comp.speech refs: [1]
Webster's Revised Unabridged Dictionary, 1913
URL: http://humanities.uchicago.edu/forms_urest/webster.form.html
comp.speech refs: [1]
WebTalk from Quarterdeck
URL: http://www.quarterdeck.com/
comp.speech refs: [1]
Wildfire - an Electronic Assistant
URL: http://www.wildfire.com/
comp.speech refs: [1]
WinSpeech text-to-speech application
URL: http://www.pcww.com/index.html
comp.speech refs: [1]
WordNet home page
URL: http://www.cogsci.princeton.edu/~wn/
comp.speech refs: [1]
WordNet: WWW interface
URL: http://www.cogsci.princeton.edu/~wn/w3wn.html
comp.speech refs: [1]
WorkLink Dragon Systems reseller
URL: http://www.worklink.net/
comp.speech refs: [1]
Yahoo page on Signal and Image Processing
URL: http://www.yahoo.com/Science/Engineering/Electrical_Engineering/Signal_and_Image_Processing/
comp.speech refs: [1]
Yahoo page on speech generation/synthesis
URL: http://www.yahoo.com/Science/Computer_Science/Artificial_Intelligence/Natural_Language_Processing/Speech_Generation/
comp.speech refs: [1]
Yahoo page on Speech Recognition
URL: http://www.yahoo.com/business/corporations/computers/software/voice_recognition/
comp.speech refs: [1]
Yahoo page on speech recognition
URL: http://www.yahoo.com/Science/Computer_Science/Artificial_Intelligence/Natural_Language_Processing/Speech_Recognition/
comp.speech refs: [1]
Yamada Language Center Fonts
URL: http://babel.uoregon.edu/yamada/fonts.html
comp.speech refs: [1]
Yamada Language Center IPA fonts
URL: http://babel.uoregon.edu/yamada/fonts/phonetic.html
comp.speech refs: [1]
Yamada Language Center (Phonetic fonts)
URL: http://babel.uoregon.edu/yamada.html
comp.speech refs: [1]
Yamada Language Center windows fonts
URL: http://babel.uoregon.edu/yamada/winfonts.html
comp.speech refs: [1]
ZMD PCMCIA Speech Synthesis Card
URL: http://www.zmd-gmbh.de/assps/u2450app.htm
comp.speech refs: [1]
ZMD "Speaky" Speech Synthesis
URL: http://www.zmd-gmbh.de/assps/u2450.htm
comp.speech refs: [1]
ZMD: Zentrum Mikroelektronik Dresden speech synthesis
URL: http://www.zmd-gmbh.de/
comp.speech refs: [1]

FTP Links
A comprehensive list of American words
URL: ftp://wocket.vantage.gte.com/pub/standard_dictionary
comp.speech refs: [1]
AbbotDemo speech recognition software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/AbbotDemo
comp.speech refs: [1]
AF Audio Networking System
URL: ftp://crl.dec.com/pub/DEC/AF
comp.speech refs: [1]
Answers to Frequently Asked Questions about Usenet
URL: ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/Answers_to_Frequently_Asked_Questions_about_Usenet
comp.speech refs: [1] - [2]
Aria Soundcard FAQ
URL: ftp://rtfm.mit.edu/pub/usenet/comp.sys.ibm.pc.soundcard.misc/Aria_Soundcard_FAQ_v1.05
comp.speech refs: [1]
Aria Soundcard Support List
URL: ftp://rtfm.mit.edu/pub/usenet/comp.sys.ibm.pc.soundcard.misc/Aria_Soundcard_Support_List_v2.09
comp.speech refs: [1]
Audio file format conversion for G.723, G.721, A-law, u-law and linear
URL: ftp://ftp.cwi.nl/pub/audio/ccitt-adpcm.tar.Z
comp.speech refs: [1]
Audio file formats guide by Guido van Rossum
URL: ftp://ftp.cwi.nl/pub/audio/
comp.speech refs: [1]
Auditory Toolbox for Matlab
URL: ftp://ftp.apple.com/pub/malcolm
comp.speech refs: [1]
BEEP pronunciation dictionary
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/dictionaries/beep-0.7.README
comp.speech refs: [1]
BEEP pronunciation dictionary
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/dictionaries/beep.tar.gz
comp.speech refs: [1]
Brill part of speech tagger
URL: ftp://ftp.cs.jhu.edu/pub/brill/
comp.speech refs: [1]
Brill part of speech tagger: data and utilities
URL: ftp://ftp.cs.jhu.edu/pub/brill/Misc/
comp.speech refs: [1]
Brill part of speech tagger: papers and descriptions
URL: ftp://ftp.cs.jhu.edu/pub/brill/Papers/
comp.speech refs: [1]
Brill part of speech tagger: software
URL: ftp://ftp.cs.jhu.edu/pub/brill/Programs/
comp.speech refs: [1]
CELP 3.2a and LPC-10
URL: ftp://ftp.super.org/pub/speech/celp_3.2a.tar.Z
comp.speech refs: [1]
CELP speech coding software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/celp_3.2a.tar.gz
comp.speech refs: [1]
CELP speech coding software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/celp_3.2a.tar.Z
comp.speech refs: [1]
Center for Spoken Language Understanding (CSLU): speech database
URL: ftp://speech.cse.ogi.edu/pub/releases
comp.speech refs: [1]
CMU dictionary
URL: ftp://ftp.cs.cmu.edu/project/fgdata/dict/
comp.speech refs: [1]
comp.ai FAQ
URL: ftp://rtfm.mit.edu/pub/usenet/comp.ai/
comp.speech refs: [1] - [2]
comp.compression FAQ
URL: ftp://rtfm.mit.edu/pub/usenet/comp.compression/
comp.speech refs: [1] - [2]
comp.dsp FAQ
URL: ftp://rtfm.mit.edu/pub/usenet/comp.dsp/
comp.speech refs: [1] - [2]
comp.speech FAQ (text version)
URL: ftp://rtfm.mit.edu/pub/usenet/comp.speech/
comp.speech refs: [1] - [2] - [3]
comp.speech FAQ (text version)
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/FAQ-complete
comp.speech refs: [1] - [2] - [3]
comp.speech ftp site
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/
comp.speech refs: [1] - [2]
comp.speech ftp site: Analysis software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/analysis/
comp.speech refs: [1] - [2]
comp.speech ftp site: Auditory modelling software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/auditory/
comp.speech refs: [1] - [2]
comp.speech ftp site: dictionaries and lexical resources
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/dictionaries/
comp.speech refs: [1] - [2]
comp.speech ftp site: speech coding software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/
comp.speech refs: [1] - [2]
comp.speech ftp site: speech data
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/data/
comp.speech refs: [1] - [2]
comp.speech ftp site: speech processing tools and software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/tools/
comp.speech refs: [1]
comp.speech ftp site: speech recognition software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/
comp.speech refs: [1] - [2] - [3]
comp.speech ftp site: speech synthesis software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/
comp.speech refs: [1] - [2]
comp.speech ftp site: Useful information
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/
comp.speech refs: [1]
comp.speech newsgroup archives
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/archive/
comp.speech refs: [1] - [2]
comp.sys.ibm.pc.soundcard.misc newsgroup FAQs
URL: ftp://rtfm.mit.edu/pub/usenet/comp.sys.ibm.pc.soundcard.misc/
comp.speech refs: [1]
CyberPhone internet voice communication
URL: ftp://magenta.com/pub/cyberphone
comp.speech refs: [1]
Digital Dreams Speech Recognition Plug-Ins
URL: ftp://ftp.surftalk.com/
comp.speech refs: [1]
Do-it-yourself speech recognition
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/DIY_SpeechRecognition
comp.speech refs: [1]
EARS speech recognition software
URL: ftp://sunsite.unc.edu/pub/Linux/apps/sound/speech/ears-0.26.tar.gz
comp.speech refs: [1]
EARS speech recognition software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/ears-0.26.tar.gz
comp.speech refs: [1]
ECTL mailing list archives
URL: ftp://snowhite.cis.uoguelph.ca/pub/ectl
comp.speech refs: [1]
Elan Informatique
URL: ftp://ftp.elan.fr/
comp.speech refs: [1]
Elan Informatique: Proverbe documentation
URL: ftp://ftp.elan.fr/Voice_products/Text-To-Speech_Synthesis_Products/ProVerbe_Speech_Engine/SDKEN.DOC
comp.speech refs: [1]
FAQ: How can I use the Internet as a telephone?
URL: ftp://rtfm.mit.edu/pub/usenet/alt.internet.services/FAQ:_How_can_I_use_the_Internet_as_a_telephone?
comp.speech refs: [1]
FAQs about FAQs
URL: ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/FAQs_about_FAQs
comp.speech refs: [1]
Festival Speech Synthesis System: source
URL: ftp://ftp.cstr.ed.ac.uk/pub/festival/1.1.1/
comp.speech refs: [1]
FFT Software
URL: ftp://ftp.coast.net/simtel/msdos/c/mixfft03.zip
comp.speech refs: [1]
FFT Software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/analysis/fft-stuff.tar.gz
comp.speech refs: [1]
FFT Software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/analysis/mixfft03.zip
comp.speech refs: [1]
FFT Software
URL: ftp://usc.edu/pub/C-numanal/fft-stuff.tar.gz
comp.speech refs: [1]
ftp FAQ
URL: ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/Anonymous_FTP:_Frequently_Asked_Questions_(FAQ)_List
comp.speech refs: [1]
G711, G721, G723 speech coding software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/G711_G721_G723.tar.Z
comp.speech refs: [1]
G.728 CELP Compression
URL: ftp://dspsun.eas.asu.edu/pub/speech/ldcelp.tgz
comp.speech refs: [1]
GSM 06.10 Compression
URL: ftp://ftp.cs.tu-berlin.de/pub/local/kbs/tubmik/gsm/ddj/gsm-107.zip
comp.speech refs: [1]
GSM 06.10 Compression
URL: ftp://ftp.cs.tu-berlin.de/pub/local/kbs/tubmik/gsm/gsm-1.0.7.tar.gz
comp.speech refs: [1]
GSM 06.10 Compression
URL: ftp://ftp.mv.com/pub/ddj/1994.12/gsm-105.zip
comp.speech refs: [1]
Hadifix speech synthesis demo software
URL: ftp://asl1.ikp.uni-bonn.de/pub/hadifix/hadi25.lzh
comp.speech refs: [1]
Hadifix speech synthesis demo software
URL: ftp://asl1.ikp.uni-bonn.de/pub/hadifix/hadidemo.zip
comp.speech refs: [1]
Homophone list
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/dictionaries/homophones-1.01.txt
comp.speech refs: [1]
Human Audio Perception document
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/HumanAudioPerception
comp.speech refs: [1]
Internet Phone from VocalTec
URL: ftp://ftp.vocaltec.com/pub/iphone09.exe
comp.speech refs: [1]
IPA for LaTeX: Washington State University International Phonetic Alphabet fonts
URL: ftp://ftp.digital.com/pub/text/TeX/fonts/wsuipa/
comp.speech refs: [1]
IPA for LaTeX: Washington State University International Phonetic Alphabet fonts
URL: ftp://ftp.wustl.edu/packages/TeX/fonts/wsuipa/
comp.speech refs: [1]
Jialong He's Speaker Recognition (Identification) Research Tool: MSDOS version (Germany)
URL: ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/spkrtool.zip
comp.speech refs: [1]
Jialong He's Speaker Recognition (Identification) Research Tool: MSDOS version (UK)
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spkrtool.zip
comp.speech refs: [1]
Jialong He's Speaker Recognition (Identification) Research Tool: Sun version (UK)
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spkr_sun_v1.tar.gz
comp.speech refs: [1]
Jialong He's Speaker Recognition (Identification) Tool: Sun version (Germany)
URL: ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/speaker_sun_v1.tar.gz
comp.speech refs: [1]
Jialong He's Speech Recognition Research Tool: MSDOS version (Germany)
URL: ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/spchtool.zip
comp.speech refs: [1]
Jialong He's Speech Recognition Research Tool: MSDOS version (UK)
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spchtool.zip
comp.speech refs: [1]
Jialong He's Speech Recognition Research Tool: Sun version (Germany)
URL: ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/speech_sun_v1.tar.gz
comp.speech refs: [1]
Jialong He's Speech Recognition Research Tool: Sun version (UK)
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spch_sun_v1.tar.gz
comp.speech refs: [1]
John Holdsworth's Auditory Modeller
URL: ftp://ftp.mrc-apu.cam.ac.uk/pub/aim
comp.speech refs: [1]
Khoros signal and image processing environment from Khoral Research Inc.
URL: ftp://ftp.khoral.com/
comp.speech refs: [1]
Klatt speech synthesis software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/klatt.3.04.tar.gz
comp.speech refs: [1]
Klatt speech synthesis software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/klatt.3.04.tar.Z
comp.speech refs: [1]
KPE80 - A Klatt Synthesiser and Parameter Editor
URL: ftp://ftp.phon.ucl.ac.uk/pub/kpe/kpe80.linux.tar.Z
comp.speech refs: [1]
KPE80 - A Klatt Synthesiser and Parameter Editor
URL: ftp://ftp.phon.ucl.ac.uk/pub/kpe/kpe80.src.tar.Z
comp.speech refs: [1]
KPE80 - A Klatt Synthesiser and Parameter Editor
URL: ftp://ftp.phon.ucl.ac.uk/pub/kpe/kpe80.sun413.tar.Z
comp.speech refs: [1]
KPE80 - A Klatt Synthesiser and Parameter Editor
URL: ftp://ftp.phon.ucl.ac.uk/pub/kpe/OVERVIEW.ps
comp.speech refs: [1]
Lists of speech recognition products posted to comp.speech
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/SpeechRecognitionProducts
comp.speech refs: [1]
Lotec speech recognition software
URL: ftp://ftp.sanpo.t.u-tokyo.ac.jp/pub/nigel/lotec/lotec.tar.Z
comp.speech refs: [1]
Lowel O'Mard's Auditory Modeller
URL: ftp://suna.lut.ac.uk/public/hulpo/lutear
comp.speech refs: [1]
LPC-10 speech coding software
URL: ftp://ftp.super.org/pub/speech/lpc10-1.0.tar.gz
comp.speech refs: [1]
Math Works Inc.: Matlab plus Signal Processing Toolbox
URL: ftp://ftp.mathworks.com
comp.speech refs: [1]
Matlab Sound and Image Toolbox
URL: ftp://ftp.apple.com/pub/malcolm/SoundAndImageToolbox.cpt.hqx
comp.speech refs: [1]
Midi files information
URL: ftp://rtfm.mit.edu/pub/usenet/comp.sys.ibm.pc.soundcard.misc/Midi_files_software_archives_on_the_Internet
comp.speech refs: [1]
Mirror of SIMTEL sound directory
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/simtel_sound/
comp.speech refs: [1] - [2]
Mirror of SIMTEL voice directory
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/simtel_voice/
comp.speech refs: [1] - [2]
MixViews Unix sound editor
URL: ftp://foxtrot.ccmrc.ucsb.edu/pub/MixViews/
comp.speech refs: [1]
Moby Hyphenator: 185,000 entries fully hyphenated
URL: ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/mhyph.tar.Z
comp.speech refs: [1]
Moby Language: Word lists in five major languages
URL: ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/mlang.tar.Z
comp.speech refs: [1]
Moby lexical resources
URL: ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/
comp.speech refs: [1]
Moby Part-of-Speech: 230,000 entries with part(s) of speech listed in priority order
URL: ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/mpos.tar.Z
comp.speech refs: [1]
Moby Pronunciator: 175,000 entries fully International Phonetic Alphabet coded
URL: ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/mpron.tar.Z
comp.speech refs: [1]
Moby Shakespeare: The complete unabridged works of Shakespeare
URL: ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/mshak.tar.Z
comp.speech refs: [1]
Moby Thesaurus: 30,000 root words, 2.5 million synonyms and related words
URL: ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/mthes.tar.Z
comp.speech refs: [1]
Moby Words: 610,000+ words and phrases
URL: ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/mwords.tar.Z
comp.speech refs: [1]
MPEG-1 and MPEG-2 audio software from Universitaet Hannover
URL: ftp://ftp.tnt.uni-hannover.de/pub/MPEG/audio/
comp.speech refs: [1]
MPEG-1 Audio Layer 1 & 2 decoder and verifier at CCETT
URL: ftp://ftp.ccett.fr/pub/mpeg/audio_new/
comp.speech refs: [1]
MPEG-1 Audio Layer 1 & 2 encoder/decoder
URL: ftp://ftp.iuma.com/audio_utils/converters/source/
comp.speech refs: [1]
MPEG-2 Audio encoder and decoder at CCETT
URL: ftp://ftp.ccett.fr/pub/mpeg/mpeg2/
comp.speech refs: [1]
MRC Psycholinguistic Database
URL: ftp://ota.ox.ac.uk/pub/ota/public/dicts/1054/
comp.speech refs: [1]
MRC Psycholinguistic Database and Oxford Advanced Learner's Dictionary
URL: ftp://ota.ox.ac.uk/pub/ota/public/dicts/info
comp.speech refs: [1]
MRC Psycholinguistic Database: Readme
URL: ftp://ota.ox.ac.uk/pub/ota/public/dicts/1054/readme.
comp.speech refs: [1]
Myers' Hidden Markov Model software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/hmm-1.03.tar.gz
comp.speech refs: [1]
Myers' Hidden Markov Model software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/hmm.README
comp.speech refs: [1]
Narrator Translator Library
URL: ftp://ftp.doc.ic.ac.uk/pub/aminet/dev/src/tran42src.lha
comp.speech refs: [1]
Narrator Translator Library
URL: ftp://ftp.doc.ic.ac.uk/pub/aminet/util/libs/translator42.lha
comp.speech refs: [1]
Natural Language Processing FAQ
URL: ftp://rtfm.mit.edu/pub/usenet/comp.ai.nat-lang/Natural_Language_Processing_FAQ
comp.speech refs: [1] - [2]
Natural Language Software Registry
URL: ftp://crlftp.nmsu.edu/pub/non-lexical/NL_Software_Registy
comp.speech refs: [1]
Natural Language Software Registry
URL: ftp://dri.cornell.edu/pub/Natural_Language_Software_Registry
comp.speech refs: [1]
Natural Language Software Registry
URL: ftp://ftp.dfki.uni-sb.de/pub/registry
comp.speech refs: [1]
Nautilus: Secure Computer Telephony (access instructions)
URL: ftp://ftp.csn.org/mpj/README
comp.speech refs: [1]
Nautilus: Secure Computer Telephony (access instructions)
URL: ftp://ripem.msu.edu/pub/crypt/README
comp.speech refs: [1]
Network Audio System
URL: ftp://ftp.x.org:/contrib/audio/nas
comp.speech refs: [1]
Neural Networks FAQ
URL: ftp://rtfm.mit.edu/pub/usenet/comp.ai.neural-nets/
comp.speech refs: [1]
NEVOT audio conferencing tool
URL: ftp://gaia.cs.umass.edu/pub/hgschulz/nevot
comp.speech refs: [1]
NIST SPeech HEader REsources Package (SPHERE)
URL: ftp://jaguar.ncsl.nist.gov/pub/sphere_2.5.tar.Z
comp.speech refs: [1]
NIST SPeech HEader REsources Package (SPHERE)
URL: ftp://jaguar.ncsl.nist.gov/pub/sphere.README
comp.speech refs: [1]
NIST Speech Recognition Scoring Package
URL: ftp://jaguar.ncsl.nist.gov/pub/score_3.6.2.tar.Z
comp.speech refs: [1]
NIST Speech Recognition Scoring Package
URL: ftp://jaguar.ncsl.nist.gov/pub/score.README
comp.speech refs: [1]
Numerical analysis software: including FFT
URL: ftp://usc.edu/pub/C-numanal/
comp.speech refs: [1]
OGI Speech Tools
URL: ftp://speech.cse.ogi.edu/pub/tools/
comp.speech refs: [1]
Oxford Advanced Learner's Dictionary
URL: ftp://ota.ox.ac.uk/pub/ota/public/dicts/710/
comp.speech refs: [1]
Oxford Advanced Learner's Dictionary: documentation
URL: ftp://ota.ox.ac.uk/pub/ota/public/dicts/710/text710.doc
comp.speech refs: [1]
PAM: a talking personal assistant and text reader application
URL: ftp://ftp.islandnet.com/jts/pam_en3c.zip
comp.speech refs: [1]
Personal TrueTalk from Entropic
URL: ftp://ftp.entropic.com/pub/truetalk/README.ptt
comp.speech refs: [1]
Phonemic Samples
URL: ftp://phloem.uoregon.edu/pub/Sun4/lib/phonemes
comp.speech refs: [1]
Phonemic Samples
URL: ftp://sunsite.unc.edu/pub/multimedia/sun-sounds/phonemes
comp.speech refs: [1]
Ptolemy signal processing software
URL: ftp://ptolemy.berkeley.edu/pub/
comp.speech refs: [1]
recnet: recurrent neural network speech recognition software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/recnet-1.3.tar.Z
comp.speech refs: [1]
rsynth: speech synthesis software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/rsynth-2.0.tar.gz
comp.speech refs: [1]
rsynth: speech synthesis software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/rsynth-2.0.tar.Z
comp.speech refs: [1]
Rules for posting to Usenet
URL: ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/Rules_for_posting_to_Usenet
comp.speech refs: [1]
sci.lang FAQ
URL: ftp://rtfm.mit.edu/pub/usenet/sci.lang
comp.speech refs: [1]
ShATR: A Multi-simultaneous-speaker corpus
URL: ftp://ftp.dcs.shef.ac.uk/share/spandh/ShATR/
comp.speech refs: [1]
shorten audio file compression software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/shorten.tar.gz
comp.speech refs: [1]
shorten audio file compression software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/shorten.tar.Z
comp.speech refs: [1]
shorten audio file compression software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/shorten.zip
comp.speech refs: [1]
Signal End-Point Detection software
URL: ftp://ftp.isip.msstate.edu/pub/software/signal_detector/sigd_v2.2.tar.gz
comp.speech refs: [1]
Silicon Graphics audio Frequently Asked Questions (FAQ)
URL: ftp://viz.tamu.edu/pub/sgi/faq/
comp.speech refs: [1]
Simon Says speech recognition for NeXT
URL: ftp://ftp.informatik.uni-muenchen.de/pub/comp/platforms/next/Audio/audio-apps/SimonSaysDemo.1.5.1.N.b.tar.gz
comp.speech refs: [1]
Simon Says speech recognition for NeXT
URL: ftp://ftp.informatik.uni-muenchen.de/pub/comp/platforms/next/Audio/audio-apps/SimonSaysDemo.1.5.1.README
comp.speech refs: [1]
SIMTEL speech software
URL: ftp://ftp.coast.net/SimTel/msdos/voice/
comp.speech refs: [1]
spchsyn.exe: Speech synthesis
URL: ftp://evans.ee.adfa.oz.au/mirrors/tibbs/applications/spchsyn.exe
comp.speech refs: [1]
"Speak" - a Text to Speech Program
URL: ftp://wilma.cs.brown.edu/pub/speak.tar.Z
comp.speech refs: [1]
Speech End-point detection software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/tools/ep.1.0.tar.gz
comp.speech refs: [1]
Speech File Formats guide by Guido van Rossum
URL: ftp://ftp.cwi.nl/pub/audio/index.html
comp.speech refs: [1]
Speech Filing System (SFS)
URL: ftp://ftp.phon.ucl.ac.uk/pub/sfs/
comp.speech refs: [1] - [2]
Speech Filing System (SFS)
URL: ftp://ftp.phon.ucl.ac.uk/pub/sfs/README
comp.speech refs: [1]
Speech Manager and PlainTalk
URL: ftp://ftp.support.apple.com/pub/apple_sw_updates/US/Macintosh/System/PlainTalk%201.4.1/
comp.speech refs: [1] - [2]
Speech Manager and PlainTalk Spanish speech synthesis
URL: ftp://ftp.support.apple.com/pub/apple_sw_updates/US/Macintosh/System/PlainTalk%201.4.1/Mexican_Spanish_TTS.hqx
comp.speech refs: [1] - [2]
Speech Manager and PlainTalk: Speech Recognition
URL: ftp://ftp.support.apple.com/pub/apple_sw_updates/US/Macintosh/System/PlainTalk%201.4.1/English_Speech_Recognition.hqx
comp.speech refs: [1] - [2]
Speech Manager and PlainTalk Speech Synthesis
URL: ftp://ftp.support.apple.com/pub/apple_sw_updates/US/Macintosh/System/PlainTalk%201.4.1/English_Text-to-Speech.hqx
comp.speech refs: [1] - [2]
StarAudio Compressor/Player technical documentation
URL: ftp://ftp.speechtech.com/pub/speechtech/docs/audocw60.exe
comp.speech refs: [1]
Summer Institute of Linguistics IPA Fonts (for Mac)
URL: ftp://ftp.sil.org/fonts/mac/silipa12.sea_hqx
comp.speech refs: [1]
Summer Institute of Linguistics IPA Fonts (for Windows)
URL: ftp://ftp.sil.org/fonts/win/silip12a.exe
comp.speech refs: [1]
TCPPlay: a Mac-based audio server
URL: ftp://worldserver.com/pub/malcolm/TcpPlay.sit.hqx
comp.speech refs: [1]
TCPPlay: Macintosh audio server
URL: ftp://ftp.apple.com/pub/malcolm/TcpPlay.sit.hqx
comp.speech refs: [1]
Text to phoneme program
URL: ftp://shark.cse.fau.edu/pub/src/phon.tar.Z
comp.speech refs: [1]
Text to phoneme program
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/english2phoneme.tar.gz
comp.speech refs: [1]
Text to phoneme software
URL: ftp://ftp.doc.ic.ac.uk/packages/unix-c/utils/phoneme.c.gz
comp.speech refs: [1]
The Big Mouth: NeXT speech synthesizer
URL: ftp://ftp.cs.keio.ac.jp/pub/NeXT/source/TheBigMouth1.0.tar.Z
comp.speech refs: [1]
Tinytalk shareware screen reader
URL: ftp://ftp.netcom.com/pub/eb/ebohlman/
comp.speech refs: [1]
TIPA: LaTeX IPA font
URL: ftp://tooyoo.L.u-tokyo.ac.jp/pub/TeX/tipa/
comp.speech refs: [1]
TIPA: LaTeX IPA font manual
URL: ftp://tooyoo.L.u-tokyo.ac.jp/pub/TeX/tipa/tipaman.ps
comp.speech refs: [1]
Turtle Beach sound cards FAQ
URL: ftp://rtfm.mit.edu/pub/usenet/comp.sys.ibm.pc.soundcard.misc/Turtle_Beach_sound_cards_FAQ
comp.speech refs: [1]
Voice Problems, Prevention and Correction
URL: ftp://ftp.csua.berkeley.edu/pub/typing-injury/voice-problems
comp.speech refs: [1]
Voice Recognition Processors document
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/VoiceRecognitionProcessors
comp.speech refs: [1] - [2] - [3]
Voicemaker speech synthesis
URL: ftp://ftp.coast.net/SimTel/msdos/voice/vm110.zip
comp.speech refs: [1]
What is Usenet?
URL: ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/What_is_Usenet?
comp.speech refs: [1]
WordNet: home page
URL: ftp://clarity.princeton.edu/pub/wordnet/
comp.speech refs: [1]
WordNet: README
URL: ftp://clarity.princeton.edu/pub/wordnet/README
comp.speech refs: [1]
WordNet: Technical Papers
URL: ftp://clarity.princeton.edu/pub/wordnet/5papers.ps
comp.speech refs: [1]
WreadFiles: File reader for Commodore Amiga
URL: ftp://ftp.wustl.edu/pub/aminet/util/misc/WreadFiles47.lha
comp.speech refs: [1]
Yamada Language Center phonetic fonts
URL: ftp://yftp@www-vms.uoregon.edu/fonts
Refs: [1]

Newsgroups
Artificial Intelligence newsgroup
URL: news:comp.ai
comp.speech refs: [1]
Compression newsgroup
URL: news:comp.compression
comp.speech refs: [1]
Digital signal processing newsgroup
URL: news:comp.dsp
comp.speech refs: [1] - [2]
FAQ postings for all newsgroups
URL: news:news.answers
comp.speech refs: [1]
FAQ postings for computer-related newsgroups
URL: news:comp.answers
comp.speech refs: [1]
GUS soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard.GUS
comp.speech refs: [1]
Language newsgroup
URL: news:sci.lang
Refs: [1]
Multimedia newsgroup
URL: news:comp.multimedia
comp.speech refs: [1]
Natural Language Knowledge Representation newsgroup
URL: news:comp.ai.nlang-know-rep
comp.speech refs: [1]
Natural Language Processing newsgroup
URL: news:comp.ai.nat-lang
comp.speech refs: [1]
Neural network newsgroup
URL: news:comp.ai.neural-nets
comp.speech refs: [1]
Newsgroup for discussion of speech production and perception
URL: news:alt.sci.physics.acoustics
comp.speech refs: [1]
Newsgroup for new users of news
URL: news:news.announce.newusers
comp.speech refs: [1] - [2]
Silicon Graphics Systems newsgroup
URL: news:comp.sys.sgi.misc
comp.speech refs: [1]
Soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard.advocacy
comp.speech refs: [1]
Soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard
comp.speech refs: [1]
Soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard.misc
comp.speech refs: [1]
Soundcard technical discussion group
URL: news:comp.sys.ibm.pc.soundcard.tech
comp.speech refs: [1]
Soundcards and Games newsgroup
URL: news:comp.sys.ibm.pc.soundcard.games
comp.speech refs: [1]
Soundcards and Music newsgroup
URL: news:comp.sys.ibm.pc.soundcard.music
comp.speech refs: [1]
Speech technology newsgroup
URL: news:comp.speech
comp.speech refs: [1] - [2] - [3] - [4]
Telecommunications newsgroup
URL: news:comp.dcom.telecom
comp.speech refs: [1]

Administrivia, Copyright, Submit Information : Last Revision: 14:18 02-Oct-1997



SpeechLinks: General

SpeechLinks - General
Speech Technology Hyperlinks Page
A list of hyperlinks from the comp.speech FAQ related to general speech technology matters. Links are provided to
WWW references, ftp sites, and newsgroups. Cross-references to the comp.speech WWW pages are also provided.

SpeechLinks Pages

SpeechLinks: The Complete List: 500+ speech technology links


SpeechLinks: Signal Processing for Speech
SpeechLinks: Speech Coding
SpeechLinks: Speech Synthesis
SpeechLinks: Speech Recognition

comp.speech WWW Availability

Australia: Sydney University


Britain: Cambridge University
Japan: ATR Interpreting Telecommunications Research Laboratories
USA: Carnegie Mellon University

WWW Links
Academic Press Limited: Computer Speech and Language Journal
URL: http://www.apnet.com/
comp.speech refs: [1]
AF audio networking software
URL: http://www.research.digital.com/CRL/projects/AF/home.html
comp.speech refs: [1]
American National Standards Institute (ANSI)
URL: http://www.ansi.org/
comp.speech refs: [1]
American Voice Input/Output Society (AVIOS) home page
URL: http://www.avios.com/
comp.speech refs: [1]
Association for Computational Linguistics (ACL) home page
URL: http://www.cs.columbia.edu:80/~acl/
comp.speech refs: [1]

http://mi.eng.cam.ac.uk/comp.speech/Section1/speechlinks.html (1 of 17) [10/31/2003 8:41:44 AM]


ASSTA: Australian Speech Science and Technology home page
URL: http://cslab.anu.edu.au/~bruce/assta/
comp.speech refs: [1]
ASSTA: List of Members
URL: http://ciips.ee.uwa.edu.au/~roberto/assta-users/
comp.speech refs: [1]
AT&T Advanced Speech Products Group home page
URL: http://www.att.com/aspg/
comp.speech refs: [1] - [2] - [3]
Auditory modeling information on Malcolm Slaney's home page
URL: http://www.interval.com/~malcolm/pubs.html
comp.speech refs: [1]
AVAAZ Home Page
URL: http://www.icis.on.ca/homepages/avaaz/
comp.speech refs: [1] - [2]
Bavarian Archive for Speech Signals
URL: http://www.phonetik.uni-muenchen.de/BASSeng.html
comp.speech refs: [1]
Center for Spoken Language Understanding (CSLU) at the Oregon Graduate Institute of Science and Technology
URL: http://www.cse.ogi.edu/CSLU/
comp.speech refs: [1] - [2]
Centre for Cognitive Science at the University of Edinburgh
URL: http://www.cogsci.ed.ac.uk/ccs/home.html
comp.speech refs: [1]
CMU dictionary on the WWW
URL: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
comp.speech refs: [1] - [2]
COCOSDA Home page
URL: http://www.itl.atr.co.jp/cocosda/
comp.speech refs: [1]
Cognitive Science Laboratory at Princeton University
URL: http://www.cogsci.princeton.edu/
comp.speech refs: [1]
Colibri mailing list
URL: http://colibri.let.ruu.nl/
comp.speech refs: [1]
comp.dsp newsgroup FAQ
URL: http://www.bdti.com/faq/dsp_faq.htm
comp.speech refs: [1] - [2] - [3]
Comprehensive list of WWW dictionaries, acronym lists, translation resources, and a thesaurus
URL: http://galaxy.einet.net/galaxy/Reference-and-Interdisciplinary-Information/Dictionaries-etc.html
comp.speech refs: [1]
Computational Linguistics journal home page
URL: http://www-mitpress.mit.edu/jrnls-catalog/comp-ling.html
comp.speech refs: [1]
Computational Phonology: special issue of Computational Linguistics
URL: http://mitpress.mit.edu/jrnls-catalog/comp-ling-abstracts/comp-ling20-3.html
comp.speech refs: [1]
Computing and Information Systems Department (CISD) of Rutherford Appleton Laboratory, UK
URL: http://www.cis.rl.ac.uk/index.html
comp.speech refs: [1]
CUSeeMe Audio and Video conferencing software
URL: http://cu-seeme.cornell.edu/
comp.speech refs: [1]
CUSeeMe Audio and Video conferencing software
URL: http://cu-seeme.cornell.edu/get_cuseeme.html
comp.speech refs: [1]
CUSeeMe Audio and Video conferencing software
URL: http://cu-seeme.cornell.edu/PC.CU-SeeMeCurrent.html
comp.speech refs: [1]
CyberPhone home page
URL: http://magenta.com/cyberphone/
comp.speech refs: [1]
DADiSP from DSP Development Corporation
URL: http://www.dadisp.com/
comp.speech refs: [1]
DADiSP from DSP Development Corporation: application to speech processing
URL: http://www.dadisp.com/ab2.htm
comp.speech refs: [1]
DADiSP from DSP Development Corporation: detailed information
URL: http://www.dadisp.com/contact.htm
comp.speech refs: [1]
DADiSP from DSP Development Corporation: free demo software
URL: http://www.dadisp.com/download.htm
comp.speech refs: [1]
DADiSP from DSP Development Corporation: free student edition
URL: http://www.dadisp.com/studntdl.htm
comp.speech refs: [1]
Dept. of Psychology, University of Western Australia
URL: http://www.psy.uwa.edu.au/
comp.speech refs: [1]
DigiPhone availability
URL: http://www.planeteers.com/retail.htm
comp.speech refs: [1]
DigiPhone Deluxe
URL: http://www.planeteers.com/digiphon/dpsr.htm
comp.speech refs: [1]
DigiPhone for Mac
URL: http://www.planeteers.com/digifone/digimac.htm
comp.speech refs: [1]
DigiPhone for Third Planet Publishing
URL: http://www.planeteers.com/
comp.speech refs: [1]
DigiPhone Global Directory of users
URL: http://www.planeteers.com/digiphon/global.htm
comp.speech refs: [1]
DigiPhone trial download page
URL: http://www.planeteers.com/download/download.htm
comp.speech refs: [1]
DigiPhone v1.03
URL: http://www.planeteers.com/digiphon/dpjr.htm
comp.speech refs: [1]
Digital Signal Processing (DSP) group at Rice University
URL: http://www-dsp.rice.edu/
comp.speech refs: [1]
Duncan M. Forrest's Speech Recognition Resource List
URL: http://www.skye.co.za/dmf/speech/
comp.speech refs: [1]
Dynastat, Inc: Speech Intelligibility and Quality Testing
URL: http://www.bga.com/dynastat/
comp.speech refs: [1]
Edinburgh Associative Thesaurus
URL: http://www.cis.rl.ac.uk/proj/psych/eat/eat/
comp.speech refs: [1]
Edinburgh Associative Thesaurus: WWW Interactive version
URL: http://www.cis.rl.ac.uk/proj/psych/eat.html
comp.speech refs: [1]
Elsevier Science: Speech Communication journal
URL: http://www.elsevier.com/
comp.speech refs: [1]
Entropic Research Laboratory home page
URL: http://www.entropic.com/
comp.speech refs: [1] - [2] - [3]
Entropic Signal Processing System (ESPS)
URL: http://www.entropic.com/esps.html
comp.speech refs: [1]
ESCA: European Speech Communication Association list of research sites
URL: http://ophale.icp.grenet.fr/esca/labos.html
comp.speech refs: [1]
European Language Resources Association
URL: http://www.icp.grenet.fr/ELRA/home.html
comp.speech refs: [1]
European Speech Communication Association (ESCA) home page
URL: http://ophale.icp.grenet.fr/esca/esca.html
comp.speech refs: [1]
FAQ: How can I use the Internet as a telephone?
URL: http://rpcp.mit.edu/~asears/voice-faq.html
comp.speech refs: [1]
Free Speech Journal
URL: http://www.cse.ogi.edu/CSLU/fsj/html/masthead.html
comp.speech refs: [1]
George L. Dillon's Consonant sounds of English
URL: http://weber.u.washington.edu/~dillon/consonants.html
comp.speech refs: [1]
George L. Dillon's list of phonetic resources
URL: http://weber.u.washington.edu/~dillon/PhonResources.html
comp.speech refs: [1]
George L. Dillon's Vowel Quadrilaterals for American and British English
URL: http://weber.u.washington.edu/~dillon/newstart.html
comp.speech refs: [1]
George L. Dillon's Vowel sounds of American English
URL: http://weber.u.washington.edu/~dillon/vowels.html
comp.speech refs: [1]
GoldWave digital audio editor for Microsoft Windows
URL: http://web.cs.mun.ca/~chris3/goldwave/goldwave.html
comp.speech refs: [1]
ICSLP '98: Sydney, Australia
URL: http://cslab.anu.edu.au/icslp98/
comp.speech refs: [1]
IEEE Home Page
URL: http://www.ieee.org/
comp.speech refs: [1]
IEEE Signal Processing Society
URL: http://www.ieee.org/sp/index.html
comp.speech refs: [1]
Institute for Language Speech and Hearing, the University of Sheffield
URL: http://www.dcs.shef.ac.uk/research/ilash/
comp.speech refs: [1]
Institute for Perception Research: Speech on the Web
URL: http://www.tue.nl/ipo/hearing/webspeak.htm
comp.speech refs: [1]
InterFACE from Hijinx: Internet phone software
URL: http://www.hijinx.com.au/
comp.speech refs: [1]
International Phonetic Alphabet cassette of sounds
URL: http://www.phon.ucl.ac.uk/home/wells/cassette.htm
comp.speech refs: [1]
International Phonetic Alphabet (IPA)
URL: http://www.arts.gla.ac.uk/IPA/ipachart.html
comp.speech refs: [1] - [2] - [3]
International Phonetic Alphabet (IPA) chart of symbols
URL: http://www.arts.gla.ac.uk/IPA/fullchart.html
comp.speech refs: [1]
International Phonetic Association
URL: http://www.arts.gla.ac.uk/IPA/ipa.html
comp.speech refs: [1]
Internet Phone from VocalTec
URL: http://www.vocaltec.com/
comp.speech refs: [1]
Internet Phone from VocalTec ordering information
URL: http://www.vocaltec.com/order.html
comp.speech refs: [1]
Intl. Phonetic Alphabet transcriptions in ASCII
URL: http://weber.u.washington.edu/~dillon/ipaascii.html
comp.speech refs: [1]
Introduction to Computational Phonology by Steven Bird
URL: http://www.cogsci.ed.ac.uk/phonology/comp-phon-intro.ps.Z
comp.speech refs: [1]
Journal of the Acoustical Society of America (JASA)
URL: http://asa.aip.org/jasa.html
comp.speech refs: [1]
Kay Elemetrics home page
URL: http://www.kayelemetrics.com/
comp.speech refs: [1]
Khoros signal and image processing environment from Khoral Research Inc.
URL: http://www.khoral.com/
comp.speech refs: [1]
Learning Company's Language Training
URL: http://www.learningco.Inter.net/foreign.html
comp.speech refs: [1]
Linguistic associations and linguistic WWW links
URL: http://engserve.tamu.edu/files/linguistics/linguist/associations.html
comp.speech refs: [1]
Linguistic Data Consortium home page
URL: http://www.ldc.upenn.edu/
comp.speech refs: [1] - [2]
Linguistics Abstracts Online
URL: http://www.blackwellpublishers.co.uk/labs/
comp.speech refs: [1]
List of online dictionaries: Institute of Phonetic Sciences
URL: http://fonsg3.let.uva.nl/Other_pages.html#Dictionaries
comp.speech refs: [1]
List of speech conferences and meetings: Institute of Phonetic Sciences
URL: http://fonsg3.let.uva.nl/Other_pages.html#Meetings
comp.speech refs: [1]
List of speech research sites: Institute of Phonetic Sciences
URL: http://fonsg3.let.uva.nl/Other_pages.html#Phonetics
comp.speech refs: [1]
Louis Pols's List of References on Synthesis Development And Assessment
URL: http://www.itl.atr.co.jp/cocosda/output/synth.refs
comp.speech refs: [1]
Malcolm Crawford's home page
URL: http://www.dcs.shef.ac.uk/~malc/
comp.speech refs: [1]
Malcolm Slaney's home page
URL: http://www.interval.com/~malcolm/
comp.speech refs: [1]
Man-Machine Interfacing
URL: http://www.speechrec.com/
comp.speech refs: [1]
Martin Cooke's home page: auditory modelling and speech recognition in noise
URL: http://www.dcs.shef.ac.uk/~martin/
comp.speech refs: [1]
Martin Ramsch's Englisch-Wörterbücher aller Art (English dictionaries of all kinds)
URL: http://www.uni-passau.de/forwiss/mitarbeiter/freie/ramsch/englisch.html
comp.speech refs: [1]
Matlab plus Signal Processing Toolbox
URL: http://www.mathworks.com/
comp.speech refs: [1]
Meetings of the Acoustical Society of America (ASA)
URL: http://asa.aip.org/meetings.html
comp.speech refs: [1]
MicMac Recording Software for Macs
URL: http://moof.com/nirvana/
comp.speech refs: [1]
Microsoft Speech API
URL: http://www.microsoft.com/MEDIADEV/AUDIO/MSPEECH1.HTM
comp.speech refs: [1]
Microsoft Speech API: An Overview
URL: http://www.microsoft.com/mediadev/audio/mspover.htm
comp.speech refs: [1]
Microsoft Speech SDK
URL: http://www.research.microsoft.com/research/srg/install.htm
comp.speech refs: [1] - [2]
Microsoft Telephony API
URL: http://www.microsoft.com/ntserver/communications/tapi.htm
comp.speech refs: [1]
Microsoft Telephony API White Paper
URL: http://www.microsoft.com/ntserver/communications/tapi_wp.htm
comp.speech refs: [1]
Mike Noel's home page (CSLU)
URL: http://www.cse.ogi.edu/~noel/
comp.speech refs: [1]
Moby lexical resources
URL: http://www.dcs.shef.ac.uk/research/ilash/Moby/
comp.speech refs: [1]
MRC Psycholinguistic Database: WWW Interface
URL: http://www.psy.uwa.edu.au/uwa_mrc.htm
comp.speech refs: [1]
National Institute of Standards and Technology (NIST)
URL: http://www.nist.gov/
comp.speech refs: [1]
Nautilus home page: Secure Computer Telephony
URL: http://www.lila.com/nautilus/
comp.speech refs: [1]
NetSpeak home page
URL: http://www.netspeak.com/
comp.speech refs: [1]
NOISEX-92 database
URL: http://spib.rice.edu/spib/select_noise.html
comp.speech refs: [1]
N!Power
URL: http://www.silcom.com/~stilarry/
comp.speech refs: [1]
Peter Meijer's home page
URL: http://ourworld.compuserve.com/homepages/Peter_Meijer/
comp.speech refs: [1]
Peter Meijer's "the vOICe" Java applet/application
URL: http://ourworld.compuserve.com/homepages/Peter_Meijer/javoice.htm
comp.speech refs: [1]
PGPfone secure audio networking
URL: http://web.mit.edu/network/pgpfone/
comp.speech refs: [1]
Project Gutenberg
URL: http://www.prairienet.org/pg/
comp.speech refs: [1]
Quadravox Speech Processing Products - Qbox
URL: http://www.quadravox.com/
comp.speech refs: [1]
RELATOR project: linguistic resources
URL: http://cristal.icp.grenet.fr/Relator/homepage.html
comp.speech refs: [1] - [2]
Russ Wilcox's list of Commercial Speech Recognition
URL: http://www.tiac.net/users/rwilcox/speech.html
comp.speech refs: [1] - [2]
ShATR: A Multi-simultaneous-speaker corpus
URL: http://www.dcs.shef.ac.uk/research/groups/spandh/pr/ShATR/ShATR.html
comp.speech refs: [1]
Shikano's WWW site on Speech and Acoustics
URL: http://www.aist-nara.ac.jp/IS/Shikano-lab/database/internet-resource/e-www-site.html
comp.speech refs: [1] - [2] - [3]
Signalyze 3.0 from InfoSignal
URL: http://www.agoralang.com:2410/pubdirsoftware.html
comp.speech refs: [1]
Signalyze 3.0 from InfoSignal
URL: http://www.agoralang.com:2410/signalyze.html
comp.speech refs: [1]
SIGPHON Computational Phonology home page
URL: http://www.cogsci.ed.ac.uk/sigphon/
comp.speech refs: [1]
Some things about studying Speech
URL: http://www.ccp.uchicago.edu/grad/Francis_Alex/speech.html
comp.speech refs: [1]
Sound conversion software
URL: http://peace.wit.com/sounds/SoundConversion/
comp.speech refs: [1]
Sound Processing Kit
URL: http://www.music.helsinki.fi/research/spkit/documentation/SPKit.html
comp.speech refs: [1]
Sound Processing Kit software
URL: http://www.music.helsinki.fi/research/spkit/distribution/spkit.tar.Z
comp.speech refs: [1]
Speak Freely audio networking software
URL: http://www.fourmilab.ch/netfone/windows/speak_freely.html
comp.speech refs: [1]
Speech and Hearing Research Group, University of Sheffield
URL: http://www.dcs.shef.ac.uk/research/groups/spandh/
comp.speech refs: [1]
Speech and Hearing Research Group, University of Sheffield: Links to Speech Sites
URL: http://www.dcs.shef.ac.uk/research/groups/spandh/world/misclinks.html
comp.speech refs: [1]
Speech and Language Technology Club
URL: http://salt.essex.ac.uk/salt/
comp.speech refs: [1]
Speech Communication journal home page
URL: http://www.elsevier.nl:80/eee/specom/contents.html
comp.speech refs: [1]
Speech Groups List from Leeds University Cognitive Psychology Research Group
URL: http://lethe.leeds.ac.uk/research/cogn/speechlab/other.html
comp.speech refs: [1]
Speech Research List
URL: http://mambo.ucsc.edu/psl/speech.html
comp.speech refs: [1]
Speech Technology Research Ltd.
URL: http://www.speechtech.com/home/speechtech/
comp.speech refs: [1] - [2]
SpeechViewer II: speech therapy tool
URL: http://www.austin.ibm.com/pspinfo/snsspv2.html
comp.speech refs: [1]
SRAPI: Speech Recognition API home page
URL: http://www.srapi.com/
comp.speech refs: [1]
Stoneridge Technical Services: VoiceNews newsletter
URL: http://www.stoneridgetech.com/
comp.speech refs: [1]
Summer Institute of Linguistics IPA Fonts
URL: http://www.sil.org/
comp.speech refs: [1]
The Acoustical Society of America (ASA) home page
URL: http://asa.aip.org/
comp.speech refs: [1]
The University of Sheffield
URL: http://www.shef.ac.uk/
comp.speech refs: [1]
TMA Associates: speech technology consulting
URL: http://www.tmaa.com/
comp.speech refs: [1]
Tony Robinson's home page
URL: http://svr-www.eng.cam.ac.uk/~ajr/
comp.speech refs: [1] - [2] - [3] - [4] - [5]
University of Edinburgh
URL: http://www.ed.ac.uk/
comp.speech refs: [1]
University of Victoria Phonetic Database
URL: http://www.speechtech.com/home/speechtech/csl3.html
comp.speech refs: [1]
Verbmobil project home page
URL: http://www.dfki.uni-sb.de/verbmobil/
comp.speech refs: [1] - [2]
Voice E-Mail from Bonzi Software
URL: http://www.bonzi.com/
comp.speech refs: [1]
Voice Information Associates, Inc.
URL: http://www.tiac.net/users/asrnews/
comp.speech refs: [1]
Voice Users mailing list
URL: http://voicerecognition.com/voice-users/
comp.speech refs: [1]
WebPhone
URL: http://www.netspeak.com/about.html
comp.speech refs: [1]
WebPhone availability
URL: http://www.netspeak.com/getphone.html
comp.speech refs: [1]
Webster's dictionary online
URL: http://c.gp.cs.cmu.edu:5103/prog/webster
comp.speech refs: [1]
Webster's Revised Unabridged Dictionary, 1913
URL: http://humanities.uchicago.edu/forms_urest/webster.form.html
comp.speech refs: [1]
WebTalk from Quarterdeck
URL: http://www.quarterdeck.com/
comp.speech refs: [1]
Wildfire - an Electronic Assistant
URL: http://www.wildfire.com/
comp.speech refs: [1]
WordNet home page
URL: http://www.cogsci.princeton.edu/~wn/
comp.speech refs: [1]
WordNet: WWW interface
URL: http://www.cogsci.princeton.edu/~wn/w3wn.html
comp.speech refs: [1]
Yamada Language Center Fonts
URL: http://babel.uoregon.edu/yamada/fonts.html
comp.speech refs: [1]
Yamada Language Center IPA fonts
URL: http://babel.uoregon.edu/yamada/fonts/phonetic.html
comp.speech refs: [1]
Yamada Language Center (Phonetic fonts)
URL: http://babel.uoregon.edu/yamada.html
comp.speech refs: [1]
Yamada Language Center windows fonts
URL: http://babel.uoregon.edu/yamada/winfonts.html
comp.speech refs: [1]

FTP Links
A comprehensive list of American words
URL: ftp://wocket.vantage.gte.com/pub/standard_dictionary
comp.speech refs: [1]
AF Audio Networking System
URL: ftp://crl.dec.com/pub/DEC/AF
comp.speech refs: [1]
Answers to Frequently Asked Questions about Usenet
URL: ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/Answers_to_Frequently_Asked_Questions_about_Usenet
comp.speech refs: [1] - [2]
Audio file formats guide by Guido van Rossum
URL: ftp://ftp.cwi.nl/pub/audio/
comp.speech refs: [1]
Auditory Toolbox for Matlab
URL: ftp://ftp.apple.com/pub/malcolm
comp.speech refs: [1]
BEEP pronunciation dictionary
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/dictionaries/beep-0.7.README
comp.speech refs: [1]
BEEP pronunciation dictionary
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/dictionaries/beep.tar.gz
comp.speech refs: [1]
Center for Spoken Language Understanding (CSLU): speech database
URL: ftp://speech.cse.ogi.edu/pub/releases
comp.speech refs: [1]
CMU dictionary
URL: ftp://ftp.cs.cmu.edu/project/fgdata/dict/
comp.speech refs: [1]
comp.ai FAQ
URL: ftp://rtfm.mit.edu/pub/usenet/comp.ai/
comp.speech refs: [1] - [2]
comp.compression FAQ
URL: ftp://rtfm.mit.edu/pub/usenet/comp.compression/
comp.speech refs: [1] - [2]
comp.dsp FAQ
URL: ftp://rtfm.mit.edu/pub/usenet/comp.dsp/
comp.speech refs: [1] - [2]
comp.speech ftp site
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/
comp.speech refs: [1] - [2]
comp.speech ftp site: Analysis software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/analysis/
comp.speech refs: [1] - [2]
comp.speech ftp site: Auditory modelling software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/auditory/
comp.speech refs: [1] - [2]
comp.speech ftp site: dictionaries and lexical resources
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/dictionaries/
comp.speech refs: [1] - [2]
comp.speech ftp site: speech coding software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/
comp.speech refs: [1] - [2]
comp.speech ftp site: speech data
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/data/
comp.speech refs: [1] - [2]
comp.speech ftp site: speech processing tools and software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/tools/
comp.speech refs: [1]
comp.speech ftp site: speech recognition software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/
comp.speech refs: [1] - [2] - [3]
comp.speech ftp site: speech synthesis software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/
comp.speech refs: [1] - [2]
comp.speech ftp site: Useful information
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/
comp.speech refs: [1]
comp.speech newsgroup archives
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/archive/
comp.speech refs: [1] - [2]
CyberPhone internet voice communication
URL: ftp://magenta.com/pub/cyberphone
comp.speech refs: [1]
ECTL mailing list archives
URL: ftp://snowhite.cis.uoguelph.ca/pub/ectl
comp.speech refs: [1]
FAQ: How can I use the Internet as a telephone?
URL: ftp://rtfm.mit.edu/pub/usenet/alt.internet.services/FAQ:_How_can_I_use_the_Internet_as_a_telephone?
comp.speech refs: [1]
FAQs about FAQs
URL: ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/FAQs_about_FAQs
comp.speech refs: [1]
Homophone list
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/dictionaries/homophones-1.01.txt
comp.speech refs: [1]
Human Audio Perception document
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/HumanAudioPerception
comp.speech refs: [1]
Internet Phone from VocalTec
URL: ftp://ftp.vocaltec.com/pub/iphone09.exe
comp.speech refs: [1]
IPA for LaTeX: Washington State University International Phonetic Alphabet fonts
URL: ftp://ftp.digital.com/pub/text/TeX/fonts/wsuipa/
comp.speech refs: [1]
IPA for LaTeX: Washington State University International Phonetic Alphabet fonts
URL: ftp://ftp.wustl.edu/packages/TeX/fonts/wsuipa/
comp.speech refs: [1]
John Holdsworth's Auditory Modeller
URL: ftp://ftp.mrc-apu.cam.ac.uk/pub/aim
comp.speech refs: [1]
Khoros signal and image processing environment from Khoral Research Inc.
URL: ftp://ftp.khoral.com/
comp.speech refs: [1]
Lowel O'Mard's Auditory Modeller
URL: ftp://suna.lut.ac.uk/public/hulpo/lutear
comp.speech refs: [1]
Math Works Inc.: Matlab plus Signal Processing Toolbox
URL: ftp://ftp.mathworks.com
comp.speech refs: [1]
Mirror of SIMTEL sound directory
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/simtel_sound/
comp.speech refs: [1] - [2]
Mirror of SIMTEL voice directory
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/simtel_voice/
comp.speech refs: [1] - [2]
MixViews Unix sound editor
URL: ftp://foxtrot.ccmrc.ucsb.edu/pub/MixViews/
comp.speech refs: [1]
Moby Hyphenator: 185,000 entries fully hyphenated
URL: ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/mhyph.tar.Z
comp.speech refs: [1]
Moby Language: Word lists in five major languages
URL: ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/mlang.tar.Z
comp.speech refs: [1]
Moby lexical resources
URL: ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/
comp.speech refs: [1]
Moby Part-of-Speech: 230,000 entries with part(s) of speech listed in priority order
URL: ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/mpos.tar.Z
comp.speech refs: [1]
Moby Pronunciator: 175,000 entries fully coded in the International Phonetic Alphabet
URL: ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/mpron.tar.Z
comp.speech refs: [1]
Moby Shakespeare: The complete unabridged works of Shakespeare
URL: ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/mshak.tar.Z
comp.speech refs: [1]
Moby Thesaurus: 30,000 root words, 2.5 million synonyms and related words
URL: ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/mthes.tar.Z
comp.speech refs: [1]
Moby Words: 610,000+ words and phrases
URL: ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/mwords.tar.Z
comp.speech refs: [1]
MRC Psycholinguistic Database
URL: ftp://ota.ox.ac.uk/pub/ota/public/dicts/1054/
comp.speech refs: [1]
MRC Psycholinguistic Database and Oxford Advanced Learner's Dictionary
URL: ftp://ota.ox.ac.uk/pub/ota/public/dicts/info
comp.speech refs: [1]
MRC Psycholinguistic Database: Readme
URL: ftp://ota.ox.ac.uk/pub/ota/public/dicts/1054/readme.
comp.speech refs: [1]
Natural Language Processing FAQ
URL: ftp://rtfm.mit.edu/pub/usenet/comp.ai.nat-lang/Natural_Language_Processing_FAQ
comp.speech refs: [1] - [2]
Nautilus: Secure Computer Telephony (access instructions)
URL: ftp://ftp.csn.org/mpj/README
comp.speech refs: [1]
Nautilus: Secure Computer Telephony (access instructions)
URL: ftp://ripem.msu.edu/pub/crypt/README
comp.speech refs: [1]
Network Audio System
URL: ftp://ftp.x.org:/contrib/audio/nas
comp.speech refs: [1]
Neural Networks FAQ
URL: ftp://rtfm.mit.edu/pub/usenet/comp.ai.neural-nets/
comp.speech refs: [1]
NEVOT audio conferencing tool
URL: ftp://gaia.cs.umass.edu/pub/hgschulz/nevot
comp.speech refs: [1]
NIST SPeech HEader REsources Package (SPHERE)
URL: ftp://jaguar.ncsl.nist.gov/pub/sphere_2.5.tar.Z
comp.speech refs: [1]
NIST SPeech HEader REsources Package (SPHERE)
URL: ftp://jaguar.ncsl.nist.gov/pub/sphere.README
comp.speech refs: [1]
NIST Speech Recognition Scoring Package
URL: ftp://jaguar.ncsl.nist.gov/pub/score_3.6.2.tar.Z
comp.speech refs: [1]
NIST Speech Recognition Scoring Package
URL: ftp://jaguar.ncsl.nist.gov/pub/score.README
comp.speech refs: [1]
OGI Speech Tools
URL: ftp://speech.cse.ogi.edu/pub/tools/
comp.speech refs: [1]
Oxford Advanced Learner's Dictionary
URL: ftp://ota.ox.ac.uk/pub/ota/public/dicts/710/
comp.speech refs: [1]
Oxford Advanced Learner's Dictionary: documentation
URL: ftp://ota.ox.ac.uk/pub/ota/public/dicts/710/text710.doc
comp.speech refs: [1]
Phonemic Samples
URL: ftp://phloem.uoregon.edu/pub/Sun4/lib/phonemes
comp.speech refs: [1]
Phonemic Samples
URL: ftp://sunsite.unc.edu/pub/multimedia/sun-sounds/phonemes
comp.speech refs: [1]
Ptolemy signal processing software
URL: ftp://ptolemy.berkeley.edu/pub/
comp.speech refs: [1]
Rules for posting to Usenet
URL: ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/Rules_for_posting_to_Usenet
comp.speech refs: [1]
sci.lang FAQ
URL: ftp://rtfm.mit.edu/pub/usenet/sci.lang
comp.speech refs: [1]
ShATR: A Multi-simultaneous-speaker corpus
URL: ftp://ftp.dcs.shef.ac.uk/share/spandh/ShATR/
comp.speech refs: [1]
Speech File Formats guide by Guido van Rossum
URL: ftp://ftp.cwi.nl/pub/audio/index.html
comp.speech refs: [1]
Speech Filing System (SFS)
URL: ftp://ftp.phon.ucl.ac.uk/pub/sfs/
comp.speech refs: [1] - [2]
Speech Filing System (SFS)
URL: ftp://ftp.phon.ucl.ac.uk/pub/sfs/README
comp.speech refs: [1]
Summer Institute of Linguistics IPA Fonts (for Mac)
URL: ftp://ftp.sil.org/fonts/mac/silipa12.sea_hqx
comp.speech refs: [1]
Summer Institute of Linguistics IPA Fonts (for Windows)
URL: ftp://ftp.sil.org/fonts/win/silip12a.exe
comp.speech refs: [1]
TCPPlay: a Mac-based audio server
URL: ftp://worldserver.com/pub/malcolm/TcpPlay.sit.hqx
comp.speech refs: [1]
TCPPlay: Macintosh audio server
URL: ftp://ftp.apple.com/pub/malcolm/TcpPlay.sit.hqx
comp.speech refs: [1]
TIPA: LaTeX IPA font
URL: ftp://tooyoo.L.u-tokyo.ac.jp/pub/TeX/tipa/
comp.speech refs: [1]
TIPA: LaTeX IPA font manual
URL: ftp://tooyoo.L.u-tokyo.ac.jp/pub/TeX/tipa/tipaman.ps
comp.speech refs: [1]
What is Usenet?
URL: ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/What_is_Usenet?
comp.speech refs: [1]
WordNet: home page
URL: ftp://clarity.princeton.edu/pub/wordnet/
comp.speech refs: [1]
WordNet: README
URL: ftp://clarity.princeton.edu/pub/wordnet/README
comp.speech refs: [1]
WordNet: Technical Papers
URL: ftp://clarity.princeton.edu/pub/wordnet/5papers.ps
comp.speech refs: [1]
Yamada Language Center phonetic fonts
URL: ftp://yftp@www-vms.uoregon.edu/fonts
comp.speech refs: [1]

Newsgroups
Artificial Intelligence newsgroup
URL: news:comp.ai
comp.speech refs: [1]
Compression newsgroup
URL: news:comp.compression
comp.speech refs: [1]
Digital signal processing newsgroup
URL: news:comp.dsp
comp.speech refs: [1] - [2]
FAQ postings for all newsgroups
URL: news:news.answers
comp.speech refs: [1]
FAQ postings for computer-related newsgroups
URL: news:comp.answers
comp.speech refs: [1]
GUS soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard.GUS
comp.speech refs: [1]
Language newsgroup
URL: news:sci.lang
comp.speech refs: [1]
Multimedia newsgroup
URL: news:comp.multimedia
comp.speech refs: [1]
Natural Language Knowledge Representation newsgroup
URL: news:comp.ai.nlang-know-rep
comp.speech refs: [1]
Natural Language Processing newsgroup
URL: news:comp.ai.nat-lang
comp.speech refs: [1]
Neural network newsgroup
URL: news:comp.ai.neural-nets
comp.speech refs: [1]
Newsgroup for discussion of speech production and perception
URL: news:alt.sci.physics.acoustics
comp.speech refs: [1]
Newsgroup for new users of news
URL: news:news.announce.newusers
comp.speech refs: [1] - [2]
Silicon Graphics Systems newsgroup
URL: news:comp.sys.sgi.misc
comp.speech refs: [1]
Soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard.advocacy
comp.speech refs: [1]
Soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard
comp.speech refs: [1]
Soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard.misc
comp.speech refs: [1]
Soundcard technical discussion group
URL: news:comp.sys.ibm.pc.soundcard.tech
comp.speech refs: [1]
Soundcards and Games newsgroup
URL: news:comp.sys.ibm.pc.soundcard.games
comp.speech refs: [1]
Soundcards and Music newsgroup
URL: news:comp.sys.ibm.pc.soundcard.music
comp.speech refs: [1]
Speech technology newsgroup
URL: news:comp.speech
comp.speech refs: [1] - [2] - [3] - [4]
Telecommunications newsgroup
URL: news:comp.dcom.telecom
comp.speech refs: [1]

Administrivia, Copyright, Submit Information : Last Revision: 18:41 05-Sep-1997

http://mi.eng.cam.ac.uk/comp.speech/Section1/speechlinks.html (17 of 17) [10/31/2003 8:41:44 AM]


SpeechLinks: Signal Processing for Speech

SpeechLinks - Signal Processing for Speech


Speech Technology Hyperlinks Page
A list of hyperlinks from the comp.speech FAQ related to signal processing for speech. Links are provided to WWW
references, ftp sites, and newsgroups. Cross-references to the comp.speech WWW pages are also provided.

SpeechLinks Pages

SpeechLinks: The Complete List: 500+ speech technology links
SpeechLinks: General Speech Technology
SpeechLinks: Speech Coding
SpeechLinks: Speech Synthesis
SpeechLinks: Speech Recognition

comp.speech WWW Availability

Australia: Sydney University
Britain: Cambridge University
Japan: ATR Interpreting Telecommunications Research Laboratories
USA: Carnegie Mellon University

WWW Links
comp.dsp newsgroup FAQ
URL: http://www.bdti.com/faq/dsp_faq.htm
comp.speech refs: [1] - [2] - [3]
Comprehensive list of FFT software
URL: http://tjev.tel.etf.hr/josip/DSP/fft.html
comp.speech refs: [1]
CRC Press: Scientific and Technical Publisher
URL: http://www.crcpress.com/
comp.speech refs: [1]
Digital Signal Processing Course by Joe Picone
URL: http://www.isip.msstate.edu/publications/1995/ee_4773/
comp.speech refs: [1]
Digital Signal Processing course syllabus by Joe Picone
URL: http://www.isip.msstate.edu/publications/1995/ee_4773/SYLLABUS.ps
comp.speech refs: [1]
Eg3 Communications: DSP Internet Resources
URL: http://www.eg3.com/dsp.htm
comp.speech refs: [1]
Eg3 Communications: Engineering Information Online
URL: http://www.eg3.com/
comp.speech refs: [1]
FFTW software
URL: http://theory.lcs.mit.edu/~fftw
comp.speech refs: [1]
Fundamentals of Speech Recognition Course by Joe Picone
URL: http://www.isip.msstate.edu/publications/1996/ee_8993/
comp.speech refs: [1] - [2]
Fundamentals of Speech Recognition course lecture notes by Joe Picone
URL: http://www.isip.msstate.edu/publications/1996/ee_8993/lectures/
comp.speech refs: [1] - [2]
Fundamentals of Speech Recognition course syllabus by Joe Picone
URL: http://www.isip.msstate.edu/publications/1996/ee_8993/SYLLABUS.ps
comp.speech refs: [1] - [2]
Institute for Signal and Information Processing (ISIP) at Mississippi State University
URL: http://www.isip.msstate.edu/
comp.speech refs: [1] - [2] - [3] - [4]
List of Links Relating to Sound Computation
URL: http://datura.cerl.uiuc.edu/netstuff/sigsoundLinks.html
comp.speech refs: [1]
Mississippi State University
URL: http://www.msstate.edu/
comp.speech refs: [1] - [2]
Poynton's Digital Signal Processing Resource List
URL: http://www.inforamp.net/~poynton/Poynton-dsp.html
comp.speech refs: [1]
Signal Processing Home page
URL: http://tjev.tel.etf.hr/josip/DSP/sigproc.html
comp.speech refs: [1]
Silicon Graphics audio Frequently Asked Questions (FAQ)
URL: http://www-viz.tamu.edu/~sgi-faq/faq/html/audio/
comp.speech refs: [1]
SimTel programs for sound and soundcards
URL: http://www.acs.oakland.edu/oak/SimTel/win3/sound.html
comp.speech refs: [1] - [2]
Sound Related Resources
URL: http://pscinfo.psc.edu/~geigel/menus/sound.html
comp.speech refs: [1]
Soundcard WWW Site
URL: http://www.wi.leidenuniv.nl/audio/
comp.speech refs: [1]
SPLIB: Signal Processing url LIBrary
URL: http://jazz.rice.edu/splib/
comp.speech refs: [1]
Tony Robinson's speech analysis course
URL: http://svr-www.eng.cam.ac.uk/~ajr/SA95/
comp.speech refs: [1] - [2]
Tony Robinson's speech analysis course: Filter bank analysis
URL: http://svr-www.eng.cam.ac.uk/~ajr/SA95/node17.html
comp.speech refs: [1]
Tony Robinson's speech analysis course: Formant analysis
URL: http://svr-www.eng.cam.ac.uk/~ajr/SA95/node61.html
comp.speech refs: [1]
Tony Robinson's speech analysis course: Linear prediction analysis
URL: http://svr-www.eng.cam.ac.uk/~ajr/SA95/node38.html
comp.speech refs: [1]
Tony Robinson's speech analysis course: Sampling theory
URL: http://svr-www.eng.cam.ac.uk/~ajr/SA95/node9.html
comp.speech refs: [1]
Tony Robinson's speech analysis course: Short-term Fourier analysis
URL: http://svr-www.eng.cam.ac.uk/~ajr/SA95/node21.html
comp.speech refs: [1]
Tony Robinson's speech analysis course: Speech coding
URL: http://svr-www.eng.cam.ac.uk/~ajr/SA95/node78.html
comp.speech refs: [1] - [2]
Tony Robinson's speech analysis course: Voicing analysis
URL: http://svr-www.eng.cam.ac.uk/~ajr/SA95/node68.html
comp.speech refs: [1]
Wavelet Home Page
URL: http://www.mat.sbg.ac.at/~uhl/wav.html
comp.speech refs: [1]
Yahoo page on Signal and Image Processing
URL: http://www.yahoo.com/Science/Engineering/Electrical_Engineering/Signal_and_Image_Processing/
comp.speech refs: [1]

FTP Links
Aria Soundcard FAQ
URL: ftp://rtfm.mit.edu/pub/usenet/comp.sys.ibm.pc.soundcard.misc/Aria_Soundcard_FAQ_v1.05
comp.speech refs: [1]
Aria Soundcard Support List
URL: ftp://rtfm.mit.edu/pub/usenet/comp.sys.ibm.pc.soundcard.misc/Aria_Soundcard_Support_List_v2.09
comp.speech refs: [1]
comp.dsp FAQ
URL: ftp://rtfm.mit.edu/pub/usenet/comp.dsp/
comp.speech refs: [1] - [2]
comp.sys.ibm.pc.soundcard.misc newsgroup FAQs
URL: ftp://rtfm.mit.edu/pub/usenet/comp.sys.ibm.pc.soundcard.misc/
comp.speech refs: [1]
FFT Software
URL: ftp://ftp.coast.net/simtel/msdos/c/mixfft03.zip
comp.speech refs: [1]
FFT Software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/analysis/fft-stuff.tar.gz
comp.speech refs: [1]
FFT Software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/analysis/mixfft03.zip
comp.speech refs: [1]
FFT Software
URL: ftp://usc.edu/pub/C-numanal/fft-stuff.tar.gz
comp.speech refs: [1]
Matlab Sound and Image Toolbox
URL: ftp://ftp.apple.com/pub/malcolm/SoundAndImageToolbox.cpt.hqx
comp.speech refs: [1]
Midi files information
URL: ftp://rtfm.mit.edu/pub/usenet/comp.sys.ibm.pc.soundcard.misc/Midi_files_software_archives_on_the_Internet
comp.speech refs: [1]
Numerical analysis software: including FFT
URL: ftp://usc.edu/pub/C-numanal/
comp.speech refs: [1]
Signal End-Point Detection software
URL: ftp://ftp.isip.msstate.edu/pub/software/signal_detector/sigd_v2.2.tar.gz
comp.speech refs: [1]
Silicon Graphics audio Frequently Asked Questions (FAQ)
URL: ftp://viz.tamu.edu/pub/sgi/faq/
comp.speech refs: [1]
Speech End-point detection software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/tools/ep.1.0.tar.gz
comp.speech refs: [1]
Turtle Beach sound cards FAQ
URL: ftp://rtfm.mit.edu/pub/usenet/comp.sys.ibm.pc.soundcard.misc/Turtle_Beach_sound_cards_FAQ
comp.speech refs: [1]

Newsgroups
Artificial Intelligence newsgroup
URL: news:comp.ai
comp.speech refs: [1]
Compression newsgroup
URL: news:comp.compression
comp.speech refs: [1]
Digital signal processing newsgroup
URL: news:comp.dsp
comp.speech refs: [1] - [2]
FAQ postings for all newsgroups
URL: news:news.answers
comp.speech refs: [1]
FAQ postings for computer-related newsgroups
URL: news:comp.answers
comp.speech refs: [1]
GUS soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard.GUS
comp.speech refs: [1]
Language newsgroup
URL: news:sci.lang
comp.speech refs: [1]
Multimedia newsgroup
URL: news:comp.multimedia
comp.speech refs: [1]
Natural Language Knowledge Representation newsgroup
URL: news:comp.ai.nlang-know-rep
comp.speech refs: [1]
Natural Language Processing newsgroup
URL: news:comp.ai.nat-lang
comp.speech refs: [1]
Neural network newsgroup
URL: news:comp.ai.neural-nets
comp.speech refs: [1]
Newsgroup for discussion of speech production and perception
URL: news:alt.sci.physics.acoustics
comp.speech refs: [1]
Newsgroup for new users of news
URL: news:news.announce.newusers
comp.speech refs: [1] - [2]
Silicon Graphics Systems newsgroup
URL: news:comp.sys.sgi.misc
comp.speech refs: [1]
Soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard.advocacy
comp.speech refs: [1]
Soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard
comp.speech refs: [1]
Soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard.misc
comp.speech refs: [1]
Soundcard technical discussion group
URL: news:comp.sys.ibm.pc.soundcard.tech
comp.speech refs: [1]
Soundcards and Games newsgroup
URL: news:comp.sys.ibm.pc.soundcard.games
comp.speech refs: [1]
Soundcards and Music newsgroup
URL: news:comp.sys.ibm.pc.soundcard.music
comp.speech refs: [1]
Speech technology newsgroup
URL: news:comp.speech
comp.speech refs: [1] - [2] - [3] - [4]
Telecommunications newsgroup
URL: news:comp.dcom.telecom
comp.speech refs: [1]

Last Revision: 18:41 05-Sep-1997


SpeechLinks: Speech Coding

SpeechLinks - Speech Coding


Speech Technology Hyperlinks Page
A list of hyperlinks from the comp.speech FAQ related to speech coding. Links are provided to
WWW references, ftp sites, and newsgroups. Cross-references to the comp.speech WWW pages are
also provided.

SpeechLinks Pages

SpeechLinks: The Complete List: 500+ speech technology links
SpeechLinks: General Speech Technology
SpeechLinks: Signal Processing for Speech
SpeechLinks: Speech Synthesis
SpeechLinks: Speech Recognition

comp.speech WWW Availability

Australia: Sydney University
Britain: Cambridge University
Japan: ATR Interpreting Telecommunications Research Laboratories
USA: Carnegie Mellon University

WWW Links
32 kbps ADPCM
URL: http://www.cwi.nl/ftp/audio/adpcm.shar
comp.speech refs: [1]
ACELP Codecs from Sipro Lab Telecom Inc.
URL: http://www.sipro.com/acelp.html
comp.speech refs: [1]
Audio and Music Applications for Silicon Graphics Systems
URL: http://reality.sgi.com/employees/cook/audio.apps/public.html
comp.speech refs: [1]
Buddy Software Library: MPEG-1 Audio Layer 3 encoder and player
URL: http://www.buddy.org/softlib.html
comp.speech refs: [1]
Castleton Network Systems - G.729 Voice Coder
URL: http://www.castleton.com/
comp.speech refs: [1]
Ciaran McElroy's Speech Coding Page
URL: http://wwwdsp.ucd.ie/speech/tutorial/speech_coding/speech_tut.html
comp.speech refs: [1]
CyberVoice speech coding
URL: http://www.cybit.com/
comp.speech refs: [1]
G.729 Annex A from Sipro Lab Telecom Inc
URL: http://www.sipro.com/g729a.html
comp.speech refs: [1]
GSM 06.10 Compression
URL: http://www.cs.tu-berlin.de/~jutta/toast.html
comp.speech refs: [1]
How to Install an MPEG Audio Player for your Web Navigator
URL: http://www.mpeg.org/index.html/MPEG-audio-player.html
comp.speech refs: [1]
Institut fur Phonetik at Johann Wolfgang Goethe-Universitat Frankfurt
URL: http://www.uni-frankfurt.de/~ifb/ipf1.html
comp.speech refs: [1] - [2] - [3]
International Telecommunications Union standards information
URL: http://www.itu.ch/itudoc/itu-t/rec/g/g700-799.html
comp.speech refs: [1]
International Telecommunications Union WWW site
URL: http://www.itu.ch/
comp.speech refs: [1]
Jason Woodard's Speech Coding Page
URL: http://www-mobile.ecs.soton.ac.uk/speech_codecs/index.html
comp.speech refs: [1]
Johann Wolfgang Goethe-Universitat Frankfurt
URL: http://www.uni-frankfurt.de/
comp.speech refs: [1] - [2] - [3]
Lernout and Hauspie home page
URL: http://www.lhs.com/
comp.speech refs: [1] - [2] - [3] - [4] - [5] - [6]
Lernout and Hauspie speech coding
URL: http://www.lhs.com/coding.html
comp.speech refs: [1] - [2]
LPC-10 speech coding software
URL: http://www.arl.wustl.edu/~jaf/lpc/
comp.speech refs: [1]
MPEG FAQ by Chad Fogg
URL: http://www-plateau.cs.berkeley.edu/mpegfaq/MPEG-2-FAQ.html
comp.speech refs: [1]
MPEG FAQ by Frank Gadegast
URL: http://www.powerweb.de/mpeg/mpegfaq/
comp.speech refs: [1]
MPEG FAQ by Luigi
URL: http://www.crs4.it/~luigi/MPEG/mpegfaq.html
comp.speech refs: [1]
MPEG Pointers and Resources
URL: http://www.mpeg.org/
comp.speech refs: [1]
MPEG-1 Audio Layer 3 encoder, decoder and FAQ
URL: http://www.iis.fhg.de/departs/amm/layer3/index.html
comp.speech refs: [1]
MPEG-2 Audio FAQ from Philips
URL: http://www.keymodules.philips.com/MD/mpeg/faqmpeg2.htm
comp.speech refs: [1]
Online bibliography for Phonetics and Speech Technology
URL: http://www.uni-frankfurt.de/~ifb/bib_engl.html
comp.speech refs: [1] - [2] - [3]
Phil Karn's Digital/Analog Voice Demo
URL: http://www.qualcomm.com/people/pkarn/voicedemo/
comp.speech refs: [1]
Rockwell's DigiTalk
URL: http://www.nb.rockwell.com/ref/digitalk/
comp.speech refs: [1]
Sipro Lab Telecom Inc.: Speech coding technology
URL: http://www.sipro.com/
comp.speech refs: [1]
Speech Coding and Synthesis Book
URL: http://www.elsevier.nl/section/engtech/scs/menu.htm
comp.speech refs: [1] - [2]
Speech Coding Demonstration
URL: http://admii.arl.mil/~fsbrn/phamdo/speech_demo.html
comp.speech refs: [1]
Speech Technology Research Ltd.
URL: http://www.speechtech.com/home/speechtech/
comp.speech refs: [1] - [2]
StarAudio Compressor/Player shareware software
URL: http://www.speechtech.com/home/speechtech/loadview.html
comp.speech refs: [1]
Tony Robinson's speech analysis course
URL: http://svr-www.eng.cam.ac.uk/~ajr/SA95/
comp.speech refs: [1] - [2]
Tony Robinson's speech analysis course: Speech coding
URL: http://svr-www.eng.cam.ac.uk/~ajr/SA95/node78.html
comp.speech refs: [1] - [2]
ToolVox from Voxware
URL: http://www.voxware.com/
comp.speech refs: [1]
TrueSpeech capability for WWW pages
URL: http://www.dspg.com/webpage.htm
comp.speech refs: [1]
TrueSpeech from DSP Group
URL: http://www.dspg.com/index.html
comp.speech refs: [1]

FTP Links
Audio file format conversion for G.723, G.721, A-law, u-law and linear
URL: ftp://ftp.cwi.nl/pub/audio/ccitt-adpcm.tar.Z
comp.speech refs: [1]
CELP 3.2a and LPC-10
URL: ftp://ftp.super.org/pub/speech/celp_3.2a.tar.Z
comp.speech refs: [1]
CELP speech coding software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/celp_3.2a.tar.gz
comp.speech refs: [1]
CELP speech coding software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/celp_3.2a.tar.Z
comp.speech refs: [1]
comp.compression FAQ
URL: ftp://rtfm.mit.edu/pub/usenet/comp.compression/
comp.speech refs: [1] - [2]
G.711, G.721 and G.723 speech coding software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/G711_G721_G723.tar.Z
comp.speech refs: [1]
G.728 CELP Compression
URL: ftp://dspsun.eas.asu.edu/pub/speech/ldcelp.tgz
comp.speech refs: [1]
GSM 06.10 Compression
URL: ftp://ftp.cs.tu-berlin.de/pub/local/kbs/tubmik/gsm/ddj/gsm-107.zip
comp.speech refs: [1]
GSM 06.10 Compression
URL: ftp://ftp.cs.tu-berlin.de/pub/local/kbs/tubmik/gsm/gsm-1.0.7.tar.gz
comp.speech refs: [1]
GSM 06.10 Compression
URL: ftp://ftp.mv.com/pub/ddj/1994.12/gsm-105.zip
comp.speech refs: [1]
LPC-10 speech coding software
URL: ftp://ftp.super.org/pub/speech/lpc10-1.0.tar.gz
comp.speech refs: [1]
MPEG-1 and MPEG-2 audio software from Universitaet Hannover
URL: ftp://ftp.tnt.uni-hannover.de/pub/MPEG/audio/
comp.speech refs: [1]
MPEG-1 Audio Layer 1 & 2 decoder and verifier at CCETT
URL: ftp://ftp.ccett.fr/pub/mpeg/audio_new/
comp.speech refs: [1]
MPEG-1 Audio Layer 1 & 2 encoder/decoder
URL: ftp://ftp.iuma.com/audio_utils/converters/source/
comp.speech refs: [1]
MPEG-2 Audio encoder and decoder at CCETT
URL: ftp://ftp.ccett.fr/pub/mpeg/mpeg2/
comp.speech refs: [1]
shorten audio file compression software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/shorten.tar.gz
comp.speech refs: [1]
shorten audio file compression software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/shorten.tar.Z
comp.speech refs: [1]
shorten audio file compression software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/shorten.zip
comp.speech refs: [1]
StarAudio Compressor/Player technical documentation
URL: ftp://ftp.speechtech.com/pub/speechtech/docs/audocw60.exe
comp.speech refs: [1]

Newsgroups
Artificial Intelligence newsgroup
URL: news:comp.ai
comp.speech refs: [1]
Compression newsgroup
URL: news:comp.compression
comp.speech refs: [1]
Digital signal processing newsgroup
URL: news:comp.dsp
comp.speech refs: [1] - [2]
FAQ postings for all newsgroups
URL: news:news.answers
comp.speech refs: [1]
FAQ postings for computer-related newsgroups
URL: news:comp.answers
comp.speech refs: [1]
GUS soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard.GUS
comp.speech refs: [1]
Language newsgroup
URL: news:sci.lang
comp.speech refs: [1]
Multimedia newsgroup
URL: news:comp.multimedia
comp.speech refs: [1]
Natural Language Knowledge Representation newsgroup
URL: news:comp.ai.nlang-know-rep
comp.speech refs: [1]
Natural Language Processing newsgroup
URL: news:comp.ai.nat-lang
comp.speech refs: [1]
Neural network newsgroup
URL: news:comp.ai.neural-nets
comp.speech refs: [1]
Newsgroup for discussion of speech production and perception
URL: news:alt.sci.physics.acoustics
comp.speech refs: [1]
Newsgroup for new users of news
URL: news:news.announce.newusers
comp.speech refs: [1] - [2]
Silicon Graphics Systems newsgroup
URL: news:comp.sys.sgi.misc
comp.speech refs: [1]
Soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard.advocacy
comp.speech refs: [1]
Soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard
comp.speech refs: [1]
Soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard.misc
comp.speech refs: [1]
Soundcard technical discussion group
URL: news:comp.sys.ibm.pc.soundcard.tech
comp.speech refs: [1]
Soundcards and Games newsgroup
URL: news:comp.sys.ibm.pc.soundcard.games
comp.speech refs: [1]
Soundcards and Music newsgroup
URL: news:comp.sys.ibm.pc.soundcard.music
comp.speech refs: [1]
Speech technology newsgroup
URL: news:comp.speech
comp.speech refs: [1] - [2] - [3] - [4]
Telecommunications newsgroup
URL: news:comp.dcom.telecom
comp.speech refs: [1]

Last Revision: 18:42 05-Sep-1997


SpeechLinks: Speech Synthesis

SpeechLinks - Speech Synthesis


Speech Technology Hyperlinks Page
A list of hyperlinks from the comp.speech FAQ related to speech synthesis. Links are provided to WWW references, ftp sites, and
newsgroups. Cross-references to the comp.speech WWW pages are also provided.

SpeechLinks Pages

SpeechLinks: The Complete List: 500+ speech technology links
SpeechLinks: General Speech Technology
SpeechLinks: Signal Processing for Speech
SpeechLinks: Speech Coding
SpeechLinks: Speech Recognition

comp.speech WWW Availability

Australia: Sydney University
Britain: Cambridge University
Japan: ATR Interpreting Telecommunications Research Laboratories
USA: Carnegie Mellon University

WWW Links
Acuvoice, Inc. speech synthesizer
URL: http://www.acuvoice.com/
comp.speech refs: [1]
An Introduction to Text-to-Speech Synthesis: Thierry Dutoit
URL: http://kapis.www.wkap.nl/kapis/CGI-BIN/WORLD/book.htm?0-7923-4498-7
comp.speech refs: [1]
Andrew Simpson's home page
URL: http://www.phon.ucl.ac.uk/home/andrew/home.html
comp.speech refs: [1]
AsTeR text-to-speech processing
URL: http://www.research.digital.com/CRL/personal/raman/aster/aster-toplevel.html
comp.speech refs: [1]
AT&T Advanced Speech Products Group home page
URL: http://www.att.com/aspg/
comp.speech refs: [1] - [2] - [3]
AT&T Bell Laboratories Voices
URL: http://www.research.att.com/cgi-bin/cgiwrap/mjm/voices.cgi
comp.speech refs: [1]
AT&T Watson: Engineer Training Program
URL: http://www.att.com/aspg/SSI_Class.html
comp.speech refs: [1] - [2]
AT&T Watson: Independent Software Vendor (ISV) Program
URL: http://www.att.com/aspg/ISV_program.html
comp.speech refs: [1] - [2]
AT&T Watson: Licensing Program
URL: http://www.att.com/aspg/ISV_program.html#2
comp.speech refs: [1] - [2]
AT&T Watson: Software Development Kit
URL: http://www.att.com/aspg/ISV_program.html#1
comp.speech refs: [1] - [2]
AT&T Watson Speech Applications Platform FAQ
URL: http://www.att.com/aspg/FAQ.html
comp.speech refs: [1] - [2]
Auditory User Interfaces: Toward the Speaking Computer, by T. V. Raman
URL: http://cs.cornell.edu/home/raman/aui
comp.speech refs: [1]
AVAAZ Home Page
URL: http://www.icis.on.ca/homepages/avaaz/
comp.speech refs: [1] - [2]
Axel Belinfante's home page
URL: http://www.cs.utwente.nl/~belinfan/
comp.speech refs: [1] - [2]
BeSTspeech from Berkeley Speech Technologies, Inc.
URL: http://www.bestspeech.com/index.html
comp.speech refs: [1]
Centre for Speech Technology Research, Edinburgh University
URL: http://www.cstr.ed.ac.uk/
comp.speech refs: [1]
Creative Labs, Inc.
URL: http://www.creaf.com/
comp.speech refs: [1] - [2]
Creative Labs TextAssist
URL: http://www.creaf.com/wwwnew/products/sound/demo/tareader.html
comp.speech refs: [1]
Creative TextAssist description
URL: http://www.creaf.com/wwwnew/tech/devcnr/tassist.html
comp.speech refs: [1]
Creative TextAssist FAQ
URL: http://www.creaf.com/wwwnew/tech/devcnr/tassfaq.html
comp.speech refs: [1]
DECtalk pricing
URL: http://www.systems.digital.com/DIcatalog/html/DECtalk-Speech-Synthesis-oi.html
comp.speech refs: [1]
DECtalk software
URL: http://www.systems.digital.com/DIcatalog/html/DECtalk-Software.html
comp.speech refs: [1]
DECtalk speech synthesis
URL: http://www.systems.digital.com/DIcatalog/html/DECtalk-Speech-Synthesis.html
comp.speech refs: [1]
Demo of BeSTspeech from Berkeley Speech Technologies, Inc.
URL: http://www.bestspeech.com/weblang.html
comp.speech refs: [1]
Demo of rsynth on the WWW
URL: http://wwwtios.cs.utwente.nl/say/
comp.speech refs: [1] - [2]
Digital Equipment Corporation
URL: http://www.digital.com/
comp.speech refs: [1]
Elan Informatique demo registration
URL: http://www.elan.fr/speech/spe-LITO.htm
comp.speech refs: [1]
Elan Informatique: Proverbe demonstration software
URL: http://www.elan.fr/vocal/technical/demoSE.htm
comp.speech refs: [1]
Elan Informatique: Proverbe sample sound files
URL: http://www.elan.fr/vocal/technical/sndwave.htm
comp.speech refs: [1]
Elan Informatique: Proverbe speech synthesis
URL: http://www.elan.fr/vocal/prod-pse.htm
comp.speech refs: [1]
Elan Informatique: ProVerbe Speech Synthesis Engine
URL: http://www.elan.fr/
comp.speech refs: [1]
Eloquence speech synthesis
URL: http://www.eloq.com/
comp.speech refs: [1]
Emacspeak - A Speech Output Subsystem For Emacs
URL: http://www.research.digital.com/CRL/personal/raman/emacspeak/emacspeak.html
comp.speech refs: [1]
Emacspeak FAQ
URL: http://www.research.digital.com/CRL/personal/raman/emacspeak/faqs.html
comp.speech refs: [1]
Entropic Research Laboratory home page
URL: http://www.entropic.com/
comp.speech refs: [1] - [2] - [3]
Eurovocs speech synthesis
URL: http://www.elis.rug.ac.be/ELISgroups/speech/research/eurovocs.html
comp.speech refs: [1] - [2]
Festival Speech Synthesis System: download software
URL: http://www.cstr.ed.ac.uk/projects/festival/download.html
comp.speech refs: [1]
Festival Speech Synthesis System: home page
URL: http://www.cstr.ed.ac.uk/projects/festival.html
comp.speech refs: [1] - [2]
German speech synthesis from Institut fur Technische Informatik und Kommunikationsnetze
URL: http://www.tik.ee.ethz.ch/cgi-bin/w3svox
comp.speech refs: [1]
Hadifix German speech synthesis
URL: http://asl1.ikp.uni-bonn.de/~tpo/Hadifix.en.html
comp.speech refs: [1]
Hadifix speech synthesis demo
URL: http://asl1.ikp.uni-bonn.de/~tpo/Hadiq.en.html
comp.speech refs: [1] - [2]
Haskins Laboratory WWW Site
URL: http://www.haskins.yale.edu/Haskins/MISC/special.html
comp.speech refs: [1]
IGE
URL: http://www.york.ac.uk/~rpf1/IGE.html
comp.speech refs: [1]
Infolingua Bibliographies
URL: http://gomer.mlink.net/infolingua.html
comp.speech refs: [1] - [2] - [3]
Infovox Multi-Lingual Speech Synthesis Products
URL: http://www.promotor.telia.se/NYA/cc/t-s/index.html
comp.speech refs: [1]
Institut fur Phonetik at Johann Wolfgang Goethe-Universitat Frankfurt
URL: http://www.uni-frankfurt.de/~ifb/ipf1.html
comp.speech refs: [1] - [2] - [3]
Institute for Communications Research and Phonetics, University of Bonn: Hadifix speech synthesis
URL: http://asl1.ikp.uni-bonn.de/Welcome.html
comp.speech refs: [1]
Institute of Phonetic Sciences, University of Amsterdam
URL: http://fonsg3.let.uva.nl/Welcome.html
comp.speech refs: [1]
IPOX: All Prosodic Speech Synthesis Architecture
URL: http://www.tue.nl/ipo/people/adirksen/ipox/ipox.htm
comp.speech refs: [1]
Johann Wolfgang Goethe-Universitat Frankfurt
URL: http://www.uni-frankfurt.de/
comp.speech refs: [1] - [2] - [3]
Jon Iles' Speech Synthesis "Museum"
URL: http://www.cs.bham.ac.uk/~jpi/synth/museum.html
comp.speech refs: [1] - [2]
JTS Micro Consulting Ltd: PAM, JTS Reader and Listen2
URL: http://www.islandnet.com/jts/
comp.speech refs: [1]
Kevin Lenzo's page of Speech Applications for the Macintosh
URL: http://www.cs.cmu.edu/~lenzo/mac_speech_apps.html
comp.speech refs: [1] - [2] - [3] - [4] - [5] - [6]
Laureate speech synthesis from British Telecom
URL: http://www.labs.bt.com/innovate/speech/laureate/
comp.speech refs: [1]
Lernout and Hauspie home page
URL: http://www.lhs.com/
comp.speech refs: [1] - [2] - [3] - [4] - [5] - [6]
Lernout and Hauspie text-to-speech
URL: http://www.lhs.com/tts.html
comp.speech refs: [1] - [2]
Listen2 web page
URL: http://www.islandnet.com/jts/listen2.htm
comp.speech refs: [1]
Lucent Technologies Bell Labs Text-to-Speech
URL: http://www.bell-labs.com/project/tts/
comp.speech refs: [1] - [2]
Lucent Technologies Bell Labs Text-to-Speech: system description
URL: http://www.bell-labs.com/project/tts/tts-overview.html
comp.speech refs: [1]
Lyricos singing speech synthesis
URL: http://www.cse.ogi.edu/CSLU/research/TTS/research/sing.html
comp.speech refs: [1]
Macintosh Speech Page: Speech Manager and PlainTalk
URL: http://www.speech.apple.com/
comp.speech refs: [1] - [2] - [3] - [4]
MacYack Pro Speech Synthesis software
URL: http://www.lowtek.com/macyack/
comp.speech refs: [1]
MBROLA speech synthesis demonstration
URL: http://tcts.fpms.ac.be/synthesis/modelcmp.html
comp.speech refs: [1] - [2]
MBROLA speech synthesis project home page
URL: http://tcts.fpms.ac.be/synthesis/mbrola.html
comp.speech refs: [1]
Monologue for Windows from First Byte
URL: http://www.firstbyte.davd.com/
comp.speech refs: [1] - [2]
Multi-Lingual TTS from Gerhard-Mercator University, Duisburg
URL: http://www.fb9-ti.uni-duisburg.de/demos/speech.html
comp.speech refs: [1]
Musee sonore de la synthese de la Parole en francais
URL: http://ophale.icp.grenet.fr/exemples_synthese/ex.html
comp.speech refs: [1]
Museum of Speech Analysis and Synthesis
URL: http://mambo.ucsc.edu/psl/smus/smus.html
comp.speech refs: [1]
National Center for Voice and Speech
URL: http://www.shc.uiowa.edu/ncvs_home.html
comp.speech refs: [1] - [2]
OGI Synthesis using Festival
URL: http://www.cse.ogi.edu/CSLU/research/TTS
comp.speech refs: [1]
Online bibliography for Phonetics and Speech Technology
URL: http://www.uni-frankfurt.de/~ifb/bib_engl.html
comp.speech refs: [1] - [2] - [3]
Online Speech Synthesis: Institute of Phonetic Sciences
URL: http://fonsg3.let.uva.nl/IFA-Features.html
comp.speech refs: [1]
Orator from Bellcore: home page
URL: http://www.bellcore.com/ORATOR/
comp.speech refs: [1] - [2]
PAM - A Text-To-Speech Application
URL: http://www.islandnet.com/~tslemko/
comp.speech refs: [1]
Pavarobotti synthesis technology from the National Center for Voice and Speech
URL: http://www.shc.uiowa.edu/fun/pavarobotti/pavarobotti.html
comp.speech refs: [1]
Plaintalk mailing list
URL: http://cgi.skyweyr.com/Plaintalk.Home
comp.speech refs: [1] - [2]
Scantron Quality Computers: for MacYack Pro Speech Synthesis software
URL: http://www.sqc.com/
comp.speech refs: [1]
Sensimetrics Corporation: SENSYN speech synthesizer
URL: http://www.sens.com/
comp.speech refs: [1]
SimTel programs for sound and soundcards
URL: http://www.acs.oakland.edu/oak/SimTel/win3/sound.html
comp.speech refs: [1] - [2]
Speech Coding and Synthesis Book
URL: http://www.elsevier.nl/section/engtech/scs/menu.htm
comp.speech refs: [1] - [2]
Speech Toys
URL: http://www.speechtoys.com/
comp.speech refs: [1] - [2]
Speech Toys page on Speech Synthesis
URL: http://www.speechtoys.com/spchtoys/spsyn.html
comp.speech refs: [1]
Survey of the State of the Art in Human Language Technology
URL: http://www.cse.ogi.edu/CSLU/HLTsurvey/HLTsurvey.html
comp.speech refs: [1] - [2] - [3]
Survey of the State of the Art in Human Language Technology: Text-to-Speech Technologies
URL: http://www.cse.ogi.edu/CSLU/HLTsurvey/ch5node1.html
comp.speech refs: [1]
T. V. Raman's home page
URL: http://www.research.digital.com/CRL/personal/raman/raman.html
comp.speech refs: [1]
TCTS Home Page: MBROLA speech synthesis and SPRACH speech recognition
URL: http://tcts.fpms.ac.be/
comp.speech refs: [1]
TCTS-Multitel: Speech Synthesis research group home page
URL: http://tcts.fpms.ac.be/synthesis/synthesis.html
comp.speech refs: [1]
Text, Speech and Language Technology series
URL: http://kapis.www.wkap.nl/kapis/CGI-BIN/WORLD/series.htm?TLTB
comp.speech refs: [1]
Thierry Dutoit's home page
URL: http://tcts.fpms.ac.be/synthesis/dutoit.html
comp.speech refs: [1]
TMH: Institutionen for Taloverforing och Musikakustik, Kungliga Tekniska Hogskolan
URL: http://www.speech.kth.se/info/software.html
comp.speech refs: [1]
Trainable text-to-phoneme software by Antonio Lucca
URL: http://www.silab.dsi.unimi.it/~al367212/ttsdoc.html
comp.speech refs: [1]
TrueTalk from Entropic
URL: http://www.entropic.com/truetalk.html
comp.speech refs: [1]
TruVoice speech synthesis from Centigram
URL: http://www.centigram.com/centigram/Products/Technology/Truvoice/TruVoice_Brochure.html
comp.speech refs: [1]
TruVoice speech synthesis from Centigram
URL: http://www.centigram.com/centigram/TruVoice/index.html
comp.speech refs: [1] - [2]
TruVoice speech synthesis from Centigram
URL: http://www.centigram.com/
comp.speech refs: [1]
WATSON FlexTalk from AT&T Advanced Speech Products Group: WWW Demonstration
URL: http://www.att.com/aspg/demo.html
comp.speech refs: [1]
WinSpeech text-to-speech application
URL: http://www.pcww.com/index.html
comp.speech refs: [1]
Yahoo page on speech generation/synthesis
URL: http://www.yahoo.com/Science/Computer_Science/Artificial_Intelligence/Natural_Language_Processing/Speech_Generation/
comp.speech refs: [1]
ZMD PCMCIA Speech Synthesis Card
URL: http://www.zmd-gmbh.de/assps/u2450app.htm
comp.speech refs: [1]
ZMD "Speaky" Speech Synthesis
URL: http://www.zmd-gmbh.de/assps/u2450.htm
comp.speech refs: [1]
ZMD: Zentrum Mikroelektronik Dresden speech synthesis
URL: http://www.zmd-gmbh.de/
comp.speech refs: [1]

FTP Links
Elan Informatique
URL: ftp://ftp.elan.fr/
comp.speech refs: [1]
Elan Informatique: Proverbe documentation
URL: ftp://ftp.elan.fr/Voice_products/Text-To-Speech_Synthesis_Products/ProVerbe_Speech_Engine/SDKEN.DOC
comp.speech refs: [1]
Festival Speech Synthesis System: source
URL: ftp://ftp.cstr.ed.ac.uk/pub/festival/1.1.1/
comp.speech refs: [1]
Hadifix speech synthesis demo software
URL: ftp://asl1.ikp.uni-bonn.de/pub/hadifix/hadi25.lzh
comp.speech refs: [1]
Hadifix speech synthesis demo software
URL: ftp://asl1.ikp.uni-bonn.de/pub/hadifix/hadidemo.zip
comp.speech refs: [1]
Klatt speech synthesis software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/klatt.3.04.tar.gz
comp.speech refs: [1]
Klatt speech synthesis software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/klatt.3.04.tar.Z
comp.speech refs: [1]
KPE80 - A Klatt Synthesiser and Parameter Editor
URL: ftp://ftp.phon.ucl.ac.uk/pub/kpe/kpe80.linux.tar.Z
comp.speech refs: [1]
KPE80 - A Klatt Synthesiser and Parameter Editor
URL: ftp://ftp.phon.ucl.ac.uk/pub/kpe/kpe80.src.tar.Z
comp.speech refs: [1]
KPE80 - A Klatt Synthesiser and Parameter Editor
URL: ftp://ftp.phon.ucl.ac.uk/pub/kpe/kpe80.sun413.tar.Z
comp.speech refs: [1]
KPE80 - A Klatt Synthesiser and Parameter Editor
URL: ftp://ftp.phon.ucl.ac.uk/pub/kpe/OVERVIEW.ps
comp.speech refs: [1]
Narrator Translator Library
URL: ftp://ftp.doc.ic.ac.uk/pub/aminet/dev/src/tran42src.lha
comp.speech refs: [1]
Narrator Translator Library
URL: ftp://ftp.doc.ic.ac.uk/pub/aminet/util/libs/translator42.lha
comp.speech refs: [1]
PAM: a talking personal assistant and text reader application
URL: ftp://ftp.islandnet.com/jts/pam_en3c.zip
comp.speech refs: [1]
Personal TrueTalk from Entropic
URL: ftp://ftp.entropic.com/pub/truetalk/README.ptt
comp.speech refs: [1]
rsynth: speech synthesis software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/rsynth-2.0.tar.gz
comp.speech refs: [1]
rsynth: speech synthesis software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/rsynth-2.0.tar.Z
comp.speech refs: [1]
SIMTEL speech software
URL: ftp://ftp.coast.net/SimTel/msdos/voice/
comp.speech refs: [1]
spchsyn.exe: Speech synthesis
URL: ftp://evans.ee.adfa.oz.au/mirrors/tibbs/applications/spchsyn.exe
comp.speech refs: [1]
"Speak" - a Text to Speech Program
URL: ftp://wilma.cs.brown.edu/pub/speak.tar.Z
comp.speech refs: [1]
Speech Filing System (SFS)
URL: ftp://ftp.phon.ucl.ac.uk/pub/sfs/
comp.speech refs: [1] - [2]
Speech Manager and PlainTalk
URL: ftp://ftp.support.apple.com/pub/apple_sw_updates/US/Macintosh/System/PlainTalk%201.4.1/
comp.speech refs: [1] - [2]
Speech Manager and PlainTalk Spanish speech synthesis
URL: ftp://ftp.support.apple.com/pub/apple_sw_updates/US/Macintosh/System/PlainTalk%201.4.1/Mexican_Spanish_TTS.hqx
comp.speech refs: [1] - [2]
Speech Manager and PlainTalk: Speech Recognition
URL: ftp://ftp.support.apple.com/pub/apple_sw_updates/US/Macintosh/System/PlainTalk%201.4.1/English_Speech_Recognition.hqx
comp.speech refs: [1] - [2]
Speech Manager and PlainTalk Speech Synthesis
URL: ftp://ftp.support.apple.com/pub/apple_sw_updates/US/Macintosh/System/PlainTalk%201.4.1/English_Text-to-Speech.hqx
comp.speech refs: [1] - [2]
Text to phoneme program
URL: ftp://shark.cse.fau.edu/pub/src/phon.tar.Z
comp.speech refs: [1]
Text to phoneme program
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/english2phoneme.tar.gz
comp.speech refs: [1]
Text to phoneme software
URL: ftp://ftp.doc.ic.ac.uk/packages/unix-c/utils/phoneme.c.gz
comp.speech refs: [1]
The Big Mouth: NeXT speech synthesizer
URL: ftp://ftp.cs.keio.ac.jp/pub/NeXT/source/TheBigMouth1.0.tar.Z
comp.speech refs: [1]
Tinytalk shareware screen reader
URL: ftp://ftp.netcom.com/pub/eb/ebohlman/
comp.speech refs: [1]
Voicemaker speech synthesis
URL: ftp://ftp.coast.net/SimTel/msdos/voice/vm110.zip
comp.speech refs: [1]
WreadFiles: File reader for Commodore Amiga
URL: ftp://ftp.wustl.edu/pub/aminet/util/misc/WreadFiles47.lha
comp.speech refs: [1]

Newsgroups
Artificial Intelligence newsgroup
URL: news:comp.ai
comp.speech refs: [1]
Compression newsgroup
URL: news:comp.compression
comp.speech refs: [1]
Digital signal processing newsgroup
URL: news:comp.dsp
comp.speech refs: [1] - [2]
FAQ postings for all newsgroups
URL: news:news.answers
comp.speech refs: [1]
FAQ postings for computer-related newsgroups
URL: news:comp.answers
comp.speech refs: [1]
GUS soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard.GUS
comp.speech refs: [1]
Language newsgroup
URL: news:sci.lang
comp.speech refs: [1]
Multimedia newsgroup
URL: news:comp.multimedia
comp.speech refs: [1]
Natural Language Knowledge Representation newsgroup
URL: news:comp.ai.nlang-know-rep
comp.speech refs: [1]
Natural Language Processing newsgroup
URL: news:comp.ai.nat-lang
comp.speech refs: [1]
Neural network newsgroup
URL: news:comp.ai.neural-nets
comp.speech refs: [1]
Newsgroup for discussion of speech production and perception
URL: news:alt.sci.physics.acoustics
comp.speech refs: [1]
Newsgroup for new users of news
URL: news:news.announce.newusers
comp.speech refs: [1] - [2]
Silicon Graphics Systems newsgroup
URL: news:comp.sys.sgi.misc
comp.speech refs: [1]
Soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard.advocacy
comp.speech refs: [1]
Soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard
comp.speech refs: [1]
Soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard.misc
comp.speech refs: [1]
Soundcard technical discussion group
URL: news:comp.sys.ibm.pc.soundcard.tech
comp.speech refs: [1]
Soundcards and Games newsgroup
URL: news:comp.sys.ibm.pc.soundcard.games
comp.speech refs: [1]
Soundcards and Music newsgroup
URL: news:comp.sys.ibm.pc.soundcard.music
comp.speech refs: [1]
Speech technology newsgroup
URL: news:comp.speech
comp.speech refs: [1] - [2] - [3] - [4]
Telecommunications newsgroup
URL: news:comp.dcom.telecom
comp.speech refs: [1]

Administrivia, Copyright, Submit Information : Last Revision: 18:44 05-Sep-1997

SpeechLinks: Speech Recognition

SpeechLinks - Speech Recognition


Speech Technology Hyperlinks Page
A list of hyperlinks from the comp.speech FAQ related to speech recognition. Links are provided to WWW references, ftp sites, and
newsgroups. Cross-references to the comp.speech WWW pages are also provided.

SpeechLinks Pages

SpeechLinks: The Complete List: 500+ speech technology links
SpeechLinks: General Speech Technology
SpeechLinks: Signal Processing for Speech
SpeechLinks: Speech Coding
SpeechLinks: Speech Synthesis

comp.speech WWW Availability

Australia: Sydney University
Britain: Cambridge University
Japan: ATR Interpreting Telecommunications Research Laboratories
USA: Carnegie Mellon University

WWW Links
1stVoice Dragon Systems reseller
URL: http://www.1stvoice.com/
comp.speech refs: [1]
21st Century Eloquence: speech recognition reseller
URL: http://www.voicerecognition.com/
comp.speech refs: [1]
Advanced Recognition Technologies, Inc: smARTspeak
URL: http://www.artcomp.com/speak.htm
comp.speech refs: [1]
Applied Language Technologies, Inc.: SpeechWorks
URL: http://www.altech.com/
comp.speech refs: [1]
ART: Advanced Recognition Technologies, Inc
URL: http://www.artcomp.com/
comp.speech refs: [1]
Articulate Systems PowerSecretary speech recognition
URL: http://www.artsys.com/
comp.speech refs: [1]
AT&T Advanced Speech Products Group home page
URL: http://www.att.com/aspg/
comp.speech refs: [1] - [2] - [3]

http://mi.eng.cam.ac.uk/comp.speech/Section6/speechlinks.html (1 of 11) [10/31/2003 8:41:52 AM]

AT&T Watson: Engineer Training Program
URL: http://www.att.com/aspg/SSI_Class.html
comp.speech refs: [1] - [2]
AT&T Watson: Independent Software Vendor (ISV) Program
URL: http://www.att.com/aspg/ISV_program.html
comp.speech refs: [1] - [2]
AT&T Watson: Licensing Program
URL: http://www.att.com/aspg/ISV_program.html#2
comp.speech refs: [1] - [2]
AT&T Watson: Software Development Kit
URL: http://www.att.com/aspg/ISV_program.html#1
comp.speech refs: [1] - [2]
AT&T Watson Speech Applications Platform FAQ
URL: http://www.att.com/aspg/FAQ.html
comp.speech refs: [1] - [2]
Auscript speech technology vendor
URL: http://www.auscript.com.au/
comp.speech refs: [1]
BBN Hark's home page
URL: http://www.bbn.com/bbn_hark/HarkHome.html
comp.speech refs: [1]
Berkeley Restaurant Project (BeRP)
URL: http://www.icsi.berkeley.edu/real/berp.html
comp.speech refs: [1]
Brite: Computer Telephony Integration & Interactive Voice Response
URL: http://www.brite.com/
comp.speech refs: [1]
CAVE: Caller Verification in Banking and Telecommunications
URL: http://www.ptt-telecom.nl/cave/
comp.speech refs: [1]
Creative Labs, Inc.
URL: http://www.creaf.com/
comp.speech refs: [1] - [2]
Creative Labs VoiceAssist
URL: http://www.creaf.com/wwwnew/products/sound/demo/vassist.html
comp.speech refs: [1]
CustomVoice and CustomTelephone from A&G Graphics Interface Inc.
URL: http://www.customvoice.com/
comp.speech refs: [1]
DAX Systems, Inc.: Computer Telephony and Integrated Voice Response
URL: http://www.daxsystems.com/
comp.speech refs: [1]
Digital Dreams Speech Recognition Plug-Ins
URL: http://www.surftalk.com/
comp.speech refs: [1]
Discrete HMM demonstration software
URL: http://www.isip.msstate.edu/software/
comp.speech refs: [1]
Dragon Developers Page
URL: http://www.dragonsys.com/marketing/dragondeveloper.html
comp.speech refs: [1]
Dragon home page
URL: http://www.dragonsys.com/
comp.speech refs: [1] - [2]
Dragon NaturallySpeaking
URL: http://www.naturallyspeaking.com/
comp.speech refs: [1]
Dragon PowerSecretary
URL: http://www.dragonsys.com/marketing/powersecretary.html
comp.speech refs: [1]
Dragon Telephony Products
URL: http://www.dragonsys.com/marketing/telephony.html
comp.speech refs: [1]
Entropic Research Laboratory home page
URL: http://www.entropic.com/
comp.speech refs: [1] - [2] - [3]
Entropic's HTK (Hidden-Markov Model Toolkit)
URL: http://www.entropic.com/htk.html
comp.speech refs: [1]
Ficomp Inc. Interpreter 6000
URL: http://www.ficompsystems.com/
comp.speech refs: [1]
Fundamentals of Speech Recognition Course by Joe Picone
URL: http://www.isip.msstate.edu/publications/1996/ee_8993/
comp.speech refs: [1] - [2]
Fundamentals of Speech Recognition course lecture notes by Joe Picone
URL: http://www.isip.msstate.edu/publications/1996/ee_8993/lectures/
comp.speech refs: [1] - [2]
Fundamentals of Speech Recognition course syllabus by Joe Picone
URL: http://www.isip.msstate.edu/publications/1996/ee_8993/SYLLABUS.ps
comp.speech refs: [1] - [2]
Hangai Lab: demo of speaker identification
URL: http://miya8f05.ee.kagu.sut.ac.jp/study/speech/speech1.html
comp.speech refs: [1]
Hangai Lab: demo of speaker verification
URL: http://miya8f05.ee.kagu.sut.ac.jp/study/speech/speech2.html
comp.speech refs: [1]
Hangai Lab: demos of speaker recognition
URL: http://miya8f05.ee.kagu.sut.ac.jp/index.html
comp.speech refs: [1]
IBM VoiceType Control
URL: http://www.software.ibm.com/workgroup/voicetyp/vtcust2.html
comp.speech refs: [1]
IBM VoiceType Dictation
URL: http://www.software.ibm.com/workgroup/voicetyp/vtcust1.html
comp.speech refs: [1]
IBM VoiceType Dictation FAQ
URL: http://www.infi.net/~ums/ibmfaq.htm
comp.speech refs: [1]
IBM VoiceType Dictation from UltraMedia Systems International
URL: http://www.infi.net/~ums/
comp.speech refs: [1]
IBM VoiceType Ordering
URL: http://www.software.ibm.com/workgroup/voicetyp/vtordna.html
comp.speech refs: [1]
IBM VoiceType System Requirements
URL: http://www.software.ibm.com/workgroup/voicetyp/vtprod13.html
comp.speech refs: [1]
IN CUBE for Windows 95
URL: http://www.commandcorp.com/cci/win95.html
comp.speech refs: [1]
IN CUBE from Command Corp. Inc.
URL: http://www.commandcorp.com/incube_welcome.html
comp.speech refs: [1]
IN CUBE Mark II Pro for Windows NT
URL: http://www.commandcorp.com/cci/pront.html
comp.speech refs: [1]
IN CUBE Voice Command for Sun SPARCstations
URL: http://www.commandcorp.com/cci/in3sparc.html
comp.speech refs: [1]
Infolingua Bibliographies
URL: http://gomer.mlink.net/infolingua.html
comp.speech refs: [1] - [2] - [3]
Institut für Phonetik at Johann Wolfgang Goethe-Universität Frankfurt
URL: http://www.uni-frankfurt.de/~ifb/ipf1.html
comp.speech refs: [1] - [2] - [3]
Institute for Signal and Information Processing (ISIP) at Mississippi State University
URL: http://www.isip.msstate.edu/
comp.speech refs: [1] - [2] - [3] - [4]
International Computer Science Institute in Berkeley, CA
URL: http://www.icsi.berkeley.edu/
comp.speech refs: [1]
Johann Wolfgang Goethe-Universität Frankfurt
URL: http://www.uni-frankfurt.de/
comp.speech refs: [1] - [2] - [3]
Kevin Lenzo's page of Speech Applications for the Macintosh
URL: http://www.cs.cmu.edu/~lenzo/mac_speech_apps.html
comp.speech refs: [1] - [2] - [3] - [4] - [5] - [6]
Keyware S2 Security Service
URL: http://www.keywareusa.com/Products/S2SecurityServer/main.html
comp.speech refs: [1]
Keyware Technologies Biometric Verification
URL: http://www.keywareusa.com/
comp.speech refs: [1]
Keyware VoiceGuardian
URL: http://www.keywareusa.com/Products/VoiceGuardian/main.html
comp.speech refs: [1]
Keyware VoiceGuardian online demo
URL: http://www.keywareusa.com/Demos/
comp.speech refs: [1]
Kurzweil Clinical Reporter speech recognition
URL: http://www.kurzweil.com/medical/
comp.speech refs: [1]
Kurzweil Voice for Windows: speech recognition
URL: http://www.kurzweil.com/
comp.speech refs: [1]
LawTalk from WildCard
URL: http://www.wildcardtech.com/speech/info/lawtalk.htm
comp.speech refs: [1]
Lernout and Hauspie home page
URL: http://www.lhs.com/
comp.speech refs: [1] - [2] - [3] - [4] - [5] - [6]
Lernout and Hauspie speech recognition
URL: http://www.lhs.com/asr.html
comp.speech refs: [1] - [2]
Lists of References on Automatic Speaker Verification
URL: http://sig.enst.fr/~chollet/ForMehdi/SpRecV1.l_ind.html
comp.speech refs: [1]
Macintosh Speech: Developer's Information
URL: http://www.speech.apple.com/speech/dev/dev.html
comp.speech refs: [1]
Macintosh Speech Page: Speech Manager and PlainTalk
URL: http://www.speech.apple.com/
comp.speech refs: [1] - [2] - [3] - [4]
Massachusetts Institute of Technology
URL: http://web.mit.edu/
comp.speech refs: [1]
Microsoft Speech API SDK
URL: http://www.research.microsoft.com/research/srg/install.htm#SDK
comp.speech refs: [1]
Microsoft Speech SDK
URL: http://www.research.microsoft.com/research/srg/install.htm
comp.speech refs: [1] - [2]
Microsoft Speech Technology home page
URL: http://www.research.microsoft.com/research/srg/
comp.speech refs: [1]
Mississippi State University
URL: http://www.msstate.edu/
comp.speech refs: [1] - [2]
National Center for Voice and Speech
URL: http://www.shc.uiowa.edu/ncvs_home.html
comp.speech refs: [1] - [2]
Netscape Communications Corporation
URL: http://home.netscape.com/
comp.speech refs: [1]
NICO Artificial Neural Network Toolkit
URL: http://www.speech.kth.se/NICO/index.html
comp.speech refs: [1]
NICO Artificial Neural Network Toolkit download page
URL: http://www.speech.kth.se/NICO/download.html
comp.speech refs: [1]
Nortel: Multimedia Network Applications
URL: http://www.nortel.com/entprods/multimedia/
comp.speech refs: [1]
Nortel: Multimedia Network Applications: AudioGram Delivery Service
URL: http://www.nortel.com/entprods/multimedia/applications/audiogrm.html
comp.speech refs: [1]
Nortel: Multimedia Network Applications: Voice-Activated Auto Attendant
URL: http://www.nortel.com/entprods/multimedia/applications/autoattd.html
comp.speech refs: [1]
Nortel: Multimedia Network Applications: Voice-Activated Dialing
URL: http://www.nortel.com/entprods/multimedia/applications/vad.html
comp.speech refs: [1]
Nortel: Multimedia Network Applications: Voice-Activated Premier Dialing
URL: http://www.nortel.com/entprods/multimedia/applications/premdial.html
comp.speech refs: [1]
Nortel: Network Applications Vehicle
URL: http://www.nortel.com/entprods/multimedia/nav.html
comp.speech refs: [1]
Nortel: Northern Telecom, provider of network voice applications
URL: http://www.nortel.com/
comp.speech refs: [1]
Nuance Communications: Speech recognition
URL: http://www.nuance.com/
comp.speech refs: [1]
O'Brien Resources: Speech Recognition Sales
URL: http://www.crosslink.net/~obrien/
comp.speech refs: [1]
OfficeTalk and LawTalk from WildCard
URL: http://www.wildcardtech.com/
comp.speech refs: [1]
OfficeTalk from WildCard
URL: http://www.wildcardtech.com/speech/info/offtalk.htm
comp.speech refs: [1]
Online bibliography for Phonetics and Speech Technology
URL: http://www.uni-frankfurt.de/~ifb/bib_engl.html
comp.speech refs: [1] - [2] - [3]
Philips Speech home page
URL: http://www.speech.be.philips.com/
comp.speech refs: [1]
Philips Speech Processing System 6000s: Radiology dictation
URL: http://www.speech.be.philips.com/sp6000.htm
comp.speech refs: [1]
Philips SpeechMagic dictation system
URL: http://www.speech.be.philips.com/sp-magic.htm
comp.speech refs: [1]
Pronotes Speech Recognition
URL: http://www.pronotes.com/
comp.speech refs: [1]
PureSpeech, Inc. WWW Site
URL: http://www.speech.com/
comp.speech refs: [1]
Russ Wilcox's list of Commercial Speech Recognition
URL: http://www.tiac.net/users/rwilcox/speech.html
comp.speech refs: [1] - [2]
SCI VoiceAutomated: speech recognition reseller
URL: http://www.voiceautomated.com/
comp.speech refs: [1]
Search Alta Vista for speech recognition
URL: http://www.altavista.digital.com/cgi-bin/query?pg=q&what=web&fmt=.&q=%2Bspeech+%2Brecognition
comp.speech refs: [1]
Search Lycos for speech recognition
URL: http://www.lycos.com/cgi-bin/pursuit?query=speech+recognition&ab=the_catalog
comp.speech refs: [1]
Sensory Circuits: Integrated Circuits for Speech Synthesis, Recognition and Verification
URL: http://www.sensoryinc.com/
comp.speech refs: [1]
Simon Crosby's FAQ for DragonDictate
URL: http://www.cl.cam.ac.uk/users/sac/dd-faq.html
comp.speech refs: [1] - [2]
Simon Crosby's home page (maintains an FAQ for DragonDictate)
URL: http://www.cl.cam.ac.uk/users/sac/
comp.speech refs: [1] - [2]
Speaker Identification And Verification: LIMSI Report
URL: http://www.limsi.fr/Recherche/TLP/reco/2pg95-sv/2pg95-sv.html
comp.speech refs: [1]
SpeakerKey Speaker Verification: FAQ
URL: http://www.vitro.bloomington.in.us:8080/~BC/REPORTS/SpeakerKeyFAQ.html
comp.speech refs: [1]
Speech Recognition Course Notes
URL: http://www.isip.msstate.edu/publications/1996/speech_recognition_short_course
comp.speech refs: [1]
Speech Recognition List: Applied Speech Technology Laboratory of CSLI at Stanford
URL: http://csli-www.stanford.edu/users/bscott/SRTech.html
comp.speech refs: [1]
Speech Systems Phonetic Engine speech recognition
URL: http://www.speechsys.com/
comp.speech refs: [1]
Speech Toys
URL: http://www.speechtoys.com/
comp.speech refs: [1] - [2]
Speech Toys page on Speech Recognition
URL: http://www.speechtoys.com/spchtoys/sprec.html
comp.speech refs: [1]
SpeechPrint ID from Voice Control Systems, Inc.
URL: http://www.voicecontrol.com/speechid.html
comp.speech refs: [1]
Spoken Language Systems Group at the Massachusetts Institute of Technology
URL: http://www.sls.lcs.mit.edu/
comp.speech refs: [1]
Survey of the State of the Art in Human Language Technology
URL: http://www.cse.ogi.edu/CSLU/HLTsurvey/HLTsurvey.html
comp.speech refs: [1] - [2] - [3]
Survey of the State of the Art in Human Language Technology: Speaker Recognition
URL: http://www.cse.ogi.edu/CSLU/HLTsurvey/ch1node47.html
comp.speech refs: [1]
Survey of the State of the Art in Human Language Technology: Spoken Input Technologies
URL: http://www.cse.ogi.edu/CSLU/HLTsurvey/ch1node2.html
comp.speech refs: [1]
Synapse: speech recognition sales
URL: http://www.synapseadaptive.com/
comp.speech refs: [1]
Talk Technology, Inc.: Speech recognition reseller
URL: http://www.usbusiness.com/talk/
comp.speech refs: [1]
Talk Technology: speech recognition reseller
URL: http://www.talktechnology.com/
comp.speech refs: [1]
Talking to a PC May Be Hazard To Your Throat, by Julie Chao
URL: http://www.bilbo.com/tae/bilbo/wsj.html
comp.speech refs: [1]
Talking to Computers Has its Hazards, by Gordon Arnaut
URL: http://www.bilbo.com/tae/bilbo/globmail.html
comp.speech refs: [1]
T-Netix speaker verification for cellular communications
URL: http://www.t-netix.com/
comp.speech refs: [1]
Tony Robinson's home page
URL: http://svr-www.eng.cam.ac.uk/~ajr/
comp.speech refs: [1] - [2] - [3] - [4] - [5]
ToppCopy Telecom: Speech recognition reseller
URL: http://www.toppcopy.com/
comp.speech refs: [1]
Typing Injuries Page
URL: http://alumni.caltech.edu/~dank/typing-archive.html
comp.speech refs: [1]
Typing Injury FAQ
URL: http://www.cs.princeton.edu:80/~dwallach/tifaq/
comp.speech refs: [1]
VAULT Speaker Verification
URL: http://www.ImagineNation.com/Pavilion/Vault/Vault.htm
comp.speech refs: [1]
VAULT Speaker Verification FAQ
URL: http://www.ImagineNation.com/Xanadu/Vault/Vault.htm
comp.speech refs: [1]
VAULT Speaker Verification from ImagineNation
URL: http://www.ImagineNation.com/
comp.speech refs: [1]
Verbex demonstration speech recognition software
URL: http://www.verbex.com/demo.htm
comp.speech refs: [1]
Verbex Listen for Windows
URL: http://www.verbex.com/lfwspec.htm
comp.speech refs: [1]
Verbex: Listen for Windows speech recognition
URL: http://www.verbex.com/
comp.speech refs: [1] - [2]
Verbex speech recognition ordering page
URL: http://www.verbex.com/basicord.htm
comp.speech refs: [1]
Verbex Verbal Advantage DeskTop speech recognition
URL: http://www.verbex.com/aplncher.htm
comp.speech refs: [1]
Verbex Verbal Advantage Voice Browser speech recognition
URL: http://www.verbex.com/browser.htm
comp.speech refs: [1]
Visual Voice from Stylus Innovation
URL: http://www.stylus.com/
comp.speech refs: [1]
Visual Voice from Stylus Innovation
URL: http://www.stylus.com/stylus/part.htm
comp.speech refs: [1]
Vocal Health information from the National Center for Voice and Speech
URL: http://www.shc.uiowa.edu/hygiene/home.html
comp.speech refs: [1]
Voice Control Systems: Speech recognition
URL: http://www.voicecontrol.com/
comp.speech refs: [1] - [2] - [3]
Voice Processing Corporation Speech Recognition Product Line
URL: http://www.vpro.com/
comp.speech refs: [1]
VoiceCompanion - RemoteAccess from WildCard
URL: http://www.wildcardtech.com/speech/info/vcremote.htm
comp.speech refs: [1]
VoiceCompanion for the Internet from WildCard
URL: http://www.wildcardtech.com/vcibeta/beta2.htm
comp.speech refs: [1]
Voicetek Corp.
URL: http://www.voicetek.com/
comp.speech refs: [1]
VoiceWare Systems speech recognition resellers
URL: http://www.talk2type.com/home.htm
comp.speech refs: [1]
Votan VPC2100 Voice Card and VSP 1010 Speech Processor
URL: http://www.ti.com/sc/docs/dsps/develop/3rdparty/vot.htm
comp.speech refs: [1]
WorkLink Dragon Systems reseller
URL: http://www.worklink.net/
comp.speech refs: [1]
Yahoo page on Speech Recognition
URL: http://www.yahoo.com/business/corporations/computers/software/voice_recognition/
comp.speech refs: [1]
Yahoo page on speech recognition
URL: http://www.yahoo.com/Science/Computer_Science/Artificial_Intelligence/Natural_Language_Processing/Speech_Recognition/
comp.speech refs: [1]

FTP Links
AbbotDemo speech recognition software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/AbbotDemo
comp.speech refs: [1]
comp.speech ftp site: speech recognition software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/
comp.speech refs: [1] - [2] - [3]
Digital Dreams Speech Recognition Plug-Ins
URL: ftp://ftp.surftalk.com/
comp.speech refs: [1]
Do-it-yourself speech recognition
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/DIY_SpeechRecognition
comp.speech refs: [1]
EARS speech recognition software
URL: ftp://sunsite.unc.edu/pub/Linux/apps/sound/speech/ears-0.26.tar.gz
comp.speech refs: [1]
EARS speech recognition software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/ears-0.26.tar.gz
comp.speech refs: [1]
Jialong He's Speaker Recognition (Identification) Research Tool: MSDOS version (Germany)
URL: ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/spkrtool.zip
comp.speech refs: [1]
Jialong He's Speaker Recognition (Identification) Research Tool: MSDOS version (UK)
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spkrtool.zip
comp.speech refs: [1]
Jialong He's Speaker Recognition (Identification) Research Tool: Sun version (UK)
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spkr_sun_v1.tar.gz
comp.speech refs: [1]
Jialong He's Speaker Recognition (Identification) Tool: Sun version (Germany)
URL: ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/speaker_sun_v1.tar.gz
comp.speech refs: [1]
Jialong He's Speech Recognition Research Tool: MSDOS version (Germany)
URL: ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/spchtool.zip
comp.speech refs: [1]
Jialong He's Speech Recognition Research Tool: MSDOS version (UK)
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spchtool.zip
comp.speech refs: [1]
Jialong He's Speech Recognition Research Tool: Sun version (Germany)
URL: ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/speech_sun_v1.tar.gz
comp.speech refs: [1]
Jialong He's Speech Recognition Research Tool: Sun version (UK)
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spch_sun_v1.tar.gz
comp.speech refs: [1]
Lists of speech recognition products posted to comp.speech
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/SpeechRecognitionProducts
comp.speech refs: [1]
Lotec speech recognition software
URL: ftp://ftp.sanpo.t.u-tokyo.ac.jp/pub/nigel/lotec/lotec.tar.Z
comp.speech refs: [1]
Myers' Hidden Markov Model software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/hmm-1.03.tar.gz
comp.speech refs: [1]
Myers' Hidden Markov Model software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/hmm.README
comp.speech refs: [1]
recnet: recurrent neural network speech recognition software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/recnet-1.3.tar.Z
comp.speech refs: [1]
Simon Says speech recognition for NeXT
URL: ftp://ftp.informatik.uni-muenchen.de/pub/comp/platforms/next/Audio/audio-apps/SimonSaysDemo.1.5.1.N.b.tar.gz
comp.speech refs: [1]
Simon Says speech recognition for NeXT
URL: ftp://ftp.informatik.uni-muenchen.de/pub/comp/platforms/next/Audio/audio-apps/SimonSaysDemo.1.5.1.README
comp.speech refs: [1]
Voice Problems, Prevention and Correction
URL: ftp://ftp.csua.berkeley.edu/pub/typing-injury/voice-problems
comp.speech refs: [1]
Voice Recognition Processors document
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/VoiceRecognitionProcessors
comp.speech refs: [1] - [2] - [3]

Newsgroups
Artificial Intelligence newsgroup
URL: news:comp.ai
comp.speech refs: [1]
Compression newsgroup
URL: news:comp.compression
comp.speech refs: [1]
Digital signal processing newsgroup
URL: news:comp.dsp
comp.speech refs: [1] - [2]
FAQ postings for all newsgroups
URL: news:news.answers
comp.speech refs: [1]
FAQ postings for computer-related newsgroups
URL: news:comp.answers
comp.speech refs: [1]
GUS soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard.GUS
comp.speech refs: [1]
Language newsgroup
URL: news:sci.lang
comp.speech refs: [1]
Multimedia newsgroup
URL: news:comp.multimedia
comp.speech refs: [1]
Natural Language Knowledge Representation newsgroup
URL: news:comp.ai.nlang-know-rep
comp.speech refs: [1]
Natural Language Processing newsgroup
URL: news:comp.ai.nat-lang
comp.speech refs: [1]
Neural network newsgroup
URL: news:comp.ai.neural-nets
comp.speech refs: [1]
Newsgroup for discussion of speech production and perception
URL: news:alt.sci.physics.acoustics
comp.speech refs: [1]
Newsgroup for new users of news
URL: news:news.announce.newusers
comp.speech refs: [1] - [2]
Silicon Graphics Systems newsgroup
URL: news:comp.sys.sgi.misc
comp.speech refs: [1]
Soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard.advocacy
comp.speech refs: [1]
Soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard
comp.speech refs: [1]
Soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard.misc
comp.speech refs: [1]
Soundcard technical discussion group
URL: news:comp.sys.ibm.pc.soundcard.tech
comp.speech refs: [1]
Soundcards and Games newsgroup
URL: news:comp.sys.ibm.pc.soundcard.games
comp.speech refs: [1]
Soundcards and Music newsgroup
URL: news:comp.sys.ibm.pc.soundcard.music
comp.speech refs: [1]
Speech technology newsgroup
URL: news:comp.speech
comp.speech refs: [1] - [2] - [3] - [4]
Telecommunications newsgroup
URL: news:comp.dcom.telecom
comp.speech refs: [1]

Administrivia, Copyright, Submit Information : Last Revision: 17:18 02-Oct-1997


comp.speech FAQ Section 2

Signal Processing for Speech

SpeechLinks: Signal Processing for Speech


Q2.1: What sampling do I need for speech?
Q2.2: Finding the pitch of a speech signal
Q2.3: How do I find the start and end points of a speech signal?
Q2.4: Where can I find FFT software?
Q2.5: Signal processing in speech technology
Q2.6: Speech sampling and signal processing hardware
Q2.7: How do I convert to/from mu-law format?
Q2.8: Signal Processing Software

Administrivia, Copyright, Submit Information : Last Revision: 15:42 13-May-1997

http://mi.eng.cam.ac.uk/comp.speech/FAQ2.html [10/31/2003 8:42:12 AM]


Q2.1: What sampling do I need for speech?
For recorded speech to be understood by humans you need a sampling rate of 8 kHz or more and at least
8-bit samples. This produces poor-quality speech, but it can be understood.

Improvements can be achieved by increasing the number of bits per sample to 12 or 16, or by using a
non-linear encoding technique such as mu-law or A-law (see Q2.7). This improves the signal-to-noise
ratio.

Increasing the sampling rate above 8 kHz, say to 10 kHz, 16 kHz or 20 kHz, improves the frequency
response: a signal sampled at rate fs can only represent frequencies up to fs/2 (the Nyquist frequency),
so the higher the sampling rate, the more high-frequency content is preserved. A 16 kHz sampling rate
(capturing frequencies up to 8 kHz) is a reasonable target for high-quality speech recording and playback.

When doing speech recognition, remember that your computer is not as good as your ear, so it will
have trouble with poor-quality sound. The choice of an appropriate sampling setup depends very much
on the speech recognition task and the amount of computing power available.
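The arithmetic behind these trade-offs can be sketched in C. This is a minimal illustration, not part of any package mentioned in this FAQ, and the function names are hypothetical. It uses the Nyquist limit (a sampling rate fs represents frequencies up to fs/2), the raw PCM data rate (sampling rate times bits per sample), and the textbook approximation of 6.02N + 1.76 dB signal-to-quantisation-noise ratio for a full-scale sine wave with N-bit uniform quantisation.

```c
#include <assert.h>

/* Highest frequency a given sampling rate can represent (Nyquist limit). */
double nyquist_hz(double sample_rate_hz)
{
    assert(sample_rate_hz > 0.0);
    return sample_rate_hz / 2.0;
}

/* Data rate of uncompressed mono PCM, in bits per second. */
long pcm_bit_rate(long sample_rate_hz, int bits_per_sample)
{
    assert(sample_rate_hz > 0 && bits_per_sample > 0);
    return sample_rate_hz * bits_per_sample;
}

/* Approximate signal-to-quantisation-noise ratio (dB) of a full-scale
   sine wave under uniform quantisation with the given number of bits. */
double uniform_quant_snr_db(int bits)
{
    assert(bits > 0);
    return 6.02 * bits + 1.76;
}
```

For example, 8 kHz, 8-bit sampling gives 64000 bit/s and a 4 kHz bandwidth, while 16 kHz, 16-bit sampling gives 256000 bit/s and an 8 kHz bandwidth; going from 8 to 16 bits raises the approximate SNR from about 50 dB to about 98 dB.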

Last Revision: 01:53 12-Apr-1996


Q2.2: Finding the pitch of a speech signal
This topic comes up regularly in the comp.dsp newsgroup. Question 2.5 of the FAQ posting for
comp.dsp gives a comprehensive list of references on the definition, perception and processing of
pitch. The comp.dsp FAQ posting is posted regularly to the comp.dsp newsgroup, and is also
available by ftp and on the WWW:

● http://www.bdti.com/faq/dsp_faq.htm
● ftp://rtfm.mit.edu/pub/usenet/comp.dsp/

The following provide pitch tracking software:

● Most of the speech processing environments listed in Q1.9 including CSRE, ESPS, Kay
Elemetrics Computer Speech Lab, OGI Speech Tools, Speech Filing System, Signalyze,
Soundscope.

Last Revision: 01:53 12-Apr-1996


Q2.3: How do I find the start and end points of a speech signal?
End-point detection algorithms identify the sections of an incoming audio signal that contain speech.
Accurate end-pointing is a non-trivial task; however, reasonable behaviour can be obtained for inputs
that contain only speech surrounded by silence (no other noises). Typical algorithms look at the
energy or amplitude of the incoming signal and at the rate of "zero-crossings". A zero-crossing is
where the audio signal changes from positive to negative or vice versa. When the energy and
zero-crossing rate are at certain levels, it is reasonable to guess that there is speech: voiced sounds
have high energy, while weak fricatives such as /f/ have low energy but a high zero-crossing rate. More
detailed descriptions are provided in the papers cited below and in the documentation for the following
software.
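The energy and zero-crossing idea can be sketched in C. This is a simplified, hypothetical illustration, not the Rabiner-Sambur algorithm or either of the packages listed below: the signal is split into non-overlapping frames, and a frame counts as speech if its mean squared amplitude exceeds energy_thr, or if its zero-crossing count exceeds zc_thr while it still has at least a tenth of that energy (to catch weak fricatives without triggering on digital silence). The function names, the decision rule and the threshold values are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Number of sign changes between consecutive samples in a frame. */
size_t zero_crossings(const short *x, size_t n)
{
    size_t count = 0;
    for (size_t i = 1; i < n; i++)
        if ((x[i - 1] < 0) != (x[i] < 0))
            count++;
    return count;
}

/* Mean squared amplitude of a frame (short-term energy). */
double frame_energy(const short *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += (double)x[i] * x[i];
    return n > 0 ? sum / (double)n : 0.0;
}

/* Hypothetical decision rule: loud frames are speech; quieter frames
   count as speech only if they cross zero often (fricative-like). */
static int frame_is_speech(const short *x, size_t n,
                           double energy_thr, size_t zc_thr)
{
    double e = frame_energy(x, n);
    return e > energy_thr ||
           (zero_crossings(x, n) > zc_thr && e > energy_thr / 10.0);
}

/* Scans non-overlapping frames and reports the first sample of the first
   speech frame (*start) and one past the last sample of the last speech
   frame (*end). Returns 0 if no frame is judged to be speech. */
int find_endpoints(const short *x, size_t len, size_t frame_len,
                   double energy_thr, size_t zc_thr,
                   size_t *start, size_t *end)
{
    int found = 0;
    assert(frame_len > 0 && start != NULL && end != NULL);
    for (size_t pos = 0; pos + frame_len <= len; pos += frame_len) {
        if (frame_is_speech(x + pos, frame_len, energy_thr, zc_thr)) {
            if (!found)
                *start = pos;
            *end = pos + frame_len;
            found = 1;
        }
    }
    return found;
}
```

Fixed thresholds only work for a known recording level; practical algorithms such as Rabiner and Sambur's derive them from measurements of the background noise, and add hangover logic so that short pauses inside a word are not treated as end points.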

End-point detection software is available from:

● ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/tools/ep.1.0.tar.gz
● ftp://ftp.isip.msstate.edu/pub/software/signal_detector/sigd_v2.2.tar.gz

Plenty of research papers have been presented on end-pointing. Try the following:

● Rabiner, L.R. and Sambur, M.R. "An Algorithm for Determining the Endpoints of Isolated
Utterances." Bell System Technical Journal, Vol. 54, No. 2, pp. 297-315, 1975.
● Drago, P.G. et al. "Digital Dynamic Speech Detectors." IEEE Trans. on Communications, Vol.
26, No. 1, pp. 140-145, Jan. 1978.
● Newman, W.C. "Detecting Speech with an Adaptive Neural Network." Electronic Design, 22
March 1990.
● Taboada, J. et al. "Explicit Estimation of Speech Boundaries." IEE Proc. Sci. Meas. Technol.,
Vol. 141, No. 3, pp. 153-159, May 1994.

Last Revision: 14:11 13-May-1997


Q2.4: Where can I find FFT software?
Comprehensive list of FFT software
Links to over 65 different pieces of one-dimensional FFT code.
http://tjev.tel.etf.hr/josip/DSP/fft.html
FFT Software including optimised fft routines and mixed-radix algorithms
ftp://usc.edu/pub/C-numanal/fft-stuff.tar.gz
OR, ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/analysis/fft-stuff.tar.gz
mixfft03.zip: C-source for a very fast arbitrary N FFT routine
The C-source is ShareWare: read the text file included in the package before using the FFT
routine commercially.
Jens J. Nielsen: jnielsen@internet.dk
Available from ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/analysis/mixfft03.zip
OR ftp://ftp.coast.net/simtel/msdos/c/mixfft03.zip
FFTW
FFTW is a C subroutine library for computing the FFT in one or more dimensions. It is not
limited to sizes that are powers of two, and includes real-complex and parallel transforms.
Also on the FFTW web site are benchmarks comparing the performance and accuracy of many
public-domain FFT implementations on a variety of platforms, as well as links to other sources
of FFT code and information.
Available from http://theory.lcs.mit.edu/~fftw
Developed by Matteo Frigo and Steven G. Johnson: fftw@theory.lcs.mit.edu

Last Revision: 18:00 05-Sep-1997


Q2.5: Signal processing in speech technology
This question is far too big to be answered in a FAQ posting. Here are some WWW resources and
books that cover the area well.

Tony Robinson's Course Notes

Dr. Tony Robinson of the Engineering Dept of Cambridge University has put his Speech Analysis
course notes on the web. The base page is http://svr-www.eng.cam.ac.uk/~ajr/SA95/. There is
information on the following:

● Sampling theory
● Filter bank analysis
● Short-term Fourier analysis
● Linear prediction analysis
● Formant analysis and voicing analysis
● Speech coding
● and more....

Joseph Picone's Course Notes

Joseph Picone of the Institute for Signal and Information Processing (ISIP) at Mississippi State
University has put two sets of course notes on the web:

EE 4773/6773: Digital Signal Processing
The course covers sampling, frequency analysis, z-transforms, filter design and more. The
WWW site provides the syllabus, assignments, some source code, data, exams, homework and
solutions, lecture notes and more.
EE 8993: Fundamentals of Speech Recognition
The course covers background probability and phonetics/acoustics, speech signal analysis,
dynamic programming, dynamic time warping, hidden Markov modelling, language
modelling, neural networks, etc. The WWW site provides the syllabus and lecture notes.

Signal Processing Home page

The Signal Processing Home page has information on a range of DSP issues. It includes references to
a range of software and much more.
http://tjev.tel.etf.hr/josip/DSP/sigproc.html


Books and other References

There are many good books which discuss signal processing for speech:

● Digital processing of speech signals; L. R. Rabiner, R. W. Schafer. Englewood Cliffs; London:
Prentice-Hall, 1978
● Voice and Speech Processing; T. W. Parsons. New York; McGraw Hill 1986
● Computer Speech Processing; ed Frank Fallside, William A. Woods Englewood Cliffs:
Prentice-Hall, c1985
● Digital speech processing : speech coding, synthesis, and recognition edited by A. Nejat Ince;
Kluwer Academic Publishers, Boston, c1992
● Speech science and technology; edited by Shuzo Saito pub. Ohmsha, Tokyo, c1992
● Speech analysis; edited by Ronald W. Schafer, John D. Markel, New York, IEEE Press, c1979
● Applied Speech Technology Edited by: Ann Syrdal (AT&T Bell Labs, Holmdel, New Jersey),
Raymond Bennett (Ameritech, Hoffman Estates, Illinois) and Steven Greenspan (AT&T Bell
Labs, Murray Hill, New Jersey). Publisher: CRC Press.
● Speech Communication: Human and Machine Douglas O'Shaughnessy, Addison Wesley series
in Electrical Engineering: Digital Signal Processing, 1987.
● Discrete-time processing of speech signals; John R Deller, John G Proakis, John H L Hansen;
Macmillan 1993.
● Signal processing of speech; F J Owens; Macmillan 1993.

Last Revision: 13:18 07-Aug-1996


Q2.6: Speech sampling and signal processing hardware
In addition to the following information, have a look at the Audio File format document prepared by
Guido van Rossum (see details in Section 1.8).

Information is included on hardware for the following systems:

Macintosh Audio Hardware
PC Audio Hardware
Unix Audio Hardware

Can anyone provide information for SGI, NeXT, other UNIX hardware and any other PC soundcards?

Last Revision: 01:53 12-Apr-1996


Q2.7: How do I convert to/from mu-law format?
Mu-law coding is a form of compression for audio signals, including speech. It is widely used in the
telecommunications field because, for a given number of bits per sample, it gives quiet signals a better
signal-to-noise ratio than uniform (linear) quantisation does. Typically, mu-law compressed speech is
carried in 8-bit samples. It is a companding (compressing/expanding) technique: small amplitudes are
quantised with finer steps than large ones, so it carries more information about smaller signals than
about larger signals.

On Sun SPARC systems, have a look in the directory /usr/demo/SOUND, which includes table-lookup
macros for ulaw conversions. [Note, however, that not all systems will have /usr/demo/SOUND installed,
as it is optional; see your system administrator if it is missing.]

Alternatively, here is some sample conversion code in C.

/**
 ** Signal conversion routines for use with Sun4/60 audio chip
 **/

/* ANSI C prototypes (the original used pre-ANSI K&R declarations). */
unsigned char linear2ulaw(int sample);
int ulaw2linear(unsigned char ulawbyte);

/*
** This routine converts from linear to ulaw
**
** Craig Reese: IDA/Supercomputing Research Center
** Joe Campbell: Department of Defense
** 29 September 1989
**
** References:
** 1) CCITT Recommendation G.711 (very difficult to follow)
** 2) "A New Digital Technique for Implementation of Any
**    Continuous PCM Companding Law," Villeret, Michel,
**    et al. 1973 IEEE Int. Conf. on Communications, Vol 1,
**    1973, pg. 11.12-11.17
** 3) MIL-STD-188-113, "Interoperability and Performance Standards
**    for Analog-to-Digital Conversion Techniques,"
**    17 February 1987
**
** Input: Signed 16 bit linear sample
** Output: 8 bit ulaw sample
*/

#define ZEROTRAP    /* turn on the trap as per the MIL-STD */
#define BIAS 0x84   /* define the add-in bias for 16 bit samples */
#define CLIP 32635

unsigned char linear2ulaw(int sample)
{
    /* Segment (exponent) number for each value of (biased sample >> 7). */
    static const int exp_lut[256] = {0,0,1,1,2,2,2,2,3,3,3,3,3,3,3,3,
                                     4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
                                     5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
                                     5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
                                     6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
                                     6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
                                     6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
                                     6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
                                     7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                                     7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                                     7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                                     7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                                     7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                                     7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                                     7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                                     7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7};
    int sign, exponent, mantissa;
    unsigned char ulawbyte;

    /* Get the sample into sign-magnitude. */
    sign = (sample < 0) ? 0x80 : 0;       /* set aside the sign (portable:
                                             no right shift of a negative int) */
    if (sign != 0) sample = -sample;      /* get magnitude */
    if (sample > CLIP) sample = CLIP;     /* clip the magnitude */

    /* Convert from 16 bit linear to ulaw. */
    sample = sample + BIAS;
    exponent = exp_lut[(sample >> 7) & 0xFF];
    mantissa = (sample >> (exponent + 3)) & 0x0F;
    ulawbyte = (unsigned char)~(sign | (exponent << 4) | mantissa);
#ifdef ZEROTRAP
    if (ulawbyte == 0) ulawbyte = 0x02;   /* optional CCITT trap */
#endif

    return ulawbyte;
}

/*
** This routine converts from ulaw to 16 bit linear.
**
** Craig Reese: IDA/Supercomputing Research Center
** 29 September 1989
**
** References:
** 1) CCITT Recommendation G.711 (very difficult to follow)
** 2) MIL-STD-188-113, "Interoperability and Performance Standards
**    for Analog-to-Digital Conversion Techniques,"
**    17 February 1987
**
** Input: 8 bit ulaw sample
** Output: signed 16 bit linear sample
*/

int ulaw2linear(unsigned char ulawbyte)
{
    /* Start of each segment, with the encoder's bias already removed. */
    static const int exp_lut[8] = {0,132,396,924,1980,4092,8316,16764};
    int sign, exponent, mantissa, sample;

    ulawbyte = (unsigned char)~ulawbyte;
    sign = (ulawbyte & 0x80);
    exponent = (ulawbyte >> 4) & 0x07;
    mantissa = ulawbyte & 0x0F;
    sample = exp_lut[exponent] + (mantissa << (exponent + 3));
    if (sign != 0) sample = -sample;

    return sample;
}
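The claim that companding carries more information about small signals can be checked numerically. The following is an illustrative sketch, not part of the Sun routines: mulaw_step_size() is a hypothetical helper that reuses the bias (0x84) and clip level (32635) from linear2ulaw() to report the quantisation step applied to a sample of a given magnitude. Each segment doubles the step, which is equivalent to the exp_lut lookup in the encoder.

```c
#include <assert.h>

#define MULAW_BIAS 0x84   /* 132: add-in bias used by the G.711 mu-law encoder */
#define MULAW_CLIP 32635  /* largest magnitude accepted before biasing */

/* Returns the quantisation step (in 16-bit linear units) that mu-law
   applies to a sample of the given magnitude. */
int mulaw_step_size(int magnitude)
{
    int biased, exponent = 0;

    assert(magnitude >= 0);
    if (magnitude > MULAW_CLIP)
        magnitude = MULAW_CLIP;
    biased = magnitude + MULAW_BIAS;      /* range 132 .. 32767 */
    /* Segment number = how far the highest set bit lies above bit 7. */
    while (exponent < 7 && (biased >> (exponent + 8)) != 0)
        exponent++;
    return 1 << (exponent + 3);           /* 8, 16, 32, ... 1024 */
}
```

A sample of magnitude 100 is coded with a step of 8, while one of magnitude 30000 gets a step of 1024. Because the step grows roughly in proportion to the amplitude, the relative precision stays roughly constant over the whole range, which 8-bit linear PCM cannot offer.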

Last Revision: 01:53 12-Apr-1996


Q2.8: Signal Processing Software
[Note: Question 1.9 lists speech laboratory environments and audio editors, many of which provide basic
and advanced signal processing capabilities.]

Signal Processing Products
SigLib from Numerix Ltd.

On the Web
The following sites provide lists of useful DSP software. Not all the software is directly applicable to
speech processing.

comp.dsp FAQ
http://www.bdti.com/faq/dsp_faq.htm
DSP Internet Resources
http://www.eg3.com/
http://www.eg3.com/dsp.htm
Poynton's Digital Signal Processing Resource List
http://www.inforamp.net/~poynton/Poynton-dsp.html
WWW Pages Relating to Sound Computation
http://datura.cerl.uiuc.edu/netstuff/sigsoundLinks.html
Yahoo - Signal and Image Processing
http://www.yahoo.com/Science/Engineering/Electrical_Engineering/Signal_and_Image_Processing/
Sound Related Resources
http://pscinfo.psc.edu/~geigel/menus/sound.html
SPLIB: Signal Processing url LIBrary
http://jazz.rice.edu/splib/
Wavelet's Home Page
http://www.mat.sbg.ac.at/~uhl/wav.html

Last Revision: 15:31 13-May-1997


comp.speech WWW Site


Admin
Minor changes each month. Thanks to all the companies and individuals who send in information.

Acknowledgements
Hundreds of people and companies have made contributions to the comp.speech FAQ over the last
few years - too many to name individually. Special thanks go to Tony Robinson and Kevin Lenzo
who have provided a wide range of information and assistance. Tony Robinson also maintains the
comp.speech ftp site which is an excellent resource for all people working with speech technology. I
am grateful to the people at Sydney University, Cambridge University, ATR ITL and CMU for
supporting the FAQ on their WWW sites.

Disclaimer
The comp.speech FAQ and WWW pages are provided as is without any express or implied
warranties. While every effort has been taken to ensure the accuracy of the information presented
here, the author assumes no responsibility for errors or omissions, or for damages resulting from the
use of the information contained herein.
The comp.speech FAQ and WWW pages should not be construed as representing the views or
products of my employer, Sun Microsystems, Inc.

Copyright and Reproduction
Copyright (c) 1993-6 by Andrew Hunt, all rights reserved.
The comp.speech WWW pages may not be distributed for financial gain and may not be included in
any collections or compilations without express permission from the author.
You may make links to the documents, but you may not make copies without permission of the
author.
Note: hyperlinks to the comp.speech WWW pages are encouraged.

Maintainer
The FAQ posting and the Comp.Speech WWW Site are maintained on a volunteer basis by

Andrew Hunt
Speech Applications Group, Sun Microsystems Laboratories
Two Elizabeth Drive, Chelmsford, MA, 01824-4195, USA

Ph: (508) 442 2681
andrew.hunt@east.sun.com

Last Revision: 18:40 05-Sep-1997


comp.speech FAQ / WWW

Submission of Information
Any updates of information, corrections or suggestions are welcome. Please note that it may take me a
week or two to respond.

The following are the (flexible) guidelines for FAQ entries:

● Entries are typically limited to around 30 lines.
● Include technical information; avoid "marketing hype".
● Where possible submit in the format used in existing FAQ entries.
● Specify URLs for hyperlinks to existing information on the net.


Request for Information


As a general rule, if the information you want is not in the comp.speech FAQ posting or on the
comp.speech WWW site, then I probably don't have the answer. Have you tried posting a request for
information or help to the comp.speech newsgroup? (See Q1.1)

Andrew Hunt
Sun Microsystems Laboratories
Two Elizabeth Drive, Chelmsford, MA, 01824-4195, USA
Ph: (508) 442 2681
Email: andrew.hunt@east.sun.com



comp.speech FAQ Section 3

Speech Coding and Compression

SpeechLinks: Speech Coding


Q3.1: Speech compression techniques
Q3.2: Information on speech coding and compression
Q3.3: Speech Compression / Coding Software

Last Revision: 01:44 12-Apr-1996


Q3.1: Speech compression techniques
Provided by Tony Robinson:

The aim of speech compression is to produce a compact representation of speech sounds such that
when reconstructed it is perceived to be close to the original. The two main measures of closeness are
intelligibility and naturalness.

The standard reference point is toll-quality speech, which is what would be expected over a telephone
line: for example, speech sampled at 8 kHz with 8-bit ulaw coding and a maximum frequency of about
3.3 kHz. This is a bit rate of 64 kbps (8000 samples per second x 8 bits), and as such is already a
compressed form of (say) 16-bit, 16 kHz speech (256 kbps), which is the standard in speech recognition work.

ulaw coding does not exploit the (normally large) sample-to-sample correlations found in speech.
ADPCM is the next family of speech coding techniques, and does exploit this redundancy by using a
simple linear filter to predict the next sample of speech. The resulting prediction error is typically
quantised to 4 bits, giving a bit rate of 32 kbps at 8 kHz (see, for example, the software in Q3.3:
32 kbps ADPCM, G.711/721/723 Compression, shorten). The advantages of ADPCM are that it is simple to
implement and has very low delay.
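The ADPCM idea can be sketched in C. This is a deliberately simplified, hypothetical coder: it is not G.721/G.726 or IMA ADPCM, and the names, step limits and adaptation factors are assumptions chosen for illustration. The prediction is simply the previous reconstructed sample (the most trivial linear predictor), the prediction error is quantised to 4 bits (a sign bit plus a 3-bit magnitude), and the step size doubles after large codes and shrinks after small ones.

```c
#include <assert.h>
#include <stdlib.h>

/* Step-size limits for this hypothetical coder. */
enum { STEP_MIN = 4, STEP_MAX = 4096 };

typedef struct {
    int predicted; /* last reconstructed sample, used as the prediction */
    int step;      /* current quantiser step size */
} AdpcmState;

void adpcm_init(AdpcmState *s)
{
    s->predicted = 0;
    s->step = 16;
}

static int clamp16(int v)
{
    return v > 32767 ? 32767 : (v < -32768 ? -32768 : v);
}

/* Applies a 4-bit code to the state. Encoder and decoder both call this,
   so they reconstruct identical samples and adapt identically. */
static int adpcm_update(AdpcmState *s, unsigned char code)
{
    int mag = code & 0x07;
    int delta = mag * s->step;

    s->predicted = clamp16((code & 0x08) ? s->predicted - delta
                                         : s->predicted + delta);
    if (mag >= 6)
        s->step *= 2;              /* large error: coarser steps */
    else if (mag <= 1)
        s->step = s->step * 3 / 4; /* small error: finer steps */
    if (s->step < STEP_MIN) s->step = STEP_MIN;
    if (s->step > STEP_MAX) s->step = STEP_MAX;
    return s->predicted;
}

/* Encodes one 16-bit sample as a 4-bit code (bit 3 = sign). */
unsigned char adpcm_encode(AdpcmState *s, int sample)
{
    assert(sample >= -32768 && sample <= 32767);
    int diff = sample - s->predicted;
    int mag = (abs(diff) + s->step / 2) / s->step; /* rounded magnitude */
    if (mag > 7)
        mag = 7;
    unsigned char code = (unsigned char)((diff < 0 ? 0x08 : 0x00) | mag);
    adpcm_update(s, code);
    return code;
}

/* Decodes one 4-bit code back to a 16-bit sample. */
int adpcm_decode(AdpcmState *s, unsigned char code)
{
    return adpcm_update(s, (unsigned char)(code & 0x0F));
}
```

Because the decoder sees only the codes, the step size is adapted from the codes themselves rather than from the input signal; that keeps encoder and decoder in step without sending any side information, and is one reason ADPCM has such low delay.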

To obtain more compression, specific properties of the speech signal must be modelled. The main
assumption is known as the source-filter model of speech production. This assumes that a source
(voicing or fricative excitation) is passed through a filter (the vocal tract response) to produce the
speech. The simplest implementation of this is known as an LPC synthesiser (e.g. LPC10e). At every
frame the speech is analysed to compute the filter coefficients, the energy of the excitation, a voicing
decision, and a pitch value if voiced. At the decoder, a regular set of pulses for voiced speech or white
noise for unvoiced speech is passed through the linear filter and multiplied by the gain to produce the
speech. This is a very efficient system and typically produces speech coded at 1200-2400 bps. With
clever acoustic vector prediction this can be reduced to 300-600 bps. The disadvantages are a loss of
naturalness over most of the speech and occasionally a loss of intelligibility.
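The analysis and synthesis steps of an LPC coder can be sketched in C. This is a minimal, hypothetical illustration, not LPC10e: autocorrelation() and levinson_durbin() compute predictor coefficients for one frame, using the convention that x[n] is predicted by a[1]x[n-1] + ... + a[p]x[n-p], and lpc_synthesise() passes an excitation (a pulse train or noise, as described above) through the all-pole filter with a gain. Windowing, pitch detection, voicing decisions and coefficient quantisation are omitted.

```c
#include <assert.h>
#include <stddef.h>

#define LPC_MAX_ORDER 32

/* Autocorrelation r[0..p] of the frame x[0..n-1]. */
void autocorrelation(const double *x, size_t n, double *r, int p)
{
    for (int k = 0; k <= p; k++) {
        r[k] = 0.0;
        for (size_t i = (size_t)k; i < n; i++)
            r[k] += x[i] * x[i - k];
    }
}

/* Levinson-Durbin recursion: fills a[1..p] (a[0] is unused) and returns
   the final prediction error energy, or -1.0 if the error vanishes or
   the autocorrelation is not positive definite. */
double levinson_durbin(const double *r, double *a, int p)
{
    double tmp[LPC_MAX_ORDER + 1];
    double err = r[0];

    assert(p >= 1 && p <= LPC_MAX_ORDER);
    for (int i = 1; i <= p; i++) {
        if (err <= 0.0)
            return -1.0;
        double acc = r[i];
        for (int j = 1; j < i; j++)
            acc -= a[j] * r[i - j];
        double k = acc / err;          /* reflection coefficient */
        for (int j = 1; j < i; j++)
            tmp[j] = a[j] - k * a[i - j];
        for (int j = 1; j < i; j++)
            a[j] = tmp[j];
        a[i] = k;
        err *= (1.0 - k * k);
    }
    return err;
}

/* All-pole synthesis from zero initial state:
   y[n] = gain * e[n] + a[1]*y[n-1] + ... + a[p]*y[n-p]. */
void lpc_synthesise(const double *a, int p, double gain,
                    const double *excitation, double *y, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        double v = gain * excitation[i];
        for (int k = 1; k <= p && (size_t)k <= i; k++)
            v += a[k] * y[i - k];
        y[i] = v;
    }
}
```

In a real coder the analysis is repeated on every frame (typically 20-30 ms), and for 8 kHz speech an order of about 10 is common; the "10" in LPC-10 refers to its 10-coefficient predictor.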

The CELP family of coders compensates for the lack of quality of the simple LPC model by using
more information in the excitation. Each vector in a codebook of excitation vectors is tried, and the
index of the one that best matches the original speech is transmitted. This increases the bit rate to
typically 4800-9600 bps. Most speech coding research is currently directed towards CELP coders.
(See, for example, CELP 3.2a, a TMS implementation, a G.728 LD-CELP vocoder, and the L&H
implementation.)
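The codebook search at the heart of CELP ("analysis by synthesis") can be sketched in C. This is a minimal, hypothetical illustration rather than any of the coders listed above: each candidate excitation vector is passed through the LPC synthesis filter, the best gain for it is computed in closed form (g = <t,y>/<y,y> for target t and synthesised vector y), and the index whose scaled output has the smallest squared error against the target speech is chosen. Real coders add perceptual weighting, an adaptive (pitch) codebook and quantised gains; the function names and the 256-sample limit are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Zero-state all-pole filter: y[n] = e[n] + a[1]*y[n-1] + ... + a[p]*y[n-p]. */
static void synth_filter(const double *a, int p, const double *e,
                         double *y, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        double v = e[i];
        for (int k = 1; k <= p && (size_t)k <= i; k++)
            v += a[k] * y[i - k];
        y[i] = v;
    }
}

/* Searches num_vectors excitation vectors of length n (n <= 256 in this
   sketch), stored one after another in codebook[]. Returns the index of
   the best vector and stores its optimal gain in *gain_out, or returns
   -1 if every vector synthesises to silence. */
int celp_search(const double *a, int p, const double *codebook,
                int num_vectors, const double *target, size_t n,
                double *gain_out)
{
    double y[256];
    int best = -1;
    double best_err = 0.0;

    assert(n <= 256 && num_vectors > 0 && gain_out != NULL);
    for (int c = 0; c < num_vectors; c++) {
        double ty = 0.0, yy = 0.0, tt = 0.0;
        synth_filter(a, p, codebook + (size_t)c * n, y, n);
        for (size_t i = 0; i < n; i++) {
            ty += target[i] * y[i];
            yy += y[i] * y[i];
            tt += target[i] * target[i];
        }
        if (yy <= 0.0)
            continue;                     /* silent candidate: skip */
        double err = tt - ty * ty / yy;   /* residual error at optimal gain */
        if (best < 0 || err < best_err) {
            best = c;
            best_err = err;
            *gain_out = ty / yy;
        }
    }
    return best;
}
```

Only the winning index and its (quantised) gain need to be transmitted, because the decoder holds the same codebook and filter; the price is that the encoder must synthesise every candidate, which is why codebook search dominates the cost of CELP encoding.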

Last Revision: 02:00 12-Apr-1996


Q3.2: Information on speech coding and compression
Reference Books

The following books cover speech coding/compression.

● Douglas O'Shaughnessy, Speech Communication: Human and Machine, Addison Wesley
series in Electrical Engineering: Digital Signal Processing, 1987.
● Bishnu Atal, in F. Fallside and W. Woods (eds.), Computer Speech Processing, London:
Prentice/Hall International, 1985.
● N. S. Jayant and P. Noll, Digital Coding of Waveforms, Prentice Hall, ISBN 0-13-211913-7 01,
1984.
● W.B. Kleijn and K.K. Paliwal (Eds.), Speech Coding and Synthesis, Elsevier, Amsterdam,
1995.
Contents, preface etc on the WWW: http://www.elsevier.nl/section/engtech/scs/menu.htm
● Thomas P. Barnwell, Kambiz Nayebi and Craig H Richardson, Speech Coding: A Computer
Laboratory Textbook, John Wiley and Sons Inc, 1996.
● Schuyler R Quackenbush, Tom P Barnwell III, Mark A Clements, Objective Measures of
Speech Quality, Prentice-Hall, 1988.

There are also good tutorial articles:

● Makhoul, J. "Linear Prediction: A Tutorial Review." Proc. of the IEEE 63 (1975): 561 - 580.

On the WWW

comp.compression FAQ
Includes a few questions and answers on the compression of speech.
ftp://rtfm.mit.edu/pub/usenet/comp.compression/
Tony Robinson's Speech Analysis Course
A complete course on speech analysis, including some stuff on speech coding.
http://svr-www.eng.cam.ac.uk/~ajr/SA95/
http://svr-www.eng.cam.ac.uk/~ajr/SA95/node78.html
ITU Coding Standards
Members of the ITU (International Telecommunications Union) can obtain copies of the Series
G Recommendations (including G.711/721/723/728) from the ITU WWW site
(http://www.itu.ch/) and from http://www.itu.ch/itudoc/itu-t/rec/g/g700-799.html.
Jason Woodard's Speech Coding Page
Introduction to speech coding plus information on a series of speech coding standards.


http://www-mobile.ecs.soton.ac.uk/speech_codecs/index.html
WWW searchable online bibliography for Phonetics and Speech Technology
Over 8000 entries provided by Institut fur Phonetik at Johann Wolfgang Goethe-Universitat
Frankfurt.
http://www.uni-frankfurt.de/~ifb/bib_engl.html
Ciaran McElroy's Speech Coding Page
Introduction to many types of speech coding.
http://wwwdsp.ucd.ie/speech/tutorial/speech_coding/speech_tut.html

Examples of speech coding

Nam Phamdo's Speech Coding Demonstration
Examples of ADPCM, LD-CELP, CELP, LPC10 and CELP coding, and of coding over a noisy
channel.
http://admii.arl.mil/~fsbrn/phamdo/speech_demo.html
Phil Karn's Digital/Analog Voice Demo
Examples of several speech coding systems.
http://www.qualcomm.com/people/pkarn/voicedemo/

Last Revision: 01:02 07-Mar-1997


Q3.3: Speech Compression / Coding Software
The following speech compression software is described in the FAQ.

32 kbps ADPCM
Castleton Network Systems - G.729 Voice Coder
CELP 3.2a & LPC-10
8 Kbit/s CELP on the TMS320C5x family of DSP chips
CyberVoice
Rockwell's DigiTalk
File format conversion
G.711/721/723 Compression
G.728 LD-CELP vocoder
G.728 Compression
GSM 06.10 Compression
Lernout & Hauspie Speech Coding (5 products)
Lernout & Hauspie Speech Coding SDK
MPEG Audio
shorten - a lossless compressor for speech signals
Sipro Lab Telecom Inc. Coding
Sonarc: Digital Audio Compression
StarAudio Compressor/Player
TrueSpeech from DSP Group
U.S.F.S. 1016 CELP vocoder for DSP56001
ToolVox from Voxware

Last Revision: 13:54 05-Sep-1997


comp.speech FAQ Section 5

Speech Synthesis

SpeechLinks: Speech Synthesis


Q5.1: What is speech synthesis?
Q5.2: How can speech synthesis be performed?
Q5.3: References/Books on Synthesis
Q5.4: Speech Synthesis on the WWW
Q5.5: Speech Synthesis Software/Hardware

Last Revision: 01:44 12-Apr-1996


Q5.1: What is speech synthesis?
Speech synthesis programs convert written input to spoken output by automatically generating
synthetic speech. Speech synthesis is often referred to as "Text-to-Speech" conversion (TTS).

Last Revision: 11:17 07-Aug-1996


Q5.2: How can speech synthesis be performed?
There are several approaches, and the choice depends on the task. The simplest is to record a person
speaking each of the desired phrases. This is useful when only a restricted set of phrases and
sentences is needed, e.g. announcements in a train station or schedule information by phone. The
quality then depends on how well the recordings are made.

More flexible, but lower in quality, are methods that split speech into smaller units. The smaller the
units, the fewer of them are needed, but the quality also decreases. A frequently used unit is the
phoneme, the smallest linguistic unit that distinguishes meaning. Western European languages have
about 35-50 phonemes, depending on the language, so about 35-50 recordings are needed. The difficulty
is joining them, because fluent speech requires smooth transitions between the elements. Intelligibility
is therefore lower, but the memory required is small.

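One solution to this dilemma is to use diphones. Instead of cutting at the transitions, the cut is made at
the center of each phoneme, leaving the transitions themselves intact. Because each diphone joins the
second half of one phoneme to the first half of the next, N phonemes give at most N*N diphones
(roughly 1,200-2,500 for 35-50 phonemes), although not every pair occurs in a given language. More
memory is needed, but the quality increases.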
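How a phoneme string maps onto diphone units can be sketched in C. This is a hypothetical illustration only: phoneme symbols are plain strings (SAMPA-style symbols are used in the example), "_" stands for silence at the start and end of an utterance, and each diphone is labelled "A-B" for the transition from phoneme A to phoneme B. A synthesiser would then look each label up in its recorded inventory and join the units at the phoneme centers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define DIPHONE_LABEL_LEN 16

/* Converts n phonemes into n + 1 diphone labels, padding with silence
   ("_") at both ends. out must have room for n + 1 labels. Returns the
   number of labels written. */
size_t phonemes_to_diphones(const char *const *phonemes, size_t n,
                            char out[][DIPHONE_LABEL_LEN], size_t out_cap)
{
    const char *prev = "_";

    assert(out_cap >= n + 1);
    for (size_t i = 0; i <= n; i++) {
        const char *next = (i < n) ? phonemes[i] : "_";
        /* Label the transition from prev to next, e.g. "h-@". */
        snprintf(out[i], DIPHONE_LABEL_LEN, "%s-%s", prev, next);
        prev = next;
    }
    return n + 1;
}
```

For "hello" written as the phonemes h, @, l, @U this yields the five units _-h, h-@, @-l, l-@U and @U-_, so every join falls in the steady middle part of a phoneme rather than at a transition.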

The longer the units, the more of them there are; quality increases, but so does the memory required.
Other widely used units are half-syllables, syllables, words, or combinations of these, e.g. word stems
and inflectional endings.

The Museum of Speech Analysis and Synthesis has pictures of artificial speech systems going back
over 150 years: worth a visit. ( http://mambo.ucsc.edu/psl/smus/smus.html)

Last Revision: 14:32 08-Aug-1996


Q5.3: References/Books on Synthesis
Books and Papers

● Thierry Dutoit, An Introduction to Text-to-Speech Synthesis, Kluwer Academic Publishers
(Dordrecht), 1997, ISBN 0-7923-4498-7, 312 pages. Volume 3 in the series on Text, Speech
and Language Technology.
● Douglas O'Shaughnessy, Speech Communication: Human and Machine Addison Wesley series
in Electrical Engineering: Digital Signal Processing, 1987.
● T.V. Raman, Auditory User Interfaces --Toward The Speaking Computer Kluwer Academic
Publishers, Boston, ISBN 0-7923-9984-6, August 1997, 168 pp.
● D. H. Klatt, "Review of Text-To-Speech Conversion for English", Jnl. of the Acoustic Society
of America (JASA), Vol 82, pp 737-793.
● "Talking Machines, Theories, Models and Designs" Eds, G. Bailly & C. Benoit (Elsevier:
North Holland)
● I. H. Witten. Principles of Computer Speech, London: Academic Press, Inc., 1982.
● W.B. Kleijn and K.K. Paliwal (Eds.), Speech Coding and Synthesis, Elsevier, Amsterdam,
1995.
Contents, preface etc on the WWW: http://www.elsevier.nl/section/engtech/scs/menu.htm
● John Allen, Sharon Hunnicutt and Dennis H. Klatt, "From Text to Speech: The MITalk System",
Cambridge University Press, 1987.
● J.P.H. van Santen, R. W. Sproat, J. P. Olive, and J. Hirschberg, "Progress in Speech
Synthesis", Springer, 1996.

On the WWW

● Survey of the State of the Art in Human Language Technology
Report edited by Ronald A. Cole et al. with a section on Text-to-Speech Technologies.
http://www.cse.ogi.edu/CSLU/HLTsurvey/ch5node1.html

Bibliographies and Reference Lists

● WWW searchable online bibliography for Phonetics and Speech Technology with more than
8000 entries. Provided by Institut fur Phonetik at Johann Wolfgang Goethe-Universitat
Frankfurt.
http://www.uni-frankfurt.de/~ifb/bib_engl.html
● Computational Speech Processing: Speech Analysis, Recognition, Understanding,
Compression, Transmission, Coding, Synthesis; Text to Speech Systems, Speech to Tactile
Displays, Speaker Identification, Prosody Processing: BIBLIOGRAPHY, by Conrad F.
Sabourin, 1994, 2 volumes, 1187p, ISBN 2-921173-21-2, INFOLINGUA inc., P.O. Box 187
Snowdon, Montreal, H3X 3T4, Canada.
See also: http://gomer.mlink.net/infolingua.html

Last Revision: 18:18 05-Sep-1997

http://mi.eng.cam.ac.uk/comp.speech/Section5/Q5.3.html (2 of 2) [10/31/2003 8:45:06 AM]


Q5.4: Speech Synthesis on the WWW

Most of the following are links to WWW pages with demonstrations of speech synthesis. Plenty more links are included in the
detailed list of speech synthesis software/hardware in Q5.5.

Speech Synthesis "Museum"
http://www.cs.bham.ac.uk/~jpi/synth/museum.html
Maintained by Jon Iles (j.p.iles@cs.bham.ac.uk) at the University of Birmingham.
Information and speech samples for:
❍ YorkTalk
❍ Loughborough Sound Images
❍ University of Birmingham - FDFS
❍ Eurovocs
❍ DECtalk
❍ AT&T Bell Labs Synthesiser
❍ S.W.A.Ll.C. - Welsh Synthesis from CSTR
❍ All-Prosodic Speech Synthesis - IPOX
❍ Orator from Bellcore
The Festival Speech Synthesis System
http://www.cstr.ed.ac.uk/projects/festival.html
Pre-synthesized examples in English, Welsh and Spanish, and online demo of English.
Pavarobotti
http://www.shc.uiowa.edu/fun/pavarobotti/pavarobotti.html
WWW demo of the Pavarobotti synthesis technology developed at the National Center for Voice and Speech
(http://www.shc.uiowa.edu/ncvs_home.html).
Say...
http://wwwtios.cs.utwente.nl/say
WWW demo of the rsynth speech synthesis software. The WWW capability was implemented by Axel Belinfante.
Musée sonore de la synthèse de la parole en français (a sound museum of French speech synthesis)
http://www.icp.grenet.fr/exemples_synthese/ex.html
Speech synthesis examples from a series of French-language speech synthesisers, plus links to other speech synthesis demo
pages:
❍ ICP-Grenoble
❍ CNET-Lannion (with TD-PSOLA)
❍ KTH-Stockholm
❍ Université-Mons - several versions
Lucent Technologies Bell Labs Text-to-Speech
http://www.bell-labs.com/project/tts/
Demos and samples of the latest Lucent Technologies Bell Labs Text-to-Speech system.
WATSON FlexTalk from AT&T Advanced Speech Products Group
http://www.att.com/aspg/demo.html
WWW interface to the WATSON FlexTalk speech synthesis demonstration.
AT&T Bell Laboratories Voices
http://www.research.att.com/cgi-bin/cgiwrap/mjm/voices.cgi
WWW interface to the AT&T Bell Laboratories text to speech (TTS) synthesizer
Laureate from British Telecom
http://www.labs.bt.com/innovate/speech/laureate/
Demo of the Laureate speech synthesis system - not yet commercially available.
ORATOR from Bellcore
http://www.bellcore.com/ORATOR/
Online demo of the ORATOR system developed at Bellcore.
SVOX from TIK, ETH in Zurich
http://www.tik.ee.ethz.ch/cgi-bin/w3svox
Demo of German speech synthesis from the Institut für Technische Informatik und Kommunikationsnetze.
Speech Synthesis Research at OGI
http://www.cse.ogi.edu/CSLU/research/TTS
Examples of diphone speech corpora and algorithms developed at OGI for synthesis of American English and Mexican
Spanish using the Festival framework.
Lyricos
http://www.cse.ogi.edu/CSLU/research/TTS/research/sing.html
Demos of the Lyricos singing voice synthesis system. Concatenation-based synthesis of singing voice from MIDI input.
Multi-Lingual TTS from Gerhard-Mercator University, Duisburg
http://www.fb9-ti.uni-duisburg.de/demos/speech.html
Synthesis in German, English or Japanese.
TMH: Institutionen för Talöverföring och Musikakustik (Department of Speech Communication and Music Acoustics), Kungliga Tekniska Högskolan (Royal Institute of Technology)
http://www.speech.kth.se/info/software.html
Synthesis in Swedish, Finnish, Norwegian, Icelandic, Danish, British and American English, French, German, Italian, Spanish,
Latin American Spanish and Greek.
Haskins Laboratory WWW Site
http://www.haskins.yale.edu/Haskins/MISC/special.html
Examples of several types of speech synthesis: articulatory synthesis by HyperASY, sine-wave synthesis, the Gestural
Computational Model, and the Pattern Playback system of the 1940s!
BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
http://www.bestspeech.com/weblang.html
Eurovocs Multilingual Speech Synthesis
http://www.elis.rug.ac.be/ELISgroups/speech/research/eurovocs.html
Based on Lernout and Hauspie technology.
HADIFIX German Speech Synthesis
http://asl1.ikp.uni-bonn.de/~tpo/Hadiq.en.html
Provided by the Institut für Kommunikationsforschung und Phonetik, Universität Bonn.
Centigram's TruVoice Demo
http://www.centigram.com/centigram/TruVoice/index.html
Allows control of speech rate, pitch and other prosodic characteristics.
MBROLA: Free Speech Synthesis Project
http://tcts.fpms.ac.be/synthesis/modelcmp.html
WWW demo of MBROLA which compares the quality of PSOLA, MBR-PSOLA, LPC, and Hybrid Harmonic/Stochastic
concatenative synthesizers. Provided by the TCTS Lab, Faculté Polytechnique de Mons, Belgium.
Institute of Phonetic Sciences
http://fonsg3.let.uva.nl/IFA-Features.html
Links to lots of on-line speech synthesis demonstrations provided by the Institute of Phonetic Sciences of the Faculty of Arts
of the University of Amsterdam.
Yahoo page on speech generation
http://www.yahoo.com/Science/Computer_Science/Artificial_Intelligence/Natural_Language_Processing/Speech_Generation/

Last Revision: 17:33 05-Sep-1997

http://mi.eng.cam.ac.uk/comp.speech/Section5/Q5.4.html (2 of 2) [10/31/2003 8:45:08 AM]


Q5.5: Speech Synthesis Software/Hardware

Please email any updates, corrections or additions to the following list. The range of commercially
available synthesis software is growing rapidly so any help in keeping up to date will be appreciated.

Other lists of speech synthesis software on the WWW include:

Kevin Lenzo's list of Macintosh Speech Resources and Apps
http://www.cs.cmu.edu/~lenzo/mac_speech_apps.html
Speech Toys Speech Synthesis Information
http://www.speechtoys.com/spchtoys/spsyn.html

In the FAQ...

The following speech synthesis software/hardware is described in the comp.speech FAQ.

Apple Macintosh
BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
Infovox Product Range
Macintosh Speech Output Applications
Macintosh Speech Synthesis Manager
MacYack Pro
MBROLA: Free Speech Synthesis Project
ProVoice Developer's Speech Toolkit from First Byte
SENSYN speech synthesizer
Sound Bytes Developer's Kit

Windows (including 95, NT, 3.1)
AcuVoice
AT&T Watson Speech Synthesis
BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
Creative TextAssist and TextAssist API
DECtalk: Text-to-Speech from Digital
ETI-Eloquence
HADIFIX
Infovox Product Range
IPOX: All Prosodic Speech Synthesis Architecture
Lernout and Hauspie Text-To-Speech Windows SDK
Listen2 Text Reader
MBROLA: Free Speech Synthesis Project
Monologue for Windows from First Byte
PAM - A Text-To-Speech Application
ProVerbe Speech Engine from ELAN Informatique
ProVoice Developer's Speech Toolkit from First Byte
SENSYN speech synthesizer
Sound Bytes Developer's Kit
Tinytalk
TruVoice from Centigram
WinSpeech
ZMD Speech Synthesis

DOS
CSRE: Computerized Speech Research Environment
Infovox Product Range
MBROLA: Free Speech Synthesis Project
ProVoice Developer's Speech Toolkit from First Byte
SENSYN speech synthesizer
spchsyn.exe
Tinytalk
ZMD Speech Synthesis

OS/2
ProVerbe Speech Engine from ELAN Informatique
ProVoice Developer's Speech Toolkit from First Byte
Sound Bytes Developer's Kit

Unix
AcuVoice
AsTeR
BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
DECtalk: Text-to-Speech from Digital
ETI-Eloquence
Emacspeak - A Speech Output Subsystem For Emacs
Festival Speech Synthesis System
JSRU
Klatt-style synthesiser
KPE80 - A Klatt Synthesiser and Parameter Editor
"learph": Trainable text-to-phoneme software by Antonio Lucca
Lucent Technologies Bell Labs Text-to-Speech system
MBROLA: Free Speech Synthesis Project
Orator from Bellcore
ProVerbe Speech Engine from ELAN Informatique
rsynth
SENSYN speech synthesizer
SGI Developers Toolbox Synthesiser
Speak
TrueTalk
TruVoice from Centigram

Integrated Circuits and Dedicated Hardware
Eurovocs
Infovox Product Range
ProVerbe Speech Engine from ELAN Informatique
RC Systems V8600/V8601 Text to Speech synthesizers

Other Platforms
BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
TheBigMouth (NeXT)
MBROLA: Free Speech Synthesis Project
Narrator Translator Library (Amiga)
Narrator (Amiga)
TextToSpeech Kit (NeXT)
Orator from Bellcore
SENSYN speech synthesizer
WreadFiles: File reader for Commodore Amiga

Unknown
Lernout and Hauspie Text-To-Speech (3 products)
SIMTEL
Text to Phoneme Program 1
Text to phoneme program 2
Text to phoneme program 3

Last Revision: 17:17 02-Oct-1997

http://mi.eng.cam.ac.uk/comp.speech/Section5/Q5.5.html (3 of 3) [10/31/2003 8:45:11 AM]


comp.speech FAQ Section 6: Speech Recognition

SpeechLinks: Speech Recognition

Q6.1: What is speech recognition?
Q6.2: How is speech recognition performed?
Q6.3: How can I build a simple speech recogniser?
Q6.4: References & books on speech recognition
Q6.5: Speech Recognition Hardware/Software
Q6.6: Speaker Recognition (Verification and Identification)
Q6.7: Integrated Speech Products

Last Revision: 17:46 18-Jun-1996

http://mi.eng.cam.ac.uk/comp.speech/FAQ6.html [10/31/2003 8:46:45 AM]


Q6.1: What is speech recognition?

Automatic Speech Recognition
Automatic speech recognition is the process by which a computer maps an acoustic speech signal to
text.

Automatic speech understanding is the process by which a computer maps an acoustic speech signal
to some form of abstract meaning of the speech.

What does speaker dependent / adaptive / independent mean?
A speaker dependent system is developed to operate for a single speaker. These systems are usually
easier to develop, cheaper to buy and more accurate, but not as flexible as speaker adaptive or speaker
independent systems.

A speaker independent system is developed to operate for any speaker of a particular type (e.g.
American English). These systems are the most difficult to develop, the most expensive, and less
accurate than speaker dependent systems. However, they are more flexible.

A speaker adaptive system is developed to adapt its operation to the characteristics of new speakers.
Its difficulty lies somewhere between speaker independent and speaker dependent systems.

What does small/medium/large/very-large vocabulary mean?
The size of vocabulary of a speech recognition system affects the complexity, processing
requirements and the accuracy of the system. Some applications only require a few words (e.g.
numbers only), while others require very large dictionaries (e.g. dictation machines). There are no
established definitions; however, the following is a rough guide:

● small vocabulary - tens of words
● medium vocabulary - hundreds of words
● large vocabulary - thousands of words
● very-large vocabulary - tens of thousands of words.

What does continuous speech or isolated-word mean?
An isolated-word system operates on single words at a time, requiring a pause between saying each
word. This is the simplest form of recognition to perform because the end points are easier to find and
the pronunciation of a word tends not to affect others. Thus, because the occurrences of words are more
consistent, they are easier to recognise.

A continuous speech system operates on speech in which words are connected together, i.e. not
separated by pauses. Continuous speech is more difficult to handle because of a variety of effects.
First, it is difficult to find the start and end points of words. Another problem is "coarticulation": the
production of each phoneme is affected by the production of surrounding phonemes, and similarly
the start and end of words are affected by the preceding and following words. The recognition of
continuous speech is also affected by the rate of speech (fast speech tends to be harder).

Last Revision: 17:42 18-Jun-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Q6.1.html (2 of 2) [10/31/2003 8:46:45 AM]


http://mi.eng.cam.ac.uk/comp.speech/Section6/Q6.2.html

Q6.2: How is speech recognition performed?
A wide variety of techniques are used to perform speech recognition. There are many types of speech
recognition, and many levels at which speech can be recognised, analysed or understood.

Typically speech recognition starts with the digital sampling of speech. The next stage is acoustic
signal processing. Most techniques include spectral analysis; e.g. LPC analysis (Linear Predictive
Coding), MFCC (Mel Frequency Cepstral Coefficients), cochlea modelling and many more.
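As a rough illustration of one of the spectral-analysis options named above, the Python sketch below computes LPC coefficients for a single frame using the autocorrelation method and the Levinson-Durbin recursion. It is a minimal sketch under stated assumptions: the frame is already windowed, no pre-emphasis is applied, and the sign convention is A(z) = 1 + a1 z^-1 + ... + ap z^-p. The function names are illustrative and do not come from any particular toolkit.

```python
def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one (already windowed) frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """LPC coefficients from autocorrelation values r[0..order].

    Returns (a, err): a = [1, a1, ..., ap] defines the prediction-error
    filter A(z) = 1 + a1 z^-1 + ... + ap z^-p; err is the residual energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    if err <= 0.0:
        return a, 0.0  # silent frame: nothing to predict
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # already perfectly predicted by a lower-order model
        # Reflection coefficient for stage i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        # Update lower-order coefficients from the previous stage's values.
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lpc(frame, order):
    """LPC analysis of one frame via the autocorrelation method."""
    return levinson_durbin(autocorrelation(frame, order), order)
```

For example, with the autocorrelation values of a first-order process, r = [1.0, 0.9, 0.81], levinson_durbin(r, 2) returns coefficients close to [1, -0.9, 0]: each sample is predicted as 0.9 times the previous one, and the second-order term adds nothing.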

The next stage is recognition of phonemes, groups of phonemes and words. This stage can be
achieved by many processes such as DTW (Dynamic Time Warping), HMM (hidden Markov
modelling), NNs (Neural Networks), expert systems and combinations of techniques. HMM-based
systems are currently the most commonly used and most successful approach.
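Of the matching techniques listed, DTW is the easiest to show in a few lines. The sketch below is a generic textbook formulation, assuming each utterance has already been turned into a sequence of feature vectors (for example, one vector per frame from the analysis stage). The names `dtw_distance` and `recognise_word` are hypothetical, and the step pattern (no slope constraints, Euclidean frame distance) is only one of many possible choices.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping cost between two sequences of feature vectors.

    Uses Euclidean frame distances and the basic step pattern
    (diagonal, horizontal, vertical) with no slope constraints.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b frame repeats
                                 cost[i][j - 1],      # b advances, a frame repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def recognise_word(unknown, templates):
    """Return the label of the stored template with the lowest DTW cost.

    templates: dict mapping a word label to a list of feature vectors.
    """
    return min(templates, key=lambda word: dtw_distance(unknown, templates[word]))
```

A template whose frames line up with the unknown once time is stretched or compressed scores 0; for instance, [[0], [1], [2]] against [[0], [0], [1], [2]] gives a cost of 0.0.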

Most systems utilise some knowledge of the language to aid the recognition process.

Some systems try to "understand" speech. That is, they try to convert the words into a representation
of what the speaker intended to mean or achieve by what they said.

Last Revision: 01:03 16-Apr-1997

http://mi.eng.cam.ac.uk/comp.speech/Section6/Q6.2.html [10/31/2003 8:46:46 AM]


Q6.3: How can I build a simple speech recogniser?

QUICKY RECOGNIZER sketch:

Doug Danforth provides a detailed account in article 253 in the comp.speech archives. A summary is
provided below. It is also available by anonymous ftp:

ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/DIY_SpeechRecognition

This is a simple recognizer that should give you 85%+ recognition accuracy. The accuracy is a
function of the words you have in your vocabulary. Long distinct words are easy. Short similar words
are hard. You can get 98+% on the digits with this recognizer.

Overview:

● Find the beginning and end of the utterance.
● Filter the raw signal into frequency bands.
● Cut the utterance into a fixed number of segments.
● Average data for each band in each segment.
● Store this pattern with its name.
● Collect a training set of about 3 repetitions of each pattern (word).
● Recognize an unknown utterance by comparing its pattern against all patterns in the training set and
returning the name of the pattern closest to the unknown.

Many variations upon the theme can be made to improve the performance. Try different filtering of
the raw signal and different processing methods.
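The overview above leaves every detail open, so the Python sketch below is only one possible reading of it, not Doug Danforth's own code (which is merely summarised here). It assumes energy-based endpointing, replaces a true filter bank with DFT power averaged over equal-width frequency bands, and matches patterns by Euclidean nearest neighbour. The frame length, threshold, and numbers of segments and bands are arbitrary illustrative values. The plain DFT keeps the sketch dependency-free but is slow; a practical version would use an FFT or a proper filter bank.

```python
import math


def endpoints(signal, frame_len=80, threshold_ratio=0.1):
    """Energy-based endpointing: return (start, end) sample indices spanning
    the frames whose energy is at least threshold_ratio x the loudest frame."""
    energies = [sum(s * s for s in signal[i:i + frame_len])
                for i in range(0, len(signal), frame_len)]
    peak = max(energies)
    active = [k for k, e in enumerate(energies) if e >= threshold_ratio * peak]
    return active[0] * frame_len, min(len(signal), (active[-1] + 1) * frame_len)


def band_features(segment, n_bands):
    """Log of the mean DFT power in n_bands equal-width bands (0 to Nyquist)."""
    n = len(segment)
    half = n // 2
    sums, counts = [0.0] * n_bands, [0] * n_bands
    for k in range(half):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(segment))
        im = sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(segment))
        band = k * n_bands // half
        sums[band] += re * re + im * im
        counts[band] += 1
    # log1p compresses the dynamic range and stays finite for silent bands.
    return [math.log1p(s / c) if c else 0.0 for s, c in zip(sums, counts)]


def make_pattern(signal, n_segments=8, n_bands=4):
    """Endpoint the utterance, cut it into n_segments equal pieces and
    concatenate the band features of each piece into one pattern vector."""
    start, end = endpoints(signal)
    speech = signal[start:end]
    step = len(speech) / n_segments
    pattern = []
    for s in range(n_segments):
        segment = speech[int(s * step):int((s + 1) * step)]
        pattern.extend(band_features(segment, n_bands))
    return pattern


def recognise(signal, training):
    """training: list of (word, pattern) pairs, e.g. about 3 per word.
    Return the word whose stored pattern is closest to the unknown one."""
    unknown = make_pattern(signal)
    return min(training, key=lambda item: math.dist(unknown, item[1]))[0]
```

To use it, call make_pattern on about three recordings of each word and keep the (word, pattern) pairs as the training set; recognise then returns the label of the nearest stored pattern. Swapping in a different feature function or distance measure is the kind of variation the summary suggests trying.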

Public Domain Recognition Software

Q6.5 contains information on public domain speech recognition software, including Lotec and Myers'
Hidden Markov Model software.

Discrete Hidden Markov Model Demonstration Software

Hidden Markov Models (HMMs) are widely used in speech recognition systems. Joe Picone has put
together some demonstration software for basic discrete HMMs including Viterbi and Baum-Welch
training and evaluation, random sequence generation (generating data from a model), and model
updating (useful for incremental training). There is a simple demo program that supports all of these
modes from command line arguments. This allows experiments to test the classic coin-toss examples
commonly described in textbooks. The code closely parallels the following textbook:

● J.R. Deller, Jr., J.G. Proakis, and J.H.L. Hansen, Discrete-Time Processing of Speech Signals,
MacMillan, 1993, ISBN: 0-02-328301-7.

The code is written in C++ and is intended to facilitate learning and understanding of the algorithms.
The code is available on the ISIP web site:
http://www.isip.msstate.edu/software/

Lecture notes corresponding to the examples are also available:
http://www.isip.msstate.edu/publications/1996/speech_recognition_short_course
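To give a feel for what the Viterbi part of such demonstration software computes, here is an independent Python sketch (not Picone's C++ code) applied to a hypothetical two-coin model: a fair coin and a biased coin, where the hidden state is which coin is being tossed. All probabilities are invented for illustration. The sketch works in log probabilities to avoid underflow on long sequences, which requires every probability in the model to be non-zero.

```python
import math

# Hypothetical coin-toss HMM; all numbers are illustrative and non-zero.
STATES = ("fair", "biased")
START = {"fair": 0.5, "biased": 0.5}
TRANS = {"fair": {"fair": 0.9, "biased": 0.1},
         "biased": {"fair": 0.1, "biased": 0.9}}
EMIT = {"fair": {"H": 0.5, "T": 0.5},
        "biased": {"H": 0.9, "T": 0.1}}


def viterbi(obs, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Most likely hidden state sequence for a discrete HMM (log domain).

    Returns (path, log probability of that best path).
    """
    score = [{s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        score.append({})
        back.append({})
        for s in states:
            # Best predecessor state for being in state s at time t.
            prev = max(states, key=lambda p: score[t - 1][p] + math.log(trans[p][s]))
            score[t][s] = (score[t - 1][prev] + math.log(trans[prev][s])
                           + math.log(emit[s][obs[t]]))
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: score[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, score[-1][last]
```

With these numbers, viterbi("TTTT" + "H" * 10 + "TTTT") labels the run of ten heads as the biased coin and the surrounding tails as the fair coin: the gain from the biased coin's higher probability of heads outweighs the cost of the two coin switches.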

Last Revision: 13:13 07-Aug-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Q6.3.html (2 of 2) [10/31/2003 8:46:47 AM]


http://mi.eng.cam.ac.uk/comp.speech/Section6/Q6.4.html

Q6.4: References & books on speech recognition
● Product Reviews and Comparisons
● Using Speech Recognition: Health Issues
● On the WWW
● Technology: General and Introductory
● Technical
● Course Notes
● Bibliographies and Reference Lists

Product Reviews and Comparisons

● "Talk Show", Wayne Rash Jr., PC Magazine (USA), Dec 20, 1994.
● "Seybold Report on Desktop Publishing" published a nine-page, head-to-head comparison of
Dragon's DOS software with IBM's OS/2 software. March 7, 1994; Volume 8, Number 7;
Pages 3-11; ISSN:0889-9762; Seybold Publications, P.O. Box 644, Media, PA 19063 USA,
phone (610) 565-2480.
● McGraw-Hill Inc.'s "BYTE, the Magazine of Technology Integration," published a two-page
review of IBM's Personal Dictation System software. May 1994; Volume ?, Number ?; Pages
145-146; ISSN:0360-5280; Editorial, Executive, and Circulation address: One Phoenix Mill
Lane, Peterborough, NH 03458 USA, phone ?

Using Speech Recognition: Health Issues

● The National Center for Voice and Speech provides some basic information on preserving
"Vocal Health" on their WWW site: http://www.shc.uiowa.edu/hygiene/home.html
● Voice Users Mailing List: detail in Q1.4.html of the FAQ.
● Typing Injury FAQ: http://www.cs.princeton.edu:80/~dwallach/tifaq/ has a range of
information on Typing Injuries, avoiding them, alternatives and more.
● Typing Injuries Page: http://alumni.caltech.edu/~dank/typing-archive.html has links to dozens
of useful resources.
● Voice Problems -- Prevention and Correction: advice on preserving your voice with specific
hints for using speech recognition. ftp://ftp.csua.berkeley.edu/pub/typing-injury/voice-problems
● "Talking to a PC May Be Hazard To Your Throat", by Julie Chao in the Wall Street Journal.
● "Talking to Computers Has its Hazards", by Gordon Arnaut in The Globe and Mail.

On the WWW

● Survey of the State of the Art in Human Language Technology: Report edited by Ronald A.
Cole et al. with a section on Spoken Input Technologies.
http://www.cse.ogi.edu/CSLU/HLTsurvey/ch1node2.html

Technology: General and Introductory

Some general introduction books on speech recognition technology:

● Fundamentals of Speech Recognition; Lawrence Rabiner & Biing-Hwang Juang, Englewood
Cliffs NJ: PTR Prentice Hall (Signal Processing Series), c1993, ISBN 0-13-015157-2
● Speech recognition by machine; W.A. Ainsworth London: Peregrinus for the Institution of
Electrical Engineers, c1988
● Speech synthesis and recognition; J.N. Holmes Wokingham: Van Nostrand Reinhold, c1988
● Speech Communication: Human and Machine, Douglas O'Shaughnessy; Addison Wesley
series in Electrical Engineering: Digital Signal Processing, 1987.
● Electronic speech recognition: techniques, technology and applications, edited by Geoff
Bristow, London: Collins, 1986
● Readings in Speech Recognition; edited by Alex Waibel & Kai-Fu Lee. San Mateo: Morgan
Kaufmann, c1990

Technical

● Hidden Markov models for speech recognition; X.D. Huang, Y. Ariki, M.A. Jack. Edinburgh:
Edinburgh University Press, c1990
● Speech Recognition: The Complete Practical Reference Guide; T. Schalk, P. J. Foster:
Telecom Library Inc, New York; ISBN 0-9366648-39-2; 377 pages; paperback only. Covers
speech recognition in a telephony environment, for readers who wish to use PC-based call
processing hardware. Dialogic hardware is used as the example throughout.
● Automatic speech recognition: the development of the SPHINX system; by Kai-Fu Lee; Boston;
London: Kluwer Academic, c1989
● An Introduction to the Application of the Theory of Probabilistic Functions of a Markov
Process to Automatic Speech Recognition, S. E. Levinson, L. R. Rabiner and M. M. Sondhi; in
Bell Syst. Tech. Jnl. v62(4), pp1035--1074, April 1983
● Review of Neural Networks for Speech Recognition, R. P. Lippmann; in Neural Computation,
v1(1), pp 1-38, 1989.
● Automatic Speech and Speaker Recognition: Advanced Topics, C.H. Lee, F.K. Soong and K.K.
Paliwal (Eds.), Kluwer, Boston, 1996.

Course Notes

● Joseph Picone of the Institute for Signal and Information Processing (ISIP) at Mississippi State
University has put the course notes for "Fundamentals of Speech Recognition" on the WWW.
The course covers background probability and phonetics/acoustics, speech signal analysis,
dynamic programming, dynamic time warping, hidden Markov modelling, language
modelling, neural networks, etc. The WWW site provides the syllabus and lecture notes.
WWW: http://www.isip.msstate.edu/publications/1996/ee_8993/

Bibliographies and Reference Lists

● WWW-searchable online bibliography for Phonetics and Speech Technology with more than
8000 entries. Provided by the Institut für Phonetik at Johann Wolfgang Goethe-Universität
Frankfurt.
http://www.uni-frankfurt.de/~ifb/bib_engl.html
● Computational Speech Processing: Speech Analysis, Recognition, Understanding,
Compression, Transmission, Coding, Synthesis; Text to Speech Systems, Speech to Tactile
Displays, Speaker Identification, Prosody Processing: BIBLIOGRAPHY, by Conrad F.
Sabourin, 1994, 2 volumes, 1187p, ISBN 2-921173-21-2, INFOLINGUA inc., P.O. Box 187
Snowdon, Montreal, H3X 3T4, Canada.
See also: http://gomer.mlink.net/infolingua.html

Last Revision: 16:15 12-Nov-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Q6.4.html (3 of 3) [10/31/2003 8:46:48 AM]


http://mi.eng.cam.ac.uk/comp.speech/Section6/Q6.5.html

Q6.5: Speech Recognition Hardware and Software
The number of speech recognition packages, and the information about them, is changing rapidly. Any help with keeping this
information up to date will be appreciated.

● Products in the FAQ
● Speech Recognition Processors (ICs)
● Recognition Information on the WWW
● Speech Recognition Resellers and Value-Added Services

In the FAQ:

The following speech recognition software/hardware is described in the comp.speech FAQ.

Apple Macintosh
Digital Dreams Speech Recognition Plug-Ins
Dragon Dictation Products
Macintosh Speech Recognition Manager
PowerSecretary

Windows (including 95, NT, 3.1)
AT&T Watson Speech Recognition
Cambridge Voice for Windows
CustomVoice and CustomTelephone: A&G Graphics Interface Inc.
DragonDictate for Windows
Dragon Dictation Products
Dragon Developer Tools
Ficomp Interpreter 6000
IBM VoiceType Dictation and Control
IN CUBE
Kurzweil Speech Recognition (2 products)
Lernout & Hauspie ASR SDK
Listen for Windows 2.0 from Verbex Voice Systems
Microsoft Speech Recognition
NCC Dictate
Phonetic Engine 500 (PE500) from Speech Systems, Inc.
Philips Speech Recognition (2 products)
ProNotes Voice Tools
PureSpeech
smARTspeak from Advanced Recognition Technologies, Inc.
Visual Voice from Stylus Innovation
VoiceAssist for Windows from Creative Labs, Inc.
VoiceServer for Windows
Whisper
WildCard Speech Products

DOS
DATAVOX - French
Dragon Developer Tools
Ficomp Interpreter 6000
Jialong He's Speech Recognition Research Tool
smARTspeak from Advanced Recognition Technologies, Inc.
Votan VPC2100 Voice Card and VSP 1010 Speech Processor

OS/2
IBM VoiceType Dictation and Control

Unix
AbbotDemo
BBN Hark Telephony Recognizer
EARS: Single Word Recognition Package
Ficomp Interpreter 6000
Hidden Markov Model Toolkit (HTK) from Entropic
IN CUBE
Jialong He's Speech Recognition Research Tool
Lotec Speech Recognition Package
Myers' Hidden Markov Model software
NICO Artificial Neural Network Toolkit
Nuance Speech Recognition System
PureSpeech
recnet

Integrated Circuits and Dedicated Hardware
HM2007 - Speech Recognition Chip
OKI VRP6679 - Speech Recognition Chip
Sensory Inc. Integrated Circuits
Speech Commander - Verbex Voice Systems
Voice Control Systems Recognition
VCS 2030 & 2060 Voice Dialer

Other Platforms
Simon Says (NeXT)
Voice Command Line Interface (Amiga)
Visus SpeechKit

Unknown
Berkeley Restaurant Project (BeRP)
Lernout & Hauspie ASR (3 products)
Voice-Trek 2.0
Voicetek Corp.
Voice Processing Corporation Speech Recognition Product Line

Speech Recognition Processors (ICs)

Jean-Pierre Lereboullet has put together a detailed list of Voice Recognition Processors which covers about 15 ICs and pieces of
related hardware (including D6106, HM2007, MSM6679, RSC-164, TC8860F/64F/65F, 5A128).
The document is available on the comp.speech ftp server:
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/VoiceRecognitionProcessors

Recognition Information on the WWW

In addition to the entries on speech recognition in this FAQ, the following WWW sites provide information on speech recognition:

Commercial Speech Recognition: Russ Wilcox of PureSpeech Inc.
http://www.tiac.net/users/rwilcox/speech.html
Macintosh Speech Resources and Apps
http://www.cs.cmu.edu/~lenzo/mac_speech_apps.html
Speech Recognition Information: 21st Century Eloquence
http://www.voicerecognition.com/
Applied Speech Technology Laboratory of CSLI at Stanford
http://csli-www.stanford.edu/users/bscott/SRTech.html
Speech Toys Speech Recognition Page
http://www.speechtoys.com/spchtoys/sprec.html
Speech recognition product lists: postings to comp.speech
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/SpeechRecognitionProducts
Search Alta Vista for Speech Recognition
Search Lycos for Speech Recognition
Yahoo pages on Speech Recognition
http://www.yahoo.com/business/corporations/computers/software/voice_recognition/
http://www.yahoo.com/Science/Computer_Science/Artificial_Intelligence/Natural_Language_Processing/Speech_Recognition/

Speech Recognition Resellers and Value-Added Services

1stVoice
2470 El Camino Real, Suite 110, Palo Alto CA 94306-1701
Ph: 415-857-1320, Fax: 415-856-6996
WWW: http://www.1stvoice.com/
Email: mail@1stvoice.com
Dragon Dictation Products
21st Century Eloquence
325-A Royal Poinciana Plaza, Palm Beach, Florida 33480, USA
Ph: 800-245-2133, Fax: 407-835-4901
WWW: http://www.voicerecognition.com/
Kurzweil, IBM VoiceType, Dragon, Kolvox
Auscript (Australia)
Suite 2, Level 3, 60-70 Elizabeth St, Sydney, NSW 2000, Australia
Ph: +61-2-238 6565, Fax: +61-2-238 6566
WWW: http://www.auscript.com.au/
Dragon Systems
BRITE
WWW: http://www.brite.com/
Computer Telephony Integration & Interactive Voice Response
DAX Systems, Inc.
30 Chapin Road, Unit 1201, P.O. Box 778, Pine Brook, NJ/USA 07058
Ph: +1-201-227-8111, Fax: +1-201-227-8197
Email: info@daxsystems.com
WWW: http://www.daxsystems.com/
Computer Telephony and Integrated Voice Response
HealthCare Resources
1444 Aviation Blvd, #103, Redondo Beach, CA 90278, USA
Ph: +1-310-937-5156, Fax: +1-310-937-5159
EMail: Scalif@AOL.COM
Power Secretary & Dragon Dictate. Specializing in: Medical/Dental, Motion Picture Industry, Carpal Tunnel related and
Disabled Persons.
O'Brien Resources
Ph: (540) 347-4988 (Address unknown)
Email: obrien@crosslink.net
WWW: http://www.crosslink.net/~obrien/
Kurzweil Voice Recognition Products
SCI VoiceAutomated
215 1/2 Main Street, Huntington Beach, CA 92648, USA
Ph: 800-597-6600, Ph: +1-714-969-7632, Fax: +1-714-969-0122
http://www.voiceautomated.com/
IBM VoiceType, Kurzweil Voice, DragonDictate and Philips speech.
Synapse
3095 Kerner Blvd., Suite S, San Rafael, CA 94901, USA
Ph: (415) 455-9700, Fax: (415) 455-9801
Email: SYNAPSE_ADAPTIVE@msn.com
WWW: http://www.synapseadaptive.com/
Dragon Systems, Kurzweil and IBM products.
Talk Technology
Ph: 1-800-270-1672, Fax: 1-516-360-1213
Email: info@talktechnology.com
http://www.talktechnology.com/
Talk Technology, Inc.
Tel: +1-718-745-9199, Fax: +1-718-499-6480
Email: mnm@pipeline.com
WWW: http://www.usbusiness.com/talk/
Dragon Dictate and portable (notebook) solutions
ToppCopy Telecom
Email: ffalzett@toppcopy.com
WWW: http://www.toppcopy.com/
Philips Digital Dictation
VoiceWare Systems
230 California Street, Suite 410, San Francisco, CA 94111
Ph: (415) 433-2001, Fax: (415) 433-6909
Email: info@talk2type.com
WWW: http://www.talk2type.com/home.htm
IBM, Dragon Systems, Kurzweil Applied Intelligence, WildCard Technologies
WorkLink
A.D.A. Solutions by WorkLink
2566-A Telegraph Avenue, Berkeley, California 94704 USA
Ph: 510-848-8363, Fax:510-848-7322
WWW: http://www.worklink.net/
Email: wayne@worklink.net
Dragon Dictation Products

Last Revision: 18:21 05-Sep-1997

http://mi.eng.cam.ac.uk/comp.speech/Section6/Q6.5.html (4 of 4) [10/31/2003 8:46:50 AM]


Q6.6: Speaker Recognition (Verification and Identification)

● Introduction
● In the FAQ
● On the WWW

Introduction

Speaker recognition is the process of automatically recognizing who is speaking on the basis of
individual information included in speech signals. It can be divided into Speaker Identification and
Speaker Verification. Speaker identification determines which registered speaker provides a given
utterance from amongst a set of known speakers. Speaker verification accepts or rejects the identity
claim of a speaker - is the speaker the person they say they are?

Speaker recognition technology makes it possible to use the speaker's voice to control access to
restricted services, for example, phone access to banking, database services, shopping or voice mail,
and access to secure equipment.

Both technologies require users to "enroll" in the system, that is, to give examples of their speech to a
system so that it can characterise (or learn) their voice patterns.
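To make the identification/verification distinction concrete, here is a deliberately toy Python sketch. It assumes each speaker is enrolled as the mean of a few fixed-length feature vectors (how those vectors are extracted is left open) and scores a test vector by negative Euclidean distance; the function names and the default threshold are hypothetical. Real systems use much richer speaker models and carefully calibrated thresholds; this only shows the shape of the two decisions.

```python
import math


def enroll(examples):
    """Toy speaker model: the mean of the enrollment feature vectors."""
    dims = len(examples[0])
    return [sum(vec[d] for vec in examples) / len(examples) for d in range(dims)]


def score(model, features):
    """Similarity score: higher is more similar (negative Euclidean distance)."""
    return -math.dist(model, features)


def identify(models, features):
    """Identification: choose the best-scoring speaker among those enrolled."""
    return max(models, key=lambda name: score(models[name], features))


def verify(models, claimed, features, threshold=-1.0):
    """Verification: accept the identity claim only if the claimed
    speaker's score reaches the (hypothetical) threshold."""
    return score(models[claimed], features) >= threshold
```

Identification compares the test speech against every enrolled speaker and picks the best match, while verification compares it only against the claimed speaker and makes an accept/reject decision against a threshold.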

In the FAQ:

ImagineNation: Voice Activated UnLock Technology
Jialong He's Speaker Recognition (Identification) Tool
Keyware Biometric Security Products
SpeakerKey Voice Verifier from ITT
SpeakEZ Voice Print Speaker Verification
Voice Control Systems: Speaker Verification Technology

On the WWW

Survey of the State of the Art in Human Language Technology
Report edited by Ronald A. Cole et al. with a section on Speaker Recognition.
http://www.cse.ogi.edu/CSLU/HLTsurvey/ch1node47.html
Speaker Identification And Verification: LIMSI Report
A technical description.
http://www.limsi.fr/Recherche/TLP/reco/2pg95-sv/2pg95-sv.html

Long Index of References on Automatic Speaker Verification
A list of more than 350 papers on speaker verification in text or BibTeX format. Provided by
G. Matas.
http://sig.enst.fr/~chollet/ForMehdi/SpRecV1.l_ind.html
CAVE: Caller Verification in Banking and Telecommunications
European consortium developing speaker recognition technologies.
http://www.ptt-telecom.nl/cave/
Hangai Lab demonstrations of speaker verification and speaker identification.
Do-it-yourself demonstrations:
http://miya8f05.ee.kagu.sut.ac.jp/study/speech/speech1.html
http://miya8f05.ee.kagu.sut.ac.jp/study/speech/speech2.html

Administrivia, Copyright, Submit Information : Last Revision: 14:07 05-Sep-1997

http://mi.eng.cam.ac.uk/comp.speech/Section6/Q6.6.html (2 of 2) [10/31/2003 8:46:51 AM]


Q6.7: Integrated Speech Products


This section lists products which integrate different speech technologies into a single user
package. For example, speech recognition and speech synthesis can be combined to provide a dialog
management system. Strictly speaking, these products don't belong in Section 6 (Speech
Recognition), but since they all include speech recognition, it seems a reasonable place to
put them for now!
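The following hypothetical slot-filling dialog manager, written in Python, shows how recognition and synthesis can fit together. It assumes that the recogniser delivers each user reply as a text string and that the returned prompt string is passed to a speech synthesiser. The recogniser and synthesiser themselves are left out, and the banking slots are invented for the example. None of this reflects the APIs of the products listed in this section.

```python
from typing import Dict, Optional

# Hypothetical slots for a phone-banking transfer, each with the prompt the synthesiser would speak.
PROMPTS: Dict[str, str] = {
    "account": "Which account would you like to use?",
    "amount": "How much would you like to transfer?",
}


class DialogManager:
    """Track which slots are filled and decide what the synthesiser should say next."""

    def __init__(self) -> None:
        self.slots: Dict[str, Optional[str]] = {name: None for name in PROMPTS}

    def next_prompt(self) -> str:
        """Prompt for the first empty slot, or confirm once every slot is filled."""
        for name, prompt in PROMPTS.items():
            if self.slots[name] is None:
                return prompt
        return f"Transferring {self.slots['amount']} from your {self.slots['account']} account."

    def handle(self, recognised_text: str) -> str:
        """Store one recogniser result in the first empty slot and return the next prompt."""
        for name in PROMPTS:
            if self.slots[name] is None:
                self.slots[name] = recognised_text.strip()
                break
        return self.next_prompt()


dm = DialogManager()
print(dm.next_prompt())            # Which account would you like to use?
print(dm.handle("savings"))        # How much would you like to transfer?
print(dm.handle("fifty dollars"))  # Transferring fifty dollars from your savings account.
```

Each call to `handle` corresponds to one turn in which the system recognises the user's reply and then speaks a prompt. The dialog is complete once every slot is filled.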

In the FAQ...

SpeechWorks™ from Applied Language Technologies, Inc.
Nortel Speech Technology Products

Administrivia, Copyright, Submit Information : Last Revision: 14:49 07-Aug-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Q6.7.html [10/31/2003 8:46:52 AM]


comp.speech FAQ Section 1

General Speech Technology


SpeechLinks: General
Q1.1: What is comp.speech?
Q1.2: comp.speech ftp site
Q1.3: Common abbreviations and jargon
Q1.4: Related newsgroups and mailing lists
Q1.5: Associations, publications and conferences
Q1.6: Handicap Aids
Q1.7: Speech Databases
Q1.8: Speech File Formats and Conversion
Q1.9: Speech Laboratory Environments and Audio Editors
Q1.10: Speech Research Sites
Q1.11: Miscellaneous Software and Resources

Administrivia, Copyright, Submit Information : Last Revision: 18:41 05-Sep-1997

http://mi.eng.cam.ac.uk/comp.speech/FAQ1.html [10/31/2003 8:46:53 AM]


comp.speech FAQ Section 4

Natural Language Processing


There is now a newsgroup specifically for Natural Language Processing: comp.ai.nat-lang. A FAQ
posting is available for the group:

ftp://rtfm.mit.edu/pub/usenet/comp.ai.nat-lang/Natural_Language_Processing_FAQ

The comp.ai FAQ also contains a lot of useful information on Natural Language Processing. That FAQ
lists useful references and includes a substantial list of software, documentation and other
information available by ftp.

This section of the comp.speech FAQ covers the following:

Q4.1: NLP References and Books
Q4.2: NLP Software

Administrivia, Copyright, Submit Information : Last Revision: 00:35 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/FAQ4.html [10/31/2003 8:46:53 AM]


FAQ: Table of Contents

comp.speech FAQ

Table of Contents
SpeechLinks: Speech Technology Hyperlinks Pages

SpeechLinks: 500+ Speech Technology Links
SpeechLinks: General Speech Technology Links
SpeechLinks: Signal Processing for Speech
SpeechLinks: Speech Coding
SpeechLinks: Speech Synthesis
SpeechLinks: Speech Recognition

List Of Software/Hardware

Update Times

Availability

Odds 'n Ends

FAQ Section 1: General Information on Speech Technology

SpeechLinks: General
Q1.1: What is comp.speech?
Q1.2: comp.speech ftp site
Q1.3: Common abbreviations and jargon
Q1.4: Related newsgroups and mailing lists
Q1.5: Associations, publications and conferences
Q1.6: Handicap Aids
Q1.7: Speech Databases
Q1.8: Speech File Formats and Conversion
Q1.9: Speech Laboratory Environments and Audio Editors
Q1.10: Speech Research Sites
Q1.11: Miscellaneous Software and Resources

FAQ Section 2: Signal Processing

SpeechLinks: Signal Processing for Speech
Q2.1: What sampling do I need for speech?
Q2.2: Finding the pitch of a speech signal
Q2.3: How do I find the start and end points of a speech signal?
Q2.4: Where can I find FFT software?
Q2.5: Signal processing in speech technology
Q2.6: Speech sampling and signal processing hardware
Q2.7: How do I convert to/from mu-law format?
Q2.8: Signal Processing Software

FAQ Section 3: Speech Coding and Compression

SpeechLinks: Speech Coding
Q3.1: Speech compression techniques
Q3.2: Information on speech coding and compression
Q3.3: Speech Compression / Coding Software

FAQ Section 4: Natural Language Processing

Q4.1: NLP References and Books
Q4.2: NLP Software

FAQ Section 5: Speech Synthesis

SpeechLinks: Speech Synthesis
Q5.1: What is speech synthesis?
Q5.2: How can speech synthesis be performed?
Q5.3: References/Books on Synthesis
Q5.4: Speech Synthesis on the WWW
Q5.5: Speech Synthesis Software/Hardware

FAQ Section 6: Speech Recognition

SpeechLinks: Speech Recognition
Q6.1: What is speech recognition?
Q6.2: How is speech recognition performed?
Q6.3: How can I build a simple speech recogniser?
Q6.4: References & books on speech recognition
Q6.5: Speech Recognition Hardware/Software
Q6.6: Speaker Recognition (Verification and Identification)
Q6.7: Integrated Speech Products

Administrivia, Copyright, Submit Information : Last Revision: 18:47 05-Sep-1997

http://mi.eng.cam.ac.uk/comp.speech/FAQ.Contents.html (3 of 3) [10/31/2003 8:46:54 AM]


http://mi.eng.cam.ac.uk/comp.speech/FAQ.Packages.html

List of Software/Hardware/Information
The comp.speech FAQ provides information on a range of software, hardware and resources.

Q1.6: Handicap Aids


Man-Machine Interfacing
SpeechViewer II

Q1.7: Speech Data


Bavarian Archive for Speech Signals
BUPT Spoken Digit Database (Chinese)
Center for Spoken Language Understanding (CSLU)
Examples of IPA Symbols
Linguistic Data Consortium (LDC)
NOISEX
Oxford Acoustic Phonetic Database
Phonemic Samples
RELATOR project
ShATR
University of Victoria Phonetic Database

Q1.9: Speech Processing Environments


CSRE: Computerized Speech Research Environment
DADiSP from DSP Development Corporation
Entropic Signal Processing System (ESPS) and Waves
GoldWave
Kay Elemetrics Computer Speech Lab
Khoros
Matlab plus Signal Processing Toolbox
MacSpeech Lab II
N!Power
OGI Speech Tools
Ptolemy
Quadravox Speech Processing Products - Qbox
Speech Filing System (SFS)
Signalyze 3.0 from InfoSignal
SoundScope

Q1.11: Miscellaneous Software and Resources


Speech Application Interfaces

ASAPI: Advanced Speech API (AT&T)
SAPI: Microsoft Windows Speech API
SRAPI: Speech Recognition API
TAPI: Microsoft Windows Telephony API

Network "Phone" Software

CUSeeMe
CyberPhone
DigiPhone
InterFACE from Hijinx
FAQ: How can I use the Internet as a telephone?
Nautilus: Secure Computer Telephony
NEVOT (1.4v) from AT&T BL
PGPfone
Speak Freely
Internet Phone from VocalTec
WebPhone
WebTalk

Audio Processing Software

AF version AF3R1
Voice E-Mail from Bonzi Software
MicNotePad Recording Software for Macs
MixViews
Network Audio System Release 1.1
NIST Software - SPHERE and SCORE
Sound Processing Kit
TCPplay

Human Audio Perception

Auditory Modeller 1
Auditory Modeller 2
Auditory Toolbox for Matlab
Human Audio Perception Document

Dictionaries and other Lexical Tools

BEEP dictionary
CMU dictionary
CUVOLAD dictionary (Oxford Dictionary)
Comprehensive Word List
EAT: Edinburgh Associative Thesaurus
Homophone List
Moby Lexical Resources
MRC Psycholinguistic Database
WordNet
Dictionaries on the WWW

Phonetic Fonts and Phonetic Samples

International Phonetic Alphabet
WWW: Phonetic Fonts and Examples Online
Summer Institute of Linguistics IPA Fonts
Phonetic Fonts for TeX and LaTeX
Yamada Language Center

Very Miscellaneous Software

The vOICe
The Learning Company's Language Training
Wildfire - an Electronic Assistant

Q2.6: Audio Hardware


Macintosh Audio Hardware
PC Audio Hardware
Unix Audio Hardware

Q2.8: Signal Processing Software

SigLib from Numerix Ltd.

Q3.3: Compression Software and Hardware


32 kbps ADPCM
Castleton Network Systems - G.729 Voice Coder
CELP 3.2a & LPC-10
8 Kbit/s CELP on the TMS320C5x family of DSP chips
CyberVoice
Rockwell's DigiTalk
File format conversion
G.711/721/723 Compression
G.728 LD-CELP vocoder
G.728 Compression
GSM 06.10 Compression
Lernout & Hauspie Speech Coding (5 products)
Lernout & Hauspie Speech Coding SDK
MPEG Audio
shorten - a lossless compressor for speech signals
Sipro Lab Telecom Inc. Coding
Sonarc: Digital Audio Compression
StarAudio Compressor/Player
TrueSpeech from DSP Group
U.S.F.S. 1016 CELP vocoder for DSP56001
ToolVox from Voxware

Q4.2: Natural Language Processing


● Natural Language Software Registry (NLSR) - NLP Tools
● Part of Speech Tagger

Q5.5: Speech Synthesis


Apple Macintosh
BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
Infovox Product Range
Macintosh Speech Output Applications
Macintosh Speech Synthesis Manager
MacYack Pro
MBROLA: Free Speech Synthesis Project
ProVoice Developer's Speech Toolkit from First Byte
SENSYN speech synthesizer
Sound Bytes Developer's Kit
Macintosh Speech Synthesis Manager

Windows (including 95, NT, 3.1)


AcuVoice
AT&T Watson Speech Synthesis
BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
Creative TextAssist and TextAssist API
DECtalk: Text-to-Speech from Digital
ETI-Eloquence
HADIFIX
Infovox Product Range
IPOX: All Prosodic Speech Synthesis Architecture
Lernout and Hauspie Text-To-Speech Windows SDK
Listen2 Text Reader
MBROLA: Free Speech Synthesis Project
Monologue for Windows from First Byte
PAM - A Text-To-Speech Application
ProVerbe Speech Engine from ELAN Informatique
ProVoice Developer's Speech Toolkit from First Byte
SENSYN speech synthesizer
Sound Bytes Developer's Kit
Tinytalk
TruVoice from Centigram
WinSpeech
ZMD Speech Synthesis

DOS
CSRE: Computerized Speech Research Environment
Infovox Product Range
MBROLA: Free Speech Synthesis Project
ProVoice Developer's Speech Toolkit from First Byte
SENSYN speech synthesizer
spchsyn.exe
Tinytalk
ZMD Speech Synthesis

OS/2
ProVerbe Speech Engine from ELAN Informatique
ProVoice Developer's Speech Toolkit from First Byte
Sound Bytes Developer's Kit

Unix
AcuVoice
AsTeR
BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
DECtalk: Text-to-Speech from Digital
ETI-Eloquence
Emacspeak - A Speech Output Subsystem For Emacs
Festival Speech Synthesis System
JSRU
Klatt-style synthesiser
KPE80 - A Klatt Synthesiser and Parameter Editor
"learph": Trainable text-to-phoneme software by Antonio Lucca
Lucent Technologies Bell Labs Text-to-Speech system
MBROLA: Free Speech Synthesis Project
Orator from Bellcore
ProVerbe Speech Engine from ELAN Informatique
rsynth
SENSYN speech synthesizer
SGI Developers Toolbox Synthesiser
Speak
TrueTalk
TruVoice from Centigram

Integrated Circuits and Dedicated Hardware


Eurovocs
Infovox Product Range
ProVerbe Speech Engine from ELAN Informatique
RC Systems V8600/V8601 Text to Speech synthesizers

Other Platforms
BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
TheBigMouth (NeXT)
MBROLA: Free Speech Synthesis Project
Narrator Translator Library (Amiga)
Narrator (Amiga)
TextToSpeech Kit (NeXT)
Orator from Bellcore
SENSYN speech synthesizer
WreadFiles: File reader for Commodore Amiga

Unknown
Lernout and Hauspie Text-To-Speech (3 products)
SIMTEL
Text to Phoneme Program 1
Text to phoneme program 2
Text to phoneme program 3

Q6.5: Speech Recognition


Apple Macintosh
Digital Dreams Speech Recognition Plug-Ins
Dragon Dictation Products
Macintosh Speech Recognition Manager
PowerSecretary

Windows (including 95, NT, 3.1)


AT&T Watson Speech Recognition
Cambridge Voice for Windows
CustomVoice and CustomTelephone: A&G Graphics Interface Inc.
DragonDictate for Windows
Dragon Dictation Products
Dragon Developer Tools
Ficomp Interpreter 6000
IBM VoiceType Dictation and Control
IN CUBE
Kurzweil Speech Recognition (2 products)
Lernout & Hauspie ASR SDK
Listen for Windows 2.0 from Verbex Voice Systems
Microsoft Speech Recognition
NCC Dictate
Phonetic Engine 500 (PE500) from Speech Systems, Inc.
Philips Speech Recognition (2 products)
ProNotes Voice Tools
PureSpeech
smARTspeak from Advanced Recognition Technologies, Inc.
Visual Voice from Stylus Innovation
VoiceAssist for Windows from Creative Labs, Inc.
VoiceServer for Windows
Whisper
WildCard Speech Products

DOS
DATAVOX - French
Dragon Developer Tools
Ficomp Interpreter 6000
Jialong He's Speech Recognition Research Tool
smARTspeak from Advanced Recognition Technologies, Inc.
Votan VPC2100 Voice Card and VSP 1010 Speech Processor

OS/2
IBM VoiceType Dictation and Control

Unix
AbbotDemo
BBN Hark Telephony Recognizer
EARS: Single Word Recognition Package
Ficomp Interpreter 6000
Hidden Markov Model Toolkit (HTK) from Entropic
IN CUBE
Jialong He's Speech Recognition Research Tool
Lotec Speech Recognition Package
Myers' Hidden Markov Model software
NICO Artificial Neural Network Toolkit
Nuance Speech Recognition System
PureSpeech
recnet

Integrated Circuits and Dedicated Hardware


HM2007 - Speech Recognition Chip
OKI VRP6679 - Speech Recognition Chip
Sensory Inc. Integrated Circuits
Speech Commander - Verbex Voice Systems
Voice Control Systems Recognition
VCS 2030 & 2060 Voice Dialer

Other Platforms
Simon Says (NeXT)
Voice Command Line Interface (Amiga)
Visus SpeechKit

Unknown
Berkeley Restaurant Project (BeRP)
Lernout & Hauspie ASR (3 products)
Voice-Trek 2.0
Voicetek Corp.
Voice Processing Corporation Speech Recognition Product Line

Q6.6: Speaker Verification and Identification


ImagineNation: Voice Activated UnLock Technology
Jialong He's Speaker Recognition (Identification) Tool
Keyware Biometric Security Products
SpeakerKey Voice Verifier from ITT
SpeakEZ Voice Print Speaker Verification
Voice Control Systems: Speaker Verification Technology

Q6.7: Integrated Speech Products


SpeechWorks™ from Applied Language Technologies, Inc.
Nortel Speech Technology Products

Administrivia, Copyright, Submit Information : Last Revision: 17:19 02-Oct-1997

http://mi.eng.cam.ac.uk/comp.speech/FAQ.Packages.html (9 of 9) [10/31/2003 8:46:58 AM]


comp.speech FAQ: Last Update Times

Document Update Times



Section 1: General Information on Speech Technology
05 Sep 1997 : SpeechLinks: General
12 Apr 1996 : Q1.1: What is comp.speech?
12 Apr 1996 : Q1.2: comp.speech ftp site
16 Apr 1997 : Q1.3: Common abbreviations and jargon
05 Sep 1997 : Q1.4: Related newsgroups and mailing lists
05 Sep 1997 : Q1.5: Associations, publications and conferences
27 Sep 1996 : Q1.6: Handicap Aids
14 May 1997 : Q1.7: Speech Databases
12 Apr 1996 : Q1.8: Speech File Formats and Conversion
07 Aug 1996 : Q1.9: Speech Laboratory Environments and Audio Editors
12 Nov 1996 : Q1.10: Speech Research Sites
05 Sep 1997 : Q1.11: Miscellaneous Software and Resources

Q1.7: Speech Data


01 Apr 1996 : Bavarian Archive for Speech Signals
19 Mar 1996 : BUPT Spoken Digit Database (Chinese)
01 Apr 1996 : Center for Spoken Language Understanding (CSLU)
19 Mar 1996 : Examples of IPA Symbols
20 Feb 1997 : Linguistic Data Consortium (LDC)
13 Aug 1996 : NOISEX
19 Mar 1996 : Oxford Acoustic Phonetic Database
19 Mar 1996 : Phonemic Samples
19 Mar 1996 : RELATOR project
12 Nov 1996 : ShATR
15 May 1997 : University of Victoria Phonetic Database
Q1.9: Speech Processing Environments
27 Mar 1996 : CSRE: Computerized Speech Research Environment
29 May 1996 : DADiSP from DSP Development Corporation
19 Mar 1996 : Entropic Signal Processing System (ESPS) and Waves
19 Mar 1996 : GoldWave
21 Mar 1996 : Kay Elemetrics Computer Speech Lab
02 Oct 1996 : Khoros
19 Mar 1996 : Matlab plus Signal Processing Toolbox
19 Mar 1996 : MacSpeech Lab II
19 Mar 1996 : N!Power
21 Mar 1996 : OGI Speech Tools
19 Mar 1996 : Ptolemy
25 Sep 1996 : Quadravox Speech Processing Products - Qbox
19 Mar 1996 : Speech Filing System (SFS)
16 Oct 1996 : Signalyze 3.0 from InfoSignal
14 Nov 1996 : SoundScope
Q1.11: Miscellaneous Software and Resources
07 Jun 1996 : ASAPI: Advanced Speech API (AT&T)
21 Apr 1997 : SAPI: Microsoft Windows Speech API
10 Jun 1996 : SRAPI: Speech Recognition API
10 Mar 1997 : TAPI: Microsoft Windows Telephony API
19 Mar 1996 : CUSeeMe
19 Mar 1996 : CyberPhone
06 Jan 1997 : DigiPhone
19 Mar 1996 : InterFACE from Hijinx
19 Mar 1996 : FAQ: How can I use the Internet as a telephone?
07 Aug 1996 : Nautilus: Secure Computer Telephony
19 Mar 1996 : NEVOT (1.4v) from AT&T BL
19 Mar 1996 : PGPfone
19 Mar 1996 : Speak Freely
19 Mar 1996 : Internet Phone from VocalTec
19 Mar 1996 : WebPhone
19 Mar 1996 : WebTalk
19 Mar 1996 : AF version AF3R1
19 Mar 1996 : Voice E-Mail from Bonzi Software
25 Sep 1996 : MicNotePad Recording Software for Macs
19 Mar 1996 : MixViews
19 Mar 1996 : Network Audio System Release 1.1
19 Mar 1996 : NIST Software - SPHERE and SCORE
19 Mar 1996 : Sound Processing Kit
27 Mar 1996 : TCPplay
19 Mar 1996 : Auditory Modeller 1
19 Mar 1996 : Auditory Modeller 2
27 Mar 1996 : Auditory Toolbox for Matlab
19 Mar 1996 : Human Audio Perception Document
12 Aug 1996 : BEEP dictionary
19 Mar 1996 : CMU dictionary
16 Apr 1996 : CUVOLAD dictionary (Oxford Dictionary)
16 Apr 1996 : Comprehensive Word List
16 Apr 1996 : EAT: Edinburgh Associative Thesaurus
19 Mar 1996 : Homophone List
07 Aug 1996 : Moby Lexical Resources
16 Apr 1996 : MRC Psycholinguistic Database
16 Apr 1996 : WordNet
11 Mar 1997 : Dictionaries on the WWW
08 Aug 1996 : International Phonetic Alphabet
11 Apr 1996 : WWW: Phonetic Fonts and Examples Online
08 Aug 1996 : Summer Institute of Linguistics IPA Fonts
08 Aug 1996 : Phonetic Fonts for TeX and LaTeX
19 Mar 1996 : Yamada Language Center
05 Sep 1997 : The vOICe
18 Mar 1996 : The Learning Company's Language Training
18 Mar 1996 : Wildfire - an Electronic Assistant

Section 2: Signal Processing for Speech


05 Sep 1997 : SpeechLinks: Signal Processing for Speech
12 Apr 1996 : Q2.1: What sampling do I need for speech?
12 Apr 1996 : Q2.2: Finding the pitch of a speech signal
13 May 1997 : Q2.3: How do I find the start and end points of a speech signal?
05 Sep 1997 : Q2.4: Where can I find FFT software?
07 Aug 1996 : Q2.5: Signal processing in speech technology
12 Apr 1996 : Q2.6: Speech sampling and signal processing hardware
12 Apr 1996 : Q2.7: How do I convert to/from mu-law format?
13 May 1997 : Q2.8: Signal Processing Software

Q2.6: Audio Hardware


19 Mar 1996 : Macintosh Audio Hardware
08 Aug 1996 : PC Audio Hardware
08 Aug 1996 : Unix Audio Hardware

Section 3: Speech Coding and Compression


05 Sep 1997 : SpeechLinks: Speech Coding
12 Apr 1996 : Q3.1: Speech compression techniques
07 Mar 1997 : Q3.2: Information on speech coding and compression
05 Sep 1997 : Q3.3: Speech Compression / Coding Software

Q3.3: Compression Software and Hardware


05 Sep 1997 : 32 kbps ADPCM
09 Oct 1996 : Castleton Network Systems - G.729 Voice Coder
26 Sep 1996 : CELP 3.2a & LPC-10
19 Mar 1996 : 8 Kbit/s CELP on the TMS320C5x family of DSP chips
05 Sep 1997 : CyberVoice
19 Mar 1996 : Rockwell's DigiTalk
19 Mar 1996 : File format conversion
19 Mar 1996 : G.711/721/723 Compression
19 Mar 1996 : G.728 LD-CELP vocoder
19 Mar 1996 : G.728 Compression
19 Mar 1996 : GSM 06.10 Compression
13 May 1997 : Lernout & Hauspie Speech Coding (5 products)
13 May 1997 : Lernout & Hauspie Speech Coding SDK
02 Oct 1996 : MPEG Audio
19 Mar 1996 : shorten - a lossless compressor for speech signals
29 May 1996 : Sipro Lab Telecom Inc. Coding
13 Jun 1996 : Sonarc: Digital Audio Compression
14 May 1997 : StarAudio Compressor/Player
19 Mar 1996 : TrueSpeech from DSP Group
19 Mar 1996 : U.S.F.S. 1016 CELP vocoder for DSP56001
19 Mar 1996 : ToolVox from Voxware

Section 4: Natural Language Processing


19 Mar 1996 : Q4.1: NLP References and Books
10 Apr 1996 : Q4.2: NLP Software

Section 5: Speech Synthesis


05 Sep 1997 : SpeechLinks: Speech Synthesis
07 Aug 1996 : Q5.1: What is speech synthesis?
08 Aug 1996 : Q5.2: How can speech synthesis be performed?
05 Sep 1997 : Q5.3: References/Books on Synthesis
05 Sep 1997 : Q5.4: Speech Synthesis on the WWW
05 Sep 1997 : Q5.5: Speech Synthesis Software/Hardware

Q5.5: Speech Synthesis Software/Hardware


05 Sep 1997 : AcuVoice
19 Mar 1996 : AsTeR
31 May 1996 : AT&T Watson Speech Synthesis
26 Sep 1996 : BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
16 Oct 1996 : TheBigMouth (NeXT)
26 Sep 1996 : Creative TextAssist and TextAssist API
26 Sep 1996 : CSRE: Computerized Speech Research Environment
12 Nov 1996 : DECtalk: Text-to-Speech from Digital
31 Jan 1997 : ETI-Eloquence
19 Mar 1996 : Emacspeak - A Speech Output Subsystem For Emacs
26 Sep 1996 : Eurovocs
31 Jan 1997 : Festival Speech Synthesis System
21 Mar 1996 : HADIFIX
12 Nov 1996 : Infovox Product Range
26 Sep 1996 : IPOX: All Prosodic Speech Synthesis Architecture
19 Mar 1996 : JSRU
22 Oct 1996 : Klatt-style synthesiser
01 Apr 1996 : KPE80 - A Klatt Synthesiser and Parameter Editor
03 Feb 1997 : "learph": Trainable text-to-phoneme software by Antonio Lucca
13 May 1997 : Lernout and Hauspie Text-To-Speech (3 products)
13 May 1997 : Lernout and Hauspie Text-To-Speech Windows SDK
05 Sep 1997 : Listen2 Text Reader
02 Oct 1997 : Lucent Technologies Bell Labs Text-to-Speech system
26 Sep 1996 : Macintosh Speech Output Applications
05 Sep 1997 : Macintosh Speech Synthesis Manager
26 Sep 1996 : MacYack Pro
26 Sep 1996 : MBROLA: Free Speech Synthesis Project
26 Sep 1996 : Monologue for Windows from First Byte
26 Sep 1996 : Narrator Translator Library (Amiga)
26 Sep 1996 : Narrator (Amiga)
19 Mar 1996 : TextToSpeech Kit (NeXT)
01 May 1996 : Orator from Bellcore
28 Apr 1997 : PAM - A Text-To-Speech Application
05 Sep 1997 : ProVerbe Speech Engine from ELAN Informatique
19 Mar 1996 : ProVoice Developer's Speech Toolkit from First Byte
26 Sep 1996 : RC Systems V8600/V8601 Text to Speech synthesizers
16 Oct 1996 : rsynth
16 Oct 1996 : SENSYN speech synthesizer
19 Mar 1996 : SGI Developers Toolbox Synthesiser
07 Aug 1996 : SIMTEL
26 Sep 1996 : Sound Bytes Developer's Kit
26 Sep 1996 : spchsyn.exe
26 Sep 1996 : Speak
05 Sep 1997 : Macintosh Speech Synthesis Manager
26 Sep 1996 : Text to Phoneme Program 1
24 Feb 1997 : Text to phoneme program 2
24 Feb 1997 : Text to phoneme program 3
14 Nov 1996 : Tinytalk
26 Sep 1996 : TrueTalk
06 Feb 1997 : TruVoice from Centigram
08 Mar 1997 : WinSpeech
25 Sep 1996 : WreadFiles: File reader for Commodore Amiga
13 May 1997 : ZMD Speech Synthesis

Section 6: Speech Recognition


05 Sep 1997 : SpeechLinks: Speech Recognition
18 Jun 1996 : Q6.1: What is speech recognition?
16 Apr 1997 : Q6.2: How is speech recognition performed?
07 Aug 1996 : Q6.3: How can I build a simple speech recogniser?
12 Nov 1996 : Q6.4: References & books on speech recognition
05 Sep 1997 : Q6.5: Speech Recognition Hardware/Software
05 Sep 1997 : Q6.6: Speaker Recognition (Verification and Identification)
07 Aug 1996 : Q6.7: Integrated Speech Products

Q6.5: Speech Recognition Software/Hardware


19 Mar 1996 : AbbotDemo
31 May 1996 : AT&T Watson Speech Recognition
26 Sep 1996 : BBN Hark Telephony Recognizer
19 Mar 1996 : Berkeley Restaurant Project (BeRP)
07 Aug 1996 : Cambridge Voice for Windows
25 Sep 1996 : CustomVoice and CustomTelephone: A&G Graphics Interface Inc.
26 Sep 1996 : DATAVOX - French
05 Sep 1997 : Digital Dreams Speech Recognition Plug-Ins
05 Sep 1997 : DragonDictate for Windows
05 Sep 1997 : Dragon Dictation Products
05 Sep 1997 : Dragon Developer Tools
16 Oct 1996 : EARS: Single Word Recognition Package
05 Sep 1997 : Ficomp Interpreter 6000
26 Sep 1996 : HM2007 - Speech Recognition Chip
19 Mar 1996 : Hidden Markov Model Toolkit (HTK) from Entropic
26 Sep 1996 : IBM VoiceType Dictation and Control
19 Mar 1996 : IN CUBE
31 May 1996 : Jialong He's Speech Recognition Research Tool
25 Sep 1996 : Kurzweil Speech Recognition (2 products)
13 May 1997 : Lernout & Hauspie ASR (3 products)
13 May 1997 : Lernout & Hauspie ASR SDK
26 Sep 1996 : Listen for Windows 2.0 from Verbex Voice Systems
26 Sep 1996 : Lotec Speech Recognition Package
25 Sep 1996 : Macintosh Speech Recognition Manager
21 Apr 1997 : Microsoft Speech Recognition
26 Sep 1996 : Myers' Hidden Markov Model software
26 Sep 1996 : NCC Dictate
05 Sep 1997 : NICO Artificial Neural Network Toolkit
02 Oct 1997 : Nuance Speech Recognition System
26 Sep 1996 : OKI VRP6679 - Speech Recognition Chip
26 Sep 1996 : Phonetic Engine 500 (PE500) from Speech Systems, Inc.
18 Nov 1996 : Philips Speech Recognition (2 products)
05 Sep 1997 : PowerSecretary
31 Mar 1996 : ProNotes Voice Tools
16 Apr 1996 : PureSpeech
19 Mar 1996 : recnet
05 Sep 1997 : Sensory Inc. Integrated Circuits
26 Sep 1996 : Simon Says (NeXT)
24 Feb 1997 : smARTspeak from Advanced Recognition Technologies, Inc.
26 Sep 1996 : Speech Commander - Verbex Voice Systems
16 Oct 1996 : Visual Voice from Stylus Innovation
26 Sep 1996 : Voice Command Line Interface (Amiga)
19 Mar 1996 : Voice Control Systems Recognition
19 Mar 1996 : Visus SpeechKit
26 Sep 1996 : VCS 2030 & 2060 Voice Dialer
26 Sep 1996 : Voice-Trek 2.0
26 Sep 1996 : VoiceAssist for Windows from Creative Labs, Inc.
26 Sep 1996 : VoiceServer for Windows
26 Sep 1996 : Voicetek Corp.
26 Sep 1996 : Votan VPC2100 Voice Card and VSP 1010 Speech Processor
26 Sep 1996 : Voice Processing Corporation Speech Recognition Product Line
21 Apr 1997 : Whisper
25 Sep 1996 : WildCard Speech Products
Q6.6: Speaker Verification and Identification
05 Sep 1997 : ImagineNation: Voice Activated UnLock Technology
31 May 1996 : Jialong He's Speaker Recognition (Identification) Tool
05 Sep 1997 : Keyware Biometric Security Products
31 May 1996 : SpeakerKey Voice Verifier from ITT
25 Sep 1996 : SpeakEZ Voice Print Speaker Verification
31 May 1996 : Voice Control Systems: Speaker Verification Technology
Q6.7: Integrated Speech Products
25 Sep 1996 : SpeechWorks™ from Applied Language Technologies, Inc.
07 Aug 1996 : Nortel Speech Technology Products

Administrivia, Copyright, Submit Information : Last Revision: 17:19 02-Oct-1997

http://mi.eng.cam.ac.uk/comp.speech/Update.Times.html (8 of 8) [10/31/2003 8:47:01 AM]


Q1.9: Speech Laboratory Environments and Audio Editors

First, what is a Speech Laboratory Environment? A speech lab is a software package which provides
the capability of recording, playing, analysing, processing, displaying and storing speech. Your
computer will require audio input/output capability. The different packages vary greatly in features
and capability, so it is best to know what you want before you start looking around.

Most general-purpose audio editing packages will be able to process speech, but they do not necessarily
have specialised speech capabilities such as formant analysis.
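Formant analysis is one of the speech-specific capabilities that distinguishes a speech lab from a general audio editor. One common approach is sketched below in Python with NumPy. It fits a linear prediction (LPC) model to a windowed frame and reads resonance frequencies from the angles of the complex poles. This is an illustrative sketch, not the algorithm of any package listed here.

The test signal is a synthetic single resonance at 1000 Hz. The one-second frame is far longer than the 20-30 ms frames normally used and was chosen only to keep the estimate stable. For real speech, an LPC order of roughly 2 + fs/1000 is a common rule of thumb.

```python
import numpy as np


def lpc(frame: np.ndarray, order: int) -> np.ndarray:
    """LPC polynomial [1, a1, ..., ap] via the autocorrelation method (Levinson-Durbin)."""
    n = len(frame)
    # Autocorrelation r[0..order] of the (already windowed) frame.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient for this stage
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def formants(frame: np.ndarray, fs: float, order: int) -> np.ndarray:
    """Estimate resonance frequencies (Hz) from the angles of the LPC poles."""
    a = lpc(frame * np.hamming(len(frame)), order)
    poles = np.roots(a)
    poles = poles[np.imag(poles) > 0]  # keep one pole from each conjugate pair
    return np.sort(np.angle(poles) * fs / (2.0 * np.pi))


# Synthetic test signal: white noise through a single resonance at 1000 Hz.
fs = 8000.0
radius, theta = 0.95, 2.0 * np.pi * 1000.0 / fs
rng = np.random.default_rng(0)
x = rng.standard_normal(8000)
y = np.zeros_like(x)
for n in range(2, len(x)):
    y[n] = x[n] + 2.0 * radius * np.cos(theta) * y[n - 1] - radius**2 * y[n - 2]

estimated = formants(y, fs, order=2)
print(estimated)  # should be a single value close to 1000 Hz
```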

The following article provides a good survey.

● Read, C., Buder, E., & Kent, R. "Speech Analysis Systems: An Evaluation," Journal of Speech
and Hearing Research, pp. 314-332, April 1992.

The following is a list of the speech labs described in the FAQ.

CSRE: Computerized Speech Research Environment
DADiSP from DSP Development Corporation
Entropic Signal Processing System (ESPS) and Waves
GoldWave
Kay Elemetrics Computer Speech Lab
Khoros
Matlab plus Signal Processing Toolbox
MacSpeech Lab II
N!Power
OGI Speech Tools
Ptolemy
Quadravox Speech Processing Products - Qbox
Speech Filing System (SFS)
Signalyze 3.0 from InfoSignal
SoundScope

Administrivia, Copyright, Submit Information : Last Revision: 15:51 07-Aug-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Q1.9.html (2 of 2) [10/31/2003 8:47:03 AM]


CSRE: Computerized Speech Research Environment
● Platform: DOS
● Description: CSRE (pronounced "Caesar") is a speech processing system for the PC. It provides:
❍ Signal recording and playback
❍ Signal editing
❍ Pitch, spectral and formant analysis
❍ Speech synthesis with an implementation of the Klatt (1980) parametric speech synthesizer
● Requirements: PC compatible (80486DX), 1 MB RAM (4 MB recommended), DOS 3.2 (6.22 recommended), VGA graphics (640x480, 16 colors), 30 MB of hard disk space (5 MB for CSRE plus space for audio recordings), and a supported audio card.
● Cost: See AVAAZ WWW Pages
● Contact: AVAAZ Innovations Inc.
P.O.Box 8040, 1225 Wonderland Rd. N, London, Ontario, CANADA, N6G 2B0
Ph: +1-519-472-7944, Fax: +1-519-472-7814
Email: info@avaaz.com
WWW: http://www.icis.on.ca/homepages/avaaz/
● Note: See also the CSRE entry in Q5.5 on speech synthesisers.
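
The Klatt (1980) synthesizer mentioned above builds its formants from second-order digital resonators. As a minimal sketch, assuming the resonator difference equation and coefficient formulas published by Klatt (this is not CSRE's code, and the function names `klatt_resonator_coeffs` and `resonate` are hypothetical), a single formant can be applied to an excitation signal like this:

```python
import math

def klatt_resonator_coeffs(freq_hz, bw_hz, fs_hz):
    """Return (A, B, C) for y[n] = A*x[n] + B*y[n-1] + C*y[n-2] (Klatt-style resonator)."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalises the gain at 0 Hz to exactly 1
    return a, b, c

def resonate(x, freq_hz, bw_hz, fs_hz):
    """Filter the sequence x through one resonator with centre freq_hz and bandwidth bw_hz."""
    a, b, c = klatt_resonator_coeffs(freq_hz, bw_hz, fs_hz)
    y1 = y2 = 0.0  # previous two outputs, y[n-1] and y[n-2]
    out = []
    for sample in x:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

# Example: a 500 Hz formant (60 Hz bandwidth) excited by a 100 Hz impulse train at 10 kHz.
fs = 10000
impulses = [1.0 if n % 100 == 0 else 0.0 for n in range(1000)]
vowel_like = resonate(impulses, 500, 60, fs)
```

Cascading several such resonators (one per formant) and driving them with a glottal pulse train is the basic idea of a cascade formant synthesizer; source models, the parallel branch and parameter interpolation are omitted here.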

Back to Q1.9 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 18:17 27-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Labs/csre.html [10/31/2003 8:47:04 AM]


Entropic Signal Processing System (ESPS) and Waves
● Platform: Range of Unix platforms.
● Description: ESPS is a comprehensive set of speech analysis and processing tools for the UNIX environment. The package includes UNIX commands and a C library (which can be accessed from other languages). Waves is a graphical front-end for speech processing: speech waveforms, spectrograms, pitch traces, etc. can be displayed, edited and processed under X Windows and OpenWindows (versions 2 & 3). Waves also includes a signal labelling utility that supports labelling of multiple features and provides useful features for fast labelling of large speech databases. Other Entropic products are HTK (see Q6.5) and TrueTalk (see Q5.5).
● Misc: A more detailed description is provided on the Entropic WWW pages
(http://www.entropic.com/esps.html).
● Cost: On request.
● Contact:
Entropic Research Laboratory, Washington Research Laboratory
600 Pennsylvania Ave, S.E. Suite 202, Washington, D.C. 20003
(202) 547-1420
email: info@entropic.com
WWW: http://www.entropic.com/

Back to Q1.9 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:08 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Labs/esps.html [10/31/2003 8:47:05 AM]


Kay Elemetrics Computer Speech Lab

Kay Elemetrics CSL (Computer Speech Lab) 4300


● Platform: At minimum, an IBM PC-AT compatible with at least 2 MB of extended memory and VGA graphics or better. More powerful machines are preferable.
● Description: A speech analysis package, with an optional separate LPC program for analysis/synthesis. It uses its own data file format but can export some data as ASCII. The main editing/analysis program (but not the LPC program) has its own macro language, which makes repetitive tasks easy to perform.
Options (more information on the Kay Elemetrics Corp. WWW site):
❍ Multi-Dimensional Voice Program (MDVP)
❍ Voice Range Profile (Phonetograph)
❍ Real-Time Spectrogram
❍ Sona-Match
❍ Palatometer Database
❍ IPA Transcription Tutorial
❍ Delayed Auditory Feedback (DAF)
❍ Disordered Voice Database
❍ Auditory Perception Program and Database
❍ Motor Speech Profile Program
❍ CSL-Pitch
❍ Real-Time EGG Processing
❍ Signal Enhancement in Noise Program
❍ Synthesis Program
❍ DAT Interface and Four Channel Input
❍ Phonetic Database
❍ Direct-to-Disk Program
❍ Programmers Kit
❍ Condenser Microphone
❍ Multi-Speech

● Cost: Contact Kay Elemetrics Corp.


● Contact: Kay Elemetrics Corp.
2 Bridgewater Lane, Lincoln Park, NJ 07035, USA
Ph: +1-201-628-6200, Fax: +1-201-628-6363
Toll free tel. 1-800-289-5297
[WWW: http://www.kayelemetrics.com/ - available soon]
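
LPC analysis of the kind offered by the optional Kay program is commonly computed with the autocorrelation method and the Levinson-Durbin recursion. The sketch below is a generic illustration under that assumption, not Kay's implementation: it Hamming-windows one frame and returns predictor coefficients a1..ap such that x[n] is approximated by a1·x[n-1] + ... + ap·x[n-p], together with the final prediction-error energy. The function names are hypothetical.

```python
import math

def autocorrelation(frame, max_lag):
    """r[lag] = sum of frame[i] * frame[i + lag], for lag = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r[0..order].

    Returns (a, err): a[0..order-1] are the predictor coefficients for lags 1..order,
    err is the final prediction-error energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection (PARCOR) coefficient
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err

def lpc(frame, order):
    """Hamming-window one analysis frame and return (coefficients, error energy)."""
    n = len(frame)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    windowed = [s * w for s, w in zip(frame, window)]
    return levinson_durbin(autocorrelation(windowed, order), order)
```

Working from the autocorrelation sequence guarantees a stable synthesis filter (all reflection coefficients have magnitude below 1) for non-degenerate input, which is one reason this method is the usual choice for analysis/synthesis.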

Back to Q1.9 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 23:14 21-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Labs/kay.html [10/31/2003 8:47:05 AM]


OGI Speech Tools


● Developers: Center for Spoken Language Understanding (CSLU), Oregon Graduate Institute of Science and Technology (Portland, Oregon)
● Platform: Unix
● Description: The OGI Speech Tools include:
❍ An X Windows display tool (LYRE) for time-synchronous display of (a) the speech signal, (b) spectrograms, (c) phoneme labels and other information.
❍ A neural network training package (NOPT).
❍ A set of C library routines (LIBNSPEECH) for manipulating speech data, including (a) PLP analysis, (b) RASTA-PLP analysis, (c) linear predictive coding, (d) mel-cepstrum coding and (e) the fast Fourier transform.
❍ A set of utilities for converting between file formats such as ADC, NIST, mu-law, binary and ASCII. Filtering is also included.
❍ A database utility (find_phone) to automate speech database queries. It lets the user specify a particular label or set of labels in a given context, display all occurrences of the label, and relabel the occurrences if desired.
❍ A vector quantizer based on the Linde-Buzo-Gray (LBG) algorithm.
❍ A set of Perl scripts, used mainly to automate the use of the OGI Speech Tools.
❍ Man pages for all routines and programs, as well as a user manual in both PostScript and TeX format.

● Misc: Software is written in ANSI C.
● Contact: Email: tools@cse.ogi.edu
WWW: http://www.cse.ogi.edu/CSLU/
ftp: ftp://speech.cse.ogi.edu/pub/tools/
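
The OGI vector quantizer is described as LBG-based. A minimal sketch of the LBG idea (start from the centroid of all training vectors, split every codeword in two, then refine with nearest-neighbour and centroid iterations) is shown below. The additive perturbation `eps`, the stopping tolerance and the function names are assumptions, and OGI's own tool may differ in detail.

```python
def _dist2(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def _centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def nearest(v, codebook):
    """Index of the codeword closest to v."""
    return min(range(len(codebook)), key=lambda i: _dist2(v, codebook[i]))

def lbg(training, size, eps=0.01, tol=1e-4, max_iter=50):
    """Train a codebook of `size` codewords; size must be a power of two."""
    if size < 1 or size & (size - 1):
        raise ValueError("size must be a power of two")
    codebook = [_centroid(training)]
    while len(codebook) < size:
        # Split: replace every codeword c by c + eps and c - eps.
        codebook = [[x + s * eps for x in c] for c in codebook for s in (1.0, -1.0)]
        prev = float("inf")
        for _ in range(max_iter):
            # Partition the training set by nearest codeword.
            cells = [[] for _ in codebook]
            total = 0.0
            for v in training:
                i = nearest(v, codebook)
                cells[i].append(v)
                total += _dist2(v, codebook[i])
            # Move each codeword to its cell centroid; an empty cell keeps its codeword.
            codebook = [_centroid(cell) if cell else c for cell, c in zip(cells, codebook)]
            distortion = total / len(training)
            if prev - distortion <= tol * distortion:
                break  # relative improvement has become negligible
            prev = distortion
    return codebook
```

After training, `nearest(v, codebook)` gives the index that would be transmitted or stored for an input vector v.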

Back to Q1.9 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 22:44 21-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Labs/ogi.html [10/31/2003 8:47:06 AM]


Speech Filing System (SFS)


● Platform: Unix and DOS
● Description: SFS provides a computing environment for conducting speech research. It
comprises software tools, file and data formats, subroutine libraries, graphics, standards and
special programming languages. It performs standard operations such as recording, replay,
waveform editing and labelling, spectrographic and formant analysis and fundamental
frequency estimation. For more information, see
ftp://pitch.phon.ucl.ac.uk/pub/sfs/README
● Misc: SFS is copyright University College London, but is currently supplied free of charge to research establishments for non-profit use.
● Availability: SFS source code is available by anonymous FTP from:
ftp://pitch.phon.ucl.ac.uk/pub/sfs/
● Contact: Mark Huckvale
University College London, Gower Street, London WC1E 6BT, UK
Email: SFS@phonetics.ucl.ac.uk
ftp: ftp://pitch.phon.ucl.ac.uk/pub/sfs/
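
Fundamental frequency estimation, one of the standard SFS operations listed above, can be illustrated with a basic autocorrelation pitch estimator. This is a textbook method chosen for illustration only, not necessarily the algorithm SFS uses; the 60-400 Hz search range, the 0.3 voicing threshold and the function name are assumed.

```python
import math

def estimate_f0(frame, fs_hz, fmin_hz=60.0, fmax_hz=400.0, voicing_threshold=0.3):
    """Return an F0 estimate in Hz for one frame, or None if it looks unvoiced."""
    mean = sum(frame) / len(frame)
    x = [s - mean for s in frame]  # remove any DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return None  # silent frame
    min_lag = int(fs_hz / fmax_hz)
    max_lag = min(int(fs_hz / fmin_hz), len(x) - 1)
    best_lag, best_r = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Autocorrelation at this lag, normalised by the frame energy.
        r = sum(x[i] * x[i + lag] for i in range(len(x) - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag is None or best_r < voicing_threshold:
        return None  # no sufficiently strong periodicity
    return fs_hz / best_lag
```

Because this (biased) autocorrelation shrinks as the lag grows, the estimator tends to favour the true period over its multiples; real trackers add refinements such as lag interpolation and continuity constraints across frames.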

Back to Q1.9 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 02:01 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Labs/sfs.html [10/31/2003 8:47:07 AM]


Signalyze 3.0 from InfoSignal


● Platform: Macintosh
● Description: Signalyze is an interactive program for the analysis of speech and other acoustic material. Its basic concept revolves around the display of up to 100 signals in HyperCard fashion. The program offers a range of signal editing features, spectral analysis tools, manual scoring tools, pitch extraction routines, signal manipulation tools, and extensive input/output capabilities. It can also create, edit and manipulate label files, with flexibility in the labelling format.
Signalyze handles the following file formats: Signalyze, MacSpeech Lab, AudioMedia, SoundDesigner II, SoundEdit/MacRecorder, SoundWave, sound resource formats, and ASCII text.
Sound I/O: direct sound input via Apple 8- or 16-bit sound input; sound output via Macintosh 8- or 16-bit sound.
● Compatibility: MacPlus and higher. Takes advantage of large screens, multiple screens and
16/256 color/grayscales. System 7.0 compatible. Runs in background with adjustable priority.
● Misc: Manuals and tutorials included (250 pp.). Program is switchable to English, French, and
German. For more information and demo:
WWW: http://www.agoralang.com:2410/pubdirsoftware.html
WWW: http://www.agoralang.com:2410/signalyze.html
Gopher: gopher://uldns1.unil.ch:70/11/unilgophers/gopher_lett/LAIP
● Cost: Individual licence US$450, departmental licence US$750, organisational licence US$1250, plus shipping. Upgrades from version 2.0 are available.
● Contact: The Americas: Network Technology Corporation
91 Baldwin St., Charlestown, MA 02129, USA
Phone: +1-617-241-9205, Fax: +1-617-241-5064
---
Elsewhere: InfoSignal Inc.
C.P. 73, 1015 LAUSANNE, Switzerland,
Fax: +41 21 691-1372,
Email: 76357.1213@COMPUSERVE.COM

Back to Q1.9 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 11:57 16-Oct-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Labs/signalyze.html [10/31/2003 8:47:07 AM]


SoundScope
● Platform: Macintosh: 68K and PowerPC native
● Description: The SoundScope product family is used primarily in speech teaching and research, with some applications in animal sounds, forensics and general acoustic analysis. It can record, view, analyze, play, copy, paste, store and print sound waveforms. Analysis functions include spectrogram, fundamental frequency (F0), Linear Predictive Coding (LPC) including formant tracking, LPC residual, jitter (pitch perturbation), shimmer (amplitude perturbation), HNR (harmonics-to-noise ratio), frequency spectrum, spectral slice, envelope, energy and zero crossings. It includes limited built-in filtering and can run any filter created with WLFDAP. An integrated text editor stores notes and calculation results. SoundScope lets you design your own custom "instrument" screens, tasks (macros) and menus. Supplied instruments include a one-channel analyser (dual snap, dual time, spectrogram, spectrum), a two-channel analyser, a segment analyser, a multi-channel recorder, etc.
● Note: Supersedes MacSpeech Lab II.
● Price: $490 to $4990, less educational discount
● Availability: In North America, directly from GW Instruments. Contact the company for
international distributors.
● Contact: GW Instruments
35 Medford Street, Somerville, MA 02143, USA
Ph: +1-617-625-4096, Fax: +1-617-625-1322
Email: info@gwinst.com
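
Jitter and shimmer describe cycle-to-cycle variation in pitch period and peak amplitude. A minimal sketch, assuming the common "local" definitions (mean absolute difference between consecutive cycles divided by the mean value) and that per-cycle periods and peak amplitudes have already been measured; SoundScope's exact definitions are not given here and may differ, and the function names are hypothetical.

```python
def local_perturbation(values):
    """Mean absolute difference between consecutive values, relative to the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value

def jitter_percent(periods_s):
    """Local jitter (%) from a sequence of pitch periods in seconds."""
    return 100.0 * local_perturbation(periods_s)

def shimmer_percent(peak_amplitudes):
    """Local shimmer (%) from a sequence of per-cycle peak amplitudes."""
    return 100.0 * local_perturbation(peak_amplitudes)
```

A perfectly periodic voice gives 0% for both measures; the results depend heavily on how accurately cycle boundaries are located, so they are only as good as the underlying period marks.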

Back to Q1.9 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 10:21 14-Nov-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Labs/soundscope.html [10/31/2003 8:47:08 AM]


Q1.8: Speech File Formats and Conversion
Q2.7 of this FAQ has information on mu-law coding.

A very good, comprehensive list of audio file formats is prepared by Guido van Rossum. The list is posted regularly to comp.dsp and alt.binaries.sounds.misc, amongst others. It includes information on sampling rates, hardware, compression techniques, file format definitions, format conversion, standards, programming hints and lots more. It is also available by ftp from:

WWW: ftp://ftp.cwi.nl/pub/audio/index.html
Text: ftp://ftp.cwi.nl/pub/audio/AudioFormats.part1,2

A useful source of software (Sox, u-law conversion, SoundKit, etc.) is:

http://peace.wit.com/sounds/SoundConversion/
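
Many of the formats in such lists are defined by a short fixed header. As an illustration, the Sun/NeXT .au header consists of six big-endian 32-bit fields (magic number ".snd", data offset, data size, encoding, sample rate, channel count). The sketch below reads it with Python's `struct` module; only a few common encoding codes are mapped, and the function name is hypothetical.

```python
import struct

AU_MAGIC = 0x2E736E64  # the ASCII bytes ".snd"
AU_ENCODINGS = {1: "8-bit mu-law", 2: "8-bit linear PCM", 3: "16-bit linear PCM", 27: "8-bit A-law"}

def parse_au_header(data: bytes) -> dict:
    """Decode the fixed 24-byte header at the start of a Sun/NeXT .au file."""
    if len(data) < 24:
        raise ValueError("too short for an .au header")
    magic, offset, size, encoding, rate, channels = struct.unpack(">6I", data[:24])
    if magic != AU_MAGIC:
        raise ValueError("not a Sun/NeXT .au file")
    return {
        "data_offset": offset,                               # audio data starts here
        "data_size": None if size == 0xFFFFFFFF else size,   # all ones means "unknown"
        "encoding": AU_ENCODINGS.get(encoding, f"other ({encoding})"),
        "sample_rate": rate,
        "channels": channels,
    }

# Usage (file name is hypothetical):
# with open("example.au", "rb") as f:
#     print(parse_au_header(f.read(24)))
```

Because the header stores the data offset explicitly, readers should seek to `data_offset` rather than assume the audio starts at byte 24; the space in between may hold an annotation string.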

Back to Section 1 of the comp.speech FAQ Home Page.



Administrivia, Copyright, Submit Information : Last Revision: 01:52 12-Apr-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Q1.8.html [10/31/2003 8:47:08 AM]


Macintosh Audio Hardware

Macintosh Audio Hardware - an overview

● Description: All Macintosh computers can play back sounds at any sample rate (sample-rate conversion is done in software). Older machines have 8-bit stereo output (the hardware runs at 22254 samples/second); newer machines have 16-bit stereo hardware running at 44100 samples/second.

Most recent Macintosh computers come with sound input hardware, though there are probably exceptions. Older machines and some current low-end machines have 8-bit (linear) mono input hardware running at 22254.54 samples/second. All PowerPC, AV and 500-series notebook computers have 16-bit, 44.1 kHz stereo sampling hardware, which can also record at 22050 samples/second. The Sound Manager implements an AGC (Automatic Gain Control) function for the 8-bit hardware, and the drivers have a switch to turn the AGC off.

Several DSP vendors support high-quality audio. Generally this means quieter analog sections and more I/O formats (AES/EBU, for example). Try DigiDesign and Spectral Innovations.

The software drivers for sound are described in "Inside Macintosh: Sound". For sample code, look at the sources for the Matlab "Sound and Image Toolbox", which can be found at

ftp://ftp.apple.com/pub/malcolm/SoundAndImageToolbox.cpt.hqx

Routines that play and record sounds using the toolbox are included (and interfaced to Matlab).

Back to Q2.6 of Section 2 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:15 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section2/Hardware/mac.html [10/31/2003 8:47:10 AM]


PC Audio Hardware

Note: new soundcards become available all the time, so the information below is certainly not up to date. Check the following newsgroups for up-to-date information.

● comp.sys.ibm.pc.soundcard
● comp.sys.ibm.pc.soundcard.GUS
● comp.sys.ibm.pc.soundcard.advocacy
● comp.sys.ibm.pc.soundcard.games
● comp.sys.ibm.pc.soundcard.misc
● comp.sys.ibm.pc.soundcard.music
● comp.sys.ibm.pc.soundcard.tech

The Soundcard WWW Site is an excellent source of information:

● http://www.wi.leidenuniv.nl/audio/

A good source of programs and information for soundcards is SimTel:

● http://www.acs.oakland.edu/oak/SimTel/win3/sound.html

Additional information on PC soundcards is provided by the FAQ postings for the comp.sys.ibm.pc.soundcard.misc newsgroup. These are available by anonymous ftp from:
ftp://rtfm.mit.edu/pub/usenet/comp.sys.ibm.pc.soundcard.misc/

● Aria Soundcard FAQ
● Aria Soundcard Support List
● MIDI files software archives on the Internet
● Turtle Beach sound cards FAQ

Back to Q2.6 of Section 2 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 14:41 08-Aug-1996

http://mi.eng.cam.ac.uk/comp.speech/Section2/Hardware/pc.html [10/31/2003 8:47:10 AM]


Unix Audio Hardware

Could someone please provide information on the audio capabilities of other Unix platforms?

Sun standard audio port: SPARC I & II

● Input and Output: 1 channel, 8 bit mu-law encoded, 8kHz sample rate. This provides telephone
quality sampling.

Sun DBRI audio port (SPARC 10 & 20)

● Input and Output: Stereo (2 channels). 16-bit linear sampling. Multiple sample rates (48000,
44100, 37800, 32000, 22050, 18900, 16000, 11025, 9600, 8000 Hz)

Silicon Graphics Audio

The Silicon Graphics audio Frequently Asked Questions (FAQ) is the best place to get information on
SGI audio capabilities and programming. It provides information on connecting the audio output,
using the DSP capabilities, controlling the audio output, programming, useful software and more. It is
available from:

● WWW: http://www-viz.tamu.edu/~sgi-faq/faq/html/audio/
● News: comp.sys.sgi.misc
● Ftp: ftp://viz.tamu.edu/pub/sgi/faq/

Ariel Signal Processors

● Platform: Various
● Description: A range of signal I/O, A/D, D/A and DSP products is available; there are too many to list here.
● Contact: Ariel Corp.
433 River Road, Highland Park, NJ 08904.
Ph: 908-249-2900 Fax: 908-249-2123 DSP BBS: 908-249-2124

Back to Q2.6 of Section 2 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 14:05 08-Aug-1996

http://mi.eng.cam.ac.uk/comp.speech/Section2/Hardware/unix.html [10/31/2003 8:47:12 AM]


SigLib from Numerix Ltd.


● Platform: Windows, Unix and all major DSPs
● Description: SigLib is an ANSI C source-code DSP library with functions for spectrum analysis, windowing, filtering (fixed and adaptive coefficient), convolution, correlation, covariance, signal generation, statistical analysis, regression analysis, communications and modulation, digital effects, vector processing, control, graphics and file I/O.
Detailed product information and a description of the application of SigLib to speech
processing is provided on the Numerix Ltd. WWW site.
● Availability: A free demonstration of SigLib V2.0 is available from the Numerix Ltd. WWW
site. Educational discount is available for SigLib.
● Contact: Numerix Ltd.,
157 Sileby Road, Barrow-on-Soar, Leics, LE12 8LW, UK.
Phone/Fax : +44 (0)1509 413195
Email: numerix@numerix.co.uk
WWW: http://www.numerix.co.uk/

Back to Q2.8 of Section 2 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 15:28 13-May-1997

http://mi.eng.cam.ac.uk/comp.speech/Section2/Software/numerix.html [10/31/2003 8:47:13 AM]


Q1.1: What is comp.speech?


Comp.speech is an unmoderated newsgroup for discussion of speech technology and speech science. It covers a wide range of issues, from the application of speech technology to research, products and lots more. By its nature, speech technology is an interdisciplinary field and the newsgroup reflects this. However, computer applications are the basic theme of the group.

Note: If you don't know what a newsgroup is, talk to your local system administrator about how to get access. A useful newsgroup for beginners is news.announce.newusers. You might also find the following documents useful.

ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/What_is_Usenet?
ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/Answers_to_Frequently_Asked_Questions_about_Usenet
ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/Rules_for_posting_to_Usenet
ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/FAQs_about_FAQs

The following is a list of some of the topics covered by comp.speech.

● Speech Recognition - discussion of methodologies, training, techniques, results and applications, including the application of techniques such as HMMs and neural networks to the field.

● Speech Synthesis - discussion concerning theoretical and practical issues associated with the design of speech
synthesis systems.

● Speech Coding and Compression - both research and application matters.

● Phonetic/Linguistic Issues - coverage of linguistic and phonetic issues which are relevant to speech technology
applications. Could cover parsing, natural language processing, phonology and prosodic work.

● Speech System Design - issues relating to the application of speech technology to real-world problems. Includes
the design of user interfaces, the building of real-time systems and so on.

● Other matters - relevant conferences, jobs, books, software, hardware, and products.

Back to Section 1 of the comp.speech FAQ Home Page.



Administrivia, Copyright, Submit Information : Last Revision: 01:52 12-Apr-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Q1.1.html [10/31/2003 8:47:14 AM]


32 kbps ADPCM
● Platform: SGI and Sun Sparcs
● Description: 32 kbps ADPCM C-source code (G.721 compatibility is uncertain)
● Contact: Jack Jansen
● Availability: http://www.cwi.nl/ftp/audio/adpcm.shar

Back to Q3.3 of Section 3 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 16:14 05-Sep-1997

http://mi.eng.cam.ac.uk/comp.speech/Section3/Software/adpcm.html [10/31/2003 8:47:15 AM]


G.711/721/723 Compression
● Description:
❍ G.711: CCITT u-law and A-law compression
❍ G.721: CCITT 32 kbps ADPCM coder
❍ G.723: CCITT 24 kbps and 40 kbps ADPCM coders

● Availability: By email to itudoc@itu.ch, with

GET ITU-3022

as the *only* line in the body of the message.
It is also available by anonymous ftp from:
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/G711_G721_G723.tar.Z
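
G.711 mu-law compands each 16-bit linear sample into an 8-bit code made of a sign bit, a 3-bit segment (exponent) and a 4-bit mantissa, stored inverted. The sketch below follows the widely circulated bias-and-clip formulation (bias 0x84, clip 32635); it is an illustration, not the ITU reference code in the package above, and the function names are hypothetical.

```python
BIAS = 0x84   # 132, added to the magnitude before the segment search
CLIP = 32635  # largest magnitude that still fits after adding the bias

def linear_to_ulaw(sample: int) -> int:
    """Encode one 16-bit signed linear sample as an 8-bit G.711 mu-law code."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent = (magnitude >> 7).bit_length() - 1      # segment number 0..7
    mantissa = (magnitude >> (exponent + 3)) & 0x0F   # 4 bits within the segment
    return ~(sign | (exponent << 4) | mantissa) & 0xFF  # codes are transmitted inverted

def ulaw_to_linear(code: int) -> int:
    """Decode an 8-bit mu-law code to a 16-bit signed linear sample."""
    code = ~code & 0xFF
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude
```

Decoding returns the midpoint of each quantization interval, so for unclipped input the round-trip error in segment e is at most 2^(e+2), i.e. never more than 512; this is what keeps the relative error roughly constant across loud and quiet signals.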

Back to Q3.3 of Section 3 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:17 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section3/Software/g.711.721.723.html [10/31/2003 8:47:15 AM]


shorten - a lossless compressor for speech signals
● Platform: UNIX/DOS
● Description: A fast waveform coder suitable for speech and music signals in a wide variety of file formats. The degree of compression is adjustable from lossless down to three bits per sample. 16-bit, 16 kHz speech can generally be compressed losslessly to about 50% of its original size, and 16:3 compression of CD-ROM-quality speech is obtainable with only minor audible degradation.
● Availability: Anonymous ftp - UNIX and DOS versions
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/shorten.tar.gz
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/shorten.tar.Z
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/shorten.zip
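
Lossless waveform coders such as shorten combine simple linear prediction with an entropy code for the prediction residual. A minimal sketch, assuming a fixed second-order polynomial predictor and a single Rice parameter per block (shorten's actual bitstream, block handling and predictor selection are not reproduced here; all names are hypothetical), shows why the scheme is lossless and roughly how many bits per sample it needs:

```python
import math

def residuals_order2(x):
    """Second-order polynomial prediction error: e[n] = x[n] - (2*x[n-1] - x[n-2])."""
    return [x[n] - 2 * x[n - 1] + x[n - 2] for n in range(2, len(x))]

def reconstruct_order2(first_two, residuals):
    """Invert the predictor exactly, which is what makes the coding lossless."""
    x = list(first_two)
    for e in residuals:
        x.append(e + 2 * x[-1] - x[-2])
    return x

def zigzag(e):
    """Map signed residuals 0, -1, 1, -2, ... to non-negative 0, 1, 2, 3, ..."""
    return 2 * e if e >= 0 else -2 * e - 1

def rice_length(u, k):
    """Bits in a Rice code with parameter k: unary quotient, stop bit, k low bits."""
    return (u >> k) + 1 + k

def coded_bits(residuals, max_k=15):
    """Smallest total Rice-coded size over k = 0..max_k, using one k for the block."""
    return min(sum(rice_length(zigzag(e), k) for e in residuals) for k in range(max_k + 1))

# Example: a 200 Hz tone stored as 16-bit integers at 16 kHz.
signal = [round(10000 * math.sin(2 * math.pi * 200 * n / 16000)) for n in range(1600)]
res = residuals_order2(signal)
bits_per_sample = coded_bits(res) / len(res)
```

For this smooth test tone the residuals are small, so the coded size is well under the original 16 bits per sample; real speech is less predictable, which is consistent with the roughly 2:1 lossless figure quoted above.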

Back to Q3.3 of Section 3 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:17 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section3/Software/shorten.html [10/31/2003 8:47:16 AM]


CELP 3.2a & LPC-10


● Platform: Sun (the makefiles and source can be modified for other platforms)
● Description: CELP is a lossy compression technique. This is the US Department of Defense's 4800 bps code-excited linear prediction voice coder version 3.2a (CELP 3.2a), based on Federal Standard 1016, supplied as Fortran and C simulation source code.
● Availability: By anonymous ftp from:
ftp://ftp.super.org/pub/speech/celp_3.2a.tar.Z
Or from the comp.speech ftp server
ftp://svr-ftp.eng.cam.ac.uk/comp.speech/coding/celp_3.2a.tar.Z
ftp://svr-ftp.eng.cam.ac.uk/comp.speech/coding/celp_3.2a.tar.gz
LPC-10 Fortran source code is also available:
ftp://ftp.super.org/pub/speech/lpc10-1.0.tar.gz
Here is a modified LPC-10 release that includes ANSI C source:
http://www.arl.wustl.edu/~jaf/lpc/
● Documentation: The following articles describe the Federal-Standard-1016 4.8-kbps CELP coder:
❍ Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch, "The Federal Standard 1016 4800 bps CELP Voice Coder," Digital Signal Processing, Academic Press, 1991, Vol. 1, No. 3, pp. 145-155.
❍ Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch, "The DoD 4.8 kbps Standard (Proposed Federal Standard 1016)," in Advances in Speech Coding, ed. Atal, Cuperman and Gersho, Kluwer Academic Publishers, 1991, Chapter 12, pp. 121-133.
The U.S. DoD's Federal-Standard-1015/NATO-STANAG-4198 based 2400 bps linear prediction coder (LPC-10) was republished as Federal Information Processing Standards Publication 137 (FIPS Pub 137). It is described in:
❍ Thomas E. Tremain, "The Government Standard Linear Predictive Coding Algorithm: LPC-10," Speech Technology Magazine, April 1982, pp. 40-49.
There is also a section about FS-1015 in the book:
❍ Panos E. Papamichalis, Practical Approaches to Speech Coding, Prentice-Hall, 1987.
The voicing classifier used in the enhanced LPC-10 (LPC-10e) is described in:
❍ Campbell, Joseph P., Jr. and T. E. Tremain, "Voiced/Unvoiced Classification of Speech with Applications to the U.S. Government LPC-10E Algorithm," Proceedings of the IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing, 1986, pp. 473-476.
● Vendors:
Real-time DSP code for FS-1015 and FS-1016 is sold by:
❍ John DellaMorte, DSP Software Engineering
165 Middlesex Tpk, Suite 206, Bedford, MA 01730, USA
Ph: 1-617-275-3733 Fax: 1-617-275-4323
Email: dspse.bedford@channel1.com
DSP Software Engineering's FS-1016 code can run on DSP Research's Tiger 30 (a PC board with a TMS320C3x and an analog interface, suited to development work).
❍ DSP Research
1095 E. Duane Ave, Sunnyvale, CA 94086, USA
Ph: (408)773-1042 Fax: (408)736-3451


Back to Q3.3 of Section 3 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 09:44 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section3/Software/celp-3.2a.html (2 of 2) [10/31/2003 8:47:17 AM]


8 Kbit/s CELP on the TMS320C5x family of DSP chips
● Description: For low-bandwidth voice transmission, compact voice storage for archival purposes, low-cost digital answering machines and efficient voice-mail storage. Features:
❍ Near toll quality at 8 kb/s.
❍ Variable-rate option with 1 kb/s silence encoding.
❍ Implemented on a fixed-point processor for lower system cost.
❍ Attractive licensing scheme.
❍ Future availability of 4 kb/s.
❍ Custom rates possible.
Capacity:
❍ Two half-duplex channels or one full-duplex channel on the 20 MIPS 'C5x (at 95% and 55% CPU utilization respectively).
❍ Two full-duplex channels on the 28.6 MIPS 'C5x (at 77% CPU utilization).
❍ Requires 9 K-words of program memory and 3 K-words of data memory.
❍ Decoding in real time on a 486-class CPU.

● Contact:
CVI Inc.
443 Vienna Cres. North Vancouver, BC, Canada V7N 3B3
Tel: (604) 987 1719 Fax: (604) 986 8139
Email: cvi@extropia.wimsey.com

Last Revision: 14:16 26-May-1995

http://mi.eng.cam.ac.uk/comp.speech/Section3/Software/celp.8kbit.html [10/31/2003 8:47:19 AM]


G.728 LD-CELP vocoder


● Platform: Analog Devices ADSP-2171
● Description: Real-time, full-duplex G.728 LD-CELP vocoder that runs on a single Analog
Devices ADSP-2171. Source and object code available for a one-time license fee.
● Contact:
Cole Erskine
Analogical Systems
299 California Avenue, Suite 120
Palo Alto, CA 94306, USA
Tel:(415) 323-3232 FAX:(415) 323-4222
email: cole@analogical.com

Back to Q3.3 of Section 3 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:17 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section3/Software/g.728.LD-CELP.html [10/31/2003 8:47:19 AM]


Lernout & Hauspie Speech Coding (5 products)

Lernout & Hauspie Speech and Music Coding Product Range
● Product name: L&H.smc650: 32 kbps ADPCM speech coding
❍ Implementation of 32 kbps ADPCM based on the CCITT G721 standard.
❍ Estimated quality: 4.1 MOS (Mean Opinion Score)
❍ Hardware example: Analog Devices ADSP2101
❍ Input/output signal: A-law or mu-law PCM (64 kbps); linear signal with up to 16 bits per sample; 8 kHz sampling rate
● Product name: L&H.smc550: LD-CELP 16 kbps speech coding
❍ Proprietary implementation of 16 kbps LD-CELP based on the CCITT G728 standard.
❍ Estimated quality: 4.0 MOS (Mean Opinion Score)
❍ Hardware example: Motorola 5600X
❍ Input/output signal: A-law or mu-law PCM (64 kbps); linear signal with up to 16 bits per sample; 8 kHz sampling rate
● Product name: L&H.smc450: 16-17.5 kbps speech coding
❍ Estimated quality: 3.9 MOS (Mean Opinion Score)
❍ Hardware examples: Analog Devices ADSP2101, Intel 486 DX2/66 MHz
❍ Input/output signal: A-law or mu-law PCM (64 kbps); linear signal with up to 16 bits per sample; 8 kHz sampling rate
● Product name: L&H.smc350: 4.8-9.6 kbps speech coding
❍ Proprietary CELP-based software for compression rates of 4.8 kbps to 9.6 kbps
❍ Estimated quality: 3.5 MOS (Mean Opinion Score)
❍ Hardware example: AT&T DSP32C
❍ Input/output signal: A-law or mu-law PCM (64 kbps); linear signal with up to 16 bits per sample; 8 kHz or 11.025 kHz sampling rate
● Product name: L&H.smc250: 2.4 kbps speech coding
❍ Combination of multi-band excitation and codebook-excited linear prediction.
❍ Estimated quality: 3.0 MOS (Mean Opinion Score)
❍ Hardware examples: Intel 486 DX2/66 MHz, Analog Devices ADSP2101
❍ Input signal: A-law or mu-law PCM (64 kbps); linear signal with 12-15 bits per sample; 8 kHz sampling rate
❍ Output signal: A-law or mu-law PCM (64 kbps); linear signal with 12-15 bits per sample; 8 kHz sampling rate

● See also: L&H Speech Coding SDK
● More Information: On the WWW: http://www.lhs.com/coding.html
● Cost: Unknown
● Contact: Lernout and Hauspie Speech Products
20 Mall Road, 4th Floor
Burlington, MA 01803, USA
Ph: +1-617-238-0960, Fax: +1-617-238-0986
Email: sales@lhs.com
WWW: http://www.lhs.com/


Back to Q3.3 of Section 3 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 12:30 13-May-1997

http://mi.eng.cam.ac.uk/comp.speech/Section3/Software/lh.html (2 of 2) [10/31/2003 8:47:20 AM]


Castleton Network Systems - G.729 Voice Coder


● Platform: TI TMS320C5x DSP
● Description: G.729, also called CS-ACELP (Conjugate-Structure Algebraic Code Excited Linear Prediction), is a state-of-the-art ITU (International Telecommunication Union) voice compression standard that can be used in a wide range of applications, including wireless communications, digital satellite systems, packetized speech and digital leased lines. G.729 compresses speech to 8000 bits/s at toll quality (equivalent to G.726 32 kbit/s ADPCM under clean channel conditions). G.729 also has a lower bit rate and lower complexity than G.728.
The Castleton G.729 implementation provides a bit-exact implementation of the ITU standard on a single TI TMS320C5x DSP. The software is C callable and fully re-entrant, which allows easy interfacing and multi-channel capability. The encoder and decoder are fully independent, so a DSP device can run a number of full-duplex or half-duplex channels. The coder and the decoder can operate under a real-time task-switching kernel.
● Cost and Availability: Contact Castleton Network Systems.
● Contact: Castleton Network Systems Corporation
350 Terry Fox Drive, Kanata, Ontario, Canada K2K 2W5
Ph: 613-591-8786, Fax: 613-591-8783
Email: inquire@castleton.com
WWW: http://www.castleton.com/

Back to Q3.3 of Section 3 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 13:54 09-Oct-1996

http://mi.eng.cam.ac.uk/comp.speech/Section3/Software/castleton.html [10/31/2003 8:47:21 AM]


CyberVoice
● Description: Cybernetics InfoTech, Inc. offers the following products:
❍ Telephone voice compression at 1.2, 2.4, 4.8 and 6.0 kbit/s, with coded voice ranging from good communications quality to near toll quality;
❍ Wideband voice (7 kHz bandwidth) compression at 16 kbit/s with near-original-quality coded voice;
❍ Internet voice e-mail software with voice editing, high-quality low-data-rate voice compression, fast/slow voice playback, and more.

● Availability: C code and a Windows DLL for telephone voice compression and wideband voice compression are available for licensing.
Real-time DSP code is under development.
Voice e-mail software is available for purchase and download from the CyberVoice home page.
● Contact: Cybernetics InfoTech, Inc.
2 Professional Dr., #228, Gaithersburg, MD 20879
WWW: http://www.cybit.com/
E-mail: info@cybit.com
Fax: 301-590-0359

Back to Q3.3 of Section 3 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 18:42 05-Sep-1997

http://mi.eng.cam.ac.uk/comp.speech/Section3/Software/cybernetics.html [10/31/2003 8:47:21 AM]


Rockwell's DigiTalk
● Description: The DigiTalk coder operates at a sampling rate of 8 kHz and transmits 223 bits of coded speech every 26 ms, giving an overall bit rate of about 8.577 kbit/s. The algorithm is based on analysis-by-synthesis predictive coding with vector-coded excitation: the excitation signal is chosen to minimize the perceptually weighted error between the original and the synthesized speech. More information and results of perceptual tests are available on the WWW.
● Availability: See the WWW page: http://www.nb.rockwell.com/ref/digitalk/
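The quoted bit rate follows directly from the frame figures. A minimal Python sketch of the arithmetic; the helper names are ours, not Rockwell's, and the only inputs are the 8 kHz sampling rate, 223 bits and 26 ms given above:

```python
def bit_rate(bits_per_frame: int, frame_ms: float) -> float:
    """Bit rate in bit/s of a coder that emits a fixed number of bits per frame."""
    return bits_per_frame * 1000.0 / frame_ms


def samples_per_frame(sample_rate_hz: int, frame_ms: float) -> int:
    """Number of input samples covered by one frame."""
    return round(sample_rate_hz * frame_ms / 1000.0)


if __name__ == "__main__":
    rate = bit_rate(223, 26)             # 223 bits every 26 ms
    n = samples_per_frame(8000, 26)      # 8 kHz sampling -> 208 samples per frame
    print(f"{rate / 1000:.3f} kbit/s")   # 8.577 kbit/s
    print(f"{223 / n:.2f} bits per input sample")  # 1.07
```

At about 1.07 bits per input sample, the coder spends roughly one bit on each 8 kHz sample.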

Back to Q3.3 of Section 3 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:17 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section3/Software/digitalk.html [10/31/2003 8:47:22 AM]


File format conversion

● Platform: SUN OS?
● Description: Conversion utility able to encode and decode between the following formats:
G.723, G.721, A-law, u-law and linear.
● Availability: By anonymous ftp from
ftp://ftp.cwi.nl/pub/audio/ccitt-adpcm.tar.Z
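The package's own source is not reproduced here. As an illustration of one of the conversions listed, here is a minimal Python sketch of G.711 u-law encoding and decoding of 16-bit linear samples. It assumes the bias-and-segment scheme used in widely circulated public-domain G.711 code; the function names are hypothetical, and A-law and the ADPCM formats (G.721, G.723) are not covered.

```python
BIAS = 0x84    # 132: bias added before the segment search, as in common G.711 reference code
CLIP = 32635   # largest magnitude that can be biased without exceeding 15 bits


def linear_to_ulaw(sample: int) -> int:
    """Encode one 16-bit signed linear sample as an 8-bit u-law code."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS   # 132..32767
    exponent = magnitude.bit_length() - 8       # segment number 0..7
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF   # u-law codes are stored inverted


def ulaw_to_linear(code: int) -> int:
    """Decode an 8-bit u-law code back to a 16-bit signed linear sample."""
    code = ~code & 0xFF
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if code & 0x80 else magnitude


if __name__ == "__main__":
    for x in (0, 1000, -1000, 32767):
        c = linear_to_ulaw(x)
        print(x, hex(c), ulaw_to_linear(c))
```

Decoding does not restore the exact input: u-law keeps fine steps for small samples and coarse steps for large ones, which is what makes 8 bits per sample acceptable for telephone speech.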

Back to Q3.3 of Section 3 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:17 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section3/Software/format.html [10/31/2003 8:47:23 AM]


G.728 Compression

● Description: G.728 low-delay CELP (LD-CELP) package written by Alex Zatsman of Analog Devices, Inc.
● Availability: By anonymous ftp from
ftp://dspsun.eas.asu.edu/pub/speech/ldcelp.tgz

Back to Q3.3 of Section 3 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:17 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section3/Software/g.728.html [10/31/2003 8:47:23 AM]


GSM 06.10 Compression

● Platform: Unix; faster than real time on most Sun SPARCstations
● Description: GSM 06.10 is a standardized lossy speech compression scheme used by most European wireless telephones. It uses RPE-LTP (regular pulse excitation/long-term prediction) coding to compress frames of 160 13-bit samples (8 kHz sampling rate, i.e. a frame rate of 50 Hz) into 260 bits.
● Contact: GSM 06.10 support and implementation jutta@cs.tu-berlin.de, cabo@cs.tu-berlin.de
● Availability: The following configurations are available by anonymous ftp:
gzip-compressed archive from Germany: ftp://ftp.cs.tu-berlin.de/pub/local/kbs/tubmik/gsm/gsm-1.0.7.tar.gz
MS-DOS zip archive from Germany: ftp://ftp.cs.tu-berlin.de/pub/local/kbs/tubmik/gsm/ddj/gsm-107.zip
MS-DOS zip archive from USA: ftp://ftp.mv.com/pub/ddj/1194.12/gsm-105.zip
● Misc: The WWW site is
http://www.cs.tu-berlin.de/~jutta/toast.html
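The frame figures above can be checked with a few lines of arithmetic. A minimal Python sketch using only the quoted numbers (8 kHz, 160 samples of 13 bits, 260 bits per frame); the byte count assumes each frame is padded to a whole number of bytes for storage, which the description does not say:

```python
import math

SAMPLE_RATE_HZ = 8000   # GSM 06.10 input sampling rate
FRAME_SAMPLES = 160     # samples per frame
SAMPLE_BITS = 13        # bits per input sample
FRAME_BITS = 260        # coded bits per frame

frame_ms = 1000 * FRAME_SAMPLES / SAMPLE_RATE_HZ   # 20.0 ms per frame
frame_rate_hz = SAMPLE_RATE_HZ / FRAME_SAMPLES     # 50.0 frames per second
coded_bit_rate = FRAME_BITS * frame_rate_hz        # 13000.0 bit/s
input_bit_rate = SAMPLE_BITS * SAMPLE_RATE_HZ      # 104000 bit/s
ratio = input_bit_rate / coded_bit_rate            # 8.0, i.e. 8:1
frame_bytes = math.ceil(FRAME_BITS / 8)            # 260 bits need 33 whole bytes

if __name__ == "__main__":
    print(f"{frame_ms} ms frames, {frame_rate_hz} frames/s")
    print(f"{coded_bit_rate / 1000} kbit/s, {ratio}:1 versus 13-bit linear input")
    print(f"{frame_bytes} bytes per frame if padded to whole bytes")
```

The 8:1 ratio is measured against the 13-bit linear input; against 16-bit PCM storage the saving is correspondingly larger.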

Back to Q3.3 of Section 3 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:17 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section3/Software/gsm06.10.html [10/31/2003 8:47:24 AM]


Lernout & Hauspie Speech Coding SDK

● Description: Windows-based software development kit for integrating speech coding technology into Windows-based PC applications.
● Requirements: IBM-compatible 486 DX/33 MHz + 2MB RAM + MS DOS 5.0 + MS
Windows 3.1 (or higher) + Sound Blaster compatible sound board.
● See also: L&H Speech Coding Products
● More Information: On the WWW: http://www.lhs.com/coding.html
● Cost: Unknown
● Contact: Lernout and Hauspie Speech Products
20 Mall Road, 4th Floor
Burlington, MA 01803, USA
Ph: +1-617-238-0960, Fax: +1-617-238-0986
Email: sales@lhs.com
WWW: http://www.lhs.com/

Back to Q3.3 of Section 3 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 12:30 13-May-1997

http://mi.eng.cam.ac.uk/comp.speech/Section3/Software/lh.sdk.html [10/31/2003 8:47:25 AM]


MPEG Audio

MPEG (Moving Picture Experts Group) standards define methods for the compression and transmission of digital video and audio. Detailed FAQs and WWW sites are available for MPEG:

MPEG Pointers and Resources
http://www.mpeg.org/
FAQ by Luigi: http://www.crs4.it/~luigi/MPEG/mpegfaq.html
FAQ by Frank Gadegast
http://www.powerweb.de/mpeg/mpegfaq/
FAQ by Chad Fogg
http://www-plateau.cs.berkeley.edu/mpegfaq/MPEG-2-FAQ.html
How to Install an MPEG Audio Player for your Web Navigator
http://www.mpeg.org/index.html/MPEG-audio-player.html

MPEG Audio Software on the WWW
Audio and Music Applications for Silicon Graphics Systems
Lists 4 MPEG audio players for SGI machines.
http://reality.sgi.com/employees/cook/audio.apps/public.html
MPEG-1 Audio Layer 3 encoder, decoder and FAQ
From the Fraunhofer Institute
http://www.iis.fhg.de/departs/amm/layer3/index.html
MPEG-2 Audio FAQ from Philips
http://www.keymodules.philips.com/MD/mpeg/faqmpeg2.htm
MPEG-1 and MPEG-2 audio software
Universitaet Hannover
ftp://ftp.tnt.uni-hannover.de/pub/MPEG/audio/
MPEG-1 Audio Layer 1 & 2 encoder and decoder
Internet Underground Music Archive (IUMA)
ftp://ftp.iuma.com/audio_utils/converters/source/
Buddy Software Library: MPEG-1 Audio Layer 3 encoder and player
http://www.buddy.org/softlib.html
MPEG-1 Audio Layer 1 & 2 decoder and verifier at CCETT
ftp://ftp.ccett.fr/pub/mpeg/audio_new/
MPEG-2 Audio encoder and decoder at CCETT
ftp://ftp.ccett.fr/pub/mpeg/mpeg2/

MPEG Audio - MetaSound

● Platform: MS Windows 3.1 and Windows 95
● Description: MetaSound is a partial MPEG-1 software decoder designed to work with hardware video decoders. It can reduce hardware cost by eliminating the need for a hardware audio decoder. MetaSound has so far been incorporated into three hardware video decoders. Features:
❍ Performance: On a 486 DX4-100 or faster machine, MetaSound can deliver FM-quality (22 kHz) sound. On a Pentium-90 or faster machine, it needs 40% of the CPU to deliver CD-quality (44.1 kHz) sound.
❍ Portability: porting to a new hardware video decoder can take less than one month.
❍ Supported CD standards: Video CD 1.0, Video CD 2.0 and CD-i.
❍ User interface with a full set of functions: volume control, stop, pause, forward, backward, mute, resume, select the previous/next program track (Video CD 2.0), and randomly select a program track (Video CD 2.0).
❍ Error recovery: automatically skips erroneous parts of the bitstream.

● Contact: Meta Media, Inc.
F8, #10-1, Ho-Ping East Rd. Sec. 1, Taipei, Taiwan, R.O.C.
Ph: 011-886-2-369-3330, Fax: 011-886-2-369-3331
Email: mmedia@ms4.hinet.net.tw

Back to Q3.3 of Section 3 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 14:55 02-Oct-1996

http://mi.eng.cam.ac.uk/comp.speech/Section3/Software/mpeg.audio.html (2 of 2) [10/31/2003 8:47:26 AM]


Sipro Lab Telecom Inc. Coding

● Platform: Various processors
● Description: Coding software for several international standards plus two proprietary codecs.
International Standards
1. PCS 1900 (a 13 kbps codec, established as a North American PCS standard)
2. Enhanced GSM (a 13 kbps codec)
3. G.723.1 (a dual-rate codec for the video phone market)
4. G.729 (an 8 kbps codec established as a multi-purpose international standard)
5. G.729 Annex A (an 8 kbps codec made for Digital Simultaneous Voice and Data (DSVD) transmission in the modem industry)

Proprietary Standards
1. ACELP 8 v2.0 codec (flexible dual-rate codec equipped with a voice activity detector, VAD)
2. ACELP 4.8 codec
● Contact: Sipro Lab Telecom Inc.
770, Chemin Lucerne, Ville Mont-Royal (Quebec), H3R 2H6 CANADA
Ph: (514) 737-5874, Fax: (514) 737-2327
E-mail: sales@sipro.com
WWW: http://www.sipro.com/

Back to Q3.3 of Section 3 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 15:57 29-May-1996

http://mi.eng.cam.ac.uk/comp.speech/Section3/Software/sipro.html [10/31/2003 8:47:27 AM]


Sonarc: Digital Audio Compression

● Platform: DOS and Windows
● Description: Sonarc provides reversible (lossless), variable-rate compression of audio signals, with a compression ratio that averages about 2:1. It supports mono and stereo files, 8-bit and 16-bit samples, and the WAVE and VOC formats.
● Availability: Shareware by Richard P. Sprague
Speech Compression
P.O. Box 1785, Wilsonville, OR, 97070-1785, USA
Ph: (503) 263-3102
Email: 76635.3652@compuserve.com

Back to Q3.3 of Section 3 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 09:39 13-Jun-1996

http://mi.eng.cam.ac.uk/comp.speech/Section3/Software/sonarc.html [10/31/2003 8:47:27 AM]


StarAudio Compressor/Player

● Platform: Win95
● Description: StarAudio uses a time-domain process, and with the default options the decompressed data is lossless. It accepts any .wav source of high-quality 16-bit audio at any sampling rate. No special hardware is required, and decompression runs in real time on most 486s and on any Pentium. The higher the sampling rate, the higher the compression ratio: at least 4:1 for 11 kHz data, and usually more than 7:1 for 44 kHz data. The full signal bandwidth is preserved with the default compression options. Other compression options increase the compression ratio further at the cost of a slight reduction in output quality. A decompression library is available for application development.
● Demo: Download the shareware version of the program from the STR WWW site.
● Misc: A technical paper is available in Word 6.0 format:
ftp://ftp.speechtech.com/pub/speechtech/docs/audocw60.exe
● Contact: Speech Technology Research Ltd.,
Suite B - 1623 McKenzie Avenue, Victoria, B.C. V8N 1A6, Canada
Ph: +1-250-477-0544
Email: products@speechtech.com
WWW: http://www.speechtech.com/home/speechtech/
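To put the quoted compression ratios in terms of data rate, a minimal Python sketch. It assumes that "11k" and "44k" mean 11,025 Hz and 44,100 Hz and that the audio is mono 16-bit; the description does not state these explicitly, and the function names are hypothetical:

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int = 16, channels: int = 1) -> int:
    """Uncompressed PCM data rate in bit/s."""
    return sample_rate_hz * bits_per_sample * channels


def compressed_bit_rate(sample_rate_hz: int, ratio: float,
                        bits_per_sample: int = 16, channels: int = 1) -> float:
    """Data rate in bit/s after compressing PCM by the given ratio (ratio:1)."""
    return pcm_bit_rate(sample_rate_hz, bits_per_sample, channels) / ratio


if __name__ == "__main__":
    # Assumption: mono 16-bit audio at 11,025 Hz and 44,100 Hz.
    print(compressed_bit_rate(11025, 4))   # at least 4:1 -> at most 44100.0 bit/s
    print(compressed_bit_rate(44100, 7))   # better than 7:1 -> below 100800.0 bit/s
```

For stereo material the same ratios would double these figures.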

Back to Q3.3 of Section 3 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 17:10 14-May-1997

http://mi.eng.cam.ac.uk/comp.speech/Section3/Software/staraudio.html [10/31/2003 8:47:28 AM]


TrueSpeech from DSP Group

● Description: TrueSpeech is a family of speech compression and decompression algorithms and software designed for personal computers and personal communications devices. With compression ratios ranging from 15:1 to 27:1, TrueSpeech improves the storage and transmission of digital voice information and can be used to integrate personal computers and telephones. TrueSpeech can be used in many products and applications, such as:
❍ Multimedia PCs
❍ Sound cards and modems
❍ Computer/telephony and teleconferencing
❍ Voice mail systems and PBX systems
❍ Wireless/cellular applications
❍ Personal digital assistants
❍ Games, Education
❍ Video/cable and on-line services
The TrueSpeech encoder is available free of charge in the Sound System of Windows 95 and Windows NT. The DSPG WWW pages have information on how to add TrueSpeech capability to your WWW pages.
● Contact: DSP Group, Inc.
3120 Scott Boulevard, Santa Clara, CA 95054-3317, USA
Phone: (408) 986-4300 Fax: (408) 986-4323
Email: Webster@dspg.com
WWW: http://www.dspg.com/index.html

Back to Q3.3 of Section 3 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:17 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section3/Software/truespeech.html [10/31/2003 8:47:29 AM]


U.S.F.S. 1016 CELP vocoder for DSP56001

● Platform: DSP56001
● Description: Real-time U.S.F.S. 1016 CELP vocoder that runs on a single 27MHz Motorola
DSP56001. Free demo software available for PC-56 and PC-56D. Source and object code
available for a one-time license fee.
● Contact:
Cole Erskine
Analogical Systems
299 California Avenue, Suite 120
Palo Alto, CA 94306, USA
Tel:(415) 323-3232 FAX:(415) 323-4222
Email: cole@analogical.com

Back to Q3.3 of Section 3 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:18 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section3/Software/usfs.1016.html [10/31/2003 8:47:29 AM]


ToolVox from Voxware

● Platform: Windows; Macintosh (now in beta) and Unix versions available soon
● Description: ToolVox is a proprietary frequency-domain speech coder. 11 kHz speech is coded at an average rate of between 5,000 and 9,000 bit/s. Real-time compression algorithms are available for 2,400 bit/s. 22 kHz playback and an ultra-low-bit-rate 8 kHz codec are coming soon. On playback, the time scale can be changed by a factor of 5, pitch can be modified over a 3-octave range, and vocal personality can be modified using a transformation function called VoiceFonts(tm).
● Misc 1: An SDK for Windows is available.
● Misc 2: Demo software is available from the Voxware Inc WWW page:
http://www.voxware.com/
● Price: Basic toolkit is $895 US. OEM and mass distribution licenses are separate. Ordering
information is provided on the Voxware WWW server.
● Contact:
Voxware, Inc.
Ph: (609) 497-1212 Fax: (609) 497-2490
Sales information: sales@voxware.com
WWW: http://www.voxware.com/

Back to Q3.3 of Section 3 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:18 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section3/Software/voxware.html [10/31/2003 8:47:30 AM]


rsynth

● Platform: Various (including Solaris2.3, SunOS4.1.3, HPUX, SGI Irix4.x, Linux)
● Description: Public domain text-to-speech system assembled from a variety of sources. It supports CMU and BEEP format dictionaries (as described in Q1.10) and now uses the stress marks in the dictionary when synthesising intonation.
● Price: Free
● Misc: Axel Belinfante has implemented a WWW rsynth demo:
http://wwwtios.cs.utwente.nl/say.
● Availability: by anonymous ftp from
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/rsynth-2.0.tar.Z
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/rsynth-2.0.tar.gz
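rsynth's dictionary code is not shown here. The sketch below only illustrates what "stress marks in the dictionary" look like in the CMU format: each vowel carries a digit, 0 for unstressed, 1 for primary and 2 for secondary stress. It assumes entries of the form `WORD  PH1 PH2 ...`, `;;;` comment lines and `(n)` suffixes for alternate pronunciations; the BEEP format is not handled, and the function names are hypothetical.

```python
import re


def parse_cmudict_line(line: str):
    """Split a CMU-dictionary-style entry into (word, phones, stresses).

    Returns None for blank lines and ';;;' comment lines.
    """
    line = line.strip()
    if not line or line.startswith(";;;"):
        return None
    word, *phones = line.split()
    word = re.sub(r"\(\d+\)$", "", word)   # PRESENT(1) -> PRESENT (alternate pronunciation)
    stresses = [int(p[-1]) for p in phones if p[-1].isdigit()]  # one digit per vowel
    bare = [p.rstrip("012") for p in phones]                    # phones without stress digits
    return word, bare, stresses


def primary_stress_vowel(stresses):
    """Index (counting vowels only) of the primary-stressed vowel, or None."""
    return stresses.index(1) if 1 in stresses else None


if __name__ == "__main__":
    entry = parse_cmudict_line("PRESENT(1)  P R IY0 Z EH1 N T")
    print(entry)
    print(primary_stress_vowel(entry[2]))
```

A synthesizer can then place a pitch accent or lengthen the vowel whose stress digit is 1.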

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 12:02 16-Oct-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/rsynth.html [10/31/2003 8:47:31 AM]


Orator from Bellcore

Orator Text-to-Speech Synthesizer


● Platform: Sun SPARC, DECstation 5000. Written in C, and therefore portable to other UNIX platforms. Successful ports include HP, RS/6000 and PC-Unix (Linux).
● Description: Sophisticated speech synthesis package with text preprocessing (for abbreviations and numbers), acronym rules, and human-like spelling routines. Natural-sounding synthesis is based on demisyllable concatenation. Features include high accuracy in pronouncing the names of people, places and businesses in America; good accuracy for English text; rules for stress and intonation marking; and various methods of user control and customization at most stages of processing.
A new version, "ORATOR II", is under development. Both ORATOR and ORATOR II are capable of general text synthesis; ORATOR II has a more natural-sounding voice.
● Hardware: Runs on common SPARC or DECstation workstations using their internal audio output. At least 16 MB of memory is recommended.
● WWW: More detailed information plus examples of ORATOR synthesis are available on the
ORATOR WWW pages:
http://www.bellcore.com/ORATOR/
● Misc 1: A free demo cassette is available.
● Misc 2: Examples of Orator are also available on the University of Birmingham Speech
Synthesis "Museum" WWW site (see Q5.4).
● Availability and Pricing: Contact Bellcore's Licensing Office
Tel: 1-800-521-CORE (521-2673)
Fax: 1-908-336-2559
Email: Anthony Lindsey: alin1@panix.com
WWW: http://www.bellcore.com/ORATOR/

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 01:52 01-May-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/orator.html [10/31/2003 8:47:31 AM]


BeSTspeech from Berkeley Speech Technologies, Inc., (BST)

● Platform: available for Macintosh, Sun, Silicon Graphics, Windows PC and IBM RS/6000
platforms, and can be ported to others.
● Description: BeSTspeech reads ASCII text with no vocabulary limits. It is available for Dutch, English (male and female), French, German, Italian, Portuguese, Spanish, Arabic, Cantonese, Japanese, Korean, Malay, Mandarin and Russian.
● Availability: Berkeley Speech Technologies, Inc does not sell end user toolkits or products.
● Contact: Berkeley Speech Technologies, Inc.
2246 Sixth Street, Berkeley, California 94710, USA
Ph: (510) 841-5083, Fax: (510) 841-5093
Email: webmaster@bst.com
WWW: http://www.bestspeech.com/index.html

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 14:58 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/bestspeech.html [10/31/2003 8:47:32 AM]


Infovox Product Range

● Description: Multilingual text-to-speech systems. Languages available: American English, British English, German, French, Spanish, Italian, Swedish, Norwegian, Icelandic, Danish and Finnish.
● Product name: INFOVOX 500, PC BOARD
❍ Product description: Half-length expansion board for IBM PC, XT, AT, PS/2 model 30 or compatible personal computers. The board can also be connected via the serial port. The language and control program is either downloaded into RAM or mounted on EPROMs.
❍ Platform: DOS/Windows with IBM PC, XT, AT, PS/2 model 30 or compatible
❍ Delivered standard interface: MS-DOS I/O driver
● Product name: INFOVOX 600, OEM BOARD
❍ Product description: OEM board built with CMOS ICs. The language and control program are stored in on-board fixed memory.
❍ Platform: any; hardware interface: 9-pin D-sub (RS-232-C), 300-9600 baud
❍ Delivered standard interfaces: MS-DOS I/O driver and interface to the Apple Speech Manager
● Product name: INFOVOX 700, DESKTOP UNIT
❍ Product description: Desktop unit with a built-in Infovox 600, to be connected to any computer or terminal via an RS-232-C serial interface. Built-in loudspeaker, rechargeable battery for 4 hours of use, and control knobs for continuous control of speech volume and speed.
❍ Platform: various, through the hardware interface
❍ Delivered standard interfaces: MS-DOS I/O driver and interface to the Apple Speech Manager
● Product name: INFOVOX 650, OEM BOARD
❍ Product description: OEM board built with CMOS ICs. The language and control program are stored in on-board memory.
❍ Platform: any; hardware interface: 9-pin D-sub (RS-232-C), 300-9600 baud
❍ Delivered standard interfaces: MS-DOS I/O driver and interface to the Apple Speech Manager
● Product name: INFOVOX 750, DESKTOP UNIT
❍ Product description: Desktop unit with a built-in Infovox 650, to be connected to any computer or terminal via an RS-232-C serial interface. Built-in loudspeaker, rechargeable battery for 5 hours of use, and a control knob for continuous control of speech volume.
❍ Platform: various, through the hardware interface
❍ Delivered standard interfaces: MS-DOS I/O driver and interface to the Apple Speech Manager
● Product name: Infovox 210, software for Apple Macintosh
❍ Product description: Software-based text-to-speech conversion. Produces 16-bit and 8-bit sound. Delivered on 3.5" diskettes with a user lexicon and complete documentation.
❍ Platform: Apple Macintosh with at least a 68030, 33 MHz microprocessor
❍ Delivered standard interfaces: Standard interface to the Apple Speech Manager
● Product name: Infovox 220, software for Microsoft Windows.

❍ Product description: Software-based text-to-speech conversion. Produces 16-bit sound and conforms to the Microsoft Windows multimedia standard MCI. Delivered on 3.5" diskettes with a user lexicon and complete documentation.
❍ Platform: Windows on an IBM-compatible PC with at least a 486/25 MHz microprocessor.
❍ Delivered standard interfaces: Standard interface to Microsoft Windows 3.1 and to sound boards supporting the Microsoft Windows multimedia audio driver.


● Contact: Telia Promotor Infovox AB
TTS Sales Division
P.O. Box 2069, S-171 02 Solna, Sweden
Ph: +46 8 764 35 00, Fax: +46 8 735 78 76
Email: tts-sales@infovox.se
WWW: http://www.promotor.telia.se/NYA/cc/t-s/index.html

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 15:46 12-Nov-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/infovox.html (2 of 2) [10/31/2003 8:47:33 AM]


Macintosh Speech Output Applications

● Platform: Macintosh
● Description: A comprehensive list of Macintosh Speech Applications is provided by Kevin
Lenzo at CMU:
http://www.cs.cmu.edu/~lenzo/mac_speech_apps.html
The Apple Speech WWW Site also has some useful information:
http://www.speech.apple.com/

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 15:08 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/macapp.html [10/31/2003 8:47:34 AM]


Macintosh Speech Synthesis Manager

Speech Manager and PlainTalk


● Platform: Macintosh
● Description: Apple's text-to-speech system extensions that enable applications to perform text-
to-speech conversion. The Speech Manager runs on most Macs, but PlainTalk (and the high
quality voices) requires a 68020 Mac or better.
● Availability: By anonymous ftp from:
ftp://ftp.support.apple.com/pub/apple_sw_updates/US/Macintosh/System/PlainTalk 1.4.1/
This directory contains subdirectories for recent versions of PlainTalk. The current release
(PlainTalk 1.4.1) contains the English Text-To-Speech with about a dozen voices
(English_Text-to-Speech.hqx: 5.3 MByte), Mexican Spanish (Mexican_Spanish_TTS.hqx: 2.8
MByte), and the English Speech Recognition software (English_Speech_Recognition.hqx:
2.3MByte).
● Cost: Free
● WWW: The latest information is available from Apple's WWW page for speech recognition
and synthesis:
http://www.speech.apple.com/
● Note 1: Check out Kevin Lenzo's list of Macintosh Speech Applications.
● Note 2: Joshua Baer (josh@skyweyr.com) runs a mailing list for PlainTalk. For subscription and other information, visit the PlainTalk Discussion List Home page.
● Contact: Apple Computer, Inc.
1 Infinite Loop, Cupertino, CA 95014, USA
WWW: http://www.speech.apple.com/
Email: PlainTalk@atg.apple.com

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 16:19 05-Sep-1997

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/macintosh.html [10/31/2003 8:47:34 AM]


MacYack Pro

● Platform: Macintosh
● Description: MacYack Pro is a commercial speech package for the Macintosh that uses the PlainTalk text-to-speech synthesis software. Features include:
❍ Add speech to any word processor.
❍ Hear notification dialogs and other dialog boxes.
❍ See and hear a customized message at startup or shutdown.
❍ Hear calculations instantly.
❍ Correct pronunciation errors.
❍ Create custom double-clickable "speech files."
❍ Have speaking alert sounds.
❍ Add speech to HyperCard stacks.
❍ Use AppleScript to add speech to other programs.
● Price: $29.95 for a limited time, reduced from the regular price of $49.95. 30-day money-back guarantee.
● Contact: Scantron Quality Computers
20200 Nine Mile Rd., St. Clair Shores, MI 48080
Ph: 1-800-777-3642, Fax: 810-774-2698
E-mail: sales@sqc.com
WWW: http://www.sqc.com/
Product Info: http://www.lowtek.com/macyack/

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 09:17 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/macyack.html [10/31/2003 8:47:35 AM]


MBROLA: Free Speech Synthesis Project

● Platform: Sun4, Sun/SunOS 5.4, HP, VAX/VMS, DEC Alpha/VMS, PC/DOS, PC/Windows 3.1, PC/Windows 95, PC/Solaris 2.4, PC/Linux, SGI INDY/IRIX and NeXT; a Macintosh version is coming soon.
● Description: MBROLA is a high-quality, diphone-based speech synthesizer that is available free of charge. It is provided by the TCTS Lab of the Faculte Polytechnique de Mons (Belgium), which aims to obtain a set of speech synthesizers for as many languages as possible, free for non-commercial, non-military use.
MBROLA 2.00 takes a list of phonemes as input, together with prosodic information (the duration of each phoneme and a piecewise linear description of pitch), and produces 16-bit speech samples at the sampling frequency of the diphone database (typically 16 kHz). It is therefore NOT a text-to-speech (TTS) synthesizer, since it does not accept raw text as input. Databases are now being prepared for English, Spanish, Italian, Dutch and Romanian. Collaborations are welcome. More information can be found at the MBROLA project homepage.
● Demonstration: A WWW demo of MBROLA that compares the quality of PSOLA, MBR-PSOLA, LPC, and Hybrid Harmonic/Stochastic concatenative synthesizers is available at http://tcts.fpms.ac.be/synthesis/modelcmp.html.
● Contact: Dr Thierry Dutoit
Faculte Polytechnique de Mons, TCTS Lab,
31, bvd Dolez, B-7000 Mons, Belgium.
Ph: +32-65-374133, Fax: +32-65-374129
e-mail: mbrola@tcts.fpms.ac.be
WWW: http://tcts.fpms.ac.be/synthesis/mbrola.html
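The input described above (phonemes, durations and a piecewise linear pitch description) is conventionally written as a phoneme file in which each line gives a phoneme name, its duration in milliseconds, and optional pairs of (position as a percentage of the phoneme's duration, pitch in Hz). The sketch below writes such lines and evaluates the resulting pitch contour. It is a minimal illustration, not MBROLA code: the phoneme symbols are placeholders that must match the chosen diphone database, the class and function names are hypothetical, and holding the pitch constant before the first and after the last target is our own assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Phone:
    name: str                  # phoneme symbol; must exist in the chosen diphone database
    duration_ms: int           # phoneme duration in milliseconds
    pitch: list = field(default_factory=list)  # (position in % of duration, F0 in Hz) pairs


def to_pho(phones) -> str:
    """Format phones as input lines: name, duration, then pitch pairs."""
    lines = []
    for p in phones:
        fields = [p.name, str(p.duration_ms)]
        for pos, f0 in p.pitch:
            fields += [f"{pos:g}", f"{f0:g}"]
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


def pitch_at(phones, t_ms: float):
    """F0 at time t_ms from a piecewise-linear contour through all pitch targets.

    Assumption: F0 is held constant before the first and after the last target.
    """
    points, start = [], 0.0
    for p in phones:
        for pos, f0 in p.pitch:
            points.append((start + p.duration_ms * pos / 100.0, f0))
        start += p.duration_ms
    if not points:
        return None
    if t_ms <= points[0][0]:
        return points[0][1]
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if t_ms <= t1:  # here t0 < t_ms <= t1, so t1 > t0
            return f0 + (f1 - f0) * (t_ms - t0) / (t1 - t0)
    return points[-1][1]


if __name__ == "__main__":
    phones = [Phone("_", 50), Phone("a", 100, [(0, 100), (100, 200)]), Phone("_", 50)]
    print(to_pho(phones), end="")
    print(pitch_at(phones, 100))  # 150.0: halfway along the rise from 100 Hz to 200 Hz
```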

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 15:30 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/mbrola.html [10/31/2003 8:47:36 AM]


ProVoice Developer's Speech Toolkit from First Byte

● Platform: ProVoice Developer's Toolkits are available for DOS, Windows 3.1, Windows 95,
Windows NT, OS/2, and Macintosh.
● Description: ProVoice allows programmers to add synthesized speech to their applications. Your program passes text strings to the ProVoice speech engine, which translates the text into audible speech. Male and/or female "SpeechFonts" are available for several languages: English, French, German, UK English, Italian and Spanish.
ProVoice converts text to speech in two phases, using a set of phonetic translation and pronunciation rules. First, the software analyzes the text and translates it into "sound descriptors", a phonetic language with the pitch, duration and amplitude codes needed to produce stress patterns in phrases and sentences; rules are used to analyze words, numbers and punctuation. The second phase converts this intermediate phonetic language into speech signals, using algorithms that join the distinct speech segments into smooth, continuous, clear speech. Real-time synchronization of mouth movements and word boundaries allows animation of a graphical talking character, or highlighting of displayed text as it is spoken.
The necessary tools and examples are provided for programmers to work with the ProVoice speech technology, including installation instructions, extensive sample programs and complete documentation. In addition, sample code is provided on disk to illustrate speech programming techniques.
● Note 1: First Byte will perform custom work for embedded systems.
● Note 2: ProVoice for Windows includes support for Microsoft SAPI and will speak through any Windows-supported wave audio device.
● Note 3: Distribution of ProVoice for commercial use is subject to execution of a Commercial
Product Distribution License Agreement.
● WWW: For more detailed information and examples go to the First Byte WWW page:
http://www.firstbyte.davd.com/
● See also: Monologue for Windows from First Byte
● Price and Availability: Contact First Byte
● Contact: First Byte
19840 Pioneer Ave., Torrance, CA 90503
Ph: 310-793-0610, Fax: 310-793-0611
Email: info@firstbyte.davd.com
WWW: http://www.firstbyte.davd.com/

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:21 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/provoice.html [10/31/2003 8:47:37 AM]


SENSYN speech synthesizer

● Platform: PC/DOS/Windows, Macintosh, Sun, and NeXT
● Rough Cost: $300
● Description: This formant synthesizer, based on Klatt's KLSYN88 synthesizer, produces speech waveform files. It is intended for laboratory and research use. Note that this is NOT a text-to-speech synthesizer: it creates speech sounds from a large number of input variables (formant frequencies, bandwidths, glottal pulse characteristics, etc.) and could be used as part of a TTS system. Includes full source code.
● Availability: Sensimetrics Corporation
Sidney Street, Cambridge MA 02139.
Fax: (617) 225-0470; Tel: (617) 225-2442.
Email: sensimetrics@sens.com
WWW: http://www.sens.com/
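SENSYN's own code is not shown here. As an illustration of the building block that Klatt-style formant synthesizers share, here is a minimal Python sketch of the second-order digital resonator from Klatt's published synthesizer design, driven by a crude impulse train and connected in cascade. KLSYN88 itself uses a much richer glottal source and many more parameters. The function names and the formant frequencies and bandwidths below are illustrative assumptions, not SENSYN values.

```python
import math


def resonator_coeffs(freq_hz: float, bw_hz: float, fs_hz: float):
    """Coefficients (A, B, C) of a Klatt-style second-order digital resonator."""
    T = 1.0 / fs_hz
    C = -math.exp(-2.0 * math.pi * bw_hz * T)
    B = 2.0 * math.exp(-math.pi * bw_hz * T) * math.cos(2.0 * math.pi * freq_hz * T)
    A = 1.0 - B - C          # normalizes the gain at 0 Hz to 1
    return A, B, C


def resonate(x, freq_hz, bw_hz, fs_hz):
    """Filter x through y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""
    A, B, C = resonator_coeffs(freq_hz, bw_hz, fs_hz)
    y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = A * xn + B * y1 + C * y2
        out.append(yn)
        y2, y1 = y1, yn
    return out


def cascade_vowel(formants, f0_hz=100.0, fs_hz=10000, dur_s=0.1):
    """Crude vowel: impulse-train source passed through formant resonators in cascade."""
    n = int(round(fs_hz * dur_s))
    period = int(round(fs_hz / f0_hz))
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq, bw in formants:
        signal = resonate(signal, freq, bw, fs_hz)
    return signal


if __name__ == "__main__":
    # Illustrative formant frequencies/bandwidths (Hz) for an /a/-like vowel.
    wave = cascade_vowel([(700, 130), (1220, 70), (2600, 160)])
    print(len(wave), max(abs(s) for s in wave))
```

Each resonator contributes one spectral peak at its formant frequency with the given bandwidth, which is why a formant synthesizer's input is a list of such frequency and bandwidth tracks rather than text.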

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 12:02 16-Oct-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/sensyn.html [10/31/2003 8:47:37 AM]


Sound Bytes Developer's Kit

● Platform: Subroutine library for Windows, OS/2 and Macintosh
● Hardware: Windows - 16 MHz 80386 (minimum) running Windows 3.1; 4 MB RAM with at least 1.4 MB RAM free; 1.4 MB disk space.
OS/2 - 16 MHz 80386 (minimum) running OS/2 2.0 or above; 8 MB RAM with at least 1.4 MB RAM free.
Mac - Any Mac with at least 2.5 MB of RAM running System 6.0.4 or higher.
Telephone compatible. Compatible with commonly used sound cards.
● Description: SBDK is a software-only, sentence-level synthesizer that converts unrestricted English text (ASCII) into synthesized voice through diphone concatenation. SBDK uses parsing to incorporate the intonational and rhythmic patterns of normal speech. The developer's kit includes two voices, one female and one male. The product has a 55,000-word built-in dictionary and a tool for creating customized user dictionaries. It converts numbers, dates, dollars, phone numbers and times to words, and has a SoundOut facility that offers a choice of pronouncing unknown words phonetically or spelling them out. Developers can vary voice pitch (130-220 Hz) and rate (65-200 wpm), synchronize speech to other events, have multiple channels of speech to the same or different boards, etc. Speech sampling options: 8-bit linear; 8-bit companded at 11 kHz (Windows); 8-bit mu-law PCM at 8 or 11 kHz; 16-bit linear at 11 kHz.
● Cost: Sound Bytes may be licensed for internal use or resale. Site license fee: $3750. Resale or internal runtime fees: 2% of the net sales price per runtime sold, OR $150 per telephone port, OR per-unit pricing for internal use determined case by case.
● Misc: Demo disks are available for Windows and the Mac.
● Availability: Natural Speech Technologies, Inc.
Ph: (619) 457-2526.
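The product's own text normalization rules are not described in detail. As an illustration of the kind of conversion mentioned (numbers and phone numbers to words), here is a minimal Python sketch with hypothetical function names. It assumes American usage ("one hundred five", with no "and") and reads phone numbers digit by digit, with a comma-marked pause between digit groups.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
SCALES = [(10**9, "billion"), (10**6, "million"), (10**3, "thousand")]


def number_to_words(n: int) -> str:
    """Spell out an integer in American English ("one hundred five", no "and")."""
    if n < 0:
        return "minus " + number_to_words(-n)
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[rest] if rest else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + (" " + number_to_words(rest) if rest else "")
    for value, name in SCALES:  # n >= 1000, so the "thousand" scale always matches
        if n >= value:
            high, rest = divmod(n, value)
            return (number_to_words(high) + " " + name
                    + (" " + number_to_words(rest) if rest else ""))


def phone_number_to_words(text: str) -> str:
    """Read each digit of a phone number separately, pausing (comma) between groups."""
    groups = re.findall(r"\d+", text)
    return ", ".join(" ".join(ONES[int(d)] for d in g) for g in groups)


if __name__ == "__main__":
    print(number_to_words(3750))                      # three thousand seven hundred fifty
    print(phone_number_to_words("(619) 457-2526"))    # six one nine, four five seven, two five two six
```

Reading a phone number digit by digit rather than as one large number is the kind of context-dependent choice a text normalizer has to make before synthesis.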

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 14:45 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/sound.bytes.html [10/31/2003 8:47:38 AM]


Macintosh Speech Synthesis Manager

Speech Manager and PlainTalk


● Platform: Macintosh
● Description: Apple's text-to-speech system extensions that enable applications to perform text-
to-speech conversion. The Speech Manager runs on most Macs, but PlainTalk (and the high
quality voices) requires a 68020 Mac or better.
● Availability: By anonymous ftp from:
ftp://ftp.support.apple.com/pub/apple_sw_updates/US/Macintosh/System/PlainTalk 1.4.1/
This directory contains subdirectories for recent versions of PlainTalk. The current release
(PlainTalk 1.4.1) contains the English Text-To-Speech with about a dozen voices
(English_Text-to-Speech.hqx: 5.3 MByte), Mexican Spanish (Mexican_Spanish_TTS.hqx: 2.8
MByte), and the English Speech Recognition software (English_Speech_Recognition.hqx:
2.3MByte).
● Cost: Free
● WWW: The latest information is available from Apple's WWW page for speech recognition
and synthesis:
http://www.speech.apple.com/
● Note 1: Check out Kevin Lenzo's list of Macintosh Speech Applications.
● Note 2: Joshua Baer (josh@skyweyr.com) runs a mailing list for PlainTalk. For subscription
and other information, visit the PlainTalk Discussion List home page.
● Contact: Apple Computer, Inc.
1 Infinite Loop, Cupertino, CA 95014, USA
WWW: http://www.speech.apple.com/
Email: PlainTalk@atg.apple.com

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 16:48 05-Sep-1997

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/speechmgr.html [10/31/2003 8:47:39 AM]


AcuVoice

● Platform: Windows, Solaris
● Description: AcuVoice is a natural-sounding text-to-speech system built using a concatenative
approach. It is currently available with an American English male voice. Software Developer
Kits are available for the 32-bit Windows platform and for the Solaris platform. More
information and samples are available on the AcuVoice web site.
● Contact: AcuVoice, Inc.
84 W. Santa Clara Street, Suite 720, San Jose, CA 95113-1810
Ph: 1(408)289-1661, Fax: 1(408)289-1201
Demo: 1(408)289-1177
Email: AcuVoice1@AOL.COM
WWW: http://www.acuvoice.com/

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 18:11 05-Sep-1997

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/acuvoice.html [10/31/2003 8:47:39 AM]


AT&T Watson Speech Synthesis

● Platform: Windows 95/NT on a Pentium 75 MHz or higher
● Description: Watson is a software implementation of AT&T Bell Laboratories voice
processing technology. Watson includes BLASR Speech Recognition (see Q6.6) and FlexTalk
speech synthesis. It requires no special hardware to run other than a standard sound card and/or
phone card. Technical details for the FlexTalk speech synthesis include:
❍ Compliant with MS Speech API.

❍ Male and Female Voices available

❍ 8 kHz and 11 kHz output

❍ SoundBlaster compatible sound card and drivers required

❍ Context sensitive abbreviation expansion

❍ Accurate pronunciation of most proper names

❍ Adjustable vocal tract size, speed, volume, pitch, etc.

❍ American English only - other languages in development

The AT&T Advanced Speech Products Group home page provides more detailed information
including a Frequently Asked Questions list, information for application developers on the
Independent Software Vendor (ISV) Program (including info on the SDK, licensing, and the
training program).
● Requirements: Uses 2 MB of RAM and 10 MB of disk space. Requires a Pentium 75 MHz or higher
(uses less than 50% of the CPU).
● Cost and Availability: WATSON is a software-based speech platform with a Software
Developers Kit (SDK) that allows application developers to use voice processing in their
applications. It is not available as a stand-alone product.
Licensing information (including price) is provided on the AT&T Advanced Speech Products Group
home page.
● See also: Watson BLASR speech recognition in Q6.5, Microsoft Speech API, and Advanced
Speech API.
● Contact: AT&T Advanced Speech Products Group
Suite 700, 44 East Mifflin Street, Madison, WI 53703, USA
Ph: 1-800-5-WATSON, Fax: 1-608-259-2269
Email: aspg@attmail.com
WWW: http://www.att.com/aspg/
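"Context sensitive abbreviation expansion" means the same token is read differently depending on its neighbours. The toy sketch below illustrates the idea with two hypothetical rules: an abbreviation followed by a capitalized word is read as a title ("Saint", "Doctor"), otherwise as a street type ("Street", "Drive"). FlexTalk's actual rules are not described in this entry and are certainly more elaborate; the table and function name are made up for illustration.

```python
# Hypothetical rule table: (reading before a capitalized word, reading otherwise).
ABBREVIATIONS = {
    "St.": ("Saint", "Street"),
    "Dr.": ("Doctor", "Drive"),
}


def expand_abbreviations(text: str) -> str:
    """Expand known abbreviations, using the following word as context."""
    tokens = text.split()
    out = []
    for i, tok in enumerate(tokens):
        if tok in ABBREVIATIONS:
            title_reading, street_reading = ABBREVIATIONS[tok]
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            # A capitalized next word suggests a name follows, e.g. "St. Paul".
            out.append(title_reading if nxt[:1].isupper() else street_reading)
        else:
            out.append(tok)
    return " ".join(out)


if __name__ == "__main__":
    print(expand_abbreviations("Dr. Smith lives on Elm Dr."))
    print(expand_abbreviations("He lives on Main St. near St. Paul"))
```

A single right-hand-context rule breaks at sentence boundaries ("Main St. The ..." would be read as "Saint"), which is why practical text normalizers also need sentence segmentation.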

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 13:49 31-May-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/att.html [10/31/2003 8:47:40 AM]


Creative TextAssist and TextAssist API

Creative TextAssist
● Platform: Windows
● Description: Based on DECtalk speech synthesis. A detailed description of TextAssist is
provided on the Creative WWW pages. TextAssist TextReader provides a convenient
Windows user interface for text reading.
● Availability: Creative TextAssist is bundled with most (all?) Creative Sound Blaster audio
cards. TextAssist preview software is available from the Creative Labs TextAssist home page.
● Contact: Creative Labs, Inc.
Address, phone, email, etc. unknown
WWW: http://www.creaf.com/ : http://www.creaf.com/wwwnew/tech/devcnr/tassist.html

Creative TextAssist API


● Platform: Windows
● Description: The TextAssist API (TAAPI) is designed for Microsoft Windows 3.1x and
Windows 95 developers who intend to develop 16-bit text-to-speech applications
using Creative's TextAssist speech engine. It supports direct control of speech output
characteristics, concurrent playback of text-to-speech and wave files, foreign language support,
speech synchronization, and exception dictionaries. It also includes a voice editing tool for
creating new custom voices, and a Visual Basic Custom Control for high-level support in Visual
Basic and other languages.
● Availability: The TextAssist API is released to registered developers at no cost.
● Contact: WWW: http://www.creaf.com/
FAQ: http://www.creaf.com/wwwnew/tech/devcnr/tassfaq.html

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 10:37 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/creative.html [10/31/2003 8:47:41 AM]


DECtalk: Text-to-Speech from Digital

DECtalk Speech Synthesis


● Platform: Windows NT, Alpha with Digital UNIX and RS232 ports
● Description: Converts ordinary text into natural-sounding, intelligible speech. Provides
personalized voices, and extensive user controls. DECtalk technology is available for the
following packaging options.
❍ DECtalk PC card option: An industry-standard ISA/EISA bus card implementation that
can be integrated with any Intel 486 processor-based system running DOS or Windows.
Applications can be interfaced to the bus via a DOS Terminate and Stay Resident (TSR)
driver or a Windows Dynamic Link Library (DLL). This option is available with an
external speaker with volume control and headphone jack.
❍ DECtalk Express external package: An external, portable package that you can plug in
to any PC or serial port. The external package includes a built-in speaker and
headphone jack, plus combined on/off and volume controls and a rechargeable battery
pack.
❍ DECtalk Software solution: Software-only text-to-speech for Alpha or Intel systems
running Windows NT, or Alpha systems running Digital UNIX. Provides complete
speech synthesis capabilities so developers can enhance applications with DECtalk
technology. DECtalk Software output can be directed to audio devices, into WAVE
files, or into memory buffers.
● Pricing: ://www.systems.digital.com/DIcatalog/html/DECtalk-Speech-Synthesis-oi.html
● More Information:
Digital Equipment Corporation WWW pages: http://www.digital.com/
DECtalk page: http://www.systems.digital.com/DIcatalog/html/DECtalk-Software.html
Ph: 1-800-DIGITAL

DECtalk Software
● Platform: Digital UNIX and Windows NT
● Description: DECtalk converts standard ASCII text into natural, intelligible speech. Speech
output through any audio device is supported by Microsoft Video for Windows or Multimedia
Services for Digital UNIX. An API gives developers direct access to text-to-speech functions.
Provides nine voice personalities (4 female, 4 male, 1 child). Provides punctuation and tonal
control, supports customized pronunciation of trade jargon and acronyms. Common
programming interface works with both Alpha and Intel platforms.
● More Information:
Digital Equipment Corporation WWW pages: http://www.digital.com/
DECtalk Software page: http://www.systems.digital.com/DIcatalog/html/DECtalk-Software.html
WWW: http://www.systems.digital.com/DIcatalog/html/DECtalk-Speech-Synthesis.html
Ph: 1-800-DIGITAL


Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 15:01 12-Nov-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/dectalk.html (2 of 2) [10/31/2003 8:47:42 AM]


ETI-Eloquence

● Platform: MS Windows (Win95, NT, 3.1), Solaris, SunOS, SGI, RS/6000
● Description: ETI-Eloquence is a software based text-to-speech system. It generates waveforms
completely algorithmically instead of by concatenating waveforms, for maximum flexibility
and naturalism. For instance, when the user requests a deeper voice, the software simulates a
larger vocal tract, instead of simply pitch-shifting samples. It uses high-level linguistic parsing,
which obviates the need for a huge dictionary. It handles numbers, acronyms, currency, etc. It
includes a set of annotation symbols, for placing stress on particular words, expressing
excitement/boredom, etc. Also allows phonetic input. Supports MS SAPI.
Produces male and female voices for General American English. Dialects under development
include Alabama and Brooklyn.
● Price: Flexible license agreements on application.
● Availability: Eloquent Technology, Inc.
2389 North Triphammer Road, Ithaca, NY 14850, USA
Ph: (607) 266-7025, Fax: (607) 266-7030
Email: info@eloq.com
WWW: http://www.eloq.com/
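The difference between simulating a larger vocal tract and pitch-shifting samples can be illustrated with the textbook uniform-tube model: a tube closed at the glottis and open at the lips resonates at odd multiples of c/4L, so lengthening the tract lowers every formant by the same ratio without touching the fundamental frequency. This is only a worked illustration, not ETI-Eloquence's model; the speed of sound (35,000 cm/s) and the tract lengths are assumed values.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed value for warm, moist air


def tube_formants(length_cm: float, count: int = 3,
                  c: float = SPEED_OF_SOUND_CM_S) -> list[float]:
    """Resonances (Hz) of a uniform tube closed at one end: F_n = (2n - 1) c / 4L."""
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, count + 1)]


if __name__ == "__main__":
    print(tube_formants(17.5))  # assumed typical adult tract length
    print(tube_formants(20.0))  # a "larger" simulated tract
```

Going from 17.5 cm to 20 cm scales all three formants by 17.5/20 = 0.875: 500, 1500 and 2500 Hz become 437.5, 1312.5 and 2187.5 Hz, while the voice pitch can be set independently.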

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 11:57 31-Jan-1997

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/eloquence.html [10/31/2003 8:47:42 AM]


HADIFIX

● Platform: Windows
● Description: A German speech synthesis system developed at the Institute for Communications
Research and Phonetics, University of Bonn. Provides conversion of input text to phonemes,
automatic prediction of stress, phrasing and pitch, and speech generation by concatenation of
small units of natural speech. Demisyllables and similar units are used; they comprise all
consonants before the vowel and the beginning of the vowel (initial demisyllable) or the end of
the vowel and the following consonants (final demisyllable). For example, the word 'Strolch' is
formed by concatenating 'Stro' and 'olch'.
● Demo: Windows demo software is available (1.3MB file). It is limited to synthesizing one
short text (text.txt) at a time, and there are also limitations on the speech format.
ftp://asl1.ikp.uni-bonn.de/pub/hadifix/hadidemo.zip
A 1993 version is available with unlimited synthesis from a string of phonemic symbols and
accent markers. 6MB file.
ftp://asl1.ikp.uni-bonn.de/pub/hadifix/hadi25.lzh
● WWW: http://asl1.ikp.uni-bonn.de/~tpo/Hadifix.en.html
● On-line demo: http://asl1.ikp.uni-bonn.de/~tpo/Hadiq.en.html
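The demisyllable split described above can be sketched as follows. This minimal illustration works on spelling rather than phonemes and assumes single-syllable input; HADIFIX itself operates on phonemic units, and the vowel set and function name are hypothetical.

```python
VOWELS = set("aeiouyäöü")  # orthographic vowel letters (simplifying assumption)


def demisyllables(syllable: str) -> tuple[str, str]:
    """Split one syllable into (initial, final) demisyllables.

    The vowel nucleus is shared: the initial demisyllable ends with it and
    the final demisyllable starts with it, as in 'Stro' + 'olch'.
    """
    lower = syllable.lower()
    start = next((i for i, ch in enumerate(lower) if ch in VOWELS), None)
    if start is None:
        raise ValueError(f"no vowel nucleus in {syllable!r}")
    end = start
    while end + 1 < len(lower) and lower[end + 1] in VOWELS:
        end += 1  # extend over a vowel digraph such as 'au' or 'ei'
    return syllable[: end + 1], syllable[start:]


if __name__ == "__main__":
    print(demisyllables("Strolch"))
```

A concatenative synthesizer would then fetch a stored recording for each half and join them inside the shared vowel, as in the 'Strolch' example.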

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 23:42 21-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/hadifix.html [10/31/2003 8:47:43 AM]


IPOX: All Prosodic Speech Synthesis Architecture

● Platform: Windows
● Description: IPOX is an experimental, all-prosodic speech synthesizer, developed by Arthur
Dirksen and John Coleman. IPOX is freely available (after registration) for evaluation and non-
profit research purposes.
● Requirements: PC (preferably a fast 486) running Windows 3.1 or higher. Sound output
requires a 16-bit Windows-compatible sound card.
● Availability: By WWW from http://www.tue.nl/ipo/people/adirksen/ipox/ipox.htm

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 15:08 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/ipox.html [10/31/2003 8:47:44 AM]


Lernout and Hauspie Text-To-Speech Windows SDK

● Platform: Windows
● Description: The L&H Text-to-Speech software developer's kit integrates text-to-speech
technology with your own or existing PC applications under Microsoft Windows 3.1.
It converts written text into clear, human-sounding synthetic speech.
● Requirements: IBM-compatible 386 DX/33 PC with 8 MB RAM, MS-DOS 5.0, MS Windows 3.1
(or higher), and a SoundBlaster-compatible sound board.
● See also: L&H TTS Products
● More Information: on the Lernout & Hauspie WWW pages: http://www.lhs.com/tts.html
● Price: Unknown
● Contact: Lernout and Hauspie Speech Products
20 Mall Road, 4th Floor
Burlington, MA 01803, USA
Ph: +1-617-238-0960, Fax: +1-617-238-0986
Email: sales@lhs.com
WWW: http://www.lhs.com/

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 12:31 13-May-1997

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/lernout.haupsie.sdk.html [10/31/2003 8:47:45 AM]


Listen2 Text Reader

● Platform: Windows
● Description: Listen2 is a multi-voice, multi-language text reader. It comes in two
versions: an English-only version that uses high-quality male and female voices, and an
International version that can speak up to 5 different languages (English, German, French,
Spanish and Italian), all in male voices. The basic International program comes with built-in
English; additional language fonts can be purchased separately. The English version comes complete.
Both programs are dynamically switchable and configurable. This means that you can press a
hot key to speed up the speech, make it louder or quieter, etc., as it is reading a file. You can
also insert flags in text files to make it switch voices or switch languages, depending on what
version you have.
Listen2 has all the features of the JTS Reader shareware program plus a few more. It will voice
your reminder messages or appointment list on start-up. It will also speak a reminder message
on shutting down.
● WWW: A more complete description is available on the Listen2 web page
● Contact: Tom Slemko: e-mail: tslemko@islandnet.com, or,
JTS Micro Consulting Ltd
10931 Lytton Road, RR#4, Ladysmith, B.C., Canada, V0R 2E0
WWW: http://www.islandnet.com/jts/

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 14:38 05-Sep-1997

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/listen2.html [10/31/2003 8:47:45 AM]


Monologue for Windows from First Byte

● Platform: Windows
● Description: Monologue is a software program that reads text from the clipboard in 16- or
32-bit Windows applications. It can be found as a bundled product with many sound cards and
multimedia general purpose computer systems. Monologue can add the element of speech to
virtually any text oriented application. Any pronounceable combination of letters and numbers
will be spoken clearly. It can be applied to tasks such as eyes-free proofreading, data
verification (e.g. spreadsheets), reading E-mail and more. User-changeable parameters provide
control over the sound quality by allowing for changes in pitch, and the speed of speech. An
exception dictionary saves preferred pronunciation of words and abbreviations.
Monologue Win32 now includes support for the Microsoft SAPI. Monologue male
"SpeechFonts" are available for US English, British English, German, French, Latin American
Spanish, Italian. A US English Female SpeechFont is also available.
For more detailed information and examples go to the First Byte WWW pages.
● Availability: Currently bundled with many sound cards and multimedia general purpose
computer systems. For pricing, licensing details, and release information see the First Byte
WWW pages or email info@firstbyte.davd.com.
● See also: ProVoice Developer's Speech Toolkit from First Byte
● Contact: First Byte
19840 Pioneer Ave., Torrance, CA 90503
Ph: 310-793-0610 Fax: 310-793-0611
Email: info@firstbyte.davd.com
WWW: http://www.firstbyte.davd.com/

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 15:08 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/monologue.html [10/31/2003 8:47:46 AM]


PAM - A Text-To-Speech Application


● Platform: Windows
● Description: PAM is a talking personal assistant and text reader application. It uses the
ProVoice TTS package. PAM will verbally advise about appointments and reminder messages
at specified times during the day. It can read text files, clipboard text, and text sent in DDE
messages. Using the full verbal interface, PAM can be used by visually challenged individuals.
Shareware - thirty day free trial.
● Requirements: Any Windows sound card, speakers or headphones. Minimum memory: 4 MB
(8 MB recommended).
● WWW: A more complete description is available on the JTS homepage:
http://www.islandnet.com/~tslemko/
● Availability: The shareware can be downloaded by ftp from
ftp://ftp.islandnet.com/jts/pam_en3c.zip. The file size is approx. 1 MByte.
● Price: $US40 for the registered version.
● Contact: Tom Slemko: e-mail: tslemko@islandnet.com, or,
JTS Micro Consulting Ltd
10931 Lytton Road, RR#4, Ladysmith, B.C., Canada, V0R 2E0

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 12:48 28-Apr-1997

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/pam.html [10/31/2003 8:47:47 AM]


ProVerbe Speech Engine from ELAN Informatique

● Platform: Windows 3.x, NT, 95, OS/2, Unix Solaris, Unix SCO and hardware
● Description: The ProVerbe Speech Engine from ELAN Informatique produces natural-sounding
speech from written text. Naturalness is achieved by using the TD-PSOLA process
from CNET (France Telecom's research laboratory), which is based on the concatenation of
elementary speech units (including diphones). Supported languages are British English,
American English, Russian, German, French and Spanish. For multi-channel applications, Elan
Informatique also provides hardware platforms.
Elan Informatique provides an SDK reference document (sdken.doc: WinWord6 format).

● Demo versions: Telephone demonstration: +33-561 17 67 01


Sample sound files and demonstration software available.
A CD-ROM with all these demonstrations is available by registration.
● Contact: Elan Informatique
4 rue Jean Rodier, 31400 TOULOUSE FRANCE
Contact person: Pierre Delrat
Phone: +33-561-36-0777 Fax: +33-61-36-0770
BBS: +33-561-36-0788
E-mail: sales@elan.fr
ftp: ftp://ftp.elan.fr
WWW: http://www.elan.fr/
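TD-PSOLA changes pitch by cutting the signal into two-period, windowed grains centred on pitch marks and overlap-adding them at a new spacing. The sketch below is a deliberately simplified illustration assuming a constant pitch period and pitch marks that are already known; it is not ELAN's or CNET's implementation, real systems also handle varying periods and duration changes, and the function names are hypothetical.

```python
import math


def hann(half: int) -> list[float]:
    """Hann window of length 2*half + 1: 0.0 at both ends, 1.0 at the centre."""
    return [0.5 * (1.0 + math.cos(math.pi * j / half)) for j in range(-half, half + 1)]


def td_psola(x: list[float], marks: list[int], period: int, beta: float) -> list[float]:
    """Scale pitch by beta (>1 raises it) while keeping the signal length.

    x      -- input samples
    marks  -- analysis pitch marks, one per pitch period (assumed known)
    period -- pitch period in samples, assumed constant for simplicity
    beta   -- pitch scaling factor
    """
    y = [0.0] * len(x)
    window = hann(period)
    t = float(marks[0])
    while t <= marks[-1]:
        # Reuse the grain whose analysis mark is closest to this synthesis instant.
        a = min(marks, key=lambda m: abs(m - t))
        s = int(round(t))
        for j in range(-period, period + 1):
            src, dst = a + j, s + j
            if 0 <= src < len(x) and 0 <= dst < len(y):
                y[dst] += window[j + period] * x[src]
        t += period / beta  # synthesis marks are spaced by the new period
    return y


if __name__ == "__main__":
    x = [0.0] * 1000
    marks = list(range(100, 901, 100))  # impulse train with a 100-sample period
    for m in marks:
        x[m] = 1.0
    y = td_psola(x, marks, period=100, beta=2.0)
    print([i for i, v in enumerate(y) if abs(v) > 1e-9])
```

With beta = 2 the pulses come out every 50 samples instead of every 100, so the pitch doubles while the total length stays 1000 samples. Grains are simply repeated when the pitch rises, which is why the method works best for moderate scaling factors.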

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 16:47 05-Sep-1997

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/proverbe.html [10/31/2003 8:47:47 AM]


Tinytalk
● Platform: DOS / Windows???
● Description: A shareware speech 'screen reader' used by many blind users.
● Price: Tinytalk is now $150. There are package deals on Tinytalk with various speech
synthesizers.
● Availability: Tinytalk is available by anonymous ftp from the following site
Files: ttexe167.zip and ttdoc167.zip (executable and documentation)
ftp://ftp.netcom.com/pub/eb/ebohlman/
(Note: it is a busy ftp server.)
● Contact: Eric Bohlman
OMS Development
610-B Forest Ave., Wilmette, IL 60091
Ph: (800)831-0272 Fax: 708-251-5793
Outside North America: (708)-251-5787
Email: ebohlman@netcom.com

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 09:21 14-Nov-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/tinytalk.html [10/31/2003 8:47:48 AM]


TruVoice from Centigram

● Platform: Windows-NT, Windows 95, Windows 3.1 (limited release), Sun Solaris 2.x
● Description: TruVoice, an advanced text-to-speech converter, is available for multiple
environments. TruVoice converts text into spoken language, adding intelligible, natural-sounding
speech to sound-enabled platforms.
❍ Small memory footprint (1.5 MB)
❍ Advanced text pre-processing
❍ No vocabulary restrictions
❍ User-definable pronunciation dictionary
❍ Accurately pronounces surnames and place names
❍ Preprocessor provides e-mail and spreadsheet reading capabilities and expands abbreviations.
❍ Multiple languages available: American English, Latin American Spanish, German, French, Italian
❍ Flexible pitch, volume and speech rate
❍ Intonation support for punctuation
❍ Supports navigational capabilities such as pause, resume, and jump forward/jump back by
sentence or word boundaries


More detailed information is provided in the brochure page on the Centigram WWW site.
A demonstration of TruVoice is available on the Centigram WWW pages.
● Cost:
❍ Windows versions are $495 for the SDK

❍ Solaris versions are $995

❍ Contact Centigram for other pricing.

● Contact: TruVoice Sales


Centigram Communications Corporation
91 East Tasman Drive, San Jose, CA 95134
Ph: (408) 944-0250 Fax: (408) 428-3732
Demo: 800-746 1632
Email: webmaster@centigram.com
WWW: http://www.centigram.com/

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 15:43 06-Feb-1997

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/truvoice.html [10/31/2003 8:47:49 AM]


WinSpeech
● Platform: Windows
● Description: WinSpeech is a text-to-speech application that reads text and sends the speech to
the audio output. Features include basic text editing tools, speaking directly from the editing
window, a DDE server that lets other Windows applications send text to be spoken, a coach mode
that provides audio instructions throughout the program, and dictionary editing tools for
customizing pronunciation. The WSPLIB text-to-speech DLL is a speech function library for
developers; more information is available by email.
● Requirements: System requirements: IBM PC or compatible computer with Windows 3.1 or
higher. Sound card is recommended but not required.
● Availability: Freeware available through the PC WholeWare WWW page.
● Contact: PC WholeWare
33 Justin Street, Lexington, MA 02173, U.S.A.
Email: info@pcww.com
WWW: http://www.pcww.com/index.html

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 01:02 08-Mar-1997

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/winspeech.html [10/31/2003 8:47:50 AM]


ZMD Speech Synthesis

"Speaky" Speech Synthesis from ZMD

● Platform: DSP solution for platform independent speech synthesis implementation


● Description: "Speaky" provides a German speech synthesis system as a DSP solution. It includes
pre-processing of input ASCII text with unlimited vocabulary, both parametric and
non-parametric speech synthesis algorithms, and prosody modelling. More detailed information and
audio samples can be found at the ZMD WWW site.
● Contact: Zentrum Mikroelektronik Dresden GmbH
Grenzstrasse 28, D-01109 Dresden, Germany
Ph: +49-351-8822-306, Fax: +49-351-8822-337
Email: assp@zmd-gmbh.de
WWW: http://www.zmd-gmbh.de/

ZMD PCMCIA Speech Synthesis Card

● Platform: MS-DOS, Windows


● Description: Complete text-to-speech synthesis system for the German language with
unlimited vocabulary using VOICE Processor "Speaky". The required pre-processing of the
input ASCII text is performed by a software program that is downloaded automatically from
the PCMCIA Speech Synthesis Card during the card's initialising routine. Headphone or active
loudspeaker can be connected directly for signal output. More detailed information and audio
samples can be found at the ZMD WWW Site.
● Requirements: PC Card slot, Card & Socket Services Software
● Contact: Zentrum Mikroelektronik Dresden GmbH
Grenzstrasse 28, D-01109 Dresden, Germany
Ph: +49-351-8822-306, Fax: +49-351-8822-337
Email: assp@zmd-gmbh.de
WWW: http://www.zmd-gmbh.de/

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 17:21 13-May-1997

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/zmd.html [10/31/2003 8:47:51 AM]


CSRE: Computerized Speech Research Environment

● Platform: DOS
● Description: CSRE is a software system that includes an implementation of the Klatt
speech synthesizer. See the CSRE entry in Q1.9 and the AVAAZ WWW pages for more detail.
● Contact: AVAAZ Innovations Inc.
P.O.Box 8040, 1225 Wonderland Rd. N, London, Ontario, CANADA, N6G 2B0
Ph: +1-519-472-7944 , Fax: +1-519-472-7814
Email: info@avaaz.com
WWW: http://www.icis.on.ca/homepages/avaaz/

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 14:58 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/csre.html [10/31/2003 8:47:52 AM]


spchsyn.exe

● Platform: DOS
● Availability: By anonymous ftp as a self extracting DOS archive.
ftp://evans.ee.adfa.oz.au/mirrors/tibbs/applications/spchsyn.exe
● Requirements: May require special TI product(s), but the full source code is included.

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 15:09 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/spchsyn.html [10/31/2003 8:47:53 AM]


AsTeR

● Platform: UNIX
● Description: A TTS front-end program that encodes structural information about documents in
the synthesized speech. For more information, see:
http://www.research.digital.com/CRL/personal/raman/aster/aster-toplevel.html
● Operation requirements: Lisp: Lucid, clisp
● Contact: T. V. Raman
WWW: http://www.research.digital.com/CRL/personal/raman/raman.html
Email: raman@adobe.com

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:20 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/aster.html [10/31/2003 8:47:53 AM]


Emacspeak - A Speech Output Subsystem For Emacs

● Platform: UNIX, Emacs
● Description: Emacspeak is a speech output system that allows someone who cannot see to
work directly on a UNIX system. Emacspeak is built on top of Emacs: with Emacspeak
loaded, Emacs provides spoken feedback for everything you do. Emacspeak currently supports
the new DECtalk Express speech synthesizer, as well as older DECtalk models, e.g. the
MultiVoice. See the Emacspeak WWW page, the Emacspeak FAQ or the Emacspeak
distribution for additional details.
● Requirements: Requires GNU FSF Emacs 19 (version 19.23 or later) and TCLX 7.3B
(Extended TCL) to run Emacspeak.
● Availability:
Emacspeak WWW page
http://www.research.digital.com/CRL/personal/raman/emacspeak/emacspeak.html
Emacspeak source
http://www.research.digital.com/CRL/personal/raman/emacspeak/emacspeak.tar.gz
● Contact: T. V. Raman, raman@adobe.com

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:20 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/emacspeak.html [10/31/2003 8:47:54 AM]


Festival Speech Synthesis System

● Platform: General Unix (including Solaris (2.4,2.5), SunOS, HPUX, SGIs, Linux, Dec Alpha,
FreeBSD)
● Description: Festival is a general multi-lingual speech synthesis system developed at CSTR,
University of Edinburgh. It offers a full text-to-speech system with various APIs, as well as an
environment for development and research of speech synthesis techniques. It is written in C++
with a Scheme-based command interpreter for general control. Festival's home page offers
demos, the full manual and access to the download page. The distribution includes full source
and documentation, and lexicons and speech databases for British English text to speech.
● Price: Free for non-commercial use
● Availability: by anonymous ftp:
WWW: http://www.cstr.ed.ac.uk/projects/festival/download.html
ftp: ftp://ftp.cstr.ed.ac.uk/pub/festival/1.1.1/

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 17:28 31-Jan-1997

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/festival.html [10/31/2003 8:47:55 AM]


JSRU

● Platform: UNIX and PC
● Cost: 100 pounds sterling (from academic institutions and industry)
● Description: A C version of the JSRU system, Version 2.3 is available. It's written in Turbo C
but runs on most Unix systems with very little modification. A Form of Agreement must be
signed to say that the software is required for research and development only.
● Contact: Dr. E. Lewis (eric.lewis@bristol.ac.uk)

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:20 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/jsru.html [10/31/2003 8:47:55 AM]


Klatt-style synthesiser

● Platform: Unix
● Cost: Free
● Description: Software posted to comp.speech in late 1992.
● Availability: By ftp from the comp.speech ftp site
❍ ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/klatt.3.04.tar.gz

❍ ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/klatt.3.04.tar.Z

● See also: KPE80 - A Klatt Synthesiser and Parameter Editor.
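The basic building block of a Klatt-style formant synthesiser is a second-order digital resonator, one per formant, whose coefficients are derived from the formant frequency F and bandwidth BW. Below is a minimal sketch using the resonator coefficient formulas from Klatt's 1980 synthesiser description; it is not taken from the klatt.3.04 source, and the function name and parameter values are illustrative.

```python
import math


def klatt_resonator(x: list[float], freq: float, bw: float, fs: float) -> list[float]:
    """Filter x through y[n] = A x[n] + B y[n-1] + C y[n-2]."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw * t)
    b = 2.0 * math.exp(-math.pi * bw * t) * math.cos(2.0 * math.pi * freq * t)
    a = 1.0 - b - c  # normalises the gain at 0 Hz to 1
    y1 = y2 = 0.0    # previous two outputs
    out = []
    for sample in x:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1
    return out


if __name__ == "__main__":
    # Impulse response of a 500 Hz formant with 60 Hz bandwidth at 10 kHz.
    h = klatt_resonator([1.0] + [0.0] * 99, 500.0, 60.0, 10000.0)
    print([round(v, 4) for v in h[:10]])
```

The impulse response is a decaying sinusoid at the formant frequency, and a wider bandwidth makes it decay faster. Cascading several such resonators (F1, F2, F3, ...) and driving them with a glottal pulse train is how the cascade branch of such a synthesiser shapes vowels.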

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 11:47 22-Oct-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/klatt.html [10/31/2003 8:47:56 AM]


KPE80 - A Klatt Synthesiser and Parameter Editor

● Platform: Unix
● Description: The KPE80 program provides a graphical interface for the implementation of the
Klatt 1980 formant synthesiser written by Jon Iles and Nick Ing-Simmons. It was inspired by
IGE, a piece of code written by Rob Fletcher ( http://www.york.ac.uk/~rpf1/IGE.html).
● Technical Desc.: It consists of an X Window interface and version 3.03 of the synthesiser
code. The interface allows users to display and edit Klatt parameters using a graphical display
which includes the time-amplitude waveform of both the original speech and its synthetic
copy, and some signal analysis facilities. Most of the work in choosing the parameter values to
produce the synthetic copy has to be done by the user. KPE will estimate the fundamental
frequency contour from an original token; this estimate will need to be amended where errors
occur. It is possible to specify the formant trajectories with some precision by overlaying the
appropriate formant frequency parameter tracks on the spectrogram of the target waveform. A
number of facilities exist to help in the refinement of parameter values: original and synthetic
waveforms can be compared aurally, spectrally, and spectrographically using built-in speech
analysis facilities.
● File formats: KPE will read RIFF (.wav) files and SFS files. (SFS is a suite of speech-signal
processing programs available free from Phonetics and Linguistics, UCL.)
● Availability:
KPE for SunOS 4.1.3 (statically compiled libraries)
ftp://pitch.phon.ucl.ac.uk/pub/kpe/kpe80.sun413.tar.Z
KPE for Linux (statically compiled libraries)
ftp://pitch.phon.ucl.ac.uk/pub/kpe/kpe80.linux.tar.Z
The source code (needs gcc and SUIT to compile)
ftp://pitch.phon.ucl.ac.uk/pub/kpe/kpe80.src.tar.Z
A postscript overview of KPE
ftp://pitch.phon.ucl.ac.uk/pub/kpe/OVERVIEW.ps
The SFS distribution
ftp://pitch.phon.ucl.ac.uk/pub/sfs/
● See also: Public domain Klatt-style speech synthesis code.
● Contact: Andrew Simpson
Department of Phonetics and Linguistics, University College London
Wolfson House, 4 Stephenson Way, London NW1 2HE
Email: a.simpson@ucl.ac.uk
WWW: http://www.phon.ucl.ac.uk/home/andrew/home.html
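The entry does not say how KPE estimates the fundamental frequency contour. A common textbook approach, shown here only as an illustrative sketch, picks the lag that maximizes the short-time autocorrelation of a frame within a plausible pitch range. The 60-400 Hz search range and the function name are assumptions; running it frame by frame yields a contour that, as the entry notes for KPE, still needs checking where errors occur.

```python
import math


def estimate_f0(frame: list[float], fs: float,
                fmin: float = 60.0, fmax: float = 400.0) -> float:
    """Return an F0 estimate in Hz, or 0.0 if no positive correlation peak is found."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove DC offset
    lag_min = max(1, int(fs / fmax))       # shortest period considered
    lag_max = min(int(fs / fmin), n - 1)   # longest period considered
    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Biased autocorrelation: dividing by n (not n - lag) makes longer lags
        # slightly weaker, which discourages picking a multiple of the period.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / n
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag if best_lag else 0.0


if __name__ == "__main__":
    fs = 8000.0
    tone = [math.sin(2 * math.pi * 200.0 * i / fs) for i in range(800)]
    print(estimate_f0(tone, fs))  # prints 200.0
```

Autocorrelation methods are simple but prone to octave errors on real voiced speech, so hand correction of the estimated contour remains necessary.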

Back to Q5.5 of Section 5 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 03:04 01-Apr-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/klatt.kpe80.html [10/31/2003 8:47:57 AM]


"learph": Trainable text-to-phoneme software by Antonio Lucca
● Platform: UNIX
● Description: Experimental software which learns text to phoneme translation from examples
using decision-tree-like data structures. It is based on the assumption that each letter can
correspond to different phoneme strings depending on the context.
● Availability: Examples and source are available on the WWW:
http://www.silab.dsi.unimi.it/~al367212/ttsdoc.html
● Contact: Antonio Lucca: toninlcc@tesi.dsi.unimi.it
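This page does not document learph's own decision-tree-like structures. The underlying idea, that a letter's pronunciation depends on the letters around it, can still be sketched with a simpler context-backoff table. The Python class below is hypothetical. It assumes each training word is already aligned with one phoneme string per letter, using an empty string for a silent letter. At prediction time it uses the widest letter context seen in training and backs off to narrower contexts when that fails.

```python
from collections import Counter, defaultdict

class ContextLetterToPhoneme:
    """Hypothetical context-backoff letter-to-phoneme learner (not learph's code)."""

    def __init__(self, max_width=2):
        self.max_width = max_width
        # (width, left context, letter, right context) -> counts of phoneme strings
        self.table = defaultdict(Counter)

    def _key(self, word, i, width):
        padded = "#" * width + word + "#" * width  # '#' marks the word boundary
        j = i + width
        return (width, padded[j - width:j], padded[j], padded[j + 1:j + 1 + width])

    def train(self, word, phones):
        assert len(word) == len(phones), "one phoneme string per letter"
        for i, out in enumerate(phones):
            for width in range(self.max_width + 1):
                self.table[self._key(word, i, width)][out] += 1

    def predict(self, word):
        result = []
        for i in range(len(word)):
            for width in range(self.max_width, -1, -1):  # widest context first
                counts = self.table.get(self._key(word, i, width))
                if counts:
                    result.append(counts.most_common(1)[0][0])
                    break
            else:
                result.append("?")  # letter never seen in training
        return [p for p in result if p]  # drop silent letters
```

After training on "cat" and "city", the letter "c" in the unseen word "cit" is given /s/ because its right-hand context "it" matches "city".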

Last Revision: 10:39 03-Feb-1997

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/learph.html [10/31/2003 8:47:57 AM]


Lucent Technologies Bell Labs Text-to-Speech system

● Platform: UNIX and Win-95/NT
● Description: Lucent Technologies provides a web site with demos and samples of their latest
speech synthesis technology. The site has interactive demos in American English, German, and
Mandarin Chinese, and the capability to adjust voice parameters on the fly. Pre-synthesized
demos for French, Italian, Russian, and Romanian are also provided.
The site includes downloadable papers with detailed system descriptions.
● WWW: http://www.bell-labs.com/project/tts/

Last Revision: 16:59 02-Oct-1997

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/lucent.html [10/31/2003 8:47:58 AM]


SGI Developers Toolbox Synthesiser

● Platform: SGI
● Description: The SGI Developer Toolbox 4.0 CDROM contains a basic public domain text-to-speech
program in the publics/speak directory. The directory includes man pages and source.
● Availability: on the SGI Developer Toolbox 4.0 CDROM

Last Revision: 00:21 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/sgi.html [10/31/2003 8:47:59 AM]


Speak

"Speak" - a Text to Speech Program


● Platform: Sun SPARC
● Description: Text to speech program based on concatenation of pre-recorded speech segments.
A function library can be used to integrate speech output into other code.
● Hardware: SPARC audio I/O
● Availability: by anonymous ftp
ftp://wilma.cs.brown.edu/pub/speak.tar.Z

Last Revision: 14:48 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/speak.html [10/31/2003 8:47:59 AM]


TrueTalk

● Platform: Sun Sparcstation 1+/2/LX/5/10/20 with SunOS 4.1.3, or SGI Indy/Indigo/Indigo2
with IRIX 5.2. More platforms in development.
● Description: Personal TrueTalk, by Entropic Research Laboratory, Inc., is an all-software Text-
to-Speech (TTS) system designed to voice-enable UNIX X-Windows workstations. It
combines a graphical interface with a powerful TTS engine based on technology developed by
AT&T Bell Laboratories. Features include:
❍ Intelligible, prosodically natural speech.

❍ Text taken from file input, highlighted X selections, the interface scratch pad, other programs connected through a TCP/IP socket, or Tcl/Tk applications via the Tk "send" mechanism.

❍ Stop, pause and resume while speech is in progress.

❍ Visual indication of corresponding text position when paused.

❍ Nine speaking voices, with Male and Female versions of each voice.

❍ Adjustable speaking rate and volume.

❍ Supports drop-in text filters; "email" and "lively" examples included.

❍ Audio output through workstation headphones or speaker.

❍ Complete on-line documentation, including mouse-activated help windows.

● Misc: A more detailed description of TrueTalk is available on the Entropic WWW server:
http://www.entropic.com/truetalk.com
● Availability: You can obtain Personal TrueTalk through the Internet. For details, see
ftp://ftp.entropic.com/pub/truetalk/README.ptt
Personal TrueTalk is available free of charge for evaluation purposes. You can fully enable
your evaluation copy at any time by purchasing a license key from Entropic.
● Requirements: 12MB disk space, 8MB process size (24MB system RAM recommended).
● Cost: US$495; US$395 academic
● Contact: Entropic Research Laboratory, Inc.,
Washington, D.C.
Voice: 1-800-ENTROPIC (North America), (202) 547 1420
Fax: (202) 547-6648
Email: truetalk@entropic.com
WWW: http://www.entropic.com/

Last Revision: 14:59 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/truetalk.html [10/31/2003 8:48:00 AM]


Eurovocs

● Platform: Various - RS232 hardware connection
● Description: Eurovocs is a stand-alone text-to-speech synthesizer which uses the text-to-
speech technology of Lernout and Hauspie Speech Products. Available for Dutch, French,
German and American English with other languages planned for release soon. One Eurovocs
device can support two different languages. Eurovocs can be connected to any computer via a
standard serial interface (RS232). It supports personal dictionaries, generation of DTMF tones,
and pronunciation of special character sequences such as digit strings, telephone numbers, date
and time indications, abbreviations, alphanumeric strings etc.
● Contact: Technologie & Revalidatie
Postbus 128, B-9000 Gent, Belgium
Ph: +32-9-264 33 97, Fax: +32-9-264 35 94
E-mail: noe@elis.rug.ac.be
WWW: http://www.elis.rug.ac.be/ELISgroups/speech/research/eurovocs.html
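This page does not describe how Eurovocs expands digit strings. As a hedged illustration of the kind of text normalisation involved, the hypothetical Python function below reads a telephone-style number digit by digit. It treats hyphens and spaces as group boundaries and renders them as commas, which a synthesiser could realise as short pauses. Any other symbol, such as a leading "+", is simply skipped.

```python
# Hypothetical digit-by-digit expansion; not the Eurovocs rule set.
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

def speak_digit_string(text):
    """Spell out each ASCII digit; hyphens and spaces separate pause groups."""
    groups = []
    for group in text.replace("-", " ").split():
        words = [DIGIT_WORDS[int(ch)] for ch in group if ch in "0123456789"]
        if words:
            groups.append(" ".join(words))
    return ", ".join(groups)
```

For example, `speak_digit_string("555-0123")` gives "five five five, zero one two three".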

Last Revision: 15:08 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/eurovocs.html [10/31/2003 8:48:01 AM]


RC Systems V8600/V8601 Text to Speech synthesizers

● Platform 1: IBM PC: ISA card.
● Platform 2: Interface to PC/104 standard microcontrollers.
● Platform 3: Standalone (or embedded) hardware through an RS232 port, a parallel printer port or
the processor bus.
● Description: Converts plain ASCII text to speech. Programmable voices, pitch rate, volume,
etc. Built-in DTMF and tone generators.
● Price: $151-$299 US (qty 1)
● Contact: RC Systems
1609 England Avenue, Everett, WA 98203, USA
Ph: (206) 355-3800 Fax: (206) 355-1098
Europe: +44181 539-0285

Last Revision: 15:09 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/rc.systems.html [10/31/2003 8:48:01 AM]


TheBigMouth (NeXT)

TheBigMouth - a Text to Speech Program


● Platform: NeXT
● Description: Text to speech program based on concatenation of pre-recorded speech segments.
● Availability: ftp://ftp.cs.keio.ac.jp/pub/NeXT/source/TheBigMouth1.0.tar.Z

Last Revision: 12:02 16-Oct-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/bigmouth.html [10/31/2003 8:48:02 AM]


Narrator Translator Library (Amiga)

● Platform: Amiga
● Description: A US English text to phoneme translator, implemented as a resident software
library, for use with the Amiga Narrator Device. This software was supplied as a standard part
of the Amiga operating system software up to OS version 2.04 (Translator version 37.1,
1991). Approximately 700 translation rules are used to create the 'ARPAbet' phonemes. This
software is functional on all current Amiga systems (OS 3.1).
● Availability: limited to pre-owned system software disks and unsold OS upgrade kits (pre-OS 2.1).

Replacement Library: Translator42


● Platform: Amiga
● Description: an independent replacement for the Commodore-supplied "translator.library"
which is a part of the Narrator speech synthesis package. It implements multi-lingual text-to-
speech for an Amiga. The translation rules for each language are defined in a plain text
'Accent' file.
Individual text segments can be assigned their own language by inserting in-line markup codes
in the text, e.g.
"Hello there! \french{Bonjour} \deutsch{gute morgen}".
'Accent' files for American English, British English, Swedish, Maori, Finnish, German,
Icelandic, Klingon, Polish, Italian and Welsh are included in the archive.
● Availability: The most current version of the library (42.4) and its source are available by
anonymous ftp from Aminet:
ftp://ftp.doc.ic.ac.uk/pub/aminet/util/libs/translator42.lha
ftp://ftp.doc.ic.ac.uk/pub/aminet/dev/src/tran42src.lha
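The in-line markup in the example above splits a sentence into segments, and each segment is spoken using a different 'Accent' file. The Python sketch below parses that pattern (a backslash, a language name, then text in braces) into (language, segment) pairs. The syntax is inferred only from the single example given, and nesting is not handled. The default language name "english" and the function name are hypothetical.

```python
import re

# Backslash + language name + {text}; syntax inferred from the example only.
MARKUP = re.compile(r"\\(\w+)\{([^}]*)\}")

def split_languages(text, default="english"):
    """Split text into (language, segment) pairs.

    Text outside any markup is assigned the hypothetical default language.
    """
    segments = []
    pos = 0
    for m in MARKUP.finditer(text):
        if m.start() > pos:  # unmarked text before this markup code
            segments.append((default, text[pos:m.start()]))
        segments.append((m.group(1), m.group(2)))
        pos = m.end()
    if pos < len(text):  # unmarked text after the last markup code
        segments.append((default, text[pos:]))
    return segments
```

Each pair could then be handed to the translator with the matching Accent file loaded.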

Last Revision: 14:59 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/narrator-translator.html [10/31/2003 8:48:03 AM]


Narrator (Amiga)

● Platform: Amiga
● Description: Formant-based speech synthesis. Includes an English-to-phoneme translation
library, and a SPEAK: pseudo-device for speech output.
● Hardware: Standard Amiga hardware
● Availability: Part of AmigaOS
● See Also: The Narrator Translator Library

Last Revision: 14:59 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/narrator.html [10/31/2003 8:48:03 AM]


TextToSpeech Kit (NeXT)

● Platform: NeXT Computers
● Description: The TextToSpeech Kit does unrestricted conversion of English text to
synthesized speech in real-time. The user has control over speaking rate, median pitch, stereo
balance, volume, and intonation type. Text of any length can be spoken, and messages can be
queued up, from multiple applications if desired. Real-time controls such as pause, continue,
and erase are included. Pronunciations are derived primarily by dictionary look-up. The Main
Dictionary has nearly 100,000 hand-edited pronunciations which can be supplemented or
overridden with the User and Application dictionaries. A number parser handles numbers in
any form. A letter-to-sound knowledge base provides pronunciations for words not in the Main
or customized dictionaries. Dictionary search order is under user control. Special modes of text
input are available for spelling and emphasis of words or phrases. The actual conversion of text
to speech is done by the TextToSpeech Server. The Server runs as an independent task in the
background, and can handle up to 50 client connections.
● Misc: The TextToSpeech Kit comes in two packages: the Developer Kit and the User Kit. The
Developer Kit enables developers to build and test applications which incorporate text-to-
speech. It includes the TextToSpeech Server, the TextToSpeech Object, the pronunciation
editor PrEditor, several example applications, phonetic fonts, example source code, and
developer documentation. The User Kit provides support for applications which incorporate
text-to-speech. It is a subset of the Developer Kit.
● Hardware: Uses standard NeXT Computer hardware.
● Cost:
❍ TextToSpeech User Kit: $175 CDN ($145 US)

❍ TextToSpeech Developer Kit: $350 CDN ($290 US)

❍ Upgrade from User to Developer Kit: $175 CDN ($145 US)

● Availability: Trillium Sound Research
1500, 112 - 4th Ave. S.W., Calgary, Alberta, Canada, T2P 0H3
Tel: (403) 284-9278 Fax: (403) 282-6778
Order Desk: 1-800-L-ORATOR (US and Canada only)
Email: TTSInfo@trillium.ab.ca
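The pronunciation lookup described above amounts to a simple lookup chain: the User, Application and Main dictionaries are searched in a user-controlled order, and letter-to-sound rules are the fallback. A minimal Python sketch follows. The function name, the lower-casing of keys and the phoneme notation in the tests are assumptions and do not come from the TextToSpeech Kit API.

```python
from typing import Callable, Dict, List

Lexicon = Dict[str, str]  # word -> pronunciation string (notation is illustrative)

def pronounce(word: str, search_order: List[Lexicon],
              letter_to_sound: Callable[[str], str]) -> str:
    """Return the pronunciation from the first dictionary in search_order
    that contains the word, else fall back to letter-to-sound rules."""
    key = word.lower()
    for lexicon in search_order:
        if key in lexicon:
            return lexicon[key]
    return letter_to_sound(key)
```

Passing `[user, application, main]` lets a User dictionary entry override the Main Dictionary. Reordering the list changes which dictionary wins, which mirrors the user-controlled search order.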

Last Revision: 00:21 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/next.tts.html [10/31/2003 8:48:04 AM]


WreadFiles: File reader for Commodore Amiga

● Platform: Commodore Amiga
● Description: WreadFiles is a vocal text file reader program for use on the Commodore Amiga.
The text is printed to the screen and spoken. Features include:
❍ Text is read in sentences rather than lines.

❍ Dynamic Speech Correction on over 4000 words or word fragments.

❍ Pronunciations for many place names, personal names, foreign names, foreign expressions and abbreviations.

❍ Run from Workbench or CLI.

❍ Used with A1000 (OS 1.3), A3000 (OS 2.04-2.1), and A4000 (OS 3.0)

● Requirements: Standard Amiga Translator.library and Narrator.device are required (2.04 versions
recommended). 1 MB or more of RAM is recommended. External speakers are required.
● Availability: No fee requested for non-commercial use. From:
❍ GEnie: Page 555,3 File Number 24627

❍ Aminet ftp://ftp.wustl.edu/pub/aminet/util/misc/WreadFiles47.lha

● Contact: Written by Michael L. Barlow
Email: M.Barlow1@GEnie.geis.com or mbarlow@pacific.telebyte.com or
MikeB@cuix.pscu.com
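Reading a file "in sentences rather than lines" means undoing the hard line wrapping of the text before it is spoken. The rough Python sketch below shows that step. It assumes a sentence ends at ".", "!" or "?" followed by whitespace. The function name is hypothetical, and the sketch does not reflect WreadFiles' actual code.

```python
import re

def sentences(text):
    """Join hard-wrapped lines, then split after '.', '!' or '?' plus whitespace."""
    flat = " ".join(line.strip() for line in text.splitlines() if line.strip())
    parts = re.split(r"(?<=[.!?])\s+", flat)
    return [p for p in parts if p]
```

This rule is deliberately naive. Abbreviations such as "Dr." would be split incorrectly, which is where a correction list like the one WreadFiles describes becomes useful.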

Last Revision: 13:03 25-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/wreadfiles.html [10/31/2003 8:48:05 AM]


Lernout and Hauspie Text-To-Speech (3 products)

Lernout & Hauspie have three TTS products. Their functionality is similar; they differ in
hardware implementation and in the other details described below.

● L&H tts2000/T: TTS for the Telephony and Telecommunications Market
● L&H tts2000/M: TTS for the Computer and Multimedia Market
● L&H tts3000/C: TTS for the Business and Consumer Electronics Market

● Description: Text to Speech (TTS) software based on parameterized segment concatenation
algorithms (diphones, triphones and tetraphones). Available for US English, German, Dutch,
French, Spanish (Castilian), Italian and Korean. General features include:
❍ The control of volume, speech rate and speech pitch.

❍ The use of control sequences to customize TTS output (adding pauses, using phonetic input, etc.).

❍ Switching between languages at run time.

❍ A personal vocabulary editor is available for building exception dictionaries.

❍ Readout modes: letter by letter, word by word or sentence by sentence.

❍ Input formats: orthographic input, phonetic input, phonetic input with prosodic information.
● tts2000/T
❍ Output formats: 8 bit mu-law PCM, 8 bit A-law PCM, 16 bit linear PCM.

❍ Sampling Frequency: 8kHz

❍ Single channel platform examples: SHARP SH7000, ARM6/ARM7, Intel i960, TI TMS320C31, AT&T DSP3210

❍ Multi-channel platform examples: TI TMS320C31, AT&T DSP3210

● tts2000/M
❍ Output formats: 8/16 bit wave format, 8 bit mu-law PCM, 8 bit A-law PCM, 16 bit linear PCM.
❍ Sampling Frequency: 8/10/11.025 kHz

❍ Single processor platform examples: ARM6/ARM7, Intel 386/486/Pentium, Motorola 68040
❍ Two processor platform examples: {Intel 386/486/Pentium or Motorola 68030} and {ADI ADSP21XX or Motorola 5600X or TI TMS320C25/20C5X}

● tts3000/C
❍ Output formats: 8 bit mu-law PCM, 8 bit A-law PCM, 16 bit linear PCM.

❍ Sampling Frequency: 10kHz

❍ Single processor platform examples: SHARP SH7000, ARM6/ARM7, Intel i960, TI TMS320C31, AT&T DSP3210

❍ Two processor platform examples: {SHARP SH7000 or ARM6/ARM7 or Intel 386EX or Motorola 683XX} and {ADI ADSP21XX or Motorola 5600X or TI TMS320C25/C5X or TI TSP50C10}
● See also: L&H Windows TTS SDK
● More Information: on the Lernout & Hauspie WWW pages: http://www.lhs.com/tts.html
● Price: Unknown
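All three products list 8 bit mu-law PCM among their output formats. Mu-law is the ITU-T G.711 companding law used in North American telephony. The Python sketch below is the widely used bias-and-segment formulation of the G.711 mu-law encoder and decoder. It is not taken from L&H code, and the function names are hypothetical.

```python
BIAS = 0x84    # 132, added to the magnitude before the segment search
CLIP = 32635   # largest magnitude that still fits in 15 bits after adding BIAS

def linear_to_ulaw(sample):
    """Encode one 16-bit signed linear PCM sample as an 8-bit mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent = magnitude.bit_length() - 8              # segment number, 0..7
    mantissa = (magnitude >> (exponent + 3)) & 0x0F    # position within segment
    return ~(sign | (exponent << 4) | mantissa) & 0xFF  # G.711 inverts all bits

def ulaw_to_linear(byte):
    """Decode an 8-bit mu-law byte back to a 16-bit linear sample."""
    u = ~byte & 0xFF
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if u & 0x80 else magnitude
```

Each byte holds a sign bit, a 3-bit segment number and a 4-bit step within the segment, so the quantisation step grows with signal level. In this sketch, for instance, a sample of 1000 comes back as 988 after a round trip.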

● Contact: Lernout and Hauspie Speech Products
20 Mall Road, 4th Floor
Burlington, MA 01803, USA
Ph: +1-617-238-0960, Fax: +1-617-238-0986
Email: sales@lhs.com
WWW: http://www.lhs.com/

Last Revision: 12:31 13-May-1997

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/lernout.haupsie.html (2 of 2) [10/31/2003 8:48:05 AM]


SIMTEL

A wide range of speech-related software, Sound Blaster software and signal processing software for
PCs is available on SimTel and its mirror sites. It can be obtained by ftp from:

ftp://ftp.coast.net/SimTel/msdos/voice/

and is now on the WWW:

http://www.acs.oakland.edu/oak/SimTel/win3/sound.html

Voicemaker

The archives include the program Voicemaker, which synthesises speech by "concatenation" of
phonemes recorded by the user. Voicemaker is a freeware program. It requires an IBM PC or
compatible with 512KB RAM and a Sound Blaster-compatible sound card.

ftp://ftp.coast.net/SimTel/msdos/voice/vm110.zip
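Concatenative synthesis of the kind Voicemaker uses joins stored recordings end to end. The Python sketch below is hypothetical. It assumes each phoneme recording is a list of float samples keyed by a name. It can optionally apply a linear cross-fade over a few samples at each join, which is one common way to soften discontinuities. Voicemaker's own joining method is not documented here.

```python
def concatenate(units, inventory, overlap=0):
    """Join recorded unit waveforms in the order named by `units`.

    inventory maps each (hypothetical) phoneme name to a list of float samples.
    Consecutive units are linearly cross-faded over `overlap` samples.
    """
    out = []
    for name in units:
        seg = inventory[name]
        n = min(overlap, len(out), len(seg))  # no overlap for the first unit
        start = len(out) - n
        for k in range(n):
            w = (k + 1) / (n + 1)  # weight of the incoming unit rises across the overlap
            out[start + k] = (1 - w) * out[start + k] + w * seg[k]
        out.extend(seg[n:])
    return out
```

With `overlap=0` the units are simply appended. A small overlap trades a few samples of length for smoother joins.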

Last Revision: 17:26 07-Aug-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/simtel.html [10/31/2003 8:48:06 AM]


Text to Phoneme Program 1

● Platform: unknown
● Description: Text to phoneme program. Based on Naval Research Lab's set of text to phoneme
rules.
● Availability: by anonymous ftp
ftp://shark.cse.fau.edu/pub/src/phon.tar.Z
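The Naval Research Lab rules are context-sensitive rewrite rules: a string of letters is converted to phonemes only when its left and right contexts match. A minimal rule interpreter is sketched below in Python. The entries in RULES are a tiny hypothetical sample and not the NRL set. The special context symbols used in the real rules, which stand for classes of letters such as vowels or consonants, are omitted, so contexts here are literal strings.

```python
from typing import List, Tuple

# A rule is (left context, letters to match, right context, phonemes).
# A space in a context stands for a word boundary.
Rule = Tuple[str, str, str, str]

# Tiny hypothetical rule sample, most specific first; NOT the NRL rule set.
RULES: List[Rule] = [
    ("", "CH", "", "CH"),
    ("", "C", "", "K"),
    ("", "H", "", "HH"),
    ("", "E", " ", ""),    # word-final E is silent
    ("", "A", "", "AE"),
    ("", "T", "", "T"),
    ("", "V", "", "V"),
]

def apply_rules(word: str, rules: List[Rule]) -> str:
    """Translate a word left to right, using at each position the first rule
    whose match string and literal left/right contexts all fit."""
    text = " " + word.upper() + " "
    i = 1
    phonemes = []
    while i < len(text) - 1:
        for left, match, right, out in rules:
            if (text.startswith(match, i)
                    and text[:i].endswith(left)
                    and text.startswith(right, i + len(match))):
                phonemes.append(out)
                i += len(match)
                break
        else:
            i += 1  # no rule covers this letter: skip it silently
    return " ".join(p for p in phonemes if p)
```

Rule order matters. "CH" must come before "C", and the context-specific final-E rule must come before any general E rule.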

Last Revision: 15:09 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/text.phoneme.1.html [10/31/2003 8:48:07 AM]


Text to phoneme program 2

● Platform: unknown
● Description: Text to phoneme program.
● Availability: by anonymous ftp
ftp://ftp.doc.ic.ac.uk/packages/unix-c/utils/phoneme.c.gz

Last Revision: 14:08 24-Feb-1997

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/text.phoneme.2.html [10/31/2003 8:48:08 AM]


Text to phoneme program 3

● Description: A public domain version of the same Naval Research Lab text to phoneme rules.
● Availability: By anonymous ftp
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/english2phoneme.tar.gz

Last Revision: 14:02 24-Feb-1997

http://mi.eng.cam.ac.uk/comp.speech/Section5/Synth/text.phoneme.3.html [10/31/2003 8:48:08 AM]


Lotec Speech Recognition Package

● Platform: Sun
● Description: Public domain speech recognition software. Operates from input in Sun audio
format (.au files) and outputs word hypotheses and time labelling data. The software includes
programs to collect speech samples, a labeller, a "featurizer" which parameterises speech files,
a word spotter and the recogniser. The software can perform real-time recognition of small
vocabularies on a Sparc 10.
● Requirements: Sun SPARC audio input, a "decent" microphone, the Sun multimedia demo
software (in /usr/demo/SOUND) and X.
● Availability: By anonymous ftp
ftp://ftp.sanpo.t.u-tokyo.ac.jp/pub/nigel/lotec/lotec.tar.Z
● Contact: Nigel Ward: nigel@sanpo.t.u-tokyo.ac.jp

Last Revision: 13:38 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/lotec.html [10/31/2003 8:48:09 AM]


Myers' Hidden Markov Model software

● Platform: Unix
● Description: Hidden Markov model software for automatic speech recognition. C++ code that
implements a basic left-right hidden Markov model and corresponding Baum-Welch (ML)
training algorithm. It is meant as an example of the HMM algorithms described by L.Rabiner
and others. The code was built in order to learn how HMM systems work and we are now
offering it to the net so that others can learn how to use HMMs for speech recognition. Keep in
mind that ease of understanding was our primary concern, not efficiency. The code can be used
to build an experimental speech recognition system using "train_hmm" and "test_hmm", and
can be used in conjunction with written tutorials on HMMs to understand how they work.
● Availability: By anonymous ftp from the comp.speech archive site. The directory
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/
contains two files:
❍ hmm.README

❍ hmm-1.03.tar.gz

● Contact: Richard Myers: rmyers@isx.edu
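Baum-Welch training and the scoring of an utterance against a trained model, which is presumably what a program like "test_hmm" does, both rely on the forward algorithm. A minimal scaled version for discrete observation symbols is sketched below in Python. It is not taken from Myers' C++ code. The parameter layout (pi, A and B as nested lists) is an assumption, and a left-right model is expressed simply by an upper-triangular A.

```python
import math

def forward_log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    pi[i]   initial probability of state i
    A[i][j] transition probability i -> j (upper-triangular for a left-right model)
    B[i][k] probability of emitting symbol k in state i
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence impossible under this model
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_lik += math.log(scale)
    return log_lik
```

Summing the logs of the per-frame scale factors gives the log-likelihood without the underflow that unscaled forward probabilities suffer on long utterances.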

Last Revision: 13:58 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/myers.hmm.html [10/31/2003 8:48:10 AM]


Q1.4: Related newsgroups and mailing lists

Newsgroups
comp.ai - Artificial Intelligence newsgroup.
Postings on general AI issues, language processing and AI techniques. The comp.ai FAQ
covers NLP, NN and other AI information.

comp.ai.nat-lang - Natural Language Processing Group
Postings regarding Natural Language Processing. Set up to cover a broad range of related
issues and different viewpoints. A comp.ai.nat-lang FAQ posting is available.

comp.ai.nlang-know-rep - Natural Language Knowledge Representation
Moderated group.

comp.ai.neural-nets - discussion of Neural Networks and related issues.
There are often postings on speech-related matters such as phonetic recognition and connectionist
grammars. A comp.ai.neural-nets FAQ posting is available.

comp.compression - occasional articles on compression of speech.
The comp.compression FAQ has some info on audio compression standards.

comp.dcom.telecom - Telecommunications newsgroup.
Has occasional articles on voice products.

comp.dsp - discussion of signal processing: hardware, algorithms and more.
Has a good FAQ posting which is also available on the WWW and by ftp (addresses below).
Has a regular posting of a comprehensive list of Audio File Formats.
❍ http://www.bdti.com/faq/dsp_faq.htm

❍ ftp://rtfm.mit.edu/pub/usenet/comp.dsp/

comp.multimedia - Multi-Media discussion group.
Has occasional articles on voice I/O.

sci.lang - Language.
Discussion about phonetics, phonology, grammar, etymology and lots more. A sci.lang FAQ is
available.

alt.sci.physics.acoustics
Some discussion of speech production & perception.

alt.binaries.sounds.* - posting and discussion of sound samples.

Mailing Lists
Voice-Users Mailing List
For discussion of any aspect of using voice recognition systems.
❍ Using such systems safely, without muscle or voice strain

❍ Techniques for improving recognition accuracy

❍ How to set up the physical voice workstation

❍ Tips for effective use of voice interfaces

❍ Configuration of specific systems, troubleshooting, etc

To subscribe, fill out the web-based subscription form.
Posts to the list should go to: voice-users@voicerecognition.com
Colibri
News about language, speech, logic and information.
Email: colibri@let.ruu.nl
WWW: http://colibri.let.ruu.nl/
ECTL - Electronic Communal Temporal Lobe
Founder & Moderator: David Leip. Moderated mailing list for researchers with interests in
computer speech interfaces. This list serves a broad community including persons from signal
processing, AI, linguistics and human factors. To subscribe, send your name, institute,
department, daytime phone and email address to:
❍ ectl-request@snowhite.cis.uoguelph.ca

The ECTL archive site is ftp://snowhite.cis.uoguelph.ca/pub/ectl
Prosody Mailing List
Unmoderated mailing list for discussion of prosody. The aim is to facilitate the spread of
information relating to the research of prosody by creating a network of researchers in the
field. To participate, send the following one-line message to listserv@msu.edu:
❍ subscribe prosody Your Name

foNETiks
A moderated monthly newsletter distributed by e-mail. It carries job advertisements, notices of
conferences, and other news of general interest to phoneticians, speech scientists and others.
The editors are Linda Shockey and Gerry Docherty. To subscribe, send the following one-line
message to mailbase@mailbase.ac.uk:
❍ join fonetiks your_first_name your_second_name

Digital Mobile Radio

Covers many areas, including some speech topics such as speech coding and speech
compression. Mail Peter Decker dec@dfv.rwth-aachen.de to subscribe.

Last Revision: 17:09 05-Sep-1997

http://mi.eng.cam.ac.uk/comp.speech/Section1/Q1.4.html (3 of 3) [10/31/2003 8:48:11 AM]


Digital Dreams Speech Recognition Plug-Ins

● Platform: Apple Macintosh
● Description (General): A suite of speech plug-ins for the interactive multimedia market which
enable developers to quickly incorporate speech recognition into their titles without having to
resort to a low-level programming language, such as C. Speech plug-ins bridge the gap
between a speech recognition API, such as Apple's PlainTalk Speech Recognition technology,
and authoring/development environments, such as Macromedia Director or HyperCard. Digital
Dreams currently offers Macintosh speech plug-ins for Macromedia Director and HyperCard.
Support for other environments, including AppleScript, Apple Media Tool, Authorware, and
Windows is being developed. Currently available for North American Adult English. More
information is available on the Digital Dreams WWW site.
● ShockTalk: a combination of Netscape, Shockwave and speech recognition technologies for the
Power Macintosh and Quadra AVs that lets you navigate web sites and hyperlinks using spoken
commands and create Shockwave movies that respond to spoken user interactions.
● Requirements: Power Macintosh (PowerPC w/ MacOS)
Microphone (PlainTalk compatible)
PlainTalk Speech Synthesis and PlainTalk Speech Recognition
Netscape Navigator
● Contact: Digital Dreams
4308 Harbord Drive, Oakland, CA, 94618, USA
Tel: (510) 547-6929 Fax: (510) 547-6799
email: dreams@surftalk.com
WWW: http://www.surftalk.com/
FTP: ftp://ftp.surftalk.com/

Last Revision: 14:16 05-Sep-1997

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/digital.dreams.html [10/31/2003 8:48:12 AM]


Dragon Dictation Products

● Dragon NaturallySpeaking
● DragonDictate for Windows
● Dragon PowerSecretary
● General Information

Dragon NaturallySpeaking

● Platform: Windows
● Description: General purpose, continuous speech dictation system. Personal Edition has a
30,000 word active vocabulary and comes with a 200,000+ word pronunciation dictionary;
users can also add their own words or phrases.
More information on Dragon's NaturallySpeaking web site.
● Requirements: 133 MHz Pentium, 32 MB RAM (Windows 95) or 48 MB RAM (Windows NT
4.0), supported sound card.
● Price: see Dragon's NaturallySpeaking web site.
● Related products: see general information below
● Contact: see general information below

DragonDictate for Windows

● Platform: Windows
● Description: Speech-to-text dictation system. Discrete dictation; continuous command/control;
speaker-adaptive. Also provides mouse movement for hands-free operation of Windows.
Comes with a 120,000 word pronunciation dictionary; users can also add their own words or
phrases. Dictate directly into any application. Available in US and UK English, French, Italian,
German, Spanish, and Swedish. Add-on vocabularies for medicine, law, business and finance,
computers and technology, journalism.
Available as DragonDictate Singles Editions (10,000 words active), DragonDictate Personal
Edition (10,000 words active), DragonDictate Classic Edition (30,000 words active),
DragonDictate Power Edition (60,000 words active).
Includes Office97 support.
More information on the Dragon Systems web site.
● Requirements: 486/66, 7-10 MB dedicated RAM (depending on edition), Windows 3.1x, NT
3.51, or 95.
Supported sound boards: Creative Labs Sound Blaster 16, Microsoft Windows Sound System,
IBM M-Audio Capture/Playback Adapter, many notebooks with built-in audio.
See Dragon Systems Compatibility list for details.
● Price: Check at the Dragon Systems web site.
● Related products: see general information below
● Contact: see general information below

Dragon PowerSecretary

● Platform: Apple Macintosh
● Description: Speaker dependent/adaptive system requiring words to be separated by short
pauses. Available as PowerSecretary Power Edition, Personal Edition, PowerSecretary MED
for Healthcare Professionals.
Vocabulary: 30,000-60,000 words active at any one time, automatically selected from a 120,000-word
dictionary.
● Requirements: Power Macintosh 6100, 7100, 8100, Performa 6100 series, Powerbook 540,
68040 class Macintosh such as Quadra 660AV, 700, 800, 840AV, 900, 950, Centris 650 and
660AV.
Hard disk with at least 25 MB free.
System 7.5 or greater
(Some systems require add-on hardware)
● More information: PowerSecretary home page
● Related products: see general information below
● Contact: see general information below

General Information

Dragon Dictation Products

● Dragon NaturallySpeaking
● DragonDictate for Windows
● Dragon PowerSecretary
● General Information

Dragon Developer Products

● Dragon PhoneQuery
● DragonXTools
● Dragon SpeechTool
● Dragon VoiceTools

Related Web Sites

● Simon Crosby's FAQ for DragonDictate

Contact:

● Dragon Systems, Inc.
320 Nevada Street, Newton, MA 02160, USA
Tel: 1-617-965-5200 or 1-800-TALK-TYP
Fax: 1-617-527-0372
Email: info@dragonsys.com
WWW: http://www.dragonsys.com/
CompuServe: GO DRAGON

Last Revision: 16:11 05-Sep-1997

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/dragon.dictation.html (3 of 3) [10/31/2003 8:48:15 AM]


Macintosh Speech Recognition Manager

● Platform: Macintosh
● Description: supports developers who wish to add speech recognition to existing Macintosh
applications. Provides speaker independent recognition and robustness to noise. Apple's
Speech home page provides developer information and the complete speech recognition and
synthesis SDKs. The recognition SDK includes sample code, control panels,
interfaces, documentation and the recognizer.
● Availability: under licensing conditions from the Macintosh Speech Developer's page
http://www.speech.apple.com/speech/dev/dev.html.
● Requirements: Power Macintosh with 16-bit sound, System 7.5, and a PlainTalk Microphone
or equivalent
● Cost: Free
● See also: Macintosh Plaintalk and Speech Manager (Q5.5).
● Note: Check out Kevin Lenzo's list of Macintosh Speech Applications.
● Contact: Apple Computer, Inc.
1 Infinite Loop, Cupertino, CA 95014, USA
WWW: http://www.speech.apple.com/
Email: PlainTalk@atg.apple.com

Last Revision: 16:26 25-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/macintosh.html [10/31/2003 8:48:15 AM]


PowerSecretary

Dragon PowerSecretary
● Platform: Apple
● Description: Information has moved to the page on Dragon Dictation products, which includes
Dragon PowerSecretary (previously Articulate PowerSecretary).

Administrivia, Copyright, Submit Information : Last Revision: 13:34 05-Sep-1997

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/powersecretary.html [10/31/2003 8:48:16 AM]


AT&T Watson Speech Recognition

● Platform: Windows 95/NT on a Pentium 75 MHz or higher
● Description: Watson is a software implementation of AT&T Bell Laboratories' voice
processing technology. Watson includes BLASR Speech Recognition and FlexTalk speech
synthesis (see Q5.5). It requires no special hardware to run other than a standard sound card
and/or phone card. Technical details for BLASR Speech Recognition include:
❍ Compliant with Microsoft Speech API and Telephone API

❍ Speaker independent, continuous speech recognition

❍ Fast, run-time vocabulary change

❍ Open mic and telephone line environments

❍ SoundBlaster compatible sound card and drivers required

❍ Subword models and whole-word digit models

❍ Background, silence, and filler/garbage models

❍ Real-time recognition of a 50-word name vocabulary or a 100-word phrase vocabulary with 95% accuracy

❍ Rejection of out-of-vocabulary words

❍ American English only - other languages in development

❍ Barge-in speech begin/end notification - requires hardware echo cancellation

The AT&T Advanced Speech Products Group home page provides more detailed information
including a Frequently Asked Questions list, information for application developers on the
Independent Software Vendor (ISV) Program (including info on the SDK, licensing, and the
training program).
● Requirements: Uses 2 MB RAM and 10 MB disk. Requires a Pentium 75 MHz or higher CPU
(uses less than 50% of the CPU).
● Cost and Availability: WATSON is a software-based speech platform with a Software
Developers Kit (SDK) that allows application developers to use voice processing in their
applications. It is not available as a stand-alone product.
Licensing information (including price) is provided on the AT&T Advanced Speech Products Group
home page.
● See also: Watson FlexTalk speech synthesis in Q5.5, Microsoft Speech API, and Advanced
Speech API.
● Contact: AT&T Advanced Speech Products Group
Suite 700, 44 East Mifflin Street, Madison, WI 53703, USA
Ph: 1-800-5-WATSON, Fax: 1-608-259-2269
Email: aspg@attmail.com
WWW: http://www.att.com/aspg/
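The filler/garbage models and the rejection of out-of-vocabulary words listed for BLASR are often used together: the utterance is scored against every vocabulary word and against a filler model, and it is rejected when no word clearly beats the filler. The Python sketch below illustrates only that general idea; it is not AT&T's BLASR algorithm, and the function name, the use of log-likelihood scores and the `margin` parameter are assumptions for illustration.

```python
from typing import Dict, Optional


def recognize_with_rejection(word_scores: Dict[str, float],
                             garbage_score: float,
                             margin: float = 0.0) -> Optional[str]:
    """Return the best vocabulary word, or None to reject the utterance.

    word_scores maps each in-vocabulary word to the log-likelihood of the
    utterance under that word's model (higher is better); garbage_score is
    the log-likelihood under a filler/garbage model. The utterance is
    accepted only if the best word beats the garbage model by more than
    `margin`; otherwise it is treated as out-of-vocabulary.
    """
    if not word_scores:
        return None
    best_word = max(word_scores, key=lambda w: word_scores[w])
    if word_scores[best_word] - garbage_score > margin:
        return best_word
    return None


if __name__ == "__main__":
    scores = {"yes": -120.0, "no": -135.0}
    print(recognize_with_rejection(scores, garbage_score=-140.0))  # yes
    print(recognize_with_rejection(scores, garbage_score=-110.0))  # None
```

Raising `margin` in this sketch trades more false rejections of in-vocabulary words for fewer false acceptances of out-of-vocabulary speech.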

Administrivia, Copyright, Submit Information : Last Revision: 13:49 31-May-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/att.html [10/31/2003 8:48:17 AM]


Cambridge Voice for Windows

● Platform: Windows
● Description: Speaker-independent recognition of continuous speech in real time. Vocabularies
can range from small to very large (more than 60,000 word forms). Support is planned for
languages including English, Danish, Dutch, French, German, Italian, Norwegian, Spanish,
Swedish, and Japanese. The engine complies with the Microsoft Speech API.
● Contact: Cambridge Group Research, Ltd.
Box 7290, Buffalo Grove, IL 60089
Ph: (708) 821-1040, Fax: (708) 821-1041
E-mail: 76061.3350@compuserve.com

Administrivia, Copyright, Submit Information : Last Revision: 10:51 07-Aug-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/cambridge.voice.html [10/31/2003 8:48:18 AM]


CustomVoice and CustomTelephone: A&G Graphics Interface Inc.

● Platform: Windows
● CustomVoice: Speech recognition custom control for Visual Basic, Visual C++, Borland C++,
and other development platforms that support *.VBX. Provides an engine-independent,
non-proprietary development platform for speech recognition. Currently supports ICSS, but
should soon support other platforms. Includes a grammar debugger and parser APIs to parse
recognized speech into useful data types.
Requirements: 486/DX or better PC, Windows 3.1 or Windows for Workgroups, 8 MB RAM
(minimum), SoundBlaster 16, microphone, and mouse. Supports Visual Basic, Visual C++,
Borland C++, and Delphi.
● CustomTelephone: Windows-based developer tool that allows programmers to build
speech-enabled "telephony" applications via standard custom control properties (VBX). It supports
IBM VoiceType Application Factory (VTAF), a continuous speech, speaker independent
speech recognizer, and supports voice response boards such as Dialogic. Comes with a VB
custom control, pre-built grammar sets for common data types, an interactive grammar
debugger to identify valid speech patterns, and parser API functions that convert recognized
speech into data types supported by VB, C++ and Delphi. Includes sample applications with
source code, and VBX, VCL and DLLs. Bundled with speech recognition engines.
Requirements: 486/DX or better, Windows 3.1 or Windows for Workgroups, 8 MB RAM
(minimum), SoundBlaster or compatible sound card, Dialogic D2X or D4X board, and mouse.
Microphone and speaker optional. Supports Visual Basic, Visual C++, Borland C++, and
Delphi.
● Contact: A&G Graphics Interface
51 Gore Street, Cambridge, MA 02141-1213, USA
Ph: +1-617-492-0120, Fax: +1-617-427-2133
Email: customvc@world.std.com
CompuServe: 74774,273 (GO SPEECH)
WWW: http://www.customvoice.com/
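The "parser API functions that convert recognized speech into data types" mentioned for CustomVoice and CustomTelephone can be illustrated with a minimal Python sketch. It assumes the recognizer hands back a plain string of spoken digit words (for example "four one five"); the function name `parse_spoken_digits` and the word table are hypothetical and are not part of the A&G API.

```python
# Hypothetical sketch: turn a recognized string of spoken digit words
# into an integer, the way a grammar-driven parser might. Not the A&G API.
DIGIT_WORDS = {
    "zero": 0, "oh": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
}


def parse_spoken_digits(text: str) -> int:
    """Return the integer spelled out digit by digit in `text`.

    Raises ValueError if any word is not a digit word, mimicking a parser
    that rejects utterances outside its grammar.
    """
    words = text.lower().split()
    if not words:
        raise ValueError("empty utterance")
    value = 0
    for word in words:
        if word not in DIGIT_WORDS:
            raise ValueError(f"not a digit word: {word!r}")
        value = value * 10 + DIGIT_WORDS[word]
    return value


if __name__ == "__main__":
    print(parse_spoken_digits("four one five"))  # 415
```

Returning an integer drops leading zeros ("oh seven" becomes 7), so a real application collecting account or PIN numbers would probably keep the digits as a string instead.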

Administrivia, Copyright, Submit Information : Last Revision: 16:45 25-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/custom.voice.html [10/31/2003 8:48:18 AM]


DragonDictate for Windows

● Platform: Windows
● Description: Information has moved to the page on Dragon Dictation products, which includes
DragonDictate for Windows.

Administrivia, Copyright, Submit Information : Last Revision: 13:33 05-Sep-1997

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/dragon.dictate.windows.html [10/31/2003 8:48:19 AM]


Dragon Developer Tools

● Dragon PhoneQuery
● DragonXTools
● Dragon SpeechTool
● Dragon VoiceTools

Dragon PhoneQuery

● Platform: Windows NT
● Description: Software for building voice response systems. Callers can ask for information
using completely natural, continuous speech; hold a spoken dialog to fine-tune a request; and
request information to be faxed, sent by electronic mail, or read over the phone using
text-to-speech.
More information is available on the Dragon Systems telephony pages.
● Requirements: Pentium or Pentium Pro PC running Windows NT 4.0. Telephone interconnect
requirements vary by application.
● Related products: see general information below
● Contact: see general information below

DragonXTools

● Platform: Windows
● Description: VBX and OCX controls that allow an application to control DragonDictate's
capabilities, ranging from small-vocabulary command and control to customized
large-vocabulary dictation. More information is available on the Dragon Developer pages.
● Related products: see general information below
● Contact: see general information below

Dragon SpeechTool

● Platform: Windows
● Description: Create small, optimized vocabularies for your speech-enabled applications, or
supplement DragonDictate's extensive built-in vocabularies with specialized terms and names.
More information is available on the Dragon Developer pages.
● Related products: see general information below
● Contact: see general information below

Dragon VoiceTools

● Platform: Windows, DOS

● Description: Integrate small-vocabulary speech recognition directly into your DOS and
Windows 3.1x applications. More information is available on the Dragon Developer pages.
● Related products: see general information below
● Contact: see general information below

General Information

Dragon Dictation Products

● Dragon NaturallySpeaking
● DragonDictate for Windows
● Dragon PowerSecretary
● General Information

Dragon Developer Products

● Dragon PhoneQuery
● DragonXTools
● Dragon SpeechTool
● Dragon VoiceTools

Related Web Sites

● Simon Crosby's FAQ for DragonDictate

Contact:

● Dragon Systems, Inc.
320 Nevada Street, Newton, MA 02160, USA
Tel: 1-617-965-5200 or 1-800-TALK-TYP
Fax: 1-617-527-0372
Email: info@dragonsys.com
WWW: http://www.dragonsys.com/
CompuServe: GO DRAGON

Administrivia, Copyright, Submit Information : Last Revision: 13:27 05-Sep-1997

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/dragon.tools.html (2 of 2) [10/31/2003 8:48:20 AM]


Ficomp Interpreter 6000

● Platform: DOS, Windows 3.1, Win95, Win NT, UNIX
● Description: Ficomp Systems, Inc. is a systems integrator that has developed commercial
speaker-dependent, continuous-speech recognition applications for high-noise
environments on several platforms. The applications are specialized for the finance industry:
exchange floors, banks and brokerage firms.
● Contact: Ficomp Systems, Inc.
Ph: (732) 274-2600, Fax: (732) 274-2601
117 Docks Corner Road, Dayton, NJ 08810
E-Mail: fsisales1@aol.com
WWW: http://www.ficompsystems.com/

Administrivia, Copyright, Submit Information : Last Revision: 16:34 05-Sep-1997

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/ficomp.html [10/31/2003 8:48:21 AM]


IBM VoiceType Dictation and Control

IBM VoiceType Dictation


● Platform: OS/2 and Windows
● Description: IBM VoiceType Dictation supports speech input at 70-100 words a minute and
can be used to control your desktop and applications. It is an isolated-word, speaker-dependent
system that uses a speech adapter card. Available for U.S. English, U.K. English, French, German,
Italian, Spanish and Arabic. Provided with a general office vocabulary and support for major OS/2
and Windows applications. Additional specialised vocabularies are available:
❍ US: Legal, Emergency Medicine, Radiology and Journalism

❍ UK: Legal

❍ IT: Radiology

● Requirements: See http://www.software.ibm.com/workgroup/voicetyp/vtprod13.html


● Cost: See http://www.software.ibm.com/workgroup/voicetyp/vtordna.html
● Misc: An IBM VoiceType Dictation FAQ is maintained by UltraMedia Systems International (a
distributor of IBM VoiceType): http://www.infi.net/~ums/ibmfaq.htm
● Demo software: Available on the IBM WWW site:
http://www.software.ibm.com/workgroup/voicetyp/vtcust1.html
● Contact: US Ph: 1-800-TALK-2-ME or 1-914-766-1900.
Email: talk2me@vnet.ibm.com
WWW: http://www.software.ibm.com/workgroup/voicetyp/vtcust1.html

IBM VoiceType Control (US Only)


● Platform: OS/2 and Windows
● Description: VoiceType Control is a speech recognition navigator that lets you control
programs by speaking. VoiceType Control converts voice commands to keystroke macros. The
program provides speaker independent, continuous speech recognition, so you do not have to
train the program for your specific speech patterns.
● Requirements: Unknown
● Cost: Unknown
● Demo software: http://www.software.ibm.com/workgroup/voicetyp/vtcust2.html
● Contact: US Ph: 1-800-TALK-2-ME or 1-914-766-1900.
Email: talk2me@vnet.ibm.com
WWW: http://www.software.ibm.com/workgroup/voicetyp/vtcust2.html

Administrivia, Copyright, Submit Information : Last Revision: 13:33 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/ibm.voicetype.html [10/31/2003 8:48:21 AM]


IN CUBE

● Platform: Three versions for Windows 95, Windows NT and Sun SPARCstations
● IN CUBE for Windows 95: Developed for general purpose Windows 95 users. It is packaged
for online distribution with a full working demo and an option to register and unlock the full
product. The system uses Command Corp's Mark II continuous speech recognition engine and
handles changeable lexicons of up to 75 commands.
❍ Price: $49.95 US

❍ Requirements: 386/25MHz processor or better, Microsoft Windows 3.1 or later,
Windows-compatible sound card or built-in audio, and microphone.

❍ Availability: http://www.commandcorp.com/cci/win95.html
Demo mode available.


● IN CUBE Mark II Pro for Windows NT: IN CUBE is a continuous realtime speech recognition
system developed to provide a fast and convenient means of window navigation and voice
macro command input for command intensive applications like CAD and publishing. Speaker-
dependent training and ability to add new commands and macros.
❍ Price: $495 including the PRO 8 microphone; $540 including the MT 858 desk microphone.
❍ Requirements: Windows NT, Windows NT-compatible audio board (16-bit audio recommended).
❍ Availability: http://www.commandcorp.com/cci/pront.html
Demo available.
● IN CUBE Voice Command for Sun SPARCstations: Provides continuous real-time speech
recognition for window navigation and voice macro command input to the workstation.
Speaker-dependent training and ability to add new commands and macros.
An IN CUBE Application Programming Interface, with a library of linkable object
modules, is available for developers.
❍ Price: $495 per seat. The developer's API sells for $695.

❍ Requirements: SunOS 4.1.x or Solaris 2.x with OpenWindows and Motif. Works with
all audio-equipped SPARCs and clones. Models range from the SPARCstation 1 to the
SPARCstation 20.
❍ Availability: http://www.commandcorp.com/cci/in3sparc.html
A free 5-day evaluation license is available.


● Contact: Command Corp. Inc.,
3761 Venture Drive, PO Box 956099, Duluth, Georgia, 30136, USA
Ph: +1-770-813-8030
Email: in3@commandcorp.com
WWW: http://www.commandcorp.com/incube_welcome.html

Administrivia, Copyright, Submit Information : Last Revision: 00:26 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/incube.html [10/31/2003 8:48:22 AM]


Kurzweil Speech Recognition (2 products)

Kurzweil Voice for Windows


● Platform: Windows 3.1 or later
● Description: Kurzweil Voice for Windows is a dictation product enabling the user to create
text and enter data by speaking to Windows-based applications. The system is adaptive but
requires no initial training. Users can choose either a 30,000-word or a 60,000-word active
vocabulary. Command translation templates are provided for popular Windows applications such
as WordPerfect, 1-2-3, Organizer and Word (30+ applications are listed on the Kurzweil WWW
pages). More detailed information is available on the Kurzweil WWW pages.
● Requirements: 486DX/33 or higher, 8 or 16 MB dedicated memory (depending on vocabulary),
30 MB dedicated disk space, VGA or higher, Kurzweil-supplied microphone and DSP board.
● Contact:
Kurzweil Applied Intelligence, Inc.
411 Waverley Oaks Road, Waltham, MA 02154 USA
Phone: 1-800-380-1234
Email: info@kurzweil.com
WWW: http://www.kurzweil.com/

Kurzweil Clinical Reporter


● Platform: Windows 3.1 or later
● Description: Kurzweil Clinical Reporter is a voice-activated clinical reporting system for
computer-based patient records. The family of products includes:
❍ VoiceEM for emergency medicine

❍ VoiceEM/TR for triage reporting

❍ VoiceRAD for diagnostic imaging and radiology

❍ VoicePATH for surgical and anatomical pathology

❍ VoiceMED for Primary Care for family medicine, internal medicine and pediatrics

❍ VoiceORTHO for office-based orthopaedic surgery

❍ VoiceCATH for invasive cardiology

❍ VoiceReport for general reporting

● More information: from the Kurzweil WWW pages: http://www.kurzweil.com/medical/


● Contact:
Kurzweil Applied Intelligence, Inc.
411 Waverley Oaks Road, Waltham, MA 02154 USA
Phone: 1-800-380-1234
Email: info@kurzweil.com
WWW: http://www.kurzweil.com/

Administrivia, Copyright, Submit Information : Last Revision: 13:56 25-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/kurzweil.html (2 of 2) [10/31/2003 8:48:23 AM]


Lernout & Hauspie ASR SDK

● Platform: Windows
● Description: Windows based Software Development Kits are available for integrating
automatic speech recognition technology with Windows based PC applications.
● Requirements: IBM-compatible 486DX/33 MHz, 8 MB RAM, MS-DOS 5.0, MS Windows 3.1
(or higher), and a Sound Blaster-compatible sound board.
● See also: L&H ASR Products
● More Information: on the Lernout & Hauspie WWW pages: http://www.lhs.com/asr.html
● Contact: Lernout and Hauspie Speech Products
20 Mall Road, 4th Floor
Burlington, MA 01803, USA
Ph: +1-617-238-0960, Fax: +1-617-238-0986
Email: sales@lhs.com
WWW: http://www.lhs.com/

Administrivia, Copyright, Submit Information : Last Revision: 12:32 13-May-1997

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/lernout.hauspie.sdk.html [10/31/2003 8:48:23 AM]


Listen for Windows 2.0 from Verbex Voice Systems

● Platform: Windows
● Description: Listen for Windows Version 2.0 is a speaker-independent software product that
provides continuous speech recognition for Windows applications. The product works with
most industry-standard sound cards and PCs with embedded audio chips. Listen for Windows
comes with over 16,000 commands in speech interfaces for over 40 software applications, such
as MS Office, Lotus SmartSuite, Quicken, etc. The Listen Command Editor allows a user to
change or add commands to existing speech interfaces or create new speech interfaces for most
Windows applications.
More detailed information is available on the Verbex Listen for Windows page.
Verbex also sells Verbal Advantage Voice Browser for controlling a web browser, and Verbal
Advantage DeskTop for controlling desktop applications.
● Requirements: 486/25SX PC or higher
● Pricing and Availability: See the Verbex ordering page for pricing. Verbex products are
available over the web or can be shipped. Microphones available from Verbex.
● Demo: A "Freeware" demo is available from the Verbex WWW site demo page.
● Contact: Verbex Voice Systems
1090 King Georges Post Rd., Bldg 107, Edison NJ 08837, USA
Ph: 1-800-ASK-VRBX, (908) 225-5225, Fax: (908) 225-7764
WWW: http://www.verbex.com/

Administrivia, Copyright, Submit Information : Last Revision: 13:26 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/listen.html [10/31/2003 8:48:24 AM]


Microsoft Speech Recognition

Microsoft Dictation Research Demonstration

● Platform: Windows 95 or Windows NT 4.0


● Description: A free demonstration of research technology that enables a computer to transcribe
what you speak into Windows applications such as email and word-processors. Features of the
demo software include:
❍ 60,000 word vocabulary with the ability to add new words

❍ High recognition accuracy

❍ Works with any Windows application

❍ "Dictation Pad" provides enhanced dictation features

❍ "IntelliSense" converts spoken numbers and times automatically

❍ Compatible with the Microsoft Speech API

● Requirements: Windows 95 or Windows NT 4.0, Pentium 90 or better (RISC builds are
available), 16 MB of RAM on Windows 95, a sound card with 16 kHz, 16-bit input,
a high-quality close-talk microphone, and speakers.
● Availability: Free demo software is available at:
http://www.research.microsoft.com/research/srg/install.htm
● More information: http://www.research.microsoft.com/research/srg/

Microsoft Command and Control Engine

● Platform: Windows 95
● Description: Provides command and control speech recognition using SAPI (the Microsoft
Speech API) and "Whisper", Microsoft's speech recognition technology. Features include:
❍ Speaker independent, continuous, sub-word modeling, context free grammars

❍ Has its own letter-to-sound rules, so it can recognize any word in a grammar.

❍ North American English

❍ PC microphone and telephone speech recognition with high performance

❍ Word spotting option

❍ Results objects containing top-N choices, segmentation, and confidence

❍ Written to SAPI, the Microsoft Speech API.

● Requirements: Windows 95 or Windows NT 4.0, Pentium 60 or better (RISC builds are
available), 1.5 MB working set, 16 kHz or 8 kHz input signals, 6 MB on disk.
Requires the Microsoft Speech SDK to use.
● Availability: Free demo software is available at:
http://www.research.microsoft.com/research/srg/install.htm
● More information: http://www.research.microsoft.com/research/srg/
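The "results objects containing top-N choices, segmentation, and confidence" listed for the Command and Control Engine can be pictured as a small data structure. The Python sketch below is a hypothetical model of such a result, not the actual SAPI interfaces; the class names, the 0.0 to 1.0 confidence scale, the millisecond word timings and the `accept` threshold are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Hypothesis:
    """One entry in an N-best list (hypothetical structure, not SAPI's)."""
    text: str
    confidence: float  # assumed to lie in [0.0, 1.0]
    # Word segmentation as (word, start_ms, end_ms) triples.
    segments: List[Tuple[str, int, int]] = field(default_factory=list)


@dataclass
class RecognitionResult:
    n_best: List[Hypothesis]

    def top(self, n: int) -> List[Hypothesis]:
        """Return the n highest-confidence hypotheses, best first."""
        ranked = sorted(self.n_best, key=lambda h: h.confidence, reverse=True)
        return ranked[:n]

    def accept(self, threshold: float) -> Optional[Hypothesis]:
        """Return the best hypothesis, or None if it falls below threshold."""
        best = self.top(1)
        if best and best[0].confidence >= threshold:
            return best[0]
        return None


if __name__ == "__main__":
    result = RecognitionResult(n_best=[
        Hypothesis("open files", 0.11),
        Hypothesis("open file", 0.82, [("open", 0, 310), ("file", 310, 640)]),
    ])
    print(result.top(1)[0].text)  # open file
    print(result.accept(0.9))     # None
```

Keeping the whole N-best list, rather than only the single best string, lets an application fall back to the second choice or ask the user to confirm when confidence is low.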

Administrivia, Copyright, Submit Information : Last Revision: 15:59 21-Apr-1997

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/microsoft.html (2 of 2) [10/31/2003 8:48:25 AM]


NCC Dictate

● Platform: Windows
● Description: NCC Digital Dictate(TM) is an add-on, enhanced interface for use with IBM's
VoiceType(TM) Dictation for Windows and various Windows 3.1 applications (e.g. MS Word,
WordPerfect). Digital Dictate(TM) provides faster corrections and dictation rates and various
other features. This version is not a stand-alone product; it requires VoiceType(TM) Dictation to
provide the speech recognition engine, as well as the Windows application itself. Features include:
❍ Direct dictation into Windows applications with access to all functions while dictating.

❍ Versions for MS Word, WordPerfect, Ami Pro, and other Windows applications.

❍ Speech enabled editing.

❍ Capability to save speaker models and defer corrections.

❍ Microphone "pause and restore" functions controlled with speech commands.

❍ Add-on vocabularies for legal, medical, science and business.

❍ SWITCH-IT(TM) foot pedal control or CardSwitch(TM) infrared wireless control
available for switching between dictation and proofing/correction modes.


● Requirements: IBM's VoiceType(TM) Dictation for Windows; a computer system meeting
VoiceType(TM) Dictation for Windows requirements; VoiceType(TM) Dictation Adapter.
● Availability: Through computer dealerships.
● Price: $US295
● Contact: NCC Incorporated
5808 E. Turquoise, Scottsdale, AZ 85253
Ph: (602) 922-6236 Fax: (602) 596-9050

Administrivia, Copyright, Submit Information : Last Revision: 13:26 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/ncc.dictate.html [10/31/2003 8:48:25 AM]


Phonetic Engine 500 (PE500) from Speech Systems, Inc.

● Platform: Windows
● Description: Speaker-independent, 40,000-word vocabulary, continuous speech recognition for
MS Windows. High-perplexity grammars are possible. Includes noise rejection. Uses a
proprietary DSP board.
● Cost: Prices in US$ - quantity one. The PE500 SDK is $995.00 including board, microphone,
and runtime software. Runtime only is $595.00. SpeechWizard(r) adds speech input to existing
Windows applications, $295.00. Two-day training: $295.00 with purchase, $595.00 without.
● Misc: The user defines the grammar of allowed utterances and must write software to invoke
the board driver functions that control recognition. The user must also write software to
collect/parse/interpret the ASCII text strings returned when recognition succeeds.
● Misc 2: SSI now offers speech application development services.
● Contact:
Speech Systems, Inc.
2945 Center Green Court South
Boulder, CO 80301-2275, USA
Tel: 303.938.1110 Fax: 303.938.1874
http://www.speechsys.com
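The division of work described under Misc for the PE500 (the application defines the grammar of allowed utterances, then collects and interprets the ASCII strings returned on successful recognition) can be sketched in Python. The two-word command grammar, the word lists and the `interpret` function are hypothetical; the sketch does not show the PE500 grammar format or its board driver functions.

```python
from typing import Optional, Tuple

# Hypothetical command grammar: <verb> <object>. The real PE500 grammar
# format and driver calls are not represented here.
VERBS = {"open", "close", "print"}
OBJECTS = {"file", "window", "document"}


def interpret(result: str) -> Optional[Tuple[str, str]]:
    """Map an ASCII recognition string to a (verb, object) command.

    Surrounding whitespace (including a trailing CR/LF) and letter case are
    ignored. Returns None when the string does not match the grammar.
    """
    words = result.strip().lower().split()
    if len(words) != 2:
        return None
    verb, obj = words
    if verb in VERBS and obj in OBJECTS:
        return verb, obj
    return None


if __name__ == "__main__":
    print(interpret("OPEN FILE\r\n"))  # ('open', 'file')
    print(interpret("eat file"))       # None
```

Validating the returned string against the same grammar the recognizer was given is a cheap safety check before the application acts on a command.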

Administrivia, Copyright, Submit Information : Last Revision: 14:03 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/pe500.html [10/31/2003 8:48:26 AM]


Philips Speech Recognition (2 products)

SpeechMagic: Dictation

● Platform: Windows 3.1 and higher


● Description: A continuous speech recognizer providing a 64,000 word vocabulary, speaker
adaptation and multiple languages. SpeechMagic is currently available for English and
German.
SpeechMagic acts as a server application, processing speech input and providing text output.
Uses an add-on ISA-compatible recognition accelerator board. SpeechMagic provides a
correction editor, editing and playback of recordings, and a vocabulary manager for entering
new words, abbreviations, macros and special transcriptions (e.g. for foreign words). Windows
DDE support and a native API are provided for integration.
● Hardware Requirements: IBM-compatible personal computer (486DX/66 MHz or higher),
minimum 16 MB of RAM, hard disk capacity > 500 MB, and a Philips LFH 6210 Accelerator
Board.
● More Information: For more information visit the SpeechMagic WWW page or the Philips
Speech home page.

Speech Processing System 6000s (Europe only)

● Description: Dictation of medical findings using continuous speech recognition. Designed for
German-speaking radiologists, it encompasses the complete radiology vocabulary. Report
authors use dictation stations (PCs) fitted with microphones. The transcriptionists use
editing stations (also PCs), which are additionally fitted with headphones and footswitches.
The SP6000s has a single speech recognition unit serving all users, and it offers automatic data
transfer as well as the advantages of digital dictation functions.
● More Information: For more information visit the Philips SP6000s WWW page or the Philips
Speech home page.

Administrivia, Copyright, Submit Information : Last Revision: 09:17 18-Nov-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/phillips.html [10/31/2003 8:48:27 AM]


ProNotes Voice Tools

● Platform: Windows
● Description: ProNotes Voice Tools are designed to bring the speech recognition capabilities of
the IBM VoiceType(TM) Dictation System for Windows into any program without the need for
the programmer to directly interface with the speech engine at the API level. There are five
tools, as described below, which are all available in three forms: Visual Basic(TM) Custom
Controls (known as VBXs), 16-bit OLE Custom Controls, and 32-bit OLE Custom Controls.
The tools are intended for use by Windows(TM) developers working with Windows 3.1(TM),
Windows for Workgroups 3.11(TM), Windows NT 3.51 Workstation(TM), and Windows
95(TM). The custom controls can be used with any application development environment
that supports such controls (e.g. Visual Basic and Visual C++).
Playback and Record
An object which allows developers to use the IBM Speech Engine to record and play
back sound files. Can be used to add voice prompts and to allow end users to record and
play back sound files.
Voice Button
An object having standard button properties and behavior, which can additionally be
controlled by voice. The button can also be used as a label or a 3D panel.
Dictation Window
A text box that allows free dictation, voice macro utilization, and correction by voice.
Each Dictation Window has access to global and context sensitive vocabularies for both
command and dictation. There are three correction modes.
Voice List Box
Has standard list box properties and behavior, but can additionally be controlled by
voice. A user can select items by pronouncing the entry's text or the entries can be
numbered and selected accordingly.
Voice Navigator
Provides navigation by voice within an application developed with the Voice Tools,
between voice-enabled objects described above, as well as some standard objects found
within the application.
● Requirements: Hardware: 80486/33 DX or higher, 60MB hard disk space for IBM VoiceType
Dictation software, 10MB hard disk space for ProNotes Voice Tools, 3.5" floppy, VGA (or
compatible), 16MB RAM, IBM VoiceType Dictation adapter, microphone, and speakers.
Software: DOS version 6.0 or later, with SHARE.EXE running, Windows 3.1 or later, IBM
VoiceType Dictation software, any programming environment or system compatible with
Visual Basic or OLE Custom Controls.
● Price: Unknown
● Contact: Pronotes, Inc.
1546 Magee Avenue, Philadelphia, PA 19149, USA
Ph: 800-70-NOTES or +1-215-533-8569, Fax: +1-215-533-1276
Email: proinfo@pronotes.com
WWW: http://www.pronotes.com/

Administrivia, Copyright, Submit Information : Last Revision: 21:15 31-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/pronotes.html (2 of 2) [10/31/2003 8:48:27 AM]


PureSpeech

PureSpeech 2.0 Recognition Engine


● Platform: Windows 3.1, Windows 95, Unix, Dialogic Antares DSP
● Description: Speaker-independent, continuous speech, large active vocabulary speech
recognition engine for American English, UK English, French, German and Spanish. Permits
on-the-fly additions to the vocabulary using phonetic models and telephone or wideband
microphone input. Offers flexible grammars, natural language processing and discourse models.
Software-only, with a small RAM/CPU footprint. Can be used to build voice user interfaces (VUIs)
for PC software applications. Can also be used for high-volume call center telephony, especially
in banking, finance and other specialized applications.
A toolkit for the Dialogic Antares is available.
● Availability: PureSpeech is not available as a stand-alone product. It is available embedded in
Windows-based software or as a toolkit.
● Contact: PureSpeech, Inc.
100 Cambridge Park Drive, Cambridge, MA 02140, USA
Ph: (617) 441-0000 Fax: (617) 441-0001
Email: amy@speech.com
WWW: http://www.speech.com/

Administrivia, Copyright, Submit Information : Last Revision: 02:32 16-Apr-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/purespeech.html [10/31/2003 8:48:28 AM]


smARTspeak from Advanced Recognition Technologies, Inc.

● Platform: Windows, Windows 95, DOS, and General Magic
It also runs on the following processors/microcontrollers: Intel 80x86, Intel 8031 and 8051,
Motorola 68000, and Hitachi SH1, SH3 and SH8.
● Description: smARTspeak is suited to voice command and control applications, such as voice
dialing in cellular and desktop telephones, or voice command operation in computers and
multimedia products. It uses a compact (10 KB on 16-bit machines), fast, user-dependent
recognition engine.
smARTspeak can recognize any language in any accent.
ART recently completed a Software Developer Kit (SDK) for smARTspeak, running under
Windows 3.1 or higher which allows the voice recognition engine to be used within Windows
Applications.
More detailed information on smARTspeak and the SDK is available on the ART WWW
pages.
● Availability: Currently licensed to original equipment manufacturers (OEMs), system
integrators, software and application developers, and value-added resellers (VARs) who port
the technology into their products.
● Contact: Advanced Recognition Technologies, Inc.
International Office:
43 Brodezky Street, POB 39918, 61398 Tel Aviv, Israel
Ph: 972-3-642-7242, Fax: 972-3-642-5887
Email: 100274.3223@Compuserve.com
WWW: http://www.artcomp.com/
US Office:
9574 Topanga Canyon Blvd. Chatsworth, CA 91311, USA
Ph: 818-678-3999, Fax: 818-678-3994
WWW: http://www.artcomp.com/

Last Revision: 13:42 24-Feb-1997

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/smartspeak.html [10/31/2003 8:48:29 AM]


Visual Voice from Stylus Innovation



● Platform: Microsoft Windows
● Description: Visual Voice is a toolkit for building Windows-based voice processing and
telephony applications including interactive voice response (e.g. touch-tone banking), fax-on-
demand, and voice mail. Visual Voice can be used to add voice recognition to your telephony
applications.
Voice Recognition (VR) Support for Visual Voice is exposed as a standard VBX control and
provides one or more voice recognition "resources" to your application. Applications can
dynamically assign resources across several voice lines. Voice recognition is either "discrete"
or "continuous". Discrete recognition is slightly more accurate and requires the speaker to
pause briefly between words. Continuous recognition provides a natural way to enter
information by speaking without pauses. Three configurations are supported:
Software-Only Solution
The software-only solution uses Telaccount's SpeechEasy technology for discrete
recognition on your PC's CPU. A vocabulary with digits, basic command words and more
is included.
Hardware-Assisted Solution with Dialogic AEB boards
Discrete voice recognition in over 25 languages using Dialogic D/41D voice boards and
the Dialogic VR/40 board. Vocabularies are included with digits, basic command
words, voice mail vocabulary and more.
Hardware-Assisted Solution with Dialogic PEB boards
Use the VR control with any Dialogic PEB-based voice board, such as the D/12x or
D/24x, to access voice recognition resources from your phone lines. This requires a
Dialogic VRP board with 1 to 4 VRM/40 modules (4-channel discrete voice
recognition modules) and/or 1 to 4 VRM/2C modules (2-channel continuous voice
recognition modules). You can have up to 4 modules on each VRP: 4 VRM/40s for 16
channels of discrete voice recognition, 4 VRM/2Cs for 8 channels of continuous
recognition, or a combination. Over 25 languages are supported. Includes vocabularies as
described above.
● Pricing: Unknown
● Availability: From Stylus Innovation Inc. or from the distributors listed on the Stylus WWW
pages.
● Misc: More detailed technical information and slide-show demonstration software are available
on the Stylus home page.
● Contact: Stylus Innovation Inc.
One Kendall Square, Building 300, Cambridge, MA 02139
Ph: (617) 621 9545, Fax: (617) 621 7862
WWW: http://www.stylus.com/
Compuserve forum: GO STYLUS
Email: info@stylus.com
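The VRP module arithmetic in the PEB configuration above can be checked with a small helper. This is a minimal sketch, not part of Visual Voice or any Dialogic API; the function name is hypothetical, and it assumes the "up to 4 modules on each VRP" limit applies to VRM/40 and VRM/2C modules combined.

```python
# Channel capacity of one Dialogic VRP board, using the figures quoted above.
# Assumption: the 4-module limit counts VRM/40 and VRM/2C modules together.
VRM40_CHANNELS = 4       # discrete recognition channels per VRM/40 module
VRM2C_CHANNELS = 2       # continuous recognition channels per VRM/2C module
MAX_MODULES_PER_VRP = 4

def vrp_channels(n_vrm40: int, n_vrm2c: int) -> tuple[int, int]:
    """Return (discrete_channels, continuous_channels) for one VRP board."""
    if n_vrm40 < 0 or n_vrm2c < 0:
        raise ValueError("module counts must be non-negative")
    total = n_vrm40 + n_vrm2c
    if total == 0 or total > MAX_MODULES_PER_VRP:
        raise ValueError("a VRP takes between 1 and 4 modules")
    return n_vrm40 * VRM40_CHANNELS, n_vrm2c * VRM2C_CHANNELS
```

For example, `vrp_channels(2, 2)` gives 8 discrete plus 4 continuous channels, a mixed configuration between the two extremes quoted above.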


Last Revision: 12:05 16-Oct-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/stylus.html (2 of 2) [10/31/2003 8:48:30 AM]


VoiceAssist for Windows from Creative Labs, Inc.



● Platform: Windows
● Description: Seeking a description.
● Availability: VoiceAssist preview software is available from the Creative Labs VoiceAssist
home page.
● Contact: Creative Labs, Inc.
Ph: 1-800-998-1000 (Sales)
Ph: 1-800-998-5227 (Product info and dealer referrals)
CompuServe: support forum: GO BLASTER
WWW: http://www.creaf.com/

Last Revision: 10:36 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/voiceassist.html [10/31/2003 8:48:30 AM]


VoiceServer for Windows



● Platform: Windows
● Description: Speaker-dependent, with a separate directory for each user. Isolated words. Up to
1000 words per user and 300 words per window; each word occupies 2Kb on hard disk. Can be used
to control Windows applications by issuing voice commands instead of making menu selections.
● Rough Cost: £292 (UK)
● Requirements: None
● Misc: Price includes a half-sized AT voice card (including a DSP), software, documentation &
a microphone (attachable to keyboard or speaker). A light-weight high-spec headset is an
optional extra.
● Contact:
Mark Redwood
Applied Voice Technologies
26 Danbury Street, Islington,
London, UK, N1 8JU
Ph: + 44 71 454 1224 : Fax: + 44 71 454 1225

Last Revision: 14:14 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/voiceserver.html [10/31/2003 8:48:31 AM]


Whisper

See the new page for Microsoft speech recognition software.

● Platform: Windows 95 and Windows NT 4.0


● Description: Command and control recognition.

Last Revision: 16:04 21-Apr-1997

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/whisper.html [10/31/2003 8:48:32 AM]


WildCard Speech Products



● Platform: Windows 3.1 and Windows 95
● OfficeTalk for Windows: provides voice commands for dictation, navigation, command and
control, and formatting in business computing. Gives users voice access to a wide
variety of applications in the office suites from Microsoft, Novell/WordPerfect, and
Lotus. More information on the WildCard OfficeTalk page.
● LawTalk for Windows: adds features and interfaces that meet the specific needs of legal users.
More information on the WildCard LawTalk page.
● VoiceCompanion for the Internet: Surf the net using voice commands. Controls browsers such as
Netscape Navigator and Microsoft Internet Explorer. More information on the VoiceCompanion web page.
● VoiceCompanion - RemoteAccess: Remote access to your desktop PC over the telephone, for
voice mail, fax forwarding and address book information. More information on the
VoiceCompanion web page.
● Availability: WildCard Technologies Inc.
180 West Beaver Creek Road, Richmond Hill, Ontario, Canada L4B 1B4
Phone: (905) 731-6444, Fax: (905) 731-7017
Email: sales@wildcardtech.com
WWW: http://www.wildcardtech.com/

Last Revision: 13:19 25-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/wildcard.html [10/31/2003 8:48:33 AM]


DATAVOX - French

● Platform: PC / DOS
● Description: Continuous speech - speaker independent or dependent.
● Requirements: 2 PC format boards (RdF1000 and TdS 96/25) and an A/D - D/A module
(ASA116)
● Misc: Application software can communicate with DATAVOX through two types of interface:
❍ Keyboard overlay: The application software may be used with any PC-compatible
package. No specific adaptation is necessary; you only need to define your
configuration with the application software.
❍ C library: Allows a user-written program to drive the recognition system.

DATAVOX is based on the AMADEUS speech recognition software developed at LIMSI. It
provides:
❍ Continuous speech recognition with 500 words speaker-dependent or 50 words speaker-
independent (custom-made vocabulary).
❍ A grammar of the application language (syntax acquisition, verification and
simplification software).
❍ Large vocabulary: DATAVOX can recognize vocabularies of several thousand words
as long as there are no more than 500 words in the active vocabulary at any given node.
It takes less than 1 second to change syntax and vocabulary.
❍ Training controlled by the system (using co-articulation models).
❍ Response time of less than 500 ms for any phrase length.
❍ Synthesis (ADPCM) can be heard simultaneously while recognition is being carried out.

● Contact: VECSYS
Le Chene rond, 91570 Bievres, France
Voice: 33 1 69 41 15 04, Fax: 33 1 69 41 24 30
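The DATAVOX "large vocabulary" rule (several thousand words in total, but at most 500 active at any node of the syntax) can be illustrated with a toy syntax network. This is a hypothetical sketch: the dictionary-of-nodes representation and the function name are assumptions for illustration, not the DATAVOX syntax format.

```python
# Hypothetical syntax network: each node lists the words allowed at that point.
# The total vocabulary may be large, but each node's active vocabulary must
# stay within the 500-word limit quoted for DATAVOX.
MAX_ACTIVE_WORDS = 500

def check_syntax(nodes: dict[str, set[str]]) -> tuple[int, list[str]]:
    """Return (total vocabulary size, sorted names of nodes over the active limit)."""
    vocabulary = set().union(*nodes.values())
    too_big = sorted(name for name, words in nodes.items()
                     if len(words) > MAX_ACTIVE_WORDS)
    return len(vocabulary), too_big
```

Three nodes of 400 distinct words each give a 1200-word vocabulary with no violations, because the recognizer only ever searches one node's 400 words at a time.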

Last Revision: 13:38 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/datavox.html [10/31/2003 8:48:33 AM]


Jialong He's Speech Recognition Research Tool



● Platform: SUN SPARC (SunOS), PC (MSDOS)
● Description: This is a speech recognition research tool. It contains a feature extraction program
and three speech recognizers: a DTW recognizer, a discrete hidden Markov model (DHMM)
based recognizer, and a continuous density hidden Markov model (CHMM) based recognizer
with Gaussian mixture densities. The utilities are grouped as:
❍ feature -- extract feature vectors from a speech signal (MFCC etc.)
❍ dtwcmp -- dynamic time warping (DTW) comparison.
❍ gensym -- turn vector sequences into discrete observation symbols.
❍ dhmm -- discrete HMM training program.
❍ dtest -- DHMM companion test program.
❍ chmm -- continuous density HMM training program.
❍ viterbi -- CHMM companion test program.
Note: this is a research tool, not a complete speech recognition system.
● Availability: By anonymous ftp:
MSDOS Version
UK: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spchtool.zip
Germany: ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/spchtool.zip
Sun SPARC version, compiled with GNU C
UK: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spch_sun_v1.tar.gz
Germany: ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/speech_sun_v1.tar.gz
● See also: Jialong He's Speaker Recognition (Identification) Tool
● Contact: Jialong He
email: jialong@neuro.informatik.uni-ulm.de
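The DTW comparison performed by a tool like `dtwcmp` can be sketched as follows. This is a generic textbook version of dynamic time warping with a Euclidean frame distance and the three standard step moves, not Jialong He's code; the function names are hypothetical.

```python
import math

def dtw_distance(a: list[list[float]], b: list[list[float]]) -> float:
    """Classic DTW distance between two feature-vector sequences (e.g. MFCC frames)."""
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative distance aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])      # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j],     # a advances, b frame repeats
                                 cost[i][j - 1],     # b advances, a frame repeats
                                 cost[i - 1][j - 1]) # both advance
    return cost[n][m]

def recognize(test: list[list[float]], templates: dict[str, list[list[float]]]) -> str:
    """Isolated-word recognition: pick the template with the smallest DTW distance."""
    return min(templates, key=lambda word: dtw_distance(test, templates[word]))
```

Because the alignment may repeat frames, an utterance spoken more slowly than its template (for example `[[0], [1], [1], [2]]` against `[[0], [1], [2]]`) still scores a distance of zero.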

Last Revision: 13:49 31-May-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/jialong.html [10/31/2003 8:48:34 AM]


Votan VPC2100 Voice Card and VSP 1010 Speech Processor

● Platform: DOS
● VPC2100 Voice Card: a hardware and software system based on the TMS320C10, providing
continuous speech recognition. The VPC2100 consists of a circuit board, microphone, speaker,
software, and documentation. It is designed to add voice I/O and telephone management
capabilities to the PC/AT and compatibles. Features:
❍ Voice store-and-forward at 4 to 16.4 kb/s
❍ Speaker-independent speech recognition (0-9, YES, NO)
❍ Continuous speaker-dependent speech recognition
❍ Telephone interface: pulse or tone dialing, call progress, and DTMF
❍ Software for development, voice mail, telephone management, and VoiceKey
❍ High-level application-generator software

● Votan VSP 1010 speech-processor board: can service a single voice channel, providing
recognition, voice output, and telephone interfacing. Digital signal processing is performed by
a TMS320 integrated circuit.
● Costs: Unknown
● WWW: http://www.ti.com/sc/docs/dsps/develop/3rdparty/vot.htm
● Contact: Votan Division, MOSCOM Corporation
6920 Koll Center Parkway, Suite 214, Pleasanton, CA 94566, USA
Ph: +1-510-426-5600, Fax: +1-510-426-6767
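The VPC2100's quoted store-and-forward rates translate into storage requirements as shown below. This worked sketch assumes 1 kb = 1000 bits and 1 kB = 1000 bytes and ignores any file or framing overhead; the function name is hypothetical.

```python
def storage_bytes(bit_rate_bps: float, seconds: float) -> float:
    """Bytes needed to store `seconds` of speech coded at `bit_rate_bps`."""
    return bit_rate_bps * seconds / 8

# The two ends of the quoted 4 to 16.4 kb/s range, for one minute of speech.
for rate in (4_000, 16_400):
    print(f"{rate / 1000:g} kb/s -> {storage_bytes(rate, 60) / 1000:g} kB per minute")
# prints:
# 4 kb/s -> 30 kB per minute
# 16.4 kb/s -> 123 kB per minute
```

The lowest rate thus stores about four times as much speech per unit of disk as the highest.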

Last Revision: 14:14 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/votan.html [10/31/2003 8:48:35 AM]


AbbotDemo

● Platform: SunOS4, IRIX, Linux, HP-UX
● Description: Large vocabulary, speaker-independent, continuous automatic speech recognition
system. Uses recurrent neural networks and hidden Markov models with a 5,000-word
vocabulary (upgradable) and a trigram word grammar. Includes a front end for waveform
capture and display (including spectrogram), a graphical display of the phoneme
representation, and a display of the best-guess word sequence that is rewritten as recognition proceeds.
● Requirements: UN*X, X, 8 Mbyte free RAM, 486DX or faster processor, 16 bit soundcard,
reasonable quality microphone and a copy of the Wall Street Journal newspaper.
● Price: Free for non-commercial use
● Availability: By anonymous ftp from
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/AbbotDemo
● Note 1: This is not a complete system for dictation.
● Note 2: At present there are no sources with this distribution. For sources for an earlier version
see the recnet entry.
● Note 3: Not supported.
● Contact: AbbotDemo@compute.demon.co.uk
Tony Robinson
Cambridge University Engineering Department
Trumpington Street, Cambridge, CB2 1PZ, UK
Tel: +44-1223-332815 Fax: +44-1223-332662
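A trigram word grammar of the kind AbbotDemo uses assigns each word a probability given the two preceding words. The sketch below is a plain maximum-likelihood estimate with sentence-boundary padding; real recognizers smooth these estimates (for example with back-off), which is omitted here, and nothing in it reflects AbbotDemo's actual model files.

```python
from collections import Counter

class TrigramLM:
    """Maximum-likelihood trigram model with sentence-boundary padding (no smoothing)."""

    def __init__(self, sentences: list[list[str]]):
        self.tri = Counter()   # counts of (w1, w2, w3)
        self.hist = Counter()  # counts of (w1, w2) used as a history
        for words in sentences:
            padded = ["<s>", "<s>"] + words + ["</s>"]
            for i in range(2, len(padded)):
                self.tri[tuple(padded[i - 2:i + 1])] += 1
                self.hist[tuple(padded[i - 2:i])] += 1

    def prob(self, w1: str, w2: str, w3: str) -> float:
        """P(w3 | w1, w2); 0.0 for a history never seen in training."""
        n = self.hist[(w1, w2)]
        return self.tri[(w1, w2, w3)] / n if n else 0.0
```

Trained on "stocks rose" and "stocks fell", the model gives P(rose | `<s>`, stocks) = 0.5, since each continuation was seen once after that history.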

Last Revision: 00:25 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/abbotdemo.html [10/31/2003 8:48:35 AM]


BBN Hark Telephony Recognizer



● Platform: Available for Unix-based workstation and PC platforms including IBM
RS6000/AIX and Pentium/SCO Unix.
● Description: Large vocabulary (2,000+ words), speaker-independent, continuous ASR
software, specifically designed for large-scale telephony applications. It uses a client/server
architecture, with all features and capabilities integrated in one software product instead of on
separate boards. It is very memory-efficient: the Hark Telephony Recognizer runs in as little as
2MB of physical memory, and multiple recognizers can run on a single platform. Uses hidden
Markov model and phoneme-based BBN recognition algorithms. An API is provided for
integration with existing applications. A developer's toolkit is available.
● Price and availability: Price varies depending on vocabulary size. Version 3.0 available
immediately.
● Misc: BBN Hark provides application design and human factors consulting services. Regular
monthly training classes on developing speech-enabled applications are held at BBN Hark's
Cambridge (Mass) headquarters.
● WWW: For additional information see BBN Hark's home page.
● Contact: BBN Hark Systems
70 Fawcett Street, Cambridge, MA 02138, USA
Tel: 617-873-4636 Fax: 617-873-2473
WWW: http://www.bbn.com/bbn_hark/HarkHome.html

Last Revision: 14:23 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/bbn.hark.html [10/31/2003 8:48:36 AM]


EARS: Single Word Recognition Package



● Platform: Linux and other Unix systems with the Voxware sound driver
● Description: Intended as a limited, ready-to-use single word recognizer, but designed as a
platform for the various methods used in speech recognition (SR). EARS is meant to be a
flexible environment for recognition system components: for example, take this feature
extractor, that recognition method, and this list of words. New single word recognition methods
can be integrated easily, as EARS uses C++ abstract base classes. To train it, you speak the
words you later want recognized. Your utterances can be saved as RIFF WAV files so that you
can inspect, change or delete them before they are processed into the pattern files on which the
recognizer is finally trained. As of version 0.20, the feature extractors are: Rasta-PLP, PLP,
LPC and Mel-Cepstrum. The implemented recognizers are DTW and non-recurrent neural nets
on fixed-size sound patterns.
● Requirements: Soundcard with mic
● Misc 1: The current version is an Alpha release.
● Misc 2: For more information subscribe to the EARS mailing list. Send email to
majordomo@phil.uni-sb.de with "subscribe ears-list" in the body.
● Misc 3: Niels Thorwirth (thorwir@pi4.informatik.uni-mannheim.de) has made changes to
Version 0.14 which support the AF audio server software (see Q1.11) and the OGI Speech
Tools (see Q1.9) so that EARS is more portable to other UNIX platforms. Available by email
to Niels.
● Availability: Source and Linux binaries are available by anonymous ftp
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/ears-0.26.tar.gz
ftp://sunsite.unc.edu/pub/Linux/apps/sound/speech/ears-0.26.tar.gz
● Contact: Ralf W. Stephan: ralf@ark.franken.de
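The "this feature extractor, that recognizing method" design can be sketched with abstract base classes. EARS itself is C++; this Python version only mirrors the idea, and every class name and the toy extractor/recognizer below are hypothetical, not EARS code.

```python
from abc import ABC, abstractmethod

class FeatureExtractor(ABC):
    @abstractmethod
    def extract(self, samples: list[float]) -> list[list[float]]:
        """Turn raw samples into a sequence of feature vectors."""

class Recognizer(ABC):
    @abstractmethod
    def train(self, patterns: dict[str, list[list[float]]]) -> None: ...

    @abstractmethod
    def recognize(self, features: list[list[float]]) -> str: ...

class FrameEnergy(FeatureExtractor):
    """Toy extractor: mean energy of non-overlapping 4-sample frames."""
    def extract(self, samples):
        return [[sum(x * x for x in samples[i:i + 4]) / 4]
                for i in range(0, len(samples) - 3, 4)]

class NearestPattern(Recognizer):
    """Toy recognizer: nearest stored pattern by summed frame distance."""
    def train(self, patterns):
        self.patterns = patterns

    def recognize(self, features):
        def dist(p):
            return (sum(abs(x[0] - y[0]) for x, y in zip(features, p))
                    + abs(len(features) - len(p)))
        return min(self.patterns, key=lambda w: dist(self.patterns[w]))

class WordRecognizer:
    """Combine any extractor with any recognizer."""
    def __init__(self, extractor: FeatureExtractor, recognizer: Recognizer):
        self.extractor, self.recognizer = extractor, recognizer

    def train(self, utterances: dict[str, list[float]]) -> None:
        self.recognizer.train({w: self.extractor.extract(s)
                               for w, s in utterances.items()})

    def recognize(self, samples: list[float]) -> str:
        return self.recognizer.recognize(self.extractor.extract(samples))
```

Swapping in a different extractor (say, a cepstral one) or a DTW recognizer only requires another subclass; `WordRecognizer` does not change.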

Last Revision: 12:04 16-Oct-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/ears.html [10/31/2003 8:48:37 AM]


Hidden Markov Model Toolkit (HTK) from Cambridge University Engineering Department

HTK (Hidden Markov Model Toolkit)


● Platform: Range of Unix platforms.
● Description: HTK is a portable toolkit for building and manipulating hidden Markov models.
HTK is primarily used for speech recognition research although it has been used for numerous
other applications including research into speech synthesis, character recognition and DNA
sequencing. HTK consists of a set of library modules and tools available in C source form. The
tools provide sophisticated facilities for speech analysis, HMM training, testing and results
analysis. The software supports HMMs using both continuous density mixture Gaussians and
discrete distributions and can be used to build complex HMM systems. The HTK release
contains extensive documentation and examples. HTK was originally developed at the Speech
Vision and Robotics Group of the Cambridge University Engineering Department (CUED),
where it has been used to build CUED's large vocabulary speech recognition systems.
● Misc 1: HTK is available free of charge and can be downloaded from the HTK web site:
http://htk.eng.cam.ac.uk
● Cost: free.
● Contact:
Speech Vision and Robotics Group
Department of Engineering,
University of Cambridge,
Trumpington St.,
Cambridge CB2 1PZ.
United Kingdom.
email - htk-mgr@eng.cam.ac.uk
WWW: http://htk.eng.cam.ac.uk/
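In a continuous density HMM like those HTK builds, each emitting state scores a feature vector with a mixture of Gaussians. The sketch below computes that log-likelihood for diagonal covariances using log-sum-exp for numerical stability; it is a generic illustration, not HTK's C code or any HTK API.

```python
import math

def log_gmm(x, weights, means, variances):
    """Log-likelihood of vector x under a diagonal-covariance Gaussian mixture,
    i.e. the output distribution of one continuous-density HMM state."""
    comps = []
    for w, mu, var in zip(weights, means, variances):
        log_norm = -0.5 * sum(math.log(2 * math.pi * v) for v in var)
        log_exp = -0.5 * sum((xi - mi) ** 2 / v for xi, mi, v in zip(x, mu, var))
        comps.append(math.log(w) + log_norm + log_exp)
    top = max(comps)  # log-sum-exp: factor out the largest term
    return top + math.log(sum(math.exp(c - top) for c in comps))
```

Working in the log domain matters because products of many small per-frame probabilities would otherwise underflow.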

Last Revision: 00:26 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/htk.html [10/31/2003 8:48:38 AM]


NICO Artificial Neural Network Toolkit



● Platform: UNIX (ANSI C source code)
● Description: The NICO Toolkit is an artificial neural network toolkit specifically designed and
optimized for automatic speech recognition applications. Networks with both recurrent
connections and time-delay windows are easily constructed. The network topology is flexible --
any number of layers is allowed and layers can be arbitrarily connected. Tools for extracting
input features from the speech signal are included, as well as tools for computing target values
from standard phonetic label files.
● Availability: Through the NICO homepage (http://www.speech.kth.se/NICO/index.html)
or the download page.
● Contact: Nikko Strom, nikko@speech.kth.se
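Computing network target values from phonetic label files typically means turning labelled segments into one target vector per frame. This is a minimal sketch, assuming TIMIT-style lines of the form "start end phone" with times in samples and a 160-sample frame shift (10 ms at 16 kHz); the function name is hypothetical and this is not how the NICO tools are invoked.

```python
def frame_targets(phn_lines, phones, frame_shift=160, n_frames=None):
    """Convert "start end phone" label lines (times in samples) into one-hot
    target vectors, one per frame. A frame takes the label covering its first sample."""
    index = {p: i for i, p in enumerate(phones)}
    segments = []
    for line in phn_lines:
        start, end, phone = line.split()
        segments.append((int(start), int(end), phone))
    if n_frames is None:
        n_frames = max(e for _, e, _ in segments) // frame_shift
    targets = []
    for f in range(n_frames):
        t = f * frame_shift
        label = next((p for s, e, p in segments if s <= t < e), None)
        vec = [0.0] * len(phones)
        if label is not None:
            vec[index[label]] = 1.0  # frames outside every segment stay all-zero
        targets.append(vec)
    return targets
```

For the two lines "0 320 h#" and "320 480 ae", this yields three frames labelled h#, h#, ae.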


http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/nico.html [10/31/2003 8:48:39 AM]


Nuance Speech Recognition System



● Platform: UNIX-based workstations including Sun and SGI.
● Description: The Nuance Recognizer features a client-server architecture with multiple
recognizers available on a single processing platform. Primarily developed for telephony-based
applications, the system accepts speaker-independent, continuous speech and supports very
large vocabularies. Included is a "template matching" natural language capability for
identifying the meaning of speech. A toolkit is available for use in developing a wide variety of
speech recognition applications.
● Price and availability: Contact Nuance
● Contact: Nuance Communications
1380 Willow Road, Menlo Park, CA 94025, USA
Ph: +1-650-847-0000, Fax: +1-650-847-7979
WWW: http://www.nuance.com/
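The "template matching" natural language capability maps a recognized word string to its meaning. The sketch below shows the general idea with regular-expression templates that fill named slots. The templates, intents and function name are invented for illustration and say nothing about how Nuance's own capability is specified.

```python
import re

# Hypothetical templates: each maps a word pattern to an intent with named slots.
TEMPLATES = [
    (re.compile(r"(?:transfer|move) (?P<amount>\w+) dollars to (?P<account>savings|checking)"),
     "transfer"),
    (re.compile(r"what is my (?P<account>savings|checking) balance"), "balance"),
]

def interpret(words: str):
    """Return (intent, slots) for the first matching template, else None."""
    for pattern, intent in TEMPLATES:
        m = pattern.search(words.lower())
        if m:
            return intent, m.groupdict()
    return None
```

Because `search` looks anywhere in the string, extra words such as "please" before a template do not prevent a match.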

Last Revision: 11:05 02-Oct-1997

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/nuance.html [10/31/2003 8:48:40 AM]


recnet

● Platform: UNIX
● Description: Speech recognition for the speaker independent TIMIT and Resource
Management tasks. It uses recurrent networks to estimate phone probabilities and Markov
models to find the most probable sequence of phones or words. The system is a snapshot of
evolving research code. There is no documentation other than published research papers. The
components are:
❍ A preprocessor which implements many standard and non-standard front-end
processing techniques.
❍ A recurrent net recogniser and parameter files.
❍ Two Markov model based recognisers, one for phone recognition and one for word
recognition.
❍ A dynamic programming scoring package.
The complete system performs competitively.
● Cost: Free
● Requirements: TIMIT and Resource Management databases
● Contact: Tony Robinson: ajr@eng.cam.ac.uk
● Availability: by anonymous ftp
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/recnet-1.3.tar.Z
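Combining network phone probabilities with Markov models, as recnet does, is commonly done by dividing the network's posteriors by the phone priors to get scaled likelihoods and then running Viterbi decoding. The sketch below shows that general hybrid technique for phone recognition; it is not recnet's code, and the argument layout is an assumption.

```python
import math

def safe_log(p: float) -> float:
    return math.log(p) if p > 0 else -math.inf

def viterbi_phones(posteriors, priors, transitions, initial):
    """Most probable phone sequence, one phone index per frame.
    posteriors[t][j]: network estimate of P(phone j | frame t)
    priors[j]: P(phone j); transitions[i][j], initial[j]: Markov model probabilities."""
    n = len(priors)

    def emit(t, j):  # log of the scaled likelihood P(j | x_t) / P(j)
        return safe_log(posteriors[t][j]) - safe_log(priors[j])

    score = [safe_log(initial[j]) + emit(0, j) for j in range(n)]
    back = []
    for t in range(1, len(posteriors)):
        prev, score, ptrs = score, [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + safe_log(transitions[i][j]))
            ptrs.append(best)
            score.append(prev[best] + safe_log(transitions[best][j]) + emit(t, j))
        back.append(ptrs)
    j = max(range(n), key=lambda k: score[k])  # best final state
    path = [j]
    for ptrs in reversed(back):  # trace back through the stored pointers
        j = ptrs[j]
        path.append(j)
    return path[::-1]
```

With two phones, "sticky" transitions (0.9 self-loop) and posteriors that shift from phone 0 to phone 1 halfway through, the decoder returns `[0, 0, 1, 1]`.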

Last Revision: 00:27 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/recnet.html [10/31/2003 8:48:40 AM]


HM2007 - Speech Recognition Chip



● Platform: Integrated circuit.
● Description: HM2007 is a 48-pin single-chip CMOS voice recognition LSI circuit with on-chip
analog front end, voice analysis, recognition process and system control functions. A 40-word
isolated-word voice recognition system can be composed of an external microphone, keyboard,
SRAM and a few other components. When combined with a microprocessor, an intelligent
recognition system can be built. A demo board for this chip is being distributed by The Summa
Group.
● Cost: Approx US$16 for the HM2007 and US$160 for the demo board.
● Misc: Jean-Pierre Lereboullet's document on Voice Recognition Processors provides additional
information on the HM2007.
● Producer: HUALON Microelectronic Corp. USA
Tel: (415) 288 0390 Fax: (415) 288-0399
● Distributor 1: Marywale Engineering Company
Tel: (602) 247 4451 Fax: (602) 247 6167
Email: meco@indirect.com
● Distributor 2: The Summa Group Limited
One California Street, Suite #1940,
San Francisco, CA 94111
Ph: (415) 288-0390
● Distributor 3: Images Company
39 Seneca Loop, Staten Island, NY 10314, USA
Ph: +1-718-698-8305, Fax: +1-718-982-6145
Sells single-piece quantities of the HM2007 48-pin DIP chip and the HM2007 52-pin PLCC chip.
Sells HM2007 demo kits: $100.00 unassembled and $135.00 assembled (using the 48-pin DIP
chip).

Last Revision: 13:52 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/hm2007.html [10/31/2003 8:48:41 AM]


OKI VRP6679 - Speech Recognition Chip

OKI VRP6679 - Voice Recognition Processor


● Platform: Integrated circuit.
● Description: Speech recognition IC. 25 words max. Speaker independent recognition
capability. Recognition rate quoted as 97% in a noisy environment (e.g. a car).
● Misc: Also known as the MSM6679.
● Misc 2: More information is provided in Jean-Pierre Lereboullet's document on Voice
Recognition Processors.
● Cost: Approx US$20. Demo board $876
● Availability: OKI Semiconductor and OKI Distributors
Corporate Headquarters
785 North Mary Avenue, Sunnyvale, CA 94086-2909
Tel: (408) 720 1900, Fax: (408) 720 1918

Last Revision: 13:58 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/oki.html [10/31/2003 8:48:42 AM]


Sensory Inc. Integrated Circuits



● Platform: Integrated circuits
● Description: Sensory's low-cost, high-quality Interactive Speech line of speech recognition ICs
is designed for consumer telephony products, portable consumer electronics, and other
consumer applications. Technologies available include speech recognition (speaker-
independent and speaker-dependent), speaker verification, speech/music synthesis, digital
record/playback, and general product control on one chip. Development tools and
demonstration units are available. Detailed product information on the Interactive Speech chips
is available from the Sensory Circuits WWW site.
● Contact: Sensory, Inc.
521 E. Weddell Drive, Sunnyvale, CA 94089
Ph: +1-408-744-9000, Fax: +1-408-744-1299
Email: Sales@SensoryInc.com
WWW: http://www.sensoryinc.com/

Last Revision: 17:48 05-Sep-1997

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/sensory.html [10/31/2003 8:48:43 AM]


Speech Commander - Verbex Voice Systems



● Platform: Various: external hardware with serial port connection
● Description: A hand-held (portable) device about the size of a paperback book which provides
speaker-dependent continuous speech recognition. The active vocabulary is dependent on the
model chosen and can vary from 300 to 10,000 active words. The device connects through a
serial port, so it can be connected to a wide range of computers. It comes with a battery pack.
● Contact: Verbex Voice Systems
1090 King Georges Post Rd., Bldg 107,
Edison NJ 08837, USA
Ph: (908) 225-5225, Fax: (908) 225-7764
Email: sales@listen.verbex.com
WWW: http://www.verbex.com/

Last Revision: 14:13 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/speech.commander.html [10/31/2003 8:48:43 AM]


Voice Control Systems Recognition

Voice Control Systems Continuous Speech Recognition
● Description: Voice Control Systems (VCS) continuous speech recognition is a proprietary
phonetic recognizer based on technology developed at VCS over the last 17 years. It is robust
for applications such as the "hands-free" automotive environment or telephone networks, both
wireless and wireline. VCS speech recognition is used by many developers and manufacturers
in telecommunications. VCS technology is a software-based capability which VCS has so far
developed for a limited number of processing environments. VCS offers "off-the-shelf"
capabilities for the TI-C3X and C4X DSPs, with support for other hardware platforms
planned. As a benchmark, today's VCS continuous technology requires about half
of a 33 MHz TMS320C31. VCS continuous technology is available in cellular and wireline
based libraries for continuous digit input in approximately 15 languages. VCS continuous
recognition uses a modified HMM decision strategy built on the VCS phonetic
"front end".
● Availability: VCS continuous technology is available today in software form from VCS or
implemented in hardware or speech systems from VCS distributors including Dialogic
Corporation, Brite Voice, Intervoice, Periphonics, and Syntellect.
● Cost: Software royalties are volume-based and range from $500 per recognizer in unit
quantities to less than $5 per recognizer in large quantities.
● See also: the VCS Phonetic Dictionary Recognizer and VCS Isolated Word Speech
Recognition below, and the VCS 2030 & 2060 Voice Dialers.
● Contact: Voice Control Systems, Inc.
14140 Midway Rd., Dallas, Tx. 75244, USA
Ph: +1-214-386-0300, Fax: +1-214-386-5555
Email: sales@vcsi.com
WWW: http://www.voicecontrol.com/

Voice Control Systems Phonetic Dictionary Recognizer
● Description: This recognizer is based on an HMM-type recognition strategy coupled with the
VCS "front end" (feature extraction software). The HMM modeling is based on the basic
phonetic building blocks of each language; in American English there are approximately 43 units.
The recognition vocabulary is built up by combining these units into word models, so new
recognition vocabularies can be constructed from the same units. The phonetic
assembly can also be used for "word spotting" recognition libraries.
● Platform: This VCS recognition software runs on the TI TMS320C30 DSP. Two recognizers
can operate on a single 55 MHz C30. Currently the software may be purchased as an Enhanced
Technology from VCS to run on the Dialogic VR/160p speech recognizer board. The hardware
is purchased from Dialogic, with the "Enhanced" software purchased from VCS. Up to four
phonetic recognizers can run on a single VR/160p, one per VRM2C (C30, 33 MHz DSP)
daughtercard.
● Note: This recognizer is in its late "beta" stage of development and is available for U.S.
English vocabularies. Other languages are presently under development.
● Price: VCS software is priced at $350 per recognizer for unit quantities with volume discounts
available.
● See also: VCS Continuous Recognition above, VCS Isolated Word Speech Recognition below,
and the VCS 2030 & 2060 Voice Dialers.
● Contact: Voice Control Systems, Inc.
14140 Midway Rd., Dallas, Tx. 75244, USA
Ph: +1-214-386-0300, Fax: +1-214-386-5555
Email: sales@vcsi.com
WWW: http://www.voicecontrol.com/
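Building word models from phonetic units, as the Phonetic Dictionary Recognizer description explains, amounts to concatenating per-phone HMM states according to a pronunciation dictionary. In this hypothetical sketch the lexicon entries, the ARPAbet-style phone symbols and the three-states-per-phone topology are all assumptions, not VCS's actual unit inventory.

```python
# Hypothetical pronunciation dictionary: words spelled as phonetic units.
LEXICON = {
    "call": ["k", "ao", "l"],
    "home": ["hh", "ow", "m"],
}
STATES_PER_PHONE = 3  # a common left-to-right phone topology; assumed here

def word_model(word: str) -> list[str]:
    """Concatenate per-phone HMM states into a left-to-right word model."""
    return [f"{phone}_{s}" for phone in LEXICON[word] for s in range(STATES_PER_PHONE)]

def add_word(word: str, phones: list[str]) -> None:
    """A new vocabulary word needs only a phonetic spelling."""
    LEXICON[word] = phones
```

Adding "cab" as `["k", "ae", "b"]` produces a model whose first three states are the same `k` states used by "call". Sharing states this way is what lets new vocabularies reuse the trained units.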

Voice Control Systems Isolated Word Speech Recognition
● Description: Voice Control Systems (VCS) isolated word recognition uses VCS phonetic
recognizer technology. It is robust in demanding environments such as the "hands-free"
automotive environment and telephone networks, wireless or wireline. Capabilities include
speaker-independent, speaker-dependent and speaker-adaptive recognition. Libraries are
available for 45+ languages, and custom vocabulary development services are available. The
technology is suited to many applications, including:
❍ Desktop computing: such as keyboard accelerators or interactive multimedia.
❍ Network telephony: such as automating operator functions or voice dialing.
❍ Computer telephony: such as remote access to personal computers.
❍ Automotive accessory control: such as voice-activated cellular phones or other
automotive accessories.
❍ Consumer electronics: such as voice controllers for video games, VCRs and
televisions.
● Platform: Supported platforms include Intel-X86, TI-C5X, C3X, C4X and C2X, OKI 6679, and
NEC-V20 and V30; the software can also operate on 16-bit microcontrollers. As a benchmark,
8 recognizers can run on an Intel 486-33 DX.
● Availability: The technology is available under software licenses direct from VCS or by
purchasing hardware from an OEM. VCS OEMs include: Dialogic, Oki Semiconductor,
Intervoice, Periphonics, etc.
● Cost: VCS isolated word recognition software is available under a volume pricing license
agreement. Small quantity royalties are in the $500.00 per recognizer range while large
(millions) quantity royalties are less than $1.00 per recognizer.
● See also: VCS Continuous Speech Recognition and VCS Phonetic Dictionary Recognizer
above, and the VCS 2030 & 2060 Voice Dialers.
● Contact: Voice Control Systems, Inc.
14140 Midway Rd., Dallas, Tx. 75244, USA
Ph: +1-214-386-0300, Fax: +1-214-386-5555
Email: sales@vcsi.com
WWW: http://www.voicecontrol.com/

Last Revision: 00:27 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/vcs.cts.rec.html (3 of 3) [10/31/2003 8:48:44 AM]


VCS 2030 & 2060 Voice Dialer

VCS 2060 Voice Dialer


VCS 2030 Voice Dialer
● Platform: Stand-alone hardware, TMS320C5X based with VCS phonetic speech recognition
and CELP speech compression.
● Description: The VCS 2060 is a telephone dialing system which recognizes 50 names and
speed-dials the associated telephone numbers. The VCS 2030 has 20 memories. Users select the
"call", "program", or "list" menu using speaker-independent recognition, then place a
call, enroll a new memory, or listen to playback of the entries in the phonebook. Enrollment is
simple and includes a "name tag" enrollment pass, so that when one selects an entry to call, the
selection is confirmed by repeating the memory's associated name tag, e.g. "calling Pete". The
system uses both speaker-independent and speaker-dependent technology from Voice Control
Systems, Inc.
● Installation: The VCS 2060 can be installed in series (RJ-11) with one phone for single phone
operation or installed in parallel (RJ-31) to provide voice dialing from every phone in a house.
● Cost: Standard retail prices:
❍ VCS 2030 Voice Dialer - $269.00
❍ VCS 2060 Voice Dialer - $299.00

● Availability: From catalogs or direct from Voice Control Systems.
● Contact: Voice Control Systems
14140 Midway Rd., Dallas, Tx. 75225, USA
Ph: 800-VCS-7525, Fax: +1-214-386-5555
Email: sales@vcsi.com
WWW: http://www.voicecontrol.com/

Back to Q6.5 of Section 6 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 14:14 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/voice-dialer.html [10/31/2003 8:48:45 AM]


Simon Says (NeXT)

Simon Says (NeXT)


● Platform: NeXT
● Description: Provides the ability to link commands to spoken phrases.
● Availability: By anonymous ftp:
Simon Says demo
ftp://ftp.informatik.uni-muenchen.de/pub/comp/platforms/next/Audio/audio-apps/SimonSaysDemo.1.5.1.N.b.tar.gz
Readme file
ftp://ftp.informatik.uni-muenchen.de/pub/comp/platforms/next/Audio/audio-apps/SimonSaysDemo.1.5.1.README
● Contact: Metrosoft
710 13th Street, Suite 310 X, San Diego, California 92101
Ph: 619.488.9411 Fax: 619.488.3045
Email: info@metrosoft.com [NeXTmail welcome]

Back to Q6.5 of Section 6 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 14:13 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/simon.says.html [10/31/2003 8:48:46 AM]


Voice Command Line Interface (Amiga)

Voice Command Line Interface


● Platform: Amiga
● Description: VCLI executes CLI commands, ARexx commands, or ARexx scripts in response to
voice commands captured through your audio digitizer. VCLI lets you launch multiple applications,
or control any program with an ARexx interface, entirely by voice. VCLI is fully multitasking
and runs in the background, continuously listening for your voice commands even while other
programs are running. Documentation is provided in AmigaGuide format. VCLI 6.0 runs under
AmigaDOS 2.0 or 3.0.
● Requirements: Supports the DSS8, PerfectSound 3, Sound Master, Sound Magic, and generic
audio digitizers.
● Availability: By ftp from wuarchive.wustl.edu as the file
systems/amiga/incoming/audio/VCLI60.lha and from amiga.physik.unizh.ch as the file
pub/aminet/util/misc/VCLI60.lha
● Contact: Author's email is RHorne@cup.portal.com
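
Simon Says and VCLI both link spoken phrases to commands. Once the recognizer has returned a phrase as text, the remaining step is essentially a table lookup. The minimal Python sketch below assumes the recognizer output is plain text; the phrase table and the `command_for` helper are hypothetical, not the actual configuration format of either program, and a real system would hand the resulting command to the shell or an ARexx port rather than just returning it.

```python
from typing import Dict, Optional

# Hypothetical phrase table: each spoken phrase is linked to a command string.
# The entries are illustrative, not VCLI's or Simon Says' configuration format.
COMMANDS: Dict[str, str] = {
    "list files": "dir",
    "show the date": "date",
    "open editor": "ed",
}


def normalize(phrase: str) -> str:
    """Lower-case and collapse whitespace so 'List  Files' matches 'list files'."""
    return " ".join(phrase.lower().split())


def command_for(recognized: str) -> Optional[str]:
    """Return the command linked to a recognized phrase, or None if it is unknown."""
    return COMMANDS.get(normalize(recognized))


print(command_for("List  Files"))  # dir
print(command_for("reboot"))       # None
```

Returning None for unknown phrases leaves the decision (ignore, or ask the user to repeat) to the caller.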

Back to Q6.5 of Section 6 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 14:13 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/vcli.html [10/31/2003 8:48:47 AM]


Visus SpeechKit

Visus SpeechKit
● Platform: NeXT
● Description: SpeechKit is based on SPHINX, a speaker-independent continuous speech
recognition system with a vocabulary of roughly 1000 words, and lets you incorporate speech
recognition into your applications. You can design your own vocabularies and grammars.
● Contact: Visus (no address or phone number provided). A possible contact is Robert Brennan at
Carnegie Mellon University, email: Robert_Brennan@cmu.edu

Back to Q6.5 of Section 6 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:27 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/visus.html [10/31/2003 8:48:47 AM]


Berkeley Restaurant Project (BeRP)

Berkeley Restaurant Project (BeRP)


● Description: BeRP is a testbed for a speech recognition system being developed by the
International Computer Science Institute in Berkeley, CA. BeRP is a medium-vocabulary,
speaker-independent, spontaneous continuous speech understanding system that acts as a
knowledge consultant on restaurants in the city of Berkeley. It supports several research
projects, including robust feature extraction, connectionist phonetic likelihood estimation,
automatic induction of multiple-pronunciation lexicons, foreign accent detection and modeling,
advanced language models, and lip-reading.
● Note: As far as I know, the BeRP software is in-house software; that is, it is not available
for distribution.
● More information: http://www.icsi.berkeley.edu/real/berp.html

Back to Q6.5 of Section 6 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:26 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/berp.html [10/31/2003 8:48:48 AM]


Lernout & Hauspie ASR (3 products)

Lernout & Hauspie ASR 1000/T and 1000/M


[Note: L&H asr200/A is described below.]

● L&H asr1000/T: ASR for the Telephony and Telecommunications Market
● L&H asr1000/M: ASR for the Computer and Multimedia Market
● Description: Automatic speech recognition software providing continuous speech recognition,
isolated word recognition, keyword spotting or continuous digit recognition. The engine is
speaker-independent and phoneme-based, with optimization for commonly used words.
General features include:
❍ Languages available: US English, German, French, Spanish (Castilian), Dutch.
❍ Available vocabulary: >100,000 words.
❍ Line adaptation.
❍ Rejection of out-of-vocabulary/out-of-grammar words.
❍ N-best alternatives for isolated word recognition and keyword spotting.
❍ Push-to-talk.

● asr1000/T
❍ Single-channel platform examples: Motorola 56156, TI TMS320C2X/C3X/C5X
❍ Multi-channel platform examples: TI TMS320C3X/C5X, AT&T DSP32C/3210, Motorola 96000
❍ Input: 8 kHz telephone sampling
● asr1000/M
❍ Single-processor platform examples: Intel 486/Pentium
❍ Input: 8 kHz telephone or 11 kHz microphone sampling
● See also: L&H ASR SDK for Windows
● More Information: on the Lernout & Hauspie WWW pages: http://www.lhs.com/asr.html
● Cost: Unknown
● Contact: Lernout & Hauspie Speech Products
800 West Cummings Park, Suite 3100
Woburn, MA 01801, USA
Tel: (617) 238 0960
Fax: (617) 238 0986
Email: sales@lhs.com
WWW: http://www.lhs.com/
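
The asr1000 feature list combines n-best output with rejection of out-of-vocabulary words. L&H does not describe how its engine does this. One common approach, assumed here, compares the best word score against a catch-all "filler" (garbage) model. In this sketch the per-word log-likelihoods, `filler_score` and `margin` are hypothetical inputs that a recognizer would supply.

```python
import heapq
from typing import Dict, List, Tuple


def n_best_with_rejection(
    word_scores: Dict[str, float],
    filler_score: float,
    n: int = 3,
    margin: float = 0.0,
) -> List[Tuple[str, float]]:
    """Return up to n (word, score) pairs, best first, or [] if rejected.

    word_scores: log-likelihood of the utterance under each in-vocabulary word model.
    filler_score: log-likelihood under a catch-all filler model (an assumption).
    The utterance is rejected as out-of-vocabulary when even the best word
    does not beat the filler model by at least `margin`.
    """
    # Keep the n highest-scoring words, best first.
    best = heapq.nlargest(n, word_scores.items(), key=lambda item: item[1])
    # Out-of-vocabulary rejection based on the single best hypothesis.
    if not best or best[0][1] - filler_score < margin:
        return []
    return best


scores = {"yes": -120.0, "no": -135.5, "help": -128.2, "operator": -150.0}
print(n_best_with_rejection(scores, filler_score=-140.0, n=2))  # [('yes', -120.0), ('help', -128.2)]
print(n_best_with_rejection(scores, filler_score=-110.0))       # []
```

Raising `margin` rejects more utterances, trading missed in-vocabulary words for fewer false acceptances.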

Lernout & Hauspie ASR 200/A for the Automotive and Industrial Market
● Description: Automatic speech recognition software providing isolated word recognition,
keyword spotting and (optionally) alphabet recognition. This engine is robust, speaker-independent
and word-based. Other features:
❍ Vocabulary: 100 words, US English
❍ Voice activation detection
❍ Response time < 250 ms
❍ Platform examples: Analog Devices ADSP2101/5
❍ Input: 8 kHz telephone or microphone sampling
● See also: L&H ASR SDK for Windows
● More Information: on the Lernout & Hauspie WWW pages: http://www.lhs.com/asr.html
● Cost: Unknown
● Contact: Lernout and Hauspie Speech Products
20 Mall Road, 4th Floor
Burlington, MA 01803, USA
Ph: +1-617-238-0960, Fax: +1-617-238-0986
Email: sales@lhs.com
WWW: http://www.lhs.com/

Back to Q6.5 of Section 6 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 12:32 13-May-1997

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/lernout.hauspie.rec.html (2 of 2) [10/31/2003 8:48:49 AM]


Voice-Trek 2.0

Voice-Trek 2.0
● Platform: Unknown.
● Description: Voice-Trek is used primarily by the United States Postal Service to sort mail.
Tardis Technology Inc. was created to develop and market applications that use speech
recognition. The company also provides consulting services and turnkey systems.
● Contact: Tardis Technology Inc., Voice Recognition Div.
6444 E. Spring St., #286, Long Beach, CA 90815-1500, USA
Phone: +1-310-497-0077, Fax: +1-310-497-0080

Back to Q6.5 of Section 6 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 14:14 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/voice.trek.html [10/31/2003 8:48:50 AM]


Voicetek Corp.

Voicetek Corp.
● Platform: Unknown.
● Description: Voicetek Corporation provides voice processing solutions, training and consulting
services and an object-oriented, graphical Generations Platform for development of integrated
computer telephony systems.
● Contact: Voicetek Corporation
19 Alpha Road, Chelmsford, MA 01824, USA
Ph: +1-508-250-9393, Fax: +1-508-250-9378
WWW: http://www.voicetek.com/

Back to Q6.5 of Section 6 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 14:14 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/voicetek.html [10/31/2003 8:48:50 AM]


Voice Processing Corporation Speech Recognition Product Line

Voice Processing Corporation Speech Recognition Product Line
● Platform: Unknown.
● Description: Voice Processing Corporation (VPC) supplies automated speech recognition
systems. VPC's products are used in the telecommunications, cellular and personal computer
markets to enable computers to understand human speech. The company's VPro product line is
sold to original equipment manufacturers (OEMs), value added resellers (VARs), system
integrators and application developers. VPC's speech recognition systems are currently used in
applications such as voice mail, voice activated dialing, interactive voice response, and
command and control of personal computers.

The following are descriptions of the Voice Processing Corporation's VPro Product Line:
VProContinuous, VPro/XD, VPro/RT, VProCel, VProSpeller, VProPRL, VPro hardware
platforms, and the application Osprey.

More information is available on these products at the VPC WWW site: http://www.vpro.com/
● VProContinuous(TM) is a speaker-independent, continuous digit recognizer. It recognizes digit
strings spoken continuously by any caller, without unnatural beeps or pauses.
VProContinuous uses out-of-vocabulary rejection and word-spotting technologies to reject
extraneous words and phrases often spoken by callers. The VProContinuous vocabulary
consists of the words "zero" through "nine," "yes," "no," and "oh." The technology is
language-independent; American English, Australian English, Brazilian Portuguese, Canadian
French, Castilian Spanish, French, German, Italian, Mexican Spanish, Portuguese, Swiss German
and U.K. English versions are available.
● VPro/XD(TM) is a discrete or multiword speech recognizer for extra-demanding applications
and/or vocabularies. This robust product recognizes isolated utterances (words or very short
phrases). VPro/XD uses proprietary out-of-vocabulary rejection and word-spotting
technologies. VPro/XD is speaker-independent and includes Talkover capability, which allows
callers to interrupt prompts by speaking. Pre-trained vocabulary libraries are available in
American English, Australian English, Brazilian Portuguese, Canadian French, Castilian
Spanish, Central American Spanish, German, Italian, Mandarin Chinese, Mexican Spanish,
Portuguese, Swiss German and UK English. Pre-trained vocabularies of voice mail words,
voice dialing words, call control words, banking words, and emergency words are available in
American English (both cellular and land-line).
● VPro/RT(TM) is a discrete speech recognizer for rapid training of vocabularies in the field.
This robust product recognizes isolated utterances. Application designers and end-users
define the vocabulary of their choice and train the system in real time, either before system
start-up or by adapting on the fly while the system is running live. Vocabularies can be
divided into subsets, and applications involving thousands of words can be developed quickly.
VPro/RT, which also supports Talkover, is suited to speaker-dependent recognition tasks, such
as the personal directory of names in a voice-activated dialing application. VPro/RT is also good
for applications that require speaker-independent vocabularies to be developed quickly in the
field, or that require many vocabularies. VPro/RT can also be used as a tool for quick
prototyping of applications.
● VProCel consists of speaker-independent VProContinuous, VPro/XD and speaker-dependent
VPro/RT specifically tuned for the cellular environment. The speaker-dependent discrete
feature of VProCel allows for a user-defined 20-word personal directory, with a one-pass
enrollment whereby users need only speak their chosen commands once. In addition, cellular-ready
VPro/XD vocabularies consisting of voice-activated dialing command words are also
available. VProCel is suited to voice-activated dialing applications using either digit strings or
a listing of words in a personal directory.
● VProSpeller is a recognizer that can determine which name or word is being spelled by a
caller. Callers may spell a string of up to 32 letters without interruption (no prompts or
beeps between letters). VProSpeller resolves confusable letters by automatically searching a
database of words, maintained by the application, for the best-matching candidates.
● VProPRL: Designed for customers who wish to enable VPC speech recognition technologies on
platforms other than those supported by VPro hardware, the VProPRL is a portable recognizer
library of VProContinuous, VPro/XD and VPro/RT, which can be embedded into a wide
variety of hardware platforms. It consists of a library of object modules which can be linked
with a user application or task.
● VPro Hardware Platforms: VPro-42, VPro-84, VPro-88: The VPro platforms are ISA-compliant
PC/AT boards. Each supports four to eight Virtual Speech Processors (VSPs). Each
VSP, depending on load factors, can handle multiple telephone lines. Application and host
computers communicate with each of the VSPs as separate autonomous units. VPro platforms
use Texas Instruments TMS320C31 microprocessors which provide up to 133 MFLOPS of
compute power. The platforms can have up to 8 megabytes of memory shared among all
processors. In addition, each processor has 512K bytes of local memory. Both the PEB and
MVIP PCM audio buses are supported by all VPro platforms.
● Osprey is a call management software application that performs the kinds of telephone related
activities typically done by a personal assistant, such as answering the phone, screening callers,
routing calls, and taking and delivering messages. It is an automated phone attendant.
● Price and availability: Contact Voice Processing Corporation
● Contact: Kelli V. Smith
Voice Processing Corporation
1 Main Street, Cambridge, MA, 02142 USA
Ph: (617)494-0100 Fax: (617)494-4970
e-mail: KSmith@vpro.com
WWW: http://www.vpro.com/
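
Because the VProContinuous vocabulary includes both "zero" and "oh" for the digit 0, an application receiving the recognized words has to map them to a digit string. A minimal sketch, assuming the recognizer returns a list of words; `words_to_digits` is a hypothetical helper, not part of VPC's API.

```python
from typing import List

# Both "zero" and "oh" are spoken for the digit 0.
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def words_to_digits(words: List[str]) -> str:
    """Map a recognized digit-word sequence to a digit string.

    Raises ValueError for words outside the digit vocabulary (e.g. "yes" or "no"),
    so the application can handle confirmations separately.
    """
    digits = []
    for word in words:
        key = word.lower()
        if key not in DIGIT_WORDS:
            raise ValueError(f"not a digit word: {word!r}")
        digits.append(DIGIT_WORDS[key])
    return "".join(digits)


print(words_to_digits("five five five oh one two three".split()))  # 5550123
```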

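VPC does not describe VProSpeller's search. One plausible way to match a spelled letter string against a database despite confusable letters is a weighted edit distance in which substitutions between similar-sounding letters (e.g. B/D/P/T, M/N) are cheap. The sketch assumes that technique; the confusion classes and the 0.5 substitution cost are illustrative guesses, not VPC's values.

```python
from typing import List, Tuple

# Hypothetical confusion classes: letters that often sound alike over the phone.
CONFUSABLE = [set("BCDEGPTVZ"), set("MN"), set("FS"), set("AJK"), set("IY")]


def sub_cost(a: str, b: str) -> float:
    """Cost of hearing letter a when letter b was meant."""
    if a == b:
        return 0.0
    if any(a in group and b in group for group in CONFUSABLE):
        return 0.5
    return 1.0


def spelling_distance(heard: str, word: str) -> float:
    """Weighted Levenshtein distance; confusable substitutions cost 0.5."""
    prev = [float(j) for j in range(len(word) + 1)]
    for i, h in enumerate(heard, start=1):
        cur = [float(i)] + [0.0] * len(word)
        for j, w in enumerate(word, start=1):
            cur[j] = min(prev[j] + 1.0,                  # extra letter heard
                         cur[j - 1] + 1.0,               # letter of word missed
                         prev[j - 1] + sub_cost(h, w))   # match or substitution
        prev = cur
    return prev[-1]


def best_matches(heard: str, database: List[str], n: int = 3) -> List[Tuple[str, float]]:
    """Rank database entries by distance to the spelled letter string."""
    heard = heard.upper()
    scored = [(name, spelling_distance(heard, name.upper())) for name in database]
    scored.sort(key=lambda pair: pair[1])
    return scored[:n]


names = ["BROWN", "DROWN", "SMITH", "JONES", "TAYLOR"]
print(best_matches("SMYTH", names, n=1))  # [('SMITH', 0.5)]
```

Scoring every entry is fine for small directories; a large database would need indexing or pruning.
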
Back to Q6.5 of Section 6 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 14:14 26-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/vpc.rec.html (2 of 2) [10/31/2003 8:48:51 AM]


ImagineNation: Voice Activated UnLock Technology

Voice Activated UnLock Technology (VAULT): ImagineNation
● Description: Password-based voice verification technology using a card to store voice-print
data. Introductory information and the VAULT FAQ are provided on the ImagineNation
WWW pages.
● Contact: Imagine
PO Box 212, Swansea, MA 02777, USA
Ph: +1-508-678-9563
Fax: 508-678-1470
Email: feedback@ImagineNation.com
WWW: http://www.ImagineNation.com/

Back to Q6.6 of Section 6 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 14:47 05-Sep-1997

http://mi.eng.cam.ac.uk/comp.speech/Section6/Verification/imaginenation.html [10/31/2003 8:48:52 AM]


Jialong He's Speaker Recognition (Identification) Tool

Jialong He's Speaker Recognition (Identification) Tool
● Platform: SUN SPARC (SunOS), PC (MSDOS)
● Description: This package contains a set of speaker recognition research utilities, including
Gaussian mixture models, a VQ codebook design program and an MLP network. They can also
be used as general-purpose classifiers. The utilities are divided into the following categories:
❍ Feature extraction and dimensionality reduction
cepstrum -- extract features from speech signals (LPCC, MFCC, etc.).
search -- select effective features (SFS and SBS, i.e. sequential forward and backward selection).
randline -- randomize a sequence; auxiliary utility.
bin2asc -- binary to ASCII conversion; auxiliary utility.
❍ MLP network
mlptrain -- MLP network training program.
mlptest -- MLP network test program.
❍ VQ codebook training and test programs
lbglvq -- VQ codebook training program.
nearest -- VQ codebook test program.
❍ Gaussian mixture model (GMM)
gmmtrain -- GMM training program.
gmmtest -- GMM test program.
Note: this is a research tool, not a complete speaker recognition system.
● Availability: By anonymous ftp:
MSDOS Version
UK: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spkrtool.zip
Germany: ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/spkrtool.zip
Sun SPARC version, compiled with GNU C
UK: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spkr_sun_v1.tar.gz
Germany: ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/speaker_sun_v1.tar.gz
● See also: Jialong He's Speech Recognition Research Tool
● Contact: Jialong He
email: jialong@neuro.informatik.uni-ulm.de
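
The names `lbglvq` and `nearest` suggest the classic VQ approach to speaker identification: train one codebook per speaker with the LBG (Linde-Buzo-Gray) splitting algorithm, then assign a test utterance to the speaker whose codebook gives the lowest average distortion. The Python sketch below illustrates that technique on made-up 2-D "feature vectors"; it is not Jialong He's code, and the package's actual options and file formats are not described here.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(vectors: Sequence[Sequence[float]]) -> Vector:
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def lbg_codebook(data: List[Vector], size: int, eps: float = 0.01, iters: int = 10) -> List[Vector]:
    """Train a VQ codebook by LBG splitting; the result has the smallest
    power-of-two number of codewords that is >= size."""
    codebook = [_mean(data)]
    while len(codebook) < size:
        # Split each codeword into two copies nudged by +eps and -eps.
        codebook = [[c + s for c in cw] for cw in codebook for s in (eps, -eps)]
        for _ in range(iters):
            # Lloyd (k-means) refinement: assign each vector to its nearest codeword...
            clusters: List[List[Vector]] = [[] for _ in codebook]
            for v in data:
                nearest = min(range(len(codebook)), key=lambda k: _dist2(v, codebook[k]))
                clusters[nearest].append(v)
            # ...then move each codeword to its cluster mean (kept unchanged if empty).
            codebook = [_mean(cl) if cl else cw for cl, cw in zip(clusters, codebook)]
    return codebook


def avg_distortion(data: List[Vector], codebook: List[Vector]) -> float:
    """Mean squared distance from each vector to its nearest codeword."""
    return sum(min(_dist2(v, cw) for cw in codebook) for v in data) / len(data)


def identify(test: List[Vector], codebooks: Dict[str, List[Vector]]) -> str:
    """Closed-set identification: pick the speaker whose codebook fits best."""
    return min(codebooks, key=lambda spk: avg_distortion(test, codebooks[spk]))


# Made-up training data for two speakers.
train = {
    "alice": [[0.0, 0.1], [0.1, 0.0], [1.0, 1.1], [1.1, 0.9]],
    "bob": [[5.0, 5.1], [5.2, 4.9], [6.0, 4.0], [6.1, 4.2]],
}
books = {spk: lbg_codebook(vecs, size=2) for spk, vecs in train.items()}
print(identify([[0.05, 0.05], [1.0, 1.0]], books))  # alice
```

Real systems would use cepstral vectors (such as the LPCC or MFCC features produced by `cepstrum`) and codebooks of 32 or more entries.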

Back to Q6.6 of Section 6 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 13:48 31-May-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Verification/jialong.html [10/31/2003 8:48:53 AM]


Keyware Biometric Security Products

Keyware Biometric Security Products


● Description: VoiceGuardian and S2 Security Server provide authentication and access control
technologies. An online demo of VoiceGuardian is available.
● Contact: Keyware Technologies
USA
Keyware Technologies
500 West Cummings Park, Suite 3600, Woburn, MA 01801, USA
Ph: (617) 933 1311, Fax: (617) 933 1554
Belgium
Keyware Technologies
Excelsiorlaan 28-30, 1930 Zaventem, Belgium
Ph: 32 2 721 4574, Fax: 32 2 721 5015
Email: sales@keywareusa.com
WWW: http://www.keywareusa.com/

Back to Q6.6 of Section 6 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 14:09 05-Sep-1997

http://mi.eng.cam.ac.uk/comp.speech/Section6/Verification/keyware.html [10/31/2003 8:48:54 AM]


SpeakerKey Voice Verifier from ITT

SpeakerKey Voice Verifier from ITT


● Platform: Windows/Pentium and Solaris/SPARC
● Description: SpeakerKey provides over-the-phone voice verification. It is configurable for use
in a wide range of applications.
SpeakerKey provides a Speaker Verification API (SVAPI).
SpeakerKey uses two technologies: (1) speaker-independent digit recognition using hidden
Markov models, (2) speaker verification using "Nearest Neighbour Matching with Likelihood
Ratio Scoring and cohort speakers."
Dr. Joe Campbell maintains a SpeakerKey FAQ on the WWW. It provides a more detailed
description of SpeakerKey and discusses several speaker verification issues:
http://www.vitro.bloomington.in.us:8080/~BC/REPORTS/SpeakerKeyFAQ.html
● Requirements: Minimum 60 MHz Pentium (with sound card) or SPARCstation 5, plus phone
line interface devices.
● Price: Evaluation kits are available from $75. Developer's kits are $1500. Run-time licenses are
priced from $600 to $10,000 depending on the number of users and/or verifications per hour.
Application customization is available.
● Contact: ITT Industries
Fort Wayne, IN, USA
Ph: +1-219-487-6321, Fax: +1-219-487-6126
Email: speakerkey@itt.com
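
The "likelihood ratio scoring and cohort speakers" mentioned above refers to normalizing a claimant's score against a set of similar (cohort) speakers, so that effects which lower all scores alike, such as a poor telephone channel, tend to cancel. The sketch below shows one generic form of cohort normalization, assuming per-model log-likelihoods have already been computed; it is not ITT's algorithm, and the threshold is a placeholder.

```python
import math
from typing import Sequence


def cohort_llr(claimant_loglik: float, cohort_logliks: Sequence[float]) -> float:
    """Log-likelihood ratio of the claimed speaker against the averaged cohort."""
    m = max(cohort_logliks)
    # Log of the mean cohort likelihood, computed stably (log-sum-exp).
    log_mean = m + math.log(sum(math.exp(x - m) for x in cohort_logliks) / len(cohort_logliks))
    return claimant_loglik - log_mean


def verify(claimant_loglik: float, cohort_logliks: Sequence[float], threshold: float = 0.0) -> bool:
    """Accept the identity claim when the normalized score exceeds the threshold."""
    return cohort_llr(claimant_loglik, cohort_logliks) > threshold


print(cohort_llr(-100.0, [-110.0, -110.0]))  # 10.0
print(verify(-120.0, [-110.0, -105.0]))      # False
```

Raising `threshold` lowers false acceptances at the cost of more false rejections. Other variants compare against the best cohort score instead of the average.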

Back to Q6.6 of Section 6 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 13:48 31-May-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Verification/speakerkey.html [10/31/2003 8:48:54 AM]


SpeakEZ Voice Print Speaker Verification

SpeakEZ Voice Print Speaker Verification


● Description: Designed to prevent cellular phone theft and cloning fraud by comparing the
pass-phrase spoken by a cellular caller with a stored digital "voice print" of the authorized
subscriber. If the caller's voice patterns do not match the stored voice print, service is denied
or the caller is referred to operator assistance for further validation. Features include:
❍ Customer-selected password.
❍ Vocabulary and language independent.
❍ No special hardware required by customer.
❍ Multiple delivery options.
● Contact: T-NETIX, Inc.
6675 South Kenton Street, Englewood, CO 80111, USA
Phone: (800) 352-8628, (303) 790-9111, Fax: (303) 790-9540
WWW: http://www.t-netix.com/

Back to Q6.6 of Section 6 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 15:51 25-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Verification/tnetix.html [10/31/2003 8:48:55 AM]


Voice Control Systems: Speaker Verification Technology

Voice Control Systems: Speaker Verification Technology
● Description: SpeechPrint ID™ technology provides language-independent speaker
verification. Features:
❍ Multiple speech input formats
❍ Operates over various microphones or the telephone network
❍ Can be used in conjunction with discrete and continuous recognition
❍ Robust against background noise and spurious telephone channel noise
For more information on features, hardware and software requirements, pricing and
availability, contact Voice Control Systems, Inc. or visit the VCS WWW site or the
SpeechPrint ID WWW page.
● See also: VCS speech recognition products in Q6.5.
● Contact: Voice Control Systems, Inc.
14140 Midway Rd., Dallas, Tx. 75244, USA
Ph: +1-214-386-0300, Fax: +1-214-386-5555
Email: sales@vcsi.com
WWW: http://www.voicecontrol.com/

Back to Q6.6 of Section 6 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 13:48 31-May-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Verification/vcs.html [10/31/2003 8:48:56 AM]


SpeechWorks™ from Applied Language Technologies, Inc.

SpeechWorks™ from Applied Language Technologies, Inc.
● Description: SpeechWorks and companion products provide advanced speech recognition
technology for the telephony market. SpeechWorks can be used by developers to "speech-enable"
call center, messaging, enhanced services, and other types of applications. The three
major system modules - SpeechWorks, DialogModules and SpeechBuilder - are described
below. More detailed information is available from the Applied Language Technologies home
page.
ALTech develops and markets speech understanding software which provides large
vocabulary, speaker-independent, phonetic speech recognition. ALTech's software contains a
comprehensive set of features for speech-enabling telephone-based transactions and services.
SpeechWorks is based on technology licensed from the Spoken Language Systems Group at
the Massachusetts Institute of Technology.
● SpeechWorks: provides the core speech recognition capabilities. Features include:
❍ Phonetic segment-based, speaker-independent, large-vocabulary, continuous speech recognition
❍ Real-time vocabulary generation directly from text
❍ Database integration
❍ "Barge-in" capability
❍ Adaptive channel normalization
❍ "n-best" output and associated confidence scores
❍ Support for multiple languages
❍ Software-only or DSP-based implementations
❍ Support for multiple platforms and operating systems (e.g., SCO UNIX, Windows NT)
● DialogModules: manage the "conversation" between the system and the caller within an
application. They provide high-level application building blocks which enable developers to
quickly and easily add speech interfaces to computer telephony applications. Each
DialogModule accomplishes a particular task within an application, ranging from "simple"
tasks such as capturing a yes/no response or a phone number, to more complex tasks such as
capturing credit card information or name and address information.
DialogModules provide "out-of-the-box" functionality. They contain pre-built grammars, user-
interface design, internal call flow and error recovery routines, parameters for customization
and a set of C++ class libraries and C APIs.
● SpeechBuilder: provides tools for customizing the DialogModules and for developing and
maintaining applications. A GUI-based Vocabulary Editor generates and maintains vocabularies
or word lists. Pronunciations can be taken from the built-in dictionary or generated
automatically using a set of text-to-phoneme rules.
● Product Bundles: combine SpeechWorks and multiple DialogModules into application templates
for a set of generic application categories.
❍ SpeechForms: provides an interactive method for entering data over the phone, such as
ordering products, filling out surveys and completing registration forms. Typical applications
include: order entry, reservations, catalog and literature requests, catalog shopping,
subscriptions, change of service, claims, credit card activation, home banking, stock
transactions, and warranty reservations.
❍ SpeechQuery: delivers information in response to voice requests over the phone, such as
airline information, product delivery status and retirement benefit information. Typical
applications include: order status, product information, account balance, flight status, movie
listings, job listings, stock quotes, guide services, classified ads, claims status, dealer
locator services, and technical support.
❍ SpeechAgent: provides a set of modules for automating telephone-based voice messaging
applications, such as integrated messaging, single-number services and voice dialing. Typical
applications include: voice messaging, voice dialing, auto attendant, address book access,
email access, and scheduling.
● Platform: ALTech's software can be deployed on industry-standard hardware platforms and
operating systems, including Sun SPARC-based systems running SunOS or Solaris, IBM RS/6000s
running AIX, HP systems running HP-UX, and 486/Pentium-based PCs and servers running
Windows, Windows NT, SCO UNIX, or Solaris. ALTech's systems are designed to run all or
some of the software on a digital signal processor.
● Availability: contact ALTech for licensing information.
● Contact: Applied Language Technologies, Inc.
215 First Street, Cambridge, MA 02142
Ph: 617-225-0012, Fax: 617-225-0322
Email: moyer@altech.com (Alisa Moyer)
WWW: http://www.altech.com/
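
A DialogModule such as yes/no capture bundles a grammar, prompts and error recovery. The sketch below illustrates that pattern with a small synonym grammar and a re-prompt loop that gives up after a fixed number of tries. `listen`, the word lists and the retry count are hypothetical, and ALTech's actual C++ and C interfaces are not shown here.

```python
from typing import Callable, Optional

# Hypothetical synonym grammar for a yes/no question.
YES = {"yes", "yeah", "yep", "sure", "correct", "right"}
NO = {"no", "nope", "wrong", "incorrect"}


def capture_yes_no(listen: Callable[[str], Optional[str]], prompt: str,
                   max_tries: int = 3) -> Optional[bool]:
    """Ask a yes/no question, re-prompting on silence or unrecognized replies.

    listen(prompt) plays the prompt and returns the recognized text, or None on
    timeout or rejection. Returns True/False, or None when the caller should be
    transferred (e.g. to an operator) after max_tries attempts.
    """
    current_prompt = prompt
    for _ in range(max_tries):
        heard = listen(current_prompt)
        if heard is not None:
            word = heard.strip().lower()
            if word in YES:
                return True
            if word in NO:
                return False
        # Error recovery: give a more explicit instruction on the next try.
        current_prompt = "Sorry, please say yes or no. " + prompt
    return None


def scripted(replies):
    """Stand-in for a recognizer that returns canned replies in order."""
    it = iter(replies)
    return lambda prompt: next(it)


print(capture_yes_no(scripted([None, "maybe", "Yeah"]), "Is that correct?"))  # True
```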

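SpeechBuilder's approach of taking pronunciations from a dictionary and falling back on text-to-phoneme rules can be sketched as below. The lexicon entries, the ARPAbet-style phone symbols and the letter rules are illustrative assumptions. They are far cruder than any production rule set; for example, they ignore silent letters and context.

```python
from typing import Dict, List, Tuple

# Tiny hypothetical lexicon using ARPAbet-style phone symbols.
LEXICON: Dict[str, List[str]] = {
    "boston": ["B", "AO", "S", "T", "AH", "N"],
    "yes": ["Y", "EH", "S"],
}

# Toy letter-to-phoneme rules; two-letter rules are tried before single letters.
DIGRAPHS: List[Tuple[str, List[str]]] = [
    ("ch", ["CH"]), ("sh", ["SH"]), ("th", ["TH"]),
    ("ee", ["IY"]), ("oo", ["UW"]), ("ck", ["K"]),
]
LETTERS: Dict[str, List[str]] = {
    "a": ["AE"], "b": ["B"], "c": ["K"], "d": ["D"], "e": ["EH"], "f": ["F"],
    "g": ["G"], "h": ["HH"], "i": ["IH"], "j": ["JH"], "k": ["K"], "l": ["L"],
    "m": ["M"], "n": ["N"], "o": ["AA"], "p": ["P"], "q": ["K"], "r": ["R"],
    "s": ["S"], "t": ["T"], "u": ["AH"], "v": ["V"], "w": ["W"], "x": ["K", "S"],
    "y": ["Y"], "z": ["Z"],
}


def pronounce(word: str) -> List[str]:
    """Look the word up in the lexicon; otherwise fall back to letter rules."""
    w = word.lower()
    if w in LEXICON:
        return list(LEXICON[w])
    phones: List[str] = []
    i = 0
    while i < len(w):
        for graph, ph in DIGRAPHS:
            if w.startswith(graph, i):
                phones.extend(ph)
                i += len(graph)
                break
        else:
            # No digraph matched: apply the single-letter rule (unknown characters are skipped).
            phones.extend(LETTERS.get(w[i], []))
            i += 1
    return phones


print(pronounce("Boston"))  # ['B', 'AO', 'S', 'T', 'AH', 'N']
print(pronounce("sheet"))   # ['SH', 'IY', 'T']
```

Checking the lexicon first lets hand-checked entries override the rules, which matters most for proper names.
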
Back to Q6.7 of Section 6 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 14:16 25-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Integration/altech.html (2 of 2) [10/31/2003 8:48:57 AM]


Nortel Speech Technology Products

Nortel Speech Technology Products


● Nortel's AudioGram Delivery Service (ADS):
When a busy or no-answer condition is encountered, an intercept message offers ADS, which
serves the calling party by taking a message automatically. ADS records the caller's message
and, if needed, attempts delivery repeatedly until the message is delivered.
ADS consists of four independent services: 0+, 1+ and Local, Intentional, and Millenium
AudioGram. ADS services use Nortel's Flexible Vocabulary Recognition (FVR) voice-processing
capabilities. ADS features include:
❍ Cost-saving common service platform (NAV)
❍ Builds upon existing network investment in toll infrastructure capabilities of AABS
(Automated Alternate Billing Service)
❍ Leverages the capabilities of existing TOPS (Traffic Operator Position System) attendants.
More information is available on the Nortel Multimedia Network Applications WWW page
for AudioGram Delivery Service.
● Nortel's Voice-Activated Auto Attendant (VAAA):
Replaces touch-tone menus with an easy-to-use voice interface. Geared to businesses and
corporations, VAAA provides more effective management of incoming customer calls. Residing on
the Network Applications Vehicle (NAV) platform, VAAA uses Flexible Vocabulary
Recognition (speaker-independent) technology to recognize spoken words, and directs calls
accordingly. Other features include:
❍ Cost-saving common service platform (NAV)
❍ Serves DTMF and rotary dial callers.
❍ Handles incoming calls for all corporate users (Centrex, PBX, or key systems)
More information is available on the Nortel Multimedia Network Applications WWW page
for Voice-Activated Auto Attendant.
● Nortel's Voice-Activated Dialing (VAD):
Phoneme-based speech dialing capabilities provided through speaker-trained and speaker-independent
technologies. Residing on the Network Applications Vehicle (NAV) platform,
VAD enables subscribers to dial using speech, as well as to create and customize personal
telephone directories. Other features include:
❍ Cost-saving common service platform (NAV)
❍ Speech playback and text-to-speech synthesis
❍ Dual-language capability (optional)
❍ Speech recording
❍ Canadian French speechware (optional, prompts and FVR)
❍ Spanish speechware (optional, prompts and FVR)
❍ 75-name VAD directory size
❍ Word-spotting
❍ DTMF tone detection
❍ Directory sharing
❍ Scalable service deployment
❍ Talk-through
More information is available on the Nortel Multimedia Network Applications WWW page
for Voice-Activated Dialing.
● Nortel's Voice-Activated Premier Dialing (VAPD):
Enables businesses to take advantage of the public network directories to stimulate customer
calls. Residing on the Network Applications Vehicle (NAV) platform, VAPD uses Flexible
Vocabulary Recognition (speaker-independent) technology to recognize business names, and
routes calls to the appropriate business entity. VAPD promotes cost savings by utilizing a
common service platform, the Network Applications Vehicle (NAV). It services DTMF callers
as well as rotary dialers, and handles incoming calls for all corporate users: Centrex, PBX, and
key systems. More information is available on the Nortel Multimedia Network Applications
WWW page for Voice-Activated Premier Dialing.
● Platform: These speech-based services operate on the Network Applications Vehicle (NAV)
platform. NAV is a multi-application digital signal processing platform supporting both
speech- and display-based applications. The NAV platform provides the speech recognition
capabilities and application logic used by the services, and features an open, modular
hardware architecture and a flexible software design. Other features include:
❍ Scalable hardware: from 24 to over 2000 ports per NAV node; 1 to 24 independent
application shelves per node
❍ Powerful speech processing: speaker-independent and speaker-trained speech
processing support
❍ Reliability: N+1, N+M, and 2N redundancy
❍ Central management: access via graphical user interface to remote connections

● See Also: Nortel Feature Planning Guide, reference number 50004.11; NAV Applications and
Planning Guide, reference number 50118.16.
Nortel's Multimedia web pages: http://www.nortel.com/entprods/multimedia/
● Contact: NORTEL
Multimedia Communications Systems Division
Multimedia Network Applications
1000 Park Forty Plaza
Durham, NC 27713 USA
Ph: 1-800-4NORTEL
WWW: http://www.nortel.com/entprods/multimedia/

Back to Q6.7 of Section 6 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 13:41 07-Aug-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Integration/nortel.html (2 of 2) [10/31/2003 8:48:57 AM]


Q1.2: comp.speech ftp site



Tony Robinson maintains the comp.speech ftp site. The ftp site is a comprehensive repository of
software and information related to speech technology. The site is

● ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/

Comp.speech Archives

The comp.speech ftp site provides full archives of the comp.speech newsgroup dating back to the
creation of the group in 1991. The postings are stored in the order in which they arrived. Batches of
1000 articles are grouped into gzipped tar files, and matching files listing the subjects are also provided.

● ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/archive/
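If you have downloaded one of these batches and want your own subject listing, a rough sketch using only Python's standard library might look like the following. It assumes (this is not stated in the FAQ) that each batch holds one article per regular file, with ordinary Usenet headers; the function name list_subjects is hypothetical, and fetching the file from the ftp site is not shown.

```python
import tarfile
from email.parser import BytesHeaderParser


def list_subjects(path):
    """Return (member name, Subject) pairs for each article in a .tar.gz batch.

    Assumption: one Usenet article per regular file, with RFC 1036-style headers.
    """
    parser = BytesHeaderParser()  # parses headers only, ignores the body
    subjects = []
    with tarfile.open(path, mode="r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, links, etc.
            handle = archive.extractfile(member)
            if handle is None:
                continue
            headers = parser.parsebytes(handle.read())
            subjects.append((member.name, headers.get("Subject", "")))
    return subjects
```

Calling list_subjects("batch.tar.gz") on a local copy returns the pairs in archive order, which should match the arrival order described above if the batches were built that way.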

Software and Other Resources

The comp.speech ftp site includes a wide range of useful software and resources. Tony has arranged it
into a series of sub-directories:

/analysis : Speech analysis software


FFT code, a pitch tracker, RASTA code, and IEEE DSP code.

/auditory : Auditory model software


AIM, Auditory Toolbox and Lutear.

/coding : Speech coding software


ADPCM, CELP 3.2a, G711, G721, G723, GSM, LDCELP, LPC10, Shorten.

/data : Repository for (small) speech-related databases


BEEP, CMUDict, Homophone list, hVd database, Peterson Barney database

/dictionaries : Phonetic dictionaries


BEEP, CMUDict, CUVOALD, Homophone list, MRC database

/info : Key postings to comp.speech archives by subject


Lots of interesting info!

/recognition : Speech recognition software


AbbotDemo, Ears, Lotec, recnet, sound blaster recognition, whistle

/simtel_sound : Mirror of the simtel/msdos/sound directory


Range of useful software

/simtel_voice : Mirror of the simtel/msdos/voice directory


Another range of useful software

/synthesis : Speech synthesis software


Klatt synthesis software, Klatt parameter editor and rsynth.

/tools : Miscellaneous tools


Part-of-speech tagger, OGI speech tools, sox audio file format conversion, SPHERE software
and more.

Back to Section 1 of the comp.speech FAQ Home Page.



Administrivia, Copyright, Submit Information : Last Revision: 01:52 12-Apr-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Q1.2.html (2 of 2) [10/31/2003 8:50:41 AM]


Q1.3: Common abbreviations and jargon

● ANN - Artificial Neural Network.
● ASR - Automatic Speech Recognition.
● ASSP - Acoustics, Speech, and Signal Processing.
● AVIOS - American Voice I/O Society
● CELP - Code-book Excited Linear Prediction.
● COLING - COmputational LINGuistics
● DTW - Dynamic Time Warping.
● FAQ - Frequently Asked Questions.
● HMM - Hidden Markov Model.
● IEEE - Institute of Electrical and Electronics Engineers
● JASA - Journal of the Acoustical Society of America.
● LPC - Linear Predictive Coding.
● LVQ - Learning Vector Quantisation.
● MFCC - Mel Frequency Cepstral Coefficients
● NLP - Natural Language Processing.
● NN - Neural Network.
● TIMIT - A speech corpus with phoneme labels - see Q1.7
● TTS - Text-To-Speech (i.e. speech synthesis).
● VQ - Vector Quantisation.
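As an illustration of one of the techniques named above, here is a minimal textbook formulation of DTW. It is not code from any package mentioned in this FAQ. It aligns two 1-D sequences using absolute difference as the local cost and returns the cumulative cost of the best alignment. Real recognisers apply the same recursion to frames of feature vectors (for example MFCCs), usually with slope constraints that are omitted here.

```python
def dtw_distance(a, b):
    """Cumulative cost of the best monotonic alignment of sequences a and b.

    Local cost is |a[i] - b[j]|; allowed steps are (1,0), (0,1) and (1,1).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a, repeat b[j-1]
                cost[i][j - 1],      # advance in b, repeat a[i-1]
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]
```

For example, dtw_distance([1, 2, 3], [1, 2, 2, 3]) returns 0.0. The extra 2 is absorbed by repeating a frame, which is exactly the kind of timing difference DTW is meant to tolerate.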

Back to Section 1 of the comp.speech FAQ Home Page.



Administrivia, Copyright, Submit Information : Last Revision: 01:01 16-Apr-1997

http://mi.eng.cam.ac.uk/comp.speech/Section1/Q1.3.html [10/31/2003 8:50:41 AM]


Q1.5: Associations, publications and conferences

[Note: Also see the list provided in Shikano's WWW site on Speech and Acoustics:
http://www.aist-nara.ac.jp/IS/Shikano-lab/database/internet-resource/e-www-site.html.]

Associations
Institute of Electrical and Electronics Engineers (IEEE)

● Publications: include IEEE Transactions on Signal Processing, IEEE Transactions on Speech
and Audio Processing (from Jan 93), IEEE Transactions on Acoustics, Speech, and Signal Processing
(now obsolete), and IEEE Signal Processing Magazine. (More information on the WWW:
http://www.ieee.org/sp/index.html).
● Speech-Related Conferences: ICASSP - Intl. Conf. Acoustics, Speech, and Signal Processing.
IEEE also runs speech technology related workshops and many other conferences. (Does
anyone have a list?)
● Contact: IEEE Service Center
445 Hoes Lane, PO Box 1331, Piscataway, NJ 08855, USA
Phone: 1-800-678-IEEE or (201) 981-0060
● WWW: IEEE: http://www.ieee.org/
IEEE Signal Processing Society http://www.ieee.org/sp/index.html

The Acoustical Society of America (ASA)

● Publications: Journal of the Acoustical Society of America (JASA)


● Conferences: ASA holds four meetings a year. Information is available on the WWW:
http://asa.aip.org/meetings.html.
● Contact: ASA Office Manager,
500 Sunnyside Blvd, Woodbury, NY 11797-2999, USA
Ph: (516) 576-2360, FAX (516) 576-2377
Email: asa@aip.org
● WWW: http://asa.aip.org/

European Speech Communication Association (ESCA)

● Publications: Speech Communication


● Conferences: EUROSPEECH is held every two years. Eurospeech '97 will take place in Patras, Greece,
in September 1997. ESCA organises regular speech-related workshops: see their WWW pages for
details.

● Contact: Secretariat ESCA


ICP, Universite Stendhal,
BP 25X, F38400 Grenoble Cedex 9, France
Ph: (+33).76.82.43.36 Fax (+33).76.82.43.35
Email: esca@icp.grenet.fr
● WWW: http://ophale.icp.grenet.fr/esca/esca.html

Association for Computational Linguistics (ACL)

● Publications: Computational Linguistics


● SIGPHON: Special Interest Group for Computational Phonology. The home page is provided
by the Centre for Cognitive Science at the University of Edinburgh. A special issue on
Computational Phonology appeared in Vol 20, Num 3 of Computational Linguistics and
included an Introduction to Computational Phonology by Steven Bird.
● Conferences: COLING is held every two years. ACL also organises a range of workshops. See the
WWW pages for details.
● Contact: P.O. Box 6090
Somerset, NJ 08875, USA
Ph: (908) 873 3893
Email: acl@bellcore.com
● WWW: http://www.cs.columbia.edu:80/~acl/

American Voice Input/Output Society (AVIOS)

● Description: AVIOS is a not-for-profit organization dedicated to disseminating information
about applications using speech technology. It aims "to bridge the gap between emerging voice
technology and its application, by providing an interactive forum for the technologists,
students, system developers, business managers, and users actively involved in or with an
interest in the field of voice processing."
● Publications: International Journal of Speech Technology (with Kluwer Academic Publishers)
The Journal of the American Voice Input/Output Society was published from 1984 to 1994.
● Conferences: The International Voice Input/Output Applications Conference has been held annually
since 1982. Next: Sept 10-12, San Jose, CA.
● Contact: 4010 Moorpark Avenue, Suite 105M, San Jose, CA 95117, USA
Ph: +1-408-248-1353, Fax: +1-408-248-0251
Email: avios@pilot.net
WWW: http://www.avios.com/

European Language Resources Association

● Description: The European Language Resources Association (ELRA) was established in Luxembourg
in February 1995 to promote the creation, verification, and distribution of language resources in
Europe. A non-profit organization, ELRA aims to serve as a focal point for information related to
language resources in Europe. It will help users and developers of European language resources, as well as
government agencies and other interested parties, exploit language resources for a wide variety
of uses. It will also oversee the distribution of language resources via CD-ROM and other
means and promote standards for such resources.
● More info: see the ELRA Home page for membership information, lists of resources etc.
● Contact: K. Choukri, Executive Director ELRA
87, Avenue d'Italie, 75013 Paris, FRANCE
Ph: +33 1 45 86 53 00, Fax: +33 1 45 86 44 88
Email: elra@calvanet.calvacom.fr
WWW: http://www.icp.grenet.fr/ELRA/home.html

ASSTA: Australian Speech Science and Technology Association

● Conference: SST, the Australian conference on Speech Science and Technology, is held every
two years. SST-96 will be held in Adelaide.
● WWW: Home Page: http://cslab.anu.edu.au/~bruce/assta/
List of members: http://ciips.ee.uwa.edu.au/~roberto/assta-users/

SALT: UK Speech and Language Technology Club

● WWW home page: http://salt.essex.ac.uk/salt/

Linguistic Associations

● A comprehensive list of linguistic associations and linguistic WWW links is available at


http://engserve.tamu.edu/files/linguistics/linguist/associations.html

Industry Publications
ASR News

● Description: Monthly newsletter covering developments in the speech recognition and speech
synthesis marketplace.
● Note: Voice Information Associates also publish "Automatic Speech Recognition: A study of
the world-wide market" (revised 1995) and "Text-to-Speech Technology Markets: 1995-2000"
(revised 1995)
● Contact: Voice Information Associates, Inc.
14 Glen Road South, P.O. Box 625, Lexington, MA 02173, USA
Ph: +1-617-861-6680, Fax: +1-617-863-8790
Email: asrnews@tiac.net
WWW: http://www.tiac.net/users/asrnews/

Voice News

● Description: Monthly newsletter reporting on voice mail, voice response, speech recognition,
speech synthesis, digital voice record/playback and related technologies, markets and company
activities. Review copy available on request.
● Contact: Stoneridge Technical Services
P.O. Box 1891, Rockville, MD, 20849, USA
Ph: +1-301-424-0114, Fax: +1-301-424-8971
Email: info@stoneridgetech.com
WWW: http://www.stoneridgetech.com/

Speech Recognition Update

● Description: Monthly news and analysis of speech recognition markets, applications and
technology.
A free sample copy is available by contacting TMA Associates.
● Also: TMA Associates also publishes market studies, including The Advanced Speech
Technology Market: Recognition, Synthesis and Compression (1996) and Voice ID (1996).
● Contact: TMA Associates
6021 Wish Avenue, Encino, CA 91316, USA
Ph: +1-818-708-0962, Fax: +1-818-345-2980
Email: 72162.3172@compuserve.com
http://www.tmaa.com/

Voice Technology and Services News

● Description: Covers integrated PC LAN messaging (voice, fax, mail, video) and speech
technology. It tracks the convergence of computer and telephone technologies, provides insights into
business and marketing opportunities, and offers executives timely analysis of industry trends.
● Contact: Phillips Business Information
1201 Seven Locks Rd., Potomac, Maryland, 20854, USA
Ph: 1-800-777-5006 OR +1-301-340-1520
Subscription FAX: +1-301-309-3847
Editorial FAX: +1-424-4297

Teleconnect

● Contact: +1-212-691-8215

Computer Telephony

● Contact: +1-212-691-8215

Voice Processing Magazine

● Contact: 1-800-854-3112

Speech Technology

● Description: No longer published

Technical and Research Publications


Computer Speech and Language

● Price: $US170 (Institutions), $US75 (Individuals), 4 issues per year.


● Publisher: Academic Press Limited
24-28 Oval Road, London NW1, England
WWW: http://www.apnet.com/

Speech Communication

● Contact: ESCA (see above)


● Publisher: Elsevier Science B.V.
P.O. Box 521, 1000 AM Amsterdam, The Netherlands.
WWW: http://www.elsevier.com/

IEEE Transactions on Speech and Audio Processing,

IEEE Signal Processing Magazine,

IEEE Transactions on Acoustics, Speech, and Signal Processing: OBSOLETE

● Contact: IEEE (see above)

Free Speech Journal

● Description: A Web journal dedicated to the state of the art in human language technology.
Past volumes, editorial and submission information, and more are available on the journal's WWW pages.
● Contact: Editor-In-Chief: Ron Cole: cole@cse.ogi.edu
WWW: http://www.cse.ogi.edu/CSLU/fsj/html/masthead.html

Linguistics Abstracts Online

● Description: Online access to all abstracts published in Linguistics Abstracts since 1985, plus
all current material as it becomes available. Over 250 publications are indexed. Free trial
available.
http://www.blackwellpublishers.co.uk/labs/

Computational Linguistics

● Contact: Published by the Association for Computational Linguistics (see above)

Journal of the Acoustical Society of America (JASA)

● Contact: Published by Acoustical Society of America (see above)

International Journal of Speech Technology (was the AVIOS Journal)

● Description: Focuses on speech technology and its applications, and promotes research and
description of all aspects of speech input and output: applications, base technology, theory,
approach, experiment, and testing.
● Publisher: Kluwer Academic Publishers
101 Philip Drive, Norwell, MA 02061, USA
Ph: +1-617-871-6300, Fax: +1-617-871-0449
● Submissions to: International Journal of Speech Technology
Journals Editorial Office, Ms. Kelly Riddle
Kluwer Academic Publishers
(Address, phone, fax as above)
Email: krkluwer@world.std.com

Conferences
ICSLP: Intl. Conference on Spoken Language Processing
Next: 30 Nov to 4 Dec, 1998, Sydney, Australia
Held in even years.

ICASSP - Intl. Conf. Acoustics, Speech, and Signal Processing

Eurospeech

International Conference on Computational Linguistics (COLING), held every two years

International Voice Input/Output Applications Conference

SST: Australian Speech Science and Technology Conference

Also see the following lists on the WWW:

Shikano's WWW site on Speech and Acoustics


http://www.aist-nara.ac.jp/IS/Shikano-lab/database/internet-resource/e-www-site.html

Institute of Phonetic Sciences WWW list


http://fonsg3.let.uva.nl/Other_pages.html#Meetings

Back to Section 1 of the comp.speech FAQ Home Page.



Administrivia, Copyright, Submit Information : Last Revision: 18:29 05-Sep-1997

http://mi.eng.cam.ac.uk/comp.speech/Section1/Q1.5.html (7 of 7) [10/31/2003 8:50:43 AM]


Q1.6: Handicap Aids



The following are products and companies which support users who can benefit from the use of
speech technology in a user interface. Please feel free to submit information on relevant products,
names of companies and links to useful information on the Internet (especially WWW sites).
[Of course, most of the products listed in Q5.5 and Q6.5 are useful.]

Man-Machine Interfacing
SpeechViewer II

Back to Section 1 of the comp.speech FAQ Home Page.



Administrivia, Copyright, Submit Information : Last Revision: 11:17 27-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Q1.6.html [10/31/2003 8:50:44 AM]


Q1.7: Speech Databases



A wide range of speech databases has been collected, primarily for the development of speech
synthesis and recognition systems and for linguistic research.

Some databases are free but most are not. The databases normally require lots of storage space (hundreds
of megabytes is not unusual), so do not expect to be able to ftp large amounts of speech data.
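A quick calculation shows why the sizes grow so fast. The sketch below assumes uncompressed linear PCM with samples padded to whole bytes, and defaults of 16 kHz, 16-bit, mono. These are common choices for microphone speech, not figures taken from this FAQ. The helper name pcm_bytes is hypothetical.

```python
def pcm_bytes(seconds, sample_rate_hz=16000, bits_per_sample=16, channels=1):
    """Size in bytes of uncompressed linear PCM audio, excluding any file header.

    Assumption: each sample is padded to a whole number of bytes
    (e.g. 14-bit samples are stored in 2 bytes).
    """
    bytes_per_sample = (bits_per_sample + 7) // 8  # round up to whole bytes
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


one_hour = pcm_bytes(3600)
print(one_hour)  # 115200000 bytes, i.e. roughly 115 MB for one hour of speech
```

Under these assumptions, a corpus of a few hours already reaches the hundreds of megabytes mentioned above.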

In addition to the descriptions of speech databases and speech database providers below, information
can be obtained from

LDC: Linguistic Data Consortium


Provides a very wide range of speech and text data to research and commercial users: see
below.
COCOSDA Home Page: http://www.itl.atr.co.jp/cocosda/
The International Committee for the Co-ordination and Standardisation of Speech Databases
and Assessment Techniques for Speech Input/Output.
Shikano's WWW site on Speech and Acoustics
http://www.aist-nara.ac.jp/IS/Shikano-lab/database/internet-resource/e-www-site.html
RELATOR Project
European resource initiative: see below.

The following speech data resources are described in the FAQ.

Bavarian Archive for Speech Signals


BUPT Spoken Digit Database (Chinese)
Center for Spoken Language Understanding (CSLU)
Examples of IPA Symbols
Linguistic Data Consortium (LDC)
NOISEX
Oxford Acoustic Phonetic Database
Phonemic Samples
RELATOR project
ShATR
University of Victoria Phonetic Database

Back to Section 1 of the comp.speech FAQ Home Page.



Administrivia, Copyright, Submit Information : Last Revision: 16:48 14-May-1997

http://mi.eng.cam.ac.uk/comp.speech/Section1/Q1.7.html (2 of 2) [10/31/2003 8:50:44 AM]


http://mi.eng.cam.ac.uk/comp.speech/Section1/Q1.10.html

Q1.10: Speech Research Sites


Rather than trying to list every place around the world that performs speech research, this FAQ lists
sites on the WWW where other comprehensive lists are maintained. Try the following:

Shikano's WWW site on Speech and Acoustics


http://www.aist-nara.ac.jp/IS/Shikano-lab/database/internet-resource/e-www-site.html
Lists of speech research sites by country. Currently includes around 100 sites. The list of
Japanese sites is particularly comprehensive.
Mambo Speech Research List
http://mambo.ucsc.edu/psl/speech.html
Lists about 50 speech research sites and related information sources. Very nice presentation!
ESCA: European Speech Communication Association
http://ophale.icp.grenet.fr/esca/labos.html
Links to around 15 European speech research sites and around 15 related sources of
information.
Institute for Perception Research: Speech on the Web
http://www.tue.nl/ipo/hearing/webspeak.htm
Jan Roelof de Pijper at the Institute for Perception Research has a long list of research sites
plus links to lots of other speech material on the WWW.
Russ Wilcox's list of Commercial Speech Recognition
http://www.tiac.net/users/rwilcox/speech.html
Links to information on speech technology vendors, speech research labs, speech resources, on-
line demos and more.
Speech Groups List: Leeds University Cognitive Psychology Research Group
http://lethe.leeds.ac.uk/research/cogn/speechlab/other.html
List of about 25 research sites.
Institute of Phonetic Sciences, Amsterdam
http://fonsg3.let.uva.nl/Other_pages.html#Phonetics
Good list of European sites.
Speech and Hearing Research Group, University of Sheffield, UK
http://www.dcs.shef.ac.uk/research/groups/spandh/world/misclinks.html
Links to sites in the UK, USA, Europe and the rest of the world.
Duncan M. Forrest's Speech Recognition Resource List
http://www.skye.co.za/dmf/speech/

Most speech research sites have links to other speech research sites somewhere in their WWW pages.

Back to Section 1 of the comp.speech FAQ Home Page.



Administrivia, Copyright, Submit Information : Last Revision: 14:22 12-Nov-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Q1.10.html (2 of 2) [10/31/2003 8:50:45 AM]


http://mi.eng.cam.ac.uk/comp.speech/Section1/Q1.11.html

Q1.11: Miscellaneous Software and Resources.

Speech Interface Standards: APIs etc

ASAPI: Advanced Speech API (AT&T)


SAPI: Microsoft Windows Speech API
SRAPI: Speech Recognition API
TAPI: Microsoft Windows Telephony API

Network "Phone" Software

CUSeeMe
CyberPhone
DigiPhone
InterFACE from Hijinx
FAQ: How can I use the Internet as a telephone?
Nautilus: Secure Computer Telephony
NEVOT (1.4v) from AT&T BL
PGPfone
Speak Freely
Internet Phone from VocalTec
WebPhone
WebTalk

Audio Processing Software

AF version AF3R1
Voice E-Mail from Bonzi Software
MicNotePad Recording Software for Macs
MixViews
Network Audio System Release 1.1
NIST Software - SPHERE and SCORE
Sound Processing Kit
TCPplay

Human Audio Perception

Other useful information on Auditory Modeling can be found in

Malcolm Slaney's home page


http://www.interval.com/~malcolm/
Martin Cooke's home page
Speech and Hearing Research Group, Dept of Computer Science, University of Sheffield, UK.
http://www.dcs.shef.ac.uk/~martin/

Auditory Modeller 1
Auditory Modeller 2
Auditory Toolbox for Matlab
Human Audio Perception Document

Dictionaries and other Lexical Tools

BEEP dictionary
CMU dictionary
CUVOALD dictionary (Oxford Dictionary)
Comprehensive Word List
EAT: Edinburgh Associative Thesaurus
Homophone List
Moby Lexical Resources
MRC Psycholinguistic Database
WordNet
Dictionaries on the WWW

Phonetic Fonts and Phonetic Samples

International Phonetic Alphabet


WWW: Phonetic Fonts and Examples Online
Summer Institute of Linguistics IPA Fonts
Phonetic Fonts for TeX and LaTeX
Yamada Language Center

Subjective Evaluation of Speech Quality

Dynastat, Inc.
Speech Intelligibility Testing with Diagnostic Rhyme Test (DRT), Modified Rhyme Test
(MRT), Phonetically Balanced Word Lists (PB), Diagnostic Medial Consonant Test (DMCT),
Diagnostic Alliteration Test (DALT), ICAO Spelling Alphabet Test (SpAT)
Speech Quality (Acceptability) Evaluation with Diagnostic Acceptability Measure (DAM),
Mean Opinion Score (MOS), Degradation Mean Opinion Score (DMOS)


Contact: Dynastat, Inc.
2704 Rio Grande, Suite 4, Austin, TX 78705, USA
Ph: +1-512-476-4797, Fax: +1-512-472-2883
Email: sharpley@dynastat.com
WWW: http://www.bga.com/dynastat/
ANSI S3.2-1989: American National Standard for Measuring the Intelligibility of Speech Over
Communication Systems
Available from American National Standards Institute (ANSI)
Ph: +1-212-642-4900, Fax: +1-212-398-0023
WWW: http://www.ansi.org/
Louis Pols' List of References on Synthesis Development And Assessment
700 references: http://www.itl.atr.co.jp/cocosda/output/synth.refs
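In its usual form (for example the absolute category rating method of ITU-T Recommendation P.800), a Mean Opinion Score is simply the arithmetic mean of listeners' ratings on a five-point scale, from 1 (bad) to 5 (excellent). The sketch below computes a MOS together with an approximate 95% confidence half-width based on a normal approximation. It is only an illustration and does not describe how Dynastat or ANSI S3.2 conduct their tests. The function name mos_with_ci is hypothetical.

```python
import math
from statistics import mean, stdev


def mos_with_ci(ratings, z=1.96):
    """Return (MOS, confidence half-width) for listener ratings on a 1..5 scale.

    Uses a normal approximation (z = 1.96 for roughly 95%), which is only
    reasonable when there are many ratings.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1..5 scale")
    score = mean(ratings)
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))  # sample std dev
    return score, half_width
```

For example, mos_with_ci([4, 4, 5, 3, 4]) gives a MOS of 4 with a half-width of roughly 0.62. A panel that small cannot separate two systems whose scores differ by a few tenths, which is one reason formal tests use many listeners.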

Very Miscellaneous

The vOICe
The Learning Company's Language Training
Wildfire - an Electronic Assistant

Back to Section 1 of the comp.speech FAQ Home Page.



Administrivia, Copyright, Submit Information : Last Revision: 16:04 05-Sep-1997

http://mi.eng.cam.ac.uk/comp.speech/Section1/Q1.11.html (3 of 3) [10/31/2003 8:50:50 AM]


Man-Machine Interfacing

● Description: Offers a service designed for people with physical challenges, implementing
computerized voice-controlled systems adapted to each client's needs.
They have developed a free-standing microphone and signal processing system that compensates
for speech/articulation distortions and for background noise produced by electronic devices such
as wheelchairs and respirators.
● Contact: Man-Machine Interfacing
P.O. Box 5371, Evanston, IL 60204
Ph: 1-888-425-2001, Fax : (847) 328-7975
Email: jwhite@mcs.com
WWW: http://www.speechrec.com/

Back to Q1.6 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 14:03 25-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Aids/man-machine.html [10/31/2003 8:50:51 AM]


SpeechViewer II

● Platform: IBM machines from Model 25 onwards.
● Description: SpeechViewer II is a speech therapy tool. It provides graphical feedback on
various speech features so that speech-impaired individuals can improve their speech. It works
with an audio bandwidth of 7.3 kHz and thus allows the therapist to work with sustained
vowels and fricatives. A wide range of graphics is used to provide enough variety to hold
client interest. An extensive set of statistics is gathered, allowing a therapist to do
research or keep therapy records. The speech therapy modules are:
❍ Awareness - Sound, Loudness, Pitch, Voicing Onset, Voicing

❍ Skill Building - Pitch, Voicing, Phonology

❍ Patterning - Pitch & Loudness - Waveform & Spectrogram, Spectra

❍ Clinical Management - Profiles, Models, Client Data

A multilingual option is available which provides support for 12 languages: Danish, Dutch,
Finnish, French, German, Icelandic, Italian, Norwegian, Portuguese, Spanish, Swedish, and
UK English. With the Multilingual Option, clinicians can use SpeechViewer II as a training
tool for English as a second language and for foreign language training.
● Hardware: Requires an IBM M-ACPA (Multimedia-Audio Capture Playback Adapter), a 16-bit card
with a TI TMS320C25 DSP chip. The input sampling rate is 44.1 kHz stereo, 88.2 kHz mono.
It has the following jacks: mic in, stereo line in, stereo line out, speaker out.
Note: This card is being replaced by Mwave technology. For more information on Mwave, contact
Texas Instruments.
● Price:
❍ The software is $2130 list, $1491 educational, part number 92F2066.

❍ The M-ACPA is $370 list, $222 educational, part number 92F3378.

❍ The MicroChannel adapter part number is 92F3379 (same price).

● Contact: IBM Special Needs Information


1000 N. W. 51st Street, Internal Zip 5432, Boca Raton, Florida 33431, USA
Ph: 1-800-426-4832, TDD: 1-800-426-4833, Fax: 1-407-982-6059
Email: IBM_SPEC_NEEDS_INFO@vnet.ibm.com
WWW: http://www.austin.ibm.com/pspinfo/snsspv2.html

Back to Q1.6 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 18:43 18-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Aids/speechviewer.html [10/31/2003 8:50:51 AM]


Linguistic Data Consortium (LDC)



The LDC was established to broaden the collection and distribution of speech and natural language
databases for the purposes of research and technology development in automatic speech recognition,
natural language processing and other areas where large amounts of linguistic data are needed.
Detailed information on the LDC is now available on the WWW: http://www.ldc.upenn.edu/. The
LDC WWW server provides information on membership agreements, license agreements, and
summaries of speech and text corpora available.

Speech Corpora

● TIMIT Acoustic-Phonetic Continuous Speech Corpora and NYNEX Telephone Version of TIMIT Corpus (NTIMIT)
● Resource Management Corpora
● Air Travel Information System (ATIS) Corpora (multiple)
● ARPA Continuous Speech Recognition Corpora (WSJ etc)
● Switchboard Corpus of Recorded Telephone Conversations and Switchboard Corpus Excerpts
(Credit Card Conversations)
● Texas Instruments 46-Word Speaker-Dependent Isolated Word Corpus (TI46)
● Texas Instruments Speaker-Independent Connected-Digit Corpus (TIDIGITS)
● Road Rally Conversational Speech Corpus
● HCRC Map Task Corpus
● Air Traffic Control Corpus (ATC0)
● SPIDRE Speaker Identification Corpus
● YOHO Speaker Verification Corpus
● OGI Multi-Language Corpus and OGI Spelled and Spoken Telephone Corpus
● BRAMSHILL
● MACROPHONE
● King Corpus for Speaker Verification Research
● WSJCAM0: Cambridge Read News Corpus
● TRAINS Spoken dialog corpus
● NYNEX PhoneBook Database
● Frontiers in Speech Processing

Text Corpora

● Association for Computational Linguistics Data Collection Initiative (ACL/DCI)


● The Penn Treebank Project - Release 2
● TIPSTER Information Retrieval Text Research Collection
● United Nations Parallel Text Corpus (English, French, Spanish)
● Japanese Language Financial New
● European Corpus Initiative-1

Lexical Databases

● CELEX Lexical Database


● COMLEX : COMmon LEXical Database of English (English syntax and pronunciation)

Contact information:

Linguistic Data Consortium


3615 Market Street, Suite 200, Philadelphia, PA, 19104-2608, USA.
Phone: +1 (215) 898-0464 Fax: +1 (215) 573-2175
e-mail: ldc@ldc.upenn.edu
WWW: http://www.ldc.upenn.edu/

Back to Q1.7 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 09:40 20-Feb-1997

http://mi.eng.cam.ac.uk/comp.speech/Section1/Data/ldc.html (2 of 2) [10/31/2003 8:50:52 AM]


RELATOR project



● Description: RELATOR is a European-wide consortium of researchers who, with the support
of the European Commission, are striving to establish a European repository of linguistic
resources. Linguistic resources comprise a variety of spoken and written language materials,
including lexicons, grammars, corpora, and spoken language databases. RELATOR will ensure
that the requirements of the European language processing community receive attention.
The RELATOR WWW pages provide information on the consortium. The languages currently
covered by the RELATOR consortium include Danish, Dutch, English, French, German,
Greek, Italian, Portuguese, Spanish plus multilingual resources. The resources include both
text and speech.
● WWW: http://cristal.icp.grenet.fr/Relator/homepage.html

Back to Q1.7 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:07 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Data/relator.html [10/31/2003 8:50:53 AM]


Bavarian Archive for Speech Signals



● Description: The Bavarian Archive for Speech Signals (BAS) was founded in January 1995 as
an initiative of the Institute of Phonetics at the University of Munich, Germany. The BAS will
develop, validate, administrate and disseminate corpora of spoken German to the speech
community as well as to the speech engineering industry. The following German speech
corpora are available on ISO 9660 CDROM:
Siemens 1000 - SI1000
5 CDROMs, newspaper corpus, read speech, 10 speakers x 1000 utterances
Siemens 100 - SI100
7 CDROMs, read speech, 101 speakers x 100 sentences
PhonDat 1 - PD1
6 CDROMs, new edition in preparation, read speech, 201 speakers x 450+ sentences
PhonDat 2 - PD2
1 CDROM, read speech, 2nd edition, 16 speakers x 200 sentences, various labelled
information
Verbmobil
Spontaneous speech recorded in a dialog task (appointment scheduling). More
information on the VERBMOBIL project: http://www.dfki.uni-sb.de/verbmobil/
Corpora in Preparation
PhonDat I - PD1: 2nd extended edition (Jul 1995)
Strange Corpora - SC
Reference Corpora that reflect certain well known problems in speech processing, like
accents, repair, breaks, hesitations, repetitions, extreme F0, background noise,
pathological speech, speaker adaptation. The first SC corpus (SC1 Accents) will be
edited in Jul 1995.
BAS Edition of Verbmobil Corpora - VM: 2nd extended edition
Articulatory data - AD: EMA data of speakers of SI1000 corpus
ERBA: 10000 utterances from a train inquiry task
● Misc: BAS is currently developing tools for the automatic annotation and segmentation of very
large speech corpora. This includes the automatic detection of variants of pronunciation, a
statistically based alignment and a rule-based refinement of the outcome. The BAS seeks to
cooperate with public institutions as well as with industrial partners to further develop new
German speech databases. BAS can also serve as a platform to redistribute existing German speech corpora.
● Contact and More Information: The BAS is located at the University of Munich, Germany.
BAS c/o Institut fuer Phonetik
Schellingstr. 3/II
80799 Muenchen, Germany
Ph: +49-89-21802758, Fax: +49-89-2800362
Email: bas@sun1.phonetik.uni-muenchen.de
WWW: http://www.phonetik.uni-muenchen.de/BASSeng.html

Back to Q1.7 of Section 1 of the comp.speech FAQ Home Page.

http://mi.eng.cam.ac.uk/comp.speech/Section1/Data/bas.html (1 of 2) [10/31/2003 8:50:53 AM]


Administrivia, Copyright, Submit Information : Last Revision: 03:00 01-Apr-1996



BUPT Spoken Digit Database (Chinese)

● Vocabulary : {0, 1/yi/, 2, 3, 4, 5, 6, 7, 8, 9, 1/yao/, /dui/, /cuo/ }, 13 words in total.
● Size: 1202 speakers in total, 789 male and 413 female. Each speaker utters each word twice,
for a total of 31252 utterances.
● Format: 8000 Hz, 14-bit sampling. One utterance per file.
● Contact:
GLuck Co.
195 Berlioz 1C, Nun's Island
Verdun H3E 1C1, Canada
e-mail: weigang@zaphod.math.mcgill.ca
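The size figures above are mutually consistent: 789 + 413 = 1202 speakers, and 1202 speakers x 13 words x 2 repetitions = 31252 utterances. A small Python check of that arithmetic (the vocabulary list simply transcribes the set given above):

```python
# Check the BUPT corpus size figures quoted above.
VOCABULARY = ["0", "1/yi/", "2", "3", "4", "5", "6", "7", "8", "9",
              "1/yao/", "/dui/", "/cuo/"]
MALES, FEMALES = 789, 413
REPETITIONS = 2  # each speaker says each word twice


def total_utterances(speakers: int, words: int, repetitions: int) -> int:
    """Every speaker says every vocabulary word `repetitions` times."""
    return speakers * words * repetitions


speakers = MALES + FEMALES
total = total_utterances(speakers, len(VOCABULARY), REPETITIONS)
print(speakers, len(VOCABULARY), total)  # 1202 13 31252
```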

Back to Q1.7 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:07 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Data/bupt.html [10/31/2003 8:50:54 AM]


Center for Spoken Language Understanding (CSLU)

● The ISOLET speech database of spoken letters of the English alphabet. The speech is high
quality (16 kHz with a noise-cancelling microphone): 150 speakers each said the 26 letters of the
English alphabet twice, in random order. The ISOLET database can be purchased for $100 by sending
an email request to vincew@cse.ogi.edu (this covers handling, shipping and medium costs).
The database comes with a technical report describing the data.
● CSLU has a telephone speech corpus of 1000 recitations of the English alphabet. Callers recite
the alphabet with brief pauses between letters. This database is available to not-for-profit
institutions for $100. The database is described in the proceedings of the International
Conference on Spoken Language Processing.
❍ Contact vincew@cse.ogi.edu if interested.

● CSLU has released for universities its Continuous English Speech Corpus. The corpus contains
recorded speech from 690 different speakers, with label files at various levels - including word
level and phonetic labels. The data were collected as part of the OGI Multi-language telephone
corpus. CSLU provides speech corpora to all universities without charge. To order a corpus,
print the license agreement/order form, complete it, and fax it to the CSLU. A description of
the corpora and an order form are available:
http://www.cse.ogi.edu/CSLU/
ftp://speech.cse.ogi.edu/pub/releases
● Contact: Mike Noel: noel@cse.ogi.edu

Back to Q1.7 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 02:58 01-Apr-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Data/cslu.html [10/31/2003 8:50:55 AM]


Examples of IPA Symbols

UCLA Sounds of the World's Languages

● Description: The UCLA Sounds of the World's Languages are available for Macintosh users
(no DOS-based version is currently available). The sounds are stored in a HyperCard database
developed at the UCLA Phonetics Laboratory. The aim is to illustrate and teach about the
range of sounds used in human languages with material on more than 80 languages. The set
demonstrates particular highlights of the sound systems focusing especially on rarer sounds
that students may not otherwise have a chance to hear from a native speaker. The recordings
are based on the archives of recordings collected at UCLA, with additional contributions from
outside collaborators. All the languages can be accessed from the list of language names, or by
clicking on the language name in a set of maps. Support for part of this work was provided by
NSF. The database currently includes examples of languages from Agul and Akan to Zulu.
● Availability: 15 DSDD disks, requiring about 35 MB of disk space when expanded. Available
for $50 (individuals) or $100 (institutions). Prepayment in US dollars (checks or international
money orders payable to "UC Regents") must accompany all orders.
● Contact: The UCLA Phonetics Laboratory
Linguistics Department, UCLA, Los Angeles, CA 90095-1543
Tel: (310) 825-1254
E-mail: oldfogey@ucla.edu

John Esling's "IPA Labels"

● Description: A HyperCard stack, available free or for a nominal fee.


● Contact: John Esling can be reached by email: pdb@uvvm.uvic.ca.

Back to Q1.7 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:07 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Data/ipa.html [10/31/2003 8:50:55 AM]


NOISEX

NOISEX-92
● Description: Database of recordings of various noises, available on two CD-ROMs. Some material
from the same source is available by anonymous ftp in the IEEE's Signal Processing
Information Base. The samples include:
❍ Voice babble
❍ Factory noise
❍ HF radio channel noise, pink noise, white noise
❍ Various military noises: fighter jets (Buccaneer, F16), destroyer noises (engine room,
operations room), tank noise (Leopard, M109), machine gun
❍ Volvo 340

● Availability 1: The cost of this database is 135 Pounds Sterling for the set of two CD-ROMs.
Send payment with order to:
The Speech Research Unit,
Ex1, DRA Malvern, St.Andrew's Road,
Malvern, Worcestershire, WR14 3PS, UK
Tel +44-684-894074 Fax +44-684-894384
Note: The supply of CD-ROMs is limited so please check that they are still available before
placing an order. The only acceptable methods of payment are cheques (from the UK only) or
bank drafts in Pounds Sterling drawn on a UK bank. They should be made payable to:-
Public Sub Account HMG 4768.
● Availability 2: Information on obtaining a copy of the NATO RSG.10 NOISE-ROM-0 is available
from the DRA Speech Research Unit (address above) or from:
Dr. Herman Steeneken,
TNO Institute for Perception,
P.O. Box 23, 3769 ZG Soesterberg,
The Netherlands.
● Availability 3 (WWW): Examples of the NOISEX database are available on the Rice University
Digital Signal Processing (DSP) group home page. (Note: the files are large, over 20 MB.)
http://spib.rice.edu/spib/select_noise.html

Back to Q1.7 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 14:14 13-Aug-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Data/noisex.html [10/31/2003 8:50:56 AM]


Oxford Acoustic Phonetic Database

● Available on compact disc, from J. Pickering and B. Rosner. It contains data on vowel-consonant
and consonant-vowel combinations in both stressed and unstressed positions. The
languages covered include French, German, Hungarian, Italian, Japanese, British English,
Spanish and English. For further information write to
Electronic Publishing, Oxford University
Press, Walton Street, Oxford OX2 6DP, UK.
The ISBN is 0-19-268086-2
● Contact:
Prof. B. Rosner
Dept. of Experimental Psychology
South Parks Rd, Oxford, OX1 3UD, UK
email: burton.rosner@wolfson.ox.ac.uk

Back to Q1.7 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:07 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Data/oxford.html [10/31/2003 8:50:57 AM]


Phonemic Samples

● Some basic data: the following ftp sites have samples of English phonemes (apparently with an
American accent) in Sun audio format files. See Question 1.8 for information on audio file formats.

ftp://sounds.sdsu.edu/.1/phonemes: This ftp site appears to be obsolete. Does anyone know a
new address?
ftp://phloem.uoregon.edu/pub/Sun4/lib/phonemes: There appears to be a configuration problem
with this ftp server.
ftp://sunsite.unc.edu/pub/multimedia/sun-sounds/phonemes
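Sun audio (.au) files normally begin with a fixed 24-byte big-endian header (magic number ".snd", data offset, data size, encoding, sample rate, channel count), and 8-bit G.711 mu-law is a common encoding for them. The Python sketch below reads that header and expands mu-law bytes to 16-bit linear samples. It assumes the file actually has a header: older Sun sound files may be raw, headerless mu-law, which this sketch rejects. The encoding table lists only a few codes.

```python
import struct
from typing import NamedTuple

SUN_MAGIC = 0x2E736E64  # the four bytes ".snd"

# A few common encoding codes from the Sun/NeXT audio header.
ENCODINGS = {
    1: "8-bit G.711 mu-law",
    2: "8-bit linear PCM",
    3: "16-bit linear PCM",
    27: "8-bit G.711 A-law",
}


class SunAudioHeader(NamedTuple):
    data_offset: int  # byte offset of the first sample (at least 24)
    data_size: int    # 0xFFFFFFFF means "unknown"
    encoding: int     # e.g. 1 = 8-bit G.711 mu-law
    sample_rate: int  # samples per second
    channels: int


def parse_sun_header(raw: bytes) -> SunAudioHeader:
    """Parse the fixed 24-byte part of a Sun .au header (six big-endian uint32)."""
    if len(raw) < 24:
        raise ValueError("too short for a Sun audio header")
    magic, offset, size, enc, rate, chans = struct.unpack(">6I", raw[:24])
    if magic != SUN_MAGIC:
        raise ValueError("no .snd magic number; the file may be headerless audio")
    return SunAudioHeader(offset, size, enc, rate, chans)


def ulaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 mu-law byte to a 16-bit linear sample."""
    code = ~code & 0xFF              # mu-law bytes are stored inverted
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


if __name__ == "__main__":
    # Synthetic header: 8 kHz mono mu-law, 8000 data bytes starting at byte 24.
    header = struct.pack(">6I", SUN_MAGIC, 24, 8000, 1, 8000, 1)
    info = parse_sun_header(header)
    print(ENCODINGS.get(info.encoding), info.sample_rate, info.channels)
    # 8-bit G.711 mu-law 8000 1
```

To read a real file, one would parse the first 24 bytes, skip to `data_offset`, and decode the remaining bytes according to `encoding`.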

Back to Q1.7 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:07 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Data/phonemic.html [10/31/2003 8:50:57 AM]


ShATR

● Description: A corpus of multiple simultaneous speakers, available on one CD-ROM. This
specialised corpus is primarily intended to provide acoustic material for studies in auditory scene
analysis. However, many researchers in the speech sciences, from acoustics to discourse analysis,
may find it a valuable source of information. The corpus has been transcribed and aligned at
four different levels of analysis. An overlap analysis between the individual speaker channels
and word counts are available. There is also a general tool for accessing concurrent events in
transcribed multi-sound-source databases.
● Cost: 30 Pounds Sterling for one CD-ROM. Availability, licensing and ordering information are
provided on ShATR's home page.
● Examples: Samples of the ShATR database are available on ShATR's home page and by
anonymous ftp
ftp://ftp.dcs.shef.ac.uk/share/spandh/ShATR/
● Contact: Speech and Hearing Research Group
Department of Computer Science, University of Sheffield
Regents Court, 211 Portobello Street, Sheffield S1 4DP, U.K.
WWW: http://www.dcs.shef.ac.uk/research/groups/spandh/pr/ShATR/ShATR.html
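The kind of overlap analysis mentioned above can be illustrated with a sweep over segment boundaries. This Python sketch is not the ShATR tool: the input format (per-speaker lists of (start, end) times in seconds) is a hypothetical stand-in for the corpus transcriptions, and it assumes that segments within one speaker's channel do not overlap each other.

```python
from typing import Dict, List, Tuple

Segment = Tuple[float, float]  # (start_s, end_s) of one stretch of speech


def overlap_duration(channels: Dict[str, List[Segment]]) -> float:
    """Total time during which two or more speakers are talking at once."""
    events: List[Tuple[float, int]] = []
    for segments in channels.values():
        for start, end in segments:
            if end > start:
                events.append((start, +1))  # a speaker starts talking
                events.append((end, -1))    # a speaker stops talking
    # Sorting puts -1 before +1 at equal times, so segments that merely
    # touch (one ends exactly when another starts) do not count as overlap.
    events.sort()
    active, overlap, prev_time = 0, 0.0, None
    for time, delta in events:
        if prev_time is not None and active >= 2:
            overlap += time - prev_time
        active += delta
        prev_time = time
    return overlap


if __name__ == "__main__":
    # Hypothetical timings (seconds) for three speaker channels.
    talk = {"A": [(0.0, 5.0)], "B": [(3.0, 8.0)], "C": [(4.0, 6.0)]}
    print(overlap_duration(talk))  # 3.0: two or more speakers active from 3 s to 6 s
```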

Back to Q1.7 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 14:13 12-Nov-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Data/shatr.html [10/31/2003 8:50:58 AM]


University of Victoria Phonetic Database

● Platform: Computerized Speech Lab CSL4300, MultiSpeech on Winxx or Win95 with any
multimedia card, or a SoundBlaster16 option with support from the PDBAUDIO program.
● Description: Phonetic database consisting of digitized speech samples, in a proprietary format,
from 45 world languages on CD-ROM. The CD-ROM is supported by hardcopy documentation
containing the phonetic inventory of each language and the transcription and orthography of each
digitized speech sample. The PDB depicts and compares the sounds, symbols and
conventions of transcription used by these languages. More information is available from the
STR web site.
● Contact: Speech Technology Research Ltd.,
Suite B - 1623 McKenzie Avenue, Victoria, B.C. V8N 1A6, Canada
Ph: +1-250-477-0544
Email: products@speechtech.com
WWW: http://www.speechtech.com/home/speechtech/

Back to Q1.7 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 01:00 15-May-1997

http://mi.eng.cam.ac.uk/comp.speech/Section1/Data/victoria.html [10/31/2003 8:50:59 AM]


DADiSP from DSP Development Corporation

● Platform: Windows and various Unix
● Description: DADiSP is designed for scientists and engineers to collect, analyze, and display
scientific and technical data. Packages available include AdvDSP, Controls, DADiMP, Filters,
GPIBLab, NeuralNet, and Stats.
A description of the application of DADiSP to speech processing is provided on the DSP
Development Corporation WWW site.
Detailed product information is available on the DSP Development Corporation WWW site
and by filling out a WWW form.
● Cost: Unknown
● Availability: See the DSP Development Corporation WWW site
A free, fully featured demo of DADiSP 4.0 is available from the DSP Development
Corporation WWW site and can be mailed on floppy disk.
A special Student Edition of DADiSP is available for free.
● Contact: DSP Development Corporation
One Kendall Square, Cambridge, MA 02139, USA
Ph: (617) 577-1133 Fax: (617) 577-8211
EMail: info@dadisp.com
WWW: http://www.dadisp.com/

Back to Q1.9 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 16:34 29-May-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Labs/dadisp.html [10/31/2003 8:50:59 AM]


GoldWave

● Platform: Windows
● Description: GoldWave is a digital audio editor for Microsoft Windows. It features realtime
amplitude/spectrum oscilloscopes, large file editing, effects, and support for a wide variety of
sound formats.
❍ Editing of multiple waveforms and large waveforms
❍ Realtime amplitude/spectrum oscilloscopes
❍ Resizable device controls window for accessing audio devices
❍ Realtime fast forward and rewind playback
❍ Effects: distortion, Doppler, echo, filter, mechanize, offset, pan, volume shaping, invert,
resample, transpose, etc.
❍ Multiple file formats and conversions: .WAV, .AU, .IFF, .VOC, .SND, .MAT, .AIFF,
and raw data
❍ CD-ROM controls window

More information is available on the GoldWave home page.


● Cost: Shareware
● Availability: Through the GoldWave home page:
http://web.cs.mun.ca/~chris3/goldwave/goldwave.html
● Contact: Chris Craig: chris3@cs.mun.ca
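As a rough illustration of what an effect such as the echo listed above does to sample data, here is a generic single-tap echo in Python. It is not GoldWave's algorithm, which is not described here; it assumes floating-point samples and a non-negative delay, and it applies no clipping.

```python
from typing import List, Sequence


def add_echo(samples: Sequence[float], sample_rate: int,
             delay_s: float, gain: float) -> List[float]:
    """Single-tap echo: y[n] = x[n] + gain * x[n - D], D = delay in samples.

    The output is D samples longer than the input so the echo tail is kept.
    No clipping or normalisation is applied."""
    delay = int(round(delay_s * sample_rate))
    out = [float(s) for s in samples] + [0.0] * delay
    for n, x in enumerate(samples):
        out[n + delay] += gain * x
    return out


if __name__ == "__main__":
    # An impulse at 4 samples/s with a 0.5 s echo at half amplitude.
    print(add_echo([1.0, 0.0, 0.0], sample_rate=4, delay_s=0.5, gain=0.5))
    # [1.0, 0.0, 0.5, 0.0, 0.0]
```

A real editor would also rescale or clip the result, since adding the delayed copy can push samples past full scale.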

Back to Q1.9 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:08 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Labs/goldwave.html [10/31/2003 8:51:00 AM]


Khoros

● Platform: Any Unix - source code available.
● Description: Khoros is a technical computing environment for image and signal processing,
visual programming and software development.
● Price: On request.
● Availability: Khoral Research Inc.
6001 Indian School Rd. NE Suite 200, Albuquerque, NM 87110, USA
Ph: (505) 837-6500, Fax: (505) 881-3842
Email: info@khoral.com
ftp: ftp://ftp.khoral.com/
WWW: http://www.khoral.com/

Back to Q1.9 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 11:08 02-Oct-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Labs/khoros.html [10/31/2003 8:51:01 AM]


Matlab plus Signal Processing Toolbox

● Platform: Wide range
● Description: Matlab (MATrix LABoratory) is a technical computing environment for
numerical computation and visualization based on a matrix oriented, interpreted programming
language. The programming environment provides support for the development of customized
operations, along with debugging facilities and a graphical user interface toolkit. Audio output
is provided.
A specialised Signal Processing Toolbox is available with many functions useful for
speech analysis, including filter design, spectral estimation, statistical signal
processing, waveform generation, and signal and spectrogram display.
A specialised Auditory Toolbox is also available, with functions useful to people
interested in auditory/cochlear models. A more detailed description is given in Q1.10.
● Price: On request.
● Contact: The Math Works Inc. 24 Prime Park Way, Natick, MA 01760-1500 USA
Ph: 1-508-653 1415 Fax: 1-508-653 6284
Email: info@mathworks.com
ftp: ftp://ftp.mathworks.com
WWW: http://www.mathworks.com/
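Spectral estimation and spectrogram display, listed above, come down to windowing short frames of the signal and taking their Fourier transforms. The sketch below is in Python rather than Matlab and does not reproduce the toolbox's functions. It uses a Hamming window and a naive DFT for clarity where a practical implementation would use an FFT; the 8 kHz rate, 64-sample frames and 1 kHz test tone are arbitrary assumptions.

```python
import cmath
import math
from typing import List, Sequence


def hamming(n: int) -> List[float]:
    """Symmetric Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """|X[k]| for k = 0..N/2 of a Hamming-windowed frame (naive O(N^2) DFT)."""
    n = len(frame)
    x = [s * w for s, w in zip(frame, hamming(n))]
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def spectrogram(signal: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Magnitude spectra of successive (possibly overlapping) frames."""
    return [magnitude_spectrum(signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, hop)]


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate (Hz)
    tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(256)]
    frames = spectrogram(tone, frame_len=64, hop=32)
    peak_bin = max(range(33), key=lambda k: frames[0][k])
    print(len(frames), peak_bin * fs / 64)  # 7 1000.0
```

With 64-sample frames at 8 kHz each DFT bin spans 125 Hz, so the 1 kHz tone peaks in bin 8; a spectrogram display plots these per-frame magnitudes (usually in dB) against time.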

Back to Q1.9 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:08 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Labs/matlab.html [10/31/2003 8:51:01 AM]


MacSpeech Lab II

MacSpeech Lab II (MSL II)


● Platform: Macintosh
● Description: A sound analysis and acquisition system for Macs. MSL II provides the most common
functions for speech analysis (FFTs, LPCs, f0 extraction, etc.) and produces grayscale
spectrographic displays. It can be used for various speech technology and phonetic training
tasks.
● Hardware: Requires MacADIOS ("Macintosh Analog/Digital Input/Output System") hardware
for speech I/O at 12/16 bits.
● Misc: The software is no longer updated by GW Instruments. MSL software/hardware will not
perform input/output on Quadras, for example, though analysis seems to work. It is known to
operate properly on models up to the IIcx and IIfx.
● Availability: MSL has been replaced by SoundScope; see the SoundScope entry for more
detail.
● Contact:
GW Instruments
35 Medford Street, Somerville, MA 02143, USA
Phone: (617) 625-4096 Fax: (617) 625-1322
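To illustrate one of the analyses listed above, f0 extraction, here is a basic autocorrelation pitch estimator in Python. It is a generic textbook method, not MSL II's algorithm (which is not described here); the 60-400 Hz search range and the test tone are assumptions, and on real speech the method is prone to octave errors.

```python
import math
from typing import Optional, Sequence


def estimate_f0(frame: Sequence[float], sample_rate: int,
                fmin: float = 60.0, fmax: float = 400.0) -> Optional[float]:
    """Autocorrelation pitch estimate for one voiced frame.

    Picks the lag with the largest autocorrelation (normalised by frame
    energy) between sample_rate/fmax and sample_rate/fmin samples.
    Returns None if no positive peak is found, e.g. for silence."""
    n = len(frame)
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 1)
    if n == 0 or min_lag > max_lag:
        return None
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove any DC offset
    energy = sum(v * v for v in x)
    if energy == 0.0:
        return None
    best_lag, best_r = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    return None if best_lag is None else sample_rate / best_lag


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate
    frame = [math.sin(2 * math.pi * 200 * t / fs) for t in range(400)]  # 50 ms, 200 Hz
    print(estimate_f0(frame, fs))  # 200.0
```

Because each lag sums over only `n - lag` products, longer lags score slightly lower; this mildly discourages picking a multiple of the true period.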

Back to Q1.9 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:08 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Labs/msl.html [10/31/2003 8:51:02 AM]


N!Power

● Platform: SUN, DEC and HP workstations.
● Description: An object-oriented software package with a Motif GUI and a range of
functionality for data analysis/editing, signal analysis, speech processing, real-time A/D and
D/A, and 2D/3D interactive graphics. N!Power replaces ILS.
N!Power can provide a Block Diagram user interface, menus, pop-ups, and a high-level IEEE
standard symbolic scripting language. You can customize the blocks, menus and pop-ups with
mouse point-and-click operations.
● Contact: Signal Technology, Inc.
104 W. Anapamu, Suite J, Santa Barbara, CA 93101-3126
Phone: +1-805-899-8300, Fax: +1-805-899-4344
Email: stisales@signal.com
WWW: http://www.silcom.com/~stilarry/

Back to Q1.9 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:08 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Labs/npower.html [10/31/2003 8:51:03 AM]


Ptolemy

● Platform: Sun SPARC, DecStation (MIPS), HP (hppa).
● Description: Ptolemy provides a highly flexible foundation for the specification, simulation,
and rapid prototyping of systems. It is an object-oriented framework within which diverse
models of computation can co-exist and interact. Ptolemy can be used to model entire systems.

Ptolemy has been used for a broad range of applications including signal processing,
telecommunications, parallel processing, wireless communications, network design, radio
astronomy, real time systems, and hardware/software co-design. Ptolemy has also been used as
a lab for signal processing and communications courses. Ptolemy has been developed at UC
Berkeley over the past 3 years. Further information, including papers and the complete release
notes, is available from the FTP site.
● Cost: Free
● Availability: The source code, binaries, and documentation are available by anonymous ftp
from
ftp://ptolemy.berkeley.edu/pub/README

Back to Q1.9 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:09 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Labs/ptolemy.html [10/31/2003 8:51:03 AM]


Quadravox Speech Processing Products - Qbox

● Platform: Windows 3.1, Windows 95
● Description: Qbox comprises a Windows-based LPC-12 analysis and editing system and a
parallel-port driven programmer for one-time-programmable TI TSP50P11 synthesis chips.
The analysis software uses standard 11025 Hz, 16-bit mono .wav files for input and
allows graphical editing of the coded pitch, gain, and reflection coefficients. It can also be used
to define concatenation sequences of individual phrases. Data rates depend on the original
sound, but are typically below 2000 bits/sec. The processed data can then be merged with
synthesis and control routines and programmed into the TI synthesizer. The Quadravox-
developed synthesis routine accepts run-time modifications of pitch and frame-length (speed),
as well as externally defined concatenation sequences. The synthesis chip interface can be
defined as a matrixed-keyboard drive, a simple parallel control, or a serial bus control
supporting up to 31 individually addressed devices and modules.
● Cost: $90-$150 depending on options selected.
● Contact: Quadravox, Inc.
1701 N. Greenville Ave., Suite 608, Richardson, TX, 75081 USA
Ph: 214-669-4002
Email: info@quadravox.com
WWW: http://www.quadravox.com/
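The reflection coefficients mentioned above are a standard by-product of LPC analysis. A minimal Python sketch of how they can be obtained from a frame's autocorrelation with the Levinson-Durbin recursion follows. This is a generic method, not Quadravox's software; sign conventions for reflection coefficients differ between texts and devices, and the quantisation and frame format used by the TI chips are not modelled.

```python
from typing import List, Sequence, Tuple


def autocorrelation(x: Sequence[float], order: int) -> List[float]:
    """r[k] = sum_n x[n] * x[n + k] for k = 0..order."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], List[float], float]:
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Returns (a, k, err): predictor coefficients a[1..p] for
    x[n] ~ a[1] x[n-1] + ... + a[p] x[n-p], reflection coefficients
    k[1..p], and the final prediction-error energy."""
    a = [0.0] * (order + 1)  # a[0] is unused
    k = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predictable frame
        k_i = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k_i
        for j in range(1, i):
            a[j] = prev[j] - k_i * prev[i - j]
        k[i] = k_i
        err *= 1.0 - k_i * k_i
    return a[1:], k[1:], err


if __name__ == "__main__":
    # Autocorrelation of a first-order process with coefficient 0.5.
    print(levinson_durbin([1.0, 0.5, 0.25], 2))  # ([0.5, 0.0], [0.5, 0.0], 0.75)
```

For a 12th-order analysis one would call `levinson_durbin(autocorrelation(frame, 12), 12)`; reading "LPC-12" as a 12th-order model is an assumption here. If every reflection coefficient lies strictly between -1 and 1, the corresponding all-pole synthesis filter is stable.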

Back to Q1.9 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 17:20 25-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Labs/quadravox.html [10/31/2003 8:51:04 AM]


ASAPI: Advanced Speech API (AT&T)

● Description: The AT&T ASAPI Specification is an open, cross-platform, easy-to-use speech
API that can support speech engines from AT&T and other vendors. ASAPI does not replace
the Microsoft Speech API; rather, it provides extensions and enhancements to the Microsoft SAPI
Specification, including support for SAPI-compatible applications.
The ASAPI Specification defines two types of interfaces. The "ASAPI Extensions" interface
provides extensions to the MS-SAPI interface as well as C++ class encapsulation of
SAPI functionality. The "Visual ASAPI" interface provides an even higher-level abstraction of
SAPI/ASAPI low-level functionality such that application developers can quickly and easily
embed speech technology into existing or new applications. Special Purpose Recognizers are
examples of Visual ASAPI interfaces which integrate lower-level functionality that an
application developer can access via a simple interface.
● More information: Contact Jose Garcia at AT&T on (908) 957-5457 or by email: jrg@att.com.
For more information on the WATSON Speech Engine, which supports ASAPI, and news about
ASAPI, please visit the AT&T Advanced Speech Products Group home page or call
1-800-5-WATSON.

Back to Q1.11 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 17:09 07-Jun-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Interfaces/asapi.html [10/31/2003 8:51:05 AM]


SAPI: Microsoft Windows Speech API

● Platform: Windows 95 and Windows NT 3.51
● Description: The Microsoft Speech API provides applications with the ability to incorporate
speech recognition (command & control or dictation) or text-to-speech, using either C/C++ or
Visual Basic. SAPI follows the OLE Component Object Model (COM) architecture. It is
supported by many major speech technology vendors. The major interfaces are
❍ Voice Commands: high level speech recognition API for command and control.
❍ Voice Text: simple high level text-to-speech API.
❍ Speech Recognition: provides detailed control of a speech recognition engine for both
command-and-control and dictation.
❍ Text-to-Speech: provides a detailed interface to a text-to-speech engine for control of
playback, speaking style, voice quality etc.
❍ Multimedia Audio Objects: audio I/O for microphones, headphones, speakers, telephone
lines, files etc.
● Availability: Download Microsoft's latest speech technology, including the Microsoft Speech
SDK, command and control recognition, the Microsoft dictation research demonstration and
text-to-speech.
● More information: Email: MSSpeech@Microsoft.Com
WWW: The Microsoft Speech API
WWW: An Overview of the Microsoft Speech API
Documentation included with the Microsoft SDK.
● See also: TAPI: Microsoft Telephone API

Back to Q1.11 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 16:01 21-Apr-1997

http://mi.eng.cam.ac.uk/comp.speech/Section1/Interfaces/ms.speech.api.html [10/31/2003 8:51:06 AM]


SRAPI: Speech Recognition API

● Platform: Various
● Description: The SRAPI provides support for speech recognition, text-to-speech and other
media playback. The SRAPI Committee is a nonprofit Utah corporation with the goal of
providing solutions for interaction of speech technology with applications.
Core members include: Novell, Inc., Dragon Systems, IBM, Kurzweil AI, Intel, and Philips
Dictation Systems. Additional contributing members include Articulate Systems, DEC, Kolvox
Communications, Lernout and Hauspie, Syracuse Language Systems, Voice Control Systems,
Corel, Verbex and Voice Processing Corporation.
● More information: WWW: http://www.srapi.com/
Email: For more information on the SRAPI Developer CD, send email to srapi@srapi.com
with Subject "SRAPI CD Info".

Back to Q1.11 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 10:27 10-Jun-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Interfaces/srapi.html [10/31/2003 8:51:06 AM]


TAPI: Microsoft Windows Telephony API

● Description: TAPI allows applications to support telephone communication. TAPI facilities
include:
❍ Connecting directly to a telephone network.

❍ Automatic phone dialing.

❍ Transmission of data (files, faxes, electronic mail).

❍ Access to data (news, information services).

❍ Conference calling.

❍ Voice mail.

❍ Caller identification.

❍ Control of a remote computer.

❍ Collaborative computing over telephone lines.

Windows 95 comes with a telephony application, DIALER.EXE, that can dial voice calls, act
as a proxy for applications making simple telephony requests, and maintain a call log.
● More information: The Win32 Software Development Kit (SDK) contains documentation,
tools, and sample code for TAPI including the Microsoft Telephony Programmer's Reference
and the Microsoft Telephony Service Provider Interface (TSPI) for Telephony.
WWW: Tapping in TAPI, TAPI White Paper
● See also: SAPI: Microsoft Speech API

Back to Q1.11 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 10:03 10-Mar-1997

http://mi.eng.cam.ac.uk/comp.speech/Section1/Interfaces/tapi.html [10/31/2003 8:51:07 AM]


CUSeeMe

● Platform: Macintosh and Windows
● Description: Cornell University software for audio and video conferencing over the Internet.
● Requirements: Macintosh to RECEIVE video:
❍ Macintosh platform with a 68020 processor or higher
❍ System 7 or higher operating system
❍ Minimum 16-level grayscale display (or color)
❍ IP network connection and MacTCP
❍ Apple's QuickTime, to receive slides with SlideWindow
Macintosh to SEND video:
❍ All the above plus
❍ QuickTime installed
❍ Video digitizer (with vdig software) and camera
For Windows:
❍ Video receive only: 386SX; video send & receive: 386DX; video receive with audio:
486SX; video send & receive with audio: 486DX
❍ Windows 3.1 or higher running in Enhanced Mode
❍ Winsock
❍ 256-color (8-bit) video driver
❍ Video camera and a video capture board that supports Microsoft Video for Windows
❍ For audio: Windows sound board that conforms to the Windows MultiMedia
Specification, speakers and a microphone
● Availability: Mac: http://cu-seeme.cornell.edu/get_cuseeme.html
Windows: http://cu-seeme.cornell.edu/PC.CU-SeeMeCurrent.html
● More information: http://cu-seeme.cornell.edu/

Back to Q1.11 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:10 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Phone/cuseeme.html [10/31/2003 8:51:08 AM]


CyberPhone

● Platform: Sun Workstations running Solaris 2.x (SunOS 5.x)
● Description: Provides voice communications over the internet. Has a graphical user interface
and requires no additional hardware. An optional centralized server system is available to make
finding and connecting to other users easier.
● Availability: a free demonstration is available by anonymous ftp
ftp://magenta.com/pub/cyberphone
● Contact: Email: cyberphone@magenta.com. More information is available on the WWW:
http://magenta.com/cyberphone/.

Back to Q1.11 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:10 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Phone/cyberphone.html [10/31/2003 8:51:08 AM]


DigiPhone

● Platform: Macintosh, Windows 3.1 and Windows 95
● Description: DigiPhone provides two-way phone conversations by dialing direct and over the
Internet. Includes encryption for privacy, caller ID, call screening, call timer, adjustable sound
and compression quality, messaging, and access to the Global Directory providing a database
of DigiPhone users.
❍ DigiPhone v1.03: provides the standard features listed above.
❍ DigiPhone Deluxe: provides the standard features of DigiPhone v1.03 and adds
conference calling, mute, speed dial, call recording and playback, voice effects,
customizations, and internet tools.
❍ DigiPhone for Mac: provides the standard features listed above, plus cross-platform
compatibility and mute.
● Requirements: DigiPhone v1.03 requires a 386DX/33 or faster, 4 MB RAM, a 9,600 bps modem,
a Sound Blaster 16 card (or any compatible half- or full-duplex card), and a local internet
connection with SLIP or PPP. [A 486DX/33 and 14,400 bps modem are recommended.]
DigiPhone Deluxe has the same requirements as v1.03 but requires a 486DX/33 or faster.
DigiPhone for Mac requires a 68030 33 MHz, 68040 25 MHz or PowerPC, 4 MB RAM, System
7.x, 14,400 bps modem or better, Sound Manager 3.x for System 7, microphone and speakers,
MacTCP or Open Transport and a local internet connection with SLIP or PPP.
● Price and Availability: Contact Third Planet Publishing for pricing. Trial software is available
from Third Planet Publishing. Orders and Upgrades can be made on the Web. Also available
through many retailers.
● Contact: Third Planet Publishing, Inc.
17770 Preston Rd, Dallas, Texas 75252, USA
Ph: +1-972-733-3005, Fax: +1-972-380-8712
Email: 3pp@planeteers.com
WWW: http://www.planeteers.com/
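The modem requirements above show why compression is essential for this kind of software. Assuming telephone-band audio sampled at 8 kHz with 16-bit mono samples (an assumption; DigiPhone's actual audio format is not stated here), the raw bit rate far exceeds what the modem can carry:

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def required_compression(raw_bps: float, link_bps: float) -> float:
    """Compression ratio needed for the raw stream to fit the link (ignoring overhead)."""
    return raw_bps / link_bps


raw = pcm_bit_rate(8000, 16)  # assumed: 8 kHz, 16-bit mono -> 128000 bit/s
for modem_bps in (9600, 14400):
    print(modem_bps, round(required_compression(raw, modem_bps), 1))
# 9600 13.3
# 14400 8.9
```

These ratios ignore SLIP/PPP and IP packet overhead, so the compression actually needed is somewhat higher.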

Back to Q1.11 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 06:01 06-Jan-1997

http://mi.eng.cam.ac.uk/comp.speech/Section1/Phone/digiphone.html [10/31/2003 8:51:09 AM]


InterFACE from Hijinx

● Platform: Windows
● Description: InterFACE provides voice communication on the Internet through IRC (Internet
Relay Chat) services.
● Requirements: A 486DX, 8 MB RAM, Windows, a VGA monitor and a 16-bit sound card are
recommended.
● Availability: Available on CD only for US$60.00, including postage and handling.
Demo versions available from the HiJiNX WWW site.
● Contact: HiJiNX, Brisbane, Australia
Email: jester@hijinx.com.au
WWW: http://www.hijinx.com.au/

Back to Q1.11 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:10 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Phone/interface.html [10/31/2003 8:51:10 AM]


FAQ: How can I use the Internet as a telephone?

● Description: Kevin M. Savetz and Andrew Sears have prepared an FAQ document titled FAQ: How can I use
the Internet as a telephone? The current document has the following sections:
❍ Can I use the Internet as a telephone?

❍ What do I need to call others on the Internet?

❍ How does it work?

❍ How do I make calls using a modem?

❍ Is the sound quality as good as a regular telephone?

❍ Is there a noticeable delay in hearing the other user?

❍ What is the difference between full duplex and half duplex?

❍ What is multicasting?

❍ Can I talk to users of other phone software?

❍ What software is available?

The section on available software covers the following:


❍ Mac: Maven, NetPhone, CU-Seeme, PGPfone
❍ Windows: Speak Freely, CU-Seeme, Internet Phone, Digiphone, Internet Voice Chat, Internet Global
Phone, Web Phone
❍ UNIX: Speak Freely, nevot, vat, mtalk, ztalk

● Availability:
By Email
Mail voice-faq-request@northcoast.com
with "Subject: archive"
and "Body: send voice-faq"
FTP
ftp://rtfm.mit.edu/pub/usenet/alt.internet.services/FAQ:_How_can_I_use_the_Internet_as_a_telephone?
WWW:
http://rpcp.mit.edu/~asears/voice-faq.html
● Contact: Andrew Sears: asears@mit.edu
Kevin Savetz: savetz@northcoast.com

Back to Q1.11 of Section 1 of the comp.speech FAQ Home Page.

Administrivia, Copyright, Submit Information : Last Revision: 00:10 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Phone/iphone.faq.html [10/31/2003 8:51:11 AM]


Nautilus: Secure Computer Telephony



● Platform: DOS, Linux, SunOS, Solaris.
● Description: Nautilus is software which allows two users to hold a secure conversation
either over ordinary phone lines or over a computer network. Nautilus uses your computer's
audio hardware to digitize and play back your speech using speech compression algorithms
built into the program. It encrypts the compressed speech using your choice of the Blowfish,
Triple DES, or IDEA block ciphers, and transmits the encrypted packets over the internet or
your modem to another computer. At the other end, the process is reversed. Nautilus operates
in half duplex mode like a speakerphone -- only one person can talk at a time. Either user can
hit a key to switch between talking and listening. Audio quality ranges from fair to very good
depending on which of the four speech coders is selected. The Nautilus WWW page provides
more detailed information.
● Requirements: Nautilus runs on IBM PC-compatible computers (386DX25 or faster) under
MSDOS or Linux as well as audio-capable Sun workstations running SunOS or Solaris. The
MSDOS version of Nautilus requires a Soundblaster compatible sound card and currently only
runs over ordinary phone lines with a modem. To use Nautilus over ordinary telephone lines, a
modem capable of connecting at 4800 bps or faster is required.
● Availability: Nautilus is available in three different formats. As a DOS executable, it is
available as a zip archive along with its associated documentation. In source
format, it is available as either a zip archive or a gzip-compressed tar archive.
Nautilus is distributed freely (subject to US export restrictions) with full source code. This
ensures that its security can be independently examined and verified. Follow the instructions in
the following README files to obtain Nautilus.
❍ ftp://ftp.csn.org/mpj/README

❍ ftp://ripem.msu.edu/pub/crypt/README

● More information: WWW: http://www.lila.com/nautilus/


● Contacts: The Nautilus development team includes Bill Dorsey, Paul Rubin, Andy Fingerhut,
Paul Kronenwetter, Bill Soley, and Pat Mullarky. To contact the developers, send email to
nautilus@lila.com.


Administrivia, Copyright, Submit Information : Last Revision: 10:41 07-Aug-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Phone/nautilus.html [10/31/2003 8:51:11 AM]


NEVOT (1.4v) from AT&T BL



● Platforms: Sun SPARCstation (SunOS 4.1.x) and Silicon Graphics
● Description: Audio-conferencing tool that supports both point-to-point and broadcast audio
using multicast IP. Supported audio encodings:
❍ PCM 64 kb/s: 8-bit µ-law encoded 8 kHz PCM (G.711)

❍ ADPCM 32 kb/s [Sun only] (G.721)

❍ DVI ADPCM 32 kb/s

❍ ADPCM 24 kb/s [Sun only] (G.723)

❍ CELP 4.8 kb/s

❍ LPC 2.4 kb/s

● Availability: by anonymous ftp from
ftp://gaia.cs.umass.edu/pub/hgschulz/nevot
● Contact: Henning Schulzrinne (hgs@research.att.com)
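The 64 kb/s figure for G.711 PCM follows from 8,000 samples per second at 8 bits per sample. The Python sketch below illustrates the idea behind µ-law companding using the continuous µ-law curve with µ = 255 followed by a plain uniform 8-bit quantizer. This is a simplifying assumption: the actual G.711 encoder uses a segmented, piecewise-linear approximation with its own bit layout, and the function names here are hypothetical, not taken from NEVOT.

```python
import math

MU = 255  # mu-law parameter used for 8-bit telephone PCM

SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 8
BIT_RATE = SAMPLE_RATE_HZ * BITS_PER_SAMPLE  # 64,000 bit/s, i.e. 64 kb/s


def mulaw_compress(x: float) -> float:
    """Continuous mu-law curve: maps a sample in [-1, 1] to [-1, 1]."""
    if not -1.0 <= x <= 1.0:
        raise ValueError("sample must lie in [-1, 1]")
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress."""
    return math.copysign(((1.0 + MU) ** abs(y) - 1.0) / MU, y)


def quantize_8bit(y: float) -> int:
    """Uniformly quantize a companded value in [-1, 1] to a code 0..255."""
    return min(255, int((y + 1.0) / 2.0 * 256))


def dequantize_8bit(code: int) -> float:
    """Return the centre of the code's quantization interval."""
    return (code + 0.5) / 256 * 2.0 - 1.0
```

Because the curve is steep near zero, quiet samples receive finer effective quantization steps than loud ones, which is the point of companding before 8-bit quantization.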


Administrivia, Copyright, Submit Information : Last Revision: 00:11 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Phone/nevot.html [10/31/2003 8:51:14 AM]


PGPfone

● Platform: Macintosh and Windows
● Description: Pretty Good Privacy Phone is free secure audio connection software for the
internet. It uses speech compression and strong cryptography protocols to give you the ability
to have a real-time secure telephone conversation via a modem-to-modem connection.
● Requirements (Mac): Fast modem: at least 14.4 Kbps V.32bis (28.8 Kbps V.34 recommended).
An Apple Macintosh with at least a 25MHz 68LC040 processor (PowerPC recommended),
running System 7.1 or above, Thread Manager 2.0.1, ThreadsLib 2.1.2, and Sound Manager
3.0. (These are available from Apple's FTP sites.)
● Requirements (Windows): Fast modem: at least 14.4 Kbps V.32bis (28.8 Kbps V.34
recommended). A multimedia PC running Windows 95 or NT, with at least a 66 MHz 486
CPU (Pentium recommended), sound card, microphone, and speakers or headphones.
● Contact: Jeffrey I. Schiller
Email: jis@mit.edu
WWW: http://web.mit.edu/network/pgpfone/


Administrivia, Copyright, Submit Information : Last Revision: 00:11 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Phone/pgpfone.html [10/31/2003 8:51:15 AM]


Speak Freely

● Platform: Windows and Unix
● Description: Free "Internet Phone" software supporting voice mail, multicasting, encryption
and several coding methods. Includes 4 forms of data compression and encryption with DES,
IDEA and PGP. The Windows and Unix versions are compatible. You can designate a bitmap
file to be sent to users who connect so they can see who they're talking to. The Unix version
does not have the graphical user interface of the Windows edition, but supports all its
compression and encryption modes.
● More information: http://www.fourmilab.ch/netfone/windows/speak_freely.html


Administrivia, Copyright, Submit Information : Last Revision: 00:11 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Phone/speak.freely.html [10/31/2003 8:51:15 AM]


Internet Phone from VocalTec



● Platforms: IBM Compatible
● Description: Supports real-time conversations with Internet users by compressing speech.
Has a voice-activation feature and an interactive display. Features a graphical interface and on-line
help. Provides an up-to-date listing of all on-line users running Internet Phone. Users can join or create topics for
conversation with people from all over the globe. Supports private topics for private
conversations with family or with business associates.
● Requirements: 486SX PC, 25 MHz, 8MB RAM (recommended)
An Internet Winsock 1.1 compatible TCP/IP connection (minimum connection: a 14,400 baud
modem SLIP/PPP connection)
Windows 3.1
Windows-compatible sound card
● Cost: $US59 + shipping. You can order on the internet: http://www.vocaltec.com/order.html
● More Information: WWW: http://www.vocaltec.com/
● Availability:
Demo version: ftp://ftp.vocaltec.com/pub/iphone09.exe
● Contact: VocalTec Inc.
157 Veterans Drive, Northvale, NJ 07647
Tel: 201-768-9400 Fax: 201-768-8893
E-mail: info@vocaltec.com


Administrivia, Copyright, Submit Information : Last Revision: 00:11 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Phone/vocaltec.html [10/31/2003 8:51:16 AM]


WebPhone

● Platform: Windows
● Description: WebPhone provides telephone-quality, real-time, full-duplex, encrypted,
point-to-point voice communication over the Internet and other TCP/IP-based networks. More detail
is provided on the NetSpeak WWW pages.
● Requirements: 80486DX-33 MHz running Windows 3.1 or higher, 4 MB of RAM, MCI-compliant
sound card, Winsock 1.1 compliant stack, 14.4 Kbps modem, VGA card capable of
displaying 256 colors. A full-duplex sound card is required for full-duplex operation.
● Price: $49.95 (US)
● Availability: via the WWW: http://www.netspeak.com/getphone.html
● Contact: NetSpeak Corporation
902 Clint Moore Rd., Boca Raton, Fl. 33487, USA
Ph: +1-407-997-4001, Fax: +1-407-997-2401
Email: info@netspeak.com
WWW: http://www.netspeak.com/


Administrivia, Copyright, Submit Information : Last Revision: 00:11 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Phone/webphone.html [10/31/2003 8:51:17 AM]


WebTalk

● Platform: Windows 3.1/95
● Description: Full-duplex or half-duplex telephone-quality voice; supports many commercial
web browsers.
● Contact: Quarterdeck Corporation
13160 Mindanao Way, 3rd Floor, Marina Del Rey, CA 90292-9705, USA
Ph: +1-310-309-3700, Fax: +1-310-309-4217
Email: info@quarterdeck.com
WWW: http://www.quarterdeck.com/


Administrivia, Copyright, Submit Information : Last Revision: 00:11 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Phone/webtalk.html [10/31/2003 8:51:19 AM]


AF version AF3R1

● Platforms: DEC workstations (Alpha and MIPS), SparcStation, SGI
● Description: The AF System is a device-independent network-transparent system including
client applications and audio servers. With AF, multiple audio applications can run
simultaneously, sharing access to the actual audio hardware.

The AF3R1 distribution of AF includes server support for Digital RISC systems running
Ultrix, Digital Alpha AXP systems running OSF/1, SGI Indigo running IRIX 4.0.5, Sun
Microsystems SPARCstations running SunOS 4.1.3, and Sun Microsystems SPARCstations
running Solaris 2.3. The servers support audio hardware ranging from the built-in CODEC
audio on SPARCstations and Personal DECstations to 48 kHz stereo audio using the
DECaudio TURBOchannel module or the SPARCstation DBRI interface.
● Availability: The source kit is distributed by anonymous ftp from
ftp://crl.dec.com/pub/DEC/AF
WWW: http://www.research.digital.com/CRL/projects/AF/home.html
● Contact: af-request@crl.dec.com


Administrivia, Copyright, Submit Information : Last Revision: 00:02 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/AudioSoftware/af.html [10/31/2003 8:51:20 AM]


Voice E-Mail from Bonzi Software



● Description: Voice E-Mail is an extension to regular e-mail which allows recorded voice
messages to be transmitted in the same way as normal text messages. Voice E-Mail is available
in several forms: Voice E-Mail 3.0 for WinCIM, Voice E-Mail 3.0 for America Online, Voice
E-Mail 3.0 for Eudora, and Voice E-Mail 3.0 for Netscape. Voice E-Mail uses digital audio
and image compression technology to compress messages before transferring them through
CompuServe, America Online, and the Internet.
● Availability: Go to the Bonzi home page - http://www.bonzi.com/ - and follow the links to the
Internet Shopping Network's "Downloadable Software Division."
● Further Information: Bonzi Software
WWW: http://www.bonzi.com/
Email: info@bonzi.com
Fax 805-238-5798


Administrivia, Copyright, Submit Information : Last Revision: 00:03 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/AudioSoftware/bonzi.html [10/31/2003 8:51:21 AM]


MicNotePad Recording Software for Macs



● Platforms: Macintosh
● Description: MicNotePad is an audio recording tool designed to improve dictation (a digital
replacement for the old-style mechanical tape systems used by typists). It uses the built-in
microphone or sound input port and the hard disk to record conversations or speech of arbitrary
length. Speech compression techniques are used to reduce the disk space required. Once speech is recorded,
single keystrokes control playback while you type in your word processor.
● Contact: Nirvana Research
WWW: http://moof.com/nirvana/
Email: nirvana@got.net


Administrivia, Copyright, Submit Information : Last Revision: 19:04 25-Sep-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/AudioSoftware/micnotepad.html [10/31/2003 8:51:21 AM]


MixViews

● Description: A Unix/X sound editor. It supports waveform playback and recording, and cut/splice editing. It has various
filters, handles native file formats, and provides FFT, LPC and more.
● Availability: by anonymous ftp including SunOS 4 and IRIX 5 binaries.
ftp://foxtrot.ccmrc.ucsb.edu/pub/MixViews


Administrivia, Copyright, Submit Information : Last Revision: 00:06 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/AudioSoftware/mixviews.html [10/31/2003 8:51:22 AM]


Network Audio System Release 1.1



● Platforms: Various (includes SunOS, Solaris, SGI)
● Description: A device-independent mechanism for transferring, playing and recording audio
signals over a network. Has a range of features suited to networks.
● Cost: Free
● Availability: By anonymous ftp from
ftp://ftp.x.org:/contrib/audio/nas/netaudio-1.2.tar.gz
Also available in the same directory are document files and some sample sounds.


Administrivia, Copyright, Submit Information : Last Revision: 00:06 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/AudioSoftware/netaudio.html [10/31/2003 8:51:23 AM]


NIST Software - SPHERE and SCORE

NIST SPeech HEader REsources Package (SPHERE)
● Description: Standard speech header software from the National Institute of Standards &
Technology (NIST). SPHERE headers represent information about sample frequency, sample
format, etc.
● Availability: By anonymous ftp from
Readme File
ftp://jaguar.ncsl.nist.gov/pub/sphere.README
Source Code
ftp://jaguar.ncsl.nist.gov/pub/sphere_2.5.tar.Z
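To give a feel for what a SPHERE header holds, here is a minimal Python sketch of a header reader. It assumes the commonly described layout: a `NIST_1A` line, a line giving the header length in bytes (often 1024), then `name -type value` fields such as `sample_rate -i 16000`, terminated by `end_head`. Check this layout against sphere.README before relying on it; `parse_sphere_header` is a hypothetical helper, not part of the SPHERE package.

```python
from typing import Dict, Union

Value = Union[int, float, str]


def parse_sphere_header(data: bytes) -> Dict[str, Value]:
    """Parse the ASCII header of a NIST SPHERE file (hypothetical helper).

    Assumed layout: "NIST_1A", then the header size in bytes, then
    "name -type value" lines, terminated by "end_head".
    """
    parts = data.split(b"\n", 2)
    if len(parts) < 3 or parts[0].strip() != b"NIST_1A":
        raise ValueError("not a NIST SPHERE header")
    header_size = int(parts[1].decode("ascii").strip())
    fields: Dict[str, Value] = {"header_size": header_size}
    # Only look inside the declared header; sample data follows it.
    for raw in data[:header_size].split(b"\n")[2:]:
        line = raw.decode("ascii").strip()
        if line == "end_head":
            return fields
        if not line:
            continue
        name, type_tag, value = line.split(maxsplit=2)
        if type_tag == "-i":
            fields[name] = int(value)          # integer field
        elif type_tag == "-r":
            fields[name] = float(value)        # real-valued field
        elif type_tag.startswith("-s"):
            length = int(type_tag[2:])         # "-sN": string of N characters
            fields[name] = value[:length]
        else:
            raise ValueError(f"unknown type tag {type_tag!r}")
    raise ValueError("end_head not found within header")
```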

NIST Speech Recognition Scoring Package (SCORE)
● Description: Software from the National Institute of Standards & Technology (NIST) for
scoring the results of speech recognition systems.
● Availability: By anonymous ftp from
README File
ftp://jaguar.ncsl.nist.gov/pub/score.README
Source Code
ftp://jaguar.ncsl.nist.gov/pub/score_3.6.2.tar.Z
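Recognition scoring of this kind rests on aligning the recognizer output with a reference transcription and counting substitutions (S), deletions (D) and insertions (I); the word error rate is (S + D + I) / N, where N is the number of reference words. The Python sketch below uses a unit-cost dynamic-programming alignment. It is a generic illustration, not SCORE's implementation: NIST's tools may use different alignment weights and tie-breaking, so their counts can differ slightly.

```python
from typing import List, Tuple


def align_counts(ref: List[str], hyp: List[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) for a minimum-cost alignment.

    Every error costs 1; ties are broken in favour of fewer substitutions.
    """
    n, m = len(ref), len(hyp)
    # dp[i][j] = (errors, subs, dels, ins) for aligning ref[:i] with hyp[:j]
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # reference words with no hypothesis: deletions
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # hypothesis words with no reference: insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e, s, d, k = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (e, s, d, k)              # correct word
            else:
                diag = (e + 1, s + 1, d, k)      # substitution
            e, s, d, k = dp[i - 1][j]
            up = (e + 1, s, d + 1, k)            # deletion
            e, s, d, k = dp[i][j - 1]
            left = (e + 1, s, d, k + 1)          # insertion
            dp[i][j] = min(diag, up, left)
    _, subs, dels, ins = dp[n][m]
    return subs, dels, ins


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, where N is the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcription is empty")
    subs, dels, ins = align_counts(ref, hyp)
    return (subs + dels + ins) / len(ref)
```

For example, scoring the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" gives one substitution and one deletion, so WER = 2/6.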


Administrivia, Copyright, Submit Information : Last Revision: 00:06 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/AudioSoftware/nist.html [10/31/2003 8:51:23 AM]


Sound Processing Kit



● Platforms: UNIX
● Description: Sound Processing Kit (SPKit) is an object-oriented class library for audio signal
processing. SPKit includes classes for various signal processing tasks and a way of
implementing sound processing algorithms in a simple object-oriented manner. Sound
Processing Kit is implemented in C++ and is designed to be portable. The current version
requires a bare-bones C++ 2.0 compatible compiler (templates and exceptions are not needed).
ANSI C standard libraries are required. SPKit includes classes for
❍ Sound input and output

❍ Basic signal processing

❍ Dynamics processing (compressor, gating etc)

❍ Filtering

❍ Delay and reverberation

❍ Distortion

❍ Signal routing

● Availability:
Full documentation on the WWW:
http://www.music.helsinki.fi/research/spkit/documentation/SPKit.html
Software distribution:
http://www.music.helsinki.fi/research/spkit/distribution/spkit.tar.Z
● Contact: Kai Lassfolk
University of Helsinki Music Research Laboratory
Email: spkit@elisir.helsinki.fi
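SPKit itself is written in C++. To illustrate the object-oriented idea of composing sound processors, the following Python sketch chains small processing objects that share a single `process` method. The class names (`Processor`, `Gain`, `NoiseGate`, `FeedbackDelay`, `Chain`) are hypothetical and do not mirror SPKit's actual API. The delay stage is a feedback comb filter, y[n] = x[n] + g·y[n − D], a common building block for echo and reverberation; the gate is deliberately simplified (no attack or release).

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class Processor(ABC):
    """Hypothetical base class: every stage maps a block of samples to a new block."""

    @abstractmethod
    def process(self, samples: Sequence[float]) -> List[float]:
        ...


class Gain(Processor):
    def __init__(self, factor: float) -> None:
        self.factor = factor

    def process(self, samples: Sequence[float]) -> List[float]:
        return [s * self.factor for s in samples]


class NoiseGate(Processor):
    """Simplest possible gate: zero any sample below the threshold."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def process(self, samples: Sequence[float]) -> List[float]:
        return [s if abs(s) >= self.threshold else 0.0 for s in samples]


class FeedbackDelay(Processor):
    """Feedback comb filter: y[n] = x[n] + feedback * y[n - delay].

    The delay line is not carried over between calls to process().
    """

    def __init__(self, delay: int, feedback: float) -> None:
        if delay < 1 or not abs(feedback) < 1.0:
            raise ValueError("need delay >= 1 and |feedback| < 1 for stability")
        self.delay = delay
        self.feedback = feedback

    def process(self, samples: Sequence[float]) -> List[float]:
        out: List[float] = []
        for n, x in enumerate(samples):
            echo = out[n - self.delay] if n >= self.delay else 0.0
            out.append(x + self.feedback * echo)
        return out


class Chain(Processor):
    """Runs stages in order, feeding each stage's output to the next."""

    def __init__(self, *stages: Processor) -> None:
        self.stages = stages

    def process(self, samples: Sequence[float]) -> List[float]:
        block = list(samples)
        for stage in self.stages:
            block = stage.process(block)
        return block
```

Keeping one shared interface means new effects can be added without changing the code that wires stages together.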


Administrivia, Copyright, Submit Information : Last Revision: 00:06 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/AudioSoftware/spkit.html [10/31/2003 8:51:24 AM]


TCPplay

● Description: TCPPlay lets you use your Mac as an audio server for your Unix machine. Source code
is provided. Written by Bill Stafford, Rich Tsoi and Malcolm Slaney.
● Availability: Anonymous ftp from
ftp://ftp.apple.com/pub/malcolm/TcpPlay.sit.hqx
ftp://worldserver.com/pub/malcolm/TcpPlay.sit.hqx


Administrivia, Copyright, Submit Information : Last Revision: 22:28 27-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/AudioSoftware/tcpplay.html [10/31/2003 8:51:25 AM]


Auditory Modeller 1

● Description: John Holdsworth's implementation of a gammatone filter bank and Roy
Patterson's spiral model, in C (with X-window display).
● Availability: By anonymous ftp from
ftp://ftp.mrc-apu.cam.ac.uk/pub/aim
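A gammatone filter's impulse response is commonly written as g(t) = t^(n−1) e^(−2πbt) cos(2πf_c t), with order n = 4 and bandwidth b = 1.019 × ERB(f_c), where ERB(f) = 24.7(4.37f/1000 + 1) Hz (Glasberg and Moore). The Python sketch below samples that textbook form. It is not taken from Holdsworth's code, whose implementation details may differ, and the function names are hypothetical.

```python
import math
from typing import List


def erb_hz(fc_hz: float) -> float:
    """Equivalent rectangular bandwidth (Glasberg and Moore) in Hz."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def gammatone_ir(fc_hz: float, fs_hz: float, duration_s: float = 0.025,
                 order: int = 4) -> List[float]:
    """Sampled gammatone impulse response, normalised to a peak magnitude of 1.

    g(t) = t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t), b = 1.019 * ERB(fc).
    """
    n_samples = int(round(duration_s * fs_hz))
    if n_samples < 2:
        raise ValueError("duration too short for this sample rate")
    b = 1.019 * erb_hz(fc_hz)
    ir = []
    for k in range(n_samples):
        t = k / fs_hz
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
        ir.append(envelope * math.cos(2.0 * math.pi * fc_hz * t))
    peak = max(abs(v) for v in ir)
    return [v / peak for v in ir]
```

A filter bank is then a set of such filters whose centre frequencies are spaced according to the ERB scale, so that low-frequency channels are narrower than high-frequency ones.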


Administrivia, Copyright, Submit Information : Last Revision: 00:07 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/HumanAudio/auditory.model1.html [10/31/2003 8:51:25 AM]


Auditory Modeller 2

● Description: Lowel O'Mard's implementation of peripheral filtering, Ray Meddis's hair cell
model and other auditory routines, in C (as a library of routines).
● Availability: By anonymous ftp from
ftp://suna.lut.ac.uk/public/hulpo/lutear


Administrivia, Copyright, Submit Information : Last Revision: 00:07 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/HumanAudio/auditory.model2.html [10/31/2003 8:51:26 AM]


Auditory Toolbox for Matlab



● Description: This toolbox provides extensions to Matlab which are useful to people interested
in auditory/cochlear modeling. [Matlab is described in the previous section.] This toolbox has
been tested on both Macintosh and Unix computers. It includes the following major models:
❍ Lyon's Passive Long Wave Cochlear Model (our conventional model)

❍ Patterson-Holdsworth ERB Filter bank with Meddis Hair cell

❍ Seneff's Auditory Model (Stages I and II)

❍ MFCC (Mel-scale frequency cepstral coefficients from the ASR world)

❍ Spectrogram

❍ Correlogram generation and pitch modeling

❍ Simple vowel synthesis

● Availability: From Malcolm Slaney's home page and by anonymous FTP:
ftp://ftp.apple.com/pub/malcolm
The following files are available:
❍ AuditoryToolbox.mif.Z

❍ AuditoryToolbox.psc.Z

❍ AuditoryToolbox.sea.hqx

❍ AuditoryToolbox.tar

❍ AuditoryToolbox.tar.Z

The ".mif.Z" file is a Unix compressed version of the FrameMaker documentation. The
".psc.Z" file is a Unix compressed version of the Postscript documentation. The ".tar" and
".tar.Z" files are Unix TAR archives containing all of the m-functions and C-MEX source
code. Finally, the ".sea.hqx" file is a Macintosh self-extracting archive that has been encoded
using BinHex. Precompiled versions of the three MEX functions are included for the Macintosh.
● Misc: Our lawyers ask us to remind you that there is no warranty. We've done some testing,
but we undoubtedly missed things.
● Contact: Malcolm Slaney, Interval Research.
Email: malcolm@interval.com
WWW: http://www.interval.com/~malcolm/
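For readers new to MFCCs: the mel scale used in many MFCC front ends is often approximated as m = 2595 log10(1 + f/700). The Python sketch below places triangular filter edges evenly on that scale and turns log filter-bank energies into cepstral coefficients with a DCT-II. It assumes this common formula; the Toolbox's own MFCC routine may use a different filter spacing, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def hz_to_mel(f_hz: float) -> float:
    """Common mel-scale approximation: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_filters: int, f_low: float, f_high: float) -> List[float]:
    """Return n_filters + 2 frequencies (Hz) evenly spaced on the mel scale.

    Triangular filter k rises from edges[k], peaks at edges[k + 1] and
    falls to zero at edges[k + 2].
    """
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]


def cepstrum_from_log_energies(log_e: Sequence[float], n_coeffs: int) -> List[float]:
    """DCT-II of log filter-bank energies, giving the first n_coeffs cepstra."""
    k_total = len(log_e)
    return [
        sum(log_e[k] * math.cos(math.pi * n * (k + 0.5) / k_total) for k in range(k_total))
        for n in range(n_coeffs)
    ]
```

Evenly spaced mel points become increasingly far apart in Hz, so the filters widen towards high frequencies, loosely mimicking the ear's frequency resolution.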


Administrivia, Copyright, Submit Information : Last Revision: 22:37 27-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/HumanAudio/auditory.tlbx.html [10/31/2003 8:51:27 AM]


Human Audio Perception Document



● Description: Document prepared by Argiris Kranidiotis on the human audio perception
system. It lists a number of references, gives plenty of numbers and some equations.
● Availability: by anonymous ftp from the comp.speech archive site
ftp://svr-ftp.eng.cam.ac.uk/comp.speech/info/HumanAudioPerception
● Contact: Argiris A. Kranidiotis
University Of Athens, Informatics Department
email: akra@zeus.di.uoa.ariadne-t.gr


Administrivia, Copyright, Submit Information : Last Revision: 00:07 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/HumanAudio/human.audio.html [10/31/2003 8:51:27 AM]


BEEP dictionary

● Description: Phonemic transcriptions of over 250,000 English words (British English
pronunciations).
● Availability: By anonymous ftp:
BEEP dictionary README file
svr-ftp.eng.cam.ac.uk/comp.speech/dictionaries/beep-0.7.README
BEEP Dictionary (1.1M)
svr-ftp.eng.cam.ac.uk/comp.speech/dictionaries/beep.tar.gz


Administrivia, Copyright, Submit Information : Last Revision: 11:23 12-Aug-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Lexical/beep.html [10/31/2003 8:51:28 AM]


CMU dictionary

● Description: Phonemic transcriptions of 100,000 words with American English pronunciation.
● Availability - WWW: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
● Availability - ftp: By anonymous ftp from the directory
ftp://ftp.cs.cmu.edu/project/fgdata/dict/
with the files README, cmudict.0.2.Z, cmulex.0.1.Z, phoneset.0.1
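The dictionary files are plain text. The Python sketch below assumes the layout used by later cmudict releases: one entry per line, a word followed by its phones (vowels carrying stress digits 0, 1 or 2), alternate pronunciations written as `WORD(1)`, `WORD(2)`, and comment lines starting with `;;;`. The early 0.2 release may differ, so check its README; the function names are hypothetical.

```python
import re
from typing import Dict, Iterable, List

# Alternate pronunciations are assumed to be written WORD(1), WORD(2), ...
_VARIANT = re.compile(r"\(\d+\)$")


def parse_cmudict(lines: Iterable[str]) -> Dict[str, List[List[str]]]:
    """Map each word to a list of pronunciations, each a list of phone symbols."""
    prons: Dict[str, List[List[str]]] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comments
        word, *phones = line.split()
        base = _VARIANT.sub("", word)
        prons.setdefault(base, []).append(phones)
    return prons


def strip_stress(phones: List[str]) -> List[str]:
    """Drop the stress digit from vowels, e.g. EH1 becomes EH."""
    return [p.rstrip("012") for p in phones]
```

Grouping alternates under one key makes it easy to look up every listed pronunciation of a word at once.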


Administrivia, Copyright, Submit Information : Last Revision: 00:09 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Lexical/cmu-dict.html [10/31/2003 8:51:29 AM]


CUVOLAD dictionary (Oxford Dictionary)



● Description: Computer Usable Version of the Oxford Advanced Learner's Dictionary
containing 70,000+ entries. Has British English pronunciations and parts of speech.
● Availability: Anonymous ftp ftp://ota.ox.ac.uk/pub/ota/public/dicts/710/
Documentation: ftp://ota.ox.ac.uk/pub/ota/public/dicts/710/text710.doc


Administrivia, Copyright, Submit Information : Last Revision: 01:56 16-Apr-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Lexical/cuvolad-dict.html [10/31/2003 8:51:29 AM]


Comprehensive Word List



● Description: A comprehensive word list which should contain most common American words,
abbreviations, hyphenations, and even incorrect spellings. The word lists were compiled from a
number of sources: commercial news services, UseNet news postings, existing dictionaries,
name lists, company lists, UNIX man pages, project Gutenberg's E-texts, project Wordnet,
received mailings, etc. The current size is 460,000 words.
● Availability: anonymous ftp ftp://wocket.vantage.gte.com/pub/standard_dictionary
Note 1: There seems to be some sort of network problem reaching the server.
Note 2: There is a README file which explains the file formats.


Administrivia, Copyright, Submit Information : Last Revision: 01:49 16-Apr-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Lexical/dict.html [10/31/2003 8:51:30 AM]



EAT: Edinburgh Associative Thesaurus


● Description: A set of word association norms giving the counts of word associations
collected from subjects.
● Availability: Source and WWW interactive versions
Interactive version
Provided by Computing and Information Systems Department (CISD) of Rutherford
Appleton Laboratory, UK
http://www.cis.rl.ac.uk/proj/psych/eat.html
Set of word association norms
ftp directory. 6 MB
http://www.cis.rl.ac.uk/proj/psych/eat/eat/


Administrivia, Copyright, Submit Information : Last Revision: 02:05 16-Apr-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Lexical/eat.html [10/31/2003 8:51:31 AM]


Homophone List

● A list of homophones in General American English is available by anonymous FTP from the
comp.speech archive site:
ftp://svr-ftp.eng.cam.ac.uk/comp.speech/dictionaries/homophones-1.01.txt


Administrivia, Copyright, Submit Information : Last Revision: 00:09 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Lexical/homophone.html [10/31/2003 8:51:32 AM]


Moby Lexical Resources



● Description: A set of lexical resources compiled by Grady Ward.
3449 Martha Ct., Arcata, CA 95521-4884, USA
Email: grady@netcom.com OR grady@northcoast.com
● Availability: Mirrored by Malcolm Crawford (m.crawford@dcs.shef.ac.uk) at the Institute for
Language Speech and Hearing, the University of Sheffield.
WWW: http://www.dcs.shef.ac.uk/research/ilash/Moby/
FTP: ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/
● Contents:
Moby Hyphenator: mhyph.tar.Z
185,000 entries fully hyphenated. 980kB.
Moby Language: mlang.tar.Z
Word lists in five major languages. 2.3MB.
Moby Part-of-Speech: mpos.tar.Z
230,000 entries with part(s) of speech listed in priority order. 1.2MB.
Moby Pronunciator: mpron.tar.Z
175,000 entries fully International Phonetic Alphabet coded. 3.1MB.
Moby Shakespeare: mshak.tar.Z
The complete unabridged works of Shakespeare. 2.3MB.
Moby Thesaurus: mthes.tar.Z
30,000 root words, 2.5 million synonyms and related words. 12MB.
Moby Words: mwords.tar.Z
610,000+ words and phrases. 4.0MB.


Administrivia, Copyright, Submit Information : Last Revision: 10:18 07-Aug-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Lexical/moby.html [10/31/2003 8:51:32 AM]


MRC Psycholinguistic Database



● Description: A machine-usable dictionary containing over 150,000 words with up to 26
linguistic and psycholinguistic attributes for each (e.g. pronunciation, part of speech, word
frequency). The MRC Psycholinguistic Database was the basis for the "Oxford Psycholinguistic
Database" available for Apple Macs from Oxford University Press.
● Availability: Several versions with different formats:
Interactive Version of MRC Psycholinguistic Database
Produces lists of words meeting user-definable selection criteria. Provided by the Dept.
of Psychology, University of Western Australia.
http://www.psy.uwa.edu.au/uwa_mrc.htm
ftp'able MRC Psycholinguistic Database
Approximately 12M of data.
ftp://ota.ox.ac.uk/pub/ota/public/dicts/1054/
README: ftp://ota.ox.ac.uk/pub/ota/public/dicts/1054/readme.
Information: ftp://ota.ox.ac.uk/pub/ota/public/dicts/info


Administrivia, Copyright, Submit Information : Last Revision: 02:49 16-Apr-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Lexical/mrc.html [10/31/2003 8:51:33 AM]


WordNet

● Description: WordNet is an on-line lexical reference system in which English nouns, verbs,
adjectives and adverbs are organized into synonym sets, each representing one underlying
lexical concept. Different relations link the synonym sets.
WordNet was developed in the Cognitive Science Laboratory at Princeton University under the
direction of Professor George Miller.
● Availability:
WWW Interface
http://www.cogsci.princeton.edu/~wn/w3wn.html
Source Distributions
Unix (9.1MB), PC (5.8MB), Macintosh (7.5MB), Prolog (database only, 4.2MB).
ftp://clarity.princeton.edu/pub/wordnet/
Extended interfaces developed by WordNet users (for X, Lisp etc) are listed in the WordNet
home page.
● Further information: Email: wordnet@princeton.edu
WWW: WordNet home page: http://www.cogsci.princeton.edu/~wn/
README: ftp://clarity.princeton.edu/pub/wordnet/README
Publications: ftp://clarity.princeton.edu/pub/wordnet/5papers.ps


Administrivia, Copyright, Submit Information : Last Revision: 02:19 16-Apr-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Lexical/wordnet.html [10/31/2003 8:51:34 AM]


Dictionaries on the WWW



For a while, there was a range of dictionaries and other lexical resources on the WWW and elsewhere
on the Internet. However, for copyright reasons, fewer sites are now publishing dictionary information.
When last checked, the following sites provide dictionaries or links to dictionaries on the net:

CMU Dictionary
http://www.speech.cs.cmu.edu/cgi-bin/cmudict
Institute of Phonetic Sciences, Amsterdam
Electronic dictionaries, including French, Norwegian, Swahili and English.
http://fonsg3.let.uva.nl/Other_pages.html
1913 Webster's Revised Unabridged Dictionary
Available as a searchable HTML form at the University of Chicago ARTFL project site, and as
a tagged working file and downloadable version (45MB) of the HTML at Project Gutenberg.
Martin Ramsch's Englisch-Wörterbücher aller Art (English dictionaries of all kinds)
Lists of on-line dictionaries, translation dictionaries, technical dictionaries, etc.
http://www.uni-passau.de/forwiss/mitarbeiter/freie/ramsch/englisch.html
Galaxy's list of dictionaries etc.
A comprehensive list of dictionaries, acronym lists, translation resources, and a Thesaurus.
http://galaxy.einet.net/galaxy/Reference-and-Interdisciplinary-Information/Dictionaries-etc.html
Webster's dictionary online
http://c.gp.cs.cmu.edu:5103/prog/webster


Administrivia, Copyright, Submit Information : Last Revision: 09:56 11-Mar-1997

http://mi.eng.cam.ac.uk/comp.speech/Section1/Lexical/www-dict.html [10/31/2003 8:51:34 AM]


International Phonetic Alphabet



● Description: The International Phonetic Association (http://www.arts.gla.ac.uk/IPA/ipa.html)
defines the International Phonetic Alphabet. It is a standard set of symbols for transcribing the
sounds of spoken languages. The full chart of IPA symbols is published on the International
Phonetic Association WWW site. Also provided are charts for consonants, vowels, tones and
accents, suprasegmentals, diacritics and other symbols. A cassette of sounds is available: see
http://www.phon.ucl.ac.uk/home/wells/cassette.htm


Administrivia, Copyright, Submit Information : Last Revision: 13:53 08-Aug-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Fonts/ipa.html [10/31/2003 8:51:35 AM]


WWW: Phonetic Fonts and Examples Online



George L. Dillon's list of phonetic resources
[http://weber.u.washington.edu/~dillon/PhonResources.html]
Vowel sounds of American English
Examples of standard American vowels along with the IPA phonetic symbols and links
to recordings.
http://weber.u.washington.edu/~dillon/vowels.html
Consonant sounds of English
Examples of consonants along with the IPA phonetic symbols and links to recordings.
http://weber.u.washington.edu/~dillon/consonants.html
Vowel Quadrilaterals for American and British English
Charts and audio.
http://weber.u.washington.edu/~dillon/newstart.html
IPA-ASCII
A scheme for representing IPA transcriptions in ASCII for use in Usenet articles and
email.
http://weber.u.washington.edu/~dillon/ipaascii.html
Some things about studying Speech
Information on speech physiology, acoustic phonetics, speech perception, speech recognition
and voice recognition.
http://www.ccp.uchicago.edu/grad/Francis_Alex/speech.html


Administrivia, Copyright, Submit Information : Last Revision: 03:39 11-Apr-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Fonts/online.html [10/31/2003 8:51:36 AM]


Summer Institute of Linguistics IPA Fonts



● Platform: Apple Macintosh and Microsoft Windows
● Description: International Phonetic Alphabet (IPA) fonts are available as freeware from the
Summer Institute of Linguistics (SIL). The SIL Encore IPA Fonts are a set of scalable IPA
fonts containing the full International Phonetic Alphabet with 1990 Kiel revisions. Three
typefaces are included: SIL Doulos (similar to Times), SIL Sophia (similar to Helvetica), and
SIL Manuscript (monowidth). Each font contains all the standard IPA discrete characters and
non-spacing diacritics as well as some suprasegmental and punctuation marks. Each font
comes in both PostScript Type 1 and TrueType formats.
● Availability: Via the WWW and Gopher:
❍ WWW: http://www.sil.org/

❍ Gopher: gopher://gopher.sil.org/11/gopher_root/computing/software/fonts/

❍ Ftp for Windows: ftp://ftp.sil.org/fonts/win/silip12a.exe

❍ Ftp for Mac: ftp://ftp.sil.org/fonts/mac/silipa12.sea_hqx

Also available through the SIL email server. Send either of the following commands to
MAILSERV@sil.org.
Windows:
SEND/MODE=BLOCK/ENCODING=UUENCODE
[FTP.FONTS.WIN]SILIP12A.EXE
Mac:
SEND [FTP.FONTS.MAC]SILIPA12.SEA_HQX
Finally, they are available on diskette from the address below for US$5 to cover the cost of
shipping.
● Contact: International Academic Bookstore
Summer Institute of Linguistics
7500 W. Camp Wisdom Road, Dallas, TX 75236 U.S.A.
Ph: 214-709-2404, Fax: 214-709-2433
e-mail: academic.books@sil.org
WWW: http://www.sil.org/


Administrivia, Copyright, Submit Information : Last Revision: 13:22 08-Aug-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Fonts/sil.ipa.html [10/31/2003 8:51:38 AM]


Phonetic Fonts for TeX and LaTeX

Linguistics/TeX mailing list
ling-tex@ifi.uio.no
Subscription method unknown.
TIPA
Created by Rei Fukui: fkr@tooyoo1.l.u-tokyo.ac.jp.
Source: ftp://tooyoo.L.u-tokyo.ac.jp/pub/TeX/tipa/
Postscript manual: ftp://tooyoo.L.u-tokyo.ac.jp/pub/TeX/tipa/tipaman.ps
Compressed postscript manual: ftp://tooyoo.L.u-tokyo.ac.jp/pub/TeX/tipa/tipaman.ps
WSUIPA: Washington State University International Phonetic Alphabet fonts
A basic WSUIPA font contains 128 phonetic characters and/or diacritics in five different point
sizes (8, 9, 10, 11 and 12) and in three typefaces (roman, slanted and bold extended). Each size
and typeface includes a TFM (TeX Font Metric) file and its related GF, PK or PXL file. A
macro package and manual are provided. Reportedly compatible with LaTeX 2.09, but not with
LaTeX 2e.
Available from ftp://ftp.wustl.edu/packages/TeX/fonts/wsuipa/
or from the CTAN ftp archives, e.g. ftp://ftp.digital.com/pub/text/TeX/fonts/wsuipa/

Administrivia, Copyright, Submit Information : Last Revision: 13:23 08-Aug-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Fonts/tex.html [10/31/2003 8:51:39 AM]


Yamada Language Center

● Platform: Apple Macintosh and Microsoft Windows
● Description: The Yamada Language Center maintains an archive of fonts to assist users who
wish to display or type non-English languages on their computers. Their WWW and ftp sites
include five International Phonetic Alphabet (or near-IPA) fonts. They also have fonts for over
40 languages (American Sign Language, Arabic, Armenian, Bengali, Burmese, Celtic,
Cherokee, ...).
● Availability:
WWW Font List
http://babel.uoregon.edu/yamada/fonts.html
Windows Fonts
http://babel.uoregon.edu/yamada/winfonts.html
IPA Fonts
http://babel.uoregon.edu/yamada/fonts/phonetic.html
ftp site
ftp://yftp@www-vms.uoregon.edu/fonts/
● Contact: Yamada Language Center, University of Oregon

Administrivia, Copyright, Submit Information : Last Revision: 00:07 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Fonts/yamada.html [10/31/2003 8:51:40 AM]


The vOICe

● Description: Peter Meijer's Java applet/application for sound analysis and synthesis.
❍ Platform: All (where Java VM available)

❍ Interactive spectrographic synthesis: draw your own sound

❍ Image sonification

❍ Mathematical function sonification

❍ Spectrographic sound analysis (Fourier, spectrogram)

❍ Vision substitution research

● Contact: Peter Meijer
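
The image-sonification idea can be illustrated with a short sketch. The vOICe is usually described as scanning an image column by column from left to right, with vertical position mapped to pitch and brightness mapped to loudness. The function below follows that general scheme, but its name, frequency range, column duration and exponential frequency spacing are assumptions chosen for illustration; they are not taken from Peter Meijer's software.

```python
import math


def sonify(image, sample_rate=16000, col_duration=0.05, f_min=500.0, f_max=5000.0):
    """Convert a grayscale image into a list of audio samples in [-1, 1].

    image: list of rows, each a list of brightness values in [0, 1];
    row 0 is the top of the image. All parameters are illustrative assumptions,
    and f_max should stay below sample_rate / 2 to avoid aliasing.
    """
    n_rows, n_cols = len(image), len(image[0])
    # Top row gets the highest pitch; frequencies are spaced exponentially
    # (equal ratios) from f_max down to f_min.
    if n_rows == 1:
        freqs = [f_max]
    else:
        freqs = [f_max * (f_min / f_max) ** (r / (n_rows - 1)) for r in range(n_rows)]
    samples_per_col = int(round(sample_rate * col_duration))
    out = []
    for c in range(n_cols):  # time axis: scan columns left to right
        for k in range(samples_per_col):
            # One running time base, so each row's tone has no phase jumps between columns.
            t = (c * samples_per_col + k) / sample_rate
            # Each row contributes a sine tone whose amplitude is the pixel brightness.
            s = sum(image[r][c] * math.sin(2 * math.pi * freqs[r] * t) for r in range(n_rows))
            out.append(s / n_rows)  # scale so the mix stays within [-1, 1]
    return out
```

For example, `sonify([[0.0, 1.0], [0.0, 0.0]])` returns 1,600 samples: 800 samples of silence for the dark first column, followed by 800 samples of a 5000 Hz tone (at half amplitude) for the lit top-right pixel.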

Administrivia, Copyright, Submit Information : Last Revision: REVISION-DATE

http://mi.eng.cam.ac.uk/comp.speech/Section1/Misc/javoice.html [10/31/2003 8:51:41 AM]


The Learning Company's Language Training

● Platform: Windows and Macintosh
● Description: Foreign-language training software for Spanish, French, German, Italian,
Japanese, and English. In the Windows version for English, speech-recognition technology is
used to help users improve their accents.
● Contact: The Learning Company
Ph: (800) 852-2255
Email: webmaster@learningco.com
WWW: http://www.learningco.Inter.net/foreign.html

Administrivia, Copyright, Submit Information : Last Revision: 20:23 18-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Misc/learningco.html [10/31/2003 8:51:42 AM]


Wildfire - an Electronic Assistant

● Platform: ?
● Description: Wildfire is a phone-based electronic assistant. Functions include:
❍ Screens, routes, and announces incoming calls.

❍ Contact list with voice dialing.

❍ Schedules and reminders for follow-up calls and action items.

❍ Messaging and advanced voicemail features.

● Contact: Wildfire Communications, Inc.
20 Maguire Road, Lexington, MA 02173 USA
Ph: +1-617-674-1500, Fax: 617-674-1501
Demo line: 1-800-WILDFIRE
Email: info@wildfire.com
WWW: http://www.wildfire.com/

Administrivia, Copyright, Submit Information : Last Revision: 20:41 18-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section1/Misc/wildfire.html [10/31/2003 8:51:42 AM]


Q4.1: NLP References and Books

Take a look at the FAQ for the "comp.ai" newsgroup as it also includes some useful references.

● James Allen: Natural Language Understanding (Benjamin/Cummings Series in Computer
Science). Menlo Park: Benjamin/Cummings Publishing Company, 1987.
❍ This book consists of four parts: syntactic processing, semantic interpretation, context
and world knowledge, and response generation.
● G. Gazdar and C. Mellish, Natural Language Processing in Prolog, Addison Wesley, 1989
● G. Gazdar and C. Mellish, Natural Language Processing in Lisp, Addison Wesley, 1989
● G. Gazdar and C. Mellish, Natural Language Processing in Pop11, Addison Wesley, 1989
❍ Emphasis on parsing, especially unification-based parsing, with lots of detail on the lexicon,
feature propagation, etc. Fair coverage of semantic interpretation, inference in natural
language processing, and pragmatics; much less extensive than in Allen's book, but
more formal. There are three versions, one for each programming language listed
above, with complete code.
● Shapiro, Stuart C.: Encyclopedia of Artificial Intelligence Vol.1 and 2. New York: John Wiley
& Sons, 1990.
❍ There are articles on the different areas of natural language processing which also give
additional references.
● Paris, Cécile L.; Swartout, William R.; Mann, William C.: Natural Language Generation in
Artificial Intelligence and Computational Linguistics. Boston: Kluwer Academic Publishers,
1991.
❍ The book describes what were then the most current research developments in natural
language generation and discusses all aspects of the generation process. It comprises
three sections: one on text planning, one on lexical choice, and one on grammar.
● Readings in Natural Language Processing, ed. by B. Grosz, K. Sparck Jones and B. Webber,
Morgan Kaufmann, 1986
❍ A collection of classic papers on Natural Language Processing. Fairly complete at the
time the book came out (1986) but now seriously out of date. Still useful for ATNs
(augmented transition networks), etc.
● Klaus K. Obermeier, Natural Language Processing Technologies in Artificial Intelligence: The
Science and Industry Perspective, Ellis Horwood Ltd, John Wiley & Sons, Chichester,
England, 1989.

The following are extensive bibliographies related to NLP:

● Computational Parsing : Syntactic Analysis, Semantic Analysis, Semantic Interpretation,
Parsing Algorithms, Parsing Strategies : BIBLIOGRAPHY, by Conrad F. Sabourin 1994, 2
volumes, 1029p, ISBN 2-921173-02-6, INFOLINGUA inc., P.O. Box 187 Snowdon, Montreal,
H3X 3T4, Canada.
● Computational Text Understanding : Natural Language Programming, Argument Analysis :
BIBLIOGRAPHY, by Conrad F. Sabourin 1994, 657p, ISBN 2-921173-06-9, INFOLINGUA
inc., P.O. Box 187 Snowdon, Montreal, H3X 3T4, Canada.
See also: http://gomer.mlink.net/infolingua.html
● Computational Text Generation : Generation from data or Linguistic Structure, Text Planning,
Sentence Generation, Explanation Generation : BIBLIOGRAPHY, by Conrad F. Sabourin with
a survey article by Mark T. Maybury 1994, 649p, ISBN 2-921173-07-7, INFOLINGUA inc.,
P.O. Box 187 Snowdon, Montreal, H3X 3T4, Canada.
See also: http://gomer.mlink.net/infolingua.html
● Natural Language Processing : Interfaces to Databases, to Expert Systems, to Robots, to
Operating Systems, and to Question-Answering Systems : BIBLIOGRAPHY, by Conrad F.
Sabourin, 1994, 2 volumes, 847p, ISBN 2-921173-08-5 INFOLINGUA inc., P.O. Box 187
Snowdon, Montreal, H3X 3T4, Canada
See also: http://gomer.mlink.net/infolingua.html

Journals
The major journals of the field are

● Computational Linguistics and Cognitive Science for the artificial intelligence aspects,
● Cognition for the psychological aspects,
● Language, Linguistics and Philosophy, and Linguistic Inquiry for the linguistic aspects.
● Artificial Intelligence occasionally has papers on natural language processing.

Conferences
The major NLP conferences are

● ACL: held annually
● COLING: held biennially (every two years)

Most AI conferences have an NLP track; AAAI, ECAI, IJCAI and the Cognitive Science Society
conferences are usually of interest for NLP. CUNY is an important psycholinguistic conference. Other
conferences include NELS, the conference of the Chicago Linguistic Society (CLS), WCCFL, LSA,
the Amsterdam Colloquium, and SALT.

Administrivia, Copyright, Submit Information : Last Revision: 00:18 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section4/Q4.1.html (2 of 2) [10/31/2003 8:51:44 AM]


Q4.2: NLP Software

Natural Language Software Registry (NLSR) - NLP Tools
● The Natural Language Software Registry is available from the German Research Institute for
Artificial Intelligence (DFKI) in Saarbrucken. Its purpose is to facilitate the exchange and
evaluation of natural language processing software within the research community. To this
end, the NLSR is cataloging natural language software projects, both commercial and non-
commercial. The new, updated and enlarged version contains more than 100 descriptions of
natural language processing software. Registry listings include:
❍ speech signal processors, such as the Computerized Speech Lab (Kay Elemetrics)

❍ morphological analyzers, such as PC-KIMMO (Summer Institute of Linguistics)

❍ parsers, such as Alveytools (University of Edinburgh)

❍ semantic and pragmatic analyzers, such as NLL (University of the Saarland, Germany)

❍ generation programs, such as FUF (Ben Gurion University of the Negev)

❍ knowledge representation systems, such as Rhet (University of Rochester)

❍ multicomponent systems, such as ELU (ISSCO), PENMAN (ISI), Pundit (UNISYS) and
SNePS (SUNY Buffalo)

❍ NLP-Tools, such as GULP (University of Georgia) or Linguist (Kansai Research Laboratory)

❍ application programs (misc.)

● If you have developed a piece of software for natural language processing that other
researchers might find useful, you can include it by returning the questionnaire available from
the sources below.
● ftp://ftp.dfki.uni-sb.de/pub/registry
● e-mail: registry@dfki.uni-sb.de
● Natural Language Software Registry
Deutsches Forschungsinstitut fuer Kuenstliche Intelligenz (DFKI)
Stuhlsatzenhausweg 3
D-66123 Saarbruecken
Germany
● Other ftp sites are
ftp://crlftp.nmsu.edu/pub/non-lexical/NL_Software_Registy
ftp://dri.cornell.edu/pub/Natural_Language_Software_Registry

Part of Speech Tagger


● Description: A rule-based part of speech tagger developed by Eric Brill.
● Availability: The tagger software, about 10 descriptive papers and related data are available by
anonymous ftp from
ftp://ftp.cs.jhu.edu/pub/brill/
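
Brill's tagger is transformation-based: it first gives each word its most likely tag, then applies an ordered list of contextual rewrite rules learned from a tagged corpus. The toy sketch below illustrates only those two stages. The lexicon, the single rule (modelled on the well-known "change NN to VB when the previous tag is TO" example) and all function names are hypothetical; this is not Brill's code, rule file format or exact rule-application order.

```python
# Hypothetical toy lexicon: the most likely tag for each known word.
# Tags follow Penn Treebank conventions; the entries are illustrative only.
LEXICON = {"i": "PRP", "want": "VBP", "to": "TO", "race": "NN", "the": "DT", "car": "NN"}

# Hypothetical contextual rules: (from_tag, to_tag, trigger, trigger_tag).
# In a transformation-based tagger such rules are learned and applied in order.
RULES = [
    ("NN", "VB", "prev_tag", "TO"),  # e.g. "to race": noun -> base-form verb after TO
]


def initial_tags(words, default="NN"):
    """Stage 1: give every word its most likely tag; unknown words get a default."""
    return [LEXICON.get(w.lower(), default) for w in words]


def apply_rules(tags, rules):
    """Stage 2: apply each transformation rule in turn over the whole sentence."""
    tags = list(tags)
    for from_tag, to_tag, trigger, trigger_tag in rules:
        before = list(tags)  # this sketch tests triggers against the tags before the pass
        for i, tag in enumerate(before):
            if tag != from_tag:
                continue
            if trigger == "prev_tag" and i > 0 and before[i - 1] == trigger_tag:
                tags[i] = to_tag
            elif trigger == "next_tag" and i + 1 < len(before) and before[i + 1] == trigger_tag:
                tags[i] = to_tag
    return tags


def tag(sentence):
    """Tag a whitespace-separated sentence, returning (word, tag) pairs."""
    words = sentence.split()
    return list(zip(words, apply_rules(initial_tags(words), RULES)))
```

With these toy tables, `tag("I want to race the car")` first tags "race" as NN and the rule then retags it as VB because it follows "to"; "car" stays NN because it follows a determiner.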

Administrivia, Copyright, Submit Information : Last Revision: 01:36 10-Apr-1996

http://mi.eng.cam.ac.uk/comp.speech/Section4/Q4.2.html (2 of 2) [10/31/2003 8:51:45 AM]


comp.speech: Odds 'n Ends

Odds 'n Ends


This page provides the WWW links that didn't fit anywhere else in the comp.speech FAQ, that
haven't yet been given a proper entry, or that are otherwise miscellaneous.

Speech Applications Project


The Speech Applications Project at Sun Microsystems Laboratories (Chelmsford,
Massachusetts) focuses on tools for building speech applications, management of spoken
language discourse, prototype application development, and the definition of effective speech
user interfaces. The project team has built an experimental framework for speech applications,
called SpeechActs.
Signal Processing and Interpretation Lab at Boston University
"It's not just what you say, but also how you say it." The lab researches general problems of
signal recognition, compression and generation, with a current focus on speech and language
processing applications. The general goal is to improve computer understanding
and generation of speech/language for a variety of applications from speech transcription to
human-computer interaction by voice.
Fourth International Conference on Spoken Language Processing
ICSLP'96 will be held in the Wyndham Franklin Plaza in Philadelphia from October 3-6, 1996.
It will bring together engineers, linguists, psychologists, clinicians, manufacturers, and anyone
with an interest in research and development of spoken language processing by both humans
and machines.
SpeechTEK '97 Conference and Exhibition
The SpeechTEK '97 Conference and Exhibition provides a marketplace for speech technology
buyers and sellers.
The New York Hilton and Towers
1335 Avenue of the Americas, New York City
Conference: Sept 30 - Oct 1, 1997
Institute for Signal and Information Processing (ISIP)
Multidisciplinary program to develop next generation information processing techniques
drawing upon a wide range of research experience in areas such as signal processing,
communications, natural language, database query, intelligent systems, and discrete controls.
Verbmobil
Verbmobil is a long-term project which aims "to give Germany an international top position in
language technology and its economical application in the next millenium by cooperation and
concentration of as many as possible specialists from industry and science. The long-sighted
aim is the development of a mobile translation system for the translation of spontaneous
speech in face-to-face situations."
Consortium for Lexical Research
An Archive of Sharable Natural Language Resources including machine-readable dictionaries
and other lexical resources.
Computation and Language E-Print Archive
A fully automated electronic archive and distribution server for papers on computational
linguistics, natural-language processing, speech processing, and related fields.

Head-Driven Phrase Structure Grammar


HPSG home page at the Ohio State University offers current information relating to various
aspects of the grammar formalism and linguistic theory of HPSG.
Real-time Visual Displays for Professional Voice Development
Department of Electronics, York University, England. Papers, reports and theses on the
analysis and tuition of speech and singing, for developing voice users, including methods such
as speech visualization. Includes source code and a binary executable for a real time wide band
spectrogram on SGIs.
VMB/60: Voice Message Bank
A solid state record-and-playback device that attaches to the modular handset connector of a
telephone. It serves as a one-minute "voice processor".
From A&B Design
140 San Lazaro Ave., Sunnyvale, CA 94086, USA
Ph: +1-408-749-8037, Fax: 408-749-8038
E-Mail: vmb60@best.com
WWW: http://www.best.com:80/~vmb60/

Administrivia, Copyright, Submit Information : Last Revision: 18:24 05-Sep-1997

http://mi.eng.cam.ac.uk/comp.speech/odds.n.ends.html (2 of 2) [10/31/2003 8:51:46 AM]


ICSS system from IBM

● Description: A large-vocabulary, speaker-independent, continuous speech recognition system
which runs under Windows, OS/2, and AIX.
● Requirements: Sound card (e.g. Sound Blaster)
● Price: $US319
● Contact:
A&G Graphics Interface
ICSS Reseller
51 Gore Street, Cambridge, MA, 02139, USA
(617) 492-0120

Last Revision: 00:26 19-Mar-1996

http://mi.eng.cam.ac.uk/comp.speech/Section6/Recognition/icss.html [10/31/2003 8:51:50 AM]


comp.speech WWW Site

Comp.Speech Frequently Asked Questions
The Frequently Asked Questions (FAQ) is a regular posting to comp.speech which attempts to answer
some of the regular questions in the comp.speech newsgroup. It covers speech synthesis, speech
recognition, speech coding and a range of related material. It contains lists of speech technology
software and hardware, including commercial products, public domain and freeware software, plus it
contains over 500 links to speech technology sites and software.

The FAQ is not meant to discuss any topic exhaustively. It will hopefully provide readers with
pointers on where to find useful information, especially material available on the Internet.

If you have not already read the Usenet introductory material posted to news.announce.newusers,
please do. For help with FTP (file transfer protocol), look for the regular anonymous FTP FAQ
posting in comp.misc, comp.archives.admin or news.answers.

This FAQ is posted every 4 weeks to comp.speech, comp.answers and news.answers.

It is also available on the World Wide Web:

● Australia: http://www.speech.su.oz.au/comp.speech/
● Britain: http://svr-www.eng.cam.ac.uk/comp.speech/
● Japan: http://www.itl.atr.co.jp/comp.speech/
● USA: http://www.speech.cs.cmu.edu/comp.speech/

Or by anonymous ftp from the comp.speech archive site:

● ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/FAQ-complete

Or from the news.answers ftp site (and its mirrors):

● ftp://rtfm.mit.edu/pub/usenet/comp.speech/*

Or by sending email to mail-server@rtfm.mit.edu with the following line in the body of the message:

● send usenet/news.answers/comp-speech-faq/*

If you only have email access to the internet, then I suggest you obtain the Internet-by-email guide.
Send email to mail-server@rtfm.mit.edu with the following line in the body of the message:

● send usenet/news.answers/internet-services/access-via-email

Admin
Minor changes each month. Thanks to all the companies and individuals who send in information.

Acknowledgements
Hundreds of people and companies have made contributions to the comp.speech FAQ over the last
few years - too many to name individually. Special thanks go to Tony Robinson and Kevin Lenzo
who have provided a wide range of information and assistance. Tony Robinson also maintains the
comp.speech ftp site which is an excellent resource for all people working with speech technology. I
am grateful to the people at Sydney University, Cambridge University, ATR ITL and CMU for
supporting the FAQ on their WWW sites.

Disclaimer
The comp.speech FAQ and WWW pages are provided as is without any express or implied
warranties. While every effort has been made to ensure the accuracy of the information presented
here, the author assumes no responsibility for errors or omissions, or for damages resulting from the
use of the information contained herein.
The comp.speech FAQ and WWW pages should not be construed as representing the views or
products of my employer, Sun Microsystems, Inc.

Copyright and Reproduction


Copyright (c) 1994-6 by Andrew Hunt, all rights reserved.
The comp.speech FAQ posting may not be distributed for financial gain.
The comp.speech FAQ posting may not be included in any collections or compilations without
express permission from the author.
The comp.speech FAQ posting may be posted to any USENET newsgroup, on-line service, or BBS as
long as it is posted in its entirety with this copyright statement, and that a current version is always
maintained.
[Note: hyperlinks to the comp.speech WWW pages are encouraged.]

Maintainer
The FAQ posting and the Comp.Speech WWW Site are maintained on a volunteer basis by

Andrew Hunt
Speech Applications Group, Sun Microsystems Laboratories
Two Elizabeth Drive, Chelmsford, MA, 01824-4195, USA
Ph: (508) 442 2681 Fax: (508) 250 5067
andrew.hunt@east.sun.com

Last Revision: 18:41 05-Sep-1997

http://mi.eng.cam.ac.uk/comp.speech/FAQ.html (3 of 3) [10/31/2003 8:51:54 AM]


comp.speech FAQ - Weekly Reminder

Comp.Speech FAQ Weekly Reminder


A Frequently Asked Questions (FAQ) posting is available for the comp.speech newsgroup. It covers a
range of speech technology issues, provides information on over 200 speech technology products,
software packages and resources, and includes links to over 500 speech locations on the internet.
Please check the FAQ before posting a request for information. The list of software and products and
the FAQ contents are included below.

The FAQ is posted every 4 weeks to comp.speech, comp.answers & news.answers. This reminder is
posted weekly to comp.speech.

The best way to read the comp.speech FAQ is on the World Wide Web:

● Australia: http://www.speech.su.oz.au/comp.speech/
● UK: http://svr-www.eng.cam.ac.uk/comp.speech/
● Japan: http://www.itl.atr.co.jp/comp.speech/
● USA: http://www.speech.cs.cmu.edu/comp.speech/

It is also available for ftp from the comp.speech archive site:

● ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/FAQ-complete

Or from the news.answers ftp site (and its mirrors):

● ftp://rtfm.mit.edu/pub/usenet/comp.speech/*

Or by sending email to mail-server@rtfm.mit.edu with the following line in the body of the message:

● send usenet/news.answers/comp-speech-faq/*

The FAQ posting and the Comp.Speech WWW Site are maintained on a volunteer basis by

Andrew Hunt
Speech Applications Group, Sun Microsystems Laboratories
Two Elizabeth Drive, Chelmsford, MA, 01824-4195, USA
Ph: (508) 442 2681
andrew.hunt@east.sun.com

Last Revision: 18:02 10-Jun-1996

http://mi.eng.cam.ac.uk/comp.speech/Reminder.html [10/31/2003 8:51:55 AM]
