Welcome to the comp.speech Frequently Asked Questions WWW site. This site provides a range of
information on speech technology, including speech synthesis, speech recognition, speech coding, and
related material. The information is regularly posted to the comp.speech newsgroup as the
"comp.speech FAQ" posting. This site is mirrored at several other WWW sites around the world
(Australia, UK, Japan and USA) and the information is also available in a plain text format.
There are 250 comp.speech WWW pages and they include over 500 hyperlinks to speech technology
web sites, ftp servers, mailing lists, and newsgroups.
Contents
SpeechLinks: Speech Technology Hyperlinks Pages
Table Of Contents
List Of Software/Hardware/Resources
Update Times
Availability
Odds 'n Ends
Admin
Minor changes each month. Thanks to all the companies and individuals who send in information.
Acknowledgements
Hundreds of people and companies have made contributions to the comp.speech FAQ over the last
few years - too many to name individually. Special thanks go to Tony Robinson and Kevin Lenzo
who have provided a wide range of information and assistance. Tony Robinson also maintains the
comp.speech ftp site which is an excellent resource for all people working with speech technology. I
am grateful to the people at Sydney University, Cambridge University, ATR ITL and CMU for
supporting the FAQ on their WWW sites.
Disclaimer
The comp.speech FAQ and WWW pages are provided as is without any express or implied
warranties. While every effort has been taken to ensure the accuracy of the information presented
here, the author assumes no responsibility for errors or omissions, or for damages resulting from the
use of the information contained herein.
The comp.speech FAQ and WWW pages should not be construed as representing the views or
products of my employer, Sun Microsystems, Inc.
You may make links to the documents, but you may not make copies without permission of the
author.
Note: hyperlinks to the comp.speech WWW pages are encouraged.
Maintainer
The FAQ posting and the Comp.Speech WWW Site are maintained on a volunteer basis by
Andrew Hunt
Speech Applications Group, Sun Microsystems Laboratories
Two Elizabeth Drive, Chelmsford, MA, 01824-4195, USA
Ph: (508) 442 2681
andrew.hunt@east.sun.com
Availability
comp.speech FAQ/WWW
The comp.speech FAQ is available in two forms: plain text, for posting to the newsgroup and for
retrieval by ftp, and HTML for the WWW. The text version was the original; since September 1994 both
the WWW and text versions have been supported. The WWW version is now the master version.
WWW Availability
The WWW version of the comp.speech FAQ is mirrored at a number of web sites.
Text by email
Finally, the text version can be obtained by sending email to mail-server@rtfm.mit.edu with the
following line in the body of the message:
send usenet/news.answers/comp-speech-faq/*
Newsgroups: comp.speech,comp.answers,news.answers
Subject: comp.speech Frequently Asked Questions - part 1/3
From: andrew.hunt@east.sun.com (Andrew Hunt)
Reply-To: andrew.hunt@east.sun.com (Andrew Hunt)
Followup-To: comp.speech
Organization: Speech Applications Group, Sun Microsystems Laboratories
Summary: Information on Speech Technology
Approved: news-answers-request@MIT.Edu
Archive-name: comp-speech-faq/part1
Last-modified: 1997/09/06
URL: http://www.speech.su.oz.au/comp.speech/
[Note: this document has been automatically extracted from a WWW site:
http://www.speech.su.oz.au/comp.speech/
This may introduce some formatting errors.]
If you have not already read the Usenet introductory material posted
to news.announce.newusers, please do. For help with FTP (file transfer
protocol) look for a regular posting of anonymous FTP FAQ in
comp.misc, comp.archives.admin or news.answers.
* Australia: http://www.speech.su.oz.au/comp.speech/
* Britain: http://svr-www.eng.cam.ac.uk/comp.speech/
* Japan: http://www.itl.atr.co.jp/comp.speech/
* USA: http://www.speech.cs.cmu.edu/comp.speech/
* ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/FAQ-complete
* ftp://rtfm.mit.edu/pub/usenet/comp.speech/*
* send usenet/news.answers/comp-speech-faq/*
If you only have email access to the internet, then I suggest you
obtain the Internet-by-email guide. Send email to
mail-server@rtfm.mit.edu with the following line in the body of the
message:
* send usenet/news.answers/internet-services/access-via-email
Maintainer
The FAQ posting and the Comp.Speech WWW Site are maintained on a
volunteer basis by
Andrew Hunt
Speech Applications Group, Sun Microsystems Laboratories
Two Elizabeth Drive, Chelmsford, MA, 01824-4195, USA
Ph: (508) 442 2681 Fax: (508) 250 5067
andrew.hunt@east.sun.com
___________________________________________________________________________
comp.speech FAQ
Table of Contents
+ List Of Software/Hardware
+ Update Times
+ Availability
* SpeechLinks: General
* Q1.1: What is comp.speech?
* Q1.2: comp.speech ftp site
* Q1.3: Common abbreviations and jargon
* Q1.4: Related newsgroups and mailing lists
* Q1.5: Associations, publications and conferences
* Q1.6: Handicap Aids
* Q1.7: Speech Databases
* Q1.8: Speech File Formats and Conversion
* Q1.9: Speech Laboratory Environments and Audio Editors
* Q1.10: Speech Research Sites
* Q1.11: Miscellaneous Software and Resources
___________________________________________________________________________
List of Software/Hardware/Information
* Man-Machine Interfacing
* SpeechViewer II
* CUSeeMe
* CyberPhone
* DigiPhone
* InterFACE from Hijinx
* FAQ: How can I use the Internet as a telephone?
* Nautilus: Secure Computer Telephony
* NEVOT (1.4v) from AT&T BL
* PGPfone
* Speak Freely
* Internet Phone from VocalTec
* WebPhone
* WebTalk
* AF version AF3R1
* Voice E-Mail from Bonzi Software
* MicNotePad Recording Software for Macs
* MixViews
* Network Audio System Release 1.1
* NIST Software - SPHERE and SCORE
* Sound Processing Kit
* TCPplay
* Auditory Modeller 1
* Auditory Modeller 2
* Auditory Toolbox for Matlab
* Human Audio Perception Document
* BEEP dictionary
* CMU dictionary
* CUVOLAD dictionary (Oxford Dictionary)
* Comprehensive Word List
* EAT: Edinburgh Associative Thesaurus
* Homophone List
* Moby Lexical Resources
* MRC Psycholinguistic Database
* WordNet
* Dictionaries on the WWW
* The vOICe
* The Learning Company's Language Training
* Wildfire - an Electronic Assistant
* 32 kbps ADPCM
* Castleton Network Systems - G.729 Voice Coder
* CELP 3.2a & LPC-10
* 8 Kbit/s CELP on the TMS320C5x family of DSP chips
* CyberVoice
* Rockwell's DigiTalk
* File format conversion
* G.711/721/723 Compression
* G.728 LD-CELP vocoder
* G.728 Compression
* GSM 06.10 Compression
* Lernout & Hauspie Speech Coding (5 products)
* Lernout & Hauspie Speech Coding SDK
* MPEG Audio
* shorten - a lossless compressor for speech signals
* Sipro Lab Telecom Inc. Coding
* Sonarc: Digital Audio Compression
* StarAudio Compressor/Player
* TrueSpeech from DSP Group
* U.S.F.S. 1016 CELP vocoder for DSP56001
* ToolVox from Voxware
_Apple Macintosh_
* BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
* Infovox Product Range
* Macintosh Speech Output Applications
* Macintosh Speech Synthesis Manager
* MacYack Pro
* MBROLA: Free Speech Synthesis Project
* ProVoice Developer's Speech Toolkit from First Byte
* SENSYN speech synthesizer
* Sound Bytes Developer's Kit
_DOS_
* CSRE: Computerized Speech Research Environment
* Infovox Product Range
* MBROLA: Free Speech Synthesis Project
* ProVoice Developer's Speech Toolkit from First Byte
* SENSYN speech synthesizer
* spchsyn.exe
* Tinytalk
* ZMD Speech Synthesis
_OS/2_
* ProVerbe Speech Engine from ELAN Informatique
* ProVoice Developer's Speech Toolkit from First Byte
* Sound Bytes Developer's Kit
_Unix_
* AcuVoice
* AsTeR
* BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
* DECtalk: Text-to-Speech from Digital
* ETI-Eloquence
* Emacspeak - A Speech Output Subsystem For Emacs
* Festival Speech Synthesis System
* JSRU
* Klatt-style synthesiser
* KPE80 - A Klatt Synthesiser and Parameter Editor
* "learph": Trainable text-to-phoneme software by Antonio Lucca
_Other Platforms_
* BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
* TheBigMouth (NeXT)
* MBROLA: Free Speech Synthesis Project
* Narrator Translator Library (Amiga)
* Narrator (Amiga)
* TextToSpeech Kit (NeXT)
* Orator from Bellcore
* SENSYN speech synthesizer
* WreadFiles: File reader for Commodore Amiga
_Unknown_
* Lernout and Hauspie Text-To-Speech (3 products)
* Lucent Technologies Bell Labs Text-to-Speech system
* SIMTEL
* Text to Phoneme Program 1
* Text to phoneme program 2
* Text to phoneme program 3
_Apple Macintosh_
* Digital Dreams Speech Recognition Plug-Ins
* Dragon Dictation Products
* Macintosh Speech Recognition Manager
* PowerSecretary
_DOS_
* DATAVOX - French
* Dragon Developer Tools
* Ficomp Interpreter 6000
* Jialong He's Speech Recognition Research Tool
* smARTspeak from Advanced Recognition Technologies, Inc.
* Votan VPC2100 Voice Card and VSP 1010 Speech Processor
_OS/2_
* IBM VoiceType Dictation and Control
_Unix_
* AbbotDemo
* BBN Hark Telephony Recognizer
* EARS: Single Word Recognition Package
* Ficomp Interpreter 6000
* Hidden Markov Model Toolkit (HTK) from Entropic
* IN CUBE
* Jialong He's Speech Recognition Research Tool
* Lotec Speech Recognition Package
* Myers' Hidden Markov Model software
* NICO Artificial Neural Network Toolkit
* Nuance Speech Recognition System
* PureSpeech
* recnet
_Other Platforms_
* Simon Says (NeXT)
* Voice Command Line Interface (Amiga)
* Visus SpeechKit
_Unknown_
* Berkeley Restaurant Project (BeRP)
___________________________________________________________________________
Note: If you don't know what a newsgroup is, then talk to your local
system administrator about how to get access. A useful newsgroup for
beginners is news.announce.newusers. You might also find the following
documents useful.
ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/What_is_Usenet?
ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/Answers_to_Frequently_Asked_Questions_about_Usenet
ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/Rules_for_posting_to_Usenet
ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/FAQs_about_FAQs
___________________________________________________________________________
Tony Robinson maintains the comp.speech ftp site. The ftp site is a
comprehensive repository of software and information related to speech
technology. The site is
* ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/
Comp.speech Archives
* ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/archive/
The comp.speech ftp site includes a wide range of useful software and
resources. Tony has arranged it into a series of sub-directories:
___________________________________________________________________________
___________________________________________________________________________
Newsgroups
+ http://www.bdti.com/faq/dsp_faq.htm
+ ftp://rtfm.mit.edu/pub/usenet/comp.dsp/
sci.lang - Language.
Discussion about phonetics, phonology, grammar, etymology and
lots more. A sci.lang FAQ is available.
alt.sci.physics.acoustics
Some discussion of speech production & perception.
Mailing Lists
Colibri
News about language, speech, logic and information.
Email: colibri@let.ruu.nl
WWW: http://colibri.let.ruu.nl/
+ ectl-request@snowhite.cis.uoguelph.ca
+ listserv@msu.edu
foNETiks
A moderated monthly newsletter distributed by e-mail. It
carries job advertisements, notices of conferences, and other
news of general interest to phoneticians, speech scientists and
others. The editors are Linda Shockey and Gerry Docherty. To
subscribe send the following 1 line message to
+ mailbase@mailbase.ac.uk
+ join fonetiks your_first_name your_second_name
___________________________________________________________________________
[Note: Also see the list provided in Shikano's WWW site on Speech and
Acoustics:
http://www.aist-nara.ac.jp/IS/Shikano-lab/database/internet-resource/e-www-site.html.]
Associations
Email: esca@icp.grenet.fr
* WWW: http://ophale.icp.grenet.fr/esca/esca.html
Linguistic Associations
Industry Publications
ASR News
Voice News
Teleconnect
* Contact: +1-212-691-8215
Computer Telephony
* Contact: +1-212-691-8215
* Contact: 1-800-854-3112
Speech Technology
Speech Communication
Computational Linguistics
Conferences
Eurospeech
___________________________________________________________________________
The following are products and companies which support users who can
benefit from the use of speech technology in a user interface. Please
feel free to submit information on relevant products, names of
companies and links to useful information on the Internet (especially
WWW sites).
[Of course, most of the products listed in Q5.5 and Q6.5 are useful.]
* Man-Machine Interfacing
* SpeechViewer II
Man-Machine Interfacing
SpeechViewer II
___________________________________________________________________________
Some databases are free but most are not. The databases normally
require lots of storage space (100's of MBytes is not unusual). Do not
expect to be able to ftp large amounts of speech data.
RELATOR Project
European resource initiative: see below.
PhonDat 1 - PD1
6 CDROMs, new edition in preparation, read speech, 201
speakers x 450+ sentences
PhonDat 2 - PD2
1 CDROM, read speech, 2nd edition, 16 speakers x 200
sentences, various labelled information
Verbmobil
Spontaneous speech recorded in a dialog task (appointment
scheduling). More information on the VERBMOBIL project:
http://www.dfki.uni-sb.de/verbmobil/
Corpora in Preparation
Strange Corpora - SC
Reference Corpora that reflect certain well known
problems in speech processing, like accents, repair,
GLuck Co.
195 Berlioz 1C, Nun's Island
Verdun H3E 1C1, Canada
e-mail: weigang@zaphod.math.mcgill.ca
http://www.cse.ogi.edu/CSLU/
ftp://speech.cse.ogi.edu/pub/releases
Speech Corpora
(TI46)
* Texas Instruments Speaker-Independent Connected-Digit Corpus
(TIDIGITS)
* Road Rally Conversational Speech Corpus
* HCRC Map Task Corpus
* Air Traffic Control Corpus (ATC0)
* SPIDRE Speaker Identification Corpus
* YOHO Speaker Verification Corpus
* OGI Multi-Language Corpus and OGI Spelled and Spoken Telephone
Corpus
* BRAMSHILL
* MACROPHONE
* King Corpus for Speaker Verification Research
* WSJCAM0: Cambridge Read News Corpus
* TRAINS Spoken dialog corpus
* NYNEX PhoneBook Database
* Frontiers in Speech Processing
Text Corpora
Lexical Databases
Contact information:
NOISEX-92
to:-
Public Sub Account HMG 4768.
* Availability 2: Information on how to obtain a copy of the NATO
RSG.10 NOISE-ROM-0 can be obtained from the DRA Speech Research
Unit (address above) or from:
Dr. Herman Steeneken,
TNO Institute for Perception,
P.O. Box 23, 3769 ZG Soesterberg,
The Netherlands.
* Availability 3 (WWW): Examples of the NOISEX database are
available on the Rice University Digital Signal Processing (DSP)
group home page. (Note: the files are large, >20MB.)
http://spib.rice.edu/spib/select_noise.html
Prof. B. Rosner
Dept. of Experimental Psychology
South Parks Rd, Oxford, OX1 3UD, UK
email: burton.rosner@wolfson.ox.ac.uk
Phonemic Samples
* Some basic data. The following ftp sites have samples of English
phonemes (American accent I believe) in Sun audio format files.
See Question 1.8 for information on audio file formats.
ftp://sunsite.unc.edu/pub/multimedia/sun-sounds/phonemes
ShATR
___________________________________________________________________________
WWW: ftp://ftp.cwi.nl/pub/audio/index.html
Text: ftp://ftp.cwi.nl/pub/audio/AudioFormats.part1,2
http://peace.wit.com/sounds/SoundConversion/
___________________________________________________________________________
* Platform: DOS
* Description: CSRE (pronounced "Caesar") is a speech processing
system for the PC. It provides
+ Signal recording and playback
+ Signal editing
+ Pitch and spectral analysis and formant analysis
+ Speech synthesis with an implementation of the Klatt-1980
parametric speech synthesizer
* Requirements: PC compatible (80486DX), 1 Meg RAM (recommend 4M),
DOS 3.2 (recommend 6.22), VGA graphics (640x480; 16 colors), 30 Meg
of hard disk space (5 Meg for CSRE plus space for audio
recordings), and a supported audio card.
* Cost: See AVAAZ WWW Pages
* Contact: AVAAZ Innovations Inc.
P.O.Box 8040, 1225 Wonderland Rd. N, London, Ontario, CANADA, N6G
2B0
Ph: +1-519-472-7944, Fax: +1-519-472-7814
Email: info@avaaz.com
WWW: http://www.icis.on.ca/homepages/avaaz/
* Note: See also the CSRE entry in Q5.5 on speech synthesisers.
GoldWave
* Platform: Windows
* Description: GoldWave is a digital audio editor for Microsoft
Windows. It features realtime amplitude/spectrum oscilloscopes,
large file editing, effects, and support for a wide variety of
sound formats.
+ Editing of multiple waveforms and large waveforms
+ Realtime amplitude/spectrum oscilloscopes
+ Resizable device controls window for accessing audio devices
+ Realtime fast forward and rewind playback
+ Effects: distortion, Doppler, echo, filter, mechanize,
Khoros
* Platform: Macintosh
* Description: A sound analysis and acquisition package for Macs. MSL II
delivers the most common functions for speech analysis (FFTs,
LPCs, f0 extraction, etc.) & produces grayscale spectrographic
displays. Can be used for various speech technology and phonetic
training tasks.
* Hardware: Requires MacADIOS ("Macintosh Analog/Digital
Input/Output System") hardware for speech I/O at 12/16 bits.
* Misc: Software no longer updated by GW Instruments; MSL
soft/hardware will not perform input/output on Quadras, for
example, though analysis seems fine. Known to operate properly on
systems as high as IIcx & II fx.
* Availability: MSL has been replaced by SoundScope; see the
SoundScope entry for more detail.
* Contact:
GW Instruments
35 Medford Street, Somerville, MA 02143, USA
Phone: (617) 625-4096 Fax: (617) 625-1322
N!Power
Email: stisales@signal.com
WWW: http://www.silcom.com/~stilarry/
Ptolemy
ftp://ptolemy.berkeley.edu/pub/README
* Platform: Macintosh
* Description: Signalyze is an interactive program for the analysis
of speech and other acoustic material. Signalyze's basic concept
revolves around the display of up to 100 signals in HyperCard
fashion. The program offers a range of signal editing features,
spectral analysis tools, manual scoring tools, pitch extraction
routines, signal manipulation tools, and extensive input-output
capacity. It also has a range of capabilities for creating,
editing and manipulating label files with flexibility in labelling
format.
Signalyze handles the following file formats: Signalyze, MacSpeech
Lab, AudioMedia, SoundDesigner II, SoundEdit/MacRecorder,
SoundWave, sound resource formats, and ASCII-text.
Sound I/O: Direct sound input from Apple 8- or 16-bit sound input.
Sound output via Macintosh 8- or 16-bit sound.
* Compatibility: MacPlus and higher. Takes advantage of large
SoundScope
___________________________________________________________________________
Rather than try to list the places around the world that perform
speech research, this FAQ lists sites on the WWW where comprehensive
lists are maintained. Try the following:
http://mambo.ucsc.edu/psl/speech.html
Lists about 50 speech research sites and related information
sources. Very nice presentation!
Most speech research sites have links to other speech research sites
somewhere in their WWW pages.
___________________________________________________________________________
* CUSeeMe
* CyberPhone
* DigiPhone
* InterFACE from Hijinx
* FAQ: How can I use the Internet as a telephone?
* Nautilus: Secure Computer Telephony
* NEVOT (1.4v) from AT&T BL
* PGPfone
* Speak Freely
* Internet Phone from VocalTec
* WebPhone
* WebTalk
* AF version AF3R1
* Voice E-Mail from Bonzi Software
* MicNotePad Recording Software for Macs
* MixViews
* Network Audio System Release 1.1
* NIST Software - SPHERE and SCORE
* Sound Processing Kit
* TCPplay
* Auditory Modeller 1
* Auditory Modeller 2
* Auditory Toolbox for Matlab
* Human Audio Perception Document
* BEEP dictionary
* CMU dictionary
* CUVOLAD dictionary (Oxford Dictionary)
* Comprehensive Word List
* EAT: Edinburgh Associative Thesaurus
* Homophone List
* Moby Lexical Resources
* MRC Psycholinguistic Database
* WordNet
* Dictionaries on the WWW
Dynastat, Inc.
Speech Intelligibility Testing with Diagnostic Rhyme Test
(DRT), Modified Rhyme Test (MRT), Phonetically Balanced Word
Lists (PB), Diagnostic Medial Consonant Test (DMCT), Diagnostic
Alliteration Test (DALT), ICAO Spelling Alphabet Test (SpAT)
Speech Quality (Acceptability) Evaluation with Diagnostic
Acceptability Measure (DAM), Mean Opinion Score (MOS),
Degradation Mean Opinion Score (DMOS)
Contact: Dynastat, Inc.
2704 Rio Grande, Suite 4, Austin, TX 78705, USA
Ph: +1-512-476-4797, Fax: 512/472-2883
Email: sharpley@dynastat.com
WWW: http://www.bga.com/dynastat/
700 references:
http://www.itl.atr.co.jp/cocosda/output/synth.refs
Very Miscellaneous
* The vOICe
* The Learning Company's Language Training
* Wildfire - an Electronic Assistant
* Platform: Various
* Description: The SRAPI provides support for speech recognition,
text-to-speech and other media playback. The SRAPI Committee is a
nonprofit Utah corporation with the goal of providing solutions
for interaction of speech technology with applications.
Core members include: Novell, Inc., Dragon Systems, IBM, Kurzweil
AI, Intel, and Philips Dictation Systems. Additional contributing
members include Articulate Systems, DEC, Kolvox Communications,
Lernout and Hauspie, Syracuse Language Systems, Voice Control
Systems, Corel, Verbex and Voice Processing Corporation.
* More information: WWW: http://www.srapi.com/
Email: For more information on the SRAPI Developer CD, send email
to srapi@srapi.com with Subject "SRAPI CD Info".
CUSeeMe
CyberPhone
ftp://magenta.com/pub/cyberphone
DigiPhone
* Platform: Windows
* Description: InterFACE provides voice communication on the
Internet through IRC (Internet Relay Chat) services.
* Requirements: Recommend a 486DX, 8 MB RAM, Windows, VGA monitor and
a 16-bit sound card.
* Availability: Available on CD only for $60.00 US, which includes
postage and handling.
Demo versions available from the HiJiNX WWW site.
* Contact: HiJiNX, Brisbane, Australia
Email: jester@hijinx.com.au
WWW: http://www.hijinx.com.au/
By Email
Mail voice-faq-request@northcoast.com
with "Subject: archive"
and "Body: send voice-faq"
FTP
ftp://rtfm.mit.edu/pub/usenet/alt.internet.services/FAQ:_How_can_I_use_the_Internet_as_a_telephone?
WWW:
http://rpcp.mit.edu/~asears/voice-faq.html
ftp://gaia.cs.umass.edu/pub/hgschulz/nevot
PGPfone
Speak Freely
Demo version:
ftp://ftp.vocaltec.com/pub/iphone09.exe
WebPhone
* Platform: Windows
* Description: WebPhone provides telephone quality, real-time, full
duplex, encrypted, point-to-point voice communication over the
Internet and other TCP/IP based networks. (More detail provided on
WebTalk
AF version AF3R1
ftp://crl.dec.com/pub/DEC/AF
WWW:
http://www.research.digital.com/CRL/projects/AF/home.html
* Contact: af-request@crl.dec.com
* Platforms: Macintosh
* Description: MicNotePad is an audio recording tool designed to
improve dictation (a digital replacement for the old-style
mechanical tape systems used by typists). It uses the built-in
microphone or sound input port and the hard disk to record
conversations or speech of arbitrary length. Speech compression
techniques are used to reduce the disk space required. Once the
speech is recorded, single keystrokes control playback while you
type in your word processor.
* Contact: Nirvana Research
WWW: http://moof.com/nirvana/
Email: nirvana@got.net
MixViews
ftp://foxtrot.ccmrc.ucsb.edu/pub/MixViews
ftp://ftp.x.org:/contrib/audio/nas/netaudio-1.2.tar.gz
Also available in the same directory are document files and some
sample sounds.
Readme File
ftp://jaguar.ncsl.nist.gov/pub/sphere.README
Source Code
ftp://jaguar.ncsl.nist.gov/pub/sphere_2.5.tar.Z
(NIST) .
* Availability: By anonymous ftp from
README File
ftp://jaguar.ncsl.nist.gov/pub/score.README
Source Code
ftp://jaguar.ncsl.nist.gov/pub/score_3.6.2.tar.Z
* Platforms: UNIX
* Description: Sound Processing Kit (SPKit) is an object-oriented
class library for audio signal processing. SPKit includes classes
for various signal processing tasks and a way of implementing
sound processing algorithms in a simple object-oriented manner.
Sound Processing Kit is implemented in C++ and is designed to be
portable. The current version requires a bare-bones C++ 2.0
compatible compiler (templates and exceptions are not needed).
ANSI C standard libraries are required. SPKit includes classes for
+ Sound input and output
+ Basic signal processing
+ Dynamics processing (compressor, gating etc)
+ Filtering
+ Delay and reverberation
+ Distortion
+ Signal routing
* Availability:
Software distribution:
http://www.music.helsinki.fi/research/spkit/distribution/
spkit.tar.Z
TCPplay
* Description: TCPPlay lets you use your Mac as an audio server for
your Unix box. Provided with source code. Written by Bill
Stafford, Rich Tsoi and Malcolm Slaney.
* Availability: Anonymous ftp from
ftp://ftp.apple.com/pub/malcolm/TcpPlay.sit.hqx
ftp://worldserver.com/pub/malcolm/TcpPlay.sit.hqx
Auditory Modeller 1
ftp://ftp.mrc-apu.cam.ac.uk/pub/aim
Auditory Modeller 2
ftp://suna.lut.ac.uk/public/hulpo/lutear
ftp://svr-ftp.eng.cam.ac.uk/comp.speech/info/HumanAudioPerception
BEEP dictionary
CMU dictionary
ftp://ftp.cs.cmu.edu/project/fgdata/dict/
Interactive version
Provided by Computing and Information Systems Department
(CISD) of Rutherford Appleton Laboratory, UK
http://www.cis.rl.ac.uk/proj/psych/eat.html
ftp directory. 6 MB
http://www.cis.rl.ac.uk/proj/psych/eat/eat/
Homophone List
ftp://svr-ftp.eng.cam.ac.uk/comp.speech/dictionaries/homophones-1.01.txt
WordNet
WWW Interface
http://www.cogsci.princeton.edu/~wn/w3wn.html
Source Distributions
Unix (9.1MB), PC (5.8MB), Macintosh (7.5MB), Prolog
(database only, 4.2MB).
ftp://clarity.princeton.edu/pub/wordnet/
CMU Dictionary
http://www.speech.cs.cmu.edu/cgi-bin/cmudict
http://galaxy.einet.net/galaxy/Reference-and-Interdisciplinary-Information/Dictionaries-etc.html
IPA-ASCII
A scheme for representing IPA transcriptions in ASCII for
use in Usenet articles and email.
http://weber.u.washington.edu/~dillon/ipaascii.html
+ Gopher:
gopher://gopher.sil.org/11/gopher_root/computing/software/fonts/
+ Ftp for Windows: ftp://ftp.sil.org/fonts/win/silip12a.exe
+ Ftp for Mac: ftp://ftp.sil.org/fonts/mac/silipa12.sea_hqx
Also available through the SIL email server. Send either of the
following commands to MAILSERV@sil.org.
Windows:
SEND/MODE=BLOCK/ENCODING=UUENCODE [FTP.FONTS.WIN]SILIP12A.EXE
Mac:
SEND [FTP.FONTS.MAC]SILIPA12.SEA_HQX
TIPA
Created by Rei Fukui: fkr@tooyoo1.l.u-tokyo.ac.jp.
Source: ftp://tooyoo.L.u-tokyo.ac.jp/pub/TeX/tipa/
Postscript manual:
ftp://tooyoo.L.u-tokyo.ac.jp/pub/TeX/tipa/tipaman.ps
Compressed postscript manual:
ftp://tooyoo.L.u-tokyo.ac.jp/pub/TeX/tipa/tipaman.ps
http://babel.uoregon.edu/yamada/fonts.html
Windows Fonts
http://babel.uoregon.edu/yamada/winfonts.html
IPA Fonts
http://babel.uoregon.edu/yamada/fonts/phonetic.html
ftp site
ftp://yftp@www-vms.uoregon.edu/fonts/
The vOICe
* Platform: ?
* Description: Wildfire is a phone-based electronic assistant.
Functions include:
+ Screens, routes, and announces incoming calls.
+ Contact list with voicedialing.
+ Schedules and reminders for follow-up calls and action items.
+ Messaging and advanced voicemail features.
* Contact: Wildfire Communications, Inc.
20 Maguire Road, Lexington, MA 02173 USA
Ph: +1-617-674-1500, Fax: 617-674-1501
Demo line: 1-800-WILDFIRE
Email: info@wildfire.com
WWW: http://www.wildfire.com/
___________________________________________________________________________
---
Andrew Hunt
Speech Applications Group
Sun Microsystems Laboratories
2 Elizabeth Drive, MS UCHL03-207
Chelmsford, MA 01824, USA
Ph: (508) 442-2681
Fax: (508) 250-5067
Email: andrew.hunt@east.sun.com
Newsgroups: comp.speech,comp.answers,news.answers
Subject: comp.speech Frequently Asked Questions - part 2/3
From: andrew.hunt@east.sun.com (Andrew Hunt)
Reply-To: andrew.hunt@east.sun.com (Andrew Hunt)
Followup-To: comp.speech
Organization: Speech Applications Group, Sun Microsystems Laboratories
Summary: Information on Speech Technology
Approved: news-answers-request@MIT.Edu
Archive-name: comp-speech-faq/part2
Last-modified: 1997/09/06
URL: http://www.speech.su.oz.au/comp.speech/
___________________________________________________________________________
Increasing the sampling rate above 8kHz, say to 10kHz, 16kHz or 20kHz,
improves the frequency response: the higher the sampling frequency, the
better the high frequency content will be. A 16kHz sampling rate is a
reasonable target for high quality speech recording and playback.
When doing speech recognition, remember that your computer is not as
good as your ear, so it will have trouble with poor quality sounds. The
choice of an appropriate sampling setup depends very much on the speech
recognition task and the amount of computer power available.
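As a rough illustration of the trade-off described above, the
following C sketch computes the highest frequency a given sampling
rate can represent (half the sampling rate, the Nyquist limit) and the
storage needed for uncompressed audio. The function names nyquist_hz
and pcm_bytes are made up for this example, and mono linear PCM with a
whole number of bytes per sample is assumed.

```c
#include <assert.h>

/* Highest frequency (in Hz) that can be represented at a given
 * sampling rate: half the sampling rate (the Nyquist limit). */
unsigned long nyquist_hz(unsigned long sample_rate_hz)
{
    return sample_rate_hz / 2;
}

/* Bytes needed to store uncompressed mono linear PCM audio.
 * Assumption for this sketch: bits_per_sample is a multiple of 8. */
unsigned long pcm_bytes(unsigned long sample_rate_hz,
                        unsigned int bits_per_sample,
                        unsigned long seconds)
{
    assert(bits_per_sample > 0 && bits_per_sample % 8 == 0);
    return sample_rate_hz * (bits_per_sample / 8) * seconds;
}
```

For example, nyquist_hz(8000) is 4000 and nyquist_hz(16000) is 8000,
so moving from 8kHz to 16kHz doubles the usable bandwidth. The cost is
storage: pcm_bytes(16000, 16, 60) is 1920000, or roughly 1.9 MBytes
per minute of 16-bit speech.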
___________________________________________________________________________
* http://www.bdti.com/faq/dsp_faq.htm
* ftp://rtfm.mit.edu/pub/usenet/comp.dsp/
___________________________________________________________________________
* ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/tools/ep.1.0.tar.gz
* ftp://ftp.isip.msstate.edu/pub/software/signal_detector/sigd_v2.2.tar.gz
* Rabiner LR, Sambur MR, "An Algorithm for Determining the Endpoints
of Isolated Utterances", Bell System Technical Journal, Vol 54,
No. 2, pp 297-315, 1975.
* Drago, P.G. et al. "Digital Dynamic Speech Detectors." IEEE Trans
on Communications, Vol 26, No 1, Jan 78, pp. 140-145.
* Newman, W.C. "Detecting Speech with an Adaptive Neural Network."
Electronic Design. 22 March 1990.
* Taboada, J. et al. "Explicit Estimation of Speech Boundaries" IEE
Proc. Sci. Meas. Technol., Vol 141, No.3, May 1994, pp 153-159.
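The software and papers above describe complete endpoint detectors.
As a minimal sketch of the basic idea only, the C code below marks the
first and last frame whose short-time energy exceeds a fixed
threshold. It is not the algorithm from any of the references (Rabiner
and Sambur, for instance, also use the zero-crossing rate). The
function names, the fixed threshold and the non-overlapping frames are
all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Short-time energy: sum of squared samples in one frame. */
static long long frame_energy(const short *x, size_t n)
{
    long long e = 0;
    size_t i;

    for (i = 0; i < n; i++)
        e += (long long)x[i] * x[i];
    return e;
}

/*
 * Scan the signal in non-overlapping frames of frame_len samples.
 * If any frame has energy above threshold, return 1 and set *start
 * to the first sample of the first such frame and *end to one past
 * the last sample of the last such frame. Otherwise return 0.
 * Trailing samples that do not fill a whole frame are ignored.
 */
int find_endpoints(const short *x, size_t n, size_t frame_len,
                   long long threshold, size_t *start, size_t *end)
{
    int found = 0;
    size_t pos;

    assert(x != NULL && start != NULL && end != NULL && frame_len > 0);

    for (pos = 0; pos + frame_len <= n; pos += frame_len) {
        if (frame_energy(x + pos, frame_len) > threshold) {
            if (!found) {
                *start = pos;
                found = 1;
            }
            *end = pos + frame_len;
        }
    }
    return found;
}
```

A fixed threshold only works when the background noise level is known
in advance. Practical detectors usually estimate the noise level from
the signal itself and add some hangover time so that weak word endings
are not cut off.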
___________________________________________________________________________
* FFTW
FFTW is a C subroutine library for computing the FFT in one or
more dimensions. It is not limited to sizes that are powers of
two, and includes real-complex and parallel transforms.
Also on the FFTW web site are benchmarks comparing the
performance and accuracy of many public-domain FFT
implementations on a variety of platforms, as well as links to
other sources of FFT code and information.
Available from http://theory.lcs.mit.edu/~fftw
Developed by Matteo Frigo and Steven G. Johnson:
fftw@theory.lcs.mit.edu
___________________________________________________________________________
* Sampling theory
* Filter bank analysis
* Short-term Fourier analysis
* Linear prediction analysis
* Formant analysis and voicing analysis
* Speech coding
* and more....
There are many good books which discuss signal processing for speech:
___________________________________________________________________________
Can anyone provide information for SGI, NeXT, other UNIX hardware and
any other PC soundcards?
ftp://ftp.apple.com/pub/malcolm/SoundAndImageToolbox.cpt.hqx
Routines that play and record sounds using the toolbox are
included (and interfaced to Matlab).
PC Audio Hardware
Note: new soundcards are becoming available all the time - the
information below is definitely not up to date. Check out the
following newsgroups for up-to-date information.
* comp.sys.ibm.pc.soundcard
* comp.sys.ibm.pc.soundcard.GUS
* comp.sys.ibm.pc.soundcard.advocacy
* comp.sys.ibm.pc.soundcard.games
* comp.sys.ibm.pc.soundcard.misc
* comp.sys.ibm.pc.soundcard.music
* comp.sys.ibm.pc.soundcard.tech
* http://www.wi.leidenuniv.nl/audio/
* http://www.acs.oakland.edu/oak/SimTel/win3/sound.html
* WWW: http://www-viz.tamu.edu/~sgi-faq/faq/html/audio/
* News: comp.sys.sgi.misc
* Ftp: ftp://viz.tamu.edu/pub/sgi/faq/
* Platform: Various
* Description: A range of signal I/O, A/D, D/A and DSP products are
available. There are too many to list.
* Contact: Ariel Corp.
433 River Road, Highland Park, NJ 08904.
Ph: 908-249-2900 Fax: 908-249-2123 DSP BBS: 908-249-2124
___________________________________________________________________________
/**
** Signal conversion routines for use with Sun4/60 audio chip
**/
#include <stdio.h>
/*
** This routine converts from linear to ulaw
**
** Craig Reese: IDA/Supercomputing Research Center
** Joe Campbell: Department of Defense
** 29 September 1989
**
** References:
** 1) CCITT Recommendation G.711 (very difficult to follow)
** 2) "A New Digital Technique for Implementation of Any
** Continuous PCM Companding Law," Villeret, Michel,
** et al. 1973 IEEE Int. Conf. on Communications, Vol 1,
** 1973, pg. 11.12-11.17
** 3) MIL-STD-188-113,"Interoperability and Performance Standards
** for Analog-to-Digital Conversion Techniques,"
** 17 February 1987
**
** Input: Signed 16 bit linear sample
** Output: 8 bit ulaw sample
*/
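unsigned char
linear2ulaw(int sample)
{
    static int exp_lut[256] = {0,0,1,1,2,2,2,2,3,3,3,3,3,3,3,3,
                               4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
                               5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
                               5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
                               6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
                               6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
                               6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
                               6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
                               7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                               7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                               7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                               7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                               7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                               7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                               7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                               7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7};
    int sign, exponent, mantissa;
    unsigned char ulawbyte;

    /* Get the sample into sign-magnitude form. */
    sign = (sample >> 8) & 0x80;          /* set aside the sign bit */
    if (sign != 0) sample = -sample;      /* get the magnitude */
    if (sample > 32635) sample = 32635;   /* clip so the bias cannot overflow */

    /* Convert from 16 bit linear to ulaw. */
    sample = sample + 0x84;               /* add the bias (132) */
    exponent = exp_lut[(sample >> 7) & 0xFF];
    mantissa = (sample >> (exponent + 3)) & 0x0F;
    ulawbyte = ~(sign | (exponent << 4) | mantissa);  /* stored inverted */
    return(ulawbyte);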
}
/*
** This routine converts from ulaw to 16 bit linear.
**
** Craig Reese: IDA/Supercomputing Research Center
** 29 September 1989
**
** References:
** 1) CCITT Recommendation G.711 (very difficult to follow)
** 2) MIL-STD-188-113,"Interoperability and Performance Standards
** for Analog-to-Digital Conversion Techniques,"
** 17 February 1987
**
** Input: 8 bit ulaw sample
** Output: signed 16 bit linear sample
*/
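int
ulaw2linear(unsigned char ulawbyte)
{
    static int exp_lut[8] = {0,132,396,924,1980,4092,8316,16764};
    int sign, exponent, mantissa, sample;

    ulawbyte = ~ulawbyte;                 /* ulaw bytes are stored inverted */
    sign = (ulawbyte & 0x80);
    exponent = (ulawbyte >> 4) & 0x07;
    mantissa = ulawbyte & 0x0F;
    /* exp_lut[] holds the biased segment start; add the mantissa steps. */
    sample = exp_lut[exponent] + (mantissa << (exponent + 3));
    if (sign != 0) sample = -sample;
    return(sample);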
}
___________________________________________________________________________
On the Web
The following sites provide lists of useful DSP software. Not all the
software is directly applicable to speech processing.
comp.dsp FAQ
http://www.bdti.com/faq/dsp_faq.htm
___________________________________________________________________________
___________________________________________________________________________
The standard reference point is toll quality speech, which is what
would be expected over a telephone line: for example, speech sampled
at 8 kHz using 8 bit ulaw coding, with a maximum frequency of about
3.3 kHz. This is a bit rate of 64 kbps, and as such is already a
compressed form relative to (say) the 16 bit, 16 kHz speech (256 kbps)
that is the standard in speech recognition work.
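As a rough check on these figures, the bit rate of uncompressed PCM is
simply the sampling rate multiplied by the number of bits per sample.
The following is a minimal sketch in C; the helper name pcm_bit_rate
is hypothetical and is used here only for illustration.

```c
#include <assert.h>

/* Raw PCM bit rate in bits per second: samples/second * bits/sample.
   pcm_bit_rate is a hypothetical helper name, not a library call. */
long pcm_bit_rate(long sample_rate_hz, int bits_per_sample)
{
    assert(sample_rate_hz > 0 && bits_per_sample > 0);
    return sample_rate_hz * (long)bits_per_sample;
}
```

For example, pcm_bit_rate(8000, 8) gives 64000 (the 64 kbps telephone
figure), while pcm_bit_rate(16000, 16) gives 256000, four times as
much.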
ulaw coding does not exploit the (normally large) sample to sample
correlations found in speech. ADPCM is the next family of speech
coding techniques, and does exploit this redundancy by using a simple
linear filter to predict the next sample of speech. The resulting
prediction error is typically quantised to 4 bits thus giving a bit
rate of 32 kbps (see, for example, the software in Q3.3: 32 kbps
ADPCM, G.711/721/723 Compression, shorten). The advantages of ADPCM
are that it is simple to implement and has very low delay.
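The idea of coding the prediction error can be sketched in a few lines
of C. The following toy coder is an illustration under stated
assumptions and is not G.721 or any other standard: it predicts each
sample as the previous reconstructed sample, quantises the error to a
4 bit code (-8 to 7), and doubles or halves the step size depending on
the size of the last code. All names (toy_adpcm_state,
toy_adpcm_encode and so on), the initial step and the adaptation
thresholds are hypothetical. At 8000 samples per second, 4 bits per
sample gives 32 kbps.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int pred;   /* prediction: the previous reconstructed sample */
    int step;   /* current quantiser step size */
} toy_adpcm_state;

void toy_adpcm_init(toy_adpcm_state *s)
{
    s->pred = 0;
    s->step = 16;       /* arbitrary starting step (assumption) */
}

/* Shared by encoder and decoder so that both stay in step:
   reconstruct the sample from the code, then adapt the step size. */
static int toy_adpcm_update(toy_adpcm_state *s, int code)
{
    int recon = s->pred + code * s->step;
    if (recon > 32767) recon = 32767;
    if (recon < -32768) recon = -32768;
    s->pred = recon;
    if (abs(code) >= 6 && s->step < 2048)
        s->step *= 2;   /* large error: take bigger steps */
    else if (abs(code) <= 1 && s->step > 1)
        s->step /= 2;   /* small error: take finer steps */
    return recon;
}

/* Encode one 16 bit sample to a 4 bit code in the range -8..7. */
int toy_adpcm_encode(toy_adpcm_state *s, int sample)
{
    int code = (sample - s->pred) / s->step;  /* quantise the error */
    if (code > 7) code = 7;
    if (code < -8) code = -8;
    toy_adpcm_update(s, code);
    return code;
}

/* Decode one 4 bit code back to a 16 bit sample. */
int toy_adpcm_decode(toy_adpcm_state *s, int code)
{
    assert(code >= -8 && code <= 7);
    return toy_adpcm_update(s, code);
}
```

Because the encoder updates its state from the reconstructed value
rather than from the true input, a decoder fed the same codes follows
exactly the same sequence of predictions and step sizes. No buffering
of future samples is needed, which is why the delay is so low.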
The CELP family of coders compensates for the lack of quality of the
simple LPC model by using more information in the excitation. Each
vector in a codebook of excitation vectors is tried and the index of the
one that best matches the original speech is transmitted. This results
in an increase in the bit rate to typically 4800-9600 bps.
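The codebook search can be sketched as follows. This is a simplified
illustration only: a real CELP coder passes each codebook vector
through the LPC synthesis filter and a perceptual weighting filter
before comparing it with the target, and both steps are omitted here.
The function name toy_codebook_search and the vector length
TOY_VEC_LEN are hypothetical. For each candidate the best gain is
corr/energy, and the candidate with the largest corr*corr/energy
leaves the smallest squared error.

```c
#include <assert.h>

#define TOY_VEC_LEN 4   /* hypothetical excitation vector length */

/* Return the index of the codebook vector that, with its optimal
   gain, best matches the target in the least-squares sense.
   Returns -1 if every codebook vector has zero energy. */
int toy_codebook_search(const double target[TOY_VEC_LEN],
                        const double codebook[][TOY_VEC_LEN],
                        int num_vectors, double *best_gain)
{
    int k, i, best = -1;
    double best_score = -1.0;

    assert(num_vectors > 0);
    for (k = 0; k < num_vectors; k++) {
        double corr = 0.0, energy = 0.0, score;
        for (i = 0; i < TOY_VEC_LEN; i++) {
            corr += target[i] * codebook[k][i];
            energy += codebook[k][i] * codebook[k][i];
        }
        if (energy <= 0.0)
            continue;                    /* skip all-zero vectors */
        score = corr * corr / energy;    /* error removed by this vector */
        if (score > best_score) {
            best_score = score;
            best = k;
            if (best_gain)
                *best_gain = corr / energy;
        }
    }
    return best;
}
```

Only the winning index and a quantised gain need to be transmitted,
since the decoder holds the same codebook.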
___________________________________________________________________________
Reference Books
On the WWW
comp.compression FAQ
Includes a few questions and answers on the compression of
speech.
ftp://rtfm.mit.edu/pub/usenet/comp.compression/
http://wwwdsp.ucd.ie/speech/tutorial/speech_coding/speech_tut.html
___________________________________________________________________________
* 32 kbps ADPCM
* Castleton Network Systems - G.729 Voice Coder
* CELP 3.2a & LPC-10
* 8 Kbit/s CELP on the TMS320C5x family of DSP chips
* CyberVoice
* Rockwell's DigiTalk
* File format conversion
* G.711/721/723 Compression
* G.728 LD-CELP vocoder
* G.728 Compression
* GSM 06.10 Compression
* Lernout & Hauspie Speech Coding (5 products)
* Lernout & Hauspie Speech Coding SDK
* MPEG Audio
* shorten - a lossless compressor for speech signals
* Sipro Lab Telecom Inc. Coding
* Sonarc: Digital Audio Compression
* StarAudio Compressor/Player
* TrueSpeech from DSP Group
* U.S.F.S. 1016 CELP vocoder for DSP56001
* ToolVox from Voxware
32 kbps ADPCM
* Platform: Sun (the makefiles and source can be modified for other
platforms)
* Description: CELP is a lossy compression technique. The US
Department of Defense's Federal-Standard-1016 based 4800 bps code
excited linear prediction voice coder version 3.2a (CELP 3.2a).
Fortran and C simulation source codes.
* Availability: By anonymous ftp from:
ftp://ftp.super.org/pub/speech/celp_3.2a.tar.Z
Or from the comp.speech ftp server
ftp://svr-ftp.eng.cam.ac.uk/comp.speech/coding/celp_3.2a.tar.Z
ftp://svr-ftp.eng.cam.ac.uk/comp.speech/coding/celp_3.2a.tar.gz
LPC-10 Fortran source code is also available:
ftp://ftp.super.org/pub/speech/lpc10-1.0.tar.gz
Here is a modified LPC-10 release that includes ANSI C source:
http://www.arl.wustl.edu/~jaf/lpc/
* Documentation: The following articles describe the
Federal-Standard-1016 4.8-kbps CELP coder:
+ Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C.
Welch, "The Federal Standard 1016 4800 bps CELP Voice Coder,"
Digital Signal Processing, Academic Press, 1991, Vol. 1, No.
3, p. 145-155.
+ Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C.
Welch, "The DoD 4.8 kbps Standard (Proposed Federal Standard
1016)," in Advances in Speech Coding, ed. Atal, Cuperman and
Gersho, Kluwer Academic Publishers, 1991, Chapter 12, p.
121-133.
The U.S. DoD's Federal-Standard-1015/NATO-STANAG-4198 based 2400
bps linear prediction coder (LPC-10) was republished as a Federal
Information Processing Standards Publication 137 (FIPS Pub 137).
It is described in:
+ Thomas E. Tremain, "The Government Standard Linear Predictive
Coding Algorithm: LPC-10," Speech Technology Magazine, April
1982, p. 40-49.
There is also a section about FS-1015 in the book:
+ Panos E. Papamichalis, Practical Approaches to Speech Coding,
Prentice-Hall, 1987.
The voicing classifier used in the enhanced LPC-10 (LPC-10e) is
described in:
+ Campbell, Joseph P., Jr. and T. E. Tremain, "Voiced/Unvoiced
Classification of Speech with Applications to the U.S.
Government LPC-10E Algorithm," Proceedings of the IEEE Intl.
Conf. on Acoustics, Speech, and Signal Processing, 1986, p.
473-6.
* Vendors:
Realtime DSP code for FS-1015 and FS-1016 is sold by:
+ John DellaMorte, DSP Software Engineering
165 Middlesex Tpk, Suite 206, Bedford, MA 01730, USA
CVI Inc.
443 Vienna Cres. North Vancouver, BC, Canada V7N 3B3
Tel: (604) 987 1719 Fax: (604) 986 8139
Email: cvi@extropia.wimsey.com
CyberVoice
Rockwell's DigiTalk
8KHz and transmits 223 bits of coded speech every 26ms, giving an
overall bit rate of 8.577Kbps. The algorithm is based on
analysis-by-synthesis predictive coding with vector-coded
excitation, in which the excitation signal is optimized by
minimizing the perceptually weighted error between the original
and synthesized speech. More information and results of perceptual
tests are available on the WWW.
* Availability: See the WWW page:
http://www.nb.rockwell.com/ref/digitalk/
ftp://ftp.cwi.nl/pub/audio/ccitt-adpcm.tar.Z
G.711/721/723 Compression
* Description:
+ G.711 : CCITT u-law and A-law compression
+ G.721 : CCITT 32 kbps ADPCM coder
+ G.723 : CCITT 24 kbps and 40 kbps ADPCM coders
* Availability: By email to itudoc@itu.ch, with
GET ITU-3022
as the *only* line in the body of the message.
It is also available by anonymous ftp from:
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/G711_G721_G723.tar.Z
Cole Erskine
Analogical Systems
299 California Avenue, Suite 120
Palo Alto, CA 94306, USA
Tel:(415) 323-3232 FAX:(415) 323-4222
email: cole@analogical.com
G.728 Compression
ftp://dspsun.eas.asu.edu/pub/speech/ldcelp.tgz
http://www.cs.tu-berlin.de/~jutta/toast.html
MPEG Audio
* Platform: UNIX/DOS
* Description: A fast waveform coder suitable for speech and music
signals in a wide variety of file formats. The degree of
compression is adjustable from lossless to three bits a sample.
16bit 16kHz speech generally attains 50% lossless compression and
16:3 compression of CDROM quality speech is obtainable with only
minor audible degradation.
* Availability: Anonymous ftp - UNIX and DOS versions
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/shorten.tar.gz
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/shorten.tar.Z
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/shorten.zip
Proprietary Standards
1. ACELP 8 v2.0 codec (flexible dual rate codec equipped with a
VAD)
2. ACELP 4.8 codec
* Contact: Sipro Lab Telecom Inc.
770, Chemin Lucerne, Ville Mont-Royal (Quebec), H3R 2H6 CANADA
Ph: (514) 737-5874, Fax: (514) 737-2327
E-mail: sales@sipro.com
WWW: http://www.sipro.com/
StarAudio Compressor/Player
* Platform: Win95
* Description: Uses a time-domain process that delivers lossless
decompressed data. Processes any source of .wav file format, high
quality 16-bit audio data at any sampling rate. Requires no
special hardware and decompression speed is real-time on most
486's and on any Pentium. The higher the sampling rate the higher
the compression ratio; minimum compression of 4:1 for 11k data,
and usually exceeding 7:1 for 44k data. Full bandwidth of signal
is preserved with default compression options. Compression options
allow the compression ratio to be increased further at the cost of
a slight reduction in output quality. A decompression
library is available for application development.
* Demo: Download the shareware version of the program from the STR
WWW site.
* Misc: A technical paper is available in Word 6.0 format:
ftp://ftp.speechtech.com/pub/speechtech/docs/audocw60.exe
* Contact: Speech Technology Research Ltd.,
Suite B - 1623 McKenzie Avenue, Victoria, B.C. V8N 1A6, Canada
Ph: +1-250-477-0544
Email: products@speechtech.com
WWW: http://www.speechtech.com/home/speechtech/
* Platform: DSP56001
* Description: Real-time U.S.F.S. 1016 CELP vocoder that runs on a
single 27MHz Motorola DSP56001. Free demo software available for
PC-56 and PC-56D. Source and object code available for a one-time
license fee.
* Contact:
Cole Erskine
Analogical Systems
299 California Avenue, Suite 120
Palo Alto, CA 94306, USA
Tel:(415) 323-3232 FAX:(415) 323-4222
Email: cole@analogical.com
* Platform: Windows and soon available on Mac (in Beta now) and Unix
* Description: ToolVox is a proprietary frequency domain speech
coder. 11 KHz speech is coded to an average rate of between 5,000
bits per second and 9,000 bps. Real-time compression algorithms
available for 2,400 bps. 22 KHz playback, as well as an ultra low
bit rate 8 KHz codec are coming soon. On playback, the time scale
can be changed by a 5x factor, pitch can be modified over a 3
octave range, and vocal personality can be modified using a
transformation function called VoiceFonts(tm).
* Misc 1: An SDK for Windows is available.
* Misc 2: Demo software is available from the Voxware Inc WWW page:
http://www.voxware.com/
* Price: Basic toolkit is $895 US. OEM and mass distribution
licenses are separate. Ordering information is provided on the
Voxware WWW server.
* Contact:
Voxware, Inc.
Ph: (609) 497-1212 Fax: (609) 497-2490
___________________________________________________________________________
ftp://rtfm.mit.edu/pub/usenet/comp.ai.nat-lang/Natural_Language_Processing_FAQ
___________________________________________________________________________
Take a look at the FAQ for the "comp.ai" newsgroup as it also includes
some useful references.
Journals
Conferences
Most AI conferences have an NLP track; AAAI, ECAI, IJCAI and the
Cognitive Science Society conferences are usually interesting for NLP.
CUNY is an important psycholinguistic conference. Other conferences
include NELS, the conference of the Chicago Linguistic Society (CLS),
WCCFL, LSA, the Amsterdam Colloquium, and SALT.
___________________________________________________________________________
ftp://crlftp.nmsu.edu/pub/non-lexical/NL_Software_Registy
ftp://dri.cornell.edu/pub/Natural_Language_Software_Registry
___________________________________________________________________________
---
Andrew Hunt
Speech Applications Group
Sun Microsystems Laboratories Ph: (508) 442-2681
2 Elizabeth Drive, MS UCHL03-207 Fax: (508) 250-5067
Chelmsford, MA 01824, USA Email: andrew.hunt@east.sun.com
Newsgroups: comp.speech,comp.answers,news.answers
Subject: comp.speech Frequently Asked Questions - part 3/3
From: andrew.hunt@east.sun.com (Andrew Hunt)
Reply-To: andrew.hunt@east.sun.com (Andrew Hunt)
Followup-To: comp.speech
Organization: Speech Applications Group, Sun Microsystems Laboratories
Summary: Information on Speech Technology
Approved: news-answers-request@MIT.Edu
Archive-name: comp-speech-faq/part3
Last-modified: 1997/09/06
URL: http://www.speech.su.oz.au/comp.speech/
[Note: this document has been automatically extracted from a WWW site:
http://www.speech.su.oz.au/comp.speech/
This may introduce some formatting errors.]
Speech Synthesis
___________________________________________________________________________
___________________________________________________________________________
There are several algorithms. The choice depends on the task they're
used for. The easiest way is to just record the voice of a person
speaking the desired phrases. This is useful if only a restricted
volume of phrases and sentences is used, e.g. messages in a train
station, or schedule information via phone. The quality depends on the
way recording is done.
More sophisticated, but lower in quality, are algorithms which split
the speech into smaller pieces. The smaller those units are, the fewer
of them are needed, but the quality also decreases. An often used unit
is the phoneme, the smallest linguistic unit. Depending on the
language, there are about 35-50 phonemes in western European languages,
i.e. there are 35-50 single recordings. The problem is combining them,
as fluent speech requires smooth transitions between the elements. The
intelligibility is therefore lower, but the memory required is small.
The longer the units become, the more elements there are, but the
quality increases along with the memory required. Other units which
are widely used are half-syllables, syllables, words, or combinations
of them, e.g. word stems and inflectional endings.
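The transition problem can be illustrated with a small sketch in C
that joins two recorded units with a linear crossfade, so that the end
of one unit fades out while the start of the next fades in. This is
only one simple way of smoothing a join and is not taken from any
particular synthesiser; the function name toy_concat_units and its
parameters are hypothetical.

```c
#include <assert.h>

/* Join unit a (na samples) and unit b (nb samples), crossfading
   linearly over 'overlap' samples. The caller must provide room for
   na + nb - overlap samples in out. Returns the number written. */
int toy_concat_units(const short *a, int na, const short *b, int nb,
                     int overlap, short *out)
{
    int i, n = 0;

    assert(overlap >= 0 && overlap <= na && overlap <= nb);

    /* Copy the part of a that is not involved in the crossfade. */
    for (i = 0; i < na - overlap; i++)
        out[n++] = a[i];

    /* Crossfade: the weight of b rises while the weight of a falls. */
    for (i = 0; i < overlap; i++) {
        long wb = i + 1;
        long wa = overlap + 1 - wb;
        out[n++] = (short)((wa * a[na - overlap + i] + wb * b[i])
                           / (overlap + 1));
    }

    /* Copy the rest of b. */
    for (i = overlap; i < nb; i++)
        out[n++] = b[i];

    return n;
}
```

With overlap set to 0 the units are simply butted together, which is
where audible discontinuities tend to appear. Longer units mean fewer
joins per utterance, at the cost of a larger inventory.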
___________________________________________________________________________
On the WWW
___________________________________________________________________________
+ YorkTalk
+ Loughborough Sound Images
+ University of Birmingham - FDFS
+ Eurovocs
+ DECtalk
+ AT&T Bell Labs Synthesiser
+ S.W.A.Ll.C. - Welsh Synthesis from CSTR
+ All-Prosodic Speech Synthesis - IPOX
+ Orator from Bellcore
Pavarobotti
http://www.shc.uiowa.edu/fun/pavarobotti/pavarobotti.html
WWW demo of the Pavarobotti synthesis technology developed at
the National Center for Voice and Speech
(http://www.shc.uiowa.edu/ncvs_home.html).
Say...
http://wwwtios.cs.utwente.nl/say
WWW demo of the rsynth speech synthesis software. The WWW
capability was implemented by Axel Belinfante.
+ ICP-Grenoble
+ CNET-Lannion (with TD-PSOLA)
+ KTH-Stockholm
+ Universite-Mons - several versions
Lyricos
http://www.cse.ogi.edu/CSLU/research/TTS/research/sing.html
Demos of the Lyricos singing voice synthesis system.
Concatenation-based synthesis of singing voice from MIDI input.
___________________________________________________________________________
In the FAQ...
_Apple Macintosh_
* BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
* Infovox Product Range
* Macintosh Speech Output Applications
* Macintosh Speech Synthesis Manager
* MacYack Pro
* MBROLA: Free Speech Synthesis Project
* ProVoice Developer's Speech Toolkit from First Byte
* SENSYN speech synthesizer
* Sound Bytes Developer's Kit
* Macintosh Speech Synthesis Manager
_DOS_
* CSRE: Computerized Speech Research Environment
* Infovox Product Range
_OS/2_
* ProVerbe Speech Engine from ELAN Informatique
* ProVoice Developer's Speech Toolkit from First Byte
* Sound Bytes Developer's Kit
_Unix_
* AcuVoice
* AsTeR
* BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
* DECtalk: Text-to-Speech from Digital
* ETI-Eloquence
* Emacspeak - A Speech Output Subsystem For Emacs
* Festival Speech Synthesis System
* JSRU
* Klatt-style synthesiser
* KPE80 - A Klatt Synthesiser and Parameter Editor
* "learph": Trainable text-to-phoneme software by Antonio Lucca
_Other Platforms_
* BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
* TheBigMouth (NeXT)
* MBROLA: Free Speech Synthesis Project
* Narrator Translator Library (Amiga)
* Narrator (Amiga)
* TextToSpeech Kit (NeXT)
* Orator from Bellcore
* SENSYN speech synthesizer
* WreadFiles: File reader for Commodore Amiga
_Unknown_
* Lernout and Hauspie Text-To-Speech (3 products)
* Lucent Technologies Bell Labs Text-to-Speech system
* SIMTEL
* Text to Phoneme Program 1
* Text to phoneme program 2
* Text to phoneme program 3
AcuVoice
AsTeR
* Platform: UNIX
* Description: TTS front-end program which encodes structural
information about documents in speech synthesis. For more
information check out:
http://www.research.digital.com/CRL/personal/raman/aster/aster-toplevel.html
Email: raman@adobe.com
Email: aspg@attmail.com
WWW: http://www.att.com/aspg/
* Platform: NeXT
* Description: Text to speech program based on concatenation of
pre-recorded speech segments.
* Availability:
ftp://ftp.cs.keio.ac.jp/pub/NeXT/source/TheBigMouth1.0.tar.Z
Creative TextAssist
* Platform: Windows
* Description: Based on DECtalk speech synthesis. A detailed
description of TextAssist is provided on the Creative WWW pages.
TextAssist TextReader provides a convenient Windows user interface
for text reading.
* Availability: Creative TextAssist is bundled with most (all?)
Creative Sound Blaster audio cards. TextAssist preview software is
available from the Creative Labs TextAssist home page.
* Contact: Creative Labs, Inc.
Address, phone, email etc unknown
WWW: http://www.creaf.com/ :
http://www.creaf.com/wwwnew/tech/devcnr/tassist.html
* Platform: Windows
* Description: The TextAssist API (TAAPI) is created for Microsoft
Windows 3.1x and Windows 95 developers who intend to develop
16-bit Text-to-Speech software applications using Creative's
TextAssist speech engine. It supports direct control of speech
output characteristics, concurrent playback of text-to-speech and
wave files, foreign language support, speech synchronization,
exception dictionaries. It also includes a voice editing tool for
creating new custom voices, a Visual Basic Custom Control for
high-level support in Visual Basic and other languages
* Availability: The TextAssist API is released to registered
developers at no cost.
* Contact: WWW: http://www.creaf.com/
FAQ: http://www.creaf.com/wwwnew/tech/devcnr/tassfaq.html
* Platform: DOS
* Description: CSRE is a software system which includes an
implementation of the Klatt speech synthesizer. See the CSRE entry
in Q1.9 and the AVAAZ WWW pages for more detail.
* Contact: AVAAZ Innovations Inc.
P.O.Box 8040, 1225 Wonderland Rd. N, London, Ontario, CANADA, N6G
2B0
Ph: +1-519-472-7944 , Fax: +1-519-472-7814
Email: info@avaaz.com
WWW: http://www.icis.on.ca/homepages/avaaz/
* Platform: Windows NT, Alpha with Digital UNIX and RS232 ports
* Description: Converts ordinary text into natural-sounding,
intelligible speech. Provides personalized voices, and extensive
user controls. DECtalk technology is available for the following
packaging options.
+ DECtalk PC card option: An industry-standard ISA/EISA bus
card implementation that can be integrated with any Intel 486
processor-based system running DOS or Windows. Applications
can be interfaced to the bus via a DOS Terminate and Stay
Resident (TSR) driver or a Windows Dynamic Link Library
(DLL). This option is available with an external speaker with
volume control and headphone jack.
+ DECtalk Express external package: An external, portable
package that you can plug in to any PC or serial port. The
external package includes a built-in speaker and headphone
jack, plus combined on/off and volume controls and a
rechargeable battery pack.
+ DECtalk Software solution: Software-only text to speech for
Alpha or Intel systems running Windows NT or Alpha systems
running Digital UNIX. Provides complete speech synthesis
capabilities so developers can enhance applications with
DECtalk technology. DECtalk Software output can be directed
to audio devices, into WAVE files, or into memory buffers.
* Pricing:
://www.systems.digital.com/DIcatalog/html/DECtalk-Speech-Synthesis-oi.html
* More Information:
Digital Equipment Corporation WWW pages: http://www.digital.com/
DECtalk page:
http://www.systems.digital.com/DIcatalog/html/DECtalk-Software.html
Ph: 1-800-DIGITAL
DECtalk Software
WWW:
http://www.systems.digital.com/DIcatalog/html/DECtalk-Speech-Synthesis.html
Ph: 1-800-DIGITAL
ETI-Eloquence
Emacspeak source
http://www.research.digital.com/CRL/personal/raman/emacspeak/emacspeak.tar.gz
Eurovocs
HADIFIX
* Platform: Windows
* Description: German speech synthesis system developed at the
Institute for Communications Research and Phonetics, University
of Bonn. Provides conversion of input text to phonemes, automatic
prediction of stress, phrasing and pitch, and speech generation by
concatenation of small units of natural speech. Demisyllables and
similar units are used; they comprise all consonants before the
vowel and the beginning of the vowel (initial demisyllable) or the
end of the vowel and the following consonants (final
demisyllable). For example, the word 'Strolch' is formed by
concatenating 'Stro' and 'olch'.
* Demo: Windows demo software available. Limited to synthesis of one
short text (text.txt) at a time. Speech format limitations too.
1.3MB file.
ftp://asl1.ikp.uni-bonn.de/pub/hadifix/hadidemo.zip
A 1993 version is available with unlimited synthesis from a string
of phonemic symbols and accent markers. 6MB file.
ftp://asl1.ikp.uni-bonn.de/pub/hadifix/hadi25.lzh
* WWW: http://asl1.ikp.uni-bonn.de/~tpo/Hadifix.en.html
* On-line demo: http://asl1.ikp.uni-bonn.de/~tpo/Hadiq.en.html
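The demisyllable idea in the HADIFIX description can be illustrated
with a toy sketch in C that works on spelling rather than on recorded
speech. It splits a single syllable around its first vowel letter, so
that 'Strolch' yields 'Stro' and 'olch'. This is an assumption-laden
illustration and not how HADIFIX itself is implemented: the real
system cuts the audio inside the vowel, and this sketch only knows the
letters a, e, i, o and u (no umlauts or diphthongs). The names
toy_is_vowel and toy_demisyllables are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* True for the plain vowel letters a, e, i, o, u (either case). */
static int toy_is_vowel(char c)
{
    return c != '\0' && strchr("aeiou", tolower((unsigned char)c)) != NULL;
}

/* Split one syllable into an initial demisyllable (leading consonants
   plus the vowel) and a final demisyllable (the vowel plus whatever
   follows). Both output buffers must hold bufsize bytes.
   Returns 0 on success, -1 if no vowel is found or a buffer is too
   small. */
int toy_demisyllables(const char *syl, char *initial, char *final,
                      size_t bufsize)
{
    size_t len, v = 0;

    assert(syl != NULL && initial != NULL && final != NULL);
    len = strlen(syl);

    while (v < len && !toy_is_vowel(syl[v]))
        v++;                              /* skip leading consonants */
    if (v == len)
        return -1;                        /* no vowel in this syllable */
    if (v + 2 > bufsize || len - v + 1 > bufsize)
        return -1;                        /* output would not fit */

    memcpy(initial, syl, v + 1);          /* consonants + vowel */
    initial[v + 1] = '\0';
    memcpy(final, syl + v, len - v);      /* vowel + rest */
    final[len - v] = '\0';
    return 0;
}
```

Note that the vowel appears in both halves. In a real system the join
falls inside the vowel, where the signal is comparatively steady.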
* Platform: Windows
* Description: IPOX is an experimental, all-prosodic speech
synthesizer, developed by Arthur Dirksen and John Coleman. IPOX is
freely available (after registration) for evaluation and
non-profit research purposes.
* Requirements: PC (preferably a fast 486) running Windows 3.1 or
higher. Sound output requires a 16-bit Windows-compatible sound
card
* Availability: By WWW from
http://www.tue.nl/ipo/people/adirksen/ipox/ipox.htm
JSRU
Klatt-style synthesiser
* Platform: Unix
* Cost: Free
* Description: Software posted to comp.speech in late 1992.
* Availability: By ftp from the comp.speech ftp site
+ ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/klatt.3.04.tar.gz
+ ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/klatt.3.04.tar.Z
* See also: KPE80 - A Klatt Synthesiser and Parameter Editor.
* Platform: Unix
* Description: The KPE80 program provides a graphical interface for
the implementation of the Klatt 1980 formant synthesiser written
by Jon Iles and Nick Ing-Simmons. It was inspired by IGE, a piece
of code written by Rob Fletcher (
http://www.york.ac.uk/~rpf1/IGE.html).
* Technical Desc.: It is comprised of an X-Window interface and
version 3.03 of the synthesiser code. The interface allows users
to display and edit Klatt parameters using a graphical display
which includes the time-amplitude waveform of both the original
speech and its synthetic copy, and some signal analysis
facilities. Most of the work in choosing the parameter values to
produce the synthetic copy has to be done by the user. KPE will
estimate the fundamental frequency contour from an original token;
this estimate will need to be amended where errors occur. It is
possible to specify the formant trajectories with some precision
by overlaying the appropriate formant frequency parameter tracks
on the spectrogram of the target waveform. A number of facilities
* Platform: UNIX
* Description: Experimental software which learns text to phoneme
translation from examples using decision-tree-like data
structures. It is based on the assumption that each letter can
correspond to different phoneme strings depending on the context.
* Availability: Examples and source are available on the WWW:
http://www.silab.dsi.unimi.it/~al367212/ttsdoc.html
* Contact: Antonio Lucca: toninlcc@tesi.dsi.unimi.it
Lernout & Hauspie have three TTS products. The functionality of the
products is similar; however, they differ in hardware implementation
and other details, as described below.
sentence.
+ Input formats: orthographic input, phonetic input, phonetic
input with prosodic information.
* tts2000/T
+ Output formats: 8 bit mu-law PCM, 8 bit A-law PCM, 16 bit
linear PCM.
+ Sampling Frequency: 8kHz
+ Single channel platform examples: SHARP SH7000, ARM6/ARM7,
Intel i960, TI TMS320C31, AT&T DSP3210
+ Multi-channel platform examples: TI TMS320C31, AT&T DSP3210
* tts2000/M
+ Output formats: 8/16 bit wave format, 8 bit mu-law PCM, 8 bit
A-law PCM, 16 bit linear PCM.
+ Sampling Frequency: 8/10/11.025 kHz
+ Single processor platform examples: ARM6/ARM7, Intel
386/486/Pentium, Motorola 68040
+ Two processor platform examples: {Intel 386/486/Pentium or
Motorola 68030} and {ADI ADSP21XX or Motorola 5600X or TI
TMS320C25/20C5X}
* tts3000/C
+ Output formats: 8 bit mu-law PCM, 8 bit A-law PCM, 16 bit
linear PCM.
+ Sampling Frequency: 10kHz
+ Single processor platform examples: SHARP SH7000, ARM6/ARM7,
Intel i960, TI TMS320C31, AT&T DSP3210
+ Two processors platform examples: { SHARP SH7000 or ARM6/ARM7
or Intel 386EX or Motorola 683XX} and {ADI ADSP21XX or
Motorola 5600X or TI TMS320C25/C5X or TI TSP50C10}
* See also: L&H Windows TTS SDK
* More Information: on the Lernout & Hauspie WWW pages:
http://www.lhs.com/tts.html
* Price: Unknown
* Contact: Lernout and Hauspie Speech Products
20 Mall Road, 4th Floor
Burlington, MA 01803, USA
Ph: +1-617-238-0960, Fax: +1-617-238-0986
Email: sales@lhs.com
WWW: http://www.lhs.com/
* Platform: Windows
* Description: The L&H Text-to-Speech software developers kit is
able to integrate text-to-speech technology with your own or
existing PC applications under Microsoft Windows 3.1. This
software will allow conversion of written text into clear human
sounding synthetic speech.
* Requirements: IBM-compatible PC 386 DX/33 + 8Mb RAM + MS DOS 5.0 +
MS Windows 3.1 (or higher) + SoundBlaster compatible sound board.
* See also: L&H TTS Products
* More Information: on the Lernout & Hauspie WWW pages:
http://www.lhs.com/tts.html
* Price: Unknown
* Contact: Lernout and Hauspie Speech Products
20 Mall Road, 4th Floor
Burlington, MA 01803, USA
Ph: +1-617-238-0960, Fax: +1-617-238-0986
Email: sales@lhs.com
WWW: http://www.lhs.com/
* Platform: Windows
* Description: Listen2 is a multi-voice, multi-language text reader.
Listen2 comes in two versions, English only that uses high quality
male and female voices, and the International version that can
speak up to 5 different languages: English, German, French,
Spanish or Italian, all in male voices. The basic International
program comes with built-in English and additional language fonts
can be purchased separately. The English version comes complete.
Both programs are dynamically switchable and configurable. This
means that you can press a hot key to speed up the speech, make it
louder or quieter, etc., as it is reading a file. You can also
insert flags in text files to make it switch voices or switch
languages, depending on what version you have.
Listen2 has all the features of the JTS Reader shareware program
plus a few more. It will voice your reminder messages or
appointment list on start-up. It will also speak a reminder
message on shutting down.
* WWW: A more complete description is available on the Listen2 web
page
* Contact: Tom Slemko: e-mail: tslemko@islandnet.com, or,
JTS Micro Consulting Ltd
10931 Lytton Road, RR#4, Ladysmith, B.C., Canada, V0R 2E0
WWW: http://www.islandnet.com/jts/
* Platform: Unknown
* Description: Lucent Technologies provides a web site with demos and
samples of their latest speech synthesis technology. The site has
interactive demos in American English, German, and Mandarin
Chinese, and the capability to adjust voice parameters on the fly.
Pre-synthesized demos for French, Italian, Russian, and Romanian
are also provided.
The site includes downloadable papers with detailed system
descriptions.
* WWW: http://www.bell-labs.com/project/tts/
* Platform: Macintosh
* Description: A comprehensive list of Macintosh Speech Applications
is provided by Kevin Lenzo at CMU:
http://www.cs.cmu.edu/~lenzo/mac_speech_apps.html
The Apple Speech WWW Site also has some useful information:
http://www.speech.apple.com/
* Platform: Macintosh
* Description: Apple's text-to-speech system extensions that enable
applications to perform text-to-speech conversion. The Speech
Manager runs on most Macs, but PlainTalk (and the high quality
voices) requires a 68020 Mac or better.
* Availability: By anonymous ftp from:
ftp://ftp.support.apple.com/pub/apple_sw_updates/US/Macintosh/System/PlainTalk 1.4.1/
This directory contains subdirectories for recent versions of
PlainTalk. The current release (PlainTalk 1.4.1) contains the
English Text-To-Speech with about a dozen voices
(English_Text-to-Speech.hqx: 5.3 MByte), Mexican Spanish
(Mexican_Spanish_TTS.hqx: 2.8 MByte), and the English Speech
MacYack Pro
* Platform: Macintosh
* Description: MacYack Pro is a commercial speech package for
Macintosh that uses the PlainTalk Text-to-Speech synthesis
software. Features include:
+ Add speech to any word processor.
+ Hear notification dialogs and other dialog boxes.
+ See and hear a customized message at startup or shutdown.
+ Hear calculations instantly.
+ Correct pronunciation errors.
+ Create custom double-clickable "speech files."
+ Have speaking alert sounds.
+ Add speech to HyperCard stacks.
+ Use AppleScript to add speech to other programs.
* Price: $29.95 for a limited time, reduced from $49.95 regular
price. 30 days money back guarantee.
* Contact: Scantron Quality Computers
20200 Nine Mile Rd. St. Clair Shores, MI 48080
Ph: 1-800-777-3642, Fax: 810-774-2698
E-mail: sales@sqc.com
WWW: http://www.sqc.com/
Product Info: http://www.lowtek.com/macyack/
http://tcts.fpms.ac.be/synthesis/modelcmp.html.
* Contact: Dr Thierry Dutoit
Faculte Polytechnique de Mons, TCTS Lab,
31, bvd Dolez, B-7000 Mons, Belgium.
Ph: +32-65-374133, Fax: +32-65-374129
e-mail: mbrola@tcts.fpms.ac.be
WWW: http://tcts.fpms.ac.be/synthesis/mbrola.html
* Platform: Windows
* Description: Monologue is a software program that reads text from
the clipboard in Windows 16 or 32 bit applications. It can be
found as a bundled product with many sound cards and multimedia
general purpose computer systems. Monologue can add the element of
speech to virtually any text oriented application. Any
pronounceable combination of letters and numbers will be spoken
clearly. It can be applied to tasks such as eyes-free
proofreading, data verification (e.g. spreadsheets), reading
E-mail and more. User-changeable parameters provide control over
the sound quality by allowing for changes in pitch, and the speed
of speech. An exception dictionary saves preferred pronunciation
of words and abbreviations.
Monologue Win32 now includes support for the Microsoft SAPI.
Monologue male "SpeechFonts" are available for US English, British
English, German, French, Latin American Spanish, Italian. A US
English Female SpeechFont is also available.
For more detailed information and examples go to the First Byte
WWW pages.
* Availability: Currently bundled with many sound cards and
multimedia general purpose computer systems. For pricing,
licensing details, and release information see the First Byte WWW
pages or email info@firstbyte.davd.com.
* See also: ProVoice Developer's Speech Toolkit from First Byte
* Contact: First Byte
19840 Pioneer Ave., Torrance, CA 90503
Ph: 310-793-0610 Fax: 310-793-0611
Email: info@firstbyte.davd.com
WWW: http://www.firstbyte.davd.com/
* Platform: Amiga
* Description: A US English text to phoneme translator, implemented
as a resident software library, for use with the Amiga Narrator
Device. This software was supplied as a standard part of the Amiga
operating system software up to O.S version 2.04. (Translator
version 37.1, 1991) Approximately 700 translation rules are used
to create the 'ARPAbet' phonemes. This software is functional on
all current Amiga systems (O.S. 3.1).
* Availability: limited to pre-owned system software disks and
unsold O.S upgrade kits (Pre-O.S. 2.1).
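The entry above describes a translator that applies roughly 700
context-sensitive rules to turn English text into ARPAbet phonemes. The
sketch below shows the general shape of such a rule table. It is not taken
from the Amiga library: the seven rules and the names RULES and to_phonemes
are invented for illustration, and a real rule set is far larger.

```python
# Toy letter-to-sound translator. Each rule is
# (left context, letters, right context, phonemes).
# An empty context matches anything. Rules are tried in order and the
# first match wins. These rules are illustrative only; they are NOT the
# Amiga translator's rule set.
RULES = [
    ("", "ch", "", ["CH"]),
    ("", "c", "e", ["S"]),   # "c" before "e", as in "cent"
    ("", "c", "", ["K"]),
    ("", "a", "", ["AE"]),
    ("", "e", "", ["EH"]),
    ("", "n", "", ["N"]),
    ("", "t", "", ["T"]),
]


def to_phonemes(word):
    """Translate one word into a list of ARPAbet symbols."""
    text = "#" + word.lower() + "#"   # "#" marks the word boundaries
    pos = 1
    out = []
    while pos < len(text) - 1:
        for left, letters, right, phones in RULES:
            end = pos + len(letters)
            if (text.startswith(letters, pos)
                    and text[:pos].endswith(left)
                    and text.startswith(right, end)):
                out.extend(phones)
                pos = end
                break
        else:
            pos += 1  # no rule covers this letter: skip it
    return out


print(to_phonemes("cent"))
print(to_phonemes("chat"))
```

Ordering matters in this scheme: the more specific rule ("c" before "e")
has to come before the general "c" rule, otherwise it would never fire.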
* Platform: Amiga
* Description: an independent replacement for the Commodore-supplied
"translator.library" which is a part of the Narrator speech
synthesis package. It implements multi-lingual text-to-speech for
an Amiga. The translation rules for each language are defined in a
plain text 'Accent' file.
There is a provision for the selection of unique languages for
Narrator
* Platform: Amiga
* Description: Formant based speech synthesis. Includes an
English-to-phoneme translation library, and a SPEAK: pseudo-device
for speech output.
* Hardware: Standard Amiga hardware
* Availability: Part of AmigaOS
* See Also: The Narrator Translation library
TextToSpeech Kit
1500, 112 - 4th Ave. S.W., Calgary, Alberta, Canada, T2P 0H3
Tel: (403) 284-9278 Fax: (403) 282-6778
Order Desk: 1-800-L-ORATOR (US and Canada only)
Email: TTSInfo@trillium.ab.ca
* Platform: Windows
* Description: PAM is a talking personal assistant and text reader
application. It uses the ProVoice TTS package. PAM will verbally
advise about appointments and reminder messages at specified times
during the day. It can read text files, clipboard text, and text
sent in DDE messages. Using the full verbal interface, PAM can be
used by visually challenged individuals. Shareware - thirty day
free trial.
* Requirements: Any Windows sound card, speakers or headphones. Min.
memory - 4 megs, 8 megs recommended.
* WWW: A more complete description is available on the JTS homepage:
http://www.islandnet.com/~tslemko/
* Availability: The shareware can be downloaded by ftp from
ftp://ftp.islandnet.com/jts/pam_en3c.zip. The file size is approx.
1 MByte.
* Price: $US40 for the registered version.
* Contact: Tom Slemko: e-mail: tslemko@islandnet.com, or,
JTS Micro Consulting Ltd
10931 Lytton Road, RR#4, Ladysmith, B.C., Canada, V0R 2E0
* Platform: Windows 3.x, NT, 95, OS/2, Unix Solaris, Unix SCO and
hardware
* Description: The ProVerbe Speech Engine from ELAN Informatique
produces natural sounding speech from written text. Naturalness is
achieved by using the TD-PSOLA process from the CNET (France
telecom's research lab.) which is based on the concatenation of
elementary speech units (including diphones). Supported languages
rsynth
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/rsynth-2.0.tar.Z
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/rsynth-2.0.tar.gz
* Platform: SGI
* Description: The SGI Developer Toolbox 4.0 CDROM contains a
basic public domain text-to-speech program in the publics/speak
directory. The directory includes man pages and source.
* Availability: on the SGI Developer Toolbox 4.0 CDROM
SIMTEL
ftp://ftp.coast.net/SimTel/msdos/voice/
http://www.acs.oakland.edu/oak/SimTel/win3/sound.html
Voicemaker
ftp://ftp.coast.net/SimTel/msdos/voice/vm110.zip
spchsyn.exe
* Platform: DOS
* Availability: By anonymous ftp as a self extracting DOS archive.
ftp://evans.ee.adfa.oz.au/mirrors/tibbs/applications/spchsyn.exe
* Requirements: May require special TI product(s), but all source is
there.
* Platform: Macintosh
* Description: Apple's text-to-speech system extensions that enable
applications to perform text-to-speech conversion. The Speech
Manager runs on most Macs, but PlainTalk (and the high quality
voices) requires a 68020 Mac or better.
* Availability: By anonymous ftp from:
ftp://ftp.support.apple.com/pub/apple_sw_updates/US/Macintosh/System/PlainTalk 1.4.1/
This directory contains subdirectories for recent versions of
PlainTalk. The current release (PlainTalk 1.4.1) contains the
English Text-To-Speech with about a dozen voices
(English_Text-to-Speech.hqx: 5.3 MByte), Mexican Spanish
(Mexican_Spanish_TTS.hqx: 2.8 MByte), and the English Speech
Recognition software (English_Speech_Recognition.hqx: 2.3MByte).
* Cost: Free
* WWW: The latest information is available from Apple's WWW page for
speech recognition and synthesis:
http://www.speech.apple.com/
* Note 1: Check out Kevin Lenzo's list of Macintosh Speech
Applications.
* Note 2: Joshua Baer (josh@skyweyr.com) runs a mailing list for
Plaintalk. For subscription and other information visit the
Plaintalk Discussion List Home page
* Contact: Apple Computer, Inc.
1 Infinite Loop, Cupertino, CA 95014, USA
WWW: http://www.speech.apple.com/
Email: PlainTalk@atg.apple.com
* Platform: unknown
* Description: Text to phoneme program. Based on Naval Research
Lab's set of text to phoneme rules.
* Availability: by anonymous ftp
ftp://shark.cse.fau.edu/pub/src/phon.tar.Z
* Platform: unknown
* Description: Text to phoneme program.
* Availability: by anonymous ftp
ftp://ftp.doc.ic.ac.uk/packages/unix-c/utils/phoneme.c.gz
Tinytalk
OMS Development
610-B Forest Ave., Wilmette, IL 60091
Ph: (800)831-0272 Fax: 708-251-5793
Outside North America: (708)-251-5787
Email: ebohlman@netcom.com
TrueTalk
ftp://ftp.entropic.com/pub/truetalk/README.ptt
Washington, D.C.
Voice: 1-800-ENTROPIC (North America), (202) 547 1420
Fax: (202) 547-6648
Email: truetalk@entropic.com
WWW: http://www.entropic.com/
WinSpeech
* Platform: Windows
* Description: WinSpeech is a text-to-speech application that reads
text and produces speech to the audio output. Features basic text
editing tools, talk from editing window, DDE server allows other
Windows applications to send text for talking, coach mode for
providing audio instructions throughout the program, dictionary
editing tools for customizing pronunciation.
WSPLIB text-to-speech DLL is a speech functions library for
developers. More information available by email.
* Requirements: System requirements: IBM PC or compatible computer
with Windows 3.1 or higher. Sound card is recommended but not
required.
* Availability: Freeware available through the PC WholeWare WWW
page.
* Contact: PC WholeWare
33 Justin Street, Lexington, MA 02173, U.S.A.
Email: info@pcww.com
WWW: http://www.pcww.com/index.html
___________________________________________________________________________
Speech Recognition
___________________________________________________________________________
___________________________________________________________________________
Some systems try to "understand" speech. That is, they try to convert
the words into a representation of what the speaker intended to mean
or achieve by what they said.
___________________________________________________________________________
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/DIY_SpeechRecognition
Overview:
Many variations upon the theme can be made to improve the performance.
Try different filtering of the raw signal and different processing
methods.
___________________________________________________________________________
* "Talk Show", Wayne Rash Jr., PC Magazine (USA), Dec 20, 1994.
* "Seybold Report on Desktop Publishing" published a nine-page,
head-to-head comparison of Dragon's DOS software with IBM's OS/2
software. March 7, 1994; Volume 8, Number 7; Pages 3-11;
ISSN:0889-9762; Seybold Publications, P.O. Box 644, Media, PA
19063 USA, phone (610) 565-2480.
* McGraw-Hill Inc.'s "BYTE, the Magazine of Technology Integration,"
published a two-page review of IBM's Personal Dictation System
software. May 1994; Volume ?, Number ?; Pages 145-146;
ISSN:0360-5280; Editorial, Executive, and Circulation address: One
Phoenix Mill Lane, Peterborough, NH 03458 USA, phone ?
* The National Center for Voice and Speech provides some basic
information on preserving "Vocal Health" on their WWW site:
http://www.shc.uiowa.edu/hygiene/home.html
On the WWW
Technical
Course Notes
___________________________________________________________________________
In the FAQ:
_Apple Macintosh_
* Digital Dreams Speech Recognition Plug-Ins
* Dragon Dictation Products
* Macintosh Speech Recognition Manager
* PowerSecretary
_DOS_
* DATAVOX - French
* Dragon Developer Tools
* Ficomp Interpreter 6000
* Jialong He's Speech Recognition Research Tool
* smARTspeak from Advanced Recognition Technologies, Inc.
* Votan VPC2100 Voice Card and VSP 1010 Speech Processor
_OS/2_
* IBM VoiceType Dictation and Control
_Unix_
* AbbotDemo
* BBN Hark Telephony Recognizer
* EARS: Single Word Recognition Package
* Ficomp Interpreter 6000
* Hidden Markov Model Toolkit (HTK) from Entropic
* IN CUBE
* Jialong He's Speech Recognition Research Tool
* Lotec Speech Recognition Package
* Myers' Hidden Markov Model software
* NICO Artificial Neural Network Toolkit
* Nuance Speech Recognition System
* PureSpeech
* recnet
_Other Platforms_
* Simon Says (NeXT)
* Voice Command Line Interface (Amiga)
* Visus SpeechKit
_Unknown_
* Berkeley Restaurant Project (BeRP)
* Lernout & Hauspie ASR (3 products)
* Voice-Trek 2.0
* Voicetek Corp.
* Voice Processing Corporation Speech Recognition Product Line
http://www.tiac.net/users/rwilcox/speech.html
1stVoice
2470 El Camino Real, Suite 110, Palo Alto CA 94306-1701
Ph: 415-857-1320, Fax: 415-856-6996
WWW: http://www.1stvoice.com/
Email: mail@1stvoice.com
Dragon Dictation Products
Auscript (Australia)
Suite 2, Level 3, 60-70 Elizabeth St, Sydney, NSW 2000,
Australia
Ph: +61-2-238 6565, Fax: +61-2-238 6566
WWW: http://www.auscript.com.au/
Dragon Systems
BRITE
WWW: http://www.brite.com/
Computer Telephony Integration & Interactive Voice Response
HealthCare Resources
1444 Aviation Blvd, #103, Redondo Beach, CA 90278, USA
Ph: +1-310-937-5156, Fax: +1-310-937-5159
EMail: Scalif@AOL.COM
Power Secretary & Dragon Dictate. Specializing in:
Medical/Dental, Motion Picture Industry, Carpal Tunnel related
and Disabled Persons.
O'Brien Resources
Ph: (540) 347-4988 (Address unknown)
Email: obrien@crosslink.net
WWW: http://www.crosslink.net/~obrien/
Kurzweil Voice Recognition Products
SCI VoiceAutomated
215 1/2 Main Street, Huntington Beach, CA 92648, USA
Ph: 800-597-6600, Ph: +1-714-969-7632, Fax: +1-714-969-0122
http://www.voiceautomated.com/
IBM VoiceType, Kurzweil Voice, DragonDictate and Philips
speech.
Synapse
3095 Kerner Blvd., Suite S, San Rafael, CA 94901, USA
Ph: (415) 455-9700, Fax: (415) 455-9801
Email: SYNAPSE_ADAPTIVE@msn.com
WWW: http://www.synapseadaptive.com/
Dragon Systems, Kurzweil and IBM products.
Talk Technology
Ph: 1-800-270-1672, Fax: 1-516-360-1213
Email: info@talktechnology.com
http://www.talktechnology.com/
ToppCopy Telecom
Email: ffalzett@toppcopy.com
WWW: http://www.toppcopy.com/
Philips Digital Dictation
VoiceWare Systems
230 California Street, Suite 410, San Francisco, CA 94111
Ph: (415) 433-2001, Fax: (415) 433-6909
Email: info@talk2type.com
WWW: http://www.talk2type.com/home.htm
IBM, Dragon Systems, Kurzweil Applied Intelligence, WildCard
Technologies
WorkLink
A.D.A. Solutions by WorkLink
2566-A Telegraph Avenue, Berkeley, California 94704 USA
Ph: 510-848-8363, Fax: 510-848-7322
WWW: http://www.worklink.net/
Email: wayne@worklink.net
Dragon Dictation Products
AbbotDemo
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/AbbotDemo
* Platform: Windows
* Description: Speaker-independent recognition of continuous speech
in real time. Vocabularies can range from small to very large
(more than 60,000 word forms). Support is planned for languages
including English, Danish, Dutch, French, German, Italian,
Norwegian, Spanish, Swedish, and Japanese. The engine complies
with the Microsoft Speech API.
* Contact: Cambridge Group Research, Ltd.
Box 7290, Buffalo Grove, IL 60089
Ph: (708) 821-1040, Fax: (708) 821-1041
E-mail: 76061.3350@compuserve.com
* Platform: Windows
* CustomVoice: Speech recognition custom control for Visual Basic,
DATAVOX - French
* Platform: PC / DOS
* Description: Continuous speech - speaker independent or dependent.
* Requirements: 2 PC format boards (RdF1000 and TdS 96/25) and an
A/D - D/A module (ASA116)
* Misc: Application software may dialog with DATAVOX through 2 types
of interfaces :
+ Keyboard overlay: The application software may be used with
any PC compatible package. No specific adaptation is
necessary, you only need to define your configuration with
the application software.
+ C library: Allows a user-written program to drive the
recognition system.
DATAVOX is based on the AMADEUS speech recognition software
developed at LIMSI. It provides
+ Continuous speech recognition with 500 words speaker
dependent, 50 words speaker independent (custom-made
vocabulary).
+ Grammar of the application language (syntax acquisition,
verification and simplification software).
+ Large vocabulary : DATAVOX can recognize vocabularies of
several thousand words as long as there are no more than 500
words in the active vocabulary at any given node. It takes
less than 1 second to change syntax and vocabulary.
+ Training controlled by the system (use of co-articulation
models).
+ Response time less than 500 ms for any phrase length.
+ Synthesis (ADPCM) can be heard simultaneously while
recognition is being carried out.
* Contact: VECSYS
Le Chene rond, 91570 Bievres, France
Voice: 33 1 69 41 15 04, Fax: 33 1 69 41 24 30
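The large-vocabulary point in the DATAVOX entry says the total vocabulary
can run to several thousand words as long as no grammar node has more than
500 words active. A minimal sketch of that idea follows, using a toy
finite-state grammar. Only the 500-word limit comes from the entry; the
node names, words, and function names are hypothetical and have nothing to
do with the DATAVOX C library.

```python
# Toy finite-state grammar: each node maps an allowed word to the next node.
# Only the 500-word limit comes from the DATAVOX description; everything
# else here is a made-up illustration.
MAX_ACTIVE_WORDS = 500

GRAMMAR = {
    "start":  {"call": "person", "dial": "digit"},
    "person": {"alice": "end", "bob": "end"},
    "digit":  {"zero": "digit", "one": "digit", "stop": "end"},
    "end":    {},
}


def active_vocabulary(node):
    """Words the recognizer has to consider at this grammar node."""
    return sorted(GRAMMAR[node])


def total_vocabulary(grammar):
    """All distinct words used anywhere in the grammar."""
    return sorted({word for arcs in grammar.values() for word in arcs})


def nodes_over_limit(grammar, limit=MAX_ACTIVE_WORDS):
    """Nodes whose active vocabulary is larger than the limit."""
    return [node for node, arcs in grammar.items() if len(arcs) > limit]


def accepts(words, node="start"):
    """Follow the grammar word by word; True if the phrase ends at 'end'."""
    for word in words:
        if word not in GRAMMAR[node]:
            return False
        node = GRAMMAR[node][word]
    return node == "end"


print(active_vocabulary("start"))
print(len(total_vocabulary(GRAMMAR)), "words in total")
print(accepts(["dial", "one", "zero", "stop"]))
```

The recognizer only ever compares the incoming speech against the words
active at the current node, which is why the total vocabulary can be much
larger than the per-node limit.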
* Platform: Windows
* Description: Information moved to the page on Dragon Dictation
products including DragonDictate for Windows
* Dragon NaturallySpeaking
* DragonDictate for Windows
* Dragon PowerSecretary
* General Information
Dragon NaturallySpeaking
* Platform: Windows
* Description: General purpose, continuous speech dictation system.
Personal Edition has a 30,000 word active vocabulary and comes
with a 200,000+ word pronunciation dictionary; users can also add
their own words or phrases.
More information on Dragon's NaturallySpeaking web site.
* Requirements: 133 MHz Pentium, 32 MB RAM (Windows 95) or 48 MB RAM
(Windows NT 4.0), supported sound card.
* Price: see Dragon's NaturallySpeaking web site.
* Related products: see general information below
* Platform: Windows
* Description: Speech-to-text dictation system. Discrete dictation;
continuous command/control; speaker-adaptive. Also provides mouse
movement for hands-free operation of Windows. Comes with a 120,000
word pronunciation dictionary; users can also add their own words
or phrases. Dictate directly into any application. Available in US
and UK English, French, Italian, German, Spanish, and Swedish.
Add-on vocabularies for medicine, law, business and finance,
computers and technology, journalism.
Available as DragonDictate Singles Editions (10,000 words active),
DragonDictate Personal Edition (10,000 words active),
DragonDictate Classic Edition (30,000 words active), DragonDictate
Power Edition (60,000 words active).
Includes Office97 support.
More information on the Dragon Systems web site.
* Requirements: 486/66, 7-10 MB dedicated RAM (depending on
edition), Windows 3.1x, NT 3.51, or 95.
Supported sound boards: Creative Labs Sound Blaster 16, Microsoft
Windows Sound System, IBM M-Audio Capture/Playback Adapter, many
notebooks with built-in audio.
See Dragon Systems Compatibility list for details.
* Price: Check at the Dragon Systems web site.
* Related products: see general information below
* Contact: see general information below
Dragon PowerSecretary
General Information
* Dragon NaturallySpeaking
* DragonDictate for Windows
* Dragon PowerSecretary
* General Information
* Dragon PhoneQuery
* DragonXTools
* Dragon SpeechTool
* Dragon VoiceTools
Contact:
* Dragon PhoneQuery
* DragonXTools
* Dragon SpeechTool
* Dragon VoiceTools
Dragon PhoneQuery
* Platform: Windows NT
* Description: Software for building voice response systems. Callers
are able to do the following: Ask for information using completely
natural and continuous language. Have a spoken dialog to fine tune
a request. Request information to be faxed, sent by electronic
mail, or read over the phone, using text-to-speech.
More information on the Dragon Systems telephony pages.
* Requirements: Pentium or Pentium Pro PC running Windows NT 4.0.
Telephone interconnect requirements vary by application.
* Related products: see general information below
* Contact: see general information below
DragonXTools
* Platform: Windows
* Description: VBX and OCX controls that allow an application to
control DragonDictate's capabilities, ranging from small
vocabulary command and control to customized large vocabulary
dictation. More information is available on the Dragon Developer
pages
* Related products: see general information below
* Contact: see general information below
Dragon SpeechTool
* Platform: Windows
* Description: Create small, optimized vocabularies for your
speech-enabled applications, or supplement DragonDictate's
extensive built-in vocabularies with specialized terms and names.
More information is available on the Dragon Developer pages
* Related products: see general information below
* Contact: see general information below
Dragon VoiceTools
IN CUBE
MSDOS Version
UK:
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spchtool.zip
Germany:
ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/spchtool.zip
Lernout & Hauspie ASR 200/A for the Automotive and Industrial Market
* Platform: Windows
* Description: Windows based Software Development Kits are available
for integrating automatic speech recognition technology with
* Platform: Windows
* Description: Listen for Windows Version 2.0 is a Speaker
Independent software product that provides continuous speech
recognition for Windows applications. The product works with most
industry standard sound cards and PCs with embedded audio chips.
Listen for Windows comes with over 16,000 commands in speech
interfaces for over 40 software applications, such as MS Office,
Lotus SmartSuite, Quicken, etc. The Listen Command Editor allows a
user to change or add commands to existing speech interfaces or
create new speech interfaces for most Windows applications.
More detailed information is available on the Verbex Listen for
Windows page.
Verbex also sells Verbal Advantage Voice Browser for controlling a
web browser, Verbal Advantage DeskTop for controlling desktop
applications.
* Requirements: 486/25SX PC or higher
* Pricing and Availability: See the Verbex ordering page for pricing.
Verbex products are available over the web or can be shipped.
Microphones available from Verbex.
* Demo: A "Freeware" demo is available from the Verbex WWW site demo
page.
* Contact: Verbex Voice Systems
1090 King Georges Post Rd., Bldg 107, Edison NJ 08837, USA
Ph: 1-800-ASK-VRBX, (908) 225-5225, Fax: (908) 225-7764
WWW: http://www.verbex.com/
* Platform: Sun
* Description: Public domain speech recognition software. Operates
from input in Sun audio format (.au files) and outputs word
hypotheses and time labelling data. The software includes programs
to collect speech samples, a labeller, a "featurizer" which
parameterises speech files, a word spotter and the recogniser. The
software can perform real-time recognition on a Sparc 10 for small
vocabularies.
* Requirements: Sun SPARC audio input and a "decent" microphone, Sun
multimedia demo software (in /usr/demo/SOUND) and X.
* Availability: By anonymous ftp
ftp://ftp.sanpo.t.u-tokyo.ac.jp/pub/nigel/lotec/lotec.tar.Z
* Contact: Nigel Ward: _nigel@sanpo.t.u-tokyo.ac.jp_
* Platform: Macintosh
* Description: supports developers who wish to add speech
* Platform: Windows 95
* Description: Provides command and control speech recognition using
SAPI (the Microsoft Speech API) and "Whisper", Microsoft's speech
recognition technology. Features include:
+ Speaker independent, continuous, sub-word modeling, context
free grammars
+ Has its own letter-to-sound rules, so it can recognize any
word in a grammar.
+ North American English
+ PC microphone and telephone speech recognition with high
performance
+ Word spotting option
+ Results objects containing top-N choices, segmentation, and
confidence
+ Written to SAPI, the Microsoft Speech API.
* Requirements: Windows 95 or Windows NT 4.0, Pentium 60 or better.
(RISC builds are available), 1.5 megabyte working set, 16 kHz or 8
kHz input signals, 6 megabytes on disk, Requires Microsoft Speech
SDK to use.
* Availability: Free demo software is available at:
http://www.research.microsoft.com/research/srg/install.htm
* More information: http://www.research.microsoft.com/research/srg/
* Platform: Unix
* Description: Hidden Markov model software for automatic speech
recognition. C++ code that implements a basic left-right hidden
Markov model and corresponding Baum-Welch (ML) training algorithm.
It is meant as an example of the HMM algorithms described by
L.Rabiner and others. The code was built in order to learn how HMM
systems work and we are now offering it to the net so that others
can learn how to use HMMs for speech recognition. Keep in mind
that ease of understanding was our primary concern, not
efficiency. The code can be used to build experimental speech
recognition systems using "train_hmm" and "test_hmm", and can be
used in conjunction with written tutorials on HMMs to understand
how they work.
* Availability: By anonymous ftp from the comp.speech archive site.
There are two files in the directory
+ ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/
The files are
+ hmm.README
+ hmm-1.03.tar.gz
* Contact: Richard Myers: rmyers@isx.edu
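The entry above describes a basic left-right hidden Markov model trained
with Baum-Welch. As a companion illustration, here is a sketch of the
forward algorithm, the likelihood computation that Baum-Welch training is
built on. It is written independently of the C++ package: the
three-state model, all probabilities, and the names A, B, PI and forward
are made up for this example.

```python
# Forward algorithm for a small discrete left-right HMM.
# All probabilities are made-up illustration values, not taken from the
# hmm-1.03 package.
A = [  # A[i][j]: probability of moving from state i to state j
    [0.6, 0.4, 0.0],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]
B = [  # B[i][k]: probability that state i emits symbol k
    [0.8, 0.2],
    [0.3, 0.7],
    [0.5, 0.5],
]
PI = [1.0, 0.0, 0.0]  # a left-right model always starts in state 0


def forward(obs):
    """Return P(obs | model), summed over every possible state path."""
    n = len(A)
    # alpha[i]: probability of the observations so far, ending in state i
    alpha = [PI[i] * B[i][obs[0]] for i in range(n)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][symbol]
            for j in range(n)
        ]
    return sum(alpha)


print(forward([0, 1]))
```

The zeros below the diagonal of A are what make the model "left-right":
once a state has been left, it can never be entered again. For long
observation sequences a real implementation would scale alpha or work in
the log domain to avoid underflow.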
NCC Dictate
* Platform: Windows
* Description: NCC Digital Dictate(TM) is an add-on, enhanced
interface for use with IBM's VoiceType(TM) Dictation for Windows
and various Windows 3.1 applications (e.g. MS Word, WordPerfect).
Digital Dictate(TM) provides faster corrections and dictation rates
and various other features. This version is not a stand-alone
product; it requires VoiceType(TM) Dictation to provide the speech
recognition engine and the Windows application. Features include:
+ Direct dictation into Windows applications with access to all
functions while dictating.
+ Versions for MS Word, WordPerfect, Ami Pro, and other Windows
applications.
+ Speech enabled editing.
+ Capability to save speaker models and defer corrections.
+ Microphone "pause and restore" functions controlled with
speech commands.
+ Add-on vocabularies for legal, medical, science and business.
+ SWITCH-IT(TM) foot pedal control or CardSwitch(TM) infrared
wireless control available which switch between dictation and
proofing/correction modes.
* Requirements: IBM's VoiceType(TM) Dictation for Windows; a computer
system meeting VoiceType(TM) Dictation for Windows requirements;
VoiceType(TM) Dictation Adapter.
* Availability: Through computer dealerships.
* Price: $US295
* Contact: NCC Incorporated
5808 E. Turquoise, Scottsdale, AZ 85253
Ph: (602) 922-6236 Fax: (602) 596-9050
* Platform: Windows
* Description: Speaker independent, 40,000 word vocabulary,
continuous speech recognition for MS Windows. Grammars with high
perplexity possible. Includes noise rejection. Uses proprietary
DSP board.
* Cost: Prices in US$ - quantity one. The PE500 SDK is $995.00
including board, microphone, and runtime software. Runtime only is
$595.00. SpeechWizard(r) adds speech input to existing Windows
applications, $295.00. Two-day training: $295.00 with purchase,
$595.00 without.
* Misc: The user defines the grammar of allowed utterances and must
write software to invoke the board driver functions that control
recognition. The user must also write software to
SpeechMagic: Dictation
Dragon PowerSecretary
* Platform: Apple
* Description: Information moved to the page on Dragon Dictation
products including Dragon PowerSecretary
(Previously Articulate PowerSecretary.)
* Platform: Windows
* Description: ProNotes Voice Tools are designed to bring the speech
recognition capabilities of the IBM VoiceType(TM) Dictation System
for Windows into any program without the need for the programmer
to directly interface with the speech engine at the API level.
There are five tools, as described below, which are all available
in three forms: Visual Basic(TM) Custom Controls (known as VBXs),
16-bit OLE Custom Controls, and 32-bit OLE Custom Controls. The
tools are intended for use by Windows(TM) developers working with
Windows 3.1(TM), Windows for Workgroups 3.11(TM), Windows NT 3.51
Workstation(TM), and Windows 95(TM). The custom controls can be
utilized with any application development environment which
supports the use of such controls (e.g. Visual Basic and Visual
C++).
Voice Button
An object having standard button properties and behavior,
which can additionally be controlled by voice. The button
can also be used as a label or a 3D panel.
Dictation Window
A text box that allows free dictation, voice macro
utilization, and correction by voice. Each Dictation
Window has access to global and context sensitive
vocabularies for both command and dictation. There are
three correction modes.
Voice Navigator
Provides navigation by voice within an application
developed with the Voice Tools, between voice-enabled
objects described above, as well as some standard objects
found within the application.
recnet
* Platform: UNIX
* Description: Speech recognition for the speaker independent TIMIT
and Resource Management tasks. It uses recurrent networks to
estimate phone probabilities and Markov models to find the most
probable sequence of phones or words. The system is a snapshot of
evolving research code. There is no documentation other than
published research papers. The components are:
+ A preprocessor which implements many standard and many non-
standard front end processing techniques.
+ A recurrent net recogniser and parameter files
+ Two Markov model based recognisers, one for phone recognition
and one for word recognition
+ A dynamic programming scoring package. The complete system
performs competitively.
* Cost: Free
* Requirements: TIMIT and Resource Management databases
* Contact: Tony Robinson: _ajr@eng.cam.ac.uk_
* Availability: by anonymous ftp
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/recnet-1.3.tar.Z
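The recnet description above pairs frame-by-frame phone probabilities from
a recurrent network with Markov models that pick the most probable phone
sequence. That search step is commonly done with Viterbi decoding, sketched
below. This is not recnet's code: the phone list, the per-frame
probabilities (standing in for network outputs), and the transition values
STAY and MOVE are all invented for the example.

```python
import math

PHONES = ["sil", "k", "ae", "t"]

# Hypothetical per-frame phone probabilities, one row per frame.
# In a real system these would come from the network.
FRAMES = [
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.6, 0.1, 0.1],
    [0.1, 0.5, 0.3, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.2, 0.6],
]

STAY, MOVE = 0.6, 0.4  # made-up transition probabilities


def log_trans(i, j):
    """A phone may repeat or hand over to the next phone in the list."""
    if j == i:
        return math.log(STAY)
    if j == i + 1:
        return math.log(MOVE)
    return -math.inf


def viterbi(frames):
    """Return the most probable phone label for every frame."""
    n = len(PHONES)
    # The path is forced to start in the first phone.
    score = [math.log(frames[0][0])] + [-math.inf] * (n - 1)
    back = []
    for frame in frames[1:]:
        new_score, pointers = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: score[i] + log_trans(i, j))
            new_score.append(
                score[best] + log_trans(best, j) + math.log(frame[j]))
            pointers.append(best)
        score = new_score
        back.append(pointers)
    # Trace the best final state back to the first frame.
    state = max(range(n), key=lambda j: score[j])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return [PHONES[s] for s in reversed(path)]


print(viterbi(FRAMES))
```

The result has one label per frame; merging consecutive repeats turns it
into a phone sequence. Working in log probabilities keeps the scores from
underflowing on long utterances.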
* Platform: NeXT
* Description: Provides the ability to link commands to spoken
phrases.
* Availability: By anonymous ftp.
Simon Says demo
ftp://ftp.informatik.uni-muenchen.de/pub/comp/platforms/next/Audio/audio-apps/SimonSaysDemo.1.5.1.N.b.tar.gz
Readme file
ftp://ftp.informatik.uni-muenchen.de/pub/comp/platforms/next/Audio/audio-apps/SimonSaysDemo.1.5.1.README
* Contact: Metrosoft
710 13th Street, Suite 310 X, San Diego, California 92101
Ph: 619.488.9411 Fax: 619.488.3045
Email: info@metrosoft.com [NeXTmail welcome]
Software-Only Solution
The software only solution uses Telaccount's SpeechEasy
technology for discrete recognition using your PC's CPU.
A vocabulary is included with digits, basic command words
and more.
* Pricing: Unknown
* Availability: From Stylus Innovations Inc. or from the
distributors listed on the Stylus WWW pages.
* Misc: More detailed technical information, slide show
demonstration software is available on the Stylus home page.
* Platform: Amiga
* Description: VCLI will execute CLI commands, ARexx commands, or
ARexx scripts by voice command through your audio digitizer. VCLI
allows you to launch multiple applications or control any program
with an ARexx capability entirely by spoken voice command. VCLI is
fully multitasking and will run in the background, continuously
listening for your voice commands even while other programs are
running. Documentation is provided in AmigaGuide format. VCLI 6.0
runs under either Amiga DOS 2.0 or 3.0.
* Requirements: Supports the DSS8, PerfectSound 3, Sound Master,
Sound Magic, and Generic audio digitizers.
* Availability: by ftp from wuarchive.wustl.edu in the file
systems/amiga/incoming/audio/VCLI60.lha and from
amiga.physik.unizh.ch as the file pub/aminet/util/misc/VCLI60.lha
* Contact: Author's email is RHorne@cup.portal.com
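Tools such as VCLI link recognized spoken phrases to commands. Once the
recognizer has produced a phrase, the remaining step is essentially a table
lookup, sketched below. The sketch assumes the recognizer has already
returned text; the phrases, the command strings, and the names COMMANDS,
normalize and command_for are hypothetical and do not reflect VCLI's actual
configuration format.

```python
# Hypothetical phrase-to-command table; NOT VCLI's real configuration.
COMMANDS = {
    "open editor": "run ed",
    "list files": "dir",
    "show clock": "run clock",
}


def normalize(phrase):
    """Lower-case the phrase and collapse repeated whitespace."""
    return " ".join(phrase.lower().split())


def command_for(phrase):
    """Return the command linked to a recognized phrase, or None."""
    return COMMANDS.get(normalize(phrase))


print(command_for("List Files"))
print(command_for("reboot"))
```

Returning None for an unknown phrase lets the caller ignore stray
recognitions instead of running the wrong command.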
Email: sales@vcsi.com
WWW: http://www.voicecontrol.com/
Visus SpeechKit
* Platform: NeXT
* Description: SpeechKit is based on SPHINX, a speaker-independent,
1000 word or so, continuous speech recognition system which allows
you to incorporate speech recognition into your applications. You
can design your vocabulary and grammars.
* Contact: Visus - no address or phone provided. A possible contact
is Robert Brennan at Carnegie Mellon University. email:
Robert_Brennan@cmu.edu
Voice-Trek 2.0
* Platform: Unknown.
* Description: VoiceTrek is primarily used by the United States
Postal Service to sort mail. Tardis Technology Inc. was created to
develop and market applications that utilize speech recognition.
They do consulting work as well as turnkey systems.
* Contact: Tardis Technology Inc., Voice Recognition Div.
6444 E. Spring St., #286, Long Beach, CA 90815-1500, USA
Phone: +1-310-497-0077, Fax: +1-310-497-0080
* Platform: Windows
* Description: Seeking a description.
* Availability: VoiceAssist preview software is available from the
* Platform: Windows
* Description: Speaker dependent, each with an independent
directory. Isolated words. Up to 1000 words/user, 300
words/window. 1 word occupies 2Kb on hard disk. Can be used to
control Windows applications by issuing voice commands instead of
menu selection.
* Rough Cost: 292 Pounds(UK)
* Requirements: None
* Misc: Price includes a half-sized AT voice card (including a DSP),
software, documentation & a microphone (attachable to keyboard or
speaker). A light-weight high-spec headset is an optional extra.
* Contact:
Mark Redwood
Applied Voice Technologies
26 Danbury Street, Islington,
London, UK, N1 8JU
Ph: + 44 71 454 1224 : Fax: + 44 71 454 1225
Voicetek Corp.
* Platform: Unknown.
* Description: Voicetek Corporation provides voice processing
solutions, training and consulting services and an
object-oriented, graphical Generations Platform for development of
integrated computer telephony systems.
* Contact: Voicetek Corporation
19 Alpha Road, Chelmsford, MA 01824, USA
Ph: +1-508-250-9393, Fax: +1-508-250-9378
WWW: http://www.voicetek.com/
* Platform: DOS
* VPC2100 Voice Card: a hardware and software system based on the
TMS320C10, providing continuous speech recognition. The VPC2100
consists of a circuit board, microphone, speaker, software, and
documentation. It is designed to add voice I/O and telephone
management capabilities to the PC/AT and compatibles. Features:
+ Voice store-and-forward at 4- to 16.4-Kb/s speed
+ Speaker-independent speech recognition (0-9, YES, NO)
+ Continuous speaker-dependent speech recognition
+ Telephone interface, pulse or tone dialing, call progress,
and DTMF
+ Software for development, voice mail, telephone management,
and VoiceKey
+ High-level applications-generator software
* Votan VSP 1010 speech-processor board: can service a single voice
channel, providing recognition, voice output, and telephone
interfacing. Digital signal processing is performed by a TMS320
integrated circuit.
* Costs: Unknown
* WWW: http://www.ti.com/sc/docs/dsps/develop/3rdparty/vot.htm
* Contact: Votan Division, MOSCOM Corporation
6920 Koll Center Parkway, Suite 214, Pleasanton, CA 94566, USA
Ph: +1-510-426-5600, Fax: +1-510-426-6767
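As a rough worked example of what the store-and-forward rates listed above imply for disk usage, the sketch below converts a bit rate into bytes per minute of recorded speech. It assumes that "Kb/s" means kilobits per second, with 1 kilobit = 1000 bits; the function name is illustrative and is not part of any Votan software.

```python
def bytes_per_minute(kilobits_per_second: float) -> float:
    """Storage needed for one minute of speech at the given bit rate.

    Assumes 1 kilobit = 1000 bits and 8 bits per byte.
    """
    bits_per_minute = kilobits_per_second * 1000 * 60
    return bits_per_minute / 8


# The two ends of the quoted 4 to 16.4 Kb/s range.
for rate in (4.0, 16.4):
    print(f"{rate} kb/s -> {bytes_per_minute(rate):.0f} bytes per minute")
```

Under these assumptions, 4 kb/s comes to 30,000 bytes per minute and 16.4 kb/s to 123,000 bytes per minute.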
* Platform: Unknown.
* Description: Voice Processing Corporation (VPC) supplies automated
speech recognition systems. VPC's products are used in the
telecommunications, cellular and personal computer markets to
enable computers to understand human speech. The company's VPro
product line is sold to original equipment manufacturers (OEMs),
value added resellers (VARs), system integrators and application
developers. VPC's speech recognition systems are currently used in
applications such as voice mail, voice activated dialing,
interactive voice response, and command and control of personal
computers.
The following are descriptions of the Voice Processing
Corporation's VPro Product Line: VProContinuous, VPro/XD, VPro/RT,
VProCel, VProSpeller, VProPRL, VPro hardware platforms, and the
application Osprey.
More information is available on these products at the VPC WWW
site: http://www.vpro.com/
* VProContinuous(TM) is a speaker-independent, continuous digit
recognizer. It recognizes digit strings spoken in a continuous
manner, by any caller, without unnatural beeps or pauses.
VProContinuous uses out-of-vocabulary rejection and word spotting
technologies to reject extraneous words and phrases often spoken
by callers. The VProContinuous vocabulary consists of the words
"zero" through "nine," "yes," "no," and "oh." The product is
language-independent. American English, Australian English,
Brazilian Portuguese, Canadian French, Castilian Spanish, French,
German, Italian, Mexican Spanish, Portuguese, Swiss German and
U.K. English versions are available.
* VPro/XD(TM) is a discrete or multiword speech recognizer for
extra-demanding applications and/or vocabularies. This robust
discrete product recognizes isolated discrete utterances (words or
very short phrases). VPro/XD utilizes proprietary
out-of-vocabulary rejection and word-spotting technologies.
VPro/XD is speaker-independent and includes Talkover capability
allowing speech-interrupt over prompts. Pre-trained vocabulary
libraries are available in American English, Australian English,
Brazilian Portuguese, Canadian French, Castilian Spanish, Central
American Spanish, German, Italian, Mandarin Chinese, Mexican
Spanish, Portuguese, Swiss German and UK English. Pre-trained
vocabularies consisting of voice mail words, voice dialing words,
call control words, banking, and emergency words are available in
American English (both cellular and land-line).
* VPro/RT(TM) is a discrete speech recognizer for rapid training of
vocabularies in the field. This robust discrete product recognizes
isolated discrete utterances. Application designers and end-users
define the vocabulary of their choice and train the system in
real-time either prior to system start-up, or adapting on-the-fly
while the system is running live. Vocabularies can be subset, and
applications involving thousands of words can be developed
quickly. VPro/RT, which also supports Talkover, is suited to
speaker-dependent recognition tasks, such as the personal
directory of names in a voice-activated dialing application.
VPro/RT is also good for applications that require
speaker-independent vocabularies to be developed quickly in the
field or those that require many vocabularies. VPro/RT can also be
used as a tool for quick prototyping of applications.
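To illustrate how an application might consume the VProContinuous vocabulary described above ("zero" through "nine", "oh", "yes" and "no"), here is a minimal sketch that turns a sequence of recognized words into a digit string. It does not use any VPC API. The function name, the mapping of "oh" to 0, and the choice to skip every non-digit word are assumptions made for illustration only.

```python
# Hypothetical mapping from recognized words to digits.
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}


def words_to_digits(words):
    """Join recognized digit words into a string, ignoring all other words."""
    return "".join(
        DIGIT_WORDS[w.lower()] for w in words if w.lower() in DIGIT_WORDS
    )


print(words_to_digits(["five", "oh", "eight", "please", "two"]))  # prints 5082
```

A real recognizer would reject extraneous words itself; the filter here only stands in for that step.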
Whisper
OfficeTalk page.
* LawTalk for Windows: adds features and interfaces that meet the
specific needs of legal users. More information on the WildCard
LawTalk page.
* VoiceCompanion for the Internet: Surf the net using voice
commands. Controls browsers like Netscape and Microsoft Explorer.
More information on the VoiceCompanion web page.
* VoiceCompanion - RemoteAccess: Over the telephone remote access to
your desktop PC, for voicemail, FAX forwarding and address book
information. More information on the VoiceCompanion web page.
* Availability: WildCard Technologies Inc.
180 West Beaver Creek Road, Richmond Hill, Ontario, Canada L4B 1B4
___________________________________________________________________________
Introduction
In the FAQ:
On the WWW
MSDOS Version
UK:
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spkrtool.zip
Germany:
ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/spkrtool.zip
Sun Version
UK:
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spkr_sun_v1.tar.gz
Germany:
ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/speaker_sun_v1.tar.gz
___________________________________________________________________________
---
Andrew Hunt
Speech Applications Group
Sun Microsystems Laboratories
2 Elizabeth Drive, MS UCHL03-207
Chelmsford, MA 01824, USA
Ph: (508) 442-2681
Fax: (508) 250-5067
Email: andrew.hunt@east.sun.com
SpeechLinks
Speech Technology Hyperlinks Page
comp.speech FAQ
Following is the list of all the hyperlinks from the comp.speech FAQ. This is probably the biggest list of speech technology links
available. The links are provided to WWW references, ftp sites, and newsgroups. Cross-references to the comp.speech WWW pages
are also provided.
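Because the list is grouped by link type (WWW, FTP and newsgroups), a reader who wants to sort or count the links programmatically can rely on the URL scheme alone. The following is a minimal sketch using Python's standard library; the category labels and the function name are assumptions chosen for illustration, and the sample URLs are taken from the list itself.

```python
from collections import Counter
from urllib.parse import urlparse

# Illustrative labels for the three link types used in this list.
CATEGORY_BY_SCHEME = {"http": "WWW", "ftp": "FTP", "news": "Newsgroup"}


def categorize(url):
    """Return the link category implied by the URL scheme."""
    scheme = urlparse(url).scheme.lower()
    return CATEGORY_BY_SCHEME.get(scheme, "Other")


sample = [
    "http://www.cstr.ed.ac.uk/",
    "ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/",
    "news:comp.dsp",
]
counts = Counter(categorize(u) for u in sample)
print(dict(counts))
```

Any scheme not listed in the mapping falls into "Other", so unexpected entries are counted rather than dropped.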
Numbers of links:
SpeechLinks Pages
WWW Links
1stVoice Dragon Systems reseller
URL: http://www.1stvoice.com/
comp.speech refs: [1]
21st Century Eloquence: speech recognition reseller
URL: http://www.voicerecognition.com/
comp.speech refs: [1]
32 kbps ADPCM
URL: http://www.cwi.nl/ftp/audio/adpcm.shar
comp.speech refs: [1]
Academic Press Limited: Computer Speech and Language Journal
URL: http://www.apnet.com/
comp.speech refs: [1]
URL: http://www.cogsci.ed.ac.uk/ccs/home.html
comp.speech refs: [1]
Centre for Speech Technology Research, Edinburgh University
URL: http://www.cstr.ed.ac.uk/
comp.speech refs: [1]
Ciaran McElroy's Speech Coding Page
URL: http://wwwdsp.ucd.ie/speech/tutorial/speech_coding/speech_tut.html
comp.speech refs: [1]
CMU dictionary on the WWW
URL: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
comp.speech refs: [1] - [2]
COCOSDA Home page
URL: http://www.itl.atr.co.jp/cocosda/
comp.speech refs: [1]
Cognitive Science Laboratory at Princeton University
URL: http://www.cogsci.princeton.edu/
comp.speech refs: [1]
Colibri mailing list
URL: http://colibri.let.ruu.nl/
comp.speech refs: [1]
comp.dsp newsgroup FAQ
URL: http://www.bdti.com/faq/dsp_faq.htm
comp.speech refs: [1] - [2] - [3]
Comprehensive list of FFT software
URL: http://tjev.tel.etf.hr/josip/DSP/fft.html
comp.speech refs: [1]
Comprehensive list of WWW dictionaries, acronym lists, translation resources, and a Thesaurus.
URL: http://galaxy.einet.net/galaxy/Reference-and-Interdisciplinary-Information/Dictionaries-etc.html
comp.speech refs: [1]
comp.speech FAQ at Cambridge University (UK)
URL: http://svr-www.eng.cam.ac.uk/comp.speech/
comp.speech refs: [1] - [2] - [3] - [4]
comp.speech FAQ at CMU: USA
URL: http://www.speech.cs.cmu.edu/comp.speech/
comp.speech refs: [1] - [2] - [3] - [4]
comp.speech FAQ at Sydney University: Australia
URL: http://www.speech.su.oz.au/comp.speech/
comp.speech refs: [1] - [2] - [3] - [4]
comp.speech FAQ: ATR ITL, Japan
URL: http://www.itl.atr.co.jp/comp.speech/
comp.speech refs: [1] - [2] - [3] - [4]
Computation and Language E-Print Archive
URL: http://xxx.lanl.gov/cmp-lg/
comp.speech refs: [1]
Computational Linguistics journal home page
URL: http://www-mitpress.mit.edu/jrnls-catalog/comp-ling.html
comp.speech refs: [1]
Computational Phonology: special issue of Computational Linguistics
URL: http://mitpress.mit.edu/jrnls-catalog/comp-ling-abstracts/comp-ling20-3.html
comp.speech refs: [1]
Computing and Information Systems Department (CISD) of Rutherford Appleton Laboratory, UK
URL: http://www.cis.rl.ac.uk/index.html
comp.speech refs: [1]
Consortium for Lexical Research
URL: http://crl.nmsu.edu/clr/CLR.html
comp.speech refs: [1]
URL: http://www.dragonsys.com/marketing/dragondeveloper.html
comp.speech refs: [1]
Dragon home page
URL: http://www.dragonsys.com/
comp.speech refs: [1] - [2]
Dragon NaturallySpeaking
URL: http://www.naturallyspeaking.com/
comp.speech refs: [1]
Dragon PowerSecretary
URL: http://www.dragonsys.com/marketing/powersecretary.html
comp.speech refs: [1]
Dragon Telephony Products
URL: http://www.dragonsys.com/marketing/telephony.html
comp.speech refs: [1]
Duncan M. Forrest's Speech Recognition Resource List
URL: http://www.skye.co.za/dmf/speech/
comp.speech refs: [1]
Dynastat, Inc: Speech Intelligibility and Quality Testing
URL: http://www.bga.com/dynastat/
comp.speech refs: [1]
Edinburgh Associative Thesaurus
URL: http://www.cis.rl.ac.uk/proj/psych/eat/eat/
comp.speech refs: [1]
Edinburgh Associative Thesaurus: WWW Interactive version
URL: http://www.cis.rl.ac.uk/proj/psych/eat.html
comp.speech refs: [1]
Eg3 Communications: DSP Internet Resources
URL: http://www.eg3.com/dsp.htm
comp.speech refs: [1]
Eg3 Communications: Engineering Information Online
URL: http://www.eg3.com/
comp.speech refs: [1]
Elan Informatique demo registration
URL: http://www.elan.fr/speech/spe-LITO.htm
comp.speech refs: [1]
Elan Informatique: Proverbe demonstration software
URL: http://www.elan.fr/vocal/technical/demoSE.htm
comp.speech refs: [1]
Elan Informatique: Proverbe sample sound files
URL: http://www.elan.fr/vocal/technical/sndwave.htm
comp.speech refs: [1]
Elan Informatique: Proverbe speech synthesis
URL: http://www.elan.fr/vocal/prod-pse.htm
comp.speech refs: [1]
Elan Informatique: ProVerbe Speech Synthesis Engine
URL: http://www.elan.fr/
comp.speech refs: [1]
Eloquence speech synthesis
URL: http://www.eloq.com/
comp.speech refs: [1]
Elsevier Science: Speech Communication journal
URL: http://www.elsevier.com/
comp.speech refs: [1]
Emacspeak - A Speech Output Subsystem For Emacs
URL: http://www.research.digital.com/CRL/personal/raman/emacspeak/emacspeak.html
comp.speech refs: [1]
Emacspeak FAQ
URL: http://www.research.digital.com/CRL/personal/raman/emacspeak/faqs.html
comp.speech refs: [1]
Entropic Research Laboratory home page
URL: http://www.entropic.com/
comp.speech refs: [1] - [2] - [3]
Entropic Signal Processing System (ESPS)
URL: http://www.entropic.com/esps.html
comp.speech refs: [1]
HTK (Hidden-Markov Model Toolkit)
URL: http://htk.eng.cam.ac.uk/
comp.speech refs: [1]
ESCA: European Speech Communication Association list of research sites
URL: http://ophale.icp.grenet.fr/esca/labos.html
comp.speech refs: [1]
European Language Resources Association
URL: http://www.icp.grenet.fr/ELRA/home.html
comp.speech refs: [1]
European Speech Communication Association (ESCA) home page
URL: http://ophale.icp.grenet.fr/esca/esca.html
comp.speech refs: [1]
Eurovocs speech synthesis
URL: http://www.elis.rug.ac.be/ELISgroups/speech/research/eurovocs.html
comp.speech refs: [1] - [2]
FAQ: How can I use the Internet as a telephone?
URL: http://rpcp.mit.edu/~asears/voice-faq.html
comp.speech refs: [1]
Festival Speech Synthesis System: download software
URL: http://www.cstr.ed.ac.uk/projects/festival/download.html
comp.speech refs: [1]
Festival Speech Synthesis System: home page
URL: http://www.cstr.ed.ac.uk/projects/festival.html
comp.speech refs: [1] - [2]
FFTW software
URL: http://theory.lcs.mit.edu/~fftw
comp.speech refs: [1]
Ficomp Inc. Interpreter 6000
URL: http://www.ficompsystems.com/
comp.speech refs: [1]
Free Speech Journal
URL: http://www.cse.ogi.edu/CSLU/fsj/html/masthead.html
comp.speech refs: [1]
Fundamentals of Speech Recognition Course by Joe Picone
URL: http://www.isip.msstate.edu/publications/1996/ee_8993/
comp.speech refs: [1] - [2]
Fundamentals of Speech Recognition course lecture notes by Joe Picone
URL: http://www.isip.msstate.edu/publications/1996/ee_8993/lectures/
comp.speech refs: [1] - [2]
Fundamentals of Speech Recognition course syllabus by Joe Picone
URL: http://www.isip.msstate.edu/publications/1996/ee_8993/SYLLABUS.ps
comp.speech refs: [1] - [2]
G.729 Annex A from Sipro Lab Telecom Inc
URL: http://www.sipro.com/g729a.html
comp.speech refs: [1]
George L. Dillon's Consonant sounds of English
URL: http://weber.u.washington.edu/~dillon/consonants.html
URL: http://www.software.ibm.com/workgroup/voicetyp/vtordna.html
comp.speech refs: [1]
IBM VoiceType System Requirements
URL: http://www.software.ibm.com/workgroup/voicetyp/vtprod13.html
comp.speech refs: [1]
ICSLP '98: Sydney, Australia
URL: http://cslab.anu.edu.au/icslp98/
comp.speech refs: [1]
ICSLP'96: Philadelphia
URL: http://www.asel.udel.edu/speech/icslp.html
comp.speech refs: [1]
IEEE Home Page
URL: http://www.ieee.org/
comp.speech refs: [1]
IEEE Signal Processing Society
URL: http://www.ieee.org/sp/index.html
comp.speech refs: [1]
IGE
URL: http://www.york.ac.uk/~rpf1/IGE.html
comp.speech refs: [1]
IN CUBE for Windows 95
URL: http://www.commandcorp.com/cci/win95.html
comp.speech refs: [1]
IN CUBE from Command Corp. Inc.
URL: http://www.commandcorp.com/incube_welcome.html
comp.speech refs: [1]
IN CUBE Mark II Pro for Windows NT
URL: http://www.commandcorp.com/cci/pront.html
comp.speech refs: [1]
IN CUBE Voice Command for Sun SPARCstations
URL: http://www.commandcorp.com/cci/in3sparc.html
comp.speech refs: [1]
Infolingua Bibliographies
URL: http://gomer.mlink.net/infolingua.html
comp.speech refs: [1] - [2] - [3]
Infovox Multi-Lingual Speech Synthesis Products
URL: http://www.promotor.telia.se/NYA/cc/t-s/index.html
comp.speech refs: [1]
Institut fur Phonetik at Johann Wolfgang Goethe-Universitat Frankfurt
URL: http://www.uni-frankfurt.de/~ifb/ipf1.html
comp.speech refs: [1] - [2] - [3]
Institute for Communications Research and Phonetics, University of Bonn: Hadifix speech synthesis
URL: http://asl1.ikp.uni-bonn.de/Welcome.html
comp.speech refs: [1]
Institute for Language Speech and Hearing, the University of Sheffield
URL: http://www.dcs.shef.ac.uk/research/ilash/
comp.speech refs: [1]
Institute for Perception Research: Speech on the Web
URL: http://www.tue.nl/ipo/hearing/webspeak.htm
comp.speech refs: [1]
Institute for Signal and Information Processing (ISIP) at Mississippi State University
URL: http://www.isip.msstate.edu/
comp.speech refs: [1] - [2] - [3] - [4]
Institute of Phonetic Sciences, University of Amsterdam
URL: http://fonsg3.let.uva.nl/Welcome.html
comp.speech refs: [1]
URL: http://www.blackwellpublishers.co.uk/labs/
comp.speech refs: [1]
List of Links Relating to Sound Computation
URL: http://datura.cerl.uiuc.edu/netstuff/sigsoundLinks.html
comp.speech refs: [1]
List of online dictionaries: Institute of Phonetic Sciences
URL: http://fonsg3.let.uva.nl/Other_pages.html#Dictionaries
comp.speech refs: [1]
List of speech conferences and meetings: Institute of Phonetic Sciences
URL: http://fonsg3.let.uva.nl/Other_pages.html#Meetings
comp.speech refs: [1]
List of speech research sites: Institute of Phonetic Sciences
URL: http://fonsg3.let.uva.nl/Other_pages.html#Phonetics
comp.speech refs: [1]
Listen2 web page
URL: http://www.islandnet.com/jts/listen2.htm
comp.speech refs: [1]
Lists of References on Automatic Speaker Verification
URL: http://sig.enst.fr/~chollet/ForMehdi/SpRecV1.l_ind.html
comp.speech refs: [1]
Louis Pols's List of References on Synthesis Development And Assessment
URL: http://www.itl.atr.co.jp/cocosda/output/synth.refs
comp.speech refs: [1]
LPC-10 speech coding software
URL: http://www.arl.wustl.edu/~jaf/lpc/
comp.speech refs: [1]
Lucent Technologies Bell Labs Text-to-Speech
URL: http://www.bell-labs.com/project/tts/
comp.speech refs: [1] - [2]
Lucent Technologies Bell Labs Text-to-Speech: system description
URL: http://www.bell-labs.com/project/tts/tts-overview.html
comp.speech refs: [1]
Lyricos singing speech synthesis
URL: http://www.cse.ogi.edu/CSLU/research/TTS/research/sing.html
comp.speech refs: [1]
Macintosh Speech: Developer's Information
URL: http://www.speech.apple.com/speech/dev/dev.html
comp.speech refs: [1]
Macintosh Speech Page: Speech Manager and PlainTalk
URL: http://www.speech.apple.com/
comp.speech refs: [1] - [2] - [3] - [4]
MacYack Pro Speech Synthesis software
URL: http://www.lowtek.com/macyack/
comp.speech refs: [1]
Malcolm Crawford's home page
URL: http://www.dcs.shef.ac.uk/~malc/
comp.speech refs: [1]
Malcolm Slaney's home page
URL: http://www.interval.com/~malcolm/
comp.speech refs: [1]
Man-Machine Interfacing
URL: http://www.speechrec.com/
comp.speech refs: [1]
Martin Cooke's home page: auditory modelling and speech recognition in noise
URL: http://www.dcs.shef.ac.uk/~martin/
comp.speech refs: [1]
URL: http://www.nortel.com/entprods/multimedia/applications/audiogrm.html
comp.speech refs: [1]
Nortel: Multimedia Network Applications: Voice-Activated Auto Attendant
URL: http://www.nortel.com/entprods/multimedia/applications/autoattd.html
comp.speech refs: [1]
Nortel: Multimedia Network Applications: Voice-Activated Dialing
URL: http://www.nortel.com/entprods/multimedia/applications/vad.html
comp.speech refs: [1]
Nortel: Multimedia Network Applications: Voice-Activated Premier Dialing
URL: http://www.nortel.com/entprods/multimedia/applications/premdial.html
comp.speech refs: [1]
Nortel: Network Applications Vehicle
URL: http://www.nortel.com/entprods/multimedia/nav.html
comp.speech refs: [1]
Nortel: Northern Telecom, provider of network voice applications
URL: http://www.nortel.com/
comp.speech refs: [1]
N!Power
URL: http://www.silcom.com/~stilarry/
comp.speech refs: [1]
Nuance Communications: Speech recognition
URL: http://www.nuance.com/
comp.speech refs: [1]
O'Brien Resources: Speech Recognition Sales
URL: http://www.crosslink.net/~obrien/
comp.speech refs: [1]
OfficeTalk and LawTalk from WildCard
URL: http://www.wildcardtech.com/
comp.speech refs: [1]
OfficeTalk from WildCard
URL: http://www.wildcardtech.com/speech/info/offtalk.htm
comp.speech refs: [1]
OGI Synthesis using Festival
URL: http://www.cse.ogi.edu/CSLU/research/TTS
comp.speech refs: [1]
Online bibliography for Phonetics and Speech Technology
URL: http://www.uni-frankfurt.de/~ifb/bib_engl.html
comp.speech refs: [1] - [2] - [3]
Online Speech Synthesis: Institute of Phonetic Sciences
URL: http://fonsg3.let.uva.nl/IFA-Features.html
comp.speech refs: [1]
Orator from Bellcore: home page
URL: http://www.bellcore.com/ORATOR/
comp.speech refs: [1] - [2]
PAM - A Text-To-Speech Application
URL: http://www.islandnet.com/~tslemko/
comp.speech refs: [1]
Pavarobotti synthesis technology from the National Center for Voice and Speech
URL: http://www.shc.uiowa.edu/fun/pavarobotti/pavarobotti.html
comp.speech refs: [1]
Peter Meijer's home page
URL: http://ourworld.compuserve.com/homepages/Peter_Meijer/
comp.speech refs: [1]
Peter Meijer's "the vOICe" Java applet/application
URL: http://ourworld.compuserve.com/homepages/Peter_Meijer/javoice.htm
comp.speech refs: [1]
URL: http://pscinfo.psc.edu/~geigel/menus/sound.html
comp.speech refs: [1]
Soundcard WWW Site
URL: http://www.wi.leidenuniv.nl/audio/
comp.speech refs: [1]
Speak Freely audio networking software
URL: http://www.fourmilab.ch/netfone/windows/speak_freely.html
comp.speech refs: [1]
Speaker Identification And Verification: LIMSI Report
URL: http://www.limsi.fr/Recherche/TLP/reco/2pg95-sv/2pg95-sv.html
comp.speech refs: [1]
SpeakerKey Speaker Verification: FAQ
URL: http://www.vitro.bloomington.in.us:8080/~BC/REPORTS/SpeakerKeyFAQ.html
comp.speech refs: [1]
Speech and Hearing Research Group, University of Sheffield
URL: http://www.dcs.shef.ac.uk/research/groups/spandh/
comp.speech refs: [1]
Speech and Hearing Research Group, University of Sheffield: Links to Speech Sites
URL: http://www.dcs.shef.ac.uk/research/groups/spandh/world/misclinks.html
comp.speech refs: [1]
Speech and Language Technology Club
URL: http://salt.essex.ac.uk/salt/
comp.speech refs: [1]
Speech Applications Project at Sun Microsystems Laboratories: SpeechActs
URL: http://www.sunlabs.com/research/speech/
comp.speech refs: [1]
Speech Coding and Synthesis Book
URL: http://www.elsevier.nl/section/engtech/scs/menu.htm
comp.speech refs: [1] - [2]
Speech Coding Demonstration
URL: http://admii.arl.mil/~fsbrn/phamdo/speech_demo.html
comp.speech refs: [1]
Speech Communications journal home page
URL: http://www.elsevier.nl:80/eee/specom/contents.html
comp.speech refs: [1]
Speech Groups List from Leeds University Cognitive Psychology Research Group
URL: http://lethe.leeds.ac.uk/research/cogn/speechlab/other.html
comp.speech refs: [1]
Speech Recognition Course Notes
URL: http://www.isip.msstate.edu/publications/1996/speech_recognition_short_course
comp.speech refs: [1]
Speech Recognition List: Applied Speech Technology Laboratory of CSLI at Stanford
URL: http://csli-www.stanford.edu/users/bscott/SRTech.html
comp.speech refs: [1]
Speech Research List
URL: http://mambo.ucsc.edu/psl/speech.html
comp.speech refs: [1]
Speech Systems Phonetic Engine speech recognition
URL: http://www.speechsys.com/
comp.speech refs: [1]
Speech Technology Research Ltd.
URL: http://www.speechtech.com/home/speechtech/
comp.speech refs: [1] - [2]
Speech Toys
URL: http://www.speechtoys.com/
comp.speech refs: [1] - [2]
URL: http://svr-www.eng.cam.ac.uk/~ajr/SA95/node78.html
comp.speech refs: [1] - [2]
Tony Robinson's speech analysis course: Voicing analysis
URL: http://svr-www.eng.cam.ac.uk/~ajr/SA95/node68.html
comp.speech refs: [1]
ToolVox from Voxware
URL: http://www.voxware.com/
comp.speech refs: [1]
ToppCopy Telecom: Speech recognition reseller
URL: http://www.toppcopy.com/
comp.speech refs: [1]
Trainable text-to-phoneme software by Antonio Lucca
URL: http://www.silab.dsi.unimi.it/~al367212/ttsdoc.html
comp.speech refs: [1]
TrueSpeech capability for WWW pages
URL: http://www.dspg.com/webpage.htm
comp.speech refs: [1]
TrueSpeech from DSP Group
URL: http://www.dspg.com/index.html
comp.speech refs: [1]
TrueTalk from Entropic
URL: http://www.entropic.com/truetalk.html
comp.speech refs: [1]
TruVoice speech synthesis from Centigram
URL: http://www.centigram.com/centigram/Products/Technology/Truvoice/TruVoice_Brochure.html
comp.speech refs: [1]
TruVoice speech synthesis from Centigram
URL: http://www.centigram.com/centigram/TruVoice/index.html
comp.speech refs: [1] - [2]
TruVoice speech synthesis from Centigram
URL: http://www.centigram.com/
comp.speech refs: [1]
Typing Injuries Page
URL: http://alumni.caltech.edu/~dank/typing-archive.html
comp.speech refs: [1]
Typing Injury FAQ
URL: http://www.cs.princeton.edu:80/~dwallach/tifaq/
comp.speech refs: [1]
University of Edinburgh
URL: http://www.ed.ac.uk/
comp.speech refs: [1]
University of Victoria Phonetic Database
URL: http://www.speechtech.com/home/speechtech/csl3.html
comp.speech refs: [1]
VAULT Speaker Verification
URL: http://www.ImagineNation.com/Pavilion/Vault/Vault.htm
comp.speech refs: [1]
VAULT Speaker Verification FAQ
URL: http://www.ImagineNation.com/Xanadu/Vault/Vault.htm
comp.speech refs: [1]
VAULT Speaker Verification from ImagineNation
URL: http://www.ImagineNation.com/
comp.speech refs: [1]
Verbex demonstration speech recognition software
URL: http://www.verbex.com/demo.htm
comp.speech refs: [1]
FTP Links
A comprehensive list of American words
URL: ftp://wocket.vantage.gte.com/pub/standard_dictionary
comp.speech refs: [1]
AbbotDemo speech recognition software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/AbbotDemo
comp.speech refs: [1]
AF Audio Networking System
URL: ftp://crl.dec.com/pub/DEC/AF
comp.speech refs: [1]
Answers to Frequently Asked Questions about Usenet
URL: ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/Answers_to_Frequently_Asked_Questions_about_Usenet
comp.speech refs: [1] - [2]
Aria Soundcard FAQ
URL: ftp://rtfm.mit.edu/pub/usenet/comp.sys.ibm.pc.soundcard.misc/Aria_Soundcard_FAQ_v1.05
comp.speech refs: [1]
Aria Soundcard Support List
URL: ftp://rtfm.mit.edu/pub/usenet/comp.sys.ibm.pc.soundcard.misc/Aria_Soundcard_Support_List_v2.09
comp.speech refs: [1]
Audio file format conversion for G.723, G.721, A-law, u-law and linear
URL: ftp://ftp.cwi.nl/pub/audio/ccitt-adpcm.tar.Z
comp.speech refs: [1]
Audio file formats guide by Guido van Rossum
URL: ftp://ftp.cwi.nl/pub/audio/
comp.speech refs: [1]
Auditory Toolbox for Matlab
URL: ftp://ftp.apple.com/pub/malcolm
comp.speech refs: [1]
BEEP pronunciation dictionary
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/dictionaries/beep-0.7.README
comp.speech refs: [1]
BEEP pronunciation dictionary
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/dictionaries/beep.tar.gz
comp.speech refs: [1]
Brill part of speech tagger
URL: ftp://ftp.cs.jhu.edu/pub/brill/
comp.speech refs: [1]
Brill part of speech tagger: data and utilities
URL: ftp://ftp.cs.jhu.edu/pub/brill/Misc/
comp.speech refs: [1]
Brill part of speech tagger: papers and descriptions
URL: ftp://ftp.cs.jhu.edu/pub/brill/Papers/
comp.speech refs: [1]
Brill part of speech tagger: software
URL: ftp://ftp.cs.jhu.edu/pub/brill/Programs/
comp.speech refs: [1]
CELP 3.2a and LPC-10
URL: ftp://ftp.super.org/pub/speech/celp_3.2a.tar.Z
comp.speech refs: [1]
CELP speech coding software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/celp_3.2a.tar.gz
comp.speech refs: [1]
CELP speech coding software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/celp_3.2a.tar.Z
comp.speech refs: [1]
Center for Spoken Language Understanding (CSLU): speech database
URL: ftp://speech.cse.ogi.edu/pub/releases
comp.speech refs: [1]
CMU dictionary
URL: ftp://ftp.cs.cmu.edu/project/fgdata/dict/
comp.speech refs: [1]
comp.ai FAQ
URL: ftp://rtfm.mit.edu/pub/usenet/comp.ai/
comp.speech refs: [1] - [2]
comp.compression FAQ
URL: ftp://rtfm.mit.edu/pub/usenet/comp.compression/
comp.speech refs: [1] - [2]
comp.dsp FAQ
URL: ftp://rtfm.mit.edu/pub/usenet/comp.dsp/
comp.speech refs: [1] - [2]
comp.speech FAQ (text version)
URL: ftp://rtfm.mit.edu/pub/usenet/comp.speech/
comp.speech refs: [1] - [2] - [3]
comp.speech FAQ (text version)
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/FAQ-complete
comp.speech refs: [1] - [2] - [3]
comp.speech ftp site
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/
comp.speech refs: [1] - [2]
comp.speech ftp site: Analysis software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/analysis/
comp.speech refs: [1] - [2]
comp.speech ftp site: Auditory modelling software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/auditory/
comp.speech refs: [1] - [2]
comp.speech ftp site: dictionaries and lexical resources
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/dictionaries/
comp.speech refs: [1] - [2]
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spkrtool.zip
comp.speech refs: [1]
Jialong He's Speaker Recognition (Identification) Research Tool: Sun version (UK)
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spkr_sun_v1.tar.gz
comp.speech refs: [1]
Jialong He's Speaker Recognition (Identification) Tool: Sun version (Germany)
URL: ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/speaker_sun_v1.tar.gz
comp.speech refs: [1]
Jialong He's Speech Recognition Research Tool: MSDOS version (Germany)
URL: ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/spchtool.zip
comp.speech refs: [1]
Jialong He's Speech Recognition Research Tool: MSDOS version (UK)
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spchtool.zip
comp.speech refs: [1]
Jialong He's Speech Recognition Research Tool: Sun version (Germany)
URL: ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/speech_sun_v1.tar.gz
comp.speech refs: [1]
Jialong He's Speech Recognition Research Tool: Sun version (UK)
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spch_sun_v1.tar.gz
comp.speech refs: [1]
John Holdsworth's Auditory Modeller
URL: ftp://ftp.mrc-apu.cam.ac.uk/pub/aim
comp.speech refs: [1]
Khoros signal and image processing environment from Khoral Research Inc.
URL: ftp://ftp.khoral.com/
comp.speech refs: [1]
Klatt speech synthesis software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/klatt.3.04.tar.gz
comp.speech refs: [1]
Klatt speech synthesis software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/klatt.3.04.tar.Z
comp.speech refs: [1]
KPE80 - A Klatt Synthesiser and Parameter Editor
URL: ftp://ftp.phon.ucl.ac.uk/pub/kpe/kpe80.linux.tar.Z
comp.speech refs: [1]
KPE80 - A Klatt Synthesiser and Parameter Editor
URL: ftp://ftp.phon.ucl.ac.uk/pub/kpe/kpe80.src.tar.Z
comp.speech refs: [1]
KPE80 - A Klatt Synthesiser and Parameter Editor
URL: ftp://ftp.phon.ucl.ac.uk/pub/kpe/kpe80.sun413.tar.Z
comp.speech refs: [1]
KPE80 - A Klatt Synthesiser and Parameter Editor
URL: ftp://ftp.phon.ucl.ac.uk/pub/kpe/OVERVIEW.ps
comp.speech refs: [1]
Lists of speech recognition products posted to comp.speech
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/SpeechRecognitionProducts
comp.speech refs: [1]
Lotec speech recognition software
URL: ftp://ftp.sanpo.t.u-tokyo.ac.jp/pub/nigel/lotec/lotec.tar.Z
comp.speech refs: [1]
Lowel O'Mard's Auditory Modeller
URL: ftp://suna.lut.ac.uk/public/hulpo/lutear
comp.speech refs: [1]
LPC-10 speech coding software
URL: ftp://ftp.super.org/pub/speech/lpc10-1.0.tar.gz
comp.speech refs: [1]
URL: ftp://jaguar.ncsl.nist.gov/pub/score.README
comp.speech refs: [1]
Numerical analysis software: including FFT
URL: ftp://usc.edu/pub/C-numanal/
comp.speech refs: [1]
OGI Speech Tools
URL: ftp://speech.cse.ogi.edu/pub/tools/
comp.speech refs: [1]
Oxford Advanced Learner's Dictionary
URL: ftp://ota.ox.ac.uk/pub/ota/public/dicts/710/
comp.speech refs: [1]
Oxford Advanced Learner's Dictionary: documentation
URL: ftp://ota.ox.ac.uk/pub/ota/public/dicts/710/text710.doc
comp.speech refs: [1]
PAM: a talking personal assistant and text reader application
URL: ftp://ftp.islandnet.com/jts/pam_en3c.zip
comp.speech refs: [1]
Personal TrueTalk from Entropic
URL: ftp://ftp.entropic.com/pub/truetalk/README.ptt
comp.speech refs: [1]
Phonemic Samples
URL: ftp://phloem.uoregon.edu/pub/Sun4/lib/phonemes
comp.speech refs: [1]
Phonemic Samples
URL: ftp://sunsite.unc.edu/pub/multimedia/sun-sounds/phonemes
comp.speech refs: [1]
Ptolemy signal processing software
URL: ftp://ptolemy.berkeley.edu/pub/
comp.speech refs: [1]
recnet: recurrent neural network speech recognition software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/recnet-1.3.tar.Z
comp.speech refs: [1]
rsynth: speech synthesis software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/rsynth-2.0.tar.gz
comp.speech refs: [1]
rsynth: speech synthesis software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/rsynth-2.0.tar.Z
comp.speech refs: [1]
Rules for posting to Usenet
URL: ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/Rules_for_posting_to_Usenet
comp.speech refs: [1]
sci.lang FAQ
URL: ftp://rtfm.mit.edu/pub/usenet/sci.lang
comp.speech refs: [1]
ShATR: A Multi-simultaneous-speaker corpus
URL: ftp://ftp.dcs.shef.ac.uk/share/spandh/ShATR/
comp.speech refs: [1]
shorten audio file compression software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/shorten.tar.gz
comp.speech refs: [1]
shorten audio file compression software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/shorten.tar.Z
comp.speech refs: [1]
shorten audio file compression software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/shorten.zip
comp.speech refs: [1]
Yamada Language Center phonetic fonts
URL: ftp://yftp@www-vms.uoregon.edu/fonts
comp.speech refs: [1]
Newsgroups
Artificial Intelligence newsgroup
URL: news:comp.ai
comp.speech refs: [1]
Compression newsgroup
URL: news:comp.compression
comp.speech refs: [1]
Digital signal processing newsgroup
URL: news:comp.dsp
comp.speech refs: [1] - [2]
FAQ postings for all newsgroups
URL: news:news.answers
comp.speech refs: [1]
FAQ postings for computer-related newsgroups
URL: news:comp.answers
comp.speech refs: [1]
GUS soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard.GUS
comp.speech refs: [1]
Language newsgroup
URL: news:sci.lang
comp.speech refs: [1]
Multimedia newsgroup
URL: news:comp.multimedia
comp.speech refs: [1]
Natural Language Knowledge Representation newsgroup
URL: news:comp.ai.nlang-know-rep
comp.speech refs: [1]
Natural Language Processing newsgroup
URL: news:comp.ai.nat-lang
comp.speech refs: [1]
Neural network newsgroup
URL: news:comp.ai.neural-nets
comp.speech refs: [1]
Newsgroup for discussion of speech production and perception
URL: news:alt.sci.physics.acoustics
comp.speech refs: [1]
Newsgroup for new users of news
URL: news:news.announce.newusers
comp.speech refs: [1] - [2]
Silicon Graphics Systems newsgroup
URL: news:comp.sys.sgi.misc
comp.speech refs: [1]
Soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard.advocacy
comp.speech refs: [1]
Soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard
comp.speech refs: [1]
SpeechLinks - General
Speech Technology Hyperlinks Page
A list of hyperlinks from the comp.speech FAQ related to general speech technology matters. Links are provided to
WWW references, ftp sites, and newsgroups. Cross-references to the comp.speech WWW pages are also provided.
SpeechLinks Pages
WWW Links
Academic Press Limited: Computer Speech and Language Journal
URL: http://www.apnet.com/
comp.speech refs: [1]
AF audio networking software
URL: http://www.research.digital.com/CRL/projects/AF/home.html
comp.speech refs: [1]
American National Standards Institute (ANSI)
URL: http://www.ansi.org/
comp.speech refs: [1]
American Voice Input/Output Society (AVIOS) home page
URL: http://www.avios.com/
comp.speech refs: [1]
Association for Computational Linguistics (ACL) home page
URL: http://www.cs.columbia.edu:80/~acl/
comp.speech refs: [1]
URL: http://www.planeteers.com/digiphon/dpjr.htm
comp.speech refs: [1]
Digital Signal Processing (DSP) group at Rice University
URL: http://www-dsp.rice.edu/
comp.speech refs: [1]
Duncan M. Forrest's Speech Recognition Resource List
URL: http://www.skye.co.za/dmf/speech/
comp.speech refs: [1]
Dynastat, Inc: Speech Intelligibility and Quality Testing
URL: http://www.bga.com/dynastat/
comp.speech refs: [1]
Edinburgh Associative Thesaurus
URL: http://www.cis.rl.ac.uk/proj/psych/eat/eat/
comp.speech refs: [1]
Edinburgh Associative Thesaurus: WWW Interactive version
URL: http://www.cis.rl.ac.uk/proj/psych/eat.html
comp.speech refs: [1]
Elsevier Science: Speech Communication journal
URL: http://www.elsevier.com/
comp.speech refs: [1]
Entropic Research Laboratory home page
URL: http://www.entropic.com/
comp.speech refs: [1] - [2] - [3]
Entropic Signal Processing System (ESPS)
URL: http://www.entropic.com/esps.html
comp.speech refs: [1]
ESCA: European Speech Communication Association list of research sites
URL: http://ophale.icp.grenet.fr/esca/labos.html
comp.speech refs: [1]
European Language Resources Association
URL: http://www.icp.grenet.fr/ELRA/home.html
comp.speech refs: [1]
European Speech Communication Association (ESCA) home page
URL: http://ophale.icp.grenet.fr/esca/esca.html
comp.speech refs: [1]
FAQ: How can I use the Internet as a telephone?
URL: http://rpcp.mit.edu/~asears/voice-faq.html
comp.speech refs: [1]
Free Speech Journal
URL: http://www.cse.ogi.edu/CSLU/fsj/html/masthead.html
comp.speech refs: [1]
George L. Dillon's Consonant sounds of English
URL: http://weber.u.washington.edu/~dillon/consonants.html
comp.speech refs: [1]
George L. Dillon's list of phonetic resources
URL: http://weber.u.washington.edu/~dillon/PhonResources.html
comp.speech refs: [1]
George L. Dillon's Vowel Quadrilaterals for American and British English
URL: http://weber.u.washington.edu/~dillon/newstart.html
comp.speech refs: [1]
URL: http://asa.aip.org/meetings.html
comp.speech refs: [1]
MicMac Recording Software for Macs
URL: http://moof.com/nirvana/
comp.speech refs: [1]
Microsoft Speech API
URL: http://www.microsoft.com/MEDIADEV/AUDIO/MSPEECH1.HTM
comp.speech refs: [1]
Microsoft Speech API: An Overview
URL: http://www.microsoft.com/mediadev/audio/mspover.htm
comp.speech refs: [1]
Microsoft Speech SDK
URL: http://www.research.microsoft.com/research/srg/install.htm
comp.speech refs: [1] - [2]
Microsoft Telephony API
URL: http://www.microsoft.com/ntserver/communications/tapi.htm
comp.speech refs: [1]
Microsoft Telephony API White Paper
URL: http://www.microsoft.com/ntserver/communications/tapi_wp.htm
comp.speech refs: [1]
Mike Noel's home page (CSLU)
URL: http://www.cse.ogi.edu/~noel/
comp.speech refs: [1]
Moby lexical resources
URL: http://www.dcs.shef.ac.uk/research/ilash/Moby/
comp.speech refs: [1]
MRC Psycholinguistic Database: WWW Interface
URL: http://www.psy.uwa.edu.au/uwa_mrc.htm
comp.speech refs: [1]
National Institute of Standards and Technology (NIST)
URL: http://www.nist.gov/
comp.speech refs: [1]
Nautilus home page: Secure Computer Telephony
URL: http://www.lila.com/nautilus/
comp.speech refs: [1]
NetSpeak home page
URL: http://www.netspeak.com/
comp.speech refs: [1]
NOISEX-92 database
URL: http://spib.rice.edu/spib/select_noise.html
comp.speech refs: [1]
N!Power
URL: http://www.silcom.com/~stilarry/
comp.speech refs: [1]
Peter Meijer's home page
URL: http://ourworld.compuserve.com/homepages/Peter_Meijer/
comp.speech refs: [1]
Peter Meijer's "the vOICe" Java applet/application
URL: http://ourworld.compuserve.com/homepages/Peter_Meijer/javoice.htm
comp.speech refs: [1]
URL: http://www.bonzi.com/
comp.speech refs: [1]
Voice Information Associates, Inc.
URL: http://www.tiac.net/users/asrnews/
comp.speech refs: [1]
Voice Users mailing list
URL: http://voicerecognition.com/voice-users/
comp.speech refs: [1]
WebPhone
URL: http://www.netspeak.com/about.html
comp.speech refs: [1]
WebPhone availability
URL: http://www.netspeak.com/getphone.html
comp.speech refs: [1]
Webster's dictionary online
URL: http://c.gp.cs.cmu.edu:5103/prog/webster
comp.speech refs: [1]
Webster's Revised Unabridged Dictionary, 1913
URL: http://humanities.uchicago.edu/forms_urest/webster.form.html
comp.speech refs: [1]
WebTalk from Quarterdeck
URL: http://www.quarterdeck.com/
comp.speech refs: [1]
Wildfire - an Electronic Assistant
URL: http://www.wildfire.com/
comp.speech refs: [1]
WordNet home page
URL: http://www.cogsci.princeton.edu/~wn/
comp.speech refs: [1]
WordNet: WWW interface
URL: http://www.cogsci.princeton.edu/~wn/w3wn.html
comp.speech refs: [1]
Yamada Language Center Fonts
URL: http://babel.uoregon.edu/yamada/fonts.html
comp.speech refs: [1]
Yamada Language Center IPA fonts
URL: http://babel.uoregon.edu/yamada/fonts/phonetic.html
comp.speech refs: [1]
Yamada Language Center (Phonetic fonts)
URL: http://babel.uoregon.edu/yamada.html
comp.speech refs: [1]
Yamada Language Center windows fonts
URL: http://babel.uoregon.edu/yamada/winfonts.html
comp.speech refs: [1]
FTP Links
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/
comp.speech refs: [1] - [2]
comp.speech ftp site: speech data
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/data/
comp.speech refs: [1] - [2]
comp.speech ftp site: speech processing tools and software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/tools/
comp.speech refs: [1]
comp.speech ftp site: speech recognition software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/
comp.speech refs: [1] - [2] - [3]
comp.speech ftp site: speech synthesis software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/
comp.speech refs: [1] - [2]
comp.speech ftp site: Useful information
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/
comp.speech refs: [1]
comp.speech newsgroup archives
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/archive/
comp.speech refs: [1] - [2]
CyberPhone internet voice communication
URL: ftp://magenta.com/pub/cyberphone
comp.speech refs: [1]
ECTL mailing list archives
URL: ftp://snowhite.cis.uoguelph.ca/pub/ectl
comp.speech refs: [1]
FAQ: How can I use the Internet as a telephone?
URL: ftp://rtfm.mit.edu/pub/usenet/alt.internet.services/FAQ:_How_can_I_use_the_Internet_as_a_telephone?
comp.speech refs: [1]
FAQs about FAQs
URL: ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/FAQs_about_FAQs
comp.speech refs: [1]
Homophone list
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/dictionaries/homophones-1.01.txt
comp.speech refs: [1]
Human Audio Perception document
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/HumanAudioPerception
comp.speech refs: [1]
Internet Phone from VocalTec
URL: ftp://ftp.vocaltec.com/pub/iphone09.exe
comp.speech refs: [1]
IPA for LaTeX: Washington State University International Phonetic Alphabet fonts
URL: ftp://ftp.digital.com/pub/text/TeX/fonts/wsuipa/
comp.speech refs: [1]
IPA for LaTeX: Washington State University International Phonetic Alphabet fonts
URL: ftp://ftp.wustl.edu/packages/TeX/fonts/wsuipa/
comp.speech refs: [1]
John Holdsworth's Auditory Modeller
URL: ftp://ftp.mrc-apu.cam.ac.uk/pub/aim
comp.speech refs: [1]
Khoros signal and image processing environment from Khoral Research Inc.
URL: ftp://ftp.khoral.com/
comp.speech refs: [1]
Lowel O'Mard's Auditory Modeller
URL: ftp://suna.lut.ac.uk/public/hulpo/lutear
comp.speech refs: [1]
Math Works Inc.: Matlab plus Signal Processing Toolbox
URL: ftp://ftp.mathworks.com
comp.speech refs: [1]
Mirror of SIMTEL sound directory
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/simtel_sound/
comp.speech refs: [1] - [2]
Mirror of SIMTEL voice directory
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/simtel_voice/
comp.speech refs: [1] - [2]
MixViews Unix sound editor
URL: ftp://foxtrot.ccmrc.ucsb.edu/pub/MixViews/
comp.speech refs: [1]
Moby Hyphenator: 185,000 entries fully hyphenated
URL: ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/mhyph.tar.Z
comp.speech refs: [1]
Moby Language: Word lists in five major languages
URL: ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/mlang.tar.Z
comp.speech refs: [1]
Moby lexical resources
URL: ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/
comp.speech refs: [1]
Moby Part-of-Speech: 230,000 entries with part(s) of speech listed in priority order
URL: ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/mpos.tar.Z
comp.speech refs: [1]
Moby Pronunciator: 175,000 entries fully International Phonetic Alphabet coded
URL: ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/mpron.tar.Z
comp.speech refs: [1]
Moby Shakespeare: The complete unabridged works of Shakespeare
URL: ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/mshak.tar.Z
comp.speech refs: [1]
Moby Thesaurus: 30,000 root words, 2.5 million synonyms and related words
URL: ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/mthes.tar.Z
comp.speech refs: [1]
Moby Words: 610,000+ words and phrases
URL: ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/mwords.tar.Z
comp.speech refs: [1]
MRC Psycholinguistic Database
URL: ftp://ota.ox.ac.uk/pub/ota/public/dicts/1054/
comp.speech refs: [1]
MRC Psycholinguistic Database and Oxford Advanced Learner's Dictionary
URL: ftp://ota.ox.ac.uk/pub/ota/public/dicts/info
comp.speech refs: [1]
MRC Psycholinguistic Database: Readme
URL: ftp://ota.ox.ac.uk/pub/ota/public/dicts/1054/readme.
Rules for posting to Usenet
URL: ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/Rules_for_posting_to_Usenet
comp.speech refs: [1]
sci.lang FAQ
URL: ftp://rtfm.mit.edu/pub/usenet/sci.lang
comp.speech refs: [1]
ShATR: A Multi-simultaneous-speaker corpus
URL: ftp://ftp.dcs.shef.ac.uk/share/spandh/ShATR/
comp.speech refs: [1]
Speech File Formats guide by Guido van Rossum
URL: ftp://ftp.cwi.nl/pub/audio/index.html
comp.speech refs: [1]
Speech Filing System (SFS)
URL: ftp://ftp.phon.ucl.ac.uk/pub/sfs/
comp.speech refs: [1] - [2]
Speech Filing System (SFS)
URL: ftp://ftp.phon.ucl.ac.uk/pub/sfs/README
comp.speech refs: [1]
Summer Institute of Linguistics IPA Fonts (for Mac)
URL: ftp://ftp.sil.org/fonts/mac/silipa12.sea_hqx
comp.speech refs: [1]
Summer Institute of Linguistics IPA Fonts (for Windows)
URL: ftp://ftp.sil.org/fonts/win/silip12a.exe
comp.speech refs: [1]
TCPPlay: a Mac-based audio server
URL: ftp://worldserver.com/pub/malcolm/TcpPlay.sit.hqx
comp.speech refs: [1]
TCPPlay: Macintosh audio server
URL: ftp://ftp.apple.com/pub/malcolm/TcpPlay.sit.hqx
comp.speech refs: [1]
TIPA: LaTeX IPA font
URL: ftp://tooyoo.L.u-tokyo.ac.jp/pub/TeX/tipa/
comp.speech refs: [1]
TIPA: LaTeX IPA font manual
URL: ftp://tooyoo.L.u-tokyo.ac.jp/pub/TeX/tipa/tipaman.ps
comp.speech refs: [1]
What is Usenet?
URL: ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/What_is_Usenet?
comp.speech refs: [1]
WordNet: home page
URL: ftp://clarity.princeton.edu/pub/wordnet/
comp.speech refs: [1]
WordNet: README
URL: ftp://clarity.princeton.edu/pub/wordnet/README
comp.speech refs: [1]
WordNet: Technical Papers
URL: ftp://clarity.princeton.edu/pub/wordnet/5papers.ps
comp.speech refs: [1]
Yamada Language Center phonetic fonts
URL: ftp://yftp@www-vms.uoregon.edu/fonts
comp.speech refs: [1]
Newsgroups
Artificial Intelligence newsgroup
URL: news:comp.ai
comp.speech refs: [1]
Compression newsgroup
URL: news:comp.compression
comp.speech refs: [1]
Digital signal processing newsgroup
URL: news:comp.dsp
comp.speech refs: [1] - [2]
FAQ postings for all newsgroups
URL: news:news.answers
comp.speech refs: [1]
FAQ postings for computer-related newsgroups
URL: news:comp.answers
comp.speech refs: [1]
GUS soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard.GUS
comp.speech refs: [1]
Language newsgroup
URL: news:sci.lang
comp.speech refs: [1]
Multimedia newsgroup
URL: news:comp.multimedia
comp.speech refs: [1]
Natural Language Knowledge Representation newsgroup
URL: news:comp.ai.nlang-know-rep
comp.speech refs: [1]
Natural Language Processing newsgroup
URL: news:comp.ai.nat-lang
comp.speech refs: [1]
Neural network newsgroup
URL: news:comp.ai.neural-nets
comp.speech refs: [1]
Newsgroup for discussion of speech production and perception
URL: news:alt.sci.physics.acoustics
comp.speech refs: [1]
Newsgroup for new users of news
URL: news:news.announce.newusers
comp.speech refs: [1] - [2]
Silicon Graphics Systems newsgroup
URL: news:comp.sys.sgi.misc
comp.speech refs: [1]
Soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard.advocacy
comp.speech refs: [1]
SpeechLinks Pages
WWW Links
comp.dsp newsgroup FAQ
URL: http://www.bdti.com/faq/dsp_faq.htm
comp.speech refs: [1] - [2] - [3]
Comprehensive list of FFT software
URL: http://tjev.tel.etf.hr/josip/DSP/fft.html
comp.speech refs: [1]
CRC Press: Scientific and Technical Publisher
URL: http://www.crcpress.com/
comp.speech refs: [1]
Digital Signal Processing Course by Joe Picone
URL: http://www.isip.msstate.edu/publications/1995/ee_4773/
comp.speech refs: [1]
Digital Signal Processing course syllabus by Joe Picone
URL: http://www.isip.msstate.edu/publications/1995/ee_4773/SYLLABUS.ps
FTP Links
Aria Soundcard FAQ
URL: ftp://rtfm.mit.edu/pub/usenet/comp.sys.ibm.pc.soundcard.misc/Aria_Soundcard_FAQ_v1.05
comp.speech refs: [1]
Aria Soundcard Support List
URL: ftp://rtfm.mit.edu/pub/usenet/comp.sys.ibm.pc.soundcard.misc/Aria_Soundcard_Support_List_v2.09
comp.speech refs: [1]
comp.dsp FAQ
URL: ftp://rtfm.mit.edu/pub/usenet/comp.dsp/
comp.speech refs: [1] - [2]
comp.sys.ibm.pc.soundcard.misc newsgroup FAQs
URL: ftp://rtfm.mit.edu/pub/usenet/comp.sys.ibm.pc.soundcard.misc/
comp.speech refs: [1]
FFT Software
URL: ftp://ftp.coast.net/simtel/msdos/c/mixfft03.zip
comp.speech refs: [1]
FFT Software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/analysis/fft-stuff.tar.gz
comp.speech refs: [1]
FFT Software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/analysis/mixfft03.zip
comp.speech refs: [1]
FFT Software
URL: ftp://usc.edu/pub/C-numanal/fft-stuff.tar.gz
comp.speech refs: [1]
Matlab Sound and Image Toolbox
URL: ftp://ftp.apple.com/pub/malcolm/SoundAndImageToolbox.cpt.hqx
comp.speech refs: [1]
Midi files information
URL: ftp://rtfm.mit.edu/pub/usenet/comp.sys.ibm.pc.soundcard.misc/Midi_files_software_archives_on_the_Internet
comp.speech refs: [1]
Numerical analysis software: including FFT
URL: ftp://usc.edu/pub/C-numanal/
comp.speech refs: [1]
Signal End-Point Detection software
URL: ftp://ftp.isip.msstate.edu/pub/software/signal_detector/sigd_v2.2.tar.gz
comp.speech refs: [1]
Silicon Graphics audio Frequently Asked Questions (FAQ)
URL: ftp://viz.tamu.edu/pub/sgi/faq/
comp.speech refs: [1]
Speech End-point detection software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/tools/ep.1.0.tar.gz
comp.speech refs: [1]
Turtle Beach sound cards FAQ
URL: ftp://rtfm.mit.edu/pub/usenet/comp.sys.ibm.pc.soundcard.misc/Turtle_Beach_sound_cards_FAQ
comp.speech refs: [1]
Newsgroups
Artificial Intelligence newsgroup
URL: news:comp.ai
comp.speech refs: [1]
Compression newsgroup
URL: news:comp.compression
comp.speech refs: [1]
Digital signal processing newsgroup
URL: news:comp.dsp
comp.speech refs: [1] - [2]
FAQ postings for all newsgroups
URL: news:news.answers
comp.speech refs: [1]
FAQ postings for computer-related newsgroups
URL: news:comp.answers
comp.speech refs: [1]
GUS soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard.GUS
comp.speech refs: [1]
Language newsgroup
URL: news:sci.lang
comp.speech refs: [1]
Multimedia newsgroup
URL: news:comp.multimedia
comp.speech refs: [1]
Natural Language Knowledge Representation newsgroup
URL: news:comp.ai.nlang-know-rep
comp.speech refs: [1]
Natural Language Processing newsgroup
URL: news:comp.ai.nat-lang
comp.speech refs: [1]
Neural network newsgroup
URL: news:comp.ai.neural-nets
comp.speech refs: [1]
Newsgroup for discussion of speech production and perception
URL: news:alt.sci.physics.acoustics
comp.speech refs: [1]
Newsgroup for new users of news
URL: news:news.announce.newusers
comp.speech refs: [1] - [2]
Silicon Graphics Systems newsgroup
URL: news:comp.sys.sgi.misc
comp.speech refs: [1]
Soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard.advocacy
comp.speech refs: [1]
Soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard
comp.speech refs: [1]
Soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard.misc
comp.speech refs: [1]
Soundcard technical discussion group
URL: news:comp.sys.ibm.pc.soundcard.tech
comp.speech refs: [1]
Soundcards and Games newsgroup
URL: news:comp.sys.ibm.pc.soundcard.games
comp.speech refs: [1]
Soundcards and Music newsgroup
URL: news:comp.sys.ibm.pc.soundcard.music
SpeechLinks Pages
WWW Links
32 kbps ADPCM
URL: http://www.cwi.nl/ftp/audio/adpcm.shar
comp.speech refs: [1]
ACELP Codecs from Sipro Lab Telecom Inc.
URL: http://www.sipro.com/acelp.html
comp.speech refs: [1]
Audio and Music Applications for Silicon Graphics Systems
URL: http://reality.sgi.com/employees/cook/audio.apps/public.html
comp.speech refs: [1]
Buddy Software Library: MPEG-1 Audio Layer 3 encoder and player
URL: http://www.buddy.org/softlib.html
comp.speech refs: [1]
Castleton Network Systems - G.729 Voice Coder
URL: http://www.castleton.com/
comp.speech refs: [1]
Ciaran McElroy's Speech Coding Page
URL: http://wwwdsp.ucd.ie/speech/tutorial/speech_coding/speech_tut.html
comp.speech refs: [1]
CyberVoice speech coding
URL: http://www.cybit.com/
comp.speech refs: [1]
G.729 Annex A from Sipro Lab Telecom Inc
URL: http://www.sipro.com/g729a.html
comp.speech refs: [1]
GSM 06.10 Compression
URL: http://www.cs.tu-berlin.de/~jutta/toast.html
comp.speech refs: [1]
How to Install an MPEG Audio Player for your Web Navigator
URL: http://www.mpeg.org/index.html/MPEG-audio-player.html
comp.speech refs: [1]
Institut fur Phonetik at Johann Wolfgang Goethe-Universitat Frankfurt
URL: http://www.uni-frankfurt.de/~ifb/ipf1.html
comp.speech refs: [1] - [2] - [3]
International Telecommunications Union standards information
URL: http://www.itu.ch/itudoc/itu-t/rec/g/g700-799.html
comp.speech refs: [1]
International Telecommunications Union WWW site
URL: http://www.itu.ch/
comp.speech refs: [1]
Jason Woodard's Speech Coding Page
URL: http://www-mobile.ecs.soton.ac.uk/speech_codecs/index.html
comp.speech refs: [1]
Johann Wolfgang Goethe-Universitat Frankfurt
URL: http://www.uni-frankfurt.de/
comp.speech refs: [1] - [2] - [3]
Lernout and Hauspie home page
URL: http://www.lhs.com/
comp.speech refs: [1] - [2] - [3] - [4] - [5] - [6]
Lernout and Hauspie speech coding
URL: http://www.lhs.com/coding.html
FTP Links
Audio file format conversion for G.723, G.721, A-law, u-law and linear
URL: ftp://ftp.cwi.nl/pub/audio/ccitt-adpcm.tar.Z
comp.speech refs: [1]
CELP 3.2a and LPC-10
URL: ftp://ftp.super.org/pub/speech/celp_3.2a.tar.Z
comp.speech refs: [1]
CELP speech coding software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/celp_3.2a.tar.gz
comp.speech refs: [1]
CELP speech coding software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/celp_3.2a.tar.Z
comp.speech refs: [1]
comp.compression FAQ
URL: ftp://rtfm.mit.edu/pub/usenet/comp.compression/
comp.speech refs: [1] - [2]
G711, G721, G723 speech coding software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/G711_G721_G723.tar.Z
comp.speech refs: [1]
G.728 CELP Compression
URL: ftp://dspsun.eas.asu.edu/pub/speech/ldcelp.tgz
comp.speech refs: [1]
GSM 06.10 Compression
URL: ftp://ftp.cs.tu-berlin.de/pub/local/kbs/tubmik/gsm/ddj/gsm-107.zip
comp.speech refs: [1]
GSM 06.10 Compression
URL: ftp://ftp.cs.tu-berlin.de/pub/local/kbs/tubmik/gsm/gsm-1.0.7.tar.gz
comp.speech refs: [1]
GSM 06.10 Compression
URL: ftp://ftp.mv.com/pub/ddj/1994.12/gsm-105.zip
comp.speech refs: [1]
LPC-10 speech coding software
URL: ftp://ftp.super.org/pub/speech/lpc10-1.0.tar.gz
comp.speech refs: [1]
MPEG-1 and MPEG-2 audio software from Universitaet Hannover
URL: ftp://ftp.tnt.uni-hannover.de/pub/MPEG/audio/
comp.speech refs: [1]
MPEG-1 Audio Layer 1 & 2 decoder and verifier at CCETT
URL: ftp://ftp.ccett.fr/pub/mpeg/audio_new/
comp.speech refs: [1]
MPEG-1 Audio Layer 1 & 2 encoder - decoder
URL: ftp://ftp.iuma.com/audio_utils/converters/source/
comp.speech refs: [1]
MPEG-2 Audio encoder and decoder at CCETT
URL: ftp://ftp.ccett.fr/pub/mpeg/mpeg2/
comp.speech refs: [1]
shorten audio file compression software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/shorten.tar.gz
comp.speech refs: [1]
shorten audio file compression software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/shorten.tar.Z
comp.speech refs: [1]
shorten audio file compression software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/shorten.zip
comp.speech refs: [1]
StarAudio Compressor/Player technical documentation
URL: ftp://ftp.speechtech.com/pub/speechtech/docs/audocw60.exe
comp.speech refs: [1]
Newsgroups
SpeechLinks Pages
WWW Links
Acuvoice, Inc. speech synthesizer
URL: http://www.acuvoice.com/
comp.speech refs: [1]
An Introduction to Text-to-Speech Synthesis: Thierry Dutoit
URL: http://kapis.www.wkap.nl/kapis/CGI-BIN/WORLD/book.htm?0-7923-4498-7
comp.speech refs: [1]
Andrew Simpson's home page
URL: http://www.phon.ucl.ac.uk/home/andrew/home.html
comp.speech refs: [1]
AsTeR text-to-speech processing
URL: http://www.research.digital.com/CRL/personal/raman/aster/aster-toplevel.html
comp.speech refs: [1]
AT&T Advanced Speech Products Group home page
URL: http://www.att.com/aspg/
comp.speech refs: [1] - [2] - [3]
AT&T Bell Laboratories Voices
URL: http://www.research.att.com/cgi-bin/cgiwrap/mjm/voices.cgi
comp.speech refs: [1]
AT&T Watson: Engineer Training Program
URL: http://www.att.com/aspg/SSI_Class.html
comp.speech refs: [1] - [2]
URL: http://www.promotor.telia.se/NYA/cc/t-s/index.html
comp.speech refs: [1]
Institut fur Phonetik at Johann Wolfgang Goethe-Universitat Frankfurt
URL: http://www.uni-frankfurt.de/~ifb/ipf1.html
comp.speech refs: [1] - [2] - [3]
Institute for Communications Research and Phonetics, University of Bonn: Hadifix speech synthesis
URL: http://asl1.ikp.uni-bonn.de/Welcome.html
comp.speech refs: [1]
Institute of Phonetic Sciences, University of Amsterdam
URL: http://fonsg3.let.uva.nl/Welcome.html
comp.speech refs: [1]
IPOX: All Prosodic Speech Synthesis Architecture
URL: http://www.tue.nl/ipo/people/adirksen/ipox/ipox.htm
comp.speech refs: [1]
Johann Wolfgang Goethe-Universitat Frankfurt
URL: http://www.uni-frankfurt.de/
comp.speech refs: [1] - [2] - [3]
Jon Iles' Speech Synthesis "Museum"
URL: http://www.cs.bham.ac.uk/~jpi/synth/museum.html
comp.speech refs: [1] - [2]
JTS Micro Consulting Ltd: PAM, JTS Reader and Listen2
URL: http://www.islandnet.com/jts/
comp.speech refs: [1]
Kevin Lenzo's page of Speech Applications for the Macintosh
URL: http://www.cs.cmu.edu/~lenzo/mac_speech_apps.html
comp.speech refs: [1] - [2] - [3] - [4] - [5] - [6]
Laureate speech synthesis from British Telecom
URL: http://www.labs.bt.com/innovate/speech/laureate/
comp.speech refs: [1]
Lernout and Hauspie home page
URL: http://www.lhs.com/
comp.speech refs: [1] - [2] - [3] - [4] - [5] - [6]
Lernout and Hauspie text-to-speech
URL: http://www.lhs.com/tts.html
comp.speech refs: [1] - [2]
Listen2 web page
URL: http://www.islandnet.com/jts/listen2.htm
comp.speech refs: [1]
Lucent Technologies Bell Labs Text-to-Speech
URL: http://www.bell-labs.com/project/tts/
comp.speech refs: [1] - [2]
Lucent Technologies Bell Labs Text-to-Speech: system description
URL: http://www.bell-labs.com/project/tts/tts-overview.html
comp.speech refs: [1]
Lyricos singing speech synthesis
URL: http://www.cse.ogi.edu/CSLU/research/TTS/research/sing.html
comp.speech refs: [1]
Macintosh Speech Page: Speech Manager and PlainTalk
URL: http://www.speech.apple.com/
comp.speech refs: [1] - [2] - [3] - [4]
MacYack Pro Speech Synthesis software
URL: http://www.lowtek.com/macyack/
comp.speech refs: [1]
MBROLA speech synthesis demonstration
URL: http://tcts.fpms.ac.be/synthesis/modelcmp.html
comp.speech refs: [1] - [2]
FTP Links
Elan Informatique
URL: ftp://ftp.elan.fr/
comp.speech refs: [1]
Elan Informatique: Proverbe documentation
URL: ftp://ftp.elan.fr/Voice_products/Text-To-Speech_Synthesis_Products/ProVerbe_Speech_Engine/SDKEN.DOC
comp.speech refs: [1]
Festival Speech Synthesis System: source
URL: ftp://ftp.cstr.ed.ac.uk/pub/festival/1.1.1/
comp.speech refs: [1]
Hadifix speech synthesis demo software
URL: ftp://asl1.ikp.uni-bonn.de/pub/hadifix/hadi25.lzh
comp.speech refs: [1]
Hadifix speech synthesis demo software
URL: ftp://asl1.ikp.uni-bonn.de/pub/hadifix/hadidemo.zip
comp.speech refs: [1]
Klatt speech synthesis software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/klatt.3.04.tar.gz
comp.speech refs: [1]
Klatt speech synthesis software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/klatt.3.04.tar.Z
comp.speech refs: [1]
KPE80 - A Klatt Synthesiser and Parameter Editor
URL: ftp://ftp.phon.ucl.ac.uk/pub/kpe/kpe80.linux.tar.Z
comp.speech refs: [1]
KPE80 - A Klatt Synthesiser and Parameter Editor
URL: ftp://ftp.phon.ucl.ac.uk/pub/kpe/kpe80.src.tar.Z
comp.speech refs: [1]
KPE80 - A Klatt Synthesiser and Parameter Editor
URL: ftp://ftp.phon.ucl.ac.uk/pub/kpe/kpe80.sun413.tar.Z
comp.speech refs: [1]
KPE80 - A Klatt Synthesiser and Parameter Editor
URL: ftp://ftp.phon.ucl.ac.uk/pub/kpe/OVERVIEW.ps
comp.speech refs: [1]
Narrator Translator Library
URL: ftp://ftp.doc.ic.ac.uk/pub/aminet/dev/src/tran42src.lha
comp.speech refs: [1]
Narrator Translator Library
URL: ftp://ftp.doc.ic.ac.uk/pub/aminet/util/libs/translator42.lha
comp.speech refs: [1]
PAM: a talking personal assistant and text reader application
URL: ftp://ftp.islandnet.com/jts/pam_en3c.zip
comp.speech refs: [1]
Personal TrueTalk from Entropic
URL: ftp://ftp.entropic.com/pub/truetalk/README.ptt
comp.speech refs: [1]
rsynth: speech synthesis software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/rsynth-2.0.tar.gz
Newsgroups
Soundcards and Games newsgroup
URL: news:comp.sys.ibm.pc.soundcard.games
comp.speech refs: [1]
Soundcards and Music newsgroup
URL: news:comp.sys.ibm.pc.soundcard.music
comp.speech refs: [1]
Speech technology newsgroup
URL: news:comp.speech
comp.speech refs: [1] - [2] - [3] - [4]
Telecommunications newsgroup
URL: news:comp.dcom.telecom
comp.speech refs: [1]
SpeechLinks Pages
WWW Links
1stVoice Dragon Systems reseller
URL: http://www.1stvoice.com/
comp.speech refs: [1]
21st Century Eloquence: speech recognition reseller
URL: http://www.voicerecognition.com/
comp.speech refs: [1]
Advanced Recognition Technologies, Inc: smARTspeak
URL: http://www.artcomp.com/speak.htm
comp.speech refs: [1]
Applied Language Technologies, Inc.: SpeechWorks
URL: http://www.altech.com/
comp.speech refs: [1]
ART: Advanced Recognition Technologies, Inc
URL: http://www.artcomp.com/
comp.speech refs: [1]
Articulate Systems PowerSecretary speech recognition
URL: http://www.artsys.com/
comp.speech refs: [1]
AT&T Advanced Speech Products Group home page
URL: http://www.att.com/aspg/
comp.speech refs: [1] - [2] - [3]
URL: http://www.commandcorp.com/incube_welcome.html
comp.speech refs: [1]
IN CUBE Mark II Pro for Windows NT
URL: http://www.commandcorp.com/cci/pront.html
comp.speech refs: [1]
IN CUBE Voice Command for Sun SPARCstations
URL: http://www.commandcorp.com/cci/in3sparc.html
comp.speech refs: [1]
Infolingua Bibliographies
URL: http://gomer.mlink.net/infolingua.html
comp.speech refs: [1] - [2] - [3]
Institut fur Phonetik at Johann Wolfgang Goethe-Universitat Frankfurt
URL: http://www.uni-frankfurt.de/~ifb/ipf1.html
comp.speech refs: [1] - [2] - [3]
Institute for Signal and Information Processing (ISIP) at Mississippi State University
URL: http://www.isip.msstate.edu/
comp.speech refs: [1] - [2] - [3] - [4]
International Computer Science Institute in Berkeley, CA
URL: http://www.icsi.berkeley.edu/
comp.speech refs: [1]
Johann Wolfgang Goethe-Universitat Frankfurt
URL: http://www.uni-frankfurt.de/
comp.speech refs: [1] - [2] - [3]
Kevin Lenzo's page of Speech Applications for the Macintosh
URL: http://www.cs.cmu.edu/~lenzo/mac_speech_apps.html
comp.speech refs: [1] - [2] - [3] - [4] - [5] - [6]
Keyware S2 Security Service
URL: http://www.keywareusa.com/Products/S2SecurityServer/main.html
comp.speech refs: [1]
Keyware Technologies Biometric Verification
URL: http://www.keywareusa.com/
comp.speech refs: [1]
Keyware VoiceGuardian
URL: http://www.keywareusa.com/Products/VoiceGuardian/main.html
comp.speech refs: [1]
Keyware VoiceGuardian online demo
URL: http://www.keywareusa.com/Demos/
comp.speech refs: [1]
Kurzweil Clinical Reporter speech recognition
URL: http://www.kurzweil.com/medical/
comp.speech refs: [1]
Kurzweil Voice for Windows: speech recognition
URL: http://www.kurzweil.com/
comp.speech refs: [1]
LawTalk from WildCard
URL: http://www.wildcardtech.com/speech/info/lawtalk.htm
comp.speech refs: [1]
Lernout and Hauspie home page
URL: http://www.lhs.com/
comp.speech refs: [1] - [2] - [3] - [4] - [5] - [6]
Lernout and Hauspie speech recognition
URL: http://www.lhs.com/asr.html
comp.speech refs: [1] - [2]
Lists of References on Automatic Speaker Verification
URL: http://sig.enst.fr/~chollet/ForMehdi/SpRecV1.l_ind.html
comp.speech refs: [1]
URL: http://www.isip.msstate.edu/publications/1996/speech_recognition_short_course
comp.speech refs: [1]
Speech Recognition List: Applied Speech Technology Laboratory of CSLI at Stanford
URL: http://csli-www.stanford.edu/users/bscott/SRTech.html
comp.speech refs: [1]
Speech Systems Phonetic Engine speech recognition
URL: http://www.speechsys.com/
comp.speech refs: [1]
Speech Toys
URL: http://www.speechtoys.com/
comp.speech refs: [1] - [2]
Speech Toys page on Speech Recognition
URL: http://www.speechtoys.com/spchtoys/sprec.html
comp.speech refs: [1]
SpeechPrint ID from Voice Control Systems, Inc.
URL: http://www.voicecontrol.com/speechid.html
comp.speech refs: [1]
Spoken Language Systems Group at the Massachusetts Institute of Technology
URL: http://www.sls.lcs.mit.edu/
comp.speech refs: [1]
Survey of the State of the Art in Human Language Technology
URL: http://www.cse.ogi.edu/CSLU/HLTsurvey/HLTsurvey.html
comp.speech refs: [1] - [2] - [3]
Survey of the State of the Art in Human Language Technology: Speaker Recognition
URL: http://www.cse.ogi.edu/CSLU/HLTsurvey/ch1node47.html
comp.speech refs: [1]
Survey of the State of the Art in Human Language Technology: Spoken Input Technologies
URL: http://www.cse.ogi.edu/CSLU/HLTsurvey/ch1node2.html
comp.speech refs: [1]
Synapse: speech recognition sales
URL: http://www.synapseadaptive.com/
comp.speech refs: [1]
Talk Technology, Inc.: Speech recognition reseller
URL: http://www.usbusiness.com/talk/
comp.speech refs: [1]
Talk Technology: speech recognition reseller
URL: http://www.talktechnology.com/
comp.speech refs: [1]
Talking to a PC May Be Hazard To Your Throat, by Julie Chao
URL: http://www.bilbo.com/tae/bilbo/wsj.html
comp.speech refs: [1]
Talking to Computers Has its Hazards, by Gordon Arnaut
URL: http://www.bilbo.com/tae/bilbo/globmail.html
comp.speech refs: [1]
T-Netix speaker verification for cellular communications
URL: http://www.t-netix.com/
comp.speech refs: [1]
Tony Robinson's home page
URL: http://svr-www.eng.cam.ac.uk/~ajr/
comp.speech refs: [1] - [2] - [3] - [4] - [5]
ToppCopy Telecom: Speech recognition reseller
URL: http://www.toppcopy.com/
comp.speech refs: [1]
Typing Injuries Page
URL: http://alumni.caltech.edu/~dank/typing-archive.html
comp.speech refs: [1]
FTP Links
AbbotDemo speech recognition software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/AbbotDemo
comp.speech refs: [1]
comp.speech ftp site: speech recognition software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/
comp.speech refs: [1] - [2] - [3]
Digital Dreams Speech Recognition Plug-Ins
URL: ftp://ftp.surftalk.com/
comp.speech refs: [1]
Do-it-yourself speech recognition
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/DIY_SpeechRecognition
comp.speech refs: [1]
EARS speech recognition software
URL: ftp://sunsite.unc.edu/pub/Linux/apps/sound/speech/ears-0.26.tar.gz
comp.speech refs: [1]
EARS speech recognition software
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/ears-0.26.tar.gz
comp.speech refs: [1]
Jialong He's Speaker Recognition (Identification) Research Tool: MSDOS version (Germany)
URL: ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/spkrtool.zip
comp.speech refs: [1]
Jialong He's Speaker Recognition (Identification) Research Tool: MSDOS version (UK)
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spkrtool.zip
comp.speech refs: [1]
Jialong He's Speaker Recognition (Identification) Research Tool: Sun version (UK)
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spkr_sun_v1.tar.gz
comp.speech refs: [1]
Jialong He's Speaker Recognition (Identification) Tool: Sun version (Germany)
URL: ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/speaker_sun_v1.tar.gz
comp.speech refs: [1]
Jialong He's Speech Recognition Research Tool: MSDOS version (Germany)
URL: ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/spchtool.zip
comp.speech refs: [1]
Jialong He's Speech Recognition Research Tool: MSDOS version (UK)
URL: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spchtool.zip
comp.speech refs: [1]
Newsgroups
Artificial Intelligence newsgroup
URL: news:comp.ai
comp.speech refs: [1]
Compression newsgroup
URL: news:comp.compression
comp.speech refs: [1]
Digital signal processing newsgroup
URL: news:comp.dsp
comp.speech refs: [1] - [2]
FAQ postings for all newsgroups
URL: news:news.answers
comp.speech refs: [1]
FAQ postings for computer-related newsgroups
URL: news:comp.answers
comp.speech refs: [1]
GUS soundcard discussion group
URL: news:comp.sys.ibm.pc.soundcard.GUS
Improvements can be achieved by increasing the number of bits per sample to 12 or 16 bits, or by
using a non-linear encoding technique such as mu-law or A-law (see Q2.7). This improves the
signal-to-noise ratio.
Increasing the sampling rate above 8 kHz, say to 10 kHz, 16 kHz or 20 kHz, improves the frequency
response: the higher the sampling frequency, the better the high frequency content is preserved. A 16 kHz
sampling rate is a reasonable target for high quality speech recording and playback.
When doing speech recognition, remember that your computer is not as good as your
ear, so it will have trouble with poor quality sounds. The choice of an appropriate sampling setup
depends very much on the speech recognition task and the amount of computer power available.
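The trade-offs above can be put into rough numbers. The following is a minimal sketch in C; the function names are illustrative and do not come from any library. The 6.02 * N + 1.76 dB figure is the textbook SNR of an ideal N-bit linear quantiser driven by a full-scale sine wave, so real recordings will usually do worse.

```c
#include <assert.h>

/* Ideal SNR (dB) of an N-bit uniform (linear) quantiser for a
   full-scale sine wave: roughly 6 dB per extra bit. */
double ideal_snr_db(int bits)
{
    assert(bits > 0);
    return 6.02 * bits + 1.76;
}

/* Highest frequency (Hz) that can be represented at a given
   sampling rate (the Nyquist frequency). */
double nyquist_hz(double sample_rate_hz)
{
    assert(sample_rate_hz > 0.0);
    return sample_rate_hz / 2.0;
}
```

For example, nyquist_hz(8000.0) gives 4000 Hz and nyquist_hz(16000.0) gives 8000 Hz, which is why the higher rate preserves more of the high frequency content of speech.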
● http://www.bdti.com/faq/dsp_faq.htm
● ftp://rtfm.mit.edu/pub/usenet/comp.dsp/
● Most of the speech processing environments listed in Q1.9 including CSRE, ESPS, Kay
Elemetrics Computer Speech Lab, OGI Speech Tools, Speech Filing System, Signalyze,
Soundscope.
● ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/tools/ep.1.0.tar.gz
● ftp://ftp.isip.msstate.edu/pub/software/signal_detector/sigd_v2.2.tar.gz
Plenty of research papers have been presented on end-pointing. Try the following:
● Rabiner LR, Sambur MR, "An Algorithm for Determining the Endpoints of Isolated
Utterances", Bell System Technical Journal, Vol 54, No. 2, pp 297-315, 1975.
● Drago, P.G. et al. "Digital Dynamic Speech Detectors." IEEE Trans on Communications, Vol
26, No 1, Jan 78, pp. 140-145.
● Newman, W.C. "Detecting Speech with an Adaptive Neural Network." Electronic Design, 22
March 1990.
● Taboada, J. et al. "Explicit Estimation of Speech Boundaries." IEE Proc. Sci. Meas. Technol.,
Vol 141, No. 3, May 1994, pp 153-159.
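As a starting point before reading the papers, here is a minimal sketch of an energy-based end-point detector in C. It is not the algorithm of any of the papers listed above: it simply marks the first and last fixed-length frames whose mean energy exceeds a threshold. The function name, the frame length and the threshold are all assumptions that would need tuning for real recordings.

```c
#include <assert.h>
#include <stddef.h>

/* Find the first and last frames whose mean energy exceeds `threshold`.
   On success returns 1 and sets *start (inclusive) and *end (exclusive)
   to sample indices; returns 0 if no frame exceeds the threshold.
   Trailing samples that do not fill a whole frame are ignored. */
int find_endpoints(const short *x, size_t n, size_t frame_len,
                   double threshold, size_t *start, size_t *end)
{
    size_t nframes, f, i;
    int found = 0;

    assert(x != NULL && start != NULL && end != NULL && frame_len > 0);
    nframes = n / frame_len;
    for (f = 0; f < nframes; f++) {
        double energy = 0.0;
        for (i = 0; i < frame_len; i++) {
            double s = (double)x[f * frame_len + i];
            energy += s * s;
        }
        energy /= (double)frame_len;
        if (energy > threshold) {
            if (!found) {
                *start = f * frame_len;
                found = 1;
            }
            *end = (f + 1) * frame_len;
        }
    }
    return found;
}
```

A fixed threshold is fragile in noisy conditions; practical detectors typically estimate the background level first and often add zero-crossing information.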
Dr. Tony Robinson of the Engineering Dept of Cambridge University has put his Speech Analysis
course notes on the web. The base page is http://svr-www.eng.cam.ac.uk/~ajr/SA95/. There is
information on the following:
● Sampling theory
● Filter bank analysis
● Short-term Fourier analysis
● Linear prediction analysis
● Formant analysis and voicing analysis
● Speech coding
● and more....
Joseph Picone of the Institute for Signal and Information Processing (ISIP) at Mississippi State
University has put two sets of course notes on the web:
The Signal Processing Home page has information on a range of DSP issues. It includes references to
a range of software and much more.
http://tjev.tel.etf.hr/josip/DSP/sigproc.html
There are many good books which discuss signal processing for speech:
Can anyone provide information for SGI, NeXT, other UNIX hardware and any other PC soundcards?
On Sun SPARC systems, have a look in the directory /usr/demo/SOUND, which includes table lookup macros
for ulaw conversions. [Note, however, that not all systems will have /usr/demo/SOUND installed, as it is
optional - see your system administrator if it is missing.]
/**
** Signal conversion routines for use with Sun4/60 audio chip
**/
#include <stdio.h>
/*
** This routine converts from linear to ulaw
**
** Craig Reese: IDA/Supercomputing Research Center
** Joe Campbell: Department of Defense
** 29 September 1989
**
** References:
** 1) CCITT Recommendation G.711 (very difficult to follow)
** 2) "A New Digital Technique for Implementation of Any
** Continuous PCM Companding Law," Villeret, Michel,
** et al. 1973 IEEE Int. Conf. on Communications, Vol 1,
** 1973, pg. 11.12-11.17
** 3) MIL-STD-188-113,"Interoperability and Performance Standards
** for Analog-to_Digital Conversion Techniques,"
** 17 February 1987
**
** Input: Signed 16 bit linear sample
** Output: 8 bit ulaw sample
*/
unsigned char
linear2ulaw(int sample)
{
static int exp_lut[256] = {0,0,1,1,2,2,2,2,3,3,3,3,3,3,3,3,
4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7};
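int sign, exponent, mantissa;
unsigned char ulawbyte;
/* Get the sample into sign-magnitude form. */
sign = (sample >> 8) & 0x80; /* set aside the sign */
if (sign != 0) sample = -sample; /* get the magnitude */
if (sample > 32635) sample = 32635; /* clip the magnitude */
/* Convert from 16 bit linear to ulaw. */
sample = sample + 0x84; /* add the bias for 16 bit samples */
exponent = exp_lut[(sample >> 7) & 0xFF];
mantissa = (sample >> (exponent + 3)) & 0x0F;
ulawbyte = ~(sign | (exponent << 4) | mantissa);
return(ulawbyte);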
}
/*
** This routine converts from ulaw to 16 bit linear.
**
** Craig Reese: IDA/Supercomputing Research Center
** 29 September 1989
**
** References:
** 1) CCITT Recommendation G.711 (very difficult to follow)
**
** Input: 8 bit ulaw sample
** Output: Signed 16 bit linear sample
*/
int
ulaw2linear(unsigned char ulawbyte)
{
static int exp_lut[8] = {0,132,396,924,1980,4092,8316,16764};
int sign, exponent, mantissa, sample;
ulawbyte = ~ulawbyte;
sign = (ulawbyte & 0x80);
exponent = (ulawbyte >> 4) & 0x07;
mantissa = ulawbyte & 0x0F;
sample = exp_lut[exponent] + (mantissa << (exponent + 3));
if (sign != 0) sample = -sample;
return(sample);
}
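The eight constants in the decoding table of ulaw2linear are not arbitrary: each entry equals (132 << exponent) - 132, where 132 (0x84) is the bias conventionally added by mu-law encoders of this style. The sketch below decodes without a table to make that relationship visible. The function name and the ULAW_BIAS macro are illustrative and not part of the original routines.

```c
#include <assert.h>

#define ULAW_BIAS 132  /* 0x84; illustrative name for the usual bias */

/* Decode one mu-law byte to a linear sample without a lookup table.
   Equivalent to exp_lut[exponent] + (mantissa << (exponent + 3))
   with exp_lut[e] == (ULAW_BIAS << e) - ULAW_BIAS. */
int ulaw_decode_no_table(unsigned char ulawbyte)
{
    unsigned int u = (unsigned int)(~ulawbyte) & 0xFFu; /* undo inversion */
    int exponent = (int)((u >> 4) & 0x07u);
    int mantissa = (int)(u & 0x0Fu);
    int sample = (((mantissa << 3) + ULAW_BIAS) << exponent) - ULAW_BIAS;

    assert(sample >= 0 && sample <= 32124);
    return (u & 0x80u) ? -sample : sample;
}
```

The largest magnitude this produces is 32124, so a mu-law round trip never quite reaches the full 16 bit range.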
On the Web
The following sites provide lists of useful DSP software. Not all the software is directly applicable to
speech processing.
comp.dsp FAQ
http://www.bdti.com/faq/dsp_faq.htm
DSP Internet Resources
http://www.eg3.com/
http://www.eg3.com/dsp.htm
Poynton's Digital Signal Processing Resource List
http://www.inforamp.net/~poynton/Poynton-dsp.html
WWW Pages Relating to Sound Computation
http://datura.cerl.uiuc.edu/netstuff/sigsoundLinks.html
Yahoo - Signal and Image Processing
http://www.yahoo.com/Science/Engineering/Electrical_Engineering/Signal_and_Image_Processing/
Sound Related Resources
http://pscinfo.psc.edu/~geigel/menus/sound.html
SPLIB: Signal Processing url LIBrary
http://jazz.rice.edu/splib/
Wavelet's Home Page
http://www.mat.sbg.ac.at/~uhl/wav.html
Maintainer
The FAQ posting and the Comp.Speech WWW Site are maintained on a volunteer basis by
Andrew Hunt
Speech Applications Group, Sun Microsystems Laboratories
Two Elizabeth Drive, Chelmsford, MA, 01824-4195, USA
comp.speech FAQ /
WWW
Submission of Information
Any updates of information, corrections or suggestions are welcome. Please note that it may take me a
week or two to respond.
Andrew Hunt
Sun Microsystems Laboratories
Two Elizabeth Drive, Chelmsford, MA, 01824-4195, USA
Ph: (508) 442 2681
Email: andrew.hunt@east.sun.com
The aim of speech compression is to produce a compact representation of speech sounds such that,
when reconstructed, it is perceived to be close to the original. The two main measures of closeness are
intelligibility and naturalness.
The standard reference point is toll quality speech, i.e. what would be expected over a
telephone line: for example, speech sampled at 8 kHz using 8 bit ulaw coding, with a maximum frequency
of about 3.3 kHz. This gives a bit rate of 64 kbps, which is already a compressed form relative to (say) the 16
bit, 16 kHz speech that is standard in speech recognition work.
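The bit rates quoted in this section follow directly from sampling rate times bits per sample. A minimal sketch in C (the function name is illustrative):

```c
#include <assert.h>

/* Bit rate in bits per second of uncompressed single-channel PCM. */
long bit_rate_bps(long sample_rate_hz, int bits_per_sample)
{
    assert(sample_rate_hz > 0 && bits_per_sample > 0);
    return sample_rate_hz * (long)bits_per_sample;
}
```

With these definitions, 8 kHz at 8 bits gives 64000 bps (toll quality ulaw), 16 kHz at 16 bits gives 256000 bps, and 8 kHz at 4 bits gives 32000 bps. Toll quality speech is therefore a 4:1 reduction relative to 16 bit, 16 kHz speech.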
ulaw coding does not exploit the (normally large) sample to sample correlations found in speech.
ADPCM is the next family of speech coding techniques, and does exploit this redundancy by using a
simple linear filter to predict the next sample of speech. The resulting prediction error is typically
quantised to 4 bits thus giving a bit rate of 32 kbps (see, for example, the software in Q3.3: 32 kbps
ADPCM, G.711/721/723 Compression, shorten). The advantages of ADPCM are that it is simple to
implement and has very low delay.
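To illustrate the predict-then-quantise idea, here is a minimal sketch in C of a non-adaptive DPCM coder. It is not G.721 or any other standard: the predictor is simply the previous reconstructed sample, and the step size is fixed where real ADPCM adapts it. The names dpcm_encode, dpcm_decode and STEP are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define STEP 256  /* fixed quantiser step (assumption); ADPCM adapts this */

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Encode n 16 bit samples as 4 bit codes in the range -8..7, stored
   one per byte for clarity. The encoder tracks the decoder's
   reconstruction so that the two stay in step. */
void dpcm_encode(const short *in, signed char *codes, size_t n)
{
    int pred = 0;
    size_t i;

    assert(in != NULL && codes != NULL);
    for (i = 0; i < n; i++) {
        int err = in[i] - pred;                  /* prediction error */
        int code = clamp_int(err / STEP, -8, 7); /* quantise to 4 bits */
        codes[i] = (signed char)code;
        pred = clamp_int(pred + code * STEP, -32768, 32767);
    }
}

/* Rebuild the signal by accumulating the dequantised errors. */
void dpcm_decode(const signed char *codes, short *out, size_t n)
{
    int pred = 0;
    size_t i;

    assert(codes != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        pred = clamp_int(pred + codes[i] * STEP, -32768, 32767);
        out[i] = (short)pred;
    }
}
```

With a fixed step the coder cannot follow large jumps (slope overload) and wastes resolution on quiet passages, which is the problem that adapting the step size addresses.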
To obtain more compression, specific properties of the speech signal must be modelled. The main
assumption is known as the source filter model of speech production. This assumes that a source
(voicing or fricative excitation) is passed through a filter (the vocal tract response) to produce the
speech. The simplest implementation of this is known as an LPC synthesiser (e.g. LPC10e). At every
frame the speech is analysed to compute the filter coefficients, the energy of the excitation, a voicing
decision, and a pitch value if voiced. At the decoder, a regular set of pulses for voiced speech or white
noise for unvoiced speech is passed through the linear filter and multiplied by the gain to produce the
speech. This is a very efficient system and typically produces speech coded at 1200-2400bps. With
clever acoustic vector prediction this can be reduced to 300-600bps. The disadvantages are a loss of
naturalness over most of the speech and occasionally a loss of intelligibility.
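The decoder described above can be sketched in a few lines of C. This is a simplified illustration, not LPC10e: the function name, the sign convention of the coefficients, and the use of rand() as a white noise source are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Synthesise one frame with an all-pole filter (assumed convention):
 *   s[n] = gain * e[n] + a[0]*s[n-1] + ... + a[order-1]*s[n-order]
 * mem[] holds the previous `order` outputs (mem[0] is the most recent)
 * and *phase counts samples since the last pitch pulse; both carry
 * over from one frame to the next. */
void lpc_synth_frame(const double *a, int order, double gain,
                     int voiced, int pitch_period, int *phase,
                     double *mem, double *out, size_t frame_len)
{
    size_t n;
    int k;

    assert(a && phase && mem && out && order > 0);
    assert(!voiced || pitch_period > 0);
    for (n = 0; n < frame_len; n++) {
        double e, s;
        if (voiced) {
            /* regular pulse train at the pitch period */
            e = (*phase == 0) ? 1.0 : 0.0;
            *phase = (*phase + 1) % pitch_period;
        } else {
            /* uniform white noise in [-1, 1] */
            e = 2.0 * (double)rand() / (double)RAND_MAX - 1.0;
        }
        s = gain * e;
        for (k = 0; k < order; k++)
            s += a[k] * mem[k];
        for (k = order - 1; k > 0; k--)
            mem[k] = mem[k - 1];
        mem[0] = s;
        out[n] = s;
    }
}
```

Per frame, only the coefficients, the gain, the voicing flag and the pitch period need to be transmitted, which is where the very low bit rates come from.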
The CELP family of coders compensates for the lack of quality of the simple LPC model by using
more information in the excitation. Each vector in a codebook of excitation vectors is tried, and the
index of the one that best matches the original speech is transmitted. This increases the
bit rate to typically 4800-9600bps. Most speech coding research is currently directed towards CELP
coders. (See, for example, CELP 3.2a, a TMS implementation, a G.728 LD-CELP vocoder, and the
L&H implementation.)
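The codebook search can be sketched as an exhaustive loop. The C code below is a simplification under stated assumptions: the candidate vectors are taken to have already been passed through the synthesis filter, no perceptual weighting is applied, and the function name and flat codebook layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Exhaustive codebook search. `codebook` holds n_vectors rows of
   `dim` values each. For every candidate c the best gain is
   g = <t,c>/<c,c>, and the squared error left over is
   <t,t> - <t,c>^2/<c,c>, so the best candidate maximises
   <t,c>^2/<c,c>. Returns the winning index and its gain. */
size_t codebook_search(const double *target, const double *codebook,
                       size_t n_vectors, size_t dim, double *gain)
{
    size_t best = 0, i, k;
    double best_score = -1.0;

    assert(target && codebook && gain && n_vectors > 0 && dim > 0);
    *gain = 0.0;
    for (i = 0; i < n_vectors; i++) {
        const double *c = codebook + i * dim;
        double corr = 0.0, energy = 0.0, score;
        for (k = 0; k < dim; k++) {
            corr += target[k] * c[k];
            energy += c[k] * c[k];
        }
        if (energy <= 0.0)
            continue; /* skip all-zero vectors */
        score = corr * corr / energy;
        if (score > best_score) {
            best_score = score;
            best = i;
            *gain = corr / energy;
        }
    }
    return best;
}
```

Only the index and a quantised gain are sent to the decoder, which holds the same codebook. The cost of this loop for every subframe is why much CELP work goes into faster search structures.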
● Makhoul, J. "Linear Prediction: A Tutorial Review." Proc. of the IEEE 63 (1975): 561 - 580.
On the WWW
comp.compression FAQ
Includes a few questions and answers on the compression of speech.
ftp://rtfm.mit.edu/pub/usenet/comp.compression/
Tony Robinson's Speech Analysis Course
A complete course on speech analysis, including some stuff on speech coding.
http://svr-www.eng.cam.ac.uk/~ajr/SA95/
http://svr-www.eng.cam.ac.uk/~ajr/SA95/node78.html
ITU Coding Standards
Members of the ITU (International Telecommunications Union) can obtain copies of the Series
G Recommendations (including G.711/721/723/728) from the ITU WWW site
(http://www.itu.ch/) and from http://www.itu.ch/itudoc/itu-t/rec/g/g700-799.html.
Jason Woodard's Speech Coding Page
Introduction to speech coding plus information on a series of speech coding standards.
http://www-mobile.ecs.soton.ac.uk/speech_codecs/index.html
WWW searchable online bibliography for Phonetics and Speech Technology
Over 8000 entries provided by Institut fur Phonetik at Johann Wolfgang Goethe-Universitat
Frankfurt.
http://www.uni-frankfurt.de/~ifb/bib_engl.html
Ciaran McElroy's Speech Coding Page
Introduction to many types of speech coding.
http://wwwdsp.ucd.ie/speech/tutorial/speech_coding/speech_tut.html
32 kbps ADPCM
Castleton Network Systems - G.729 Voice Coder
CELP 3.2a & LPC-10
8 Kbit/s CELP on the TMS320C5x family of DSP chips
CyberVoice
Rockwell's DigiTalk
File format conversion
G.711/721/723 Compression
G.728 LD-CELP vocoder
G.728 Compression
GSM 06.10 Compression
Lernout & Hauspie Speech Coding (5 products)
Lernout & Hauspie Speech Coding SDK
MPEG Audio
shorten - a lossless compressor for speech signals
Sipro Lab Telecom Inc. Coding
Sonarc: Digital Audio Compression
StarAudio Compressor/Player
TrueSpeech from DSP Group
U.S.F.S. 1016 CELP vocoder for DSP56001
ToolVox from Voxware
Speech Synthesis
comp.speech FAQ Section 5
More sophisticated, but lower in quality, are algorithms which split the speech into smaller pieces. The
smaller the units are, the fewer of them are needed, but the quality also decreases. An often used unit
is the phoneme, the smallest linguistic unit. Depending on the language used there are about 35-50
phonemes in western European languages, i.e. there are 35-50 single recordings. The problem is that
combining them into fluent speech requires fluent transitions between the elements. The intelligibility
is therefore lower, but the memory required is small.
A solution to this dilemma is using diphones. Instead of splitting at the transitions, the cut is done at
the center of the phonemes, leaving the transitions themselves intact. This gives about 400 elements
(20*20) and the quality increases.
The longer the units become, the more elements there are, but the quality increases along with the
memory required. Other units which are widely used are half-syllables, syllables, words, or
combinations of them, e.g. word stems and inflectional endings.
The Museum of Speech Analysis and Synthesis has pictures of artificial speech systems going back
over 150 years: worth a visit. ( http://mambo.ucsc.edu/psl/smus/smus.html)
Q5.3: References/Books on
Synthesis
Books and Papers
On the WWW
● WWW searchable online bibliography for Phonetics and Speech Technology with more than
8000 entries. Provided by Institut fur Phonetik at Johann Wolfgang Goethe-Universitat
Frankfurt.
http://www.uni-frankfurt.de/~ifb/bib_engl.html
● Computational Speech Processing: Speech Analysis, Recognition, Understanding,
Compression, Transmission, Coding, Synthesis ; Text to Speech Systems, Speech to Tactile
Displays, Speaker Identification, Prosody Processing : BIBLIOGRAPHY, by Conrad F.
Sabourin, 1994, 2 volumes, 1187p, ISBN 2-921173-21-2, INFOLINGUA inc., P.O. Box 187
Snowdon, Montreal, H3X 3T4, Canada.
See also: http://gomer.mlink.net/infolingua.html
❍ Eurovocs
❍ DECtalk
❍ KTH-Stockholm
http://www.cse.ogi.edu/CSLU/research/TTS
Examples of diphone speech corpora and algorithms developed at OGI for synthesis of American English and Mexican
Spanish using the Festival framework.
Lyricos
http://www.cse.ogi.edu/CSLU/research/TTS/research/sing.html
Demos of the Lyricos singing voice synthesis system. Concatenation-based synthesis of singing voice from MIDI input.
Multi-Lingual TTS from Gerhard-Mercator University, Duisburg
http://www.fb9-ti.uni-duisburg.de/demos/speech.html
Synthesis in German, English or Japanese.
TMH: Institutionen for Taloverforing och Musikakustik, Kungliga Tekniska Hogskolan
http://www.speech.kth.se/info/software.html
Synthesis in Swedish, Finnish, Norwegian, Icelandic, Danish, British and American English, French, German, Italian, Spanish,
LA Spanish and Greek.
Haskins Laboratory WWW Site
http://www.haskins.yale.edu/Haskins/MISC/special.html
Examples of several types of speech synthesis. Articulatory Synthesis by HyperASY. SineWave Synthesis. Gestural
Computational Model. Pattern Playback system of the 1940's!
BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
http://www.bestspeech.com/weblang.html
Eurovocs Multilingual Speech Synthesis
http://www.elis.rug.ac.be/ELISgroups/speech/research/eurovocs.html
Based on Lernout and Hauspie technology.
HADIFIX German Speech Synthesis
http://asl1.ikp.uni-bonn.de/~tpo/Hadiq.en.html
Provided by the Instituts fur Kommunikationsforschung und Phonetik, Universitat Bonn.
Centigram's TruVoice Demo
http://www.centigram.com/centigram/TruVoice/index.html
Allows control of speech rate, pitch and other prosodic characteristics.
MBROLA: Free Speech Synthesis Project
http://tcts.fpms.ac.be/synthesis/modelcmp.html
WWW demo of MBROLA which compares the quality of PSOLA, MBR-PSOLA, LPC, and Hybrid Harmonic/Stochastic
concatenative synthesizers. Provided by the TCTS Lab, Faculté Polytechnique de Mons, Belgium.
Institute of Phonetic Sciences
http://fonsg3.let.uva.nl/IFA-Features.html
Links to lots of on-line speech synthesis demonstrations provided by the Institute of Phonetic Sciences of the Faculty of Arts
of the University of Amsterdam.
Yahoo page on speech generation
http://www.yahoo.com/Science/Computer_Science/Artificial_Intelligence/Natural_Language_Processing/Speech_Generation/
In the FAQ...
Apple Macintosh
BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
Infovox Product Range
Macintosh Speech Output Applications
Macintosh Speech Synthesis Manager
MacYack Pro
MBROLA: Free Speech Synthesis Project
ProVoice Developer's Speech Toolkit from First Byte
SENSYN speech synthesizer
Sound Bytes Developer's Kit
DOS
CSRE: Computerized Speech Research Environment
Infovox Product Range
MBROLA: Free Speech Synthesis Project
ProVoice Developer's Speech Toolkit from First Byte
SENSYN speech synthesizer
spchsyn.exe
Tinytalk
ZMD Speech Synthesis
OS/2
ProVerbe Speech Engine from ELAN Informatique
ProVoice Developer's Speech Toolkit from First Byte
Sound Bytes Developer's Kit
Unix
AcuVoice
AsTeR
BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
DECtalk: Text-to-Speech from Digital
ETI-Eloquence
Emacspeak - A Speech Output Subsystem For Emacs
Festival Speech Synthesis System
JSRU
Klatt-style synthesiser
KPE80 - A Klatt Synthesiser and Parameter Editor
"learph": Trainable text-to-phoneme software by Antonio Lucca
Lucent Technologies Bell Labs Text-to-Speech system
Other Platforms
BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
TheBigMouth (NeXT)
MBROLA: Free Speech Synthesis Project
Narrator Translator Library (Amiga)
Narrator (Amiga)
TextToSpeech Kit (NeXT)
Orator from Bellcore
SENSYN speech synthesizer
WreadFiles: File reader for Commodore Amiga
Unknown
Lernout and Hauspie Text-To-Speech (3 products)
SIMTEL
Text to Phoneme Program 1
Text to phoneme program 2
Text to phoneme program 3
Speech Recognition
comp.speech FAQ Section 6
Automatic speech understanding is the process by which a computer maps an acoustic speech signal
to some form of abstract meaning of the speech.
A speaker independent system is developed to operate for any speaker of a particular type (e.g.
American English). These systems are the most difficult and most expensive to develop, and their accuracy is
lower than that of speaker dependent systems. However, they are more flexible.
A speaker adaptive system is developed to adapt its operation to the characteristics of new speakers.
Its difficulty lies somewhere between that of speaker independent and speaker dependent systems.
An isolated-word system operates on single words at a time, requiring a pause between each
word. This is the simplest form of recognition to perform because the end points are easier to find and
the pronunciation of a word tends not to affect others. Thus, because the occurrences of words are more
consistent, they are easier to recognise.
A continuous speech system operates on speech in which words are connected together, i.e. not
separated by pauses. Continuous speech is more difficult to handle because of a variety of effects.
First, it is difficult to find the start and end points of words. Another problem is "coarticulation". The
production of each phoneme is affected by the production of surrounding phonemes, and similarly
the start and end of words are affected by the preceding and following words. The recognition of
continuous speech is also affected by the rate of speech (fast speech tends to be harder).
Typically speech recognition starts with the digital sampling of speech. The next stage is acoustic
signal processing. Most techniques include spectral analysis; e.g. LPC analysis (Linear Predictive
Coding), MFCC (Mel Frequency Cepstral Coefficients), cochlea modelling and many more.
The next stage is recognition of phonemes, groups of phonemes and words. This stage can be
achieved by many processes such as DTW (Dynamic Time Warping), HMM (hidden Markov
modelling), NNs (Neural Networks), expert systems and combinations of techniques. HMM-based
systems are currently the most commonly used and most successful approach.
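Of the techniques listed, DTW is the easiest to show in a few lines. The sketch below is a minimal C version under simplifying assumptions: each frame is a single scalar feature (real systems compare feature vectors such as MFCCs), the local cost is the absolute difference, and the path may step down, right or diagonally with no slope constraints. The function name is illustrative.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>
#include <stdlib.h>

/* Dynamic time warping distance between sequences a[0..n-1] and
   b[0..m-1]. Uses two rows of the cost table. Returns -1.0 if
   memory cannot be allocated. */
double dtw_distance(const double *a, size_t n, const double *b, size_t m)
{
    double *prev, *curr, *tmp, result;
    size_t i, j;

    assert(a != NULL && b != NULL && n > 0 && m > 0);
    prev = malloc((m + 1) * sizeof *prev);
    curr = malloc((m + 1) * sizeof *curr);
    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return -1.0;
    }
    prev[0] = 0.0;
    for (j = 1; j <= m; j++)
        prev[j] = DBL_MAX; /* unreachable cells */
    for (i = 1; i <= n; i++) {
        curr[0] = DBL_MAX;
        for (j = 1; j <= m; j++) {
            double d = a[i - 1] - b[j - 1];
            double best = prev[j];
            if (curr[j - 1] < best) best = curr[j - 1];
            if (prev[j - 1] < best) best = prev[j - 1];
            if (d < 0.0) d = -d;
            curr[j] = d + best; /* local cost plus cheapest predecessor */
        }
        tmp = prev; prev = curr; curr = tmp;
    }
    result = prev[m];
    free(prev);
    free(curr);
    return result;
}
```

An isolated-word recogniser built this way stores one or more templates per word and reports the word whose template has the smallest distance to the input.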
Most systems utilise some knowledge of the language to aid the recognition process.
Some systems try to "understand" speech. That is, they try to convert the words into a representation
of what the speaker intended to mean or achieve by what they said.
Doug Danforth provides a detailed account in article 253 in the comp.speech archives. A summary is
provided below. It is also available by anonymous ftp
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/DIY_SpeechRecognition
This is a simple recognizer that should give you 85%+ recognition accuracy. The accuracy is a
function of the words you have in your vocabulary. Long distinct words are easy. Short similar words
are hard. You can get 98%+ on the digits with this recognizer.
Overview:
Many variations upon the theme can be made to improve the performance. Try different filtering of
the raw signal and different processing methods.
Q6.5 contains information on public domain speech recognition software including: Lotec and Myers'
Hidden Markov Model software.
Hidden Markov Models (HMMs) are widely used in speech recognition systems. Joe Picone has put
together some demonstration software for basic discrete HMMs including Viterbi and Baum-Welch
training and evaluation, random sequence generation (generating data from a model), and model
updating (useful for incremental training). There is a simple demo program that supports all of these
modes from command line arguments. This allows experiments to test the classic coin-toss examples
commonly described in textbooks. The code closely parallels the following textbook:
● J.R. Deller, Jr., J.G. Proakis, and J.H.L. Hansen, Discrete-Time Processing of Speech Signals,
MacMillan, 1993, ISBN: 0-02-328301-7.
The code is written in C++ and is intended to facilitate learning and understanding of the algorithms.
The code is available on the ISIP web site:
http://www.isip.msstate.edu/software/
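For readers who want to see the idea before downloading anything, here is a minimal sketch in C of Viterbi decoding for a two-state discrete HMM of the coin-toss kind. It is not taken from the ISIP software (which is C++); the function name, the fixed model sizes and the array layout are assumptions. Plain probabilities are used for brevity, so long sequences would underflow; practical code works with log probabilities.

```c
#include <assert.h>
#include <stddef.h>

#define N_STATES  2   /* e.g. two coins */
#define N_SYMBOLS 2   /* e.g. 0 = heads, 1 = tails */
#define MAX_T     64  /* longest observation sequence handled here */

/* Find the most likely state sequence for obs[0..T-1]. Writes the
   states to path[0..T-1] and returns the probability of that path. */
double viterbi(const double init[N_STATES],
               double trans[N_STATES][N_STATES],
               double emit[N_STATES][N_SYMBOLS],
               const int *obs, size_t T, int *path)
{
    double delta[N_STATES], next[N_STATES];
    int back[MAX_T][N_STATES];
    size_t t;
    int i, j, best;

    assert(obs != NULL && path != NULL && T > 0 && T <= MAX_T);
    for (i = 0; i < N_STATES; i++)
        delta[i] = init[i] * emit[i][obs[0]];
    for (t = 1; t < T; t++) {
        for (j = 0; j < N_STATES; j++) {
            int arg = 0;
            double p = delta[0] * trans[0][j];
            for (i = 1; i < N_STATES; i++) {
                double q = delta[i] * trans[i][j];
                if (q > p) { p = q; arg = i; }
            }
            next[j] = p * emit[j][obs[t]];
            back[t][j] = arg; /* best predecessor of state j at time t */
        }
        for (j = 0; j < N_STATES; j++)
            delta[j] = next[j];
    }
    best = 0;
    for (i = 1; i < N_STATES; i++)
        if (delta[i] > delta[best]) best = i;
    path[T - 1] = best;
    for (t = T - 1; t > 0; t--)
        path[t - 1] = back[t][path[t]];
    return delta[best];
}
```

With a fair coin as state 0 and a coin biased towards heads as state 1, a run of heads is decoded as state 1 and a run of tails as state 0.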
● "Talk Show", Wayne Rash Jr., PC Magazine (USA), Dec 20, 1994.
● "Seybold Report on Desktop Publishing" published a nine-page, head-to-head comparison of
Dragon's DOS software with IBM's OS/2 software. March 7, 1994; Volume 8, Number 7;
Pages 3-11; ISSN:0889-9762; Seybold Publications, P.O. Box 644, Media, PA 19063 USA,
phone (610) 565-2480.
● McGraw-Hill Inc.'s "BYTE, the Magazine of Technology Integration," published a two-page
review of IBM's Personal Dictation System software. May 1994; Volume ?, Number ?; Pages
145-146; ISSN:0360-5280; Editorial, Executive, and Circulation address: One Phoenix Mill
Lane, Peterborough, NH 03458 USA, phone ?
● The National Center for Voice and Speech provides some basic information on preserving
"Vocal Health" on their WWW site: http://www.shc.uiowa.edu/hygiene/home.html
● Voice Users Mailing List: detail in Q1.4.html of the FAQ.
● Typing Injury FAQ: http://www.cs.princeton.edu:80/~dwallach/tifaq/ has a range of
information on Typing Injuries, avoiding them, alternatives and more.
● Typing Injuries Page: http://alumni.caltech.edu/~dank/typing-archive.html has links to dozens
of useful resources.
● Voice Problems -- Prevention and Correction: advice on preserving your voice with specific
hints for using speech recognition. ftp://ftp.csua.berkeley.edu/pub/typing-injury/voice-problems
● "Talking to a PC May Be Hazard To Your Throat", by Julie Chao in the Wall Street Journal.
● "Talking to Computers Has its Hazards", by Gordon Arnaut in The Globe and Mail.
On the WWW
● Survey of the State of the Art in Human Language Technology: Report edited by Ronald A.
Cole et al. with a section on Spoken Input Technologies.
http://www.cse.ogi.edu/CSLU/HLTsurvey/ch1node2.html
Technical
● Hidden Markov models for speech recognition; X.D. Huang, Y. Ariki, M.A. Jack. Edinburgh:
Edinburgh University Press, c1990
● Speech Recognition: The Complete Practical Reference Guide; T. Schalk, P. J. Foster:
Telecom Library Inc, New York; ISBN 0-936648-39-2; 377 pages; paperback only. Covers
speech recognition in a telephony environment, for readers who wish to use PC-based call
processing hardware. Dialogic hardware is used as the example throughout.
● Automatic speech recognition: the development of the SPHINX system; by Kai-Fu Lee; Boston;
London: Kluwer Academic, c1989
● An Introduction to the Application of the Theory of Probabilistic Functions of a Markov
Process to Automatic Speech Recognition, S. E. Levinson, L. R. Rabiner and M. M. Sondhi; in
Bell Syst. Tech. Jnl. v62(4), pp1035--1074, April 1983
● Review of Neural Networks for Speech Recognition, R. P. Lippmann; in Neural Computation,
v1(1), pp 1-38, 1989.
● Automatic Speech and Speaker Recognition: Advanced Topics, C.H. Lee, F.K. Soong and K.K.
Paliwal (Eds.), Kluwer, Boston, 1996.
Course Notes
● Joseph Picone of the Institute for Signal and Information Processing (ISIP) at Mississippi State
University has put the course notes for "Fundamentals of Speech Recognition" on the WWW.
The course covers background probability and phonetics/acoustics, speech signal analysis,
dynamic programming, dynamic time warping, hidden Markov modelling, language
modelling, neural networks, etc. The WWW site provides the syllabus and lecture notes.
WWW: http://www.isip.msstate.edu/publications/1996/ee_8993/
● WWW searchable online bibliography for Phonetics and Speech Technology with more than
8000 entries. Provided by Institut fur Phonetik at Johann Wolfgang Goethe-Universitat
Frankfurt.
http://www.uni-frankfurt.de/~ifb/bib_engl.html
● Computational Speech Processing: Speech Analysis, Recognition, Understanding,
Compression, Transmission, Coding, Synthesis ; Text to Speech Systems, Speech to Tactile
Displays, Speaker Identification, Prosody Processing : BIBLIOGRAPHY, by Conrad F.
Sabourin, 1994, 2 volumes, 1187p, ISBN 2-921173-21-2, INFOLINGUA inc., P.O. Box 187
Snowdon, Montreal, H3X 3T4, Canada.
See also: http://gomer.mlink.net/infolingua.html
In the FAQ:
Apple Macintosh
Digital Dreams Speech Recognition Plug-Ins
Dragon Dictation Products
Macintosh Speech Recognition Manager
PowerSecretary
DOS
DATAVOX - French
Dragon Developer Tools
Ficomp Interpreter 6000
Jialong He's Speech Recognition Research Tool
smARTspeak from Advanced Recognition Technologies, Inc.
OS/2
IBM VoiceType Dictation and Control
Unix
AbbotDemo
BBN Hark Telephony Recognizer
EARS: Single Word Recognition Package
Ficomp Interpreter 6000
Hidden Markov Model Toolkit (HTK) from Entropic
IN CUBE
Jialong He's Speech Recognition Research Tool
Lotec Speech Recognition Package
Myers' Hidden Markov Model software
NICO Artificial Neural Network Toolkit
Nuance Speech Recognition System
PureSpeech
recnet
Other Platforms
Simon Says (NeXT)
Voice Command Line Interface (Amiga)
Visus SpeechKit
Unknown
Berkeley Restaurant Project (BeRP)
Lernout & Hauspie ASR (3 products)
Voice-Trek 2.0
Voicetek Corp.
Voice Processing Corporation Speech Recognition Product Line
Jean-Pierre Lereboullet has put together a detailed list of Voice Recognition Processors which covers about 15 ICs and pieces of
related hardware (including D6106, HM2007, MSM6679, RSC-164, TC8860F/64F/65F, 5A128).
The document is available on the comp.speech ftp server:
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/VoiceRecognitionProcessors
In addition to the entries on speech recognition in this FAQ, the following WWW sites provide information on speech recognition:
1stVoice
2470 El Camino Real, Suite 110, Palo Alto CA 94306-1701
Ph: 415-857-1320, Fax: 415-856-6996
WWW: http://www.1stvoice.com/
Email: mail@1stvoice.com
Dragon Dictation Products
21st Century Eloquence
325-A Royal Poinciana Plaza, Palm Beach, Florida 33480, USA
Ph: 800-245-2133, Fax: 407-835-4901
WWW: http://www.voicerecognition.com/
Kurzweil, IBM VoiceType, Dragon, Kolvox
Auscript (Australia)
Suite 2, Level 3, 60-70 Elizabeth St, Sydney, NSW 2000, Australia
Ph: +61-2-238 6565, Fax: +61-2-238 6566
WWW: http://www.auscript.com.au/
Dragon Systems
BRITE
WWW: http://www.brite.com/
Computer Telephony Integration & Interactive Voice Response
DAX Systems, Inc.
30 Chapin Road, Unit 1201, P.O. Box 778, Pine Brook, NJ/USA 07058
Ph: +1-201-227-8111, Fax: +1-201-227-8197
Email: info@daxsystems.com
WWW: http://www.daxsystems.com/
Computer Telephony and Integrated Voice Response
HealthCare Resources
1444 Aviation Blvd, #103, Redondo Beach, CA 90278, USA
Ph: +1-310-937-5156, Fax: +1-310-937-5159
EMail: Scalif@AOL.COM
Power Secretary & Dragon Dictate. Specializing in: Medical/Dental, Motion Picture Industry, Carpal Tunnel related and
Disabled Persons.
O'Brien Resources
Ph: (540) 347-4988 (Address unknown)
Email: obrien@crosslink.net
WWW: http://www.crosslink.net/~obrien/
Kurzweil Voice Recognition Products
SCI VoiceAutomated
215 1/2 Main Street, Huntington Beach, CA 92648, USA
Ph: 800-597-6600, Ph: +1-714-969-7632, Fax: +1-714-969-0122
http://www.voiceautomated.com/
Introduction
Speaker recognition is the process of automatically recognizing who is speaking on the basis of
individual information included in speech signals. It can be divided into Speaker Identification and
Speaker Verification. Speaker identification determines which registered speaker provides a given
utterance from amongst a set of known speakers. Speaker verification accepts or rejects the identity
claim of a speaker - is the speaker the person they say they are?
Speaker recognition technology makes it possible to use the speaker's voice to control access to
restricted services, for example, phone access to banking, database services, shopping or voice mail,
and access to secure equipment.
Both technologies require users to "enroll" in the system, that is, to give examples of their speech to a
system so that it can characterise (or learn) their voice patterns.
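The difference between identification and verification can be sketched in a few lines of Python. This is only an illustrative toy, not a description of any product listed in this FAQ: the three-element feature vectors, the cosine-similarity score, the 0.8 threshold and the names enroll, identify and verify are all assumptions made for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def enroll(examples):
    """Average a speaker's example vectors into one stored voice model."""
    n = len(examples)
    return [sum(col) / n for col in zip(*examples)]

def identify(models, utterance):
    """Speaker identification: pick the best-matching enrolled speaker."""
    return max(models, key=lambda name: cosine(models[name], utterance))

def verify(models, claimed, utterance, threshold=0.8):
    """Speaker verification: accept or reject a claimed identity."""
    return cosine(models[claimed], utterance) >= threshold

# Hypothetical enrollment data: two example vectors per speaker.
models = {
    "alice": enroll([[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]]),
    "bob": enroll([[0.0, 1.0, 0.2], [0.1, 0.9, 0.3]]),
}
test_utterance = [0.95, 0.15, 0.05]

print(identify(models, test_utterance))          # which enrolled speaker?
print(verify(models, "alice", test_utterance))   # is the claim "alice" accepted?
print(verify(models, "bob", test_utterance))     # is the claim "bob" accepted?
```

Identification always returns one of the enrolled speakers, while verification answers yes or no for a single claimed identity, so only verification needs a threshold.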
In the FAQ...
SpeechLinks: General
Q1.1: What is comp.speech?
Q1.2: comp.speech ftp site
Q1.3: Common abbreviations and jargon
Q1.4: Related newsgroups and mailing lists
Q1.5: Associations, publications and conferences
Q1.6: Handicap Aids
Q1.7: Speech Databases
Q1.8: Speech File Formats and Conversion
Q1.9: Speech Laboratory Environments and Audio Editors
Q1.10: Speech Research Sites
Q1.11: Miscellaneous Software and Resources
There is now a newsgroup specifically for Natural Language Processing: comp.ai.nat-lang. A FAQ
posting is available for the group:
ftp://rtfm.mit.edu/pub/usenet/comp.ai.nat-lang/Natural_Language_Processing_FAQ
There is also a lot of useful information on Natural Language Processing in the comp.ai FAQ. That
FAQ lists available software and useful references. It includes a substantial list of software,
documentation and other info available by ftp.
comp.speech FAQ
Table of Contents
SpeechLinks: Speech Technology Hyperlinks Pages
List Of Software/Hardware
Update Times
Availability
List of Software/Hardware/Information
The comp.speech FAQ provides information on a range of software, hardware and resources.
CUSeeMe
CyberPhone
DigiPhone
InterFACE from Hijinx
FAQ: How can I use the Internet as a telephone?
Nautilus: Secure Computer Telephony
NEVOT (1.4v) from AT&T BL
PGPfone
Speak Freely
Internet Phone from VocalTec
WebPhone
WebTalk
AF version AF3R1
Voice E-Mail from Bonzi Software
MicNotePad Recording Software for Macs
MixViews
Network Audio System Release 1.1
NIST Software - SPHERE and SCORE
Sound Processing Kit
TCPplay
Auditory Modeller 1
Auditory Modeller 2
Auditory Toolbox for Matlab
Human Audio Perception Document
BEEP dictionary
CMU dictionary
CUVOLAD dictionary (Oxford Dictionary)
Comprehensive Word List
EAT: Edinburgh Associative Thesaurus
Homophone List
Moby Lexical Resources
MRC Psycholinguistic Database
WordNet
Dictionaries on the WWW
The vOICe
The Learning Company's Language Training
Wildfire - an Electronic Assistant
DOS
CSRE: Computerized Speech Research Environment
Infovox Product Range
MBROLA: Free Speech Synthesis Project
ProVoice Developer's Speech Toolkit from First Byte
SENSYN speech synthesizer
spchsyn.exe
Tinytalk
ZMD Speech Synthesis
OS/2
ProVerbe Speech Engine from ELAN Informatique
Unix
AcuVoice
AsTeR
BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
DECtalk: Text-to-Speech from Digital
ETI-Eloquence
Emacspeak - A Speech Output Subsystem For Emacs
Festival Speech Synthesis System
JSRU
Klatt-style synthesiser
KPE80 - A Klatt Synthesiser and Parameter Editor
"learph": Trainable text-to-phoneme software by Antonio Lucca
Lucent Technologies Bell Labs Text-to-Speech system
MBROLA: Free Speech Synthesis Project
Orator from Bellcore
ProVerbe Speech Engine from ELAN Informatique
rsynth
SENSYN speech synthesizer
SGI Developers Toolbox Synthesiser
Speak
TrueTalk
TruVoice from Centigram
Other Platforms
BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
TheBigMouth (NeXT)
MBROLA: Free Speech Synthesis Project
Narrator Translator Library (Amiga)
Narrator (Amiga)
TextToSpeech Kit (NeXT)
Orator from Bellcore
SENSYN speech synthesizer
WreadFiles: File reader for Commodore Amiga
Unknown
Lernout and Hauspie Text-To-Speech (3 products)
SIMTEL
Text to Phoneme Program 1
Text to phoneme program 2
Text to phoneme program 3
DOS
DATAVOX - French
Dragon Developer Tools
Ficomp Interpreter 6000
Jialong He's Speech Recognition Research Tool
smARTspeak from Advanced Recognition Technologies, Inc.
Votan VPC2100 Voice Card and VSP 1010 Speech Processor
OS/2
IBM VoiceType Dictation and Control
Unix
AbbotDemo
BBN Hark Telephony Recognizer
EARS: Single Word Recognition Package
Ficomp Interpreter 6000
Hidden Markov Model Toolkit (HTK) from Entropic
IN CUBE
Jialong He's Speech Recognition Research Tool
Lotec Speech Recognition Package
Myers' Hidden Markov Model software
NICO Artificial Neural Network Toolkit
Nuance Speech Recognition System
PureSpeech
recnet
Other Platforms
Simon Says (NeXT)
Voice Command Line Interface (Amiga)
Visus SpeechKit
Unknown
Berkeley Restaurant Project (BeRP)
Lernout & Hauspie ASR (3 products)
Voice-Trek 2.0
Voicetek Corp.
Voice Processing Corporation Speech Recognition Product Line
Most general purpose audio editing packages will be able to process speech but do not necessarily
have the specialised capabilities needed for speech work (e.g. formant analysis).
● Read, C., Buder, E., & Kent, R. "Speech Analysis Systems: An Evaluation" Journal of Speech
and Hearing Research, pp 314-332, April 1992.
❍ Signal editing
synthesizer
● Requirements: PC compatible (80486DX), 1 Meg RAM (recommend 4M), DOS 3.2
(recommend 6.22), VGA graphics (640x480; 16 colors), 30 Meg of hard disk space (5 Meg for
CSRE plus space for audio recordings), and a supported audio card.
● Cost: See AVAAZ WWW Pages
● Contact: AVAAZ Innovations Inc.
P.O.Box 8040, 1225 Wonderland Rd. N, London, Ontario, CANADA, N6G 2B0
Ph: +1-519-472-7944, Fax: +1-519-472-7814
Email: info@avaaz.com
WWW: http://www.icis.on.ca/homepages/avaaz/
● Note: See also the CSRE entry in Q5.5 on speech synthesisers.
❍ Real-Time Spectrogram
❍ Sona-Match
❍ Palatometer Database
❍ CSL-Pitch
❍ Synthesis Program
❍ Phonetic Database
❍ Direct-to-Disk Program
❍ Programmers Kit
❍ Condenser Microphone
❍ Multi-Speech
for a. the speech signal b. spectrograms c. phoneme labels, and other information.
❍ A Neural Network (NOPT) training package.
including: a. PLP Analysis, b. Rasta PLP Analysis, c. Linear Predictive Coding, d. Mel
Cepstrum Coding, e. Fast Fourier Transform
❍ A set of utilities for converting file formats such as ADC, NIST, mu-law, binary files,
the user to specify a particular label or set of labels in a given context, display all
occurrences of the label, and relabel the occurrences if desired.
❍ A Vector-Quantizer based on the Linde Buzo and Gray (LBG) algorithm.
❍ A set of PERL Scripts which have been used mainly to automate the use of the OGI
Speech Tools.
❍ MAN Pages for all routines and programs developed, as well as a User manual in both
SoundScope
● Platform: Macintosh: 68K and PowerPC native
● Description: The SoundScope product family is used primarily in speech teaching & research,
with some applications in animal sounds, forensics, and general acoustic analysis. It can
record, view, analyze, play, copy, paste, store and print sound waveforms. Analysis functions
include spectrogram, fundamental frequency (Fo), Linear Predictive Coding (LPC) including
formant tracking, LPC residual, jitter (pitch perturbation), shimmer (amplitude perturbation),
HNR, frequency spectrum, spectral slice, envelope, energy and zero crossing. Includes limited
built-in filtering, runs any filter created with WLFDAP. An integrated text editor stores notes
and calculation results. SoundScope lets you design your own custom "instrument" screen,
tasks (macros) and menus. Supplied instruments include 1 channel analyser (dual snap, dual
time, spectrogram, spectrum), 2 channel analyser, segment analyser, multi-channel recorder,
etc.
● Note: Supersedes MacSpeech Lab II.
● Price: $490 to $4990, less educational discount
● Availability: In North America, directly from GW Instruments. Contact the company for
international distributors.
● Contact: GW Instruments
35 Medford Street, Somerville, MA 02143, USA
Ph: +1-617-625-4096, Fax: +1-617-625-1322
Email: info@gwinst.com
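Jitter and shimmer, mentioned in the SoundScope description above, are both cycle-to-cycle perturbation measures. The Python sketch below uses one common definition (mean absolute difference between neighbouring cycles, divided by the mean value). The FAQ does not say which formulas SoundScope uses, so this is an assumption, and the pitch periods and amplitudes are made-up numbers.

```python
def relative_perturbation(values):
    """Mean absolute difference between neighbours, divided by the mean value."""
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))

# Made-up measurements for five consecutive pitch cycles.
periods_ms = [10.0, 10.2, 9.9, 10.1, 10.0]   # pitch period of each cycle
peak_amps = [0.80, 0.78, 0.82, 0.79, 0.81]   # peak amplitude of each cycle

jitter = relative_perturbation(periods_ms)    # pitch perturbation
shimmer = relative_perturbation(peak_amps)    # amplitude perturbation

print(f"jitter  = {100 * jitter:.2f} %")
print(f"shimmer = {100 * shimmer:.2f} %")
```

A perfectly periodic signal with constant amplitude gives zero for both measures.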
A very good and very comprehensive list of audio file formats is prepared by Guido van Rossum. The
list is posted regularly to comp.dsp and alt.binaries.sounds.misc, amongst others. It includes
information on sampling rates, hardware, compression techniques, file format definitions, format
conversion, standards, programming hints and lots more. It is also available by ftp from
WWW: ftp://ftp.cwi.nl/pub/audio/index.html
Text: ftp://ftp.cwi.nl/pub/audio/AudioFormats.part1,2
http://peace.wit.com/sounds/SoundConversion/
● Description: ALL Macintosh computers come with the ability to play back sounds at any
sample rate (sample rate conversion is done in software.) Older machines have 8 bit stereo
output (hardware runs at 22254 samples/second). The newer machines have 16 bit stereo
hardware running at 44100 samples/second.
Most of the recent Macintosh computers come with sound input hardware. There are probably
exceptions to this, but the older and some of the current low-end machines have 8 bit (linear)
mono hardware running at 22254.54 samples/second. All of the PowerPC, AV, and the 500
series notebook computers come with 16 bit 44kHz stereo sampling hardware. They can also
record at 22050 samples/second. The sound manager implements an AGC (Automatic Gain
Control) function for the 8 bit hardware. The drivers have a switch to turn off the AGC.
There are a number of DSP vendors that support high quality audio. Generally this means
quieter analog sections, and more IO formats (AES/EBU, for example). Try DigiDesign and
Spectral Innovations.
The software drivers for sound are described in "Inside Macintosh: Sound". If you want to see
some sample code check out the sources for the Matlab "Sound and Image Toolbox". They can
be found at
ftp://ftp.apple.com/pub/malcolm/SoundAndImageToolbox.cpt.hqx
Routines that play and record sounds using the toolbox are included (and interfaced to Matlab).
PC Audio Hardware
Note: new soundcards are becoming available all the time - the information below is definitely not up
to date. Check out the following newsgroups for up-to-date information.
● comp.sys.ibm.pc.soundcard
● comp.sys.ibm.pc.soundcard.GUS
● comp.sys.ibm.pc.soundcard.advocacy
● comp.sys.ibm.pc.soundcard.games
● comp.sys.ibm.pc.soundcard.misc
● comp.sys.ibm.pc.soundcard.music
● comp.sys.ibm.pc.soundcard.tech
● http://www.wi.leidenuniv.nl/audio/
● http://www.acs.oakland.edu/oak/SimTel/win3/sound.html
Could someone please provide information on the audio capabilities of other Unix platforms?
● Input and Output: 1 channel, 8 bit mu-law encoded, 8kHz sample rate. This provides telephone
quality sampling.
● Input and Output: Stereo (2 channels). 16-bit linear sampling. Multiple sample rates (48000,
44100, 37800, 32000, 22050, 18900, 16000, 11025, 9600, 8000 Hz)
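The 8 bit mu-law, 8 kHz, 1 channel format above works out to 8000 x 8 = 64000 bits per second. As a rough illustration of what mu-law encoding does, here is a Python sketch using the continuous mu-law companding formula with mu = 255. It is a simplification under stated assumptions: real G.711 codecs use a segmented approximation and a specific bit layout that this sketch does not reproduce, and the function names are made up for the example.

```python
import math

MU = 255  # companding constant used for 8 bit mu-law

def mulaw_compress(x):
    """Map a linear sample in [-1.0, 1.0] to a companded value in [-1.0, 1.0]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_expand(y):
    """Inverse of mulaw_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

def quantize_8bit(y):
    """Round a companded value in [-1.0, 1.0] to one of 256 levels (0..255)."""
    return int(round((y + 1.0) / 2.0 * 255))

def dequantize_8bit(code):
    """Map an 8 bit code back to a companded value in [-1.0, 1.0]."""
    return code / 255 * 2.0 - 1.0

bit_rate = 8000 * 8 * 1  # samples/second x bits/sample x channels
print(bit_rate)

# Round trip a few linear samples through 8 bit companded codes.
for sample in (0.01, 0.1, 0.5):
    code = quantize_8bit(mulaw_compress(sample))
    restored = mulaw_expand(dequantize_8bit(code))
    print(sample, code, round(restored, 4))
```

Because the curve is logarithmic, quiet samples are reproduced with a smaller absolute error than loud ones, which is why 8 bits are enough for telephone quality speech.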
The Silicon Graphics audio Frequently Asked Questions (FAQ) is the best place to get information on
SGI audio capabilities and programming. It provides information on connecting the audio output,
using the DSP capabilities, controlling the audio output, programming, useful software and more. It is
available from:
● WWW: http://www-viz.tamu.edu/~sgi-faq/faq/html/audio/
● News: comp.sys.sgi.misc
● Ftp: ftp://viz.tamu.edu/pub/sgi/faq/
● Platform: Various
● Description: A range of signal I/O, A/D, D/A and DSP products are available. There are too
many to list.
● Contact: Ariel Corp.
433 River Road, Highland Park, NJ 08904.
Ph: 908-249-2900 Fax: 908-249-2123 DSP BBS: 908-249-2124
Note: If you don't know what a newsgroup is, then talk to your local system administrator about how to get access. A
useful newsgroup for beginners is news.announce.newusers. You might also find the following documents useful.
ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/What_is_Usenet?
ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/Answers_to_Frequently_Asked_Questions_about_Usenet
ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/Rules_for_posting_to_Usenet
ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/FAQs_about_FAQs
● Speech Recognition - discussion of methodologies, training, techniques, results and applications. This should
cover the application of techniques including HMMs, neural-nets and so on to the field.
● Speech Synthesis - discussion concerning theoretical and practical issues associated with the design of speech
synthesis systems.
● Phonetic/Linguistic Issues - coverage of linguistic and phonetic issues which are relevant to speech technology
applications. Could cover parsing, natural language processing, phonology and prosodic work.
● Speech System Design - issues relating to the application of speech technology to real-world problems. Includes
the design of user interfaces, the building of real-time systems and so on.
● Other matters - relevant conferences, jobs, books, software, hardware, and products.
32 kbps ADPCM
● Platform: SGI and Sun Sparcs
● Description: 32 kbps ADPCM C-source code (G.721 compatibility is uncertain)
● Contact: Jack Jansen
● Availability: http://www.cwi.nl/ftp/audio/adpcm.shar
G.711/721/723 Compression
● Description:
❍ G.711 : CCITT u-law and A-law compression
GET ITU-3022
Standard 1016 4800 bps CELP Voice Coder," Digital Signal Processing, Academic
Press, 1991, Vol. 1, No. 3, p. 145-155.
❍ Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch, "The DoD 4.8 kbps
Standard (Proposed Federal Standard 1016)," in Advances in Speech Coding, ed. Atal,
Cuperman and Gersho, Kluwer Academic Publishers, 1991, Chapter 12, p. 121-133.
The U.S. DoD's Federal-Standard-1015/NATO-STANAG-4198 based 2400 bps linear
prediction coder (LPC-10) was republished as a Federal Information Processing Standards
Publication 137 (FIPS Pub 137). It is described in:
❍ Thomas E. Tremain, "The Government Standard Linear Predictive Coding Algorithm:
The voicing classifier used in the enhanced LPC-10 (LPC-10e) is described in:
❍ Campbell, Joseph P., Jr. and T. E. Tremain, "Voiced/Unvoiced Classification of Speech
Capacity :
❍ Two half-duplex or one full duplex channels on the 20 MIPS 'C5x (at 95% and 55%
● Contact:
CVI Inc.
443 Vienna Cres. North Vancouver, BC, Canada V7N 3B3
Tel: (604) 987 1719 Fax: (604) 986 8139
Email: cvi@extropia.wimsey.com
❍ Input / Output signal: A-Law or mu-Law PCM (64 kbps); Linear signal with up to 16
❍ Input / Output signal: A-Law or mu-Law PCM (64 kbps); Linear signal with up to 16
❍ Input / Output Signal: A-Law or mu-Law PCM (64 kbps); Linear signal with up to 16
❍ Input / Output signal: A-Law or mu-Law PCM (64 kbps); Linear signal with up to 16
❍ Input signal: A-Law or mu-Law PCM (64 kbps); Linear signal with 12-15 bits per
CyberVoice
● Description: Cybernetics InfoTech, Inc. offers the following products
❍ Telephone voice compression at 1.2, 2.4, 4.8 and 6.0 kbit/s with good-communications-
coded voice;
❍ Internet Voice E-mail software with voice editing, high-quality low-data-rate voice
Rockwell's DigiTalk
● Description: The DigiTalk coder operates at a sampling rate of 8 kHz and transmits 223 bits of
coded speech every 26 ms, giving an overall bit rate of 8.577 kbps. The algorithm is based on
analysis-by-synthesis predictive coding with vector-coded excitation, in which the excitation
signal is optimized by minimizing the perceptually weighted error between the original and
synthesized speech. More information and results of perceptual tests are available on the
WWW.
● Availability: See the WWW page: http://www.nb.rockwell.com/ref/digitalk/
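The bit rate quoted for DigiTalk follows directly from the frame size and frame period. The short Python check below reproduces the arithmetic. The variable names are only for illustration, and the samples-per-frame figure is derived here from the stated 8 kHz sampling rate rather than taken from Rockwell's documentation.

```python
bits_per_frame = 223      # coded bits sent per frame
frame_period_s = 0.026    # one frame every 26 ms
sample_rate_hz = 8000     # input sampling rate

bit_rate_bps = bits_per_frame / frame_period_s
samples_per_frame = sample_rate_hz * frame_period_s
bits_per_sample = bits_per_frame / samples_per_frame

print(round(bit_rate_bps / 1000, 3))   # overall bit rate in kbps
print(round(samples_per_frame))        # input samples covered by one frame
print(round(bits_per_sample, 3))       # coded bits per input sample
```

For comparison, uncompressed 8 bit mu-law speech at the same sampling rate uses 8 bits per sample (64 kbps).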
G.728 Compression
● Description: G.728 low-delay CELP package written by Alex Zatsman of Analog Devices, Inc.
● Availability: By anonymous ftp from
ftp://dspsun.eas.asu.edu/pub/speech/ldcelp.tgz
MPEG Audio
MPEG (Moving Picture Experts Group) is a set of standard methods for compression and transmission of
digital video and audio. Detailed FAQs and WWW sites are available for MPEG:
(22 kHz) sound. For Pentium-90 or above machines, MetaSound requires 40% CPU
bandwidth to deliver CD quality (44.1 kHz) sound.
❍ Portability: it can take less than one month to port to new hardware video decoders.
❍ User interface with full set of functions: volume control, stop, pause, forward,
backward, mute, resume, select the previous/next program track (Video CD 2.0),
randomly select a program track (Video CD 2.0).
❍ Error Recovery: can automatically skip error bitstreams.
Proprietary Standards
1. ACELP 8 v2.0 codec (flexible dual rate codec equipped with a VAD)
2. ACELP 4.8 codec
● Contact: Sipro Lab Telecom Inc.
770, Chemin Lucerne, Ville Mont-Royal (Quebec), H3R 2H6 CANADA
Ph: (514) 737-5874, Fax: (514) 737-2327
E-mail: sales@sipro.com
WWW: http://www.sipro.com/
StarAudio Compressor/Player
● Platform: Win95
● Description: Uses a time-domain process that delivers lossless decompressed data. Processes any
source in .wav file format with high quality 16-bit audio data at any sampling rate. Requires no
special hardware and decompression speed is real-time on most 486's and on any Pentium. The
higher the sampling rate the higher the compression ratio; minimum compression of 4:1 for
11k data, and usually exceeding 7:1 for 44k data. Full bandwidth of signal is preserved with
default compression options. Other compression options increase the compression ratio further
at the cost of a slight reduction in output quality. A decompression library is
available for application development.
● Demo: Download the shareware version of the program from the STR WWW site.
● Misc: A technical paper is available in Word 6.0 format:
ftp://ftp.speechtech.com/pub/speechtech/docs/audocw60.exe
● Contact: Speech Technology Research Ltd.,
Suite B - 1623 McKenzie Avenue, Victoria, B.C. V8N 1A6, Canada
Ph: +1-250-477-0544
Email: products@speechtech.com
WWW: http://www.speechtech.com/home/speechtech/
❍ Wireless/cellular applications
❍ Games, Education
The TrueSpeech encoder is available for free in the Sound System of Windows 95 and
Windows NT. The DSPG WWW pages have information on how to add TrueSpeech capability
to your WWW pages.
● Contact: DSP Group, Inc.
3120 Scott Boulevard, Santa Clara, CA 95054-3317, USA
Phone: (408) 986-4300 Fax: (408) 986-4323
Email: Webster@dspg.com
WWW: http://www.dspg.com/index.html
rsynth
● Platform: Various (including Solaris2.3, SunOS4.1.3, HPUX, SGI Irix4.x, Linux)
● Description: Public domain text-to-speech system assembled from a variety of sources. It
supports CMU and BEEP format dictionaries (as described in Q1.10) and now utilises stress
marks in the dictionary in synthesising intonation.
● Price: Free
● Misc: Axel Belinfante has implemented a WWW rsynth demo:
http://wwwtios.cs.utwente.nl/say.
● Availability: by anonymous ftp from
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/rsynth-2.0.tar.Z
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/rsynth-2.0.tar.gz
or compatible personal computers. The board can also be connected via the serial port.
Language and control program for downloading into RAM or mounted on EPROMs
❍ Platform: DOS/Windows with IBM PC, XT, AT, PS/2 model 30 or compatible
❍ Delivered standard interfaces: MS DOS I/O driver and interface to Apple Speech
manager.
● Product name: INFOVOX 700, DESKTOP UNIT
❍ Product description: Desktop unit with built in Infovox 600 to be connected to any
❍ Delivered standard interfaces: MS DOS I/O driver and interface to Apple Speech
manager
● Product name: INFOVOX 650, OEM BOARD
❍ Product description: OEM-board built with CMOS IC's. Language and control program
❍ Delivered standard interfaces: MS DOS I/O driver and interface to Apple Speech
manager
● Product name: INFOVOX 750, DESKTOP UNIT
❍ Product description: Desktop unit with built in Infovox 650 to be connected to any
bit sound. Delivered on 3.5" diskettes with user lexicon and a complete documentation.
❍ Platform: Apple Macintosh with minimum 68030, 33 MHz microprocessor.
microprocessor.
❍ Delivered standard interfaces: Standard interface to Microsoft Windows 3.1 and sound
MacYack Pro
● Platform: Macintosh
● Description: MacYack Pro is a commercial speech package for Macintosh that uses the
PlainTalk Text-to-Speech synthesis software. Features include:
❍ Add speech to any word processor.
● Price: $29.95 for a limited time, reduced from the regular price of $49.95. 30-day money-back
guarantee.
● Contact: Scantron Quality Computers
20200 Nine Mile Rd. St. Clair Shores, MI 48080
Ph: 1-800-777-3642, Fax: 810-774-2698
E-mail: sales@sqc.com
WWW: http://www.sqc.com/
Product Info: http://www.lowtek.com/macyack/
AcuVoice
● Platform: Windows, Solaris
● Description: AcuVoice is a natural sounding text-to-speech system built using a concatenative
approach. Currently it is available for an American English Male Voice. Software Developer
Kits are available for the Windows Platform (32-Bit) and also for the Solaris Platform. More
information and samples are available on the Acuvoice web site.
● Contact: AcuVoice, Inc.
84 W. Santa Clara Street, Suite 720, San Jose, CA 95113-1810
Ph: 1(408)289-1661, Fax: 1(408)289-1201
Demo: 1(408)289-1177
Email: AcuVoice1@AOL.COM
WWW: http://www.acuvoice.com/
The AT&T Advanced Speech Products Group home page provides more detailed information
including a Frequently Asked Questions list, information for application developers on the
Independent Software Vendor (ISV) Program (including info on the SDK, licensing, and the
training program).
● Requirements: Uses 2 MB RAM, 10 MB Disk. Requires a Pentium 75 MHz or higher (uses <
50% CPU).
● Cost and Availability: WATSON is a software-based speech platform with a Software
Developers Kit (SDK) that allows application developers to use voice processing in their
applications. It is not available as a stand-alone product.
Licensing information (inc. price) is provided in the AT&T Advanced Speech Products Group
home page
● See also: Watson BLASR speech recognition in Q6.5, Microsoft Speech API, and Advanced
Speech API.
● Contact: AT&T Advanced Speech Products Group
Suite 700, 44 East Mifflin Street, Madison, WI 53703, USA
Ph: 1-800-5-WATSON, Fax: 1-608-259-2269
Email: aspg@attmail.com
WWW: http://www.att.com/aspg/
Creative TextAssist
● Platform: Windows
● Description: Based on DECtalk speech synthesis. A detailed description of TextAssist is
provided on the Creative WWW pages. TextAssist TextReader provides a convenient
Windows user interface for text reading.
● Availability: Creative TextAssist is bundled with most (all?) Creative Sound Blaster audio
cards. TextAssist preview software is available from the Creative Labs TextAssist home page.
● Contact: Creative Labs, Inc.
Address, phone, email etc unknown
WWW: http://www.creaf.com/ : http://www.creaf.com/wwwnew/tech/devcnr/tassist.html
can be integrated with any Intel 486 processor-based system running DOS or Windows.
Applications can be interfaced to the bus via a DOS Terminate and Stay Resident (TSR)
driver or a Windows Dynamic Link Library (DLL). This option is available with an
external speaker with volume control and headphone jack.
❍ DECtalk Express external package: An external, portable package that you can plug in
to any PC or serial port. The external package includes a built-in speaker and
headphone jack, plus combined on/off and volume controls and a rechargeable battery
pack.
❍ DECtalk Software solution: Software-only text to speech for Alpha or Intel systems
DECtalk Software
● Platform: Digital UNIX and Windows NT
● Description: DECtalk converts standard ASCII text into natural, intelligible speech. Speech
output through any audio device is supported by Microsoft Video for Windows or Multimedia
Services for Digital UNIX. An API gives developers direct access to text-to-speech functions.
Provides nine voice personalities (4 female, 4 male, 1 child). Provides punctuation and tonal
control, supports customized pronunciation of trade jargon and acronyms. Common
programming interface works with both Alpha and Intel platforms.
● More Information:
Digital Equipment Corporation WWW pages: http://www.digital.com/
DECtalk Software page: http://www.systems.digital.com/DIcatalog/html/DECtalk-Software.html
WWW: http://www.systems.digital.com/DIcatalog/html/DECtalk-Speech-Synthesis.html
Ph: 1-800-DIGITAL
ETI-Eloquence
● Platform: MS Windows (Win95,NT,3.1), Solaris, SunOS, SGI, RS/6000
● Description: ETI-Eloquence is a software based text-to-speech system. It generates waveforms
completely algorithmically instead of by concatenating waveforms, for maximum flexibility
and naturalism. For instance, when the user requests a deeper voice, the software simulates a
larger vocal tract, instead of simply pitch-shifting samples. It uses high-level linguistic parsing,
which obviates the need for a huge dictionary. It handles numbers, acronyms, currency, etc. It
includes a set of annotation symbols, for placing stress on particular words, expressing
excitement/boredom, etc. Also allows phonetic input. Supports MS SAPI.
Produces male and female voices for General American English. Dialects under development
include Alabama and Brooklyn.
● Price: Flexible license agreements on application.
● Availability: Eloquent Technology, Inc.
2389 North Triphammer Road, Ithaca, NY 14850, USA
Ph: (607) 266-7025, Fax: (607) 266-7030
Email: info@eloq.com
WWW: http://www.eloq.com/
HADIFIX
● Platform: Windows
● Description: German speech synthesis system developed at the Institute for Communications
Research and Phonetics, University of Bonn. Provides conversion of input text to phonemes,
automatic prediction of stress, phrasing and pitch, and speech generation by concatenation of
small units of natural speech. Demisyllables and similar units are used; they comprise all
consonants before the vowel and the beginning of the vowel (initial demisyllable) or the end of
the vowel and the following consonants (final demisyllable). For example, the word 'Strolch' is
formed by concatenating 'Stro' and 'olch'.
● Demo: Windows demo software available. Limited to synthesis of one short text (text.txt) at a
time. Speech format limitations too. 1.3MB file.
ftp://asl1.ikp.uni-bonn.de/pub/hadifix/hadidemo.zip
A 1993 version is available with unlimited synthesis from a string of phonemic symbols and
accent markers. 6MB file.
ftp://asl1.ikp.uni-bonn.de/pub/hadifix/hadi25.lzh
● WWW: http://asl1.ikp.uni-bonn.de/~tpo/Hadifix.en.html
● On-line demo: http://asl1.ikp.uni-bonn.de/~tpo/Hadiq.en.html
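As an illustration of the demisyllable idea in the HADIFIX description, the following Python sketch splits a one-syllable word at its vowel so that both halves share the vowel, which reproduces the 'Strolch' example. It works on spelling rather than phonemes and handles only a single vowel letter per word, so it is a toy under those assumptions and not how HADIFIX itself segments speech. The function name is hypothetical.

```python
VOWELS = set("aeiouäöüAEIOUÄÖÜ")

def demisyllables(word):
    """Split a one-syllable word into (initial, final) demisyllables.

    The initial half is the leading consonants plus the vowel; the final
    half is the vowel plus the trailing consonants.
    """
    positions = [i for i, ch in enumerate(word) if ch in VOWELS]
    if len(positions) != 1:
        raise ValueError("toy example handles exactly one vowel letter")
    v = positions[0]
    return word[: v + 1], word[v:]

print(demisyllables("Strolch"))
```

Because the vowel appears in both units, the join between the two recorded units can be made inside the vowel rather than at a consonant boundary.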
Tinytalk
● Platform: DOS / Windows???
● Description: Shareware speech 'screen reader' package which is used by many blind users.
● Price: Tinytalk is now $150. There are package deals on Tinytalk with various speech
synthesizers.
● Availability: Tinytalk is available by anonymous ftp from the following site
Files: ttexe167.zip and ttdoc167.zip (executable and documentation)
ftp://ftp.netcom.com/pub/eb/ebohlman/
(Note: it is a busy ftp server.)
● Contact: Eric Bohlman
OMS Development
610-B Forest Ave., Wilmette, IL 60091
Ph: (800)831-0272 Fax: 708-251-5793
Outside North America: (708)-251-5787
Email: ebohlman@netcom.com
❍ No vocabulary restrictions
abbreviations.
❍ Multiple languages available: American English, Latin American Spanish, German,
French, Italian
❍ Flexible pitch, volume and speech rate
❍ Supports navigational capabilities such as pause, resume and jump forward / jump back
WinSpeech
● Platform: Windows
● Description: WinSpeech is a text-to-speech application that reads text and produces speech to
the audio output. Features basic text editing tools, talk from editing window, DDE server
allows other Windows applications to send text for talking, coach mode for providing audio
instructions throughout the program, dictionary editing tools for customizing pronunciation.
WSPLIB text-to-speech DLL is a speech functions library for developers. More information
available by email.
● Requirements: IBM PC or compatible computer with Windows 3.1 or
higher. Sound card is recommended but not required.
● Availability: Freeware available through the PC WholeWare WWW page.
● Contact: PC WholeWare
33 Justin Street, Lexington, MA 02173, U.S.A.
Email: info@pcww.com
WWW: http://www.pcww.com/index.html
spchsyn.exe
● Platform: DOS
● Availability: By anonymous ftp as a self extracting DOS archive.
ftp://evans.ee.adfa.oz.au/mirrors/tibbs/applications/spchsyn.exe
● Requirements: May require special TI product(s), but all source is there.
AsTeR
● Platform: UNIX
● Description: TTS front-end program which encodes structural information about documents in
speech synthesis. For more information check out:
http://www.research.digital.com/CRL/personal/raman/aster/aster-toplevel.html
● Operation requirements: Lisp: Lucid, clisp
● Contact: T. V. Raman
WWW: http://www.research.digital.com/CRL/personal/raman/raman.html
Email: raman@adobe.com
JSRU
● Platform: UNIX and PC
● Cost: 100 pounds sterling (from academic institutions and industry)
● Description: A C version of the JSRU system, Version 2.3 is available. It's written in Turbo C
but runs on most Unix systems with very little modification. A Form of Agreement must be
signed to say that the software is required for research and development only.
● Contact: Dr. E. Lewis (eric.lewis@bristol.ac.uk)
Klatt-style synthesiser
● Platform: Unix
● Cost: Free
● Description: Software posted to comp.speech in late 1992.
● Availability: By ftp from the comp.speech ftp site
❍ ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/klatt.3.04.tar.gz
❍ ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/klatt.3.04.tar.Z
TrueTalk
● Platform: Sun Sparcstation 1+/2/LX/5/10/20 with SunOS 4.1.3, or SGI Indy/Indigo/Indigo2
with IRIX 5.2. More platforms in development.
● Description: Personal TrueTalk, by Entropic Research Laboratory, Inc., is an all-software Text-
to-Speech (TTS) system designed to voice-enable UNIX X-Windows workstations. It
combines a graphical interface with a powerful TTS engine based on technology developed by
AT&T Bell Laboratories. Features include:
❍ Intelligible, prosodically natural speech.
❍ Text taken from file input, highlighted X selections, the interface scratch pad, other
programs connected through a TCP/IP socket, or Tcl/Tk applications via the Tk "send"
mechanism.
❍ Stop, pause and resume while speech is in progress.
❍ Nine speaking voices, with Male and Female versions of each voice.
● Misc: A more detailed description of TrueTalk is available on the Entropic WWW server:
http://www.entropic.com/truetalk.com
● Availability: You can obtain Personal TrueTalk through the Internet. For details, see
ftp://ftp.entropic.com/pub/truetalk/README.ptt
Personal TrueTalk is available free of charge for evaluation purposes. You can fully enable
your evaluation copy at any time by purchasing a license key from Entropic.
● Requirements: 12MB disk space, 8MB process size (24MB system RAM recommended).
● Cost: US$495; US$395 academic
● Contact: Entropic Research Laboratory, Inc.,
Washington, D.C.
Voice: 1-800-ENTROPIC (North America), (202) 547 1420
Fax: (202) 547-6648
Email: truetalk@entropic.com
WWW: http://www.entropic.com/
Eurovocs
● Platform: Various - RS232 hardware connection
● Description: Eurovocs is a stand-alone text-to-speech synthesizer which uses the text-to-
speech technology of Lernout and Hauspie Speech Products. Available for Dutch, French,
German and American English with other languages planned for release soon. One Eurovocs
device can support two different languages. Eurovocs can be connected to any computer via a
standard serial interface (RS232). It supports personal dictionaries, generation of DTMF tones,
and pronunciation of special character sequences such as digit strings, telephone-numbers, date
and time indications, abbreviations, alphanumeric strings etc.
● Contact: Technologie & Revalidatie
Postbus 128, B-9000 Gent, Belgium
Ph: +32-9-264 33 97, Fax: +32-9-264 35 94
E-mail: noe@elis.rug.ac.be
WWW: http://www.elis.rug.ac.be/ELISgroups/speech/research/eurovocs.html
Narrator
● Platform: Amiga
● Description: Formant-based speech synthesis. Includes an English-to-phoneme translation
library, and a SPEAK: pseudo-device for speech output.
● Hardware: Standard Amiga hardware
● Availability: Part of AmigaOS
● See Also: The Narrator Translation library
TextToSpeech Kit
● Platform: NeXT Computers
● Description: The TextToSpeech Kit does unrestricted conversion of English text to
synthesized speech in real-time. The user has control over speaking rate, median pitch, stereo
balance, volume, and intonation type. Text of any length can be spoken, and messages can be
queued up, from multiple applications if desired. Real-time controls such as pause, continue,
and erase are included. Pronunciations are derived primarily by dictionary look-up. The Main
Dictionary has nearly 100,000 hand-edited pronunciations which can be supplemented or
overridden with the User and Application dictionaries. A number parser handles numbers in
any form. A letter-to-sound knowledge base provides pronunciations for words not in the Main
or customized dictionaries. Dictionary search order is under user control. Special modes of text
input are available for spelling and emphasis of words or phrases. The actual conversion of text
to speech is done by the TextToSpeech Server. The Server runs as an independent task in the
background, and can handle up to 50 client connections.
● Misc: The TextToSpeech Kit comes in two packages: the Developer Kit and the User Kit. The
Developer Kit enables developers to build and test applications which incorporate text-to-
speech. It includes the TextToSpeech Server, the TextToSpeech Object, the pronunciation
editor PrEditor, several example applications, phonetic fonts, example source code, and
developer documentation. The User Kit provides support for applications which incorporate
text-to-speech. It is a subset of the Developer Kit.
● Hardware: Uses standard NeXT Computer hardware.
● Cost:
❍ TextToSpeech User Kit: $175 CDN ($145 US)
❍ Pronunciations for many place names, personal names, foreign names, foreign
❍ Used with A1000 (OS 1.3), A3000 (OS 2.04-2.1), and A4000 (OS 3.0)
❍ Aminet ftp://ftp.wustl.edu/pub/aminet/util/misc/WreadFiles47.lha
❍ The use of control sequences to customize TTS output (adding pauses, using phonetic
input, etc.).
❍ Switching between languages at run time.
❍ Input formats: orthographic input, phonetic input, phonetic input with prosodic
information.
● tts2000/T
❍ Output formats: 8 bit mu-law PCM, 8 bit A-law PCM, 16 bit linear PCM.
● tts2000/M
❍ Output formats: 8/16 bit wave format, 8 bit mu-law PCM, 8 bit A-law PCM, 16 bit
linear PCM.
❍ Sampling Frequency: 8/10/11.025 kHz
68040
❍ Two processor platform examples: {Intel 386/486/Pentium or Motorola 68030} and
SIMTEL
A wide range of speech related software, sound-blaster software and signal processing software for
PCs is available on SimTel and its mirror sites. It can be obtained by ftp from:
ftp://ftp.coast.net/SimTel/msdos/voice/
http://www.acs.oakland.edu/oak/SimTel/win3/sound.html
Voicemaker
The archives include the program Voicemaker which synthesises speech from phonemes using
"concatenation" of phonemes recorded by the user. Voicemaker is a freeware program. It requires an
IBM or compatible, 512KB RAM, sound blaster compatible sound card.
ftp://ftp.coast.net/SimTel/msdos/voice/vm110.zip
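As a rough illustration of what "concatenation" of recorded phonemes means, the sketch below joins per-phoneme sample lists end to end, with a short linear crossfade at each join. It is not Voicemaker's actual algorithm or file format; the function name concatenate_units, the phoneme labels, the sample values and the crossfade length are all assumptions made for this example.

```python
def concatenate_units(phonemes, inventory, overlap=4):
    """Join recorded units end to end, crossfading `overlap` samples at each join.

    phonemes:  list of unit names to speak, in order
    inventory: dict mapping unit name -> list of samples (hypothetical recordings)
    """
    out = []
    for name in phonemes:
        unit = list(inventory[name])  # raises KeyError if the unit was never recorded
        n = min(overlap, len(out), len(unit))
        # Blend the tail of what we have so far with the head of the new unit.
        for i in range(n):
            w = (i + 1) / (n + 1)
            idx = len(out) - n + i
            out[idx] = (1 - w) * out[idx] + w * unit[i]
        # Append whatever part of the unit was not consumed by the crossfade.
        out.extend(unit[n:])
    return out


# Hypothetical "recordings": a few samples per phoneme.
inventory = {
    "HH": [0.0, 0.1, 0.2, 0.1],
    "AY": [0.5, 0.6, 0.5, 0.4, 0.3],
}
samples = concatenate_units(["HH", "AY"], inventory, overlap=2)
```

With overlap=0 the units are simply butted together; a small overlap reduces clicks at the joins at the cost of slightly shortening the output.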
❍ hmm-1.03.tar.gz
❍ ftp://rtfm.mit.edu/pub/usenet/comp.dsp/
sci.lang - Language.
Discussion about phonetics, phonology, grammar, etymology and lots more. A sci.lang FAQ is
available.
alt.sci.physics.acoustics
Some discussion of speech production & perception.
Mailing Lists
Voice-Users Mailing List
For discussion of any aspect of using voice recognition systems.
❍ Using such systems safely, without muscle or voice strain
foNETiks
A moderated monthly newsletter distributed by e-mail. It carries job advertisements, notices of
conferences, and other news of general interest to phoneticians, speech scientists and others.
The editors are Linda Shockey and Gerry Docherty. To subscribe send the following 1 line
message to
❍ mailbase@mailbase.ac.uk
Covers many areas, including some speech topics such as speech coding and speech
compression. Mail Peter Decker dec@dfv.rwth-aachen.de to subscribe.
Dragon NaturallySpeaking
● Platform: Windows
● Description: General purpose, continuous speech dictation system. Personal Edition has a
30,000 word active vocabulary and comes with a 200,000+ word pronunciation dictionary;
users can also add their own words or phrases.
More information on Dragon's NaturallySpeaking web site.
● Requirements: 133 MHz Pentium, 32 MB RAM (Windows 95) or 48 MB RAM (Windows NT
4.0), supported sound card.
● Price: see Dragon's NaturallySpeaking web site.
● Related products: see general information below
● Contact: see general information below
DragonDictate for Windows
● Platform: Windows
● Description: Speech-to-text dictation system. Discrete dictation; continuous command/control;
speaker-adaptive. Also provides mouse movement for hands-free operation of Windows.
Comes with a 120,000 word pronunciation dictionary; users can also add their own words or
phrases. Dictate directly into any application. Available in US and UK English, French, Italian,
German, Spanish, and Swedish. Add-on vocabularies for medicine, law, business and finance,
computers and technology, journalism.
Available as DragonDictate Singles Editions (10,000 words active), DragonDictate Personal
Edition (10,000 words active), DragonDictate Classic Edition (30,000 words active),
DragonDictate Power Edition (60,000 words active).
Includes Office97 support.
More information on the Dragon Systems web site.
● Requirements: 486/66, 7-10 MB dedicated RAM (depending on edition), Windows 3.1x, NT
3.51, or 95.
Supported sound boards: Creative Labs Sound Blaster 16, Microsoft Windows Sound System,
IBM M-Audio Capture/Playback Adapter, many notebooks with built-in audio.
See Dragon Systems Compatibility list for details.
● Price: Check at the Dragon Systems web site.
● Related products: see general information below
● Contact: see general information below
Dragon PowerSecretary
● Platform: Apple
● Description: Information moved to the page on Dragon Dictation products including Dragon
PowerSecretary
(Previously Articulate PowerSecretary.)
❍ 50 word name vocabulary or 100 word phrase real-time recognition with 95% accuracy
The AT&T Advanced Speech Products Group home page provides more detailed information
including a Frequently Asked Questions list, information for application developers on the
Independent Software Vendor (ISV) Program (including info on the SDK, licensing, and the
training program).
● Requirements: Uses 2 MB RAM, 10 MB Disk. Requires a Pentium 75 MHz or higher CPU
(uses < 50% CPU).
● Cost and Availability: WATSON is a software-based speech platform with a Software
Developers Kit (SDK) that allows application developers to use voice processing in their
applications. It is not available as a stand-alone product.
Licensing information (inc. price) is provided in the AT&T Advanced Speech Products Group
home page
● See also: Watson FlexTalk speech synthesis in Q5.5, Microsoft Speech API, and Advanced
Speech API.
● Contact: AT&T Advanced Speech Products Group
Suite 700, 44 East Mifflin Street, Madison, WI 53703, USA
Ph: 1-800-5-WATSON, Fax: 1-608-259-2269
Email: aspg@attmail.com
WWW: http://www.att.com/aspg/
Dragon PhoneQuery
● Platform: Windows NT
● Description: Software for building voice response systems. Callers are able to do the
following: Ask for information using completely natural and continuous language. Have a
spoken dialog to fine tune a request. Request information to be faxed, sent by electronic mail,
or read over the phone, using text-to-speech.
More information on the Dragon Systems telephony pages.
● Requirements: Pentium or Pentium Pro PC running Windows NT 4.0. Telephone interconnect
requirements vary by application.
● Related products: see general information below
● Contact: see general information below
DragonXTools
● Platform: Windows
● Description: VBX and OCX controls that allow an application to control DragonDictate's
capabilities, ranging from small vocabulary command and control to customized large
vocabulary dictation. More information is available on the Dragon Developer pages
● Related products: see general information below
● Contact: see general information below
Dragon SpeechTool
● Platform: Windows
● Description: Create small, optimized vocabularies for your speech-enabled applications, or
supplement DragonDictate's extensive built-in vocabularies with specialized terms and names.
More information is available on the Dragon Developer pages
● Related products: see general information below
● Contact: see general information below
Dragon VoiceTools
● Description: integrate small-vocabulary speech recognition directly into your DOS and
Windows 3.1x applications. More information is available on the Dragon Developer pages
● Related products: see general information below
● Contact: see general information below
General Information
● Dragon NaturallySpeaking
● DragonDictate for Windows
● Dragon PowerSecretary
● General Information
● Dragon PhoneQuery
● DragonXTools
● Dragon SpeechTool
● Dragon VoiceTools
Contact:
❍ UK: Legal
❍ IT: Radiology
IN CUBE
● Platform: Three versions for Windows 95, Windows NT and Sun SPARCstations
● IN CUBE for Windows 95: Developed for general purpose Windows 95 users. It is packaged
for online distribution with a full working demo and an option to register and unlock the full
product. The system uses Command Corp's Mark II continuous speech recognition engine and
handles changeable lexicons of up to 75 commands.
❍ Price: $49.95 US
microphone.
❍ Requirements: Windows NT, Windows NT-compatible audio board (16-bit audio
recommended).
❍ Availability: http://www.commandcorp.com/cci/pront.html
Demo available.
● IN CUBE Voice Command for Sun SPARCstations: Provides continuous realtime speech
recognition system for window navigation and voice macro command input to the workstation.
Speaker-dependent training and ability to add new commands and macros.
An IN CUBE Application Programming Interface with a library of linkable object
modules is available for developers.
❍ Price: $495 per seat. The developer's API sells for $695.
❍ Requirements: SUN OS 4.1.x or Solaris 2.x with OpenWindows and Motif. Works with
❍ VoiceMED for Primary Care for family medicine, internal medicine and pediatrics
● Platform: Windows 95
● Description: Provides command and control speech recognition using SAPI (the Microsoft
Speech API) and "Whisper", Microsoft's speech recognition technology. Features include:
❍ Speaker independent, continuous, sub-word modeling, context free grammars
❍ Has its own letter-to-sound rules, which means it can recognize any word in a grammar.
NCC Dictate
● Platform: Windows
● Description: NCC Digital Dictate(TM) is an add-on, enhanced interface for use with IBM's
VoiceType(TM) Dictation for Windows and various Windows 3.1 applications (e.g. MS Word,
WordPerfect). Digital Dictate(TM) provides faster corrections and dictation rates and various
other features. This version is not a stand-alone product; it requires VoiceType(TM) Dictation to
provide the speech recognition engine and the Windows application. Features include:
❍ Direct dictation into Windows applications with access to all functions while dictating.
❍ Versions for MS Word, WordPerfect, Ami Pro, and other Windows applications.
● Description: Dictation of medical findings using continuous speech recognition. Designed for
German speaking radiologists and encompasses the complete radiology vocabulary. The
authors use dictation stations (PCs) which are fitted with microphones. The transcriptionists
use editing stations (also PCs) which are additionally fitted with headphones and footswitches.
The SP6000s has a single speech recognition unit serving all users, and it offers automatic data
transfer as well as the advantages of digital dictation functions.
● More Information: For more information visit the Philips SP6000s WWW page or the Philips
Speech home page.
Whisper
See the new page for Microsoft speech recognition software.
DATAVOX - French
● Platform: PC / DOS
● Description: Continuous speech - speaker independent or dependent.
● Requirements: 2 PC format boards (RdF1000 and TdS 96/25) and an A/D - D/A module
(ASA116)
● Misc: Application software may communicate with DATAVOX through 2 types of interfaces:
❍ Keyboard overlay: The application software may be used with any PC compatible
simplification software).
❍ Large vocabulary : DATAVOX can recognize vocabularies of several thousand words
as long as there are no more than 500 words in the active vocabulary at any given node.
It takes less than 1 second to change syntax and vocabulary.
❍ Training controlled by the system (use of co-articulation models).
❍ Synthesis (ADPCM) can be heard simultaneously while recognition is being carried out.
● Contact: VECSYS
Le Chene rond, 91570 Bievres, France
Voice: 33 1 69 41 15 04, Fax: 33 1 69 41 24 30
● Votan VSP 1010 speech-processor board: can service a single voice channel, providing
recognition, voice output, and telephone interfacing. Digital signal processing is performed by
a TMS320 integrated circuit.
● Costs: Unknown
● WWW: http://www.ti.com/sc/docs/dsps/develop/3rdparty/vot.htm
● Contact: Votan Division, MOSCOM Corporation
6920 Koll Center Parkway, Suite 214, Pleasanton, CA 94566, USA
Ph: +1-510-426-5600, Fax: +1-510-426-6767
AbbotDemo
● Platform: SunOS4, IRIX, Linux, HP-UX
● Description: Large vocabulary, speaker independent, continuous automatic speech recognition
system. Uses recurrent neural networks and hidden Markov models with a 5,000 word
vocabulary (upgradable) and a trigram word grammar. Includes a front end for waveform
capture and display (including spectrogram) and a graphical display of the phoneme
representation as well as a rewriting display of the best guess word sequence.
● Requirements: UN*X, X, 8 Mbyte free RAM, 486DX or faster processor, 16 bit soundcard,
reasonable quality microphone and a copy of the Wall Street Journal newspaper.
● Price: Free for non-commercial use
● Availability: By anonymous ftp from
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/AbbotDemo
● Note 1: This is not a complete system for dictation.
● Note 2: At present there are no sources with this distribution. For sources for an earlier version
see the recnet entry.
● Note 3: Not supported.
● Contact: AbbotDemo@compute.demon.co.uk
Tony Robinson
Cambridge University Engineering Department
Trumpington Street, Cambridge, CB2 1PZ, UK
Tel: +44-1223-332815 Fax: +44-1223-332662
recnet
● Platform: UNIX
● Description: Speech recognition for the speaker independent TIMIT and Resource
Management tasks. It uses recurrent networks to estimate phone probabilities and Markov
models to find the most probable sequence of phones or words. The system is a snapshot of
evolving research code. There is no documentation other than published research papers. The
components are:
❍ A preprocessor which implements many standard and many non-standard front end
processing techniques.
❍ A recurrent net recogniser and parameter files
❍ Two Markov model based recognisers, one for phone recognition and one for word
recognition
❍ A dynamic programming scoring package. The complete system performs
competitively.
● Cost: Free
● Requirements: TIMIT and Resource Management databases
● Contact: Tony Robinson: ajr@eng.cam.ac.uk
● Availability: by anonymous ftp
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/recnet-1.3.tar.Z
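The two-stage idea in the recnet description (a network estimates phone probabilities for each frame, then a Markov model finds the most probable sequence) can be illustrated with a small Viterbi search. This is a minimal sketch and not recnet's code: the function name viterbi, the two-state model and all the probabilities below are assumptions chosen only to make the example run.

```python
import math


def viterbi(frame_logprobs, log_trans, log_init):
    """Return the most probable state sequence as a list of state indices.

    frame_logprobs[t][s]: log score of state s at frame t (e.g. from a network)
    log_trans[p][s]:      log transition probability from state p to state s
    log_init[s]:          log initial probability of state s
    """
    n_states = len(log_init)
    score = [log_init[s] + frame_logprobs[0][s] for s in range(n_states)]
    backptrs = []
    for obs in frame_logprobs[1:]:
        new_score, ptr = [], []
        for s in range(n_states):
            # Best predecessor for state s at this frame.
            best_prev = max(range(n_states), key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[best_prev] + log_trans[best_prev][s] + obs[s])
            ptr.append(best_prev)
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def logs(rows):
    return [[math.log(x) for x in row] for row in rows]


# Hypothetical per-frame phone probabilities for two phones over three frames.
frames = logs([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])
trans = logs([[0.7, 0.3], [0.3, 0.7]])
init = [math.log(0.5), math.log(0.5)]
best_path = viterbi(frames, trans, init)
```

Working in log probabilities avoids numerical underflow on long utterances; a real recogniser would also add pruning and a language model, which are left out here.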
daughtercard.
● Note: This recognizer is in its late "beta" stage of development and is available for U.S.
English vocabularies. Other languages are presently under development.
● Price: VCS software is priced at $350 per recognizer for unit quantities with volume discounts
available.
● See also: VCS Continuous Recognition above, VCS Isolated Word Speech Recognition below,
and the VCS 2030 & 2060 Voice Dialers.
● Contact: Voice Control Systems, Inc.
14140 Midway Rd., Dallas, Tx. 75244, USA
Ph: +1-214-386-0300, Fax: +1-214-386-5555
Email: sales@vcsi.com
WWW: http://www.voicecontrol.com/
automotive accessories.
❍ Consumer electronics: such as voice controllers for video games or VCRs and
televisions.
● Platform: Include Intel-X86, TI-C5X, C3X, C4X and C2X, OKI 6679, and NEC-V20 and
V30, and can operate on 16 bit microcontrollers. As a benchmark, 8 recognizers can run on an
Intel 486-33 DX.
● Availability: The technology is available under software licenses direct from VCS or by
purchasing hardware from an OEM. VCS OEMs include: Dialogic, Oki Semiconductor,
Intervoice, Periphonics, etc.
● Cost: VCS isolated word recognition software is available under a volume pricing license
agreement. Small quantity royalties are in the $500.00 per recognizer range while large
(millions) quantity royalties are less than $1.00 per recognizer.
● See also: VCS Continuous Speech Recognition and VCS Phonetic Dictionary Recognizer
above, and the VCS 2030 & 2060 Voice Dialers.
● Contact: Voice Control Systems, Inc.
14140 Midway Rd., Dallas, Tx. 75244, USA
Ph: +1-214-386-0300, Fax: +1-214-386-5555
Email: sales@vcsi.com
WWW: http://www.voicecontrol.com/
Visus SpeechKit
● Platform: NeXT
● Description: SpeechKit is based on SPHINX, a speaker-independent, 1000 word or so,
continuous speech recognition system which allows you to incorporate speech recognition into
your applications. You can design your vocabulary and grammars.
● Contact: Visus - no address or phone provided. A possible contact is Robert Brennan at
Carnegie Mellon University. email: Robert_Brennan@cmu.edu
❍ Line adaptation.
❍ Push to talk.
● asr1000/T
❍ Single channel platform examples: Motorola 56156, TI TMS320C2X/C3X/C5X
Motorola 96000
❍ Input: 8 kHz telephone sampling
● asr1000/M
❍ Single processor platform examples: Intel 486/Pentium
Voice-Trek 2.0
● Platform: Unknown.
● Description: VoiceTrek is primarily used by the United States Postal Service to sort mail.
Tardis Technology Inc. was created to develop and market applications that utilize speech
recognition. They do consulting work as well as turnkey systems.
● Contact: Tardis Technology Inc., Voice Recognition Div.
6444 E. Spring St., #286, Long Beach, CA 90815-1500, USA
Phone: +1-310-497-0077, Fax: +1-310-497-0080
Voicetek Corp.
● Platform: Unknown.
● Description: Voicetek Corporation provides voice processing solutions, training and consulting
services and an object-oriented, graphical Generations Platform for development of integrated
computer telephony systems.
● Contact: Voicetek Corporation
19 Alpha Road, Chelmsford, MA 01824, USA
Ph: +1-508-250-9393, Fax: +1-508-250-9378
WWW: http://www.voicetek.com/
The following are descriptions of the Voice Processing Corporation's VPro Product Line:
VProContinuous, VPro/XD, VPro/RT, VProCel, VProSpeller, VProPRL, VPro hardware
platforms, and the application Osprey.
More information is available on these products at the VPC WWW site: http://www.vpro.com/
● VProContinuous(TM) is a speaker-independent, continuous digit recognizer. It recognizes digit
strings spoken in a continuous manner, by any caller, without unnatural beeps or pauses.
VProContinuous uses out-of-vocabulary rejection and word spotting technologies to reject
extraneous words and phrases often spoken by callers. The VProContinuous vocabulary
consists of the words "zero" through "nine," "yes," "no," and "oh." The product is language-
independent. American English, Australian English, Brazilian Portuguese, Canadian French,
Castilian Spanish, French, German, Italian, Mexican Spanish, Portuguese, Swiss German and
U.K. English versions are available.
● VPro/XD(TM) is a discrete or multiword speech recognizer for extra-demanding applications
and/or vocabularies. This robust discrete product recognizes isolated discrete utterances (words
or very short phrases). VPro/XD utilizes proprietary out-of-vocabulary rejection and word-
spotting technologies. VPro/XD is speaker-independent and includes Talkover capability
allowing speech-interrupt over prompts. Pre-trained vocabulary libraries are available in
American English, Australian English, Brazilian Portuguese, Canadian French, Castilian
Spanish, Central American Spanish, German, Italian, Mandarin Chinese, Mexican Spanish,
Portuguese, Swiss German and UK English. Pre-trained vocabularies consisting of voice mail
words, voice dialing words, call control words, banking, and emergency words are available in
American English (both cellular and land-line).
● VPro/RT(TM) is a discrete speech recognizer for rapid training of vocabularies in the field.
This robust discrete product recognizes isolated discrete utterances. Application designers and
end-users define the vocabulary of their choice and train the system in real-time either prior to
system start-up, or adapting on-the-fly while the system is running live. Vocabularies can be
subset, and applications involving thousands of words can be developed quickly. VPro/RT,
which also supports Talkover, is suited to speaker-dependent recognition tasks, such as the
personal directory of names in a voice-activated dialing application. VPro/RT is also good for
applications that require speaker-independent vocabularies to be developed quickly in the field
or those that require many vocabularies. VPro/RT can also be used as a tool for quick
prototyping of applications.
● VProCel consists of speaker-independent VProContinuous, VPro/XD and speaker-dependent
VPro/RT specifically tuned for the cellular environment. The speaker-dependent discrete
feature of VProCel allows for a user-defined 20-word personal directory, with a one-pass
enrollment whereby users need only speak their chosen commands once. In addition, cellular-
ready VPro/XD vocabularies consisting of voice-activated dialing command words are also
available. VProCel is suited to voice-activated dialing applications using either digit strings or
a listing of words in a personal directory.
● VProSpeller is a recognizer that can determine which name or word is being spelled by a
caller. Users may spell a string of letters (up to 32 letters) in an uninterrupted manner (without
prompts or beeps between each letter). VProSpeller can recognize confusable letters by
conducting an automated search of a database of words maintained by the application for the
best candidates to match.
● VProPRL Designed for customers who wish to enable VPC speech recognition technologies on
platforms other than those supported by VPro hardware, the VProPRL is a portable recognizer
library of VProContinuous, VPro/XD and VPro/RT, which can be embedded into a wide
variety of hardware platforms. It consists of a library of object modules which can be linked
with a user application or task.
● VPro Hardware Platforms: VPro-42, VPro-84, VPro-88 : The VPro platforms are ISA
compliant PC/AT boards. Each supports four to eight Virtual Speech Processors (VSPs). Each
VSP, depending on load factors, can handle multiple telephone lines. Application and host
computers communicate with each of the VSPs as separate autonomous units. VPro platforms
use Texas Instruments TMS320C31 microprocessors which provide up to 133 MFLOPS of
compute power. The platforms can have up to 8 megabytes of memory shared among all
processors. In addition, each processor has 512K bytes of local memory. Both the PEB and
MVIP PCM audio buses are supported by all VPro platforms.
● Osprey is a call management software application that performs the kinds of telephone related
activities typically done by a personal assistant, such as answering the phone, screening callers,
routing calls, and taking and delivering messages. It is an automated phone attendant.
● Price and availability: Contact Voice Processing Corporation
● Contact: Kelli V. Smith
Voice Processing Corporation
1 Main Street, Cambridge, MA, 02142 USA
Ph: (617)494-0100 Fax: (617)494-4970
e-mail: KSmith@vpro.com
WWW: http://www.vpro.com/
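The VProSpeller entry above describes resolving confusable letters by searching an application-maintained word list for the best candidates. The sketch below shows one simple way such a search could work; it is not VPC's method. The confusion groups (for example the rhyming letters B, C, D, E, G, P, T, V, Z), the function names and the word list are illustrative assumptions.

```python
# Groups of letter names that are easily confused when spoken (assumed for illustration).
CONFUSABLE = [set("BCDEGPTVZ"), set("MN"), set("FS"), set("AJK")]


def letter_class(letter):
    """Return the set of letters that `letter` may be confused with (including itself)."""
    for group in CONFUSABLE:
        if letter in group:
            return group
    return {letter}


def best_candidates(heard, lexicon):
    """Rank lexicon words that are compatible with the recognised letter string.

    A word is compatible if it has the same length and every letter falls in the
    same confusion group as the recognised letter. Words with more exact letter
    matches are ranked first; ties are broken alphabetically.
    """
    heard = heard.upper()
    scored = []
    for word in lexicon:
        w = word.upper()
        if len(w) != len(heard):
            continue
        if all(a in letter_class(b) for a, b in zip(w, heard)):
            exact = sum(a == b for a, b in zip(w, heard))
            scored.append((exact, word))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored]


# Hypothetical application word list and a recognised spelling "D-E-N".
lexicon = ["BEN", "DEN", "TEN", "TEM", "PAM", "KEN"]
candidates = best_candidates("DEN", lexicon)
```

An application would typically confirm the top candidate with the caller, or fall back to the next one if it is rejected.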
For more information on features, hardware and software requirements, pricing and
availability, contact Voice Control Systems, Inc. or visit the VCS WWW site or the
SpeechPrint ID WWW page.
● See also: VCS speech recognition products in Q6.5.
● Contact: Voice Control Systems, Inc.
14140 Midway Rd., Dallas, Tx. 75244, USA
Ph: +1-214-386-0300, Fax: +1-214-386-5555
Email: sales@vcsi.com
WWW: http://www.voicecontrol.com/
recognition
❍ Real-time vocabulary generation directly from text
❍ Database integration
❍ "Barge-in" capability
❍ Support for multiple platforms and operating systems (e.g., SCO UNIX, WindowsNT,
etc.)
● DialogModules: manage the "conversation" between the system and the caller within an
application. They provide high-level application building blocks which enable developers to
quickly and easily add speech interfaces to computer telephony applications. Each
DialogModule accomplishes a particular task within an application, ranging from "simple"
tasks such as capturing a yes/no response or a phone number, to more complex tasks such as
capturing credit card information or name and address information.
DialogModules provide "out-of-the-box" functionality. They contain pre-built grammars, user-
interface design, internal call flow and error recovery routines, parameters for customization
and a set of C++ class libraries and C APIs.
● SpeechBuilder: provides tools for customizing the DialogModules and for developing and
maintaining applications. A GUI-based Vocabulary Editor provides the ability to generate and
maintain vocabulary or word lists. Pronunciations can be generated automatically using the
built-in dictionary or can be automatically generated using a set of text-to-phoneme rules.
● Product Bundles: are available which combine SpeechWorks and multiple DialogModules into
application templates for a set of generic application categories.
❍ SpeechForms SpeechForms provides an interactive method for entering data over the
phone, such as ordering products, filling out surveys and completing registration forms.
Typical applications include: order entry, reservations, catalog and literature requests,
catalog shopping, subscriptions, change of service, claims, credit card activation, home
banking, stock transactions, and warranty reservations.
❍ SpeechQuery SpeechQuery is used to deliver information in response to voice requests
over the phone, such as airline information, product delivery status and retirement
benefit information. Typical applications include: order status, product information,
account balance, flight status, movie listings, job listings, stock quotes, guide
services, classified ads, claims status, dealer locator services, and technical support.
❍ SpeechAgent SpeechAgent provides a set of modules for automating telephone-based
attendants.
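The SpeechBuilder entry above mentions two sources of pronunciations: a built-in dictionary and a set of text-to-phoneme rules. A common way to combine the two is to try the dictionary first and fall back to rules for unknown words. The sketch below illustrates that pattern only; the dictionary entry, the one-letter-per-phoneme rules and the function name pronounce are toy assumptions and bear no relation to the actual product data.

```python
# Toy pronunciation dictionary and letter rules; both are illustrative assumptions.
DICTIONARY = {"smith": ["S", "M", "IH", "TH"]}
LETTER_RULES = {"a": "AE", "b": "B", "d": "D", "e": "EH", "n": "N", "o": "AA", "t": "T"}


def pronounce(word):
    """Return (phonemes, source), where source is "dictionary" or "rules"."""
    key = word.lower()
    if key in DICTIONARY:
        return DICTIONARY[key], "dictionary"
    # Fallback: map each letter independently; letters without a rule are skipped.
    phones = [LETTER_RULES[ch] for ch in key if ch in LETTER_RULES]
    return phones, "rules"


known = pronounce("Smith")
unknown = pronounce("Ted")
```

Real text-to-phoneme rules are context-sensitive (the same letter maps to different sounds depending on its neighbours), so a letter-by-letter table like this is only a placeholder for the rule stage.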
More information is available on the Nortel Multimedia Network Applications WWW page
for AudioGram Delivery Service.
● Nortel's Voice-Activated Auto Attendant (VAAA):
Replaces touch tone menu with easy-to-use voice interface. Geared to businesses and
corporations to provide more effective management of incoming customer calls. Residing on
the Network Applications Vehicle (NAV) platform, VAAA uses Flexible Vocabulary
Recognition (speaker-independent) technology to recognize spoken words, and directs calls
accordingly. Other features include:
❍ Cost-saving common service platform (NAV)
❍ Handles incoming calls for all corporate users (Centrex, PBX, or key systems)
More information is available on the Nortel Multimedia Network Applications WWW page
for Voice-Activated Auto Attendant.
● Nortel's Voice-Activated Dialing (VAD):
Phoneme-based speech dialing capabilities provided through speaker-trained and speaker-
independent technologies. Residing on the Network Applications Vehicle (NAV) platform,
VAD enables subscribers to dial using speech, as well as to create and customize personal
telephone directories. Other features include:
❍ Cost-saving common service platform (NAV)
❍ Speech Recording
❍ Word-spotting
❍ Directory sharing
❍ Talk-through
More information is available on the Nortel Multimedia Network Applications WWW page
for Voice-Activated Dialing.
● Nortel's Voice-Activated Premier Dialing (VAPD):
Enables businesses to take advantage of the public network directories to stimulate customer
calls. Residing on the Network Applications Vehicle (NAV) platform, VAPD uses Flexible
Vocabulary Recognition (speaker-independent) technology to recognize business names, and
routes calls to the appropriate business entity. VAPD promotes cost savings by utilizing a
common service platform, the Network Applications Vehicle (NAV). It services DTMF callers
as well as rotary dialers, and handles incoming calls for all corporate users: Centrex, PBX, and
key systems. More information is available on the Nortel Multimedia Network Applications
WWW page for Voice-Activated Premier Dialing.
● Platform: This speech-based service operates on the Network Applications Vehicle (NAV)
platform. NAV is a multi-application, digital signal processing platform supporting both
speech- and display-based applications. The NAV platform provides the speech recognition
capabilities and application logic used by the service. NAV features an open, modular hardware
architecture and flexible software design. Other features include:
❍ Scalable hardware - from 24 to over 2000 ports per NAV node; 1 to 24 independent
processing support
❍ Reliability - N+1, N+M, and 2N redundancy
● See Also: Nortel Feature Planning Guide, reference number 50004.11; NAV Applications and
Planning Guide, reference number 50118.16.
Nortel's Multimedia web pages: http://www.nortel.com/entprods/multimedia/
● Contact: NORTEL
Multimedia Communications Systems Division
Multimedia Network Applications
1000 Park Forty Plaza
Durham, NC 27713 USA
Ph: 1-800-4NORTEL
WWW: http://www.nortel.com/entprods/multimedia/
● ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/
Comp.speech Archives
The comp.speech ftp site provides full archives of the comp.speech newsgroup dating back to the
creation of the group in 1991. The postings are stored in the order in which they arrive. Batches of
1000 articles are grouped into gzip'ed tar files. Matching files listing the subjects are also provided.
● ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/archive/
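As a rough sketch of how one of these batches could be inspected once it has been downloaded, the following Python lists the articles stored in a gzip'ed tar file. The file name `batch.tar.gz` is hypothetical; the naming scheme actually used on the ftp site is not stated here.

```python
import tarfile


def list_articles(archive_path):
    """Return the names of the regular files stored in a gzip'ed tar archive."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return [member.name for member in archive.getmembers() if member.isfile()]


# Example with a hypothetical file name:
# names = list_articles("batch.tar.gz")
# print(len(names), "articles in this batch")
```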
The comp.speech ftp site includes a wide range of useful software and resources. Tony has arranged it
into a series of sub-directories:
Associations
Institute of Electrical and Electronics Engineers (IEEE)
Europe, It will help users and developers of European language resources, as well as
government agencies and other interested parties, exploit language resources for a wide variety
of uses. It will also oversee the distribution of language resources via CD-ROM and other
means and promote standards for such resources.
● More info: see the ELRA Home page for membership information, lists of resources etc.
● Contact: K. Choukri, Executive Director ELRA
87, Avenue d'Italie, 75013 Paris, FRANCE
Ph: +33 1 45 86 53 00, Fax: +33 1 45 86 44 88
Email: elra@calvanet.calvacom.fr
WWW: http://www.icp.grenet.fr/ELRA/home.html
● Conference: SST, the Australian conference on Speech Science and Technology, is held bi-
annually. SST-96 will be held in Adelaide.
● WWW: Home Page: http://cslab.anu.edu.au/~bruce/assta/
List of members: http://ciips.ee.uwa.edu.au/~roberto/assta-users/
Linguistic Associations
Industry Publications
ASR News
● Description: Monthly newsletter covering developments in the speech recognition and speech
synthesis marketplace.
● Note: Voice Information Associates also publish "Automatic Speech Recognition: A study of
the world-wide market" (revised 1995) and "Text-to-Speech Technology Markets: 1995-2000"
(revised 1995)
● Contact: Voice Information Associates, Inc.
14 Glen Road South, P.O. Box 625, Lexington, MA 02173, USA
Ph: +1-617-861-6680, Fax: +1-617-863-8790
Email: asrnews@tiac.net
WWW: http://www.tiac.net/users/asrnews/
Voice News
● Description: Monthly newsletter reporting on voice mail, voice response, speech recognition,
speech synthesis, digital voice record/playback and related technologies, markets and company
activities. Review copy available on request.
● Contact: Stoneridge Technical Services
P.O. Box 1891, Rockville, MD, 20849, USA
Ph: +1-301-424-0114, Fax: +1-301-424-8971
Email: info@stoneridgetech.com
WWW: http://www.stoneridgetech.com/
● Description: Monthly news and analysis of speech recognition markets, applications and
technology.
A free sample copy is available by contacting TMA Associates.
● Also: TMA Associates also publishes market studies, including The Advanced Speech
Technology Market: Recognition, Synthesis and Compression (1996) and Voice ID (1996).
● Contact: TMA Associates
6021 Wish Avenue, Encino, CA 91316, USA
Ph: +1-818-708-0962, Fax: +1-818-345-2980
Email: 72162.3172@compuserve.com
WWW: http://www.tmaa.com/
● Description: Follows integrated PC LAN messaging (voice, fax, mail, video) and speech
technology. It follows the merging computer and telephone technologies, provides insights into
business and marketing opportunities and offers executives timely information on industry trend
analysis.
● Contact: Phillips Business Information
1201 Seven Locks Rd., Potomac, Maryland, 20854, USA
Ph: 1-800-777-5006 OR +1-301-340-1520
Subscription FAX: +1-301-309-3847
Editorial FAX: +1-424-4297
Teleconnect
● Contact: +1-212-691-8215
Computer Telephony
● Contact: +1-212-691-8215
● Contact: 1-800-854-3112
Speech Technology
Speech Communication
● Description: A Web Journal dedicated to the state of the art in human language technology.
Past volumes, editorial and submission information, and so on are
● Contact: Editor-In-Chief: Ron Cole: cole@cse.ogi.edu
WWW: http://www.cse.ogi.edu/CSLU/fsj/html/masthead.html
● Description: online access to all abstracts published in Linguistics Abstracts since 1985, plus
all current material as it becomes available. Over 250 publications are indexed. Free trial
available.
http://www.blackwellpublishers.co.uk/labs/
Computational Linguistics
● Description: Focuses on speech technology and its applications, and promotes research and
description of all aspects of speech input and output: applications, base technology, theory,
approach, experiment, and testing.
● Publisher: Kluwer Academic Publishers
101 Philip Drive, Norwell, MA 02061, USA
Ph: +1-617-871-6300, Fax: +1-617-871-0449
● Submissions to: International Journal of Speech Technology
Journals Editorial Office, Ms. Kelly Riddle
Kluwer Academic Publishers
(Address, phone, fax as above)
Email: krkluwer@world.std.com
Conferences
ICSLP: Intl. Conference on Spoken Language Processing
Next: 30 Nov to 4 Dec, 1998, Sydney, Australia
Held in even years.
Eurospeech
Man-Machine Interfacing
SpeechViewer II
Some databases are free but most are not. The databases normally require lots of storage space (100's
of MBytes is not unusual). Do not expect to be able to ftp large amounts of speech data.
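To see why the storage adds up so quickly: the size of uncompressed audio is the sample rate times the bytes per sample times the number of channels times the duration. A minimal sketch follows. The default figures (16 kHz, 16-bit, mono) are illustrative assumptions, not properties of any particular database listed here.

```python
def pcm_size_bytes(seconds, sample_rate_hz=16000, bits_per_sample=16, channels=1):
    """Size in bytes of uncompressed linear PCM audio of the given duration."""
    return int(seconds * sample_rate_hz * (bits_per_sample // 8) * channels)


one_hour = pcm_size_bytes(3600)
print(one_hour / 1e6, "MB per hour")  # 115.2 MB per hour
```

Under these assumptions a few hours of recordings already reaches hundreds of megabytes.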
In addition to the descriptions of speech databases and speech database providers below, information
can be obtained from
Most speech research sites have links to other speech research sites somewhere in their WWW pages.
CUSeeMe
CyberPhone
DigiPhone
InterFACE from Hijinx
FAQ: How can I use the Internet as a telephone?
Nautilus: Secure Computer Telephony
NEVOT (1.4v) from AT&T BL
PGPfone
Speak Freely
Internet Phone from VocalTec
WebPhone
WebTalk
AF version AF3R1
Voice E-Mail from Bonzi Software
MicNotePad Recording Software for Macs
MixViews
Network Audio System Release 1.1
NIST Software - SPHERE and SCORE
Sound Processing Kit
TCPplay
Auditory Modeller 1
Auditory Modeller 2
Auditory Toolbox for Matlab
Human Audio Perception Document
BEEP dictionary
CMU dictionary
CUVOLAD dictionary (Oxford Dictionary)
Comprehensive Word List
EAT: Edinburgh Associative Thesaurus
Homophone List
Moby Lexical Resources
MRC Psycholinguistic Database
WordNet
Dictionaries on the WWW
Dynastat, Inc.
Speech Intelligibility Testing with Diagnostic Rhyme Test (DRT), Modified Rhyme Test
(MRT), Phonetically Balanced Word Lists (PB), Diagnostic Medial Consonant Test (DMCT),
Diagnostic Alliteration Test (DALT), ICAO Spelling Alphabet Test (SpAT)
Speech Quality (Acceptability) Evaluation with Diagnostic Acceptability Measure (DAM),
Very Miscellaneous
The vOICe
The Learning Company's Language Training
Wildfire - an Electronic Assistant
Man-Machine Interfacing
● Description: Offers a service designed for people with physical challenges. Can successfully
implement a computerized voice controlled system adapted to unique needs.
They have developed a free-standing microphone and signal processing system to compensate
for speech/articulation distortions, and background noise produced by electronic devices such
as wheelchairs and respirators.
● Contact: Man-Machine Interfacing
P.O. Box 5371, Evanston, IL 60204
Ph: 1-888-425-2001, Fax: (847) 328-7975
Email: jwhite@mcs.com
WWW: http://www.speechrec.com/
SpeechViewer II
● Platform: IBM Machines from Mod 25 on.
● Description: SpeechViewer II is a speech therapy tool. It provides graphical feedback of
various speech features so that speech impaired individuals can improve their speech. It works
with an audio bandwidth of 7.3 kHz and thus allows the therapist to work with sustained
vowels and fricatives. A wide range of graphics is used to provide adequate variability to hold
client interest. An extensive set of statistics is gathered, which allows a therapist to do
research or keep therapy records. The speech therapy modules are:
❍ Awareness - Sound, Loudness, Pitch, Voicing Onset, Voicing
A multilingual option is available which provides support for 12 languages: Danish, Dutch,
Finnish, French, German, Icelandic, Italian, Norwegian, Portuguese, Spanish, Swedish, and
UK English. With the Multilingual Option, clinicians can use SpeechViewer II as a training
tool for English as a second language and for foreign language training.
● Hardware: Requires an IBM M-ACPA (Multimedia-Audio Capture Playback Adapter). It has
a TI TMS320C25 DSP chip. The input sampling rate is 44.1 kHz stereo, 88.2 kHz mono. This
is a 16-bit card. It has the following jacks: mic in, stereo line in, stereo line out, speaker out.
Note: This card is being replaced by Mwave technology. For more info on Mwave contact
Texas Instruments.
● Price:
❍ The software is $2130 list, $1491 educational, part number 92F2066.
Speech Corpora
Text Corpora
Lexical Databases
Contact information:
● CSLU has released for universities its Continuous English Speech Corpus. The corpus contains
recorded speech from 690 different speakers, with label files at various levels - including word
level and phonetic labels. The data were collected as part of the OGI Multi-language telephone
corpus. CSLU provides speech corpora to all universities without charge. To order a corpus,
print the license agreement/order form, complete it, and fax it to the CSLU. A description of
the corpora and an order form are available:
http://www.cse.ogi.edu/CSLU/
ftp://speech.cse.ogi.edu/pub/releases
● Contact: Mike Noel: noel@cse.ogi.edu
● Description: The UCLA Sounds of the World's Languages are available for Macintosh users
(no DOS based system currently available). The sounds are stored in a Hypercard database
developed at the UCLA Phonetics Laboratory. The aim is to illustrate and teach about the
range of sounds used in human languages with material on more than 80 languages. The set
demonstrates particular highlights of the sound systems focusing especially on rarer sounds
that students may not otherwise have a chance to hear from a native speaker. The recordings
are based on the archives of recordings collected at UCLA, with additional contributions from
outside collaborators. All the languages can be accessed from the list of language names, or by
clicking on the language name in a set of maps. Support for part of this work was provided by
NSF. The database currently includes examples of languages from Agul and Akan to Zulu.
● Availability: 15 DSDD disks, requiring about 35 MB of disk space when expanded. Available
for $50 (individuals) or $100 (institutions). Prepayment in US dollars (checks or international money
orders payable to "UC Regents") must accompany all orders.
● Contact: The UCLA Phonetics Laboratory
Linguistics Department, UCLA, Los Angeles, CA 90095-1543
Tel: (310) 825-1254
E-mail: oldfogey@ucla.edu
NOISEX-92
● Description: Database of recordings of various noises available on 2 CDROMs. Some material
from the same source is available by anonymous ftp in the IEEE's Signal Processing
Information Base. The samples include
❍ Voice babble
❍ Factory noise
❍ Various military noises; fighter jets (Buccaneer, F16), destroyer noises (engine room,
● Availability 1: The cost of this database is 135 Pounds Sterling for the set of two CD-ROMs.
Send payment with order to:
The Speech Research Unit,
Ex1, DRA Malvern, St.Andrew's Road,
Malvern, Worcestershire, WR14 3PS, UK
Tel +44-684-894074 Fax +44-684-894384
Note: The supply of CD-ROMs is limited so please check that they are still available before
placing an order. The only acceptable methods of payment are cheques (from the UK only) or
bank drafts in Pounds Sterling drawn on a UK bank. They should be made payable to:-
Public Sub Account HMG 4768.
● Availability 2: Information on how to obtain a copy of the NATO RSG.10 NOISE-ROM-0 can
be obtained from the DRA Speech Research Unit (address above) or from:
Dr. Herman Steeneken,
TNO Institute for Perception,
P.O. Box 23, 3769 ZG Soesterberg,
The Netherlands.
● Availability 3 (WWW): Examples of the NOISEX database are available on the Rice University
Digital Signal Processing (DSP) group home page. (Note: the files are large, over 20MB.)
http://spib.rice.edu/spib/select_noise.html
Phonemic Samples
● Some basic data. The following ftp sites have samples of English phonemes (American accent
I believe) in Sun audio format files. See Question 1.8 for information on audio file formats.
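For readers who want to check such a file before playing it, here is a minimal sketch of reading the header of a Sun audio (.au) file. It assumes the commonly documented Sun/NeXT layout of six big-endian 32-bit fields (magic number, data offset, data size, encoding, sample rate, channels); that layout is not spelled out in this section, so treat it as an assumption and verify it against the audio file format information referred to above.

```python
import struct

AU_MAGIC = 0x2E736E64  # the ASCII bytes ".snd"


def read_au_header(data):
    """Parse the fixed 24-byte header at the start of a Sun audio (.au) file."""
    if len(data) < 24:
        raise ValueError("too short for a Sun audio header")
    # Six unsigned 32-bit integers, big-endian (assumed layout, see lead-in).
    magic, offset, size, encoding, rate, channels = struct.unpack(">6I", data[:24])
    if magic != AU_MAGIC:
        raise ValueError("not a Sun audio file")
    return {
        "data_offset": offset,   # where the sample data begins
        "data_size": size,       # may be 0xFFFFFFFF when unknown
        "encoding": encoding,    # e.g. 1 is 8-bit mu-law in the assumed layout
        "sample_rate": rate,
        "channels": channels,
    }


# Usage with a hypothetical file name:
# with open("phoneme.au", "rb") as f:
#     print(read_au_header(f.read(24)))
```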
ShATR
● Description: Multi-simultaneous-speaker corpus available on one CDROM. This specialised
corpus is primarily intended to provide acoustic material for studies in auditory scene analysis.
However many researchers in the speech sciences, ranging from acoustics to discourse analysis
may find it a valuable source of information. The corpus has been transcribed and aligned at
four different levels of analysis. An overlap analysis between the individual speaker channels
and word counts are available. There is also a general tool for accessing concurrent events in
transcribed multi-sound-source databases.
● Cost: 30 Pounds Sterling for one CD-ROM. Availability, licensing and ordering information is
provided on ShATR's home page.
● Examples: Samples of the ShATR database are available on ShATR's home page and by
anonymous ftp
ftp://ftp.dcs.shef.ac.uk/share/spandh/ShATR/
● Contact: Speech and Hearing Research Group
Department of Computer Science, University of Sheffield
Regents Court, 211 Portobello Street, Sheffield S1 4DP, U.K.
WWW: http://www.dcs.shef.ac.uk/research/groups/spandh/pr/ShATR/ShATR.html
GoldWave
● Platform: Windows
● Description: GoldWave is a digital audio editor for Microsoft Windows. It features realtime
amplitude/spectrum oscilloscopes, large file editing, effects, and support for a wide variety of
sound formats.
❍ Editing of multiple waveforms and large waveforms
❍ Effects: distortion, Doppler, echo, filter, mechanize, offset, pan, volume shaping, invert,
Khoros
● Platform: Any Unix - source code available.
● Description: Khoros is a technical computing environment for image and signal processing,
visual programming and software development.
● Price: On request.
● Availability: Khoral Research Inc.
6001 Indian School Rd. NE Suite 200, Albuquerque, NM 87110, USA
Ph: (505)837-6500, Fax: (505) 881-3842
Email: info@khoral.com
ftp: ftp://ftp.khoral.com/
WWW: http://www.khoral.com/
N!Power
● Platform: SUN, DEC and HP workstations.
● Description: An object-oriented software package with a MOTIF GUI interface and a range of
functionality for data analysis/editing, signal analysis, speech processing, real-time A/D and
D/A, and 2D/3D interactive graphics. N!Power replaces ILS.
N!Power can provide a Block Diagram user interface, menus, pop-ups, and a high-level IEEE
standard symbolic scripting language. You can customize the blocks, menus and pop-ups with
mouse point-and-click operations.
● Contact: Signal Technology, Inc.
104 W. Anapamu, Suite J, Santa Barbara, CA 93101-3126
Phone: +1-805-899-8300, Fax: +1-805-899-4344
Email: stisales@signal.com
WWW: http://www.silcom.com/~stilarry/
Ptolemy
● Platform: Sun SPARC, DecStation (MIPS), HP (hppa).
● Description: Ptolemy provides a highly flexible foundation for the specification, simulation,
and rapid prototyping of systems. It is an object oriented framework within which diverse
models of computation can co-exist and interact. Ptolemy can be used to model entire systems.
Ptolemy has been used for a broad range of applications including signal processing,
telecommunications, parallel processing, wireless communications, network design, radio
astronomy, real time systems, and hardware/software co-design. Ptolemy has also been used as
a lab for signal processing and communications courses. Ptolemy has been developed at UC
Berkeley over the past 3 years. Further information, including papers and the complete release
notes, is available from the FTP site.
● Cost: Free
● Availability: The source code, binaries, and documentation are available by anonymous ftp
from
ftp://ptolemy.berkeley.edu/pub/README
❍ Speech Recognition: provides detailed control of a speech recognition engine for both
❍ Conference calling.
❍ Voice mail.
❍ Caller identification.
Windows 95 comes with a telephony application, DIALER.EXE, that can dial voice calls, act
as a proxy for applications making simple telephony requests, and maintain a call log.
● More information: The Win32 Software Development Kit (SDK) contains documentation,
tools, and sample code for TAPI including the Microsoft Telephony Programmer's Reference
and the Microsoft Telephony Service Provider Interface (TSPI) for Telephony.
WWW: Tapping in TAPI, TAPI White Paper
● See also: SAPI: Microsoft Speech API
CUSeeMe
● Platform: Macintosh and Windows
● Description: Cornell University software for audio and video conferencing over the Internet.
● Requirements: Macintosh to RECEIVE video:
❍ Macintosh platform with a 68020 processor or higher
❍ Quicktime installed
For Windows:
❍ Video receive only 386SX, Video send & receive 386DX, Video receive w/Audio
❍ Winsock
❍ Video camera and a video capture board that supports Microsoft Video For Windows
❍ For audio: Windows Sound board that conforms to the Windows MultiMedia
CyberPhone
● Platform: Sun Workstations running Solaris 2.x (SunOS 5.x)
● Description: Provides voice communications over the internet. Has a graphical user interface
and requires no additional hardware. An optional centralized server system is available to make
finding and connecting to other users easier.
● Availability: a free demonstration is available by anonymous ftp
ftp://magenta.com/pub/cyberphone
● Contact: Email: cyberphone@magenta.com. More information is available on the WWW:
http://magenta.com/cyberphone/.
DigiPhone
● Platform: Macintosh, Windows 3.1 and Windows 95
● Description: DigiPhone provides two-way phone conversations by dialing direct and over the
Internet. Includes encryption for privacy, caller ID, call screening, call timer, adjustable sound
and compression quality, messaging, and access to the Global Directory providing a database
of DigiPhone users.
❍ DigiPhone v1.03: provides the standard features listed above. [ More information].
❍ DigiPhone Deluxe: provides the standard features of DigiPhone v1.03 and adds
conference calling, mute, speed dial, call recording and playback, voice effects,
customizations, and internet tools. [ More information].
❍ DigiPhone for Mac: provides the standard features listed above, plus cross-platform
❍ What is multicasting?
❍ Windows: Speak Freely, CU-Seeme, Internet Phone, Digiphone, Internet Voice Chat, Internet Global
● Availability:
By Email
Mail voice-faq-request@northcoast.com
with "Subject: archive"
and "Body: send voice-faq"
FTP
ftp://rtfm.mit.edu/pub/usenet/alt.internet.services/FAQ:_How_can_I_use_the_Internet_as_a_telephone?
WWW:
http://rpcp.mit.edu/~asears/voice-faq.html
● Contact: Andrew Sears: asears@mit.edu
Kevin Savetz: savetz@northcoast.com
❍ ftp://ripem.msu.edu/pub/crypt/README
PGPfone
● Platform: Macintosh and Windows
● Description: Pretty Good Privacy Phone is free secure audio connection software for the
internet. It uses speech compression and strong cryptography protocols to give you the ability
to have a real-time secure telephone conversation via a modem-to-modem connection.
● Requirements (Mac): Fast modem: at least 14.4 Kbps V.32bis (28.8 Kbps V.34 recommended).
An Apple Macintosh with at least a 25MHz 68LC040 processor (PowerPC recommended),
running System 7.1 or above, Thread Manager 2.0.1, ThreadsLib 2.1.2, and Sound Manager
3.0. (These are available from Apple's FTP sites.)
● Requirements (Windows): Fast modem: at least 14.4 Kbps V.32bis (28.8 Kbps V.34
recommended). A multimedia PC running Windows 95 or NT, with at least a 66 MHz 486
CPU (Pentium recommended), sound card, microphone, and speakers or headphones.
● Contact: Jeffrey I. Schiller
Email: jis@mit.edu
WWW: http://web.mit.edu/network/pgpfone/
Speak Freely
● Platform: Windows and Unix
● Description: Free "Internet Phone" software supporting voice mail, multicasting, encryption
and several coding methods. Includes 4 forms of data compression and encryption with DES,
IDEA and PGP. The Windows and Unix versions are compatible. You can designate a bitmap
file to be sent to users who connect so they can see who they're talking to. The Unix version
does not have the graphical user interface of the Windows edition, but supports all its
compression and encryption modes.
● More information: http://www.fourmilab.ch/netfone/windows/speak_freely.html
WebPhone
● Platform: Windows
● Description: WebPhone provides telephone quality, real-time, full duplex, encrypted, point-to-
point voice communication over the Internet and other TCP/IP based networks. (More detail
provided on the NetSpeak WWW pages).
● Requirements: 80486DX-33 MHz running Windows 3.1 or higher, 4 MB of RAM, MCI
compliant sound card, Winsock 1.1 compliant stack, 14.4Kbps modem, VGA card capable of
displaying 256 colors. Full duplex audio card required for full duplex.
● Price: $49.95 (US)
● Availability: via the WWW: http://www.netspeak.com/getphone.html
● Contact: NetSpeak Corporation
902 Clint Moore Rd., Boca Raton, Fl. 33487, USA
Ph: +1-407-997-4001, Fax: +1-407-997-2401
Email: info@netspeak.com
WWW: http://www.netspeak.com/
WebTalk
● Platform: Windows 3.1/95
● Description: Full-duplex or half duplex, telephone-quality voice, supports many commercial
web browsers.
● Contact: Quarterdeck Corporation
13160 Mindanao Way, 3rd Floor, Marina Del Rey, CA 90292-9705, USA
Ph: +1-310-309-3700, Fax: +1-310-309-4217
Email: info@quarterdeck.com
WWW: http://www.quarterdeck.com/
AF version AF3R1
● Platforms: DEC workstations (Alpha and MIPS), SparcStation, SGI
● Description: The AF System is a device-independent network-transparent system including
client applications and audio servers. With AF, multiple audio applications can run
simultaneously, sharing access to the actual audio hardware.
The AF3R1 distribution of AF includes server support for Digital RISC systems running
Ultrix, Digital Alpha AXP systems running OSF/1, SGI Indigo running IRIX 4.0.5, Sun
Microsystems SPARCstations running SunOS 4.1.3, and Sun Microsystems SPARCstations
running Solaris 2.3. The servers support audio hardware ranging from the built-in CODEC
audio on SPARCstations and Personal DECstations to 48 KHz stereo audio using the
DECaudio TURBOchannel module or the SPARCstation DBRI interface
● Availability: The source kit is distributed by anonymous ftp from
ftp://crl.dec.com/pub/DEC/AF
WWW: http://www.research.digital.com/CRL/projects/AF/home.html
● Contact: af-request@crl.dec.com
MixViews
● Description: A Unix/X sound editor. Does waveform play/record, and cut/splice. Has various
filters, handles native file formats, FFT, LPC and more
● Availability: by anonymous ftp including SunOS 4 and IRIX 5 binaries.
ftp://foxtrot.ccmrc.ucsb.edu/pub/MixViews
❍ Filtering
❍ Distortion
❍ Signal routing
● Availability:
Full documentation on the WWW:
http://www.music.helsinki.fi/research/spkit/documentation/SPKit.html
Software distribution:
http://www.music.helsinki.fi/research/spkit/distribution/spkit.tar.Z
● Contact: Kai Lassfolk
University of Helsinki Music Research Laboratory
Email: spkit@elisir.helsinki.fi
TCPplay
● Description: TCPPlay lets you use your Mac as an audio server for your Unix box. Provided
with source code. Written by Bill Stafford, Rich Tsoi and Malcolm Slaney.
● Availability: Anonymous ftp from
ftp://ftp.apple.com/pub/malcolm/TcpPlay.sit.hqx
ftp://worldserver.com/pub/malcolm/TcpPlay.sit.hqx
Auditory Modeller 1
● Description: John Holdsworth's implementation of a gammatone filter bank and Roy
Patterson's spiral model, in C (with X-window display).
● Availability: By anonymous ftp from
ftp://ftp.mrc-apu.cam.ac.uk/pub/aim
Auditory Modeller 2
● Description: Lowel O'Mard's implementation of peripheral filtering, Ray Meddis's hair cell
model and other stuff in C (as a library of routines).
● Availability: By anonymous ftp from
ftp://suna.lut.ac.uk/public/hulpo/lutear
❍ Spectrogram
❍ AuditoryToolbox.psc.Z
❍ AuditoryToolbox.sea.hqx
❍ AuditoryToolbox.tar
❍ AuditoryToolbox.tar.Z
The ".mif.Z" file is a Unix compressed version of the FrameMaker documentation. The
".psc.Z" file is a Unix compressed version of the Postscript documentation. The ".tar" and
".tar.Z" files are Unix TAR archives containing all of the m-functions and C-MEX source
code. Finally, the ".sea.hqx" file is a Macintosh self-extracting archive that has been encoded
using BinHex. There is a precompiled version of the three MEX functions for the Macintosh.
● Misc: Our lawyers ask us to remind you that there is no warranty. We've done some testing
but we undoubtedly missed things.
● Contact: Malcolm Slaney, Interval Research.
Email: malcolm@interval.com
WWW: http://www.interval.com/~malcolm/
BEEP dictionary
● Description: Phonemic transcriptions of over 250,000 English words. (British English
pronunciations)
● Availability: By anonymous ftp:
BEEP dictionary README file
svr-ftp.eng.cam.ac.uk/comp.speech/dictionaries/beep-0.7.README
BEEP Dictionary (1.1M)
svr-ftp.eng.cam.ac.uk/comp.speech/dictionaries/beep.tar.gz
CMU dictionary
● Description: Phonemic transcriptions of 100,000 words with American English pronunciation.
● Availability - WWW: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
● Availability - ftp: By anonymous ftp from the directory
ftp://ftp.cs.cmu.edu/project/fgdata/dict/
with the files README, cmudict.0.2.Z, cmulex.0.1.Z, phoneset.0.1
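A pronunciation dictionary of this kind is typically loaded into a word-to-phonemes table. The sketch below assumes a plain-text layout in which each entry is one line holding the word followed by whitespace-separated phoneme symbols, with comment lines marked by a prefix. Both the layout and the comment prefixes are assumptions; check the README for the format of the release you download. The sample lines are made up for illustration.

```python
def parse_pronunciations(lines, comment_prefixes=(";;;", "#")):
    """Map each word to a list of pronunciations, each a list of phoneme symbols."""
    lexicon = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and (assumed) comment lines.
        if not line or line.startswith(comment_prefixes):
            continue
        word, *phonemes = line.split()
        lexicon.setdefault(word, []).append(phonemes)
    return lexicon


# Made-up sample entries in the assumed layout:
sample = ["SPEECH  S P IY CH", "READ  R EH D", "READ  R IY D"]
lexicon = parse_pronunciations(sample)
print(lexicon["READ"])  # [['R', 'EH', 'D'], ['R', 'IY', 'D']]
```

Keeping a list of pronunciations per word leaves room for words with more than one accepted pronunciation.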
Homophone List
● A list of homophones in General American English is available by anonymous FTP from the
comp.speech archive site:
ftp://svr-ftp.eng.cam.ac.uk/comp.speech/dictionaries/homophones-1.01.txt
WordNet
● Description: WordNet is an on-line lexical reference system in which English nouns, verbs,
adjectives and adverbs are organized into synonym sets, each representing one underlying
lexical concept. Different relations link the synonym sets.
WordNet was developed in the Cognitive Science Laboratory at Princeton University under the
direction of Professor George Miller.
● Availability:
WWW Interface
http://www.cogsci.princeton.edu/~wn/w3wn.html
Source Distributions
Unix (9.1MB), PC (5.8MB), Macintosh (7.5MB), Prolog (database only, 4.2MB).
ftp://clarity.princeton.edu/pub/wordnet/
Extended interfaces developed by WordNet users (for X, Lisp etc) are listed in the WordNet
home page.
● Further information: Email: wordnet@princeton.edu
WWW: WordNet home page: http://www.cogsci.princeton.edu/~wn/
README: ftp://clarity.princeton.edu/pub/wordnet/README
Publications: ftp://clarity.princeton.edu/pub/wordnet/5papers.ps
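The organisation described above (synonym sets, each standing for one lexical concept, joined by named relations) can be pictured with a small data structure. This is only an illustrative sketch: the class, the relation name "hypernym" and the example words are assumptions made for the example, and it is not WordNet's own file format or programming interface.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A set of synonymous words standing for one lexical concept."""
    words: frozenset
    relations: dict = field(default_factory=dict)  # relation name -> list of Synset

    def link(self, relation, other):
        """Record that this synonym set is related to another one."""
        self.relations.setdefault(relation, []).append(other)


dog = Synset(frozenset({"dog", "domestic dog"}))
canine = Synset(frozenset({"canine", "canid"}))
dog.link("hypernym", canine)  # illustrative: "a dog is a kind of canine"
print(sorted(dog.relations["hypernym"][0].words))  # ['canid', 'canine']
```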
CMU Dictionary
http://www.speech.cs.cmu.edu/cgi-bin/cmudict
Institute of Phonetic Sciences, Amsterdam
Electronic dictionaries, including French, Norwegian, Swahili and English.
http://fonsg3.let.uva.nl/Other_pages.html
1913 Webster's Revised Unabridged Dictionary
Available as a searchable HTML form at the University of Chicago ARTFL project site, and as
a tagged working file and downloadable version (45MB) of the HTML at Project Gutenberg.
Martin Ramsch's Englisch-Worterbucher aller Art
Lists of on-line dictionaries, translation dictionaries, technical dictionaries, etc.
http://www.uni-passau.de/forwiss/mitarbeiter/freie/ramsch/englisch.html
Galaxy's list of dictionaries etc.
A comprehensive list of dictionaries, acronym lists, translation resources, and a Thesaurus.
http://galaxy.einet.net/galaxy/Reference-and-Interdisciplinary-Information/Dictionaries-etc.html
Webster's dictionary online
http://c.gp.cs.cmu.edu:5103/prog/webster
❍ Gopher: gopher://gopher.sil.org/11/gopher_root/computing/software/fonts/
Also available through the SIL email server. Send either of the following commands to
MAILSERV@sil.org.
Windows:
SEND/MODE=BLOCK/ENCODING=UUENCODE [FTP.FONTS.WIN]SILIP12A.EXE
Mac:
SEND [FTP.FONTS.MAC]SILIPA12.SEA_HQX
Finally, they are available on diskette from the address below. $US5 to cover the cost of
shipping.
● Contact: International Academic Bookstore
Summer Institute of Linguistics
7500 W. Camp Wisdom Road, Dallas, TX 75236 U.S.A.
Ph: 214-709-2404, Fax: 214-709-2433
e-mail: academic.books@sil.org
WWW: http://www.sil.org/
The vOICe
● Description: Peter Meijer's Java applet/application for sound analysis and synthesis.
❍ Platform: All (where Java VM available)
❍ Image sonification
additional references.
● Paris, Ce'cile L.; Swartout, William R.; Mann, William C.: Natural Language Generation in
Artificial Intelligence and Computational Linguistics. Boston: Kluwer Academic Publishers,
1991.
❍ The book describes the most current research developments in natural language
generation and discusses all aspects of the generation process. It comprises three
sections: one on text planning, one on lexical choice, and one on grammar.
● Readings in Natural Language Processing, ed by B. Grosz, K. Sparck Jones and B. Webber,
Morgan Kaufmann, 1986
❍ A collection of classic papers on Natural Language Processing. Fairly complete at the
time the book came out (1986) but now seriously out of date. Still useful for ATN's, etc.
● Klaus K. Obermeier, Natural Language Processing Technologies in Artificial Intelligence: The
Science and Industry Perspective, Ellis Horwood Ltd, John Wiley & Sons, Chichester,
England, 1989.
Journals
The major journals of the field are
● Computational Linguistics and Cognitive Science for the artificial intelligence aspects,
● Cognition for the psychological aspects,
● Language and Linguistics and Philosophy and Linguistic Inquiry for the linguistic aspects.
● Artificial Intelligence occasionally has papers on natural language processing.
Conferences
The major NLP conferences are
Most AI conferences have an NLP track; AAAI, ECAI, IJCAI and the Cognitive Science Society
conferences are usually interesting for NLP. CUNY is an important psycholinguistic conference. Other
conferences include NELS, the conference of the Chicago Linguistic Society (CLS), WCCFL, LSA,
the Amsterdam Colloquium, and SALT.
❍ semantic and pragmatic analyzer, such as NLL (University of the Saarland, Germany)
Laboratory)
❍ applications programs (misc.)
● If you have developed a piece of software for natural language processing that other
researchers might find useful, you can include it by returning the questionnaire available from
the sources below.
● ftp://ftp.dfki.uni-sb.de/pub/registry
● e-mail: registry@dfki.uni-sb.de
● Natural Language Software Registry
Deutsches Forschungsinstitut fuer Kuenstliche Intelligenz (DFKI)
Stuhlsatzenhausweg 3
D-66123 Saarbruecken
Germany
● Other ftp sites are
ftp://crlftp.nmsu.edu/pub/non-lexical/NL_Software_Registy
ftp://dri.cornell.edu/pub/Natural_Language_Software_Registry
ftp://ftp.cs.jhu.edu/pub/brill/
The FAQ is not meant to discuss any topic exhaustively. It will hopefully provide readers with
pointers on where to find useful information, especially material available on the Internet.
If you have not already read the Usenet introductory material posted to news.announce.newusers,
please do. For help with FTP (file transfer protocol) look for a regular posting of anonymous FTP
FAQ in comp.misc, comp.archives.admin or news.answers.
● Australia: http://www.speech.su.oz.au/comp.speech/
● Britain: http://svr-www.eng.cam.ac.uk/comp.speech/
● Japan: http://www.itl.atr.co.jp/comp.speech/
● USA: http://www.speech.cs.cmu.edu/comp.speech/
● ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/FAQ-complete
● ftp://rtfm.mit.edu/pub/usenet/comp.speech/*
Or by sending email to mail-server@rtfm.mit.edu with the following line in the body of the message:
● send usenet/news.answers/comp-speech-faq/*
If you only have email access to the internet, then I suggest you obtain the Internet-by-email guide.
Send email to mail-server@rtfm.mit.edu with the following line in the body of the message:
● send usenet/news.answers/internet-services/access-via-email
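The mail-server requests above are ordinary email messages whose body holds a single command line. As a minimal sketch, such a message could be composed in Python as follows. The sender address is a placeholder, and actually sending the message (for example through a local mail system) is left out because it depends on your own setup.

```python
from email.message import EmailMessage


def build_request(sender, command):
    """Build a mail-server request whose body is a single command line."""
    msg = EmailMessage()
    msg["From"] = sender  # placeholder address supplied by the caller
    msg["To"] = "mail-server@rtfm.mit.edu"
    msg.set_content(command)
    return msg


msg = build_request(
    "you@example.org",  # hypothetical sender
    "send usenet/news.answers/internet-services/access-via-email",
)
print(msg.get_content())
```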
Maintainer
The FAQ posting and the Comp.Speech WWW Site are maintained on a volunteer basis by
Andrew Hunt
Speech Applications Group, Sun Microsystems Laboratories
Two Elizabeth Drive, Chelmsford, MA, 01824-4195, USA
The FAQ is posted every 4 weeks to comp.speech, comp.answers & news.answers. This reminder is
posted weekly to comp.speech.
The best way to read the comp.speech FAQ is on the World Wide Web:
● Australia: http://www.speech.su.oz.au/comp.speech/
● UK: http://svr-www.eng.cam.ac.uk/comp.speech/
● Japan: http://www.itl.atr.co.jp/comp.speech/
● USA: http://www.speech.cs.cmu.edu/comp.speech/
● ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/FAQ-complete
● ftp://rtfm.mit.edu/pub/usenet/comp.speech/*
Or by sending email to mail-server@rtfm.mit.edu with the following line in the body of the message:
● send usenet/news.answers/comp-speech-faq/*
The FAQ posting and the Comp.Speech WWW Site are maintained on a volunteer basis by
Andrew Hunt
Speech Applications Group, Sun Microsystems Laboratories
Two Elizabeth Drive, Chelmsford, MA, 01824-4195, USA
Ph: (508) 442 2681
andrew.hunt@east.sun.com