Page 525
Index
A
Abbreviations, pronunciation of, 142-143
Acoustic
interactions, 122
inventory elements, 126
models/modeling, 26, 36, 85, 95, 117, 122, 182-183, 476
Kratzenstein's resonators, 78, 80
speech recognition, 64, 182-183
speech synthesis, 137
terminal analog synthesizer, 117
Advanced Research Projects Agency. See also Airline Travel Information System
Benchmark Evaluation summaries, 224-225
common speech corpora, 181-182
continuous speech recognition program, 175-176, 181-182
Human Language Technology Program, 108
research funding, 349
Speech and Natural Language Workshop, 359
Speech Language Understanding Program, 262-263
Spoken Language Systems program, 218-219, 220, 230, 232-233, 250, 254, 255-256, 262-263, 265, 405
Resource Management corpus, 181-182, 184, 185, 188, 376, 377
Wall Street Journal corpus, 184-185, 186, 187
Air traffic control, 365-366
Airline Travel Information System (ATIS), 376
context-dependent utterances, 61
corpus, 61, 184-185, 219, 234, 250, 256, 257-258, 491
degree of difficulty, 383-385
human performance on, 162
language understanding methods, 258, 268
N-best filtering in, 227
and Online Airline Guide, 219
order in problem solving, 229
overview of, 46
spontaneous input, 234
template-based approach, 259
understanding errors, 262
Algorithms
ambiguity-handling, 56
assessment of, 391-392, 405, 409-412
Baum-Welch training (forward-backward), 178-179, 199, 202-207, 489
beam search, 202, 210, 212, 214
databases, 405-409
inside/outside, 263-264, 489-490, 491
intonation contours, 45
large vocabularies, 307
nonlinear interpolation, 97
part-of-speech assignment, 143
probabilistic parsing, 56
prosodic phrase generation, 146, 147, 151
reference resolution, 57
robustness, 391, 392, 405, 412-416
search, 180-181, 189, 199, 202, 208, 209, 264-265
speech recognition, 28, 409-411, 412, 417-418, 431, 468, 469
speech synthesis, 468
standardization, 7
text-to-speech, 25
Viterbi, 173, 180, 199, 202, 208-209, 210, 213
voice coding, 7
Allophone models, 182
American Automobile Association, 354
Analog-to-digital converter, 22-23, 189, 350
Analysis-by-synthesis systems
articulatory data, 125
and automatic learning, 127
bit-rate reduction and, 23
and "break index" data, 148-149
defined, 118
linear predictive coding, 24, 26-27, 119
PSOLA methods, 119-120
source-filter technique, 119
in speech analysis, 26-27
in speech recognition, 30
text-to-speech conversion as, 136
Apple Macintosh, 52
Applications of voice communications. See also Assistive technology for disabled persons; Deployment of applications; Military and government applications; Telecommunications; Telephony
air travel information systems, 46, 85-86, 162
aircraft pilots, 40, 41, 44, 45, 359, 365, 509
assessment criteria, 409-410
automatic teller machines, 86
computer-aided instruction, 151
databases for, 406-408
baggage handlers, 40
consumer electronics
development environment, 400-401
driving instructions, 354
economic impact of, 280
expectations for, 505-506
foreign language learning, 44
hands/eyes-busy tasks and, 39-41
in information society, 506-508
limited keyboard/screen option and, 41-43
medical report generation, 351
motor vehicle navigation, 44
multimodal systems, 63-64
portable computers, 43
reading lessons, 44
real-time support, 403-405
smart interfaces, 508-509
speech interface technology, 347-356
success factors, 289-290
security, 7
speech recognition, 28-29, 30-32, 81, 275-282, 283-284, 318, 377-379, 451, 457, 458, 471, 508-510
stock quotation service, 283, 292, 293, 299, 354, 437, 438, 439
technology trends, 399-405
text-to-speech synthesis, 43, 109, 280, 282, 302, 354, 451
user-friendly, 508-510
video/audio conferencing system, 99-100
and VLSI technology, 40, 54, 510-511
voice input, 39-44
voice output, 44-45
wire installers, 40
Army. See also Military and government applications
Avionic Research and Development Activity (AVRADA), 362
Communications and Electronics Command (CECOM) program, 361-362
Articulatory models, 88, 95, 117, 118, 120, 122, 124-125, 152-153, 461-463, 476
Artificial intelligence, 484
Artificial neural networks, 2, 21, 124, 190, 191-193, 381, 479
Assembler language, 399-400, 401
Assistive technology for disabled persons
assistive listening devices, 315-316
augmentative and alternative communication, 130, 335-337
carpal tunnel syndrome, 43
categories of sensory aids, 316
cochlear implants, 314, 328-331, 332-333
computer-assisted instruction, 336
deaf-blind, 327
direct stimulation of auditory system, 328-331
dysarthric speech, 337
extracochlear implant, 329-330
eyeglass speechreader, 320-322
hearing aids and assistive listening devices, 278, 311, 312, 315-318, 328-331, 332
hearing impaired, 43, 292, 302-304, 312, 314-333
limitations of, 318
mobility control, 312
noise reduction, 331-333
reading machines for blind, 349
research and development efforts, 313-314
sound/speech spectrograph, 319, 325, 349
with speech/language disabilities, 311, 313, 325
speech recognition, 275-279
speech processing for sightless people, 279, 313, 329, 333-335, 349
speechreading cues, 320-321, 325, 327, 328
tactile sensory aids, 314, 324-328
talking books, 333
Telephone Relay Services (TRS), 292, 302-303, 322
teletypewriters, 323
Terminal Device for the Deaf (TDD), 302, 314, 322
use trends, 312-313
visible speech translator, 319-320
visual sensory aids, 319-324
voice control, 278-279, 313, 337
voice output devices, 334, 336
ATR International, 9, 10, 42, 83, 108-109, 115, 119, 128, 130, 176, 513
AT&T
800 Speech Recognition service, 298
articulatory models, 124-125
Directory Assistance Call Completion, 292, 301-302
control of network fraud, 291
government funding, 349
hidden Markov models, 175
Hobbit chip, 511-512
HuMaNet, 454
operator services automation, 291, 292, 293
packet data network [XUNET], 99
speech synthesis technology, 107-108, 112, 124-125
spoken language translation, 9, 10, 130
Telephone Relay Services (TRS), 292, 302-303
telephone speech database, 407
Terminal Device for the Deaf (TDD), 302
text-to-speech system, 348-349
voice dialing system, 300, 383-385
Voice English-Spanish Translation (VEST), 10
Voice Interactive Phone, 292, 300-301
voice processing vision, 285-286
Voice Prompter, 292
Voice Recognition Call Processing (VRCP), 292, 293-295, 383-385
voice response systems market, 281, 282
Who's Calling service, 282
wordspotting techniques, 305
Auditory modeling, 24, 26, 91, 92, 94, 97
B
Bandwidth compression, 81
Basilar membrane filtering, 97
Bell, Alexander Graham, 77-78
Bell Atlantic, 291
Bell Mobility (BM), 302
Bell Northern Research (BNR), 176, 282, 283, 292, 293, 294, 295, 299, 383-386, 437, 438, 439
Bell System, 6
Bellcore, 291-293
Bigram models, 201, 209, 211, 213, 214, 222
Bit rates
and image processing, 101
speech coding and, 23, 24, 81, 83-84
text-to-speech synthesis, 29, 77
Bolt, Beranek, and Newman (BBN) Systems and Technologies
Delphi system, 259
directory service, 438
hidden Markov models, 175
N-best filtering and rescoring, 267
word lattice parsing, 265
C
C cross compiler, 399-400
Cambridge University, 176
Carnegie-Mellon University (CMU). See also Airline Travel Information System
ATIS, 46, 261
dialogue state information, 229
HMM applications in speech recognition, 175
recursive transition networks, 222
spoken language translation, 9
Cepstrum techniques, 28, 86, 178, 182-183, 476
Classification and decision tree techniques, 152
Classification and regression tree techniques, 147
CNET, 130
COCOSDA group, 130
Coding. See Linear predictive coding; Music coding; Speech coding
Compact disc technology, 334
Compression
bandwidth, 81
image, 99
two-channel amplification, 332
Computation
models of language, 78, 81, 86, 90-91
of pronunciation, 139
research needs, 30
speech recognition systems, 30
speech synthesis, 137
teraflop capability, 97
Viterbi algorithm, 173
Computer Speech and Language, 2-3
Consonants
alveolar flapped, 142
modeling, 123
Consortium for Lexical Research, 241
Context-oriented clustering, 126
Corpora
Airline Travel Information System, 61, 184-185, 219, 250, 256, 257-258, 491
common speech, 181-182
connected digit, 184-185
IBM/Lancaster Treebank, 495
large linguistic, 447
Resource Management, 181-182, 184, 185, 188, 376, 377
optimization, 113
telephone speech, 408-409
Wall Street Journal, 184-185, 186, 187
Creak (vocal), 122
CRIM, 176
Cross-word effects, 182
CSELT, 176
CSTR, 130
Currency, pronunciation of, 143
D
Databases. See also Corpora
algorithms, 405-409
for applications, 406-408
dialect considerations, 409
large tagged, 152
natural language interfaces, 240
NTIMIT, 409
Official Airline Guide, 46, 219
relational, 53-54
for research, 405-406
remote access to, 42, 44, 278, 296-299, 348, 349, 351
retrieval system, product quality, 57
simulated telephone lines, 408-409
speech, 387, 405, 407, 468, 472
StockTalk, 383-386, 437, 438, 439
WordNet, 499
DEC, 130
Decision criteria, 305
Defense Advanced Research Projects Agency. See Advanced Research Projects Agency
Deployment of applications
degree of difficulty and, 375-386
hardware considerations, 381, 382-383
language understanding task dimensions and, 379-381
military technology transfer, 367-369
obstacles to, 374-375
procedure for, 386-388
speech recognition task dimension and, 377-379
speech synthesis task dimensions and, 381-382
system integration requirements, 383
technical challenges in, 280-281
Desert Storm, 360
Dialogue
clarification/confirmation, 56, 62-63
continuous speech, 431-432
convergence of styles, 60
conversational dynamics, 431-432
engineering constraints, 387-388
feedback and confirmation, 437-438
finite state transition network, 63, 85
flow, 435-436
interaction and, 61-63
models, 62-63
natural language, 17, 56, 61-63
quantity of text and, 381
real-time processing function, 403-404
robustness of, 66
speech recognition, 63
spoken language systems, 47, 60, 61-63, 66, 229
talk-over, 431
task-specific voice control, 452
transcript, 433-434
Dictation devices, automatic, 50, 77, 81, 426, 428, 437-438. See also Text, typewriters
Digital
encryption, 83
filtering, 19
telephone answering machines, 7-8
Digital computers. See also Digital signal processors
and speech signal processing, 19, 78, 81, 189, 393-396
and microelectronics, 19-21, 81
Digital-to-analog converter, 23, 398
Digital signal processors/processing
development environment, 399-405
distributed control of, 404-405
integer, 383
for LSP synthesis, 398
mechanisms, 393
microphone arrays, 97
technology status, 393-396
transputer architecture, 396, 397
workstation requirements, 189
Digitizing pens, 52
Diplophonia, 122
Discourse
natural language processing, 246
and prosodic marking, 149-151
in spoken language systems, 227-230
in text-to-speech systems, 145
Dragon Systems, Inc., 176, 380, 401, 402
Dynamic grammar networks, 265-266
Dynamic time warping (DTW), 28
E
Electronic mail (e-mail), 8, 306, 381
ESPRIT/Polyglot project, 123, 129, 130, 406
Etymology
proper name estimates, 92
trigram statistics, 141
Experiments
capabilities, 32
real-time, 32
research cycle, 183-184
Extralinguistic sounds, 122
F
Fast Fourier Transform (FFT), 28, 84, 475
FAX machines, 5
Feature
extraction, 177-178
delta, 182-183
vectors, 182-183
Federal Aviation Administration, 365-366, 509
Federal Bureau of Investigation, 367
Fiber optics, 6
Filters/filtering
basilar membrane, 97
digital, 19
high-pass, 332
language understanding component for, 22
linear time-varying, 477-478
transverse, 415
Flex-Word, 292
Fluid dynamics, principles in speech production, 87-90
Foreign language. See also Multilingual systems; Spoken language translation
learning, 44
word incorporation in text-to-speech systems, 138
Frequency-domain representation, 24, 476
G
Gestural inputs, 65
Government. See Military and government applications
Grammars
ambiguity, 380
bigram, 179
combinatory categorial, 490
context-free, 264, 461, 490, 491-494
covering, 493
dynamic grammar networks, 265-266
feature-value structures in, 264
formalisms, 490
hand-coded linguistic, 483
lexicalized, 490
lexicalized tree-adjoining, 490
Markov, 179-180
natural language understanding and, 37-38, 264, 380, 491-494
perplexity, 180, 185, 229, 378
probabilistic context-free, 491-494
size, 37-38
speech analysis and, 28, 36-38
speech recognition, 36-37, 41-42, 63, 81, 85-86, 179-180, 185-186, 265-266
training speech, 179-180, 185-186
unification, 461
Graphical user-interface. See also User interfaces
growth of, 108
guidelines, 66-67
hierarchical menu structure, 54
speech compared with, 54-55
strengths, 52-53
H
Handwriting
recognition, 402-403
screen-based channel, 64
Hardware technology. See also Digital signal processors; Microcomputers; Personal computers; Workstations
advances in, 391
Hobbit chip, 511-512
Intel x86 series, 392
microprocessors, 383, 391, 392-393, 396
Motorola 68000 series, 392
speech-processing equipment and systems, 383, 396-405, 510-511
V810 multimedia processing chip, 511-512
Health Interview Survey on Assistive Devices, 312
Hidden Markov models (HMM)
defined, 171-173
estimation of statistical parameters of, 199, 202-208
feature extraction, 177-178
fenonic case, 207
grammar-state-transition table, 266
limitations of, 189-190
and mel-frequency cepstral coefficients, 178
neural nets combined with, 193-194
part-of-speech tagging, 487-488, 490
phonetic, 166, 173-175, 178-179, 182, 188
and semantics, 221
speaker recognition systems, 30, 85
speech recognition, 28, 30, 85, 170-175, 177-178, 199, 200-208, 377, 394, 396, 397, 478-479
speech variability and, 28, 415-416
and talker verification, 86
three-state, 172
theory development, 175
training and analysis, 30, 178-179, 181-182, 478-479
trellis representation, 203, 208, 212
trigram, 201-202, 212, 213-214
unigram, 210
Viterbi algorithm and, 210
word models, 179
wordspotting, 397
Human-human communication
conversational dynamics, 431-432
language imitation, 60
repair rates, 260
studies, 50-51
I
Image compression, 99
Information processing
speech technologies, 453
Information retrieval, 54-55, 57
INFOVOX, 130
Institute for Defense Analyses, 175, 234-235
Institute for Perception Research, 127
INTELLECT, 57
Integrated Services Digital Network (ISDN), 84
Interaction. See also User interfaces
acoustic, 122
and dialogue, 61-63
failures, cost of, 426-427
large-vocabulary conversational, 101-102
speech recognition, 36
spoken language systems, 51-57, 60, 61-63
system requirements for voice communications applications, 383
Intonation
contours, 45
cues, 432
parts-of-speech distinctions and, 151
structures, 129
models, 127
J
Joysticks, 52
K
Karlsruhe University, 83
Kratzenstein's acoustic resonators, 78, 80
Kurzweil Applied Intelligence, 380
L
Language
acquisition, theory of, 2
imitation, 60
processing, 239; see also Natural language processing
variability, 380
Language modeling. See also Natural language
bigram, 201, 209, 211, 213, 214, 222, 461
computational, 78, 81, 86, 90-91
etymology estimates for proper names, 92
future of, 307
speech recognition, 29, 81-82, 90-91, 168-169, 183, 263, 307
speech synthesis, 128
statistical, 263-264, 461, 472-473
trigram, 92, 183, 209-210, 212, 213-214, 461
by users, 60
Laryngalization, 122
Law enforcement, 367
Lexicons, 138, 140, 141-142, 178-179, 188, 296, 499
LIMSI, 176
Linear predictive coding
analysis by synthesis, 24, 26-27, 119
mapping code book, 128
code-excited (CELP), 24, 26, 83, 101
mixed-excitation (MELP), 24
multipulse excited (MPLPC), 24, 26
pitch-excited, 24
robustness of, 97
and speech analysis, 575
Linguistic analysis, 59-60, 259, 263, 382, 461, 484
Linguistic Data Consortium, 181, 241, 252
Linguistics. See also Parsing; Semantics; Syntax
after-thoughts, 256
consonant cluster, 138
discourse-level effects, 149-151
English lexical stress system, 141-142
letter-to-sound relationships, 138, 140-141
morphophonemics, 141-142
orthographic conventions, 142-143
parts-of-speech assignment, 143, 151
prosodic marking, 145-149
spontaneous speech, 255-258
vocalic suffixes, 139
word-level analysis, 138-139
M
Machine translation, 240
Masking, time and frequency, 84, 93-94, 177-178
Massachusetts Institute of Technology
articulatory models, 124
HMMs for speech recognition, 176
MITalk, 123
multilingual synthesis, 130
speech synthesis, 111, 123, 124
TINA language understanding system, 222, 223, 259
Matsushita, 130
MCI, 300
Mel-frequency cepstral coefficients (MFCC), 178, 182-183
Microcomputers. See also Personal computers
device density, 20
digital signal processing, 19
projected advances in, 102-103
speech processing and, 19-20, 81, 396-399
Microelectronics
chip densities, 102
digital computation and, 19-21
research, 21
revolution, 108
speech signal processing, 19-20
Microphones
autodirective arrays, 86-89, 96, 97, 99-100, 102
beamforming systems, 87, 88, 99
characteristics, 414
digital signal processors, 97
environmental variation in speech input, 412-413, 460
noise reduction, 331-332, 414-415
reflection and reverberation, 414
speaker distance from, 414
and speech recognition, 379, 414
technology projections, 102
three-dimensional, 96, 97, 99-100
Microsoft Windows, 52
Military and government applications. See also Advanced Research Projects Agency; other government agencies
Agent's Computer, 367
air traffic control, 365-366
aircraft carrier flight deck control and information management, 363
combat team tactical training, 364-365, 366
Command and Control on the Move (C2OTM), 360-361
law enforcement, 367
Multi-Role Fighter, 365
Navy, 363-365
Pilot's Associate system, 365
Soldier's Computer, 360, 361-362, 367
SONAR supervisor command and control, 363-364
technology transfer issues, 367-369
Mixed-mode communication. See Multimodal systems
Models/modeling. See also Hidden Markov models; Language modeling
acoustic, 26, 36, 64, 85, 95, 117, 122, 182-183, 476
allophone, 182
articulation, 88, 95, 117, 118, 120, 122, 124-125, 152-153
auditory, 24, 26, 91, 92, 94, 97
bigram, 201, 209, 211, 213, 214
computational, 78, 81, 86, 90-91
consonants, 123
cross-word effects, 182
dialogue, 62-63
intonation, 127
Klatt, 123
left-to-right, 175
natural language understanding, 238-253, 262-264
noise excitation, 122
phonetic, 173-174, 190-191, 193
prosody, 117
segmental, 125, 173-174, 190-191, 193
sinusoidal, 24
sound source, 462
source/system, 22, 118, 120-122
speech perception, 26
speech production, 22
speech recognition requirements, 168-169
speech synthesis, 109, 116-130
speech variability, 176
spoken language systems, 48
stochastic segment, 190-191
trigram, 201-202
vocal tract, 95, 118, 122, 124, 125
wave propagation, 26
Modulation theory, 26
Morphology, speech synthesis, 110, 111, 112, 113, 137, 141-142, 489
Multilingual systems. See also Foreign language; Spoken language translation; Telephony
future of, 513-514
INTERTALKER, 513-514
Japanese kana-kanji preprocessor, 403
MITalk, 130
PIVOT, 512-513
speech synthesis, 42, 101, 117, 129-130, 151-152
Multimodal systems. See also User interfaces
advantages of, 426
error avoidance, 64
error correction, 64
HuMaNet, 454
referent determination difficulties, 61
robustness, 64
situational and user variation, 64-65
synergistic integration of sensory modalities, 100-101, 102
user interfaces, 32, 56, 63-65
Multiprocessing, 21
Music coding, 84
N
N-Best interface, 217, 221, 226, 233
National Institute of Standards and Technology, 377
Natural language. See also Speech recognition; Spoken language
anaphora, 55
modeling, 128; see also Language modeling
and speech recognition systems, 17, 262-267, 388
and spoken-language systems, 59-61
typed, 57
Natural language processing
ambiguity-handling algorithms, 56
applications, 240, 241, 250-253
clarification/confirmation subdialog, 56, 62-63
components of, 243-250
constraints on, 17, 59-61, 262-268, 388, 482-484, 491
cooperating process view of, 248-250
database interfaces, 240
domain model extraction, 250
history of, 240-241
ideal systems, 55-56
inputs to, 241-243
integration architecture, 265-267
machine translation, 240
menu-based system, 56
outputs, 243
parsers, 59, 247, 483, 489-495
portability of systems, 252
problems, 241-243
product quality database retrieval system, 57
prosodic information in, 268-269
reasoning, 246-247
reference resolution algorithms, 57
research directions, 460-461, 500-502
response planning and generation, 246-247
rule-based, 482-484
semantics, 245-246, 247, 250, 486, 495-500
simplified systems, 247-248
speech processing and, 460-461
state of the art, 252-253
statistical techniques, 484
training [learning], 56, 57, 58, 249, 250, 252
verbal repair detection, 269
weaknesses, 56-57
Natural language understanding. See also Linguistics; Speech recognition; Spoken language understanding
accuracy/error rates, 47, 251, 252, 255, 261, 262, 388
applications, 379-381
architecture, 485-487
background, 238-239
current capabilities, 10, 506
defined, 239
grammar, 37-38, 263, 380, 491-494
language variability and, 380
off-the-subject input and, 287, 380, 388
part-of-speech tagging, 487-489
preprocessing and, 489
search process, 248-249
speech constraints in, 268-269
stochastic parsing, 489-495
task difficulty and, 379-381
TINA system, 222
vocabulary size and, 37-38
unknown words, 488-489
Naval. See also Military and government applications
Air Technical Training Center (Orlando), 363-364
Combat Team Tactical training, 366
Ocean Systems, 363
Personnel Research and Development Center, 364-365
Resource Management task, 376
Underwater Systems Center, 363
Navier-Stokes equation, 89
Neural nets. See Artificial neural networks
Neural transduction, 97
New Mexico State University, 241
Nippon Electric Corporation (NEC), 9, 10, 42, 82, 176, 383, 506, 507-511
Nippon Telegraph and Telephone (NTT)
analysis-synthesis systems, 119
ANSER (Automatic Answer Network System for Electrical Requests), 283, 291, 292, 296-297, 398-399, 407-409, 410, 417
concatenative synthesis, 126
HMM applications, 176
systematic optimization techniques, 115
telephone speech database, 407
Noise
additive, 459
and algorithm robustness, 413
excitation, 122
immunity, 305
reduction technology, 331-333, 414-415
sources, 122
and speaker variation, 415-416
and speech recognition, 288, 305, 379, 388, 414-415, 469, 473-474
white, 122
Northern Telecom, 278, 291, 295, 299
Numbers, pronunciation of, 143, 288
NYNEX, 282, 283, 291, 292, 300, 301-302, 407, 409, 436
O
Occam parallel programming language, 396
Octel, 281
Official Airline Guide database, 46, 219
Olive, Joseph, 107
Operating systems
speech, 417
Optical character recognition technology, 43, 349
Oregon Graduate Institute, 407
P
Packet data network (XUNET), 99
Paget, Richard, 15-16
Palantype keyboard, 335
Parallel processing, 89, 383, 400
Parsing/parsers
ambiguous, 147-148
crossing brackets, 491
natural language, 59, 247, 483, 489-495
probabilistic, 56
and prosodic marking, 56, 144, 146-147
in speech synthesis, 137, 139, 144-145
stochastic, 489-495
of unrestricted text, 144
word lattice, 265
Pause insertion strategies, 129
Performance structures, 146
Personal Communication Devices, 306
Personal Communication Networks, 306
Personal Communication Services, 306
Personal computers
hand-held, 355
portable, 64-65
speech interfaces for, 511
Phoneme
conversion to acoustic events, 429
intelligibility, 411
recognition systems, 182
Phonetics
hidden Markov models, 166, 173-175, 178-179, 182, 188
segmental models, 125, 173-174, 190-191, 193
and speech recognition, 167, 169-170, 188
in training speech, 30, 178-179, 182-183
text-to-speech synthesis, 85, 125, 174
typewriter, 511
Pierce, John, 283
Pitch-synchronous overlap-add approach (PSOLA), 114, 119-120, 128-129
Pitch-synchronous analysis, 127
Pragmatic structure, 144, 150, 246, 250
Pronunciation
abbreviations and symbols, 142-143
computational, 139
numbers and currency, 143, 288
part of speech and, 144
speech recognition, 44
surnames/proper names, 140-141, 288
symbols, 142-143
Proper names, 92, 288, 458, 484
Prosodic phenomena
articulation as a basis for, 152-153
and conversational dynamics, 431
discourse-level effects, 149-151
modeling, 117
multiword compounds, 147
in natural language processing, 268-269
pauses, 431
PSOLA technique for modifying, 128-129
in speech synthesis, 88, 117, 119, 124-125, 128-129, 145-149, 288-289
and speech quality, 88, 118, 288-289
Psychoacoustic behavior, 78, 91, 94
Pulse Code Modulation, adaptive differential (ADPCM), 24, 82-83, 101
Q
Quasi-frequency analysis, 177
Query language, artificial, 57
R
Recursive transition networks (RTNs), 222
Repeaters, electromechanical, 81
Research methodology, spoken language vs. typed language, 47-48
Robust processing techniques, 259-260, 263
Robustness
algorithms, 391, 392, 405, 412-416
ATIS system, 262
case frames and, 258
classification of factors in, 412-413
dialogue systems, 66
environmental variation in speech input and, 412-414
lexical stress system, 142
linear predictive coding, 97
multimodal systems, 64
natural language systems, 56, 59, 262
noise considerations, 413
research, 417-418
speaker variation and, 415-416
speech analysis, 97
speech recognition systems, 29-30, 44, 184, 261-262, 459-460
speech synthesis, 139
speech variation and, 413
spoken-language understanding systems, 66, 258-259
templates and, 258-259
user interfaces, 56
word error rates, 182-183, 184, 185-186
Royal Institute of Technology (KTH), 122, 123, 124, 125, 129
Rutgers University, CAIP Center, 98, 99
S
Security applications
speaker verification, 9, 30, 86, 300, 305
low bit-rate coding for transmission, 7
Semantics
ambiguity, 380
compositional, 486
First-Order Predicate Logic, 245-246
natural language, 245-246, 247, 250
pragmatics and, 144
propositional logic, 245
and speech recognition, 305-306
and spoken-language understanding, 220-221
Sensimetrics Corporation, 123
Signal modeling techniques, 19, 101
Signal processing
enhancement, 102
research, 21
Sinusoidal models, 24
Software technologies, 391
Sound
source model, 462
Sound Pattern of English, 126
Sound/speech spectrograph, 319, 325, 349
Source-filter decomposition, 128
Speak 'N Spell, 110
Speaker
atypical, 187-188
dependence, 36
recognition/identification, 9, 30, 85, 348
variation, 415-416
verification, 9, 30, 86, 300, 305
Speaking characteristics and styles, 128-129, 378-379
Spectrum analysis, 19
Speech
behaviors, conversational, 430-432
casual informal conversational, 82
connected, 97
continuous, 36, 78, 95, 323, 427-428, 430-431
dialect, 409
dysarthric speech, 337
gender differences, 129
information processing technologies, 453
interactive, 36
knowledge about, 117
machine-generated, 335
noninteractive, 48
pause insertion strategies, 129
perception models, 26
preprocessor, 403
production, 21-22, 26, 77, 87-90, 137-138
prolongation of sounds, 322
psychological and physiological research, 462
signal processing systems, 19
slips of the tongue, 257
spontaneous, 58-59, 185, 255-260, 303, 460, 461, 469-471
standard model of, 267
synthetic, 428; see also Speech synthesis; Speech synthesizers
toll quality, 23, 24
type, 36
ungrammatical, 257
variability, 28, 176, 378, 413, 459-460, 480
Speech analysis
acoustic modeling, 26
analysis-by-synthesis method, 26-27
auditory modeling, 26
defined, 22
dimensions, 36-38
importance, 21
interactivity, 36
language modeling, 26
linear predictive coding, 24
robustness, 97
speech continuity, 36
speech type, 36
vocabulary and grammar, 28, 36-38
vocal tract representation in, 90, 91
Speech coding, 26
applications, 82-83
articulatory-model-based, 125
audio perception factors in, 84, 85
in cochlear implants, 331
concatenation using speech waveforms, 117
bit rates and, 23, 24, 81, 83-84
predictive, 117
psychoacoustic factors in, 101
research challenges in, 76
rule-based diphone system, 118
stereo coding, 84-85
terminal analog, 118
wideband audio signals, 84
Speech processing
articulatory and perceptual constraints in, 461-463
equipment and systems, 19-20, 81, 396-399
evaluation methods, 463-464
in hearing aids, 317
and natural language processing, 460-461
obstacles to, 373
research challenges, 76-77
psychoacoustic behavior and, 94
for sightless people, 333-335
and speech technology development, 76, 78
Speech recognition
accuracy, 28, 37, 41, 46-47, 86, 159, 181-189, 377, 378, 470, 473
acoustic modeling, 64, 182-183
adverse conditions, 459-460
algorithms, 28, 409-411, 412, 417-418, 469
alternative models, 189-193
analysis-by-synthesis, 30
applications, 28-29, 30-32, 81, 275-282, 283-284, 318, 377-379, 451, 457, 458, 471, 508-510
articulation and, 152-153
assessment techniques, 410-411, 463-464
"barge in" (interruption of conversation) and, 277, 287, 292, 295, 298-299, 388, 404
common speech corpora, 181-182
complexity, 17
connected digit corpus, 184-185
continuous speech, 78, 165-194, 323, 471, 506
decision criteria, 305
decoding, 209-214
dialogue grammar approach models, 63
dimensions of task difficulty, 376, 377-379
domain independent (DI), 187
dynamic grammar networks, 265-266
dynamic programming matching, 509
environmental factors, 413-414
error correction, 64, 261-262, 388
feature extraction, 177-178, 180
Flexible Vocabulary Recognition, 295
generalization, 479
hidden Markov models and, 28, 30, 85, 170-175, 177-178, 199, 200-208, 377, 397, 478
historical overview, 175-176
improvements in performance, 181-184, 388
interactivity, 36
language modeling, 29, 81-82, 90-91, 168-169, 183, 263
large-vocabulary systems, 183, 193, 277, 292, 506
lip reading, 64
linguistic rules, 82
market for technology, 350-351, 416-417
most likely path, 208-209
most likely word sequence, 209-214
N-best filtering or rescoring, 267
natural language and, 17, 262-267, 388
neural networks, 191-193
new words, 188-189
noise immunity and channel equalization, 288, 305, 379, 388, 414-415, 469, 473
normalization of speakers in, 30, 456-457, 459, 460
pattern matching, 474, 478-479
perplexity of language model and, 37, 180, 185, 229, 378, 463
phonetics and, 167, 169-170, 188, 410
processes, 167-168, 180-181, 199, 451, 453-454, 473-474
pronunciation and, 44
prototype systems, 34
real-time, 189
rejection of irrelevant input, 287, 388
and repetitive stress injuries, 43
research challenges, 29-30, 44, 76, 108, 183-184, 304-306, 417-418
robustness, 29-30, 44, 184, 261-262, 459-460, 473, 474
sample performance figures, 184-185
search algorithms, 180-181, 248, 264-265
segmental models, 190-191, 473-474
sheep and goats phenomenon, 456
speaker-adaptive, 36, 187-188, 288, 388, 479
speaking characteristics and styles and, 128, 377, 378-379, 415-416, 460
speaker-dependent, 28, 36, 54, 186-187, 292, 509-510
speaker expertise and, 378
speaker-independent, 28, 36, 37, 46, 184, 186-187, 188, 362-363, 378, 397, 425, 433-434, 506, 507
spontaneous speech and, 58-59, 185, 460, 461, 469, 471
SR-1000 system, 507
SR-3200 system, 507
subword units, 287-288, 299, 388
successful systems, 239
system structure, 27-28, 398, 401, 402
talker verification, 86
task completion rate, 410
technology status, 8-9, 18, 81, 85-86, 112-113, 159-164, 165-166, 181-189, 286-288, 428, 468
terminal-type, 508-510
training data, 178-180, 185-186, 457, 459, 473, 478-479
transputer-based, 397
trials, 417
units of speech and, 168-170
user tolerance of errors and, 379
vocabulary and grammar and, 36-37, 41-42, 81, 85-86, 185-186, 265-266, 277, 378, 457
Wizard of Oz assessment technique, 410-411, 439
word lattice parsing, 265
wordspotting, 286-287, 292, 295, 298-299, 305, 387, 388, 397, 404
Speech research
computational models of language, 90-91
critical directions in, 87-101
historical background, 78-82
language modeling, 26
physics of speech generation, 87-90
unification of coding, synthesis, and recognition, 94-95, 97
Speech synthesis. See also Text-to-speech synthesis
acoustic models, 85, 95, 117, 122, 476
analysis-synthesis systems, 117, 118, 119, 125
applications, 30-32, 108, 109, 110, 278, 381-382
articulatory models, 88, 117, 118, 120, 124-125, 152-153, 476, 480
assessment of, 411-412
automatic learning, 127
concatenative, 110, 114, 117, 118-119, 126, 168, 406
concept-to-speech systems, 38-39
content, 45
corpus-based optimization, 113
defined, 22, 109, 110, 116, 348
digitized speech, 22-23, 25, 38
dimensions of task difficulty, 381-382
discourse-level effects, 149-151
error rates, 112
evaluation of, 130
expectations of listeners, 382
flexibility needs, 117-118
fluid dynamics in, 89-90
formant-based terminal analog, 117, 118, 122-123, 125
forms, 38-39
frequency domain approach, 119
higher-level parameters, 123-124
history of development, 111-115
individual voices, speaking styles, and accents and, 117-118
input, 109
Page 543
Speech synthesis (cont'd)
letter-to-sound rules, 140-141
linguistic aspects of, 135-153
market for, 351
microelectronics revolution and, 108
morphophonemics and lexical stress, 110, 111, 112, 113, 137, 141-142
multilingual, 42, 101, 117, 129-130, 151-152
natural speech coding and, 117, 128
naturalness, 129, 149, 381, 429, 456
noise sources, 122
and objective distortion metrics, 114-115
obstacles to, 117
orthographic conventions, 142-143
output, 118
part-of-speech assignment, 143
phonetic HMM functions and, 174, 429
predictive coding, 117
process, 135, 167-168, 428-429, 453, 454, 479
prosody, 88, 117, 119, 124-125, 128-129, 145-149, 288-289
PSOLA (pitch-synchronous overlap-add approach), 114, 119-120, 128-129
quantity of text and, 381
real-time, 108
research, 25-26, 29-30, 44-45, 76, 108, 113-114, 128
rule-based, 111, 118, 125, 126-127, 140-145, 429
segmental, 113-114, 115, 125, 145, 479-480
sentence length and grammatical complexity, 382
sound generation, 118
source/system models, 22, 118, 120-121
speech quality, 130
structures and processes, 109-110
systematic optimization methods, 114
techniques, 118
technology status, 18, 29, 81, 85-86, 107-115, 411-412, 468
testing, 114-115
time functions, 111, 113, 118, 119, 476-478
variability of text and, 381-382
vocabulary, 119
vocal tract model, 95, 118, 122, 125
waveform concatenation (simple), 118-119, 383, 476
word-level analysis, 138-139
Speech synthesizers
acoustic terminal analog, 117
cartridge-type, 510
cascade, 122-123
future, 455-456
large-vocabulary, 349
neural network controller, 124
OVE, 123
terminal analog, 510
voice quality, 456
Speech technology. See also Deployment of applications
capabilities and limitations, 427-430
commercial developments, 352-354
foundations, 77-78
growth of, 2
information processing, 453
readiness evaluation, 440
service trials, 417
status, 82-87
trends, 117
Page 544
Speech technology (cont'd)
voice input, 427-428
voice output, 428-429
Speech Technology Laboratory, 123
Speech transmission, low-bit-rate, 23, 24, 29, 77, 81, 83-84, 97, 474
Speech understanding, 17, 34, 37-38, 307, 379
Spoken language systems (SLS)
ARPA, 218-220
comparison of modalities, 46-58
constraints on, 227-230
dialogue, 47, 60, 61-63, 66, 229
discourse in, 227-230
efficiency of language-based modalities, 48-51
error recovery, 439
human factors obstacles to, 58-63
interfacing speech and language, 221-224
linguistic analysis, 59-60, 259
mixed initiative, 228-229
N-best interface, 217, 221, 233
natural language, 51-57, 59-61
order in problem solving, 229
reference, 227-228
simulation methods, 66
speaker-independent, 65
spontaneous speech and, 58-59, 234, 255-260, 427-428
SUNDIAL, 228-229
research methodology, 47-48
technology development, 81
typed language contrasted with, 47-51, 60
user adaptation to, 60
Spoken language translation
current capabilities, 9-10, 42
defined, 9-10
directory assistance, 295-296
laboratory systems, 9-10
projections, 102
VEST (Voice English-Spanish Translator), 10, 42
voice output, 29
Spoken language understanding, 47
approaches to, 220-221
defined, 255
error repair, 260
limits on, 379
progress in, 224-226
spontaneous speech and, 258-260
Sprint, 300
SQL, 57
SRI International, 52, 176, 213
Stereo coding, 84-85
StockTalk, 383-386, 437, 438, 439
Stored voice, 110
SUNDIAL spoken language systems, 229
Surnames, pronunciation of, 140-141, 288
Symbols, pronunciation of, 142-143
Symbolic learning techniques, 501
Syntax, 137. See also Parsing
natural language processing system, 244-245, 247, 269
speech recognition systems, 305-306
and spoken language understanding, 220-221
Syntactico-semantic theory, 447
System technologies. See Hardware technology; Workstations
T
Tactile technology, 101, 324-328
Talker. See Speaker
Page 545
Technology transfer issues, 367-369
Telecommunications. See also Telephony
Baudot code, 323
conferencing, 101
cost-reduction applications, 290-291
digital speech coding, 82-83
information access from remote databases, 42, 44, 278, 296-299, 348, 349
interfaces, 397
market for speech technology, 290-304
personal communication networks and services, 306
predictions, 307-308
revenue opportunities in, 291-293
shaping user language, 60-61
speaker verification, 305
speech technology and, 7, 41-42, 285-286
technical challenges, 304-306
Telegraph, 80-81
Telephony. See also Telecommunications
Automated Alternate Billing Services, 292, 293, 431
Automated Customer Name and Address, 302
automatic interpreting, 513-514
bandwidth conservation, 19
banking by phone, 283, 291, 398-399, 407-408, 425
cellular, 6, 7, 81, 83, 374, 383-385, 507-508
digital channels, 101
directory assistance, 41, 278, 282, 283, 291, 292, 295-296, 301-302, 355-356, 438, 458
history, 81
language translation, 10, 42, 77, 81, 82, 83, 108-109, 513-514
operator services, 8-9, 277, 282, 284, 291, 292, 293-296, 351, 353-354, 374, 380, 383-385, 387
simulated telephone lines, 278, 408-409
speech databases, 407
speech recognition technology, 428
teleconferencing, 454-455
telephone relay service, 302-304, 322
voice-controlled automated attendant, 356
voice-based dialers, 40, 292, 299-300, 355, 374, 376, 383-386, 436, 507-508
voice-interactive phone service, 292, 300-301, 351
Voice Recognition Call Processing (VRCP), 292, 293-295, 376, 383-385
Telephone answering machines, digital, 7-8
Texas Instruments (TI), 110, 176, 184-185, 291, 300, 349, 377, 407
Text-to-speech synthesis. See also Speech analysis
acoustic phonetics and, 85
address, date, and number processing, 288
advances in, 288-289
algorithms, 25
applications, 43, 109, 280, 282, 302, 354, 451
articulatory synthesis in, 124-125
cartridge-type device, 510
components of, 38
constraints on speech production, 137-137
development tools, 126-127
discourse analysis in, 145
Page 546
Text-to-speech synthesis (cont'd)
error rate, 262
formant-based terminal analog, 122-123
hardware requirement, 383
language modeling and, 26, 78, 90-91
linguistic analysis in, 382
multilingual, 42, 129, 397-398
naturalness, 381
output, 29
parsing, 144-145
part-of-speech assignment, 143
phonemic-based, 348
phonetic factors, 125
proper name pronunciation, 288
research challenges, 26, 304, 306, 324
rule system, 125
sound generation, 124
source models and, 120
speaker identity and normalization, 30
speaking characteristics and styles and, 128-129
structural framework, 136-137, 398
waveform approach, 24-25
word-level analysis, 138-139
Text preprocessors, 381-382
Time Assignment Speech Interpolation, 81
Tools. See Computer-aided tools
Touch screens, 50
Touch-Tone keypad, 335
Trackballs, 52
Training
natural language interactive systems, 56, 57, 58
neural nets, 193
shaping user language, 60-61
speech, 322
tactical, combat team, 364-365
Training speech [learning]
automatic, 263-264
databases for, 387, 405, 407, 468, 472
discriminative, 479
natural language processing, 56, 57, 58, 249, 250, 252, 263-264
phonetic HMMs and lexicon, 30, 178-179, 182-183
speech recognition, 178-180, 185-186, 457, 459, 473, 478-479
syntactico-semantic theory and, 447
Transatlantic radio telephone, 81
Transatlantic telegraph cables, 81
Transform coders, 24
Treebank Project, 241, 491, 495
Trigrams, 92, 183, 201-202, 209-210, 212, 213-214, 229
Triphones, 182
Turing's test, 35
Tuttle, Jerry O., 363
U
United Kingdom, Defense Research Agency, 365
University of Indiana, 130
University of Pennsylvania, 181, 241, 252, 491, 495
US West, 300-301
Usability/usefulness. See also Applications of voice communications
determinants of, 31-32
pronunciation and, 44
voice input, 39-44
voice output, 44-45
User interfaces. See also Graphical user-interface
artificial query language, 57
capabilities and limitations, 51-52, 387, 427-430, 434
cost of interaction failures, 426-427
Page 547
User interfaces (cont'd)
design strategies, 387, 423-424, 426, 433-440
dialogue flow, 435-436
direct manipulation, 51, 52-55, 57-58
error recovery, 438-440
evaluation of, 440
feedback and confirmation, 434, 437-438, 445
hierarchical, 454
information requirements of, 425-426
instructions, 438
keyboard dialogs, 49-50
metaphor, 54
multimodal systems, 32, 56, 63-65, 505, 508-510
natural language interaction, 55-57
personal computer, 511-512
research directions, 56, 511-512
revisions suggested, 435
robustness, 56
smart, 512-513
system capabilities, 429-430
task modalities, 426
task requirement considerations, 424-427
telecommunications, 397
training issues, 58
user expectations and expertise and, 430-432
voice-actuated, 360
voice input, 427-428
Users
conversational speech behaviors, 430-432
expectations and expertise, 430-432
language modeling by, 60
novices vs. experts, 432
satisfaction, 429-430
tolerance of speech recognition errors, 379
USS Ranger, 363
V
Vector quantization, 28
Verbal repair, 269
Videophones, 5-6
Virtual reality technology, 454-455
Visual sensory aids, 319-324
Vocabulary
algorithms, 307
confusability, 378
conversational, 101-102
Flexible Vocabulary Recognition, 295
large, 101-102, 183, 193, 277, 292, 307, 349, 351, 506
and natural language understanding, 37-38
operator services, 277
speech analysis and, 28, 36-38
speech recognition and, 36-37, 41-42, 81, 85-86, 183, 185-186, 193, 265-266, 277, 292, 378, 457, 506
speech synthesis, 101-102, 119, 349, 351
user-specific dictionaries, 335-336
wordspotting techniques, 292, 305
Vocal tract modeling, 95, 118, 122, 124, 125
Voice
control, assistive, 278-279, 313, 337, 360, 452
conversion system, 128-129
dialog applications, 375-377
fundamental frequency, tactile display, 326-327
messaging systems, 281
mimic, 94-95
response, 25
task-specific control, 452
Page 548
Voice coding
algorithm standardization, 7
current capabilities, 7-8
defined, 7-8
research challenges, 306
security applications, 7
source models, 120-122
storage applications, 7-8
Voice communication, human-machine
art of, 387-388
current capabilities, 469
degree-of-difficulty considerations, 375-386
expectations for, 505-506
implementation issues, 18
natural language interaction, man-farm animal analogy, 16
process, 374
research and development issues, 511-513
research methodology, 47-48
role of, 34-67; see also Applications
scientific bases, 15-33
scientific research on, 65-67
successful, 423
system elements, 17-18
and task efficiency, 48-49
transcript, 433-434
voice control, 337
VLSI technology and, 510-511
Voice processing
network-based, 292
market share, 281
research, 6
technology elements, 6-7
technology status, 467-468
telecommunications industry vision, 285-286
Voice synthesis
current capabilities, 8
defined, 8
text-to-speech, 99
von Kempelen's talking machine, 78, 80
Vowel
clusters, 140
digraphs, 140
reduction, 129
W
Wave propagation, 26
Waveform coding techniques. See also Speech coding
adaptive differential PCM (ADPCM), 24
speech synthesis, 118, 119, 136, 137, 381, 474
Wavelets, 21
Wideband audio signals, 84
Wizard of Oz (WOZ) assessment technique, 410-411, 439
Word-level analysis, 138-139
Word processors, speech only, 50
Word recognition systems, 182, 188
Workstations
Hewlett-Packard 735
RISC chips in, 393
Silicon Graphics Indigo R3000, 189
speech input/output operating systems, 401-403
speech processing board, 397
Wheatstone, Charles, 80
X
Xerox, 52
Z
Zipf's law, 489