Suggested Citation:"Index." National Academy of Sciences. 1994. Voice Communication Between Humans and Machines. Washington, DC: The National Academies Press. doi: 10.17226/2308.

Page 525

Index

A

Abbreviations, pronunciation of, 142-143

Acoustic

interactions, 122

inventory elements, 126

models/modeling, 26, 36, 85, 95, 117, 122, 182-183, 476

Kratzenstein's resonators, 78, 80

phonetics, 85, 95

speech recognition, 64, 182-183

speech synthesis, 137

terminal analog synthesizer, 117

Advanced Research Projects Agency. See also Airline Travel Information System

Benchmark Evaluation summaries, 224-225

common speech corpora, 181-182

continuous speech recognition program, 175-176, 181-182

Human Language Technology Program, 108

research funding, 349

Speech and Natural Language Workshop, 359

Speech Language Understanding Program, 262-263

Spoken Language Systems program, 218-219, 220, 230, 232-233, 250, 254, 255-256, 262-263, 265, 405

Resource Management corpus, 181-182, 184, 185, 188, 376, 377

Wall Street Journal corpus, 184-185, 186, 187

Air traffic control, 365-366

Airline Travel Information System (ATIS), 376

context-dependent utterances, 61

corpus, 61, 184-185, 219, 234, 250, 256, 257-258, 491

degree of difficulty, 383-385

error rates, 252, 486

human performance on, 162

interactive dialogue, 227, 228, 233


Page 526

Airline Travel Information System (cont'd)

language understanding methods, 258, 268

N-best filtering in, 227

and Online Airline Guide, 219

order in problem solving, 229

overview of, 46

recognition errors, 261, 262

spontaneous input, 234

template-based approach, 259

understanding errors, 262

Algorithms

ambiguity-handling, 56

assessment of, 391-392, 405, 409-412

Baum-Welch training (forward-backward), 178-179, 199, 202-207, 489

beam search, 202, 210, 212, 214

compression, 83, 381

databases, 405-409

inside/outside, 263-264, 489-490, 491

intonation contours, 45

large vocabularies, 307

learning, 249, 250, 263-264

nonlinear interpolation, 97

part-of-speech assignment, 143

probabilistic parsing, 56

prosodic phrase generation, 146, 147, 151

reference resolution, 57

robustness, 391, 392, 405, 412-416

search, 180-181, 189, 199, 202, 208, 209, 264-265

speech processing, 21, 393

speech recognition, 28, 409-411, 412, 417-418, 431, 468, 469

speech synthesis, 468

Stack, 202, 208

standardization, 7

text-to-speech, 25

Viterbi, 173, 180, 199, 202, 208-209, 210, 213

voice coding, 7

wordspotting, 404, 431

Allophone models, 182

American Automobile Association, 354

Ameritech, 291, 292, 293, 302

Analog-to-digital converter, 22-23, 189, 350

Analysis-by-synthesis systems

articulatory data, 125

and automatic learning, 127

bit-rate reduction and, 23

and "break index" data, 148-149

defined, 118

linear predictive coding, 24, 26-27, 119

PSOLA methods, 119-120

source-filter technique, 119

in speech analysis, 26-27

in speech recognition, 30

text-to-speech conversion as, 136

Apple Macintosh, 52

Applications of voice communications. See also Assistive technology for disabled persons; Deployment of applications; Military and government applications; Telecommunications; Telephony

air travel information systems, 46, 85-86, 162

aircraft pilots, 40, 41, 44, 45, 359, 365, 509

assessment criteria, 409-410

automatic teller machines, 86

computer-aided instruction, 151

databases for, 406-408

baggage handlers, 40

consumer electronics programming, 43, 353

development environment, 400-401

driving instructions, 354

economic impact of, 280

expectations for, 505-506

force feedback glove, 98, 101

foreign language learning, 44


Page 527

Applications of voice communications (cont'd)

hands/eyes-busy tasks and, 39-41

in information society, 506-508

limited keyboard/screen option and, 41-43

medical report generation, 351

motor vehicle navigation, 44

multimodal systems, 63-64

parcel sorters, 40, 509

portable computers, 43

reading lessons, 44

real-time support, 403-405

smart interfaces, 508-509

speech interface technology, 347-356

success factors, 289-290

security, 7

speech recognition, 28-29, 30-32, 81, 275-282, 283-284, 318, 377-379, 451, 457, 458, 471, 508-510

stock quotation service, 283, 292, 293, 299, 354, 437, 438, 439

technology trends, 399-405

text-to-speech synthesis, 43, 109, 280, 282, 302, 354, 451

user-friendly, 508-510

video/audio conferencing system, 99-100

and VLSI technology, 40, 54, 510-511

voice input, 39-44

voice output, 44-45

wire installers, 40

Army. See also Military and government applications

Avionic Research and Development Activity (AVRADA), 362

Communications and Electronics Command (CECOM) program, 361-362

Articulatory models, 88, 95, 117, 118, 120, 122, 124-125, 152-153, 461-463, 476

Artificial intelligence, 484

Artificial neural networks, 2, 21, 124, 190, 191-193, 381, 479

Assembler language, 399-400, 401

Assistive technology for disabled persons

assistive listening devices, 315-316

augmentative and alternative communication, 130, 335-337

captioning, 314-315, 322-323

carpal tunnel syndrome, 43

categories of sensory aids, 316

cochlear implants, 314, 328-331, 332-333

computer-assisted instruction, 336

deaf-blind, 327

direct stimulation of auditory system, 328-331

dysarthric speech, 337

extracochlear implant, 329-330

eyeglass speechreader, 320-322

hearing aids and assistive listening devices, 278, 311, 312, 315-318, 328-331, 332

hearing impaired, 43, 292, 302-304, 312, 314-333

limitations of, 318

mobility control, 312

noise reduction, 331-333

reading machines for blind, 349

research and development efforts, 313-314

sound/speech spectrograph, 319, 325, 349

with speech/language disabilities, 311, 313, 325

speech recognition, 275-279

speech processing for sightless people, 279, 313, 329, 333-335, 349

speechreading cues, 320-321, 325, 327, 328

tactile sensory aids, 314, 324-328

talking books, 333

Telephone Relay Services (TRS), 292, 302-303, 322

teletypewriters, 323


Page 528

Assistive technology for disabled persons (cont'd)

Terminal Device for the Deaf (TDD), 302, 314, 322

text telephone, 322, 323, 324

use trends, 312-313

visible speech translator, 319-320

visual sensory aids, 319-324

voice control, 278-279, 313, 337

voice output devices, 334, 336

ATR International, 9, 10, 42, 83, 108-109, 115, 119, 128, 130, 176, 513

AT&T

800 Speech Recognition service, 298

articulatory models, 124-125

Directory Assistance Call Completion, 292, 301-302

control of network fraud, 291

government funding, 349

hidden Markov models, 175

Hobbit chip, 511-512

HuMaNet, 454

Intelligent Network, 292, 298

operator services automation, 291, 292, 293

packet data network [XUNET], 99

speech synthesis technology, 107-108, 112, 124-125

spoken language translation, 9, 10, 130

Telephone Relay Services (TRS), 292, 302-303

telephone speech database, 407

Terminal Device for the Deaf (TDD), 302

text-to-speech system, 348-349

voice dialing system, 300, 383-385

Voice English-Spanish Translation (VEST), 10

Voice Interactive Phone, 292, 300-301

voice processing vision, 285-286

Voice Prompter, 292

Voice Recognition Call Processing (VRCP), 292, 293-295, 383-385

voice response systems market, 281, 282

Who's Calling service, 282

wordspotting techniques, 305

Auditory modeling, 24, 26, 91, 92, 94, 97

B

Bandwidth compression, 81

Basilar membrane filtering, 97

Bell, Alexander Graham, 77-78

Bell Atlantic, 291

Bell Mobility (BM), 302

Bell Northern Research (BNR), 176, 282, 283, 292, 293, 294, 295, 299, 383-386, 437, 438, 439

Bell System, 6

Bellcore, 291-293

Bigram models, 201, 209, 211, 213, 214, 222

Bit rates

and image processing, 101

speech coding and, 23, 24, 81, 83-84

text-to-speech synthesis, 29, 77

Bolt, Beranek, and Newman (BBN) Systems and Technologies

ATIS, 46, 261

Delphi system, 259

directory service, 438

hidden Markov models, 175

N-best filtering and rescoring, 267

word lattice parsing, 265

"Break index" data, 147, 148

C

C cross compiler, 399-400

Cambridge University, 176

Carnegie-Mellon University (CMU). See also Airline Travel Information System

ATIS, 46, 261

dialogue state information, 229


Page 529

Carnegie-Mellon University (cont'd)

HMM applications in speech recognition, 175

multilingual systems, 42, 83

Phoenix system, 258, 259, 260

recursive transition networks, 222

spoken language translation, 9

Cepstrum techniques, 28, 86, 178, 182-183, 476

Chaos, 21, 26

Classification and decision tree techniques, 152

Classification and regression tree techniques, 147

CNET, 130

COCOSDA group, 130

Coding. See Linear Predictive coding; Music coding; Speech coding

Compact disc technology, 334

Compound words, 142, 147

Compression

algorithms, 83, 381

bandwidth, 81

image, 99

speech, 23, 83, 474

two-channel amplification, 332

Computation

models of language, 78, 81, 86, 90-91

of pronunciation, 139

research needs, 30

speech recognition systems, 30

speech synthesis, 137

speed, 19-20, 97

teraflop capability, 97

Viterbi algorithm, 173

Computer-aided tools, 21, 510

Computer Speech and Language, 2-3

Consonants

alveolar flapped, 142

clusters, 138, 140

modeling, 123

Consortium for Lexical Research, 241

Context-oriented clustering, 126

Corpora

Airline Travel Information System, 61, 184-185, 219, 250, 256, 257-258, 491

American English, 489, 495

annotated, 493, 494-495

Brown, 489, 495, 499

common speech, 181-182

connected digit, 184-185

IBM/Lancaster Treebank, 495

large linguistic, 447

Penn Treebank, 241, 491, 495

Resource Management, 181-182, 184, 185, 188, 376, 377

optimization, 113

telephone speech, 408-409

Wall Street Journal, 184-185, 186, 187

Creak (vocal), 122

CRIM, 176

Cross-word effects, 182

CSELT, 176

CSTR, 130

Currency, pronunciation of, 143

Cybernetics, 445-446, 448-449

D

Databases. See also Corpora

algorithms, 405-409

for applications, 406-408

dialect considerations, 409

interfaces, 240, 252

large tagged, 152

natural language interfaces, 240

NTIMIT, 409

Official Airline Guide, 46, 219

relational, 53-54

for research, 405-406

remote access to, 42, 44, 278, 296-299, 348, 349, 351

retrieval system, product quality, 57

simulated telephone lines, 408-409

speech, 387, 405, 407, 468, 472

StockTalk, 383-386, 437, 438, 439

WordNet, 499

DEC, 130

Decision criteria, 305


Page 530

Defense Advanced Research Projects Agency. See Advanced Research Projects Agency

Deployment of applications

degree of difficulty and, 375-386

hardware considerations, 381, 382-383

language understanding task dimensions and, 379-381

military technology transfer, 367-369

obstacles to, 374-375

procedure for, 386-388

speech recognition task dimension and, 377-379

speech synthesis task dimensions and, 381-382

system integration requirements, 383

technical challenges in, 280-281

Desert Storm, 360

Dialogue

capabilities, 85, 403-405

clarification/confirmation, 56, 62-63

continuous speech, 431-432

convergence of styles, 60

conversational dynamics, 431-432

engineering constraints, 387-388

feedback and confirmation, 437-438

finite state transition network, 63, 85

flow, 435-436

grammars, 62, 63

interaction and, 61-63

models, 62-63

natural language, 17, 56, 61-63

quantity of text and, 381

real-time processing function, 403-404

research, 63, 66

robustness of, 66

speech recognition, 63

spoken language systems, 47, 60, 61-63, 66, 229

talk-over, 431

task-specific voice control, 452

transcript, 433-434

Dictation devices, automatic, 50, 77, 81, 426, 428, 437-438. See also Text, typewriters

Digital

encryption, 83

speech coding, 25, 82-83, 85

filtering, 19

telephone answering machines, 7-8

Digital computers. See also Digital signal processors

and speech signal processing, 19, 78, 81, 189, 393-396

and microelectronics, 19-21, 81

Digital-to-analog converter, 23, 398

Digital signal processors/processing

applications, 350, 400-401

capabilities, 391, 393-394

development environment, 399-405

distributed control of, 404-405

floating-point, 383, 394-396

growth of, 19, 78, 81

integer, 383

for LSP synthesis, 398

mechanisms, 393

microphone arrays, 97

technology status, 393-396

transputer architecture, 396, 397

workstation requirements, 189

Digitizing pens, 52

Diplophonia, 122

Discourse

natural language processing, 246

and prosodic marking, 149-151

speech analysis, 145, 149-151

in spoken language systems, 227-230

in text-to-speech systems, 145

Dragon Systems, Inc., 176, 380, 401, 402

Dynamic grammar networks, 265-266

Dynamic time warping (DTW), 28


Page 531

E

Electronic mail (e-mail), 8, 306, 381

ESPRIT/Polyglot project, 123, 129, 130, 406

Etymology

proper name estimates, 92

trigram statistics, 141

Experiments

capabilities, 32

real-time, 32

research cycle, 183-184

Extralinguistic sounds, 122

F

Fallside, Frank, 1-3, 445-446

Fast Fourier Transform (FFT), 28, 84, 475

FAX machines, 5

Feature

extraction, 177-178

delta, 182-183

vectors, 182-183

Federal Aviation Administration, 365-366, 509

Federal Bureau of Investigation, 367

Fiber optics, 6

Filter bank outputs, 28, 475

Filters/filtering

adaptive, 332, 414, 456-457

basilar membrane, 97

digital, 19

high-pass, 332

language understanding component for, 22

linear time-varying, 477-478

N-best, 227, 267

transverse, 415

Flex-Word, 292

Fluid dynamics, principles in speech production, 87-90

Force feedback glove, 98, 101

Foreign language. See also Multilingual systems; Spoken language translation

learning, 44

word incorporation in text-to-speech systems, 138

Formants, 122-123, 125

Fractals, 21, 26

Frequency-domain representation, 24, 476

G

Gestural inputs, 65

Government. See Military and government applications

Grammars

ambiguity, 380

bigram, 179

combinatory categorial, 490

context-free, 264, 461, 490, 491-494

covering, 493

dialogue, 62, 63

dynamic grammar networks, 265-266

features-value structures in, 264

finite-state, 266, 379-380

formalisms, 490

hand-coded linguistic, 483

lexicalized, 490

lexicalized tree-adjoining, 490

Markov, 179-180

modeling, 28, 63

natural language understanding and, 37-38, 264, 380, 491-494

perplexity, 180, 185, 229, 378

probabilistic context-free, 491-494

size, 37-38

speech analysis and, 28, 36-38

speech recognition, 36-37, 41-42, 63, 81, 85-86, 179-180, 185-186, 265-266

statistical n-gram, 183, 224

training speech, 179-180, 185-186

trigram, 141, 179-180, 183

unification, 461

Graphical user-interface. See also User interfaces


Page 532

Graphical user-interface (cont'd)

growth of, 108

guidelines, 66-67

hierarchical menu structure, 54

speech compared with, 54-55

strengths, 52-53

weaknesses, 53, 57-58

H

Handwriting

recognition, 402-403

screen-based channel, 64

Hardware technology. See also Digital signal processors; Microcomputers; Personal computers; Workstations

advances in, 391

CISC architecture, 392

Hobbit chip, 511-512

Intel x86 series, 392

microprocessors, 383, 391, 392-393, 396

Motorola 68000 series, 392

RISC chips, 383, 392-393

speech-processing equipment and systems, 383, 396-405, 510-511

V810 multimedia processing chip, 511-512

Health Interview Survey on Assistive Devices, 312

Hidden Markov models (HMM)

bigram, 201, 211, 213, 214

defined, 171-173

estimation of statistical parameters of, 199, 202-208

feature extraction, 177-178

fenonic case, 207

grammar-state-transition table, 266

limitations of, 189-190

Markov chains, 170-171, 172

and mel-frequency cepstral coefficients, 178

neural nets combined with, 193-194

part-of-speech tagging, 487-488, 490

phonetic, 166, 173-175, 178-179, 182, 188

and semantics, 221

speaker recognition systems, 30, 85

speech recognition, 28, 30, 85, 170-175, 177-178, 199, 200-208, 377, 394, 396, 397, 478-479

speech variability and, 28, 415-416

and talker verification, 86

three-state, 172

theory development, 175

training and analysis, 30, 178-179, 181-182, 478-479

trellis representation, 203, 208, 212

trigram, 201-202, 212, 213-214

unigram, 210

Viterbi algorithm and, 210

word models, 179

wordspotting, 397

Human-human communication

conversational dynamics, 431-432

language imitation, 60

repair rates, 260

studies, 50-51

I

IBM, 9, 175, 349, 380, 495

Image compression, 99

Image processing, 78, 101

Information processing

in auditory systems, 91, 94

speech technologies, 453

Information retrieval, 54-55, 57

INFOVOX, 130

Institute for Defense Analyses, 175, 234-235

Institute for Perception Research, 127

INTELLECT, 57

Integrated Services Digital Network (ISDN), 84


Page 533

Interaction. See also User interfaces

acoustic, 122

and dialogue, 61-63

failures, cost of, 426-427

large-vocabulary conversational, 101-102

natural language, 51-57, 58

speech recognition, 36

spoken language systems, 51-57, 60, 61-63

system requirements for voice communications applications, 383

Intonation

contours, 45

cues, 432

parts-of-speech distinctions and, 151

structures, 129

models, 127

J

Joysticks, 52

K

Karlsruhe University, 83

Klatt, Dennis, 111, 123

Kratzenstein's acoustic resonators, 78, 80

Kurzweil Applied Intelligence, 380

L

Language

acquisition, theory of, 2

generation, 38, 241

imitation, 60

processing, 239; see also Natural language processing

variability, 380

Language modeling. See also Natural language

bigram, 201, 209, 211, 213, 214, 222, 461

computational, 78, 81, 86, 90-91

etymology estimates for proper names, 92

future of, 307

research needs, 26, 29

speech recognition, 29, 81-82, 90-91, 168-169, 183, 263, 307

speech synthesis, 128

statistical, 263-264, 461, 472-473

trigram, 92, 183, 209-210, 212, 213-214, 461

by users, 60

Laryngalization, 122

Law enforcement, 367

Lexicons, 138, 140, 141-142, 178-179, 188, 296, 499

LIMSI, 176

Linear predictive coding

analysis by synthesis, 24, 26-27, 119

mapping code book, 128

code-excited (CELP), 24, 26, 83, 101

mixed-excitation (MELP), 24

multipulse excited (MPLPC), 24, 26

pitch-excited, 24

robustness of, 97

self-excited (SEV), 24, 26

and speech analysis, 575

Linguistic analysis, 59-60, 259, 263, 382, 461, 484

Linguistic Data Consortium, 181, 241, 252

Linguistics. See also Parsing; Semantics; Syntax

after-thoughts, 256

consonant cluster, 138

discourse-level effects, 149-151

English lexical stress system, 141-142

letter-to-sound relationships, 138, 140-141

metonymy, 256-257, 257-258

morphonemics, 141-142

orthographic conventions, 142-143


Page 534

Linguistics (cont'd)

parts-of-speech assignment, 143, 151

prosodic marking, 145-149

spontaneous speech, 255-258

vocalic suffixes, 139

word-level analysis, 138-139

M

Machine translation, 240

Markov chains, 170-171, 172

Masking, time and frequency, 84, 93-94, 177-178

Massachusetts Institute of Technology

articulatory models, 124

ATIS, 46, 227, 261

HMMs for speech recognition, 176

MITalk, 123

multilingual synthesis, 130

speech synthesis, 111, 123, 124

TINA language understanding system, 222, 223, 259

Matsushita, 130

MCI, 300

Mel-frequency cepstral coefficients (MFCC), 178, 182-183

Message processing, 241, 251

Microcomputers. See also Personal computers

computation speed, 19-20, 97

device density, 20

digital signal processing, 19

projected advances in, 102-103

speech processing and, 19-20, 81, 396-399

Microelectronics

chip densities, 102

digital computation and, 19-21

research, 21

revolution, 108

speech signal processing, 19-20

Microphones

applications, 86-87, 102

autodirective arrays, 86-89, 96, 97, 99-100, 102

beamforming systems, 87, 88, 99

characteristics, 414

digital signal processors, 97

directional, 333, 414-415

electret, 87, 88, 97, 102

environmental variation in speech input, 412-413, 460

in hearing aids, 331-332, 333

noise reduction, 331-332, 414-415

reflection and reverberation, 414

speaker distance from, 414

and speech recognition, 379, 414

technology projections, 102

three-dimensional, 96, 97, 99-100

track-while-scan mode, 87, 89

Microsoft Windows, 52

Military and government applications. See also Advanced Research Projects Agency; other government agencies

Agent's Computer, 367

Air Force, 359, 365

air traffic control, 365-366

aircraft carrier flight deck control and information management, 363

Army, 359, 360-363

combat team tactical training, 364-365, 366

Command and Control on the Move (C2OTM), 360-361

law enforcement, 367

Multi-Role Fighter, 365

Navy, 363-365

Pilot's Associate system, 365

Soldier's Computer, 360, 361-362, 367

SONAR supervisor command and control, 363-364

technology transfer issues, 367-369

voice control of systems, 360, 362, 365


Page 535

Mixed-mode communication. See Multimodal systems

Models/modeling. See also Hidden Markov models; Language modeling

acoustic, 26, 36, 64, 85, 95, 117, 122, 182-183, 476

allophone, 182

articulation, 88, 95, 117, 118, 120, 122, 124-125, 152-153

auditory, 24, 26, 91, 92, 94, 97

bigram, 201, 209, 211, 213, 214

computational, 78, 81, 86, 90-91

consonants, 123

context-dependent, 182, 246

cross-word effects, 182

dialogue, 62-63

grammar, 28, 63, 380

intonation, 127

Klatt, 123

left-to-right, 175

natural language understanding, 238-253, 262-264

noise excitation, 122

phonetic, 173-174, 190-191, 193

prosody, 117

segmental, 125, 173-174, 190-191, 193

signal, 19, 101

sinusoidal, 24

sound source, 462

source/system, 22, 118, 120-122

speech perception, 26

speech production, 22

speech recognition requirements, 168-169

speech synthesis, 109, 116-130

speech variability, 176

spoken language systems, 48

stochastic segment, 190-191

trigram, 201-202

vocal tract, 95, 118, 122, 124, 125

wave propagation, 26

word, 179, 207

Modulation theory, 26

Morphemes, 137, 139, 140

Morphology, speech synthesis, 110, 111, 112, 113, 137, 141-142, 489

Morphs, 138-139, 140

Motorola, 383, 392

Mouse, 52, 350-351, 402-403

Multilingual systems. See also Foreign language; Spoken language translation; Telephony

future of, 513-514

INTERTALKER, 513-514

Japanese kana-kanji preprocessor, 403

MITalk, 130

PIVOT, 512-513

speech synthesis, 42, 101, 117, 129-130, 151-152

Multimodal systems. See also User interfaces

advantages of, 426

error avoidance, 64

error correction, 64

HuMaNet, 454

referent determination difficulties, 61

robustness, 64

situational and user variation, 64-65

synergistic integration of sensory modalities, 100-101, 102

user interfaces, 32, 56, 63-65

Multiprocessing, 21

Music coding, 84

N

N-Best interface, 217, 221, 226, 233

N-Best Paradigm, 191, 193

National Institute of Standards and Technology, 377

Natural language. See also Speech recognition; Spoken language

anaphora, 55

dialogue, 17, 56, 61-63

interaction, 51-57, 58


Page 536

Natural language (cont'd)

modeling, 128; see also Language modeling

research directions, 56, 241

and speech recognition systems, 17, 262-267, 388

and spoken-language systems, 59-61

spontaneous speech, 59, 263

typed, 57

Natural language processing

ambiguity-handling algorithms, 56

applications, 240, 241, 250-253

clarification/confirmation subdialog, 56, 62-63

components of, 243-250

constraints on, 17, 59-61, 262-268, 388, 482-484, 491

context modeling, 57, 246

cooperating process view of, 248-250

database interfaces, 240

domain model extraction, 250

evaluation, 250-252, 483

grammars, 264, 483

history of, 240-241

ideal systems, 55-56

inputs to, 241-243

integration architecture, 265-267

machine translation, 240

menu-based system, 56

outputs, 243

parsers, 59, 247, 483, 489-495

portability of systems, 252

pragmatics, 246, 250

problems, 241-243

product quality database retrieval system, 57

prosodic information in, 268-269

reasoning, 246-247

reference resolution algorithms, 57

research directions, 460-461, 500-502

response planning and generation, 246-247

rule-based, 482-484

semantics, 245-246, 247, 250, 486, 495-500

simplified systems, 247-248

speech processing and, 460-461

state of the art, 252-253

statistical techniques, 484

strengths, 55-56, 58

syntactic, 244-245, 247

training [learning], 56, 57, 58, 249, 250, 252

verbal repair detection, 269

weaknesses, 56-57

Natural language understanding. See also Linguistics; Speech recognition; Spoken language understanding

accuracy/error rates, 47, 251, 252, 255, 261, 262, 388

applications, 379-381

architecture, 485-487

background, 238-239

current capabilities, 10, 506

defined, 239

grammar, 37-38, 263, 380, 491-494

language variability and, 380

models of, 238-253, 262-264

off-the-subject input and, 287, 380, 388

part-of-speech tagging, 487-489

preprocessing and, 489

search process, 248-249

speech constraints in, 268-269

stochastic parsing, 489-495

task difficulty and, 379-381

TINA system, 222

vocabulary size and, 37-38

unknown words, 488-489

Naval. See also Military and government applications

Air Technical Training Center (Orlando), 363-364

Combat Team Tactical training, 366

Ocean Systems, 363

Personnel Research and Development Center, 364-365


Page 537

Naval (cont'd)

Resource Management task, 376

Underwater Systems Center, 363

Navier-Stokes equation, 89

Neural nets. See Artificial neural networks

Neural transduction, 97

New Mexico State University, 241

Nippon Electric Corporation (NEC), 9, 10, 42, 82, 176, 383, 506, 507-511

Nippon Telegraph and Telephone (NTT)

analysis-synthesis systems, 119

ANSER (Automatic Answer Network System for Electrical Requests), 283, 291, 292, 296-297, 398-399, 407-409, 410, 417

concatenative synthesis, 126

HMM applications, 176

systematic optimization techniques, 115

telephone speech database, 407

Noise

additive, 459

and algorithm robustness, 413

excitation, 122

immunity, 305

Lombard effect, 415, 460

reduction technology, 331-333, 414-415

sources, 122

and speaker variation, 415-416

and speech recognition, 288, 305, 379, 388, 414-415, 469, 473-474

white, 122

Northern Telecom, 278, 291, 295, 299

Numbers, pronunciation of, 143, 288

NYNEX, 282, 283, 291, 292, 300, 301-302, 407, 409, 436

O

Occam parallel programming language, 396

Octel, 281

Official Airline Guide database, 46, 219

Olive, Joseph, 107

Operating systems

pen, 402, 511-512

speech, 417

Optical character recognition technology, 43, 349

Oregon Graduate Institute, 407

P

Packet data network (XUNET), 99

Paget, Richard, 15-16

Palantype keyboard, 335

Parallel processing, 89, 383, 400

Parsing/parsers

ambiguous, 147-148

clause-level, 144, 145

crossing brackets, 491

natural language, 59, 247, 483, 489-495

phrase-level, 144-145, 146

probabilistic, 56

and prosodic marking, 56, 144, 146-147

in speech synthesis, 137, 139, 144-145

stochastic, 489-495

of unrestricted text, 144

word lattice, 265

Pause insertion strategies, 129

Performance structures, 146

Personal Communication Devices, 306

Personal Communication Networks, 306

Personal Communication Services, 306

Personal computers

hand-held, 355

portable, 64-65

sound boards, 350, 353, 397

speech interfaces for, 511

speech processing technology, 108, 374, 401-403, 509-510


Page 538

Phoneme

conversion to acoustic events, 429

intelligibility, 411

recognition systems, 182

sequences, 138, 175, 461-462

Phonetics

acoustic, 85, 95

hidden Markov models, 166, 173-175, 178-179, 182, 188

segmental models, 125, 173-174, 190-191, 193

and speech recognition, 167, 169-170, 188

in training speech, 30, 178-179, 182-183

text-to-speech synthesis, 85, 125, 174

typewriter, 511

Pierce, John, 283

Pitch-synchronous overlap-add approach (PSOLA), 114, 119-120, 128-129

Pitch-synchronous analysis, 127

Pragmatic structure, 144, 150, 246, 250

Pronunciation

abbreviations and symbols, 142-143

computational, 139

numbers and currency, 143, 288

part of speech and, 144

speech recognition, 44

surnames/proper names, 140-141, 288

symbols, 142-143

Proper names, 92, 288, 458, 484

Prosodic phenomena

algorithm, 146, 147, 151

articulation as a basis for, 152-153

and conversational dynamics, 431

defined, 144, 145-146

discourse-level effects, 149-151

marking, 144, 145-149

modeling, 117

multiword compounds, 147

in natural language processing, 268-269

parsing and, 56, 144, 146-147

pauses, 431

phrasing, 146-147, 151

PSOLA technique for modifying, 128-129

in speech synthesis, 88, 117, 119, 124-125, 128-129, 145-149, 288-289

and speech quality, 88, 118, 288-289

Psychoacoustic behavior, 78, 91, 94

Pulse Code Modulation, adaptive differential (ADPCM), 24, 82-83, 101

Q

Quasi-frequency analysis, 177

Query language, artificial, 57

R

Rabiner, Lawrence, 111, 113

Recursive transition networks (RTNs), 222

Repeaters, electromechanical, 81

Resonators, 78, 80

Research methodology, spoken language vs. typed language, 47-48

Robust processing techniques, 259-260, 263

Robustness

algorithms, 391, 392, 405, 412-416

ATIS system, 262

case frames and, 258

classification of factors in, 412-413

dialogue systems, 66

environmental variation in speech input and, 412-414

lexical stress system, 142


Page 539

Robustness (cont'd)

linear predictive coding, 97

multimodal systems, 64

natural language systems, 56, 59, 262

noise considerations, 413

research, 417-418

speaker variation and, 415-416

speech analysis, 97

speech recognition systems, 29-30, 44, 184, 261-262, 459-460

speech synthesis, 139

speech variation and, 413

spoken-language understanding systems, 66, 258-259

templates and, 258-259

user interfaces, 56

word error rates, 182-183, 184, 185-186

Royal Institute of Technology (KTH), 122, 123, 124, 125, 129

Rutgers University, CAIP Center, 98, 99

S

Sampled-data theory, 78, 81

Security applications

speaker verification, 9, 30, 86, 300, 305

low bit-rate coding for transmission, 7

Semantics

ambiguity, 380

compositional, 486

First-Order Predicate Logic, 245-246

lexical, 486, 495-500

natural language, 245-246, 247, 250

pragmatics and, 144

propositional logic, 245

and speech recognition, 305-306

and spoken-language understanding, 220-221

Sensimetrics Corporation, 123

Siemens A. G., 9, 42, 83

Signal modeling techniques, 19, 101

Signal processing

digital, 19, 97

enhancement, 102

research, 21

Sinusoidal models, 24

Software technologies, 391

Sound

generation, 118, 119, 124

source model, 462

Sound Pattern of English, 126

Sound/speech spectrograph, 319, 325, 349

Source-filter decomposition, 128

Speak 'N Spell, 110

Speaker

adaptation, 459, 460

atypical, 187-188

dependence, 36

recognition/identification, 9, 30, 85, 348

style shifting, 460, 461

variation, 415-416

verification, 9, 30, 86, 300, 305

Speaking characteristics and styles, 128-129, 378-379

Spectrum analysis, 19

Speech

behaviors, conversational, 430-432

casual informal conversational, 82

compression, 23, 83, 474

connected, 97

continuous, 36, 78, 95, 323, 427-428, 430-431

constraints on, 77, 268-269

databases, 405, 407-409, 468

dialect, 409

digitized, 38, 45, 189, 428

dysarthric speech, 337

gender differences, 129

information processing technologies, 453

interactive, 36

intonation, 45, 127, 129, 432

knowledge about, 117

machine-generated, 335

Page 540

Speech (cont'd)

noninteractive, 48

pause insertion strategies, 129

perception models, 26

preprocessor, 403

production, 21-22, 26, 77, 87-90, 137-138

prolongation of sounds, 322

psychological and physiological research, 462

self-correction, 256, 432

signal processing systems, 19

slips of the tongue, 257

spontaneous, 58-59, 185, 255-260, 303, 460, 461, 469-471

standard model of, 267

synthetic, 428-428; see also Speech synthesis; Speech synthesizers

toll quality, 23, 24

training, 322, 325

type, 36

ungrammatical, 257

units of, 168-170, 462-463

variability, 28, 176, 378, 413, 459-460, 480

waveforms, 24, 136, 137

Speech analysis

acoustic modeling, 26

analysis-by-synthesis method, 26-27

auditory modeling, 26

defined, 22

dimensions, 36-38

importance, 21

interactivity, 36

language modeling, 26

linear predictive coding, 24

robustness, 97

speech continuity, 36

speech type, 36

vocabulary and grammar, 28, 36-38

vocal tract representation in, 90, 91

Speech coding, 26

applications, 82-83

articulatory-model-based, 125

audio perception factors in, 84, 85

in cochlear implants, 331

concatenation using speech waveforms, 117

bit rates and, 23, 24, 81, 83-84

digital, 25, 82-83, 85

and masking, 84, 93

predictive, 117

psychoacoustic factors in, 101

research challenges in, 76

rule-based diphone system, 118

stereo coding, 84-85

technology status, 82-85, 281

terminal analog, 118

wideband audio signals, 84

Speech processing

algorithms, 21, 393

articulatory and perceptual constraints in, 461-463

digital, 22-23, 76

equipment and systems, 19-20, 81, 396-399

evaluation methods, 463-464

in hearing aids, 317

and natural language processing, 460-461

obstacles to, 373

research challenges, 76-77

psychoacoustic behavior and, 94

for sightless people, 333-335

and speech technology development, 76, 78

Speech recognition

accuracy, 28, 37, 41, 46-47, 86, 159, 181-189, 377, 378, 470, 473

acoustic modeling, 64, 182-183

adverse conditions, 459-460

algorithms, 28, 409-411, 412, 417-418, 469

alternative models, 189-193

analysis-by-synthesis, 30

applications, 28-29, 30-32, 81, 275-282, 283-284, 318, 377-379, 451, 457, 458, 471, 508-510

Page 541

Speech recognition (cont'd)

articulation and, 152-153

assessment techniques, 410-411, 463-464

"barge in" (interruption of conversation) and, 277, 287, 292, 295, 298-299, 388, 404

common speech corpora, 181-182

complexity, 17

connected digit corpus, 184-185

continuous speech, 78, 165-194, 323, 471, 506

defined, 7, 239, 348

decision criteria, 305

decoding, 209-214

dialogue grammar approach models, 63

dimensions of task difficulty, 376, 377-379

domain independent (DI), 187

dynamic grammar networks, 265-266

dynamic programming matching, 509

environmental factors, 413-414

error correction, 64, 261-262, 388

feature extraction, 177-178, 180

Flexible Vocabulary Recognition, 295

future, 307-309, 456-459

generalization, 479

Hidden Markov models and, 28, 30, 85, 170-175, 177-178, 199, 200-208, 377, 397, 478

historical overview, 175-176

improvements in performance, 181-184, 388

interactivity, 36

language modeling, 29, 81-82, 90-91, 168-169, 183, 263

large-vocabulary systems, 183, 193, 277, 292, 506

lip reading, 64

linguistic rules, 82

market for technology, 350-351, 416-417

microphones and, 305, 414

most likely path, 208-209

most likely word sequence, 209-214

N-best filtering or rescoring, 267

natural language and, 17, 262-267, 388

naturalness, 45, 153

neural networks, 191-193

new words, 188-189

noise immunity and channel equalization, 288, 305, 379, 388, 414-415, 469, 473

normalization of speakers in, 30, 456-457, 459, 460

pattern matching, 474, 478-479

perplexity of language model and, 37, 180, 185, 229, 378, 463

phonetics and, 167, 169-170, 188, 410

processes, 167-168, 180-181, 199, 451, 453-454, 473-474

pronunciation and, 44

prototype systems, 34

real-time, 189

rejection of irrelevant input, 287, 388

and repetitive stress injuries, 43

research challenges, 29-30, 44, 76, 108, 183-184, 304-306, 417-418

robustness, 29-30, 44, 184, 261-262, 459-460, 473, 474

sample performance figures, 184-185

search algorithms, 180-181, 248, 264-265

segmental models, 190-191, 473-474

sheep and goats phenomenon, 456

speaker-adaptive, 36, 187-188, 288, 388, 479

speaking characteristics and styles and, 128, 377, 378-379, 415-416, 460

Page 542

Speech recognition (cont'd)

speaker-dependent, 28, 36, 54, 186-187, 292, 509-510

speaker expertise and, 378

speaker-independent, 28, 36, 37, 46, 184, 186-187, 188, 362-363, 378, 397, 425, 433-434, 506, 507

spontaneous speech and, 58-59, 185, 460, 461, 469, 471

SR-1000 system, 507

SR-3200 system, 507

subword units, 287-288, 299, 388

successful systems, 239

system structure, 27-28, 398, 401, 402

talker verification, 86

task completion rate, 410

technology status, 8-9, 18, 81, 85-86, 112-113, 159-164, 165-166, 181-189, 286-288, 428, 468

templates, 258-259, 425

terminal-type, 508-510

training data, 178-180, 185-186, 457, 459, 473, 478-479

transputer-based, 397

trials, 417

units of speech and, 168-170

user tolerance of errors and, 379

vocabulary and grammar and, 36-37, 41-42, 81, 85-86, 185-186, 265-266, 277, 378, 457

Wizard of Oz assessment technique, 410-411, 439

word lattice parsing, 265

wordspotting, 286-287, 292, 295, 298-299, 305, 387, 388, 397, 404

Speech research

computational models of language, 90-91

critical directions in, 87-101

historical background, 78-82

language modeling, 26

physics of speech generation, 87-90

unification of coding, synthesis, and recognition, 94-95, 97

Speech synthesis. See also Text-to-speech synthesis

acoustic models, 85, 95, 117, 122, 476

analysis-synthesis systems, 117, 118, 119, 125

applications, 30-32, 108, 109, 110, 278, 381-382

articulatory models, 88, 117, 118, 120, 124-125, 152-153, 476, 480

assessment of, 411-412

automatic learning, 127

concatenative, 110, 114, 117, 118-119, 126, 168, 406

concept-to-speech systems, 38-39

content, 45

control, 124, 118, 125-127

corpus-based optimization, 113

defined, 22, 109, 110, 116, 348

digitized speech, 22-23, 25, 38

dimensions of task difficulty, 381-382

discourse-level effects, 149-151

error rates, 112

evaluation of, 130

expectations of listeners, 382

flexibility needs, 117-118

fluid dynamics in, 89-90

formant-based terminal analog, 117, 118, 122-123, 125

forms, 38-39

frequency domain approach, 119

future of, 152-153, 455-456

higher-level parameters, 123-124

history of development, 111-115

individual voices, speaking styles, and accents and, 117-118

input, 109

intelligibility, 44-45, 129, 130, 149, 382, 429

large-vocabulary systems, 101-102, 351

Page 543

Speech synthesis (cont'd)

letter-to-sound rules, 140-141

linguistic aspects of, 135-153

market for, 351

microelectronics revolution and, 108

models, 109, 116-130

morphophonemics and lexical stress, 110, 111, 112, 113, 137, 141-142

multilingual, 42, 101, 117, 129-130, 151-152

natural speech coding and, 117, 128

naturalness, 129, 149, 381, 429, 456

noise sources, 122

and objective distortion metrics, 114-115

obstacles to, 117

orthographic conventions, 142-143

output, 118

parsing, 137, 139, 144-145

part-of-speech assignment, 143

phonetic HMM functions and, 174, 429

predictive coding, 117

process, 167-168, 135, 428-429, 453, 454, 479

prosody, 88, 117, 119, 124-125, 128-129, 145-149, 288-289

PSOLA (pitch-synchronous overlap-add approach), 114, 119-120, 128-129

quantity of text and, 381

real-time, 108

research, 25-26, 29-30, 44-45, 76, 108, 113-114, 128

rule-based, 111, 118, 125, 126-127, 140-145, 429

segmental, 113-114, 115, 125, 145, 479-480

sentence length and grammatical complexity, 382

sound generation, 118

source/system models, 22, 118, 120-121

speech quality, 130

structures and processes, 109-110

systematic optimization methods, 114

techniques, 118

text analysis, 110, 112, 113

technology status, 18, 29, 81, 85-86, 107-115, 411-412, 468

testing, 114-115

time functions, 111, 113, 118, 119, 476-478

variability of text and, 381-382

vocabulary, 119

vocal tract model, 95, 118, 122, 125

waveform concatenation (simple), 118-119, 383, 476

word-level analysis, 138-139

Speech synthesizers

acoustic terminal analog, 117

cartridge-type, 510

cascade, 122-123

future, 455-456

large-vocabulary, 349

neural network controller, 124

OVE, 123

parallel, 123, 125

terminal analog, 510

voice quality, 456

Speech technology. See also Deployment of applications

capabilities and limitations, 427-430

challenges in, 284, 471-475

commercial developments, 352-354

foundations, 77-78

growth of, 2

information processing, 453

market, 350-352, 416-418

projections, 101-102, 355-356

readiness evaluation, 440

research on, 65-67, 417-418

service trials, 417

status, 82-87

trends, 117

Page 544

Speech technology (cont'd)

voice input, 427-428

voice output, 428-429

Speech Technology Laboratory, 123

Speech transmission, low-bit-rate, 23, 24, 29, 77, 81, 83-84, 97, 474

Speech understanding, 17, 34, 37-38, 307, 379

Spoken language systems (SLS) ARPA, 218-220

comparison of modalities, 46-58

constraints on, 227-230

defined, 38, 241

dialogue, 47, 60, 61-63, 66, 229

discourse in, 227-230

efficiency of language-based modalities, 48-51

error metrics, 224-225, 259

error recovery, 439

evaluation of, 230-233, 251

human factors obstacles to, 58-63

interaction, 51-57, 60, 61-63

interfacing speech and language, 221-224

linguistic analysis, 59-60, 259

mixed initiative, 228-229

N-best interface, 217, 221, 233

natural language, 51-57, 59-61

order in problem solving, 229

prototypes, 46-47, 438

reference, 227-228

robustness, 66, 259

simulation methods, 66

speaker-independent, 65

spontaneous speech and, 58-59, 234, 255-260, 427-428

SUNDIAL, 228-229

research methodology, 47-48

technology development, 81

training, 60, 260

typed language contrasted with, 47-51, 60

user adaptation to, 60

Spoken language translation

current capabilities, 9-10, 42

defined, 9-10

directory assistance, 295-296

laboratory systems, 9-10

projections, 102

VEST (Voice English-Spanish Translator), 10, 42

voice output, 29

Spoken language understanding, 47

approaches to, 220-221

defined, 255

error repair, 260

limits on, 379

process, 452, 453

progress in, 224-226

spontaneous speech and, 258-260

Sprint, 300

SQL, 57

SRI International, 52, 176, 213

ATIS, 46, 261

Gemini system, 259, 260

Template Matcher, 258, 259

Stenograph, 322, 335

Stereo coding, 84-85

StockTalk, 383-386, 437, 438, 439

Stored voice, 110

Subband coders, 24, 83, 101

SUNDIAL spoken language systems, 229

Surnames, pronunciation of, 140-141, 288

Symbols, pronunciation of, 142-143

Symbolic learning techniques, 501

Syntax, 137. See also Parsing

natural language processing system, 244-245, 247, 269

speech recognition systems, 305-306

and spoken language understanding, 220-221

Syntactico-semantic theory, 447

System technologies. See Hardware technology; Workstations

T

Tactile technology, 101, 324-328

Talker. See Speaker

Talking statues, 78, 79

Page 545

Technology transfer issues, 367-369

Telecommunications. See also Telephony

banking services, 292, 507

Baudot code, 323

conferencing, 101

cost-reduction applications, 290-291

digital speech coding, 82-83

information access from remote databases, 42, 44, 278, 296-299, 348, 349

interfaces, 397

market for speech technology, 290-304

personal communication networks and services, 306

predictions, 307-308

revenue opportunities in, 291-293

shaping user language, 60-61

speaker verification, 305

speech technology and, 7, 41-42, 285-286

technical challenges, 304-306

Telefónica, 9, 10, 298

Telegraph, 80-81

Telephony. See also Telecommunications

Automated Alternate Billing Services, 292, 293, 431

Automated Customer Name and Address, 302

automatic interpreting, 513-514

bandwidth conservation, 19

banking by phone, 283, 291, 398-399, 407-408, 425

cellular, 6, 7, 81, 83, 374, 383-385, 507-508

deaf user aids, 43, 302-304

digital channels, 101

directory assistance, 41, 278, 282, 283, 291, 292, 295-296, 301-302, 355-356, 438, 458

history, 81

language translation, 10, 42, 77, 81, 82, 83, 108-109, 513-514

operator services, 8-9, 277, 282, 284, 291, 292, 293-296, 351, 353-354, 374, 380, 383-385, 387

simulated telephone lines, 278, 408-409

speech databases, 407

speech recognition technology, 428

teleconferencing, 454-455

telephone relay service, 302-304, 322

text telephone, 322, 323

voice-controlled automated attendant, 356

voice-based dialers, 40, 292, 299-300, 355, 374, 376, 383-386, 436, 507-508

voice-interactive phone service, 292, 300-301, 351

Voice Recognition Call Processing (VRCP), 292, 293-295, 376, 383-385

TELECOM, 510, 513

Telephone answering machines, digital, 7-8

Texas Instruments (TI), 110, 176, 184-185, 291, 300, 349, 377, 407

Text analysis, 110, 112, 113

Text-to-speech synthesis. See also Speech analysis

acoustic phonetics and, 85

address, date, and number processing, 288

advances in, 288-289

algorithms, 25

applications, 43, 109, 280, 282, 302, 354, 451

articulatory synthesis in, 124-125

cartridge-type device, 510

components of, 38

constraints on speech production, 137-137

development tools, 126-127

discourse analysis in, 145

Page 546

Text-to-speech synthesis (cont'd)

error rate, 262

formant-based terminal analog, 122-123

future of, 152-153, 308

hardware requirement, 383

language modeling and, 26, 78, 90-91

linguistic analysis in, 382

multilingual, 42, 129, 397-398

naturalness, 381

output, 29

parsing, 144-145

part-of-speech assignment, 143

phonemic-based, 348

phonetic factors, 125

problems, 120, 303-304, 471

proper name pronunciation, 288

prosody, 288-289, 306

research challenges, 26, 304, 306, 324

rule system, 125

sound generation, 124

source models and, 120

speaker identity and normalization, 30

speaking characteristics and styles and, 128-129

structural framework, 136-137, 398

waveform approach, 24-25

word-level analysis, 138-139

Text preprocessors, 381-382

Time Assignment Speech Interpolation, 81

Tools. See Computer-aided tools

Touch screens, 50

Touch-Tone keypad, 335

Trackballs, 52

Training

natural language interactive systems, 56, 57, 58

neural nets, 193

shaping user language, 60-61

speech, 322

tactical, combat team, 364-365

Training speech [learning]

automatic, 263-264

databases for, 387, 405, 407, 468, 472

discriminative, 479

effects of, 185-186, 473

grammar, 179-180, 185-186

natural language processing, 56, 57, 58, 249, 250, 252, 263-264

phonetic HMMs and lexicon, 30, 178-179, 182-183

speech recognition, 178-180, 185-186, 457, 459, 473, 478-479

syntactico-semantic theory and, 447

Transatlantic radio telephone, 81

Transatlantic telegraph cables, 81

Transform coders, 24

Treebank Project, 241, 491, 495

Trigrams, 92, 183, 201-202, 209-210, 212, 213-214, 229

Triphones, 182

Turing's test, 35

Tuttle, Jerry O., 363

U

United Kingdom, Defense Research Agency, 365

University of Indiana, 130

University of Pennsylvania, 181, 241, 252, 491, 495

US West, 300-301

Usability/usefulness. See also Applications of voice communications

determinants of, 31-32

issues, 18, 30-32

pronunciation and, 44

voice input, 39-44

voice output, 44-45

User interfaces. See also Graphical user-interface

artificial query language, 57

capabilities and limitations, 51-52, 387, 427-430, 434

cost of interaction failures, 426-427

databases, 240, 252

Page 547

User interfaces (cont'd)

design strategies, 387, 423-424, 426, 433-440

dialogue flow, 435-436

direct manipulation, 51, 52-55, 57-58

error recovery, 438-440

evaluation of, 440

feedback and confirmation, 434, 437-438, 445

hierarchical, 454

information requirements of, 425-426

instructions, 438

keyboard dialogs, 49-50

metaphor, 54

multimodal systems, 32, 56, 63-65, 505, 508-510

N-best, 217, 221, 226, 233

natural language interaction, 55-57

personal computer, 511-512

prompts, 435-436, 471

research directions, 56, 511-512

revisions suggested, 435

robustness, 56

smart, 512-513

system capabilities, 429-430

task modalities, 426

task requirement considerations, 424-427

telecommunications, 397

training issues, 58

user expectations and expertise and, 430-432

voice-actuated, 360

voice input, 427-428

Users

conversational speech behaviors, 430-432

expectations and expertise, 430-432

language modeling by, 60

novices vs. experts, 432

satisfaction, 429-430

tolerance of speech recognition errors, 379

USS Ranger, 363

V

Vector quantization, 28

Verbal repair, 269

Videophones, 5-6

Virtual reality technology, 454-455

Visual sensory aids, 319-324

Vocabulary

algorithms, 307

confusability, 378

conversational, 101-102

Flexible Vocabulary Recognition, 295

large, 101-102, 183, 193, 277, 292, 307, 349, 351, 506

and natural language understanding, 37-38

operator services, 277

speech analysis and, 28, 36-38

speech recognition and, 36-37, 41-42, 81, 85-86, 183, 185-186, 193, 265-266, 277, 292, 378, 457, 506

speech synthesis, 101-102, 119, 349, 351

user-specific dictionaries, 335-336

wordspotting techniques, 292, 305

Vocal tract modeling, 95, 118, 122, 124, 125

Vocoder, 48, 81, 83, 119, 325

Voice

control, assistive, 278-279, 313, 337, 360, 452

conversion system, 128-129

dialog applications, 375-377

fundamental frequency, tactile display, 326-327

input, 39-44, 50, 427-428

mail, 7, 81, 83, 101, 110

messaging systems, 281

mimic, 94-95

output, 44-45, 428-429

response, 25

task-specific control, 452

typewriters, 97, 376, 380, 451

Page 548

Voice coding

algorithm standardization, 7

current capabilities, 7-8

defined, 7-8

research challenges, 306

security applications, 7

source models, 120-122

storage applications, 7-8

Voice communication, human-machine

advantages, 16, 48-51

art of, 387-388

current capabilities, 469

degree-of-difficulty considerations, 375-386

expectations for, 505-506

implementation issues, 18

natural language interaction, man-farm animal analogy, 16

process, 374

research and development issues, 511-513

research methodology, 47-48

role of, 34-67; see also Applications

scientific bases, 15-33

scientific research on, 65-67

simulations, 47-48, 50, 51

successful, 423

system elements, 17-18

and task efficiency, 48-49

transcript, 433-434

voice control, 337

VLSI technology and, 510-511

Voice processing

network-based, 292

market share, 281

research, 6

technology elements, 6-7

technology status, 467-468

telecommunications industry vision, 285-286

Voice synthesis

current capabilities, 8

defined, 8

output, 23, 17-18, 29

text-to-speech, 99

von Kempelen's talking machine, 78, 80

Vowel

clusters, 140

digraphs, 140

reduction, 129

VLSI technology, 468, 510-511

W

Wave propagation, 26

Waveform coding techniques. See also Speech coding

adaptive differential PCM (ADPCM), 24

speech synthesis, 118, 119, 136, 137, 381, 474

Wavelets, 21

Wideband audio signals, 84

Windows, 52, 350, 353

Wizard of Oz (WOZ) assessment technique, 410-411, 439

Word-level analysis, 138-139

Word models, 179, 207

Word processors, speech only, 50

Word recognition systems, 182, 188

Workstations

Hewlett-Packard 735

RISC chips in, 393

Silicon Graphics Indigo R3000, 189

speech input/output operating systems, 401-403

speech processing board, 397

Sun SparcStation 2, 189

Wheatstone, Charles, 80

X

Xerox, 52

Z

Zipf's law, 489

Science fiction has long been populated with conversational computers and robots. Now, speech synthesis and recognition have matured to where a wide range of real-world applications—from serving people with disabilities to boosting the nation's competitiveness—are within our grasp.

Voice Communication Between Humans and Machines takes the first interdisciplinary look at what we know about voice processing, where our technologies stand, and what the future may hold for this fascinating field. The volume integrates theoretical, technical, and practical views from world-class experts at leading research centers around the world, reporting on the scientific bases behind human-machine voice communication, the state of the art in computerization, and progress in user friendliness. It offers an up-to-date treatment of technological progress in key areas: speech synthesis, speech recognition, and natural language understanding.

The book also explores the emergence of the voice processing industry and specific opportunities in telecommunications and other businesses, in military and government operations, and in assistance for the disabled. It outlines, as well, practical issues and research questions that must be resolved if machines are to become fellow problem-solvers along with humans.

Voice Communication Between Humans and Machines provides a comprehensive understanding of the field of voice processing for engineers, researchers, and business executives, as well as speech and hearing specialists, advocates for people with disabilities, faculty and students, and interested individuals.
