
Currently Skimming:

6 Achieving Intelligence
Pages 101-126

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 101...
... Going beyond simply retrieving information, machine learning draws inferences from available data. Mitchell describes the application of classifying text documents automatically and shows how this research exemplifies the experiment-analyze-generalize style of experimental research.
From page 102...
... Koller and Biermann examine the history of computer science endeavors in chess and checkers, showing how success depends both on "smarts" (improved representations and algorithms) and sheer computer power.
From page 103...
... Computer scientists have studied the problem of automatic text classification for a number of years, over time developing increasingly effective algorithms that achieve higher classification accuracy and accommodate a broader range of text documents.
Machine Learning for Text Classification
One approach to developing text classification software involves machine learning.
From page 104...
... Improving Accuracy by Learning from Unlabeled Examples
Although the naive Bayes classifier can often achieve accuracies of 90 percent or higher when trained to discriminate classes of Web pages such as "personal home page" versus "academic course Web page," it often requires many hundreds or thousands of training examples to reach this accuracy.
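
As a point of reference (this is the standard naive Bayes formulation, not a quotation from the chapter), such a classifier assigns a document containing words w_1, ..., w_n to the class y that maximizes the product of the class prior and the per-word likelihoods estimated from the labeled training examples:

    \hat{y} \;=\; \arg\max_{y}\; P(y) \prod_{i=1}^{n} P(w_i \mid y)

The many hundreds or thousands of labeled examples mentioned above are what make the estimates of P(w_i | y) reliable.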
From page 105...
... In the bag-of-words approach, a text document (shown on the left) is represented solely by the list of words it contains and the frequencies of each word (shown on the right).
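
The representation the figure describes is easy to make concrete; here is a minimal sketch in Python (the function name and example text are illustrative, not the book's):

    import re
    from collections import Counter

    def bag_of_words(text):
        # Keep only lowercase alphabetic tokens; word order is discarded, so the
        # document is described solely by its words and their frequencies.
        return Counter(re.findall(r"[a-z]+", text.lower()))

    page = "Machine learning for text classification: learning from labeled examples."
    print(bag_of_words(page))
    # Counter({'learning': 2, 'machine': 1, 'for': 1, 'text': 1, ...})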
From page 106...
... At first it may seem that the answer must be no, because providing unlabeled examples amounts to providing an example input to the program without providing the desired output. Surprisingly, if the text documents we wish to classify are Web pages, then we shall see that the answer is yes, due to a particular characteristic of Web pages.
From page 107...
... . In short, we say in this case that the features describing the example Web pages are redundantly sufficient
From page 108...
... . This characteristic of Web pages suggests the following training procedure for using a combination of labeled and unlabeled examples: First, we use the labeled training examples to train two different naive Bayes classifiers.
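
The excerpt breaks off here, so the rest of the procedure is filled in below as a hedged sketch in the spirit of co-training: each classifier is trained on one of the two redundantly sufficient views of a page (typically the words on the page itself and the words in hyperlinks pointing to it), and each then labels the unlabeled pages it is most confident about, which become training data for the next round. The function and its arguments are illustrative, not the chapter's own code.

    def co_train(clf_page, clf_links, labeled, unlabeled, rounds=10, per_round=5):
        # labeled:   list of ((page_view, link_view), label) pairs
        # unlabeled: list of (page_view, link_view) pairs with no labels
        # Each classifier needs fit(X, y), predict(X), and predict_proba(X),
        # e.g. a multinomial naive Bayes model over bag-of-words feature vectors.
        labeled, unlabeled = list(labeled), list(unlabeled)
        for _ in range(rounds):
            # Train each naive Bayes classifier on its own view of the labeled data.
            y = [lab for _, lab in labeled]
            clf_page.fit([x[0] for x, _ in labeled], y)
            clf_links.fit([x[1] for x, _ in labeled], y)
            # Let each classifier label the unlabeled pages it is most sure about;
            # those pages are added to the labeled pool that both learn from.
            for clf, view in ((clf_page, 0), (clf_links, 1)):
                if not unlabeled:
                    break
                X = [x[view] for x in unlabeled]
                confidence = [max(p) for p in clf.predict_proba(X)]
                guesses = clf.predict(X)
                chosen = set(sorted(range(len(unlabeled)),
                                    key=lambda i: confidence[i],
                                    reverse=True)[:per_round])
                labeled.extend((unlabeled[i], guesses[i]) for i in chosen)
                unlabeled = [x for i, x in enumerate(unlabeled) if i not in chosen]
        return clf_page, clf_links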
From page 109...
... For example, they show that if one makes the additional assumption that X1 and X2 are conditionally independent given the class Y, then any function that is learnable from noisy labeled data can also be learned from a small set of labeled data that produces better-than-random accuracy, plus unlabeled data. Summary This case study shows how the attempt to find more accurate learning algorithms for Web page classification motivated the development of a specialized algorithm, which in turn motivated a formal analysis to understand the precise class of problems for which the learning algorithm could be proven to succeed.
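
Written out, the additional assumption is the usual conditional-independence condition (our notation, not the chapter's): once the class Y of a page is known, the two views carry no further information about each other,

    P(X_1, X_2 \mid Y) \;=\; P(X_1 \mid Y)\, P(X_2 \mid Y).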
From page 110...
... And the experiment-analyze-generalize cycle of research often leads to a second and third generation of experiments and of theoretical models that better characterize the application problems, just as current theoretical research on using unlabeled data is now considering problem formalizations that relax the assumption violated by the "click here" hyperlinks.
From page 111...
... In this essay, we describe the history of statistical NLP; the twists and turns of the story serve to highlight the sometimes complex interplay between computer science and other fields. Although currently a major focus of research, the data-driven, computational approach to language processing was for some time held in deep disregard because it directly conflicts with another commonly held viewpoint: human language is so complex that language samples alone seemingly cannot yield enough information to understand it.
From page 112...
... Such is not the case, however. In order to appreciate this point, we temporarily divert from describing statistical NLP's history -- which touches upon Hamilton versus Madison, the sleeping habits of colorless green ideas, and what happens when one fires a linguist -- to examine a few examples illustrating why understanding human language is such a difficult problem.
From page 113...
... For example, consider the speech recognition problem: how can we distinguish between this utterance, when spoken, and ".
From page 114...
... found in language data are important sources of information, or, as the influential linguist J.R. Firth declared in 1957, "You shall know a word by the company it keeps." Such notions accord quite happily with ideas put forth by Claude Shannon in his landmark 1948 paper establishing the field of information theory; speaking from an engineering perspective, he identified the probability of a message's being chosen from among several alternatives, rather than the message's actual content, as its critical characteristic.
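
In the standard statement of Shannon's idea (our notation, not a quotation from the essay), the information carried by a message m depends only on the probability p(m) with which it is chosen from the set of alternatives, and the average over the source is its entropy:

    I(m) \;=\; -\log_2 p(m), \qquad H \;=\; -\sum_{m} p(m)\,\log_2 p(m) \ \text{bits}.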
From page 115...
... And so, the effect of Chomsky's claim, together with some negative results for machine learning and a general lack of computing power at the time, was to cause researchers to turn away from empirical approaches and toward knowledge-based approaches where human experts encoded relevant information in computer-usable form. This change in perspective led to several new lines of fundamental, interdisciplinary research.
From page 116...
... Probabilities arise because of the ever-present problem of ambiguity: as mentioned above, several word sequences, such as "your lie cured mother" versus "you like your mother," can give rise to similar spoken output. Therefore, modern speech recognition systems incorporate information both about the acoustic signal and the language behind the signal.
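
That combination is conventionally written as the noisy-channel decision rule (a standard formulation rather than the chapter's own equation): given the acoustic signal A, the recognizer chooses the word sequence W that maximizes the product of an acoustic model and a language model,

    \hat{W} \;=\; \arg\max_{W} P(W \mid A) \;=\; \arg\max_{W}\; P(A \mid W)\, P(W),

so that even when two word sequences sound nearly alike, the language-model term P(W) favors the one people are more likely to have said.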
From page 117...
... formalism -- this is a way of analyzing natural language utterances that truly marries deep linguistic information with computer science mechanisms, such as unification and recursive datatypes, for representing and propagating this information throughout the utterance's structure. In sum, although many challenges remain (for instance, while the speech-recognition systems mentioned above are very good at transcription, they are a long way from engaging in true language understanding)
From page 118...
... Martin, 2000, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice Hall. Contributing writers: Andrew Keller, Keith Vander Linden, and Nigel Ward.
From page 119...
... Otherwise, if she has a move leading to a position marked D, then she can force a draw, and the position is labeled with D. If all of her moves lead to positions marked B, then this position is a guaranteed win for black (assuming he plays optimally)
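
The labeling rule described here can be written as a small recursive procedure; this is a sketch only, in which successors and winner are placeholders for a real game and the game graph is assumed finite and acyclic (as in an endgame table):

    def label_position(pos, to_move, successors, winner, memo=None):
        # successors(pos, player): positions reachable in one move by `player`
        # winner(pos): 'W', 'D', or 'B' for terminal positions, None otherwise
        # Returns 'W' (white can force a win), 'D' (draw), or 'B' (black can force a win).
        memo = {} if memo is None else memo
        key = (pos, to_move)
        if key in memo:
            return memo[key]
        label = winner(pos)
        if label is None:
            other = 'black' if to_move == 'white' else 'white'
            child_labels = [label_position(nxt, other, successors, winner, memo)
                            for nxt in successors(pos, to_move)]
            win, loss = ('W', 'B') if to_move == 'white' else ('B', 'W')
            if win in child_labels:        # some move forces a win for the mover
                label = win
            elif 'D' in child_labels:      # otherwise some move forces at least a draw
                label = 'D'
            else:                          # every move leads to a win for the opponent
                label = loss
        memo[key] = label
        return label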
From page 120...
... Indeed, because his checkers program was one of the earliest examples of non-numerical computation, Samuel greatly influenced the instruction set of early IBM computers. The logical instructions of these computers were put in at his instigation and were quickly adopted by all computer designers, because they are useful for most non-numerical computation.
From page 121...
... This powerful idea coupled with other mechanisms enabled the search routines to run hundreds or even thousands of times faster than would have otherwise been possible. Besides having sophisticated search capability, Samuel's program was the first program that learned by itself how to perform a task better.
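
The excerpt does not name the idea at this point; in game-tree search, the classic technique that yields speedups of this size is alpha-beta pruning, sketched below for a depth-limited search (evaluate and moves are placeholder names for a real game's evaluation function and move generator):

    def alphabeta(pos, depth, alpha, beta, maximizing, evaluate, moves):
        # alpha and beta bound the scores the two players can already guarantee;
        # whenever alpha >= beta, the remaining moves at this node cannot change
        # the final choice and the rest of the subtree is skipped.
        children = list(moves(pos))
        if depth == 0 or not children:
            return evaluate(pos)
        if maximizing:
            value = float('-inf')
            for child in children:
                value = max(value, alphabeta(child, depth - 1, alpha, beta,
                                             False, evaluate, moves))
                alpha = max(alpha, value)
                if alpha >= beta:
                    break              # prune: the minimizer will avoid this line
            return value
        else:
            value = float('inf')
            for child in children:
                value = min(value, alphabeta(child, depth - 1, alpha, beta,
                                             True, evaluate, moves))
                beta = min(beta, value)
                if alpha >= beta:
                    break              # prune: the maximizer already has better
            return value

Called with the full window, e.g. alphabeta(start, 6, float('-inf'), float('inf'), True, evaluate, moves), it returns the same value as plain minimax while examining far fewer positions.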
From page 122...
... Researchers realized that the game trees were growing in size at an exponential rate as one looks further ahead in the sequences of possible moves. Computer programs were wasting their time doing a uniform search of every possible move sequence while humans searched only selected paths.
From page 123...
... Why did the uniform search program defy all reason and defeat selected-path programs that looked much deeper? The answer came from a new rationalization: a program doing uniform search to a fixed depth plays perfectly to that depth.
From page 124...
... might occasionally lead to quality -- a performance that might be at the level of a human, or indeed indistinguishable from that of a human. (In the match between Deep Blue and Kasparov, several of Kasparov's advisors accused IBM of cheating by having human players feeding moves to Deep Blue.)
From page 125...
... For example, computer chess research at IBM demonstrated computing technology that has also been used to attack problems related to computations on environmental issues, modeling financial data, the design of automobiles, and the development of innovative drug therapies. Samuel's ideas on how to get programs to improve their performance by learning have provided a basis for tackling applications as diverse as learning to fly a helicopter, learning to search the Web, or learning to plan operations in a large factory.

