
Currently Skimming:

Automatic Text Understanding of Content and Text Quality--Ani Nenkova
Pages 49-54

The Chapter Skim interface presents the single passage algorithmically identified as most significant on each page of the chapter.


From page 49...
... The mismatch matters a great deal because people rely on machines to locate and navigate information sources and increasingly read machine-generated text, such as machine translations or text summaries. In this presentation I discuss some of the simple and elegant intuitions that have enabled semantic processing in machines, as well as some of the emerging directions in text quality assessment.
From page 50...
... To aid analysis of customer reviews, researchers at Google developed a large lexicon of almost 200,000 positive and negative words and phrases, identified through their similarity to a handful of predefined positive or negative words such as excellent, amazing, bad, horrible. Among the positive phrases in the automatically constructed lexicon were cute, fabulous, top of the line, melt in your mouth; negative examples included subpar, crappy, out of touch, sick to my stomach (Velikovich et al., 2010).
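The similarity-to-seeds idea can be sketched in miniature. Velikovich et al. propagated polarity over a graph of web-scale phrase vectors; the toy version below simply scores a candidate word by its average cosine similarity to positive versus negative seed words. The three-dimensional "context vectors" are hand-made stand-ins for the co-occurrence statistics a real system would derive from billions of n-grams.

```python
import math

# Hypothetical, hand-crafted context vectors; real systems derive these
# from co-occurrence counts over very large text collections.
vectors = {
    "excellent": (0.9, 0.1, 0.0),
    "amazing":   (0.8, 0.2, 0.1),
    "bad":       (0.1, 0.9, 0.0),
    "horrible":  (0.0, 0.8, 0.2),
    "fabulous":  (0.85, 0.15, 0.05),
    "subpar":    (0.05, 0.85, 0.1),
}

POS_SEEDS = ["excellent", "amazing"]
NEG_SEEDS = ["bad", "horrible"]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def polarity(word):
    """Mean similarity to positive seeds minus mean similarity to negative seeds."""
    v = vectors[word]
    pos = sum(cosine(v, vectors[s]) for s in POS_SEEDS) / len(POS_SEEDS)
    neg = sum(cosine(v, vectors[s]) for s in NEG_SEEDS) / len(NEG_SEEDS)
    return pos - neg

print(polarity("fabulous") > 0)  # True: sits near the positive seeds
print(polarity("subpar") < 0)    # True: sits near the negative seeds
```

A phrase never seen in the seed list thus inherits a polarity score from the company it keeps, which is how the lexicon grows to hundreds of thousands of entries without manual labeling.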
From page 51...
... When reading a specific text, computers also need to resolve what entity in the document is referred to by pronouns such as "he/his," "she/her," and "it/its." Systems are far from perfect but are getting better at this task. Usually pronouns appear in the text near noun phrases, e.g., "the professor prepared his lecture," but in other situations gender and number information is necessary to correctly resolve the pronoun, as in, "John told Mary he had booked the trip." Machines can rather accurately learn the likely gender of names and nouns, again by reading large volumes of text and collecting statistics of co-occurrence.
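A minimal sketch of that co-occurrence idea: estimate the likely gender of a name by counting which pronouns appear in the same sentence as the name in a corpus, then resolve a pronoun by matching its gender against the candidate antecedents. The five-sentence "corpus" here is an illustrative stand-in for the large text collections a real system would use.

```python
from collections import Counter, defaultdict

toy_corpus = [
    "John said he would call",
    "John thought he was late",
    "Mary said she agreed",
    "Mary hoped she could come",
    "Mary knew she was right",
]

MASC, FEM = {"he", "him", "his"}, {"she", "her", "hers"}

# Count masculine/feminine pronouns occurring in the same sentence as a name.
gender_counts = defaultdict(Counter)
for sent in toy_corpus:
    words = sent.lower().split()
    for name in ("john", "mary"):
        if name in words:
            gender_counts[name]["m"] += sum(w in MASC for w in words)
            gender_counts[name]["f"] += sum(w in FEM for w in words)

def likely_gender(name):
    c = gender_counts[name.lower()]
    return "m" if c["m"] >= c["f"] else "f"

def resolve(pronoun, candidates):
    """Pick the candidate antecedent whose estimated gender matches the pronoun."""
    want = "m" if pronoun.lower() in MASC else "f"
    for name in candidates:
        if likely_gender(name) == want:
            return name
    return candidates[0]  # fall back to the nearest candidate

print(resolve("he", ["John", "Mary"]))   # John
print(resolve("she", ["John", "Mary"]))  # Mary
```

This is exactly the "John told Mary he had booked the trip" case: proximity alone would favor Mary, but the learned gender statistics rule her out for "he."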
From page 52...
... A logistic regression classifier, trained on around 2,800 examples of general and specific sentences from instantiation relations, learned to predict the distinction remarkably well. On a completely independent set of news articles, five different people were asked to mark each sentence as general or specific.
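The classifier setup can be sketched with plain gradient-descent logistic regression over two hand-picked surface features (sentence length and count of numeric tokens). The tiny training set and the features are illustrative assumptions, not the actual instantiation-relation data or feature set described in the talk.

```python
import math

def features(sentence):
    tokens = sentence.split()
    n_digits = sum(any(ch.isdigit() for ch in t) for t in tokens)
    return [1.0, len(tokens) / 10.0, float(n_digits)]  # bias, length, numbers

# Label 1 = specific (detail-laden), 0 = general.
train = [
    ("The economy is doing poorly", 0),
    ("Things have changed a lot", 0),
    ("Many people are unhappy", 0),
    ("Unemployment rose to 9.4 percent in 2009 across 14 states", 1),
    ("The company cut 1200 jobs at its 3 plants in Ohio", 1),
    ("Sales fell 12 percent in the 4th quarter of 2008", 1),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Stochastic gradient ascent on the log-likelihood.
w = [0.0, 0.0, 0.0]
for _ in range(2000):
    for sent, y in train:
        x = features(sent)
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        w = [wi + 0.1 * (y - p) * xi for wi, xi in zip(w, x)]

def predict(sentence):
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, features(sentence))))
    return "specific" if p > 0.5 else "general"

print(predict("Profits grew 8 percent to 2.1 billion in 2010"))  # specific
print(predict("The outlook seems uncertain"))                    # general
```

Even this crude feature set separates the toy examples, which hints at why surface cues (numbers, length, named entities) carry so much of the general/specific signal.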
From page 53...
... Word co-occurrence statistics and subjective language have also been successful in automatically distinguishing implicit comparison, contingency, and temporal discourse relations (Pitler et al., 2009).
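One common realization of such co-occurrence features is word pairs crossing the two arguments of a relation: every (word-from-arg1, word-from-arg2) pair becomes a binary feature for the relation classifier. A minimal extractor, shown here as an assumed simplification of that feature family:

```python
def word_pair_features(arg1, arg2):
    """Cross-product word pairs between the two discourse arguments."""
    return {(w1.lower(), w2.lower())
            for w1 in arg1.split() for w2 in arg2.split()}

pairs = word_pair_features("The bridge was closed", "Traffic backed up")
print(("closed", "traffic") in pairs)  # True
```

Pairs such as ("closed", "traffic") can signal contingency even when no explicit connective like "because" appears, which is what makes implicit relations learnable from statistics alone.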
From page 54...
... Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Singapore, August 2–7, 2009, pp.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.