
Currently Skimming: Pages 281-294



From page 281...
... 281 Transcript of Presentation
From page 282...
... 282 you can see a lot of what is going on. This is the city here.
From page 283...
... This failure of tools happens not only on streaming data or massive data, but ordinary data sets of a size you could presumably fit in memory will cause the computers to die with the tools that we typically use. Then, there is another type of complexity that is a whole lot slipperier.
From page 284...
... 284 of the information to the user, and that is a tricky thing, a very slippery thing. We spend way too much time handling data.
From page 285...
... 285 number of times that the word signature appears, that is a pretty low level of information about the objects in question. It turns out, you put that together and you can start to do useful data analysis.
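The kind of low-level count the speaker mentions — how often a word such as "signature" appears — can be sketched in a few lines. This is an illustrative fragment, not code from the talk; the example documents and the helper name `word_counts` are made up here:

```python
from collections import Counter

def word_counts(text):
    """Lowercase, split on whitespace, and tally each word's occurrences."""
    return Counter(text.lower().split())

# Hypothetical documents, just to show the counts being combined for analysis.
docs = [
    "the signature on the tape matches the signature on file",
    "streaming data needs different tools than ordinary data",
]
counts = [word_counts(d) for d in docs]
print(counts[0]["signature"])  # → 2
```

Putting many such per-document counts side by side is exactly the "low level of information" that, combined, supports useful analysis.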
From page 286...
... 286 way, it does show that there is some potential for mapping there. The reason you might care is that maybe you don't speak one of those languages.
From page 287...
... 287 vector, and then just show it and say, well, there is a label you can slap on that tape that gives you some idea of the content in a quick and dirty fashion. The idea just goes on.
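One quick-and-dirty way to slap a label on a word-count vector, in the spirit of the excerpt, is to compare it against a few labeled example vectors and take the closest match. A minimal sketch, assuming cosine similarity and made-up labels and example data (none of this is from the talk):

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def label(doc_vec, labeled_vecs):
    """Return the label whose example vector is most similar to the document."""
    return max(labeled_vecs, key=lambda k: cosine(doc_vec, labeled_vecs[k]))

vec = lambda s: Counter(s.lower().split())
# Hypothetical labeled examples.
examples = {
    "sports": vec("game team score win"),
    "finance": vec("stock market price trade"),
}
print(label(vec("the team will win the game"), examples))  # → sports
```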
From page 288...
... look at it.
From page 289...
... Then, it is streaming data, and we have kind of been challenged by data sets on the order of the size of memory in our computer, given our current tools. So, streaming data is a whole other level of difficulty.
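One way to make the bounded-memory point concrete: some statistics can be computed in a single pass without ever holding the data set in memory. Welford's online mean/variance update is a standard example of this style — offered here only as an illustration, not as the speaker's method:

```python
def streaming_mean_var(stream):
    """One pass, O(1) memory: Welford's online update for mean and sample variance."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    var = m2 / (n - 1) if n > 1 else 0.0
    return mean, var

# The stream could be arbitrarily long; memory use stays constant.
print(streaming_mean_var([1, 2, 3, 4]))  # → (2.5, 1.666...)
```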
From page 291...
... 291 Here are some scanned Xeroxes from Cover and Joy's information theory book, showing what some of these models actually do from a generative point of view. If you just start thinking about what an n-gram might mean, well, it kind of goes with this Markov model thing.
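The generative view of an n-gram model can be sketched very compactly: record which words follow which, then generate text by repeatedly sampling a successor of the current word. This bigram (first-order Markov) sketch is illustrative only; the training sentence is made up:

```python
import random
from collections import defaultdict

def build_bigram_model(words):
    """Map each word to the list of words observed to follow it."""
    model = defaultdict(list)
    for a, b in zip(words, words[1:]):
        model[a].append(b)
    return model

def generate(model, start, length, rng):
    """Walk the chain: repeatedly sample a successor of the last word emitted."""
    out = [start]
    for _ in range(length - 1):
        successors = model.get(out[-1])
        if not successors:
            break
        out.append(rng.choice(successors))
    return " ".join(out)

words = "the cat sat on the mat and the cat ran".split()
model = build_bigram_model(words)
print(generate(model, "the", 5, random.Random(0)))
```

Every consecutive pair in the output is a bigram seen in the training text, which is exactly what a first-order Markov model guarantees.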
From page 292...
... Even taking that advice, and just working with, say, a 100-megabyte data set in a 500-megabyte workstation, and the kinds of tools that we typically use, stuff happened.
From page 293...
... 293 probably some I don't even know about. One of our spin-off companies took some time to worry about keeping good control of the amount of RAM used in their calculations, but there is theory that you could do, too.
From page 294...
... So, we had an explicit data structure that represented that type of operation. I hope to use that as a basis for lots of other algorithms, though.

