
Pages 327-341


From page 327...
... DOMINGOS: My talk is about a general framework for mining data streams. This is joint work that I have done with Geoff Hulten at the Department of Computer Science and Engineering at the University of Washington.
From page 328...
... In the data stream setting, it is open ended. In essence, we have infinite data.
From page 329...
... It is very easy to satisfy those requirements if you compromise the quality of the results that you are producing. The whole challenge is to actually guarantee those things while producing, say, decision trees or things that are not different from the ones that you get if you ran the algorithms that we know, and that we have, and whose properties we know.
From page 330...
... Those are the problems of trying to ensure that you learn, in real time, as quickly as possible, the same model that you would learn with infinite data, and a little bit at the end about what happens when the data-generating process is not stationary.
From page 331...
... In the traditional setting, each of these steps, in general, looks at the whole data before doing what it wants to do. Now, in a data stream setting, that is not going to work because you would never get past the first step, because you would wait forever for all your data to arrive.
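One way to picture the single-pass requirement described here is a sketch like the following, in which per-attribute counts are updated one example at a time, in constant time and memory per example, instead of rescanning the data at every step. The class and data layout are illustrative assumptions added for this transcript, not the speaker's actual implementation.

    from collections import defaultdict

    class StreamingCounts:
        # Sufficient statistics for one node, maintained incrementally so that
        # each arriving example is inspected once and then discarded.
        def __init__(self):
            self.counts = defaultdict(int)  # (attribute, value, label) -> count
            self.n = 0

        def update(self, example, label):
            # Constant work per example; no pass over previously seen data.
            self.n += 1
            for attribute, value in example.items():
                self.counts[(attribute, value, label)] += 1

    # Usage: consume an unbounded stream without ever storing it.
    # stats = StreamingCounts()
    # for example, label in stream:
    #     stats.update(example, label)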
From page 332...
... Part two, we derive the upper bound on the loss between the model you get with finite data and the model that you would get with infinite data, as a function of the number of examples that you use when you are doing this with finite data.
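In the published version of this line of work, the tool behind this kind of bound is the Hoeffding (additive Chernoff) bound; stated as a sketch, if a quantity with range R is estimated from n independent examples, then with probability at least 1 - delta its observed mean is within

    \epsilon = \sqrt{\frac{R^2 \ln(1/\delta)}{2n}}

of its true mean, so the gap between the finite-data model and the infinite-data model can be driven below any chosen tolerance by taking n large enough.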
From page 333...
... We first studied decision trees, and then we did it for clustering, and generally we saw that we could generalize this to many different things. Notice that, in some sense, what we are effectively doing with this is learning from infinite data in finite time, because we are getting, in finite time, the same results with the same tolerance that you would get if you waited forever, until your data stream gave you infinite data.
From page 334...
... How are we going to compare the decision tree we are building with finite data with the decision tree that we would build with infinite data? You can do this in many ways.
From page 335...
... Then, if the gain of the best attribute is better than the gain of the second attribute by at least epsilon, where epsilon is a function of this delta, then at this point we know that, with probability 1 - delta, we are picking the same attribute that we would be picking if we waited until we saw infinite data. So, at that point, we just split on that leaf, and now we start sending the new examples in the stream down to the children of that leaf. So, we start with the root.
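A compact sketch of this splitting rule, assuming attribute quality is measured by information gain and that epsilon comes from the Hoeffding bound sketched above; the function names here are hypothetical, not the interface of the library mentioned at the end of the talk.

    import math

    def hoeffding_bound(value_range, delta, n):
        # Epsilon such that an average of n observations of a quantity with the
        # given range is within epsilon of its true mean with probability 1 - delta.
        return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

    def should_split(gains, value_range, delta, n):
        # gains: observed gain of each candidate attribute at this leaf,
        # estimated from the n examples that have reached it. Returns the
        # winning attribute if, with probability at least 1 - delta, it is the
        # attribute a batch learner would pick on infinite data; else None.
        ranked = sorted(gains.items(), key=lambda kv: kv[1], reverse=True)
        (best_attr, best_gain), (_, second_gain) = ranked[0], ranked[1]
        if best_gain - second_gain >= hoeffding_bound(value_range, delta, n):
            return best_attr   # split now; route new examples to the children
        return None            # keep accumulating examples at this leaf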
From page 336...
... The two trees that I am going to compare are the decision tree that I learned using the algorithm that I just saw, with parameter delta, and the decision tree that I would learn in batch mode, using an ordinary decision tree algorithm, with infinite memory and infinite time, waiting to see infinite samples. Here is what we can guarantee.
From page 337...
... Like I said, there is this nice corollary that says that, in finite time, basically what you get is a subtree of the decision tree that you would get with infinite data, again, with at most this disagreement. The bottom line here, though, is the following: people in computational learning theory have a lot of bounds that have this flavor, but they are incredibly loose.
From page 338...
... Let's say that your leaf probability is 1 percent, which basically means that your tree is huge, because only 1 percent of your examples run into a leaf at any given level. So, this is a very large tree.
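For concreteness, the guarantee in the published version of this result bounds the expected fraction of examples on which the incremental tree and the batch tree disagree by delta divided by the leaf probability p. With the 1 percent leaf probability of this example and an illustrative delta of 10^{-7} (a value assumed here, not quoted in the talk),

    \frac{\delta}{p} = \frac{10^{-7}}{10^{-2}} = 10^{-5},

that is, an expected disagreement on at most one example in a hundred thousand.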
From page 340...
... For an easy explanation, I am just now talking about how we do it in the context of decision trees. So, the standard technique to handle time-changing data is to use a sliding window, or to use some kind of weighting of the examples.
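As a minimal sketch of the sliding-window technique mentioned here (the standard baseline, not the speaker's own approach), statistics are kept over only the most recent window_size examples, so old data eventually stops influencing the model; the class below is illustrative and reuses the count layout from the earlier sketch.

    from collections import deque, defaultdict

    class SlidingWindowCounts:
        # Sufficient statistics over only the last window_size examples, so the
        # counts track a time-changing data-generating process.
        def __init__(self, window_size):
            self.window_size = window_size
            self.window = deque()
            self.counts = defaultdict(int)

        def update(self, example, label):
            # Add the newest example's contribution ...
            keys = [(a, v, label) for a, v in example.items()]
            for key in keys:
                self.counts[key] += 1
            self.window.append(keys)
            # ... and forget the oldest example once the window is full.
            if len(self.window) > self.window_size:
                for key in self.window.popleft():
                    self.counts[key] -= 1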
From page 341...
... is going to be almost the same as what you get with infinite data. We have applied it to a wide variety of learning algorithms by now, and we have actually developed a library that hopefully we will make available in a few months or a year, so that anybody who is interested in doing classification or clustering or regression estimation can use these primitives.

