From page 347...
... Transcript of Presentation. MR. DOMINGOS: My talk is about a general framework for mining data streams.
From page 348...
... In essence, we have infinite data. How would we modify any of these algorithms if we actually had infinite data available? Because that is what we have in the data stream.
From page 349...
... However, to our amazement, we found that we were actually able to satisfy all those criteria. In fact, what we now have is a framework that meets all those criteria, and we have successfully applied it to several algorithms, including decision tree learning, network learning, and clustering.
From page 350...
... I will just focus on the most salient ones. Those are the problems of trying to ensure that you learn, in real time and as quickly as possible, the same model that you would learn on infinite data, and then, a little bit at the end, what happens when the data-generating process is not stationary.
From page 351...
... Now, in a data stream setting, that is not going to work, because you would never get past the first step; you would wait forever for all your data to arrive. So, what you need to do is decide how much data to use at each step.
From page 352...
... So, we figure out how the running time of the algorithm varies with the number of examples that you use at each step. Part two, we derive an upper bound on the loss between the model you get with finite data and the model that you would get with infinite data, as a function of the number of examples that you use when you are doing this with finite data.
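[Editor's note: the Python sketch below is a minimal illustration of that second step, under the assumption that the loss bound takes the familiar Hoeffding form; the function names, the specific bound, and the numbers are illustrative, not taken from the talk. The bound shrinks as the number of examples n grows, so inverting it gives the smallest n that keeps the loss within a chosen tolerance at each step.]

```python
import math

def loss_bound(value_range, delta, n):
    # Hoeffding-style bound: with probability at least 1 - delta, an average of n
    # i.i.d. observations lying in a range of width `value_range` is within this
    # distance of its true expectation.
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def examples_needed(value_range, delta, tolerance):
    # Invert the bound: the smallest n for which loss_bound(value_range, delta, n)
    # drops below the tolerance.
    return math.ceil(value_range ** 2 * math.log(1.0 / delta) / (2.0 * tolerance ** 2))

# Example: statistics in [0, 1], 99% confidence, tolerance 0.05 per step.
print(examples_needed(1.0, 0.01, 0.05))  # about 922 examples
```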
From page 353...
... Notice that, in some sense, what we are effectively doing with this is learning from infinite data in finite time, because we are getting, in finite time, the same results, within the same tolerance, that you would get if you waited forever, until your data stream gave you infinite data. So, that is our general idea.
From page 354...
... How are we going to compare the decision tree we are building with finite data with the decision tree that we would build with infinite data? You can do this in many ways.
From page 355...
... Then, the information gain, or any other measure, we compute for each attribute. Then, if the gain of the best attribute is better than the gain of the second-best attribute by at least epsilon, where epsilon is a function of this δ, then at this point we know that, with probability 1 - δ, we are picking the same attribute that we would be picking if we waited until we saw infinite data.
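[Editor's note: the following Python sketch shows the shape of that test, assuming the measure is information gain with a known range and that epsilon comes from a Hoeffding-style bound; the names and interface are illustrative, not a transcription of the speaker's implementation.]

```python
import math

def hoeffding_epsilon(value_range, delta, n):
    # Epsilon as a function of delta and the number of examples n seen at this node.
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def choose_split_attribute(gains, n, delta, value_range=1.0):
    """gains: attribute -> information gain estimated from the n examples seen so far.
    Returns the best attribute if its lead over the runner-up is at least epsilon,
    i.e. if, with probability at least 1 - delta, it is the same attribute that would
    be chosen with infinite data; otherwise returns None (keep collecting examples)."""
    ranked = sorted(gains.items(), key=lambda kv: kv[1], reverse=True)
    (best_attr, best_gain), (_, second_gain) = ranked[0], ranked[1]
    if best_gain - second_gain >= hoeffding_epsilon(value_range, delta, n):
        return best_attr
    return None

# Example: after 3000 examples, gains of 0.30 vs 0.21 with delta = 1e-6.
print(choose_split_attribute({"color": 0.30, "size": 0.21, "shape": 0.05}, 3000, 1e-6))
```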
From page 356...
... You will recall that our loss measure was the disagreement between two trees, which I defined as the probability that the path of the example through the first tree differs from the path that the example takes in the second. The two trees that I am going to compare are the decision tree that I learned using the algorithm that you just saw, with parameter δ, and the decision tree that I would learn in batch mode, using an ordinary decision tree algorithm, with infinite memory and infinite time, waiting to see infinite samples.
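[Editor's note: a small Python sketch of that disagreement measure, estimated empirically on a sample of examples; the node interface (.is_leaf, .attribute, .child) is a hypothetical one chosen for illustration.]

```python
def root_to_leaf_path(tree, example):
    # The sequence of attribute tests the example passes through, from root to leaf.
    # Tree nodes are assumed to expose .is_leaf, .attribute, and .child(value).
    trail = []
    node = tree
    while not node.is_leaf:
        trail.append(node.attribute)
        node = node.child(example[node.attribute])
    return trail

def disagreement(tree_a, tree_b, examples):
    # Fraction of examples whose root-to-leaf path differs between the two trees:
    # an empirical estimate of the disagreement probability defined above.
    differing = sum(1 for x in examples
                    if root_to_leaf_path(tree_a, x) != root_to_leaf_path(tree_b, x))
    return differing / len(examples)
```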
From page 357...
... So, if you take this expression and you sum it, you just come out with the bound that we have. Like I said, there is this nice corollary that, in finite time, basically what you get is a subtree of the decision tree that you would get with infinite data, again, with at most this disagreement.
From page 358...
... Let's say that your leaf probability is 1 percent, which basically means that your tree is huge, because only 1 percent of your examples run into a leaf at any given level. So, this is a very large tree.
From page 360...
... For an easy explanation, I am just going to talk about how we do it in the context of decision trees. So, the standard technique to handle time-changing data is to use a sliding window, or to use some kind of weighting of the examples.
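[Editor's note: the sliding-window idea in a minimal Python sketch, maintaining sufficient statistics over only the most recent examples; the class and method names are illustrative, not the speaker's implementation.]

```python
from collections import Counter, deque

class SlidingWindowCounts:
    """Keep counts over only the last `window_size` examples: one increment when an
    example arrives and one decrement when it slides out, so the statistics (and any
    model rebuilt from them) track a data-generating process that changes over time."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()
        self.counts = Counter()

    def add(self, example_key):
        self.window.append(example_key)
        self.counts[example_key] += 1
        if len(self.window) > self.window_size:
            old = self.window.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]
```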
From page 361...
... You know, the basic idea behind this framework is to minimize the amount of time that you need to grow a model, or build a model, on the data stream, subject to guarantees that that model is going to be almost the same as the one you would get with infinite data. We have applied it to a wide variety of learning algorithms by now, and we have actually developed a library that we hope to make available in a few months or a year, so that anybody who is interested in doing classification or clustering or regression estimation can use these primitives.

