Pages 191-206



From page 191...
... Statistical Analysis of Massive Data Streams. Committee on Applied & Theoretical Statistics, National Academies, Washington, DC, December 13-14, 2002. John F. Elder IV, Ph.D. © 2002 Elder Research, Inc.
From page 192...
... Again, this is on out-of-sample data, so this was after the fact. I haven't shown here what we knew about the training data beforehand.
From page 193...
... it is a whole lot easier. Essentially every bundling method improves performance.
From page 194...
... -- invent new X data. Now, all of these have been attempted. I am just going to list here a handful of the major ways of combining models, and we have some experts here in a number of these areas.
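As a rough illustration of the simplest of these combining strategies -- not taken from the talk's slides -- the sketch below fits a few different regressors and averages their predictions. The dataset, the particular models, and their settings are placeholders chosen only to make the example self-contained.

    # Minimal sketch: "bundle" several models by averaging their predictions.
    # Models, data, and settings are illustrative, not from the talk.
    import numpy as np
    from sklearn.datasets import make_friedman1
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor
    from sklearn.tree import DecisionTreeRegressor

    X, y = make_friedman1(n_samples=500, noise=1.0, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    models = [
        LinearRegression(),
        DecisionTreeRegressor(max_depth=5, random_state=0),
        MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0),
    ]

    preds = []
    for m in models:
        m.fit(X_tr, y_tr)
        p = m.predict(X_te)
        preds.append(p)
        print(type(m).__name__, "MSE:", round(mean_squared_error(y_te, p), 3))

    # The bundle: an unweighted average of the individual predictions.
    avg = np.mean(preds, axis=0)
    print("Averaged ensemble MSE:", round(mean_squared_error(y_te, avg), 3))

Unweighted averaging is only one of the combining methods listed in the talk; weighted averaging, stacking, bagging, and boosting follow the same pattern of fitting several models and merging their outputs.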
From page 195...
... This is a nearest-neighbor surface representation, where a data point looks for its closest known point by some metric and takes its answer as its estimate. That gives you surfaces similar to a decision tree's, but with convex regions rather than rectangles.
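A minimal sketch of that estimate, assuming Euclidean distance as the metric and made-up training data: each query point simply takes the response of its single closest known point.

    # 1-nearest-neighbor estimate: a query point takes the response of its
    # closest known point under Euclidean distance.  Data are made up.
    import numpy as np

    rng = np.random.default_rng(0)
    X_known = rng.uniform(0, 1, size=(50, 2))              # known points
    y_known = np.sin(4 * X_known[:, 0]) + X_known[:, 1]    # known answers

    def nn_predict(x_query, X_known, y_known):
        """Return the response of the single closest known point."""
        dists = np.linalg.norm(X_known - x_query, axis=1)
        return y_known[np.argmin(dists)]

    x_new = np.array([0.3, 0.7])
    print("1-NN estimate at", x_new, "=", nn_predict(x_new, X_known, y_known))

The resulting prediction surface is piecewise constant over the Voronoi cells of the known points, which is where the convex, non-rectangular regions come from.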
From page 196...
... is a combination of the three methods. It is possible for the models -- in this case, stepwise regression, neural nets, and trees -- to combine to give you something worse than any of the individual models.
From page 197...
... Pedro Domingos here got the best paper award a few years ago at the Knowledge Discovery and Data Mining conference for highlighting failings of the razor, and highlighting failings of this assumption, including, besides the performance of various kinds of model ensemble techniques, showing that, if you built an ensemble and then estimated it with one model -- if you built one model that estimated the ensemble -- that would improve your accuracy over building that one model alone.
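A rough sketch of that idea -- approximating an ensemble with a single model by training the single model on the ensemble's predictions rather than on the raw labels. The models and data below are placeholders, not the setup from Domingos' paper.

    # Sketch: fit an ensemble, then fit a single model to the ensemble's
    # predictions instead of to the raw labels.  Illustrative only.
    from sklearn.datasets import make_friedman1
    from sklearn.ensemble import BaggingRegressor
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeRegressor

    X, y = make_friedman1(n_samples=1000, noise=1.0, random_state=1)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

    ensemble = BaggingRegressor(DecisionTreeRegressor(), n_estimators=50,
                                random_state=1).fit(X_tr, y_tr)

    # A single tree fit directly to the raw labels...
    direct = DecisionTreeRegressor(random_state=1).fit(X_tr, y_tr)
    # ...versus a single tree fit to the ensemble's smoother predictions.
    mimic = DecisionTreeRegressor(random_state=1).fit(X_tr, ensemble.predict(X_tr))

    for name, m in [("direct tree", direct), ("tree mimicking ensemble", mimic)]:
        print(name, "MSE:", round(mean_squared_error(y_te, m.predict(X_te)), 3))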
From page 198...
... counting and penalizing terms. It certainly works for regression, where the number of terms is equal to the number of degrees of freedom that the model uses.
From page 199...
... It is certainly more subject to outliers than a median or something, but it is a whole lot less flexible, with respect to changes in the data, than a polynomial network or a decision tree or something like that. So, if your modeling procedure can respond to noise that you inject, and responds very happily, then you realize it has an over-fitting problem.
From page 200...
... So, let me explain a little bit about what generalized degrees of freedom are. With regression, the number of degrees of freedom is the number of terms.
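In the spirit of Ye's (1998) definition, generalized degrees of freedom can be read as the total sensitivity of the fitted values to the targets, roughly GDF = sum_i d(yhat_i)/d(y_i), estimated by injecting small noise into y, refitting, and regressing each fitted value on its own perturbation. The Monte Carlo sketch below follows that recipe; the models, data, and noise scale are assumptions for illustration, not the talk's actual experiment.

    # Rough Monte Carlo sketch of generalized degrees of freedom (GDF):
    # perturb the targets, refit, and measure how far each fitted value
    # moves per unit of perturbation.  GDF ~= sum_i d(yhat_i)/d(y_i).
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.tree import DecisionTreeRegressor

    def estimate_gdf(model, X, y, n_reps=100, tau=0.25, seed=0):
        rng = np.random.default_rng(seed)
        deltas, fits = [], []
        for _ in range(n_reps):
            d = rng.normal(0.0, tau, size=len(y))   # small target perturbation
            model.fit(X, y + d)
            deltas.append(d)
            fits.append(model.predict(X))
        deltas, fits = np.array(deltas), np.array(fits)
        # Per-observation slope of fitted value vs. perturbation, summed over i.
        slopes = [np.polyfit(deltas[:, i], fits[:, i], 1)[0]
                  for i in range(len(y))]
        return float(np.sum(slopes))

    rng = np.random.default_rng(1)
    X = rng.uniform(-1, 1, size=(200, 3))
    y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.5, size=200)

    # For ordinary regression, GDF should land near the number of terms
    # (3 slopes plus an intercept = 4); a tree typically spends far more.
    print("linear regression GDF ~",
          round(estimate_gdf(LinearRegression(), X, y), 1))
    print("decision tree GDF     ~",
          round(estimate_gdf(DecisionTreeRegressor(max_depth=4), X, y), 1))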
From page 201...
... This is a decision tree, and this is the mechanism that generated the data. Naturally, the tree is going to do pretty well on it.
From page 202...
... trees as their source.
From page 203...
... We also did experiments with eight noise variables added in. So, the tree structure depends on two variables, but now you have thrown in eight distracting noise variables.
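The kind of data being described might look like the sketch below: the target follows a simple tree-style rule in two signal variables, with eight pure-noise columns appended as distractors. The thresholds, leaf values, and noise levels are hypothetical, not the ones used in the experiments.

    # Illustrative data in the spirit of the experiment described: a
    # two-variable tree rule plus eight distractor noise variables.
    import numpy as np

    rng = np.random.default_rng(2)
    n = 500
    signal = rng.uniform(0, 1, size=(n, 2))   # the two real variables
    noise = rng.uniform(0, 1, size=(n, 8))    # eight distractor variables
    X = np.hstack([signal, noise])

    # A hypothetical two-variable tree rule: split on x0, then on x1.
    y = np.where(signal[:, 0] < 0.5,
                 np.where(signal[:, 1] < 0.5, 0.0, 1.0),
                 np.where(signal[:, 1] < 0.5, 2.0, 3.0))
    y = y + rng.normal(0, 0.1, size=n)        # observation noise
    print("X shape:", X.shape, "first targets:", np.round(y[:5], 2))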
From page 204...
... It seems to actually increase with decision trees and decrease with neural nets, for instance. By the way, neural nets, on this same problem, had about 1.5 degrees of freedom per parameter.
From page 205...
... Again, it was a little bit confusing to me, but the trees -- once they partition the data, they are only concerned about their little area, and they are not paying attention to the big picture -- whereas, for models that are basically fitting a global model, data everywhere helps, and it tends to rein them in a little bit, is my best guess. It is certainly an interesting area.
From page 206...
... that are more complex in GDF tend to over-fit more than things that aren't. So, it is a vote on the side of Occam's razor.

