Pages 138-164

From page 138...
... Otherwise, I think what we will see this morning, this afternoon, and tomorrow is that there are many different shapes and forms for this large data problem and large data streams. So, what you are going to hear is something completely different from what you heard this morning.
From page 139...
... I will give you a brief introduction to the sort of network data I want to talk about. Then I want to go into a little bit of a spiel where I want to show you some network data.
From page 141...
... Graphs are a convenient representation of the data, and in the applications that I am going to be talking about, we are talking about call graphs. So, for every node in the network, the node identifier is a phone number, and the edges we are talking about are phone calls or aggregate summaries of phone calls between telephone numbers.
From page 142...
... So, this gives you an idea, a little bit, of how many transactors we are seeing on the network, how many transactions, and also gives you a little hint of the sparsity of the graph. If you have a graph that has n nodes, there are roughly n² possible edges.
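To make the call-graph representation and the sparsity point concrete, here is a minimal sketch, assuming a hypothetical list of call records (caller, callee, minutes); it builds an adjacency structure of the kind described above and compares the number of observed edges to the roughly n² possible directed edges.

```python
from collections import defaultdict

# Hypothetical call records: (caller, callee, minutes).
# In the talk, edges are phone calls or aggregate summaries between numbers.
calls = [
    ("555-0101", "555-0202", 12.0),
    ("555-0101", "555-0303", 3.5),
    ("555-0202", "555-0303", 7.0),
]

# Adjacency map: node (phone number) -> {neighbor: total minutes}.
graph = defaultdict(lambda: defaultdict(float))
for caller, callee, minutes in calls:
    graph[caller][callee] += minutes

nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
n = len(nodes)
edges = sum(len(nbrs) for nbrs in graph.values())

# Sparsity: observed edges versus the roughly n^2 possible directed edges.
print(f"{n} nodes, {edges} edges, density = {edges / (n * n):.3f}")
```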
From page 144...
... This is a different way to view the data. This talk is about the lifetime of the edges.
From page 145...
... AUDIENCE: [Question off microphone.]
From page 146...
... to capture the 90th percentile. So, 90 percent of the nodes on this graph have an in-degree of eight or fewer edges, and
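As a concrete illustration of that summary statistic, here is a minimal sketch, assuming a hypothetical directed edge list: it tallies in-degrees and reads off a nearest-rank 90th percentile of the in-degree distribution.

```python
from collections import Counter

# Hypothetical directed edge list (caller -> callee).
edges = [("a", "b"), ("c", "b"), ("d", "b"), ("a", "c"), ("d", "c"), ("a", "d")]

# In-degree: number of callers per callee.
in_degree = Counter(callee for _, callee in edges)

# Include nodes that never receive a call (in-degree 0).
nodes = {u for u, _ in edges} | {v for _, v in edges}
degrees = sorted(in_degree.get(v, 0) for v in nodes)

# Nearest-rank 90th percentile of the in-degree distribution.
p90 = degrees[min(len(degrees) - 1, int(0.9 * len(degrees)))]
print(f"90th percentile of in-degree: {p90}")
```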
From page 147...
... legitimate choices for legitimate purposes. Again, you can just define the graph to be the network traffic you see over some period, say, today.
From page 148...
... I just show you some values of θ that we use for different applications. If we set θ to about 0.85 and you follow that curve down, for our applications, if we do daily processing, that means an hour-long phone call will last, in our analysis, for roughly a month.
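A minimal sketch of how such a decay parameter behaves, assuming the edge weight is aged daily as weight ← θ·weight + (1 − θ)·(today's minutes); that update rule and the one-hour example are assumptions for illustration, since the excerpt does not spell out the exact formula. It simply prints how much residual weight a single one-hour call leaves behind after various horizons.

```python
theta = 0.85          # daily aging parameter mentioned in the talk
call_minutes = 60.0   # a single one-hour call

# Assumed daily update: weight <- theta * weight + (1 - theta) * (today's minutes).
# A one-off call contributes (1 - theta) * call_minutes on the day it happens,
# and that contribution is multiplied by theta on each subsequent day.
initial = (1 - theta) * call_minutes
for days in (1, 7, 14, 30):
    print(f"after {days:2d} days: {initial * theta ** days:6.3f} minutes of residual weight")
```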
From page 149...
... By building this redundant structure, it is very easy to go from this down to this. So, it is very easy to traverse this data structure to build subgraphs of arbitrary depth, and literally within a second.
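A minimal sketch of that kind of traversal, assuming a redundant structure that stores both outgoing and incoming neighbors for every node (so expansion in either direction is a lookup rather than a scan); the function below grows a subgraph out to an arbitrary depth around a seed number. The data and function names are hypothetical.

```python
from collections import defaultdict, deque

def build_index(edges):
    """Store neighbors in both directions so expansion never needs a full scan."""
    out_nbrs, in_nbrs = defaultdict(set), defaultdict(set)
    for u, v in edges:
        out_nbrs[u].add(v)
        in_nbrs[v].add(u)
    return out_nbrs, in_nbrs

def subgraph(seed, depth, out_nbrs, in_nbrs):
    """Breadth-first expansion around `seed`, following edges in both directions."""
    seen, frontier = {seed}, deque([(seed, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == depth:
            continue
        for nbr in out_nbrs[node] | in_nbrs[node]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, d + 1))
    return seen

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "a")]
out_nbrs, in_nbrs = build_index(edges)
print(subgraph("a", 2, out_nbrs, in_nbrs))   # nodes within two hops of "a"
```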
From page 150...
... for that is sort of asymmetry. Amy might be very popular.
From page 151...
... So, the weights typically would be associated with some characteristic of the transaction. So, the weight might be the dollar value of a transaction, the length of a phone call, or the number of bytes of the connection between this IP address and that IP address.
From page 152...
... So, the final thing I will talk about is something that we are going to do day in and day out. As the data streams in, we are going to construct a network topology of the data streaming in and maintain that.
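A minimal sketch of that day-in, day-out maintenance, assuming the same exponential aging as above with call minutes as the edge weight: each day, every existing edge weight is decayed by θ, the day's aggregated traffic is blended in with weight (1 − θ), new edges are created as they appear, and edges that fall below a small assumed cutoff are pruned so the structure stays bounded. The update rule and cutoff are assumptions, not details given in the excerpt.

```python
from collections import defaultdict

THETA = 0.85      # daily aging parameter
CUTOFF = 0.01     # assumed pruning threshold, not specified in the talk

def update_topology(graph, todays_calls):
    """graph: {(caller, callee): weight}.  todays_calls: {(caller, callee): minutes}."""
    updated = defaultdict(float)
    # Age every existing edge.
    for edge, weight in graph.items():
        updated[edge] = THETA * weight
    # Blend in today's aggregated traffic; unseen edges are created here.
    for edge, minutes in todays_calls.items():
        updated[edge] += (1 - THETA) * minutes
    # Prune edges that have decayed to (near) nothing.
    return {e: w for e, w in updated.items() if w >= CUTOFF}

graph = {}
graph = update_topology(graph, {("555-0101", "555-0202"): 60.0})
graph = update_topology(graph, {})   # a quiet day: the edge ages but survives
print(graph)
```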
From page 153...
... from each of our numbers. They won't see me calling my brother, because we live in the same town and that is a local call.
From page 154...
... I may add some edges in.
From page 155...
... The way we account for the fact that we are missing data is to put some priors on some of these parameters, and you can decide which ones you want priors on. Then, once you have those, you can use an EM-type algorithm to estimate the parameters, and this is one of the things we are experimenting with.
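The excerpt does not spell out the speaker's model, so as a generic illustration of what an EM-type iteration looks like, here is a textbook two-component Gaussian mixture fit; the component labels play the role of the unobserved quantities, standing in for whatever is missing in the speaker's setting. This is an illustrative sketch, not the method used in the talk.

```python
import math
import random

random.seed(0)
data = [random.gauss(0, 1) for _ in range(200)] + [random.gauss(5, 1) for _ in range(200)]

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Initial guesses for the mixing weight, means, and standard deviations.
pi, mu, sigma = 0.5, [min(data), max(data)], [1.0, 1.0]

for _ in range(50):
    # E-step: posterior responsibility of component 0 for each point (the "missing" labels).
    resp = []
    for x in data:
        p0 = pi * normal_pdf(x, mu[0], sigma[0])
        p1 = (1 - pi) * normal_pdf(x, mu[1], sigma[1])
        resp.append(p0 / (p0 + p1))
    # M-step: re-estimate the parameters from the expected labels.
    n0 = sum(resp)
    n1 = len(data) - n0
    pi = n0 / len(data)
    mu[0] = sum(r * x for r, x in zip(resp, data)) / n0
    mu[1] = sum((1 - r) * x for r, x in zip(resp, data)) / n1
    sigma[0] = math.sqrt(sum(r * (x - mu[0]) ** 2 for r, x in zip(resp, data)) / n0) or 1e-6
    sigma[1] = math.sqrt(sum((1 - r) * (x - mu[1]) ** 2 for r, x in zip(resp, data)) / n1) or 1e-6

print(f"mixing weight ~ {pi:.2f}, means ~ {mu[0]:.2f} and {mu[1]:.2f}")
```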
From page 156...
... my observed data.
From page 158...
... Okay, so without further ado, let's talk about the applications. The primary use of these tools is in fraud
From page 159...
... So, part of the plan we have is, you know, bad guys don't work in isolation, or you count on the fact that they don't.
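A minimal sketch of that "bad guys don't work in isolation" idea, assuming a hypothetical set of known-bad numbers: each new account is scored by how much of its calling neighborhood overlaps with already-flagged nodes, and the highest-scoring accounts would be the ones handed to analysts. The scoring rule here is illustrative, not the one used in the talk.

```python
def suspicion_score(account, neighbors, known_bad):
    """Fraction of an account's calling neighborhood that is already flagged as bad."""
    nbrs = neighbors.get(account, set())
    if not nbrs:
        return 0.0
    return len(nbrs & known_bad) / len(nbrs)

# Hypothetical neighborhoods (who each account exchanges calls with) and flagged numbers.
neighbors = {
    "new-1": {"x", "y", "z"},
    "new-2": {"p", "q"},
}
known_bad = {"x", "z", "q"}

for account in neighbors:
    print(account, suspicion_score(account, neighbors, known_bad))
```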
From page 160...
... Out here, there were six cases that we presented that had about nine bad guys in them. Over 80 percent of those turned out to be bad.
From page 161...
... The second thing I will talk about, and just close on this, is tracking bad guys. You can think about this as account linkage.
From page 162...
... It mimics, I think, a little bit of what goes on in text analysis, some of the scoring methods that are used there. You want to account for the fact that big weights
From page 163...
... associated with edges are good, but if a node is very common, it is not very discriminatory, so you want to down-weight it. The fact that both the newbie and the old guy call Lands' End
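A minimal sketch of that linkage score, borrowing the text-analysis analogy: each account is represented by the set of numbers it calls, each number weighted by an inverse-document-frequency style factor that down-weights destinations almost everyone calls (the Lands' End effect), and a cosine-style overlap compares a new account to an old one. Edge weights are ignored here for brevity, and the exact weighting used in the talk is not given in this excerpt; the data are hypothetical.

```python
import math

def idf_weights(communities):
    """Down-weight numbers that appear in many accounts' calling circles."""
    n = len(communities)
    counts = {}
    for circle in communities.values():
        for number in circle:
            counts[number] = counts.get(number, 0) + 1
    return {number: math.log(n / c) for number, c in counts.items()}

def linkage_score(circle_a, circle_b, idf):
    """Cosine-style overlap between two calling circles under IDF weighting."""
    a = {num: idf[num] for num in circle_a}
    b = {num: idf[num] for num in circle_b}
    dot = sum(a[num] * b[num] for num in a.keys() & b.keys())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical calling circles; "lands-end" stands in for a very common destination.
communities = {
    "old-fraudster": {"lands-end", "555-1", "555-2"},
    "new-account":   {"lands-end", "555-1", "555-2", "555-9"},
    "random-1":      {"lands-end", "555-7"},
    "random-2":      {"lands-end", "555-8"},
}
idf = idf_weights(communities)
print(linkage_score(communities["old-fraudster"], communities["new-account"], idf))
print(linkage_score(communities["old-fraudster"], communities["random-1"], idf))
```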
From page 164...
... Generally speaking, we are getting tens of thousands of new customers on our network a day, and tens of thousands of baddies of different sorts, not paying their bills or whatever, a day. So, the fact that we are able to distill this down to less than a thousand things for our hand

