Pages 138-164

From page 138...
... Otherwise, I think what we will see this morning, this afternoon, and tomorrow is that there are many different shapes and forms for this large data problem and large data streams. So, what you are going to hear is something completely different from what you heard this morning.
From page 139...
... I will give you a brief introduction to the sort of network data I want to talk about. Then I want to go into a little bit of a spiel where I want to show you some network data.
From page 141...
... Graphs are a convenient representation of the data, and in the applications that I am going to be talking about, we are talking about call graphs. So, for every node in the network, the node identifier is a phone number, and the edges we are talking about are phone calls or aggregate summaries of phone calls between telephone numbers.
From page 142...
... So, this gives you an idea, a little bit, of how many transactors we are seeing on the network, how many transactions, and also gives you a little hint of the sparsity of the graph. If you have a graph that has n nodes, there are roughly n² possible edges.
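To make the call-graph representation and the sparsity point concrete, here is a minimal sketch, assuming a hypothetical list of call records (caller, callee, minutes); it builds an adjacency structure of the kind described above and compares the number of observed edges to the roughly n² possible directed edges.

```python
from collections import defaultdict

# Hypothetical call records: (caller, callee, minutes).
# In the talk, edges are phone calls or aggregate summaries between numbers.
calls = [
    ("555-0101", "555-0202", 12.0),
    ("555-0101", "555-0303", 3.5),
    ("555-0202", "555-0303", 7.0),
]

# Adjacency map: node (phone number) -> {neighbor: total minutes}.
graph = defaultdict(lambda: defaultdict(float))
for caller, callee, minutes in calls:
    graph[caller][callee] += minutes

nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
n = len(nodes)
edges = sum(len(nbrs) for nbrs in graph.values())

# Sparsity: observed edges versus the roughly n^2 possible directed edges.
print(f"{n} nodes, {edges} edges, density = {edges / (n * n):.3f}")
```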
From page 144...
... This is a different way to view the data. This talk is about the lifetime of the edges.
From page 145...
... AUDIENCE: [Question off microphone.]
From page 146...
... to capture the 90th percentile. So, 90 percent of the nodes on this graph have an in-degree of eight or fewer edges, and
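As a concrete illustration of that summary statistic, here is a minimal sketch, assuming a hypothetical directed edge list: it tallies in-degrees and reads off a nearest-rank 90th percentile of the in-degree distribution.

```python
from collections import Counter

# Hypothetical directed edge list (caller -> callee).
edges = [("a", "b"), ("c", "b"), ("d", "b"), ("a", "c"), ("d", "c"), ("a", "d")]

# In-degree: number of callers per callee.
in_degree = Counter(callee for _, callee in edges)

# Include nodes that never receive a call (in-degree 0).
nodes = {u for u, _ in edges} | {v for _, v in edges}
degrees = sorted(in_degree.get(v, 0) for v in nodes)

# Nearest-rank 90th percentile of the in-degree distribution.
p90 = degrees[min(len(degrees) - 1, int(0.9 * len(degrees)))]
print(f"90th percentile of in-degree: {p90}")
```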
From page 147...
... legitimate choices for legitimate purposes. Again, you can just define the graph to be the network traffic you see over some period, say, today.
From page 148...
... I just show you some values of θ that we use for different applications. If we set θ to about 0.85 and you follow that curve down, for our applications, if we do daily processing, that means an hour-long phone call will last, in our analysis, for roughly a month.
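A minimal sketch of how such a decay parameter behaves, assuming the edge weight is aged daily as weight ← θ·weight + (1 − θ)·(today's minutes); that update rule and the one-hour example are assumptions for illustration, since the excerpt does not spell out the exact formula. It simply prints how much residual weight a single one-hour call leaves behind after various horizons.

```python
theta = 0.85          # daily aging parameter mentioned in the talk
call_minutes = 60.0   # a single one-hour call

# Assumed daily update: weight <- theta * weight + (1 - theta) * (today's minutes).
# A one-off call contributes (1 - theta) * call_minutes on the day it happens,
# and that contribution is multiplied by theta on each subsequent day.
initial = (1 - theta) * call_minutes
for days in (1, 7, 14, 30):
    print(f"after {days:2d} days: {initial * theta ** days:6.3f} minutes of residual weight")
```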
From page 149...
... By building this redundant structure, it is very easy to go from this down to this. So, it is very easy to traverse this data structure to build subgraphs of arbitrary depth, and literally within a second.
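A minimal sketch of that kind of traversal, assuming a redundant structure that stores both outgoing and incoming neighbors for every node (so expansion in either direction is a lookup rather than a scan); the function below grows a subgraph out to an arbitrary depth around a seed number. The data and function names are hypothetical.

```python
from collections import defaultdict, deque

def build_index(edges):
    """Store neighbors in both directions so expansion never needs a full scan."""
    out_nbrs, in_nbrs = defaultdict(set), defaultdict(set)
    for u, v in edges:
        out_nbrs[u].add(v)
        in_nbrs[v].add(u)
    return out_nbrs, in_nbrs

def subgraph(seed, depth, out_nbrs, in_nbrs):
    """Breadth-first expansion around `seed`, following edges in both directions."""
    seen, frontier = {seed}, deque([(seed, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == depth:
            continue
        for nbr in out_nbrs[node] | in_nbrs[node]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, d + 1))
    return seen

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "a")]
out_nbrs, in_nbrs = build_index(edges)
print(subgraph("a", 2, out_nbrs, in_nbrs))   # nodes within two hops of "a"
```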
From page 150...
... for that is sort of asymmetry. Amy might be very popular.
From page 151...
... So, the weights typically would be associated with some characteristic of the transaction. So, the weight might be the dollar value of a transaction, the length of a phone call, or the number of bytes of the connection between this IP address and that IP address.
From page 152...
... So, the final thing I will talk about is something that we are going to do day in and day out. As the data streams in, we are going to construct a network topology of the data streaming in and maintain that.
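A minimal sketch of that day-in, day-out maintenance, assuming the same exponential aging as above with call minutes as the edge weight: each day, every existing edge weight is decayed by θ, the day's aggregated traffic is blended in with weight (1 − θ), new edges are created as they appear, and edges that fall below a small assumed cutoff are pruned so the structure stays bounded. The update rule and cutoff are assumptions, not details given in the excerpt.

```python
from collections import defaultdict

THETA = 0.85      # daily aging parameter
CUTOFF = 0.01     # assumed pruning threshold, not specified in the talk

def update_topology(graph, todays_calls):
    """graph: {(caller, callee): weight}.  todays_calls: {(caller, callee): minutes}."""
    updated = defaultdict(float)
    # Age every existing edge.
    for edge, weight in graph.items():
        updated[edge] = THETA * weight
    # Blend in today's aggregated traffic; unseen edges are created here.
    for edge, minutes in todays_calls.items():
        updated[edge] += (1 - THETA) * minutes
    # Prune edges that have decayed to (near) nothing.
    return {e: w for e, w in updated.items() if w >= CUTOFF}

graph = {}
graph = update_topology(graph, {("555-0101", "555-0202"): 60.0})
graph = update_topology(graph, {})   # a quiet day: the edge ages but survives
print(graph)
```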
From page 153...
... from each of our numbers. They won't see me calling my brother, because we live in the same town and that is a local call.
From page 154...
... I may add some edges in.
From page 155...
... The way we account for the fact that we are missing data is to put some priors on some of these parameters, and you can decide which ones you want priors on. Then, once you have those, you can use an EM-type algorithm to estimate the parameters, and this is one of the things we are experimenting with.
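The excerpt does not spell out the speaker's model, so as a generic illustration of what an EM-type iteration looks like, here is a textbook two-component Gaussian mixture fit; the component labels play the role of the unobserved quantities, standing in for whatever is missing in the speaker's setting. This is an illustrative sketch, not the method used in the talk.

```python
import math
import random

random.seed(0)
data = [random.gauss(0, 1) for _ in range(200)] + [random.gauss(5, 1) for _ in range(200)]

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Initial guesses for the mixing weight, means, and standard deviations.
pi, mu, sigma = 0.5, [min(data), max(data)], [1.0, 1.0]

for _ in range(50):
    # E-step: posterior responsibility of component 0 for each point (the "missing" labels).
    resp = []
    for x in data:
        p0 = pi * normal_pdf(x, mu[0], sigma[0])
        p1 = (1 - pi) * normal_pdf(x, mu[1], sigma[1])
        resp.append(p0 / (p0 + p1))
    # M-step: re-estimate the parameters from the expected labels.
    n0 = sum(resp)
    n1 = len(data) - n0
    pi = n0 / len(data)
    mu[0] = sum(r * x for r, x in zip(resp, data)) / n0
    mu[1] = sum((1 - r) * x for r, x in zip(resp, data)) / n1
    sigma[0] = math.sqrt(sum(r * (x - mu[0]) ** 2 for r, x in zip(resp, data)) / n0) or 1e-6
    sigma[1] = math.sqrt(sum((1 - r) * (x - mu[1]) ** 2 for r, x in zip(resp, data)) / n1) or 1e-6

print(f"mixing weight ~ {pi:.2f}, means ~ {mu[0]:.2f} and {mu[1]:.2f}")
```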
From page 156...
... my observed data.
From page 158...
... Okay, so without further ado, let's talk about the applications. The primary use of these tools is in fraud
From page 159...
... So, part of the plan we have is, you know, bad guys don't work in isolation, or you count on the fact that they don't.
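A minimal sketch of that "bad guys don't work in isolation" idea, assuming a hypothetical set of known-bad numbers: each new account is scored by how much of its calling neighborhood overlaps with already-flagged nodes, and the highest-scoring accounts would be the ones handed to analysts. The scoring rule here is illustrative, not the one used in the talk.

```python
def suspicion_score(account, neighbors, known_bad):
    """Fraction of an account's calling neighborhood that is already flagged as bad."""
    nbrs = neighbors.get(account, set())
    if not nbrs:
        return 0.0
    return len(nbrs & known_bad) / len(nbrs)

# Hypothetical neighborhoods (who each account exchanges calls with) and flagged numbers.
neighbors = {
    "new-1": {"x", "y", "z"},
    "new-2": {"p", "q"},
}
known_bad = {"x", "z", "q"}

for account in neighbors:
    print(account, suspicion_score(account, neighbors, known_bad))
```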
From page 160...
... Out here, there were six cases that we presented that had about nine bad guys in them. Over 80 percent of those turned out to be bad.
From page 161...
... The second thing I will talk about, and just close on this, is tracking bad guys. You can think about this as account linkage.
From page 162...
... It mimics, I think, a little bit of what goes on in text analysis, some of the scoring methods that are used there. You want to account for the fact that big weights
From page 163...
... associated with edges are good, but if a node is very common, it is not very discriminatory, so you want to down-weight it. The fact that both the newbie and the old guy call Lands' End
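A minimal sketch of that linkage score, borrowing the text-analysis analogy: each account is represented by the set of numbers it calls, each number weighted by an inverse-document-frequency style factor that down-weights destinations almost everyone calls (the Lands' End effect), and a cosine-style overlap compares a new account to an old one. Edge weights are ignored here for brevity, and the exact weighting used in the talk is not given in this excerpt; the data are hypothetical.

```python
import math

def idf_weights(communities):
    """Down-weight numbers that appear in many accounts' calling circles."""
    n = len(communities)
    counts = {}
    for circle in communities.values():
        for number in circle:
            counts[number] = counts.get(number, 0) + 1
    return {number: math.log(n / c) for number, c in counts.items()}

def linkage_score(circle_a, circle_b, idf):
    """Cosine-style overlap between two calling circles under IDF weighting."""
    a = {num: idf[num] for num in circle_a}
    b = {num: idf[num] for num in circle_b}
    dot = sum(a[num] * b[num] for num in a.keys() & b.keys())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical calling circles; "lands-end" stands in for a very common destination.
communities = {
    "old-fraudster": {"lands-end", "555-1", "555-2"},
    "new-account":   {"lands-end", "555-1", "555-2", "555-9"},
    "random-1":      {"lands-end", "555-7"},
    "random-2":      {"lands-end", "555-8"},
}
idf = idf_weights(communities)
print(linkage_score(communities["old-fraudster"], communities["new-account"], idf))
print(linkage_score(communities["old-fraudster"], communities["random-1"], idf))
```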
From page 164...
... Generally speaking, we are getting tens of thousands of new customers on our network a day, and tens of thousands of baddies of different sorts, not paying their bills or whatever, a day. So, the fact that we are able to distill this down to less than a thousand things for our hand

