
Currently Skimming:


Pages 115-133

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 115...
... Background I will give you a little bit of background because I think it is important to understand (a) where we are coming from, and (b)
From page 116...
... So, we cannot come and say, "Yes, we would like to help you, but can you change your distribution a little bit? I mean, you know -- if the distribution had been a little bit different, it would have been much easier for us." The same thing is true for us as computer scientists.
From page 117...
... So, sort of one message I want to leave with you, something we have learned over the last 15 years, is that when we build a distributed computing environment, we have to sort of separate it into three main components, one that represents the clients, the consumer. This is not the trivial part of the equation.
From page 118...
... So, if you look at this notion of the grid, there is the user who wants to do some analysis, who wants to create some kind of a histogram or scatterplot or what have you, or ask a question, or generate a million events, and I will show you a little bit about that.
From page 119...
... contributors, in order to make the whole thing work and in order to understand what is in common and what is different.
From page 120...
... So, this is a simplified view of what is going on. These are the phenomena down there that can either be described by the detector or described by a set of models.
From page 121...
... stable state. The other principle that we learned from the physicists is the importance of time.
From page 122...
... the United Kingdom now has invested a huge amount of money in e-science, and they want the resources to be in the United Kingdom. All of a sudden, BaBar has more of a presence in the United Kingdom, not because that is what makes sense, but because that is where suddenly the money is.
From page 123...
... Data/Work Flow
Phenomenon -> data: real-time constraints of instruments, processing capacity, storage capabilities and communication bandwidth for Monte Carlo applications.
Data -> data: multi-stage feature and meta-data extraction, indexing and compression.
Data -> statistics: select, project and aggregate.
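The three stages on this slide form a simple pipeline: an instrument (or a Monte Carlo model) produces raw data, derived data is extracted from it, and statistics are computed by select/project/aggregate steps. A minimal sketch of that flow in Python; the function names, the energy field, and the toy threshold are illustrative assumptions, not anything from the talk:

```python
# Sketch of the data/work flow on the slide:
# phenomenon -> data -> derived data -> statistics.

def phenomenon_to_data(energies):
    # Instrument (or Monte Carlo production) emits a raw record stream.
    return [{"id": i, "energy": e} for i, e in enumerate(energies)]

def data_to_data(records):
    # Multi-stage feature / meta-data extraction (here: one toy feature).
    return [{"id": r["id"], "high": r["energy"] > 50} for r in records]

def data_to_statistics(features):
    # Select, project and aggregate.
    selected = [f for f in features if f["high"]]   # select
    ids = [f["id"] for f in selected]               # project
    return len(ids)                                 # aggregate

raw = phenomenon_to_data([10, 60, 75, 20])
stats = data_to_statistics(data_to_data(raw))
print(stats)  # 2 events above the toy threshold
```

Each arrow on the slide becomes one function boundary, which is what makes the constraints (real-time ingest, extraction cost, aggregation) attachable to a specific stage.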
From page 124...
... Whether it is a telescope or a detector, we have to make sure that this data goes in and we cannot lose anything, because this is real data. We also have to deal with the production of data coming from the Monte Carlo production, which is, again, a stream of data coming in.
From page 125...
... Let's assume I have an x. I want to apply to it an F
From page 126...
... Management
· Who gets parking space, where, when and for how long
· Co-allocation of compute, storage and data transfer resources
· Just-in-time delivery of input data
· Timely removal of output data
· "Local" storage capabilities
· Easy to manage
· Cost effective
From page 127...
... Uniform framework for defining and managing processing and data placement jobs (www.cs.wisc.edu/condor). So, I already talked a little bit about it, because I want to move faster to the examples to show you that we can actually do something with all that, but the approach that we have been taking is, first of all, to make data placement a first-class citizen. That means that when you write an application, when you design a system, make sure that getting space, moving the data, releasing it, is a clear action that is visible from the outside, rather than buried in a script that nobody knows about and, if it fails, it really doesn't help us much.
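The contrast being drawn is between a transfer hidden inside a job's script, whose failure is invisible, and a placement step that is its own named job whose outcome the system can observe and react to. A minimal sketch of the second style, under assumed names (`run_step`, `workflow`, and the step labels are illustrative, not the Condor API):

```python
# Sketch: data placement as first-class, externally visible steps.
# Each placement action (get space, stage data in/out, release space)
# is a named step whose success or failure is recorded, instead of
# being buried inside the compute job's own script.

def run_step(log, name, action):
    try:
        action()
        log.append((name, "ok"))
        return True
    except Exception:
        log.append((name, "failed"))
        return False

def workflow(stage, compute):
    log = []
    if run_step(log, "allocate-space", lambda: None) and \
       run_step(log, "stage-in", stage) and \
       run_step(log, "run", compute):
        run_step(log, "stage-out", stage)
    # Release happens whether or not the compute step succeeded.
    run_step(log, "release-space", lambda: None)
    return log

log = workflow(stage=lambda: None, compute=lambda: None)
print([name for name, status in log])
# ['allocate-space', 'stage-in', 'run', 'stage-out', 'release-space']
```

Because every placement action appears in the log with its own status, a failed transfer is a visible, retryable event rather than a silently broken script.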
From page 128...
... This is sort of the high-level architecture of what we have deployed out there. So, the application is generating a high-level description of what has to be done.
From page 129...
... www.cs.wisc.edu/condor MOP US-CMS Test Bed.
From page 130...
... MOP Job Stages
Stage-in: get the program and its data to a remote site
Run: run the job at the remote site
Stage-back: get the program logs back from the remote site
Publish: advertise the results so they will be sent to sites that want them
Clean-up: clean up the remote site
Basically, each of these jobs is a DAG like this, and then we move them all to larger DAGs that include some controls before and after, and that is the way it works.
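The five stages above form a linear DAG: each stage depends on the previous one, and a DAG manager runs them in dependency order. A small sketch of that ordering in Python; this is a simplified stand-in for what DAGMan does, not its actual implementation:

```python
# Sketch: the five MOP stages as a DAG, executed in dependency order.
# Simplified stand-in for a DAG manager such as DAGMan.

STAGES = {
    "stage-in":   [],              # get program and data to the site
    "run":        ["stage-in"],    # run the job at the remote site
    "stage-back": ["run"],         # get the logs back
    "publish":    ["stage-back"],  # advertise the results
    "clean-up":   ["publish"],     # clean up the remote site
}

def topo_order(dag):
    # Repeatedly pick any node whose dependencies are all done.
    done, order = set(), []
    while len(order) < len(dag):
        for node, deps in dag.items():
            if node not in done and all(d in done for d in deps):
                done.add(node)
                order.append(node)
    return order

print(topo_order(STAGES))
# ['stage-in', 'run', 'stage-back', 'publish', 'clean-up']
```

Expressing the stages as dependencies rather than a fixed script is what lets these per-job DAGs be composed into the larger DAGs the speaker mentions, with extra control nodes spliced in before and after.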
From page 131...
... 131 It Works!
From page 132...
... Cluster-finding Data Pipeline (Argonne National Laboratory). Small SDSS Cluster-Finding DAG (Argonne National Laboratory). So, this is what we have to do there.
From page 133...
... Size distribution of Sloan Digital Sky Survey analysis: galaxy cluster size distribution. Chimera Virtual Data System + GriPhyN Virtual Data Toolkit + iVDGL Data Grid (many CPUs)


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.