Exploring Horizons for Domestic Animal Genomics: Workshop Summary

3
Identifying Priorities

Ideally, researchers would like to have the complete genome sequence of every animal of interest. In practice, however, that is not possible. According to the workshop participants, worldwide sequencing capacity is sufficient to sequence one complete mammalian genome every four to eight months, provided that the entire capacity is devoted full-time to a single species. So in theory researchers could sequence the genomes of the major domesticated animals of interest (cattle, pigs, dogs, cats, horses, sheep, chickens) within a few years. In reality, they likely will have to settle for much less. Moreover, a completely sequenced genome typically is preceded by a draft sequence, and draft sequences vary in their completeness and quality.
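
The "within a few years" estimate can be checked with simple arithmetic. The sketch below is only illustrative: it assumes that each of the seven listed species takes the same four to eight months quoted for a mammalian genome and that the genomes are sequenced one after another. Neither assumption comes from the workshop (the chicken, for instance, is not a mammal), and the function and variable names are invented for this example.

```python
# Species listed in the text. Treating each as one "mammalian-sized" project
# is an assumption made only for this illustration.
SPECIES = ["cattle", "pigs", "dogs", "cats", "horses", "sheep", "chickens"]

# Months per genome at full worldwide capacity, as quoted by the participants.
MONTHS_PER_GENOME_LOW = 4
MONTHS_PER_GENOME_HIGH = 8


def total_years(n_genomes: int, months_per_genome: float) -> float:
    """Years needed if the genomes are sequenced one after another."""
    return n_genomes * months_per_genome / 12


low_years = total_years(len(SPECIES), MONTHS_PER_GENOME_LOW)
high_years = total_years(len(SPECIES), MONTHS_PER_GENOME_HIGH)
print(f"{len(SPECIES)} genomes: {low_years:.1f} to {high_years:.1f} years")
```

Under these assumptions the seven genomes would take roughly two and a half to five years of the world's entire sequencing capacity, which is consistent with "a few years" and shows why the scenario is theoretical.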

“The problem is the price tag,” explained Stephen O’Brien. Sequencing a mammalian genome is very expensive; estimated costs range from fifty million to as much as one hundred million dollars. Mark Guyer of the National Institutes of Health (NIH) echoed O’Brien’s point: “We know how to build sequencing capacity these days. It’s not that difficult; it just takes money. The question is, if you want the genomes of domestic animals of agricultural or other importance sequenced, where’s the money going to come from?”

So it is necessary to identify priorities: which domestic animal genomes should be sequenced first, and how well each should be sequenced. Is it always important to sequence the complete genome, for instance, or is it possible with some species to get by with a partial genome sequence, choosing certain parts of the DNA and ignoring others? To obtain the greatest accuracy, it is necessary to repeat the sequencing as many as six or eight times, and each replication adds to the overall cost. The workshop participants were asked to consider how such priorities might be set, taking into account not
just scientific factors but also the practical aspects, such as how likely it is that funding can be secured for sequencing a genome.

FINDING THE BALANCE BETWEEN SCIENTIFIC INTEREST AND PRACTICAL NEEDS

One of the most important things in identifying which genomes to sequence, O’Brien said, is to maintain a balance between the purely scientific interest in various genomes and the practical benefits that can be gained from sequencing them. “If we’re going to get the resources for sequencing, we cannot be so academic and ideal as to ignore the fact that it is taxpayers or pharmaceutical companies who will have to write tens of millions of dollars in checks. We need to have benefits that will pay them back for their investment. So there’s always going to have to be a balance between scientific relevance and things that have a payoff in other ways.”

Medical Relevance

In terms of funding potential, the most important practical criterion is medical relevance, O’Brien noted, since that is what the NIH is most interested in, and it is the NIH that to date has been the major source of funds for genome sequencing. To appeal to the NIH, researchers interested in sequencing the genomes of domestic animals will have to consider which work will address issues of human health.

“The most important aspect, then,” O’Brien said, “is what can we do with a species?” O’Brien noted that the mouse, for example, was selected in part for its versatility for genetic manipulation. Scientists can develop lines or families of mice by inducing mutations that “knock out” genes. (A knockout is the deactivation of a specific gene; knockouts are often created in laboratory organisms such as yeast or mice so that scientists can study the knockout organism as a model for a particular disease.) Mice also can be used to develop stem cells for medical research.

Because of their small size and short generation time, mice are easy to breed and sustain in captivity and are inexpensive to maintain in laboratories (compared to other mammals). These features allow researchers to derive inbred lines for studies of genetic disorders. Mice also are used for drug and vaccine trials. More recently, researchers have been able to develop transgenic versions for even more research and investigation.

“Which of those things can we say about cattle?” asked O’Brien. “Which of those things can we say about the elephant? Which of those things can we say about the other species we nominate?”

“In these terms, the pig genome is a natural choice for sequencing because growth and development in the pig follows a very similar path to
growth and development in a human. It is standard to dissect a fetal pig in high school and college biology courses, for instance, because the organs of the fetal pig are arranged in a way that is anatomically similar to those of a human. There also are good arguments for sequencing the chicken. A great deal of classic embryology has been done on chicken embryos, for example, so that there is a large body of knowledge available for combining with knowledge about the chicken genome,” O’Brien continued.

A second reason for choosing the chicken, O’Brien said, concerns the major histocompatibility complex (MHC), an important family of genes involved in the body’s immune system. “In terms of comparative biology, we’ve learned a lot already about the MHC because the MHC of the bird is a minimal form,” he said. “It’s something like 19 genes compared to 250 in the humans.”

Agricultural Relevance

After medical relevance, a second practical consideration in choosing which genome to sequence is the agricultural value of the animal. “Clearly,” O’Brien noted, “the things we eat are important to humans, and we need to have better knowledge of some of these species, including the cattle, pigs and sheep.”

If one focuses strictly on agricultural value, a different ordering of priorities emerges. In purely economic terms, cattle, pigs, and chickens are the most important species to sequence, followed by horses and sheep. Cats and dogs also must be taken into account because of the amount spent by their owners on keeping them healthy.

Basic Scientific and Evolutionary Considerations

As discussed by O’Brien, evolutionary considerations form the third set of criteria. They have implications both for the task of annotating the human genome, which will have many direct medical benefits, and for a better understanding of how species evolved over tens of millions of years, which, as an issue of basic science, will have more indirect benefits in the future.

One key evolutionary factor to consider in choosing which genomes to sequence is how closely related a species is to other species that have been or will be sequenced. “The evolutionary aspect makes it important to cross a range of vertebrates,” said one of the participants, “because we don’t know which ones are going to be important, which ones are going to be most informative.”

O’Brien showed a diagram of the mammalian family tree, as determined by a comparison of corresponding stretches of DNA among seventy different mammalian species, work done by researchers at the University of California, Riverside. He said, “There are four major mammalian radiations that have happened since the divergence of the placental mammal away from the
marsupials on the order of a hundred million years ago. Thus the primary placental mammals are sorted into four major clades, or groups” (see Box 3-1).

Box 3-1 Phylogenetic Relationships Among Modern Orders of Placental Mammals

Cladistics is a system of arranging taxa by the analysis of primitive and derived characteristics, so that the arrangement will reflect a pattern of descent among the species in question. Cladistics attempts to determine which characteristics of the organisms are specialized, derived ones that truly reflect recent common descent, and it emphasizes such features, called “shared derived characters,” in classification. Following are the four clades with examples of the animals found within them.

Reprinted from O’Brien, S. J., E. Eizirik, and W. J. Murphy. 2001. On Choosing Mammalian Genomes for Sequencing. Science 292:2264–2266.

Presently, even-toed ungulates and cetaceans are being categorized into a fifth clade, Cetartiodactyla, on the premise that they are closely related. See J. G. M. Thewissen, E. M. Williams, L. J. Roe, and S. T. Hussain. 2001. Skeletons of Terrestrial Cetaceans and the Relationship of Whales to Artiodactyls. Nature 413:281; and Kimball’s Biology Pages, available online at www.ultranet.com/~jkimball/BiologyPages/V/Vertebrates.html.

“The first is a group called Afrotheria, which consists of the elephants, the manatees and the elephant shrews. This was an African group of species.”

“The second was a South American group, Xenarthra, which includes the sloths, anteaters and armadillos.”

“The third major clade, Euarchontoglires, includes the rodents, rabbits, primates, tree shrews and flying lemurs.”

“The fourth group has the rest of the species. It contains all your favorite species, such as the whales and the even-toed ungulates, the horses and the carnivores, as well as the primitive tree shrews and the bats. This fourth group, called Laurasiatheria, is a widely dispersed group and includes all the barnyard animals and the carnivores.”

To understand the evolution of mammals, O’Brien said, researchers would like to have the genome of at least one representative from each of the four clades. That has not yet happened. “The three species that have already been nominated for full genome sequencing (human, mouse and rat) are all nested in a single clade,” he noted. “That means that three of the four mammalian major clades are unrepresented entirely.”

That lack, O’Brien argued, is a strong reason for sequencing at least one or two domestic animals as representatives of the fourth clade, such as cattle and perhaps one of the members of the order Carnivora, either a dog or a cat. Evolutionary biologists also would like the sequences of representatives from the first two clades (say, the elephant and the armadillo), but that interest does not help researchers choose among domestic animals, since all of them sit within the fourth clade.

There are other evolutionary considerations that do distinguish among domestic animals, however. “Many species have a slow or conserved rate of evolution,” O’Brien said. That is, the overall structure of their genomes has changed relatively little from that of their distant ancestors. This is true, for instance, of cats and humans. “But there are other species that have a three- or four-fold reorganization relative to the primitive mammalian genotype.” This is true of mice and rats, dogs, and gibbons. “That shuffling of the genome just seems to happen once in a while in a backdrop of very slow genome evolution. Why it happens is an open question, but the point is, some species have a conservative genome and other species have a very shuffled, derived genome that is punctuated by global reorganization.” It would be valuable to sequence genomes from both types of species.

“In addition to that,” he said, “some species have highly derived morphometric (body-proportion) characteristics, such as the shrews. The primitive mammals looked much like today’s insectivores. It looked like a little rodent.” Most other mammals look little like their distant ancestors. In
choosing genomes to sequence, researchers might want to consider how primitive a species’ characteristics are.

Other Criteria

In addition to the criteria of medical relevance, agricultural value, and evolutionary significance, the workshop participants offered a variety of other criteria that could be considered in deciding which genomes to sequence.

“Genome size is an issue,” O’Brien said, “in a sense that it’s a little bit cheaper to do a bat, which is on the order of 1.72 billion base pairs, which is a little bit over half the size of the human genome.”

“It’s important for this information to be useful,” added Ernest Bailey of the University of Kentucky. “You need to have a community of scientists that is prepared to use it. The elephant would be fascinating to do, but I don’t know how many scientists will use that information.”

Daphne Preuss of the University of Chicago suggested that species be chosen based on how easy it would be to use their genomes to trace the causes of various genetic diseases. “In the human genetics community,” she said, “gene discovery has been fueled by isolated populations that have discrete genetic disorders. That’s really been a key to driving gene discovery forward. I think the species chosen should have genetic diversity as well as inbred populations that reveal diseases. Unlike a wild species like the elephant, where the identifiable disease states would be very limited, domesticated animals are really valuable in that way.”

Another consideration, Preuss suggested, should be the value of different genomes in helping researchers to understand gene expression and regulation. “In the human genome project,” she said, “it was surprising to everyone that there were so few genes, and so a lot of people are now focusing on gene regulation. We’ve got to understand these regulatory sequences to understand the array of gene expression. If you go too far away in evolution, you start to lose the ability to compare regulatory sequences. But there is also value in going further away. So, in evolutionary terms, we need some species that are close and some that are farther apart.”

Joachim Messing, director of the Institute of Microbiology at Rutgers University, added that researchers should keep in mind how far along genomics research already has come for various species. “We should think a little bit about entry points,” he said, “that is, with what information is available for a particular genome. Is there already a genetic map? Are there Expressed Sequence Tags (ESTs, stretches of DNA used to identify functional genes)? And so on.” Studies of ESTs, for example, can be done with relative ease at a fairly low cost, and they can provide valuable information when annotating genomic sequences.

There indeed is a difference among domestic animals in how far along genetic mapping and sequencing work has come, said Steven Kappes. For example, the number of ESTs varies from species to species. “Within GenBank (the major repository of genetic sequences in the United States), cattle have the most among farm animals, with 230,000 sequences. This is fifth in GenBank behind humans, mouse, rat, and Drosophila (the fruit fly widely used in genetic research). Pigs are about half of that. Chickens have only 44,000 (that had been discovered in earlier research), but a United Kingdom effort has nearly finished with sequencing 300,000, and those will become public, so that will dramatically increase the numbers for the chicken. On sheep it’s relatively few, and only a little more on horse.” The story is similar for other resources used in genetic mapping and sequencing: cattle, pigs, and chickens are most advanced, while horses and sheep lag behind. Thus mapping the sequences of the first three species could be finished more quickly.

It might not be necessary to do a complete genome for each species, Messing said, and so researchers who are prioritizing genome projects should consider whether to sequence the entire genome for a particular animal. “We should look at the extent of sequence coverage that we want to allocate to a particular project,” he said, “either a complete sequence or to go for targeted regions, which I think also has great value in terms of comparative genomics” (see Box 3-2).

Box 3-2 Targeted Sequencing

Sequencing an entire mammalian genome is very expensive. Thus far, the quality of a draft genome sequence produced by the “whole-genome shotgun approach” has depended on cost. Because basic “1X” coverage of a genome can cost roughly $15 million, and because 6X coverage typically is preferred, a draft sequence alone can cost nearly $100 million, and an additional $90–125 million would be required to improve the draft to a high-quality finished version.

Eric Green of the National Institutes of Health, however, suggested an alternative: targeted sequencing, or sequencing only certain portions of a genome that are of particular interest. “A 1X shotgun sequence of a mammalian genome costs something on the order of $10 to $15 million,” he said. “I think that that is a fairly accurate number. So if you’re thinking about 6X or 8X coverage, you quickly approach $100 million per genome, at least by current technologies. And if they go for $100 million a crack, there is going to be a limited set of organisms that can be subjected to global sequencing.”

“I think it is really important to recognize,” he continued, “that so far the discussion has been a little bit all or none. It’s either you’re going to get your organism up on that list and get it sequenced or else it was going to forever
be lost. I don’t think that’s the case at all. And so I want to tell you there is a great, great value and great future in targeted sequencing efforts.”

Green described work done in his laboratory that involved comparing corresponding stretches of DNA from the mouse and ten other species. The technique demands the use of bacterial artificial chromosomes, or BACs. A BAC is a long stretch of DNA from a human or another organism that is put into a bacterium in the form of an artificial chromosome, so that when the bacterium makes copies of itself, each copy has its own identical BAC, complete with the DNA of interest. In this way, researchers easily can work with these long stretches of DNA, making copies and comparing them with other bits of DNA. If the entire genome is thought of as an encyclopedia containing the information necessary to build an organism, Green said, each BAC corresponds to a single page in the encyclopedia.

Working with these BACs, Green and coworkers were able to do a comparative genome analysis on five different stretches of DNA taken from eleven species. In doing so, they derived a great deal of information about the relationships among the species while needing only small pieces of their genomes, not the entire thing. “Certainly this provides a greater potential for exploring a wider array of genomes,” he said. “You don’t have to invest $100 million to get a little bit of sequence information about a particular region of a particular organism.”

In the case of domestic animals, he suggested that targeted sequencing would make it possible to investigate stretches of DNA that contain genes of interest without sequencing the entire genome. “In these cases you start with some chromosomal region of interest, for example, some region of the livestock genome that may have a quantitative trait locus (QTL) that you’re interested in studying. You’re going to want to isolate and map that in overlapping BAC clones and go through and systematically sequence each of those individual BACs.”

Such a strategy demands that collections of BACs be available for the species under study, but, said Green, that does not appear to be a problem with domestic animals. “The good news here is that both the NIH and the National Science Foundation (NSF) have realized the importance of this, and, as a result, they are now either currently funding or soon will be funding major efforts to generate dozens and dozens of new BAC libraries in the coming years.”

One drawback of targeted sequencing is that it requires deciding beforehand which genes are most important. Hence, the criteria for selecting genes remain subjective. Although some suggestions for criteria were discussed during the workshop, the participants did not discuss a uniform set of standards.

In addition, Kappes noted, it will be necessary to decide how much redundancy is needed in the genome sequence for each animal. (“Fold
redundancy” reflects the average number of times each base pair has been sequenced from independent DNA clones grown in the bacterium Escherichia coli.) Will six- or eight-fold coverage be needed, or will it be possible in some cases to get by on much less? (The number of “folds” [i.e., 6-fold or 6X] refers to the number of bases sequenced relative to the genome size [in base pairs] of an organism. Depending on the number of genes estimated to be in the genome of interest, researchers have to decide how much coverage they need to be statistically secure in their efforts to identify genes.)

“I think 6X hits it about right for eutherian primate non-rodent species. The other species within that clade we do not need to do as much, so we could back off on that. I am a little bit careful in saying that because Claire Fraser’s group has shown very well that you miss a lot of things if you only do a rough draft, so I think the jury is still out. But we obviously have to be realistic in what it costs and how many genomes we can do. I think we should start out with a moderate coverage in some of these species and then back off as we go down the line.” (Lower fold redundancy eventually might be possible because of improved bioinformatics tools and the growing number of comparative data sets from many species. Hence, one might expect that new tools and more data would allow researchers to rely on less sequence data.)

WHICH ANIMAL GENOMES SHOULD BE CONSIDERED FOR SEQUENCING?

The workshop participants were not asked to rank the genomes of domestic animals from most important to least, but they were asked which genomes should be put under consideration. A number of participants expressed the view that cattle, pig, and, perhaps, chicken genomes should be put at the top of the list for a variety of reasons, including agricultural value and helpfulness in understanding the human genome better.
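
Because coverage drives cost, and cost limits how many genomes can make any list, the fold-redundancy arithmetic from Box 3-2 is worth working through. The sketch below is a rough illustration rather than a costing model. It uses the $10 to $15 million per 1X figure quoted by Green, assumes that cost scales linearly with fold coverage (an assumption not stated at the workshop), and uses a hypothetical genome size of 3 billion base pairs. All function and variable names are invented for this example.

```python
# Cost of 1X whole-genome shotgun coverage quoted by Eric Green (US dollars).
COST_PER_1X_LOW = 10_000_000
COST_PER_1X_HIGH = 15_000_000

# Hypothetical genome of 3 billion base pairs (an illustrative round number).
GENOME_SIZE_BP = 3_000_000_000


def fold_coverage(bases_sequenced: float, genome_size_bp: float) -> float:
    """Fold redundancy: bases sequenced relative to the genome size."""
    return bases_sequenced / genome_size_bp


def shotgun_cost_range(fold: float) -> tuple[float, float]:
    """Low and high cost estimates, ASSUMING cost scales linearly with fold."""
    return (fold * COST_PER_1X_LOW, fold * COST_PER_1X_HIGH)


# Sequencing 18 billion bases of a 3-billion-base genome gives 6X coverage.
print(fold_coverage(18_000_000_000, GENOME_SIZE_BP))

for fold in (1, 6, 8):
    low, high = shotgun_cost_range(fold)
    print(f"{fold}X: ${low / 1e6:.0f}M to ${high / 1e6:.0f}M")
```

Under the linear assumption, 6X coverage comes to $60 to $90 million and 8X to $80 to $120 million, in line with Green's remark that such projects "quickly approach $100 million per genome."
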
“As far as farm animals are concerned,” O’Brien said, “it would be hard not to put the cow up on the top of the list with the people that are here at this meeting, but the people interested in the pig genome make a pretty strong argument, too. They have good opinions. Those are the two front-runners that I see.... The second tier would certainly be chickens, sheep, perhaps even horses.”

Max Rothschild of Iowa State University, on the other hand, argued for putting chickens in the top group. “It seems to me that pigs, cattle and chickens are the three highest for domestic species,” he said. “But you can argue on two grounds why the cow ought to be third on that list. It’s only first on the list because it’s better organized, but from a financial standpoint, most of the meat consumed in the world is either poultry or pork. From a health standpoint, chickens and pigs are more important. Certainly, pigs from a
xenotransplantation perspective and certainly chickens from what we could learn about immunology and some other things.”

There also was some disagreement about horses. “I’m a horse guy,” Douglas Antczak of Cornell University’s College of Veterinary Medicine said, “and I can’t justify working on horses in genetics at all. They have only one offspring per year, you can’t superovulate them, they’re very large, they fight, and they kick.”

But Ernest Bailey offered a different perspective: “The horse is economically important in the United States, although not because it produces food. It’s important from a recreational standpoint. Many people regard it as a companion animal, but I think the racing industry is quite large. There’s a lot of money there and a lot of money spent on health and animals.

“Furthermore, the horse is a separate family. It’s a member of a family that has ten different species, all with different chromosome numbers. There’s been a rapid chromosome evolution over a period of about two million years. One of the experiments that will be interesting in the long run is to look at the reasons for the chromosome evolution in the horse. There are ideas that gene duplication is responsible. Do these kinds of gene duplications exist in the different species?”

Assuming that the cattle genome is sequenced, Kappes said, that would take much of the pressure off the need to sequence the sheep genome. “The sheep genome is very similar to the cattle genome. There are only three different changes in chromosomal organization between cattle and sheep and, basically, when we find a gene in sheep on this particular location, we find the similar gene in cattle.”

As for domestic companion animals, the two natural choices are dogs and cats, but there was a difference of opinion as to which genome would be more useful to sequence. “The cat has been a favorite in my laboratory for almost twenty years,” O’Brien said, suggesting that it would be the better choice for a number of reasons. “It’s a model for many human hereditary diseases, such as hemophilia, as well as several infectious diseases, such as leukemia. There’s an acquired immune deficiency syndrome (AIDS) virus in cats, feline immunodeficiency virus (FIV). There’s extensive medical surveillance and literature. Finally, the human genome and the cat genome are both very primitive for their respective orders. That is to say, the ancestor of carnivores looks a lot like a cat, and the ancestor of primates looks a lot like a human.”

An audience member echoed O’Brien’s arguments: “One of my preferences is the cat, and that’s because of it being such a good model for FIV and human immunodeficiency virus (HIV) and because it is an animal we can do drug therapies on.” Vivek Kapur of the University of Minnesota noted, “Cats share a very large number of common pathogens with humans, as do pigs.”

But Antczak offered a counterpoint. “I’d like to speak for the dog over the cat,” he said. “A dog has all the advantages that have already been
mentioned in the cat and, in addition, it has behavioral and morphologic traits of interest, which the cats don’t have.”

Finally, Harris Lewin proposed a couple of longshot candidates for sequencing. “I will put a plug in here for fish, because fish really are an incredibly powerful tool for genetic mapping. You can get down to a half centimorgan (unit for measuring the recombination frequency in DNA) resolution and there are several groups around the world that are interested in the fine mapping (high resolution genome mapping) of traits with fish.”

“I’ll add that the honeybee is an incredibly interesting model for fine mapping, because it has a very high recombination rate. One centimorgan is about 50 kilobases (50,000 nucleotides), which is an incredible tool. It has a very short generation interval as well, which is a powerful tool for fine mapping of quantitative trait loci. I think that we may see a big push to sequence the honeybee genome as well in the next few years.”
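
Lewin's honeybee figure shows why a high recombination rate helps fine mapping: a given genetic interval corresponds to a short physical stretch of DNA. The following sketch converts genetic distance to approximate physical distance using his figure of about 50 kilobases per centimorgan. It assumes a uniform recombination rate across the region, which is a simplification, and the interval sizes are hypothetical values chosen only for illustration. The function and variable names are invented for this example.

```python
# Honeybee figure quoted by Harris Lewin: one centimorgan is about 50 kilobases.
HONEYBEE_KB_PER_CM = 50


def physical_span_kb(interval_cm: float, kb_per_cm: float) -> float:
    """Approximate physical size of a genetic interval, ASSUMING a uniform
    recombination rate across the region."""
    return interval_cm * kb_per_cm


# Hypothetical interval sizes in centimorgans, chosen only for illustration.
for interval_cm in (0.5, 1.0, 2.0):
    span = physical_span_kb(interval_cm, HONEYBEE_KB_PER_CM)
    print(f"{interval_cm} cM is roughly {span:.0f} kb")
```

At this rate, a trait mapped to a 2-centimorgan interval would be confined to roughly 100 kilobases, a region small enough to be covered by a handful of overlapping BAC clones of the kind described in Box 3-2.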