9
Bioinformatics

Although possession of the viral genome sequence does not yield total knowledge about the integrated biology, virulence, or transmissibility of a virus, much can be learned from a variety of studies based on these sequences. An important question is the variability of genetic information (DNA sequence) among different isolates of the virus obtained from patients in different geographic locations, although available collections are not well-ordered sets of variola virus strains. Multiple clones or PCR products should be sequenced to assess diversity. Significant differences are known to exist between the sequences of variola major and minor, but the extent of the variation and the importance of the identified differences for virulence in each of the two subspecies or even within individual isolates have not been determined [36]. The extent and consistency of sequence variability might provide essential clues to the pathogenesis, virulence, and evolution of the virus and the nature of the infection. For example, a recent outbreak of monkeypox exhibited somewhat greater human-to-human transmission than had been the case in the past. However, preliminary results from sequencing of DNA fragments of monkeypox virus isolates obtained at various times since 1970 suggest that the virus has changed very little over this period. Thus at present, there is no clear evidence that the rate of human-to-human transmission of monkeypox is likely to increase [37]. DNA sequence information from a characteristic set of variola virus isolates could enhance our capability to assess whether human monkeypox is evolving transmission characteristics similar to those of smallpox. Variola virus stocks need to be retained until a sufficient number have been cloned, or PCR amplifications have been obtained and analyzed.








Variability of Variola Virus

While the complete genomic sequences of a few variola virus isolates are available, the overall scope of such information remains limited. As noted in Chapter 5, the complete genome DNA of variola major virus Bangladesh-1975 (GenBank L22579) has been sequenced from clones with about sixfold redundancy. The variola major virus India-1967 (GenBank X69198) genome, except for a small region at each DNA terminal, and the variola minor alastrim virus Brazil-1966 genome (EMBL Y167080) also have been sequenced, with about twofold redundancy. The samples in the CDC and VECTOR repositories do not, however, represent a complete archive of characterized strains from the different outbreaks in recent history.*

Although the sequences of the above strains are not entirely identical, they are nearly so. Direct sequence comparison of the Bangladesh-1975 and India-1967 strains shows that the viruses are 99.2 percent identical throughout the entire genome (see Figure 9-1) [6, 36, 38]. While in one sense this finding argues for relatively little variability, that conclusion should be tempered by the following considerations. Most of the differences are clustered in the terminal regions of the viral genome. Those regions contain genes that frequently are not essential for viral replication, yet typically are associated with pathogenesis, interact with the immune system, and affect virulence and host range. While only 18 of 200 proteins in the entire genome differ significantly between the Bangladesh and India strains, 7 of 30 open reading frames at the left terminus and 8 of 22 open reading frames near the right terminus show variation between the two viruses [36]. It must be remembered that a very minor change (a single base addition or deletion, or a single amino acid coded by a gene) can lead to profound effects in the corresponding proteins that determine variations in virulence.
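The clustering of differences in the terminal regions can be made concrete by converting the counts quoted above into percentages. The short Python sketch below does only that arithmetic. The region labels and the helper name `percent_variant` are introduced here for illustration, and the sketch assumes that "differ significantly" and "show variation" can be compared directly, which the text does not state.

```python
# Counts quoted in the text for Bangladesh-1975 vs. India-1967 [36],
# as (differing ORFs or proteins, total examined).
# The region labels are shorthand introduced for this illustration.
variant_orfs = {
    "whole genome": (18, 200),
    "left terminus": (7, 30),
    "right terminus": (8, 22),
}


def percent_variant(differing, total):
    """Return the percentage of ORFs that differ, rounded to one decimal."""
    return round(100.0 * differing / total, 1)


percentages = {
    region: percent_variant(differing, total)
    for region, (differing, total) in variant_orfs.items()
}

for region, pct in percentages.items():
    print(f"{region}: {pct}% differ")
```

On these figures, roughly one ORF in eleven differs across the whole genome, compared with roughly one in four at the left terminus and more than one in three near the right terminus.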
Moreover, available sequence data have been derived from plaque-purified isolates whose DNA was cloned into plasmids, and there are sparse or no data on heterogeneity within individual isolates, the effect of cloning in bacteria, or the heterogeneity in strains other than those discussed above. The issue of heterogeneity can be addressed using different strategies, such as multiple plaque-purified clones from the same isolate, or a complete catalogue of sequences from the left and right terminal regions of the genomes from strains with quite different clinical histories or epidemiological descriptions. Limited studies have shown that long-distance PCR and RFLP analysis of specific amplifications occasionally do not produce the restriction pattern predicted from the published sequence obtained from cloned DNA fragments [26, 39–41]. Specifically, within one PCR-amplified fragment where, say, four restriction sites with a given enzyme would have been predicted, only three are found, with corresponding changes in fragment size. This discrepancy may be the result of poor long-distance PCR copy fidelity, unappreciated heterogeneity within the DNA, or sequencing errors introduced because clones were used to determine the sequence. The uncertainty inherent in this analytic method and cloning in plasmids reinforces the importance of examining a large set of isolates to develop consensus on the DNA molecular preparation. Sequencing provides only a snapshot of the virus genome, which actually exists in nature as a molecular array.

* Joseph J. Esposito, personal communication, December 1998.

Figure 9-1 This figure illustrates the degree of conservation and variation between the Bangladesh (BSH) and India (IND) strains and between vaccinia and BSH by considering, on the Hind III genome map, the percent amino acid identity of predicted proteins. The open reading frames (ORF) were divided into 10 groups of 20 ORF across the genome, and the percentage of BSH and IND ORF that encode proteins of >99 percent amino acid sequence identity was determined. The analysis showed that the most varied proteins arise from the DNA terminal regions. Interestingly, the group of ORF A14L-A32L encodes proteins with greater amino acid diversity than does the group A33L-A49R, which is more proximal to the right end of the BSH and IND Hind III map. Also shown is a histogram that illustrates the striking difference in the amino acid sequence of proteins of vaccinia and BSH. SOURCE: Shchelkunov et al., Comparison of the genome DNA sequences of Bangladesh-1975 and India-1967 variola viruses. Virus Research 36:107–118, 1995. Copyright 1995, Elsevier Science; reprinted with permission.
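The restriction-pattern discrepancy described in the text (four sites predicted, three observed) can be illustrated with a small in silico digest. The Python sketch below is a toy under stated assumptions: the sequence is synthetic filler, not viral sequence, the helper names are hypothetical, and Hind III (recognition site AAGCTT, cutting after the first A) is used only because the figure refers to a Hind III map. It shows how a single base change that destroys one site merges two predicted fragments into one larger fragment.

```python
SITE = "AAGCTT"   # Hind III recognition sequence
CUT_OFFSET = 1    # Hind III cuts after the first A: A^AGCTT


def find_sites(sequence, site):
    """Return the 0-based start position of every occurrence of site."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def fragment_lengths(sequence, site, cut_offset):
    """Fragment lengths from a complete digest of a linear molecule."""
    cuts = [pos + cut_offset for pos in find_sites(sequence, site)]
    bounds = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Synthetic "published" amplicon: GC filler of assumed lengths joined by
# four Hind III sites. This is illustrative filler, not real sequence data.
spacer_lengths = [20, 30, 40, 10, 24]
predicted = SITE.join("GC" * (n // 2) for n in spacer_lengths)

# Hypothetical "observed" amplicon: the last base of the third site is
# changed (AAGCTT -> AAGCTC), so the enzyme no longer cuts there.
third_site = find_sites(predicted, SITE)[2]
observed = predicted[:third_site + 5] + "C" + predicted[third_site + 6:]

predicted_fragments = fragment_lengths(predicted, SITE, CUT_OFFSET)
observed_fragments = fragment_lengths(observed, SITE, CUT_OFFSET)

print("predicted:", predicted_fragments)
print("observed: ", observed_fragments)
```

The digest alone cannot say whether the missing site reflects PCR error, heterogeneity in the DNA, or an error in the published sequence; it only shows that the fragment sizes change in the way the text describes.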

Potential Developments

Information about the variations discussed above could be critical for the development of diagnostic reagents, subunit vaccines, and therapeutic drugs. It might also shed light on the mechanisms of viral virulence and host tropism. Much could be deduced about the degree of diversity and variation by sequencing the DNA from a comprehensive set of viral isolates. Long-distance, high-fidelity PCR would ideally be employed, with staggered initiation sites to produce overlapping sequences of sufficient redundancy to ensure the ability to determine the entire sequences with adequate accuracy. Such cloning and evaluation of sequence variability would necessarily have to be done before viral stocks were destroyed. Once sufficient DNA plasmid clones had been obtained and sequenced or analyzed by genome PCR or RFLP, live virus would no longer be required for most of the currently available informatics methods.

The open reading frames (ORF) of the smallpox virus genome have been identified in the genomes already sequenced. New sequences would help to identify naturally occurring variations and verify the validity of sequences already determined. Particular putative protein products could be studied from the perspective of the degree to which there was amino acid sequence conservation (or variation). Putative genes could be compared with equivalent ORF in other orthopoxviruses, as well as with ORF encoding similar proteins from the cell and in the genomes of other types of viruses. In particular, the ORF of the smallpox virus genome that are thought to be associated with virulence could be compared with those of other orthopoxviruses, including monkeypox virus, possibly leading to the identification of genes important for determining the pathogenicity, virulence, transmissibility, and human tropism of variola virus.
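The idea of staggered initiation sites producing overlapping sequences can be sketched as a simple tiling calculation. The Python example below is a minimal illustration, not a primer-design method: the genome length, amplicon length, and step are assumed round numbers chosen for the example, and the function names are hypothetical. It lays out amplicons whose start sites are offset by half an amplicon length and reports how many amplicons cover a given position.

```python
def tile_amplicons(genome_length, amplicon_length, step):
    """Return half-open (start, end) coordinates of staggered amplicons
    that together cover a linear genome from end to end."""
    if amplicon_length <= 0 or step <= 0 or step > amplicon_length:
        raise ValueError("need 0 < step <= amplicon_length")
    amplicons = []
    start = 0
    while True:
        end = min(start + amplicon_length, genome_length)
        amplicons.append((start, end))
        if end == genome_length:
            break
        start += step
    return amplicons


def depth_at(amplicons, position):
    """Count how many amplicons cover a given 0-based position."""
    return sum(1 for start, end in amplicons if start <= position < end)


# Assumed values for illustration only; none are taken from the text.
GENOME_LENGTH = 100_000
AMPLICON_LENGTH = 10_000
STEP = 5_000  # start sites staggered by half an amplicon

amplicons = tile_amplicons(GENOME_LENGTH, AMPLICON_LENGTH, STEP)

print("amplicons needed:", len(amplicons))
print("depth at left end:", depth_at(amplicons, 0))
print("depth mid-genome:", depth_at(amplicons, GENOME_LENGTH // 2))
print("depth at right end:", depth_at(amplicons, GENOME_LENGTH - 1))
```

In this toy layout the interior is covered twice but each end is covered only once, so a real design would likely need extra amplicons near the termini, which are the regions where the text reports most of the variation.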
It has been reported that transfection of the intact vaccinia virus genome as DNA into cells infected with fowlpox virus leads to production of vaccinia virus [42]. Thus far, however, neither virion DNA fragments (representing the entire viral sequence) nor viral DNA from a plasmid has been successful in regenerating infectious vaccinia. This failure could be due to technical problems that might be solved as the technology improves. If so, it also should be possible to recreate live smallpox viruses from the DNA clones of overlapping fragments. This capability would have profound implications for issues of security, safety, and ethics surrounding the proposed destruction of live variola virus stocks. The ability to reconstitute viruses from DNA clones would make it possible to engineer a variety of attenuated viruses that would constitute an effective form of biological containment to help ensure safety while working with live virus. Infectious poxviruses have not yet been created from assembled plasmids or synthetic DNA fragments, but there is no technical impediment to the eventual establishment of this ability. It is entirely possible that future advances in gene synthesis and transfection technologies would enable synthesis of variola virus from the published sequence information. There is no way of predicting the rate at which such technologies might develop.
