3
Genetics and Health

Although there are many possible causes of human disease, family history is often one of the strongest risk factors for common disease complexes such as cancer, cardiovascular disease (CVD), diabetes, autoimmune disorders, and psychiatric illnesses. A person inherits a complete set of genes from each parent, as well as a vast array of cultural and socioeconomic experiences from his/her family. Family history is thought to be a good predictor of an individual’s disease risk because family members most closely represent the unique genomic and environmental interactions that an individual experiences (Kardia et al., 2003). Inherited genetic variation within families clearly contributes both directly and indirectly to the pathogenesis of disease. This chapter focuses on what is known or theorized about the direct link between genes and health and what still must be explored in order to understand the environmental interactions and relative roles among genes that contribute to health and illness.

GENETIC SUSCEPTIBILITY

For more than 100 years, human geneticists have been studying how variations in genes contribute to variations in disease risk. These studies have taken two approaches. The first approach focuses on identifying the individual genes with variations that give rise to simple Mendelian patterns of disease inheritance (e.g., autosomal dominant, autosomal recessive, and X-linked) (see Table 3-1; Mendelian Inheritance in Man). The second approach seeks to understand the genetic susceptibility to disease as the consequence of the joint effects of many genes. Each of these approaches will be discussed below.

TABLE 3-1 Online Mendelian Inheritance in Man (OMIM) Statistics (as of May 15, 2006), Number of Entries

Category                                                  Autosomal  X-Linked  Y-Linked  Mitochondrial   Total
Gene with known sequence                                     10,215       472        48             37  10,772
Gene with known sequence and phenotype                          349        31         0              0     380
Phenotype description, molecular basis known                  1,710       153         2             26   1,891
Mendelian phenotype or locus, molecular basis unknown         1,384       134         4              0   1,522
Other, mainly phenotypes with suspected Mendelian basis       2,065       145         2              0   2,212
Total                                                        15,723       935        56             63  16,777

SOURCE: OMIM, www.ncbi.nlm.nih.gov/Omim/mimstats.html, accessed May 15, 2006.

In general, diseases with simple Mendelian patterns of inheritance tend to be uncommon or rare, with early ages of onset; examples include phenylketonuria, sickle cell anemia, Tay-Sachs disease, and cystic fibrosis. In addition, some of these genes have been associated with extreme forms of common diseases, such as familial hypercholesterolemia, which is caused by mutations in the low-density lipoprotein (LDL) receptor that predispose individuals to early onset of heart disease (Brown and Goldstein, 1981). Another example of Mendelian inheritance is familial forms of breast cancer associated with mutations in the BRCA1 and BRCA2 genes that predispose women to early-onset breast cancer and often ovarian cancer. The genes identified have mutations that often are highly penetrant; that is, the probability of developing the disease in someone carrying the disease susceptibility genotype is relatively high (greater than 50 percent).

These genetic diseases often exhibit a genetic phenomenon known as allelic heterogeneity, in which multiple mutations within the same gene (i.e., alleles) are found to be associated with the same disease. This allelic heterogeneity often is population specific and can represent the unique demographic and mutational history of the population.

In some cases, genetic diseases also are associated with locus heterogeneity, meaning that a deleterious mutation in any one of several genes can give rise to an increased risk of the disease. This is a finding common to many human diseases, including Alzheimer’s disease and polycystic kidney disease. Both allelic heterogeneity and locus heterogeneity are sources of variation in these disease phenotypes, since they can have varying effects on disease initiation, progression, and clinical severity.

Environmental factors also vary across individuals, and the combined effect of environmental and genetic heterogeneity is etiologic heterogeneity. Etiologic heterogeneity refers to a phenomenon that occurs in the general population when multiple groups of disease cases, such as breast cancer clusters, exhibit similar clinical features but are in fact the result of differing events or exposures. The discovery and examination of disease cases demonstrating etiologic heterogeneity facilitates insight into the etiology of specific diseases as well as the identification of possible causative agents. The results of these studies may also highlight possible gene-gene interactions and gene-environment interactions important in the disease process. Identifying etiologic heterogeneity can be an important step toward analysis of diseases using molecular epidemiology techniques and may eventually lead to improved disease prevention strategies (Rebbeck et al., 1997).
As opposed to the Mendelian approach, the second approach to studying how variations in genes contribute to variations in disease risk focuses on understanding the genetic susceptibility to diseases as the consequence of the joint effects of many genes, each with small to moderate effects (i.e., polygenic models of disease) and often interacting among themselves and with the environment to give rise to the distribution of disease risk seen in a population (i.e., multifactorial models of disease). This approach has been used primarily for understanding the genetics of birth defects and common diseases and their risk factors. As described below, several steps are involved in developing such an understanding.

As a first step, study participants are asked to provide a detailed family history to assess the presence of familial aggregation. If individuals with the disease in question have more relatives affected by the disease than individuals without the disease, familial aggregation is identified. While familial aggregation may be accounted for through genetic etiology, it may also represent an exposure (e.g., pesticides, contaminated drinking water, or diet) common to all family members due to the likelihood of a shared environment.

When there is evidence of familial aggregation, the second step is to focus research studies on estimating the heritability of the disease and/or its risk factors. Heritability is defined as the proportion of variation in disease risk in a population that is attributable to unmeasured genetic variations inferred through familial patterns of disease. It is a broad population-based measure of genetic influence that is used to determine whether further genetic studies are warranted, since it allows investigators to test the overarching null hypothesis that no genes are involved in determining disease risk.

Twin studies and family studies are frequently used in the study of heritability. Twin studies comparing the disease and risk factor variability of monozygotic and dizygotic twins have been a common study design used to estimate both genetic and cultural inheritance. Studies of monozygotic twins reared together versus those reared apart also have been important in estimating both genetic and environmental contributions to patterns of inheritance. The modeling of the sources of phenotypic variation using family studies has become quite sophisticated, allowing the inclusion of model parameters to represent the additive genetic component (i.e., polygenes), the nonadditive genetic component (i.e., genetic dominance, as well as gene-environment and gene-gene interactions), shared family environment, and individual environments. The contributions of these factors have been shown to vary by age and population.

When significant evidence of genetic involvement is established, the next step is to identify the responsible genes and the mutations that are associated with increased or decreased risk, using either genetic linkage analysis or genetic association studies. For example, in the study of birth defects, this often involves the search for chromosomal deletions, insertions, duplications, or translocations.

GENETIC LINKAGE ANALYSIS AND GENETIC ASSOCIATION STUDIES

The human genome is made up of tens of thousands of genes. With approximately 30,000 genes to choose from, assigning a specific gene or group of genes to a corresponding human disease demands a methodical approach consisting of many steps. Traditionally, the process of gene discovery begins with a linkage analysis that assesses disease within families. Linkage analyses are typically followed by genetic association studies that assess disease across families or across unrelated individuals.

Genetic Linkage Analysis

The term linkage refers to the tendency of genes proximally located on the same chromosome to be inherited together. Linkage analysis is one step in the search for a disease susceptibility gene. The goal of this analysis is to approximate the location of the disease gene in relation to a known genetic marker, applying an understanding of the patterns of linkage. Traditional linkage analysis, which traces patterns of heredity of both the disease phenotype and genetic markers in large, high-risk families, has been used to locate disease-causing gene mutations such as the breast cancer gene (BRCA1) on chromosome 17 (Hall et al., 1990).

Because the mode of inheritance is often not clear for common diseases, an alternative approach to classic linkage analysis was developed to capitalize on the basic genetic principle that siblings share half of their alleles on average. By investigating the degree of allelic sharing across their genomes, pairs of affected siblings (i.e., two or more siblings with the same disease) can be used to identify chromosomal regions that may contain genes whose variations are related to the disease being studied. If numerous sibling pairs affected by the disease of interest exhibit a greater than expected sharing of the known alleles of the polymorphic genetic marker being used, then the genetic marker is likely to be linked (that is, within close proximity along the chromosome) to the susceptibility gene responsible for the disease being studied. Finding chromosomal regions that show evidence for linkage using this affected sibling pair method typically requires typing numerous affected sibships with hundreds of highly polymorphic markers uniformly positioned along the human genome (Mathew, 2001).

This approach has been widely used to identify regions of the genome thought to contribute to common chronic diseases. However, results of linkage analyses have not been consistently replicated. The inability to replicate linkage findings may be a result of insufficient statistical power (that is, including an inadequate number of sibling pairs with the disease of interest) or of false positive results in the original study. An alternate explanation could be that different populations are affected by different susceptibility genes than those that were studied originally (Mathew, 2001). Without consistent replication of results, it is premature to draw conclusions about the contribution of a gene locus to a specific disease.

Upon the confirmation of a linkage, researchers can begin to search the region for the candidate susceptibility gene. The search for a single susceptibility gene for common diseases often involves examination of very large linkage regions, containing 20 to 30 million base pairs and potentially hundreds of genes (Mathew, 2001). It is also important to note, however, that while linkage mapping is a powerful tool for finding Mendelian disease genes, it often produces weak and sometimes inconsistent signals in studies of complex diseases that may be multifactorial. Linkage studies perform best when there is a single susceptibility allele at any given disease locus and generally perform poorly when there is substantial genetic heterogeneity.
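The affected sibling pair reasoning described above can be illustrated with a small calculation. The sketch below is a minimal version of a mean allele-sharing test at a single marker, under two simplifying assumptions that are not taken from the chapter: that the number of alleles each pair shares (0, 1, or 2) has been determined unambiguously, and that under no linkage these counts occur with probabilities 1/4, 1/2, and 1/4. The function name and the pair counts are hypothetical.

```python
import math


def sib_pair_mean_test(n_share0, n_share1, n_share2):
    """Mean allele-sharing test for affected sibling pairs at one marker.

    Arguments are the numbers of pairs sharing 0, 1, or 2 marker alleles.
    Returns (mean proportion shared, z statistic, one-sided p-value).
    """
    n_pairs = n_share0 + n_share1 + n_share2
    if n_pairs == 0:
        raise ValueError("at least one sibling pair is required")

    # Each pair shares a proportion of 0, 0.5, or 1 of its two alleles.
    mean_sharing = (0.5 * n_share1 + 1.0 * n_share2) / n_pairs

    # With no linkage, the expected proportion is 0.5 and the variance
    # of the proportion for a single pair is 1/8.
    std_error = math.sqrt(0.125 / n_pairs)
    z = (mean_sharing - 0.5) / std_error

    # One-sided test: only excess sharing is evidence for linkage.
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))
    return mean_sharing, z, p_one_sided


# Hypothetical counts for 100 affected sibling pairs.
mean_sharing, z, p_value = sib_pair_mean_test(15, 50, 35)
print(f"mean sharing = {mean_sharing:.2f}, z = {z:.2f}, p = {p_value:.4f}")
```

Because the standard error shrinks only with the square root of the number of pairs, a modest excess in sharing needs many sibships to stand out, which is consistent with the statistical power concerns raised above.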

Genetic Association Studies

Technological advances in high-throughput genotyping have allowed the direct examination of specific genetic differences among sizable numbers of people. Genetic association techniques are often the most efficient approach for assessing how specific genetic variation can affect disease risk. Genetic association studies, which have been used for decades, have progressed steadily through the development of new study designs (such as case-only and family-based association designs), new genotyping systems (such as array-based genotyping and multiplexing assays), and new methods for addressing biases such as population stratification (Haines and Pericak-Vance, 1998).

Analysis of the effects of genetic variation typically involves first the discovery of single nucleotide polymorphisms (SNPs)1 and then the analysis of these variations in samples from populations. SNPs occur on average approximately every 500 to 2,000 bases in the human genome. The most common approach to SNP discovery is to sequence the gene of interest in a representative sample of individuals. Currently, sequencing of entire genes in small numbers of individuals (~25 to 50) can detect polymorphisms occurring in 1 to 3 percent of the population with approximately 95 percent confidence. The Human DNA Polymorphism Discovery Program of the National Institute of Environmental Health Sciences’ Environmental Genome Project is one example of the application of automated DNA sequencing technologies to identify SNPs in human genes that may be associated with disease susceptibility and response to environment (Livingston et al., 2004). The National Heart, Lung, and Blood Institute’s Programs in Genomic Applications also have led to important increases in our knowledge about the distribution of SNPs in key genes thought to be already biologically implicated in disease risk (i.e., biological candidate genes2).
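The detection figures quoted above can be checked with a back-of-the-envelope calculation. The sketch below assumes, as a simplification not stated in the chapter, that each sequenced individual contributes two independently sampled chromosomes and that the quoted percentage is an allele frequency; the function name is hypothetical.

```python
def detection_probability(n_individuals, allele_frequency):
    """Probability of observing a variant allele at least once.

    Assumes two independently sampled chromosomes per individual and
    error-free sequencing; both are simplifying assumptions.
    """
    n_chromosomes = 2 * n_individuals
    # The allele is missed only if every sampled chromosome lacks it.
    return 1.0 - (1.0 - allele_frequency) ** n_chromosomes


for n_individuals in (25, 50):
    for allele_frequency in (0.01, 0.03):
        prob = detection_probability(n_individuals, allele_frequency)
        print(f"n = {n_individuals}, frequency = {allele_frequency:.2f}: "
              f"P(detect) = {prob:.3f}")
```

Under these assumptions, sequencing 50 individuals gives roughly a 95 percent chance of observing an allele with a frequency of 3 percent at least once; smaller samples or rarer alleles give lower probabilities, so the quoted confidence may rest on somewhat different assumptions at the lower end of the range.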
Rapid advances in SNP analysis technology are redefining the scope of SNP discovery, mapping, and genotyping. New array-based genotyping technology enables “whole genome association” analyses of SNPs between individuals or between strains of laboratory animal species (Syvanen, 2005). Arrays used for these analyses can represent hundreds of thousands of SNPs mapped across a genome (Klein et al., 2005; Hinds et al., 2005; Gunderson et al., 2005). This approach allows rapid identification of SNPs associated with disease and susceptibility to environmental factors. The strength of this technology is the massive amount of easily measurable genetic variation it puts in the hands of researchers in a cost-effective manner ($500 to $1,000 per chip). The criteria for the selection of SNPs to be included on these arrays are a critical consideration, since they affect the inferences that can be drawn from using these platforms. Of course, the ultimate tool for SNP discovery and genotyping is individual whole genome sequencing. Although not currently feasible, the rapid advancement of technology now being stimulated by the National Human Genome Research Institute’s “$1,000 genome” project likely will make this approach the optimal one for SNP discovery and genotyping in the future.

With the ability to examine large quantities of genetic variations, researchers are moving from investigations of single genes, one at a time, to consideration of entire pathways or physiological systems that include information from genomic, transcriptomic, proteomic, and metabonomic levels that are all subject to different environmental factors (Haines and Pericak-Vance, 1998). However, these genome- and pathway-driven study designs and analytic techniques are still in the early stages of development and will require the joint efforts of multiple disciplines, ranging from molecular biologists to clinicians to social scientists to bioinformaticians, in order to make the most effective use of these vast amounts of data.

1 An SNP is the DNA sequence variation that occurs when a single nucleotide (A, T, C, or G) in the genome sequence is altered (Smith, 2005).
2 A candidate gene is a gene whose protein product is involved in the metabolic or physiological pathways associated with a particular disease (IOM, 2005).
GENE-ENVIRONMENT AND GENE-GENE INTERACTIONS

The study of gene-environment and gene-gene interactions represents a broad class of genetic association studies focused on understanding how human genetic variability is associated with differential responses to environmental exposures and with differential effects depending on variations in other genes. To illustrate the concept of gene-environment interactions, recent studies that identify genetic mutations that appear to be associated with differential response to cigarette smoke and its association with lung cancer are reviewed below.

Tobacco smoke contains a broad array of chemical carcinogens that may cause DNA damage. Several DNA repair pathways operate to repair this damage, and the genes within these pathways are prime biological candidates for understanding why some smokers develop lung cancers but others do not. In a study by Zhou et al. (2003), variations in two genes responsible for DNA repair were examined for their potential interaction with the level of cigarette smoking and concomitant association with lung cancer. Briefly, one putatively functional mutation in the XRCC1 (X-ray cross-complementing group 1) gene and two putatively functional mutations in the ERCC2 (excision repair cross-complementing group 2) gene were genotyped in 1,091 lung cancer cases and 1,240 controls. When the cases and controls were stratified into heavy smokers versus nonsmokers, Zhou et al. (2003) found that nonsmokers with the mutant XRCC1 genotype had a 2.4 times greater risk of lung cancer than nonsmokers with the normal genotype. In contrast, heavy smokers with the mutant XRCC1 genotype had a 50 percent reduction in lung cancer risk compared to their counterparts with the more frequent normal genotype. When the three mutations from these two genes were examined together in the extreme genotype combination (individuals with five or six mutations present in their genotype), there was a 5.2 times greater risk of lung cancer in nonsmokers and a 70 percent reduction of risk in heavy smokers compared to individuals with no mutations. The protective effect of these genetic variations in heavy smokers may be caused by the differential increase in the activity of these protective genes stimulated by heavy smoking. Similar types of gene-smoking interactions also have been found for other genes in this pathway, such as ERCC1.

These studies illustrate the importance of identifying the genetic variations that are associated with the differential risk of disease related to human behaviors. Note that this type of research also raises many different kinds of ethical and social issues, since it identifies susceptible subgroups and protected subgroups of subjects by both genetic and human behavior strata (see Chapter 10).

The study by Zhou et al. (2003) also demonstrates the increased information provided by jointly examining the effects of multiple mutations on toxicity-related disease. Other studies of mutations in genes involved in Phase II metabolism (GSTM1, GSTT1, GSTP1) also have demonstrated the importance of investigating the joint effects of mutations on cancer risk (Miller et al., 2002).
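The crossover pattern reported above for smokers and nonsmokers can be summarized as a ratio of stratum-specific risk estimates. The sketch below is only an arithmetic illustration using the figures quoted in the text; it assumes that "2.4 times greater" and "5.2 times greater" denote relative risk estimates of 2.4 and 5.2, and that the 50 and 70 percent reductions denote estimates of 0.5 and 0.3. The function name is hypothetical, and this is not presented as the analysis performed by Zhou et al. (2003).

```python
def interaction_ratio(risk_ratio_exposed, risk_ratio_unexposed):
    """Ratio of genotype effects between two exposure strata.

    A value of 1 means the genotype effect is the same in both strata
    (no interaction on the multiplicative scale).
    """
    if risk_ratio_exposed <= 0 or risk_ratio_unexposed <= 0:
        raise ValueError("risk ratios must be positive")
    return risk_ratio_exposed / risk_ratio_unexposed


# XRCC1 mutant genotype versus normal genotype (figures from the text).
xrcc1_nonsmokers = 2.4      # assumed to mean a risk ratio of 2.4
xrcc1_heavy_smokers = 0.5   # 50 percent reduction in risk

# Extreme genotype combination (five or six mutations) versus none.
combined_nonsmokers = 5.2
combined_heavy_smokers = 0.3  # 70 percent reduction in risk

xrcc1_ratio = interaction_ratio(xrcc1_heavy_smokers, xrcc1_nonsmokers)
combined_ratio = interaction_ratio(combined_heavy_smokers, combined_nonsmokers)
print(f"XRCC1 interaction ratio: {xrcc1_ratio:.3f}")
print(f"Combined genotype interaction ratio: {combined_ratio:.3f}")
```

Both ratios fall well below 1, and the combined genotype departs from 1 more strongly than XRCC1 alone, which is one way to express why examining multiple mutations jointly adds information.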
Although these two studies focused on the additive effects of multiple genes, gene-gene interactions are another important component in developing a better understanding of human susceptibility to disease and to interactions with the environment.

To adequately understand the continuum of genomic susceptibility to environmental agents that influences the public’s health, more studies of the joint effects of multiple mutations need to be conducted. Advances in bioinformatics can play a key role in this endeavor. For example, methods to screen SNP databases for mutations in transcriptional regulatory regions can be used for both discovery and functional validation of polymorphic regulatory elements, such as the antioxidant regulatory element found in the promoter regions of many genes encoding antioxidative and Phase II detoxification enzymes (Wang et al., 2005). Comparative sequence analysis methods also are becoming increasingly valuable to human genetic studies, because they provide a means to rank order SNPs in terms of their potential deleterious effects on protein function or gene regulation (Wang et al., 2004). Methods of performing large-scale analysis of nonsynonymous SNPs to predict whether a particular mutation impairs protein function (Clifford et al., 2004) can help in SNP selection for genetic epidemiological studies and can be used to streamline functional analysis of mutations that are found to be statistically associated with differential response to environmental factors such as diet, stress, and socioeconomic factors.

MECHANISMS OF GENE EXPRESSION

Identifying genes whose variations are associated with disease is just the first step in linking genetics and health. Understanding the mechanisms by which the gene is expressed and how it is influenced by other genes, proteins, and the environment is becoming increasingly important to the development of preventive, diagnostic, and therapeutic strategies.

When genes are expressed, the chromosomal DNA must be transcribed into RNA, and the RNA is then processed and transported to be translated into protein. Regulating the expression of genes is a vital process in the cell and involves the organization of the chromosomal DNA into an appropriate higher-order chromatin structure. It also involves the action of a host of specific protein factors (to either encourage or suppress gene expression), which can act at different steps in the gene expression pathway.

In all organisms, networks of biochemical reactions and feedback signals organize developmental pathways, cellular metabolism, and progression through the cell cycle. Overall coordination of the cell cycle and cellular metabolism results from feed-forward and feedback controls arising from sets of dependent pathways in which the initiation of events is dependent on earlier events. Within these networks, gene expression is controlled by molecular signals that regulate when, where, and how often a given gene is transcribed.
These signals often are stimulated by environmental influences or by signals from other cells that affect the expression of many genes through a single regulatory pathway. Since a regulatory gene can act in combination with other signals to control many other genes, complex branching networks of interactions are possible (McAdams and Arkin, 1997).

Gene regulation is critical because by switching genes on or off when needed, cells can be responsive to changes in environment (e.g., changes in diet or activity) and can prevent resources from being wasted. Variations in the DNA sequences associated with the regulation of a gene’s expression are therefore likely candidates for understanding gene-environment interactions at the molecular level, since these variations will affect whether an environmental signal transduced to the nucleus will successfully bind to the promoter sequence in the gene and stimulate or repress gene expression. Combining genomic technologies for SNP genotyping with high-density gene expression arrays in human studies has only recently elucidated the extent to which this type of molecular gene-environment interaction may be occurring.

Cells also regulate gene expression by post-transcriptional modification; by allowing only a subset of the mRNAs to go on to translation; or by restricting translation of specific mRNAs to only when and where the product is needed. The genetic factors that influence post-transcriptional control are much more difficult to study because they often involve multiprotein complexes not easily retrieved or assayed from cells. At other levels, cells regulate gene expression through epigenetic mechanisms, including DNA folding, histone acetylation, and methylation (i.e., chemical modification) of the nucleotide bases. These mechanisms are likely to be influenced by genetic variations in the target genes as well as variations manifested in translated cellular regulatory proteins. Gene regulation occurs throughout life at all levels of organismal development and aging.

A classic example of developmental control of gene expression is the differential expression of embryonic, fetal, and adult hemoglobin genes (see Box 3-1). The regulation of the epsilon, delta, gamma, alpha, and beta genes occurs through DNA methylation that is tightly controlled through developmental signals. During development a large number of genes are turned on and off through epigenetic regulation. One of the fastest growing fields in genetics is the study of the developmental consequences of environmental exposures on gene expression patterns and the impact of genetic variations on these developmental trajectories.

An Example of a Single-Gene Disorder with Significant Clinical Variability: Sickle Cell Disease3

Sickle cell disease refers to an autosomal recessive blood disorder caused by a variant of the β-globin gene called sickle hemoglobin (Hb S).
A single nucleotide substitution (A→T) in the sixth codon of the β-globin gene results in the substitution of valine for glutamic acid (GAG→GTG), which can cause Hb S to polymerize (form long chains) when deoxygenated (Stuart and Nagel, 2004). An individual inheriting two copies of Hb S (Hb SS) is considered to have sickle cell anemia, while an individual inheriting one copy of Hb S plus another deleterious β-globin variant (e.g., Hb C or Hb β-thalassemia) is considered to have sickle cell disease. An individual is considered to be a carrier of the sickle cell trait if he/she has one copy of the normal β-globin gene and one copy of the sickle variant (Hb AS) (Ashley-Koch et al., 2000).

3 The sickle cell example is abstracted from a commissioned paper prepared by Robert J. Thompson, Jr., Ph.D. (Appendix D).

BOX 3-1 Gene Expression and Globin

The production of hemoglobin is regulated by a number of transcriptional controls, such as switching, that dictate the expression of a different set of globin genes in different parts of the body throughout the various stages of the development process. This transcriptional regulation of globin genes is a result of many different DNA sequences and methylation of those sequences. The process begins shortly after conception, when the yolk sac expresses the genes that are responsible for embryonic hemoglobin. These genes are later deactivated, while the genes responsible for producing fetal hemoglobin in the liver are activated. Upon birth, the adult globin genes are activated and the bone marrow stem cells begin to produce adult hemoglobin and red blood cells (Rimoin et al., 2002).

A group of diseases that are the result of defective switching among the globin genes during the development process are called thalassemias. This class of diseases results in the decreased capacity to carry oxygen due to the complete absence of hemoglobin or the production of abnormal hemoglobin. Two types of thalassemias, alpha and beta, are the product of ineffective gene regulation.

[Figure in Box 3-1: percentage of total globin synthesis for the ξ, ε, α, γ, β, and δ globin chains plotted against prenatal age (weeks), birth, and postnatal age (weeks), marking the switch from embryonic to fetal Hb and the switch from fetal to adult Hb.]

Four major β-globin gene haplotypes have been identified. Three are named for the regions in Africa where the mutations first appeared: BEN (Benin), SEN (Senegal), and CAR (Central African Republic). The fourth haplotype, Arabic-India, occurs in India and the Arabian Peninsula (Quinn and Miller, 2004).

Disease severity is associated with several genetic factors (Ashley-Koch et al., 2000).
The highest degree of severity is associated with Hb SS, followed by Hb S/β0-thalassemia and Hb SC. Hb S/β+-thalassemia is associated with a more benign course of the disease (Ashley-Koch et al., 2000). Disease severity also is related to β-globin haplotypes, probably due to

contexts such as age, sex, diet, and physical activity that modify the relationship with risk. For the most part, we are still at the stage of documenting the complexity, finding examples and types of genetic susceptibility genes, understanding disease heterogeneity, and postulating ways to develop models of risk that use the totality of what we know about human biology, from our genomes to our ecologies.

Cardiovascular Disease (CVD)

The study of CVD can be used to illustrate the issues that are encountered in using genetic information to understand the etiology of the most common chronic diseases as well as in identifying those at highest risk of developing these diseases. The majority of CVD cases have a complex multifactorial etiology, and even full knowledge of an individual’s genetic makeup cannot predict with certainty the onset, progression, or severity of disease (Sing et al., 2003). Disease develops as a consequence of interactions between a person’s genotype and exposures to environmental agents, which influence cardiovascular phenotypes beginning at conception and continuing throughout adulthood. CVD research has found many high-risk environmental agents and hundreds of genes, each with many variations that are thought to influence disease risk. As the number of interacting agents involved increases, a smaller number of cases of disease will be found to have the same etiology and be associated with a particular genotype (Sing et al., 2003). The many feedback mechanisms and interactions of agents from the genome through intermediate biochemical and physiological subsystems with exposure to environmental agents contribute to the emergence of a given individual’s clinical phenotype.
In attempting to sort out the relative contributions of genes and environment to CVD, a large array of factors must be considered, from the influence of genes on cholesterol (e.g., LDL levels) to psychosocial factors such as stress and anger. Although hundreds of genes have been implicated in the initiation, progression, and clinical manifestation of CVD, relatively little is known about how a person’s environment interacts with these genes to tip the balance between the atherogenic and anti-atherogenic processes that result in clinically manifested CVD. Please see Chapters 4 and 6 for further discussion of the effects of social environment on CVD.

It is well known that many social and behavioral factors, ranging from socioeconomic status, job stress, and depression to smoking, exercise, and diet, affect cardiovascular disease risk (see Chapters 2, 3, and 6 for more detailed discussion of these factors). As more studies of gene-environment interaction treat these factors as part of the “environment” to be examined in conjunction with genetic variations, multiple intellectual and methodological challenges arise. First, how are the social factors embodied

such that an interaction with a particular genotype can be associated with differential risk? Second, how can we handle complex interactions to address questions such as how an individual’s genotype influences his/her behavior? For example, one’s genetic susceptibility to nicotine addiction is actually a risk factor for CVD, and its effect on CVD risk may be contingent on interactions with other genetic factors.

Pharmacogenetics

It has been well established that individuals often respond differently to the same drug therapy. The drug disposition process is a complex set of physiological reactions that begin immediately upon administration. The drug is absorbed and distributed to the targeted areas of the body, where it interacts with cellular components such as receptors and enzymes that further metabolize the drug, and ultimately the drug is excreted from the body (Weinshilboum, 2003). At any point during this process, genetic variation may alter the therapeutic response of an individual and cause an adverse drug reaction (ADR) (Evans and McLeod, 2003). It has been estimated that 20 to 95 percent of variations in drug disposition, such as ADRs, can be attributed to genetic variation (Kalow et al., 1998; Evans and McLeod, 2003).

Sensitivity to both dose-dependent and dose-independent ADRs can have roots in genetic variation. Polymorphisms in kinetic and dynamic factors, such as cytochrome P450 and specific drug targets, can make individuals susceptible to ADRs. While the characteristics of the ADR dictate the true significance of these factors, in most cases multiple genes are involved (Pirmohamed and Park, 2001). Future analyses using genome-wide SNP profiling could provide a technique for assessing several genetic susceptibility factors for ADRs and ascertaining their joint effects.

One of the challenges to the study of the relationship between genetic variation and ADRs is an inadequate number of patient samples. To remedy this problem, Pirmohamed and Park (2001) have proposed that prospective randomized controlled clinical trials become a part of standardized practice to ultimately prove the clinical utility of genotyping all patients as a measure to prevent ADRs.

Here we review some of the current work in pharmacogenetics as an example of what might be expected to arise from rigorous study of the interaction between social, behavioral, and genetic factors. Researchers have provided a few well-established examples of differences in individual drug response that have been ascribed to genetic variations in a variety of cellular drug disposition machinery, such as drug transporters or enzymes responsible for drug metabolism (Evans and McLeod, 2003). For example:

• With the knowledge that the HER2 gene is overexpressed in approximately one fourth of breast cancer cases, researchers developed a humanized monoclonal antibody against the HER2 receptor in hopes of inhibiting the tumor growth associated with the receptor. Genotyping advanced breast cancer patients to identify those with tumors that overexpress the HER2 receptor has produced promising results in improving the clinical outcomes for these breast cancer patients (Cobleigh et al., 1999).
• A therapeutic class of drugs called thiopurines is used as part of the treatment regimen for childhood acute lymphoblastic leukemia. One in 300 Caucasians has a genetic variation that results in low or nonexistent levels of thiopurine methyltransferase (TPMT), an enzyme that is responsible for the metabolism of the thiopurine drugs. If patients with this genetic variation are given thiopurines, the drug accumulates to toxic levels in their body, causing life-threatening myelosuppression. Assessing the TPMT phenotype and genotype of the patient can be used to determine the individualized dosage of the drug (Armstrong et al., 2004).
• The family of liver enzymes called cytochrome P450s plays a major role in the metabolism of as many as 40 different types of drugs. Genetic variants in these enzymes may diminish their ability to effectively break down certain drugs, thus creating the potential for overdose in patients with less active or inactive forms of the cytochrome P450 enzyme. Varying levels of reduced cytochrome P450 activity are also a concern for patients taking multiple drugs that may interact if they are not properly metabolized by well-functioning enzymes. Strategies to evaluate the activity level of cytochrome P450 enzymes have been devised and are valuable in planning and monitoring successful drug therapy. Some pharmaceutical drug trials are now incorporating early tests that evaluate the ability of differing forms of cytochrome P450 to metabolize the new drug compound (Obach et al., 2006).

Some pharmacogenetics research has focused on the treatment of psychiatric disorders. With the introduction of a class of drugs known as selective serotonin re-uptake inhibitors (SSRIs), pharmacological treatment of many psychiatric disorders changed drastically. SSRIs offer significant improvements over the previous generation of treatments, including improved efficacy and tolerance for many patients. However, not all patients respond positively to SSRI treatment, and many experience ADRs. New pharmacogenetic studies have indicated that these ADRs may be the result of genetic variations in serotonin transporter genes and cytochrome P450 genes. Further study and replication of these findings are necessary. If the characterization of the genetic variations is completed and fully understood, it would be possible to screen and monitor patients using genotyping

techniques to create individualized drug therapies similar to those discussed above (Mancama and Kerwin, 2003).

A significant challenge to the development of individualized drug therapies is the often polygenic or multifactorial inherited component of drug responses. Isolating the polygenic determinants of the drug responses is a sizable task. A good understanding of the drug’s mechanism of action and metabolic and disposition pathways should be the basis of all investigations. This knowledge can aid in directing genome-wide searches for gene variations associated with drug effects and subsequent candidate-gene approaches of investigation. Additionally, proteomic and gene-expression profiling studies are important ways to substantiate and understand the pathways by which the gene of interest operates to affect the individual’s response to the drug (Evans and McLeod, 2003). It is not enough to show an association; characterization of the underlying biological mechanisms is an essential component of moving genetic findings into the area of risk reduction. Another key component of utilizing genetics to improve prevention and reduce disease is an understanding of the distribution of the genetic variations in the populations being served.

GENETICS OF POPULATIONS AS RELATED TO HEALTH AND DISEASE

Human populations differ in their distribution of genetic variations. This is a consequence of their historical patterns of mutation, migration, reproduction, mating, selection, and genetic drift. Inherited mutations typically occur during gametogenesis within a single individual and then can be passed on to offspring for many generations. Whether that mutation goes on to become a prevalent polymorphism (i.e., a mutation with a population frequency of greater than 1 percent) is determined by both evolutionary forces and chance events.
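The role of chance in this process can be sketched with a small simulation. The code below is a minimal illustration, not a model used in this chapter: it assumes an idealized Wright-Fisher population (constant size, random mating, no selection, no new mutation, non-overlapping generations), and all function names, population sizes, and the seed are hypothetical.

```python
import random


def drift_trajectory(n_individuals, generations, rng):
    """Frequency of a single new mutation over time under pure genetic drift."""
    n_chromosomes = 2 * n_individuals
    copies = 1  # the mutation starts as one copy in one individual
    trajectory = [copies / n_chromosomes]
    for _ in range(generations):
        freq = copies / n_chromosomes
        # Each chromosome in the next generation copies a random parental chromosome.
        copies = sum(rng.random() < freq for _ in range(n_chromosomes))
        trajectory.append(copies / n_chromosomes)
        if copies == 0 or copies == n_chromosomes:
            break  # the mutation has been lost or has become fixed
    return trajectory


def fraction_becoming_polymorphic(n_individuals, generations, runs, seed=0, threshold=0.01):
    """Share of simulated mutations whose frequency ever exceeds the threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        if max(drift_trajectory(n_individuals, generations, rng)) > threshold:
            hits += 1
    return hits / runs


if __name__ == "__main__":
    share = fraction_becoming_polymorphic(n_individuals=500, generations=100, runs=200)
    print(f"Share of new mutations that ever exceeded 1 percent: {share:.2f}")
```

In idealized models of this kind, new mutations are typically lost within a few generations, so the fate of a mutation depends heavily on what happens to its first few carriers. Selection, population growth, and family size, which the sketch deliberately ignores, would all change the outcome.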
For example, whether a mutation spreads depends on whether the original child who inherited the mutation survives to adulthood and reproduces, whether that child’s children survive to reproduce, and so on. The number of children in a family also influences the prevalence of the mutation, and this is often tied to environmental factors that affect fertility and to mating patterns that influence the speed with which a private mutation becomes a public polymorphism. There are well-known examples of what are called founder mutations in which this trajectory can be documented. For example, one particular district in what is today Quebec (Canada) was originally founded by only a few families from a particular French province. One of the founding fathers carried a 10kb deletion in his LDL receptor (LDL-R) gene that was passed down through the generations quickly and today is carried by 1 in 154 French Canadians in northeastern Quebec. This mutation is associated with familial hypercholesterolemia, and French Canadians have one of the highest prevalences of this disease in the world because of the small founding populations followed by population expansion (Moorjani et al., 1989).

There are also a number of examples where mutations that arise in an individual become more prevalent because of the selective advantage they impart to their carriers. The best-known example is the mutation associated with sickle cell anemia. The geographical pattern of this mutation strongly mirrors the geographical pattern of malarial infection. It has been molecularly demonstrated that individuals carrying the sickle cell mutation have a resistance to malarial infection. Because many of the selection pressures that may have given rise to the current distribution of mutations in particular populations are in our evolutionary past, it is difficult to assess how much variation within or among populations is due to these types of selection forces.

Another major force in determining the distribution of genetic variations within and among human populations is their migration and reproductive isolation. According to our best knowledge, one of the most important periods in human evolution occurred approximately 100,000 years ago, when some humans migrated to other continents from the African basin and established new communities with relative reproductive isolation. Genetic differences among people in different geographical areas have been associated with the concept of race for hundreds of years. Although race is still used as a label, the original concept of race as genetically distinct subspecies of humans has been rejected on the basis of modern genetic information. For numerous reasons, discussed in the section below, it is more appropriate to reconceptualize the old genetics of race into a more accurate genetics of ancestry.

In addition to distant evolutionary patterns of migration, more modern migration patterns also have had a profound effect on the genetics of populations. For example, the current population of the United States and much of North America is very diverse genetically as a consequence of the mixing of many people from many different countries and continents.

A central reason for studying the origins and nature of human genetic variation is that the similarities and differences in the type and frequencies of genetic variations within and among populations can have a profound impact on studies that attempt to understand the influence of genes on disease risk. For example, some genetic variations, such as the apolipoprotein E protein polymorphisms, are found in every population and have very similar genotype frequencies around the world (Wu et al., 2002; Deniz Naranjo et al., 2004). The variation’s association with increased heart disease and Alzheimer’s disease could be and has been tested in many of the world’s populations. Other mutations such as the 10kb

deletion in the LDL-R gene described above are more population-specific variations.

Furthermore, from a statistical point of view, the effect of a genetic variation on the continuum of risk found in any population is correlated with its frequency. For example, common genetic polymorphisms with frequencies near 50 percent cannot be associated with large phenotypic effects within a population because the genotype classes each represent a large fraction of the population and, since most risk is normally distributed, the average risk for a highly prevalent genotype class cannot deviate from the overall risk of the population to any large degree. This correlation between genotype frequency and effect does not mean that common variations cannot be significant in their effects. The statistical significance of an association between a genetic variant and a disease is a joint function of sample size and the size of the effect. In addition, studies of populations that differ in their genotype frequencies can differ in their inferences about which polymorphisms have significant effects even if the absolute phenotypic effect is the same. See Cheverud and Routman (1995) for a more formal statistical explanation of this phenomenon and its impact on assessing gene-gene interactions.

Another key consideration in understanding the relationship between genetic variations and measures of disease risk is the population differences in the correlations between genotype frequencies at different SNP locations. There are two common reasons why the frequency of an allele or genotype at a particular SNP could be correlated with the frequency of an allele or genotype for a different SNP. First, a phenomenon known as linkage disequilibrium creates correlations among SNPs as a consequence of the mutation’s history.
When mutations arise, they occur on a particular genetic background, which creates a correlation with the other SNPs on the chromosome. Second, the mixing of populations known as admixture, which typically occurs through migration, means that SNPs with population-specific frequencies will be correlated in a larger mixed sample. In this case, population stratification is the cause of the correlation, and there has been much genetic epidemiological research on this phenomenon and how to control for it. Population stratification is thought to be a possible source of spurious genetic associations with disease (see Box 3-2).

BOX 3-2
Population Stratification (Confounding)

When the risk of disease varies between two ethnic groups, any genetic or environmental factor that also varies between the groups will appear to be related to disease. This phenomenon is called “population stratification” in epidemiologic studies investigating the effect of a genetic factor on disease, and it is a form of confounding. Population stratification refers to the presence of subgroups (for example, ethnic groups) in the sample, which could potentially cause a spurious association between genetic variations and a trait. Concerns about population stratification have raised doubts about the credibility of some reported findings in candidate gene studies and have led to calls for the routine use of related controls in case-control studies of genetic factors to eliminate the possibility of population stratification (Lander and Schork, 1994; Altshuler et al., 1998).

In fact, although population stratification is frequently used as an explanation for nonreplicable associations in the literature, there are few actual examples to support this assumption (Risch, 2000), and many agree that the problem probably has been overstated (Cardon and Bell, 2001). For example, Wacholder et al. (2000) argued that population stratification to an extent large enough to distort results is unlikely to occur in many realistic situations. Population stratification is a manifestation of confounding, that is, the distortion of the relationship between the exposure of interest and disease due to the effect of a true risk factor that is related to the exposure (Wacholder et al., 2000). Thus, in population stratification ethnicity acts as a surrogate for the true risk factor, which may be environmental or genetic. This means that controlling for ethnicity can reduce the confounding bias.

Ardlie et al. (2002) evaluated four moderately sized case-control studies for the presence of population structure and concluded that carefully matched case-control samples in U.S. and European populations are unlikely to contain levels of population stratification that would result in significantly inflated numbers of false positive associations. However, methods have been developed by which unlinked genetic markers can be used to detect stratification and even correct for it when it is present (Pritchard and Rosenberg, 1999; Satten et al., 2001).

CONCLUSION

In large part, the twentieth century was dominated by studies of human health and disease that focused on identifying single genetic and environmental agents that could explain variation in disease susceptibility. This new century has been characterized by huge advances in our understanding of Mendelian disorders with severe clinical outcomes. However, the Mendelian paradigm has failed to elucidate the genetic contribution to susceptibility to most common chronic diseases, which researchers know have a substantial genetic component because of their familial aggregation and studies that demonstrate significant heritabilities for these diseases. Likewise, environmental and social epidemiological studies have been highly successful in illuminating the role of many environmental factors such as diet, exercise, and stress in disease risk. However, these environmental factors still do not, by themselves, fully explain the variance in the prevalence of several diseases in different populations. Researchers are only now beginning to study in earnest the potential interactions between the genetic

and environmental factors that are likely to be contributing to a large fraction of disease in most populations. There is much that can be done to incorporate measures of social environment into genetic studies and also to incorporate genetic measures into social epidemiological studies.

Over the last two decades, progress in identifying specific genes and mutations that explain genetic susceptibility to common conditions has been relatively slow, for a variety of reasons. First, the diseases being studied tend to be complex in their etiology, meaning that different people in a population will develop disease for different genetic and/or environmental reasons. Any single genetic or environmental factor is expected to explain only a very small fraction of disease risk in a population. Moreover, these factors are expected to interact, and other biological processes (e.g., epigenetic modifications) are likely to be contributors to the complex puzzle of susceptibility. An accurate phenotypic definition of disease and its subtypes is crucial to identifying and understanding the complexities of disease-specific genetic and environmental causes.

Second, geneticists only recently have developed the knowledge base or methods needed to measure genetic variations and their metabolic consequences with sufficient ease and cost-effectiveness so that the large number of genes thought to be involved can be studied. With the completion of the Human Genome Project in 2003, many different scientific entities (e.g., the Environmental Genome Project and the International HapMap Consortium) have been working to identify the mutational spectra in human populations, and genetic epidemiologists are just now beginning to understand the extensive nature of common variations (>1 percent population frequency) within the human genome that could be affecting people’s risk of disease. The SNP data generated by these initiatives are now centrally located in a number of public databases, including the National Center for Biotechnology Information’s dbSNP database, the National Cancer Institute’s CGAP Genetic Annotation Initiative SNP Database, and the Karolinska Institute Human Genic Bi-Allelic Sequences Database. At present, the largest dataset on human variation is being generated by the International HapMap Project,4 which is genotyping millions of SNPs on 270 individuals from 4 geographically separated sites from around the world. The International HapMap Project has greatly increased the number of validated SNPs available to the research community to be used to study human variation and is producing a map of genomic haplotypes in four populations with ancestry from parts of Africa, Asia, and Europe. In addition, high-throughput methods of genotyping large numbers of SNPs (thousands) in large epidemiological cohorts are only now becoming available

4 See www.hapmap.org.

(see above). Unfortunately, high-throughput methods of measuring the environment have not kept a similar pace. For many studies of common disease, a rate-limiting step to increasing our understanding will continue to be the difficult and costly measurement of environmental factors.

Finally, progress also has been hampered because of a lack of adequate investment in developing new methods of analysis that can incorporate the high-dimensional biological reality that we can now measure. The complex genetic and environmental architecture of multifactorial diseases is not easily detected or deciphered using the traditional statistical modeling methods that are focused on the estimation of a single overall model of disease for a population. For example, using traditional logistic regression methods it would be simply impossible to enter all the hundreds of genetic variations that are thought to be involved in CVD risk or in any of the other common disease complexes currently being studied. Beyond the obvious issues of power and overdetermination in such a large-scale model, we also do not know how to model or interpret interactions among many factors simultaneously or how to incorporate the rare, large effects of some genes relative to the common, small effects of others. New modeling strategies that take advantage of advances in pattern recognition, machine learning, and systems analysis (e.g., scale-free networks, Bayesian belief networks, random forest methods) are going to be needed in order to build more comprehensive, predictive models of these etiologically heterogeneous diseases.

The field of human genetics, like many other disciplines, is in transition, and there is much to be gained by joining forces with a wide range of other disciplines that are focused on improving prevention and reducing the disease burden in our populations.

REFERENCES

Altshuler D, Kruglyak L, Lander E. 1998. Genetic polymorphisms and disease. New England Journal of Medicine 338(22):1626.
Ardlie KG, Lunetta KL, Seielstad M. 2002. Testing for population subdivision and association in four case-control studies. American Journal of Human Genetics 71(2):304-311.
Armstrong VW, Shipkova M, von Ahsen N, Oellerich M. 2004. Analytic aspects of monitoring therapy with thiopurine medications. Therapeutic Drug Monitoring 26(2):220-226.
Ashley-Koch A, Yang Q, Olney R. 2000. Sickle hemoglobin (Hb S) allele and sickle cell disease: A HuGE review. American Journal of Epidemiology 151(9):839-845.
Bridges K. 2002. Hemoglobinopathies (Hemoglobin Disorders). [Online]. Available: sickle.bwh.harvard.edu/hemoglobinopathy.html [accessed May 15, 2006].
Brown MS, Goldstein JL. 1981. Lowering plasma cholesterol by raising LDL receptors. New England Journal of Medicine 305(9):515-517.
Cardon LR, Bell JI. 2001. Association study designs for complex diseases. Nature Reviews Genetics 2(2):91-99.
Cheverud JM, Routman EJ. 1995. Epistasis and its contribution to genetic variance components. Genetics 139(3):1455-1461.

Clifford RJ, Edmonson MN, Nguyen C, Buetow KH. 2004. Large-scale analysis of non-synonymous coding region single nucleotide polymorphisms. Bioinformatics 20(7):1006-1014.
Cobleigh MA, Vogel CL, Tripathy D, Robert NJ, Scholl S, Fehrenbacher L, Wolter JM, Paton V, Shak S, Lieberman G, Slamon DJ. 1999. Multinational study of the efficacy and safety of humanized anti-HER2 monoclonal antibody in women who have HER2-overexpressing metastatic breast cancer that has progressed after chemotherapy for metastatic disease. Journal of Clinical Oncology 17(9):2639-2648.
Deniz Naranjo MC, Munoz Fernandez C, Alemany Rodriguez MJ, Perez Vieitez MC, Irurita Latasa J, Suarez Armas R, Suarez Valentin MP, Sanchez Garcia F. 2004. Gender has a strong modulating effect on the risk of Alzheimer’s disease conferred by the apolipoprotein E gene in the population of the Canary Islands, Spain. Revista de Neurologia 38(7):615-618.
Evans WE, McLeod HL. 2003. Pharmacogenomics—drug disposition, drug targets, and side effects. New England Journal of Medicine 348(6):538-549.
Gunderson KL, Steemers FJ, Lee G, Mendoza LG, Chee MS. 2005. A genome-wide scalable SNP genotyping assay using microarray technology. Nature Genetics 37(5):549-554.
Haines JL, Pericak-Vance MA. 1998. Approaches to Gene Mapping in Complex Human Diseases. New York: Wiley-Liss.
Hall JM, Lee MK, Newman B, Morrow JE, Anderson LA, Huey B, King MC. 1990. Linkage of early-onset familial breast cancer to chromosome 17q21. Science 250(4988):1684-1689.
Hinds DA, Stuve LL, Nilsen GB, Halperin E, Eskin E, Ballinger DG, Frazer KA, Cox DR. 2005. Whole-genome patterns of common DNA variation in three human populations. Science 307(5712):1072-1079.
IOM (Institute of Medicine). 2005. Implications of Genomics for Public Health. Washington, DC: The National Academies Press.
Kalow W, Tang BK, Endrenyi L. 1998. Hypothesis: Comparisons of inter- and intra-individual variations can substitute for twin studies in drug research. Pharmacogenetics 8(4):283-289.
Kardia SL, Modell SM, Peyser PA. 2003. Family-centered approaches to understanding and preventing coronary heart disease. American Journal of Preventive Medicine 24(2):143-151.
Klein RJ, Zeiss C, Chew EY, Tsai JY, Sackler RS, Haynes C, Henning AK, SanGiovanni JP, Mane SM, Mayne ST, Bracken MB, Ferris FL, Ott J, Barnstable C, Hoh J. 2005. Complement factor H polymorphism in age-related macular degeneration. Science 308(5720):385-389.
Lander ES, Schork NJ. 1994. Genetic dissection of complex traits. Science 265(5181):2037-2048.
Livingston RJ, von Niederhausern A, Jegga AG, Crawford DC, Carlson CS, Rieder MJ, Gowrisankar S, Aronow BJ, Weiss RB, Nickerson DA. 2004. Pattern of sequence variation across 213 environmental response genes. Genome Research 14(10A):1821-1831.
Mancama D, Kerwin RW. 2003. Role of pharmacogenomics in individualising treatment with SSRIs. CNS Drugs 17(3):143-151.
Mathew C. 2001. Science medicine and the future—postgenomic technologies: Hunting the genes for common disorders. British Medical Journal 322(7293):1031-1034.
McAdams HH, Arkin A. 1997. Stochastic mechanisms in gene expression. Proceedings of the National Academy of Sciences of the United States of America 94(3):814-819.
Miller DP, Liu G, De Vivo I, Lynch TJ, Wain JC, Su L, Christiani DC. 2002. Combinations of the variant genotypes of GSTP1, GSTM1, and p53 are associated with an increased lung cancer risk. Cancer Research 62(10):2819-2823.

Moorjani S, Roy M, Gagne C, Davignon J, Brun D, Toussaint M, Lambert M, Campeau L, Blaichman S, Lupien P. 1989. Homozygous familial hypercholesterolemia among French Canadians in Quebec Province. Arteriosclerosis 9(2):211-216.
Obach RS, Walsky RL, Venkatakrishnan K, Gaman EA, Houston JB, Tremaine LM. 2006. The utility of in vitro cytochrome P450 inhibition data in the prediction of drug-drug interactions. Journal of Pharmacology and Experimental Therapeutics 316(1):336-348.
Pirmohamed M, Park BK. 2001. Genetic susceptibility to adverse drug reactions. Trends in Pharmacological Sciences 22(6):298-305.
Pritchard JK, Rosenberg NA. 1999. Use of unlinked genetic markers to detect population stratification in association studies. American Journal of Human Genetics 65(1):220-228.
Quinn CT, Miller ST. 2004. Risk factors and prediction of outcomes in children and adolescents who have sickle cell anemia. Hematology/Oncology Clinics of North America 18(6 SPEC.ISS.):1339-1354.
Rebbeck TR, Walker AH, Phelan CM, Godwin AK, Buetow KH, Garber JE, Narod SA, Weber BL. 1997. Defining etiologic heterogeneity in breast cancer using genetic biomarkers. Progress in Clinical and Biological Research 396:53-61.
Rimoin DL, Connor JM, Pyeritz RE, Korf BR, editors. 2002. Emery and Rimoin’s Principles and Practice of Medical Genetics Vol. 2. 4th edition. New York: Churchill Livingstone.
Risch NJ. 2000. Searching for genetic determinants in the new millennium. Nature 405(6788):847-856.
Satten GA, Flanders WD, Yang Q. 2001. Accounting for unmeasured population substructure in case-control studies of genetic association using a novel latent-class model. American Journal of Human Genetics 68(2):466-477.
Sing CF, Stengard JH, Kardia SLR. 2003. Genes, environment, and cardiovascular disease. Arteriosclerosis, Thrombosis, and Vascular Biology 23:1190-1196.
Smith G. 2005. The Genomics Age: How DNA Technology Is Transforming the Way We Live and Who We Are. New York: AMACOM.
Steinberg MH. 2005. Predicting clinical severity in sickle cell anaemia. British Journal of Haematology 129(4):465-481.
Stuart MJ, Nagel RL. 2004. Sickle-cell disease. Lancet 364(9442):1343-1360.
Syvanen AC. 2005. Toward genome-wide SNP genotyping. Nature Genetics 37(Suppl):S5-S10.
Thompson MW, McInnes RR, Willard, editors. 1991. Thompson & Thompson Genetics in Medicine. 5th edition. Philadelphia, PA: W.B. Saunders Company.
Wacholder S, Rothman N, Caporaso N. 2000. Population stratification in epidemiologic studies of common genetic variants and cancer: Quantification of bias. Journal of the National Cancer Institute 92(14):1151-1158.
Wang X, Tomso DJ, Liu X, Bell DA. 2005. Single nucleotide polymorphism in transcriptional regulatory regions and expression of environmentally responsive genes. Toxicology and Applied Pharmacology 207(2 Suppl):84-90.
Wang Z, Fan H, Yang HH, Hu Y, Buetow KH, Lee MP. 2004. Comparative sequence analysis of imprinted genes between human and mouse to reveal imprinting signatures. Genomics 83(3):395-401.
Weinshilboum R. 2003. Inheritance and drug response. New England Journal of Medicine 348(6):529-537.
Wu JH, Lo SK, Wen MS, Kao JT. 2002. Characterization of apolipoprotein E genetic variations in Taiwanese association with coronary heart disease and plasma lipid levels. Human Biology 74(1):25-31.
Zhou W, Liu G, Miller DP, Thurston SW, Xu LL, Wain JC, Lynch TJ, Su L, Christiani DC. 2003. Polymorphisms in the DNA repair genes XRCC1 and ERCC2, smoking, and lung cancer risk. Cancer Epidemiology, Biomarkers and Prevention 12(4):359-365.