Why does gene duplication occur
Ectopic recombination is typically mediated by sequence similarity at the duplicate breakpoints, which form direct repeats. Repetitive genetic elements, such as transposable elements, offer one source of repetitive DNA that can facilitate recombination, and they are often found at duplication breakpoints in plants and mammals.
Replication slippage is an error in DNA replication, which can produce duplications of short genetic sequences. During replication, DNA polymerase begins to copy the DNA, and at some point during the replication process, the polymerase dissociates from the DNA and replication stalls. When the polymerase reattaches to the DNA strand, it aligns the replicating strand to an incorrect position and incidentally copies the same section more than once.
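The slippage event described above can be illustrated with a small sketch. It is a toy model, not a simulation of real polymerase behavior: the function name `replicate_with_slippage` and the parameters `slip_at` and `backtrack` are illustrative assumptions, and the copy reuses the template letters rather than their complements so the duplicated stretch is easy to see.

```python
def replicate_with_slippage(template: str, slip_at: int, backtrack: int) -> str:
    """Copy `template`, letting the polymerase slip once.

    After `slip_at` bases have been copied, the polymerase dissociates and
    reattaches `backtrack` bases upstream, so that stretch is copied twice.
    Toy model: the "copy" reuses the template letters instead of complements.
    """
    if not 0 <= backtrack <= slip_at <= len(template):
        raise ValueError("need 0 <= backtrack <= slip_at <= len(template)")
    copied_before_stall = template[:slip_at]
    resume_position = slip_at - backtrack  # misaligned restart point
    return copied_before_stall + template[resume_position:]


# A CAG repeat gains one extra unit when the polymerase realigns 3 bases back.
template = "ATGCAGCAGTTC"
product = replicate_with_slippage(template, slip_at=9, backtrack=3)
print(product)  # ATGCAGCAGCAGTTC
```

With `backtrack=0` the function returns the template unchanged, which corresponds to a stall that resumes at the correct position.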
Replication slippage is also often facilitated by repetitive sequence but requires only a few bases of similarity. During cellular invasion by a replicating retroelement or retrovirus, viral proteins copy their genome by reverse transcribing RNA to DNA. If viral proteins attach irregularly to cellular mRNA, they can reverse-transcribe copies of genes to create retrogenes.
Retrogenes usually lack intronic sequence and often contain poly-A sequences that are also integrated into the genome. Many retrogenes display changes in gene regulation in comparison to their parental gene sequences, which sometimes results in novel functions.
Aneuploidy occurs when nondisjunction at a single chromosome results in an abnormal number of chromosomes. Aneuploidy is often harmful and in mammals regularly leads to spontaneous abortions. Some aneuploid individuals are viable. For example, trisomy 21 in humans leads to Down syndrome, but it is not fatal.
Aneuploidy often alters gene dosage in ways that are detrimental to the organism and is therefore unlikely to spread through populations. However, a growing body of data suggests that these changes occur on selective medium in cells that are not growing (Hastings et al.). Current evidence suggests that revertants arise from rare cells with multiple copies of the lac genes. Even though exchanges between short repeats may underlie many spontaneous duplications, the rate and dependencies of forming specific short junctions (SJs) are difficult to study because the selected events can occur among a very wide variety of site pairs.
This problem is minimized by assays that select for exchanges between specific short sequences that fuse two distant chromosomal regions and cause expression of a gene at the duplication junction. These assays severely limit the possible recombining sites to the few that form a join point, which provides the selectable phenotype. This method was first tested some time ago using a promoterless histidine operon with a silent hisD gene (Anderson and Roth). Most revertant duplications fused the hisD gene to the end of the argAB operon kb away (Anderson and Roth; Shyamala et al.).
The responsible exchanges occurred between two nearly identical bp repetitive extragenic palindromic (REP) elements present within both the argAB and his operons.
Other less-frequent duplications fused hisD to alternative active promoters by exchanges between less-similar REP elements. In this assay, a recA mutation reduced the yield of arg-hisD duplications only about sixfold, and some arg-hisD fusions were recovered among the residual revertants that arose without RecA. The contribution of RecA to these exchanges may reflect enhanced recovery of mutants that amplified the join-point element (Conner). It has been suggested that the RecA-independent duplications between REP elements may be generated by side activities of DNA gyrase or topoisomerase, which have been shown to bind and cleave REP elements (Shyamala et al.).
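How an exchange between two copies of a short repeat can place a silent gene next to an active promoter is easier to see in a toy example. The sketch below treats a chromosome as a list of named segments and performs an unequal exchange between two repeat copies on sister chromosomes. The segment order, the names `P_arg`, `spacer`, and `tail`, and the function name are hypothetical and chosen only for illustration; they are not the actual map of the assay strain.

```python
from typing import List, Tuple


def unequal_exchange(chrom: List[str], i: int, j: int) -> Tuple[List[str], List[str]]:
    """Exchange between the repeat at index i on one sister chromosome
    and the same repeat at index j (i < j) on the other.

    Returns (duplication_product, deletion_product).
    """
    if not (0 <= i < j < len(chrom)) or chrom[i] != chrom[j]:
        raise ValueError("i and j must index two copies of the same repeat, i < j")
    # Sister 1 up to (not including) its repeat at j, joined to sister 2
    # starting at its repeat at i: the region between the repeats is doubled.
    duplication = chrom[:j] + chrom[i:]
    # The reciprocal product has lost the region between the repeats.
    deletion = chrom[:i] + chrom[j:]
    return duplication, deletion


# Hypothetical layout: a promoterless hisD downstream of one REP copy,
# and an expressed operon ending at a second REP copy.
toy = ["REP", "hisD", "spacer", "P_arg", "argAB", "REP", "tail"]
duplication, deletion = unequal_exchange(toy, 0, 5)
print(duplication[4:7])  # ['argAB', 'REP', 'hisD']
```

In the duplication product the join point reads `argAB`, `REP`, `hisD`, so in this toy layout the silent gene now sits downstream of the active promoter, which is the kind of selectable junction the assay relies on.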
Duplications with SJs form without homologous recombination. How can exchanges occur between such short sequences (sometimes zero base pairs)? Some models involve single-strand annealing between overhanging short complementary sequences from widely separated parts of the chromosome that can pair without need for strand invasion (RecA). The following mechanisms are not mutually exclusive, but some invoke discontinuous events, whereas others propose a multistep process.
All try to solve the basic problem of recombination-independent joining of separated short sequences. Illegitimate recombination has been defined as exchanges between sequences with little or no similarity, whether or not they lead to duplication or some other rearrangement type. Exchanges between one site within the prophage and a second site in the neighboring region of the E. These events have been placed into two classes.
Gyrase-mediated exchanges might also contribute to the REP-mediated events described above because gyrase has been found to cause recombination in vitro and in vivo (Naito et al.). Given the sensitivity of these assays and their association with phage growth, it is not clear how heavily these pathways contribute to gene duplication in the bacterial chromosome. The events described by Ikeda and coworkers could also involve processes such as template switching or TID modification, as described below.
This model produces duplications within a single chromosome without need for any genetic exchange between sister chromosomes (Kugelberg et al.). The basic TID unit is actually a triplication of a region with copies in alternating orientation (head-to-head, tail-to-tail), whose formation is thought to be initiated at quasipalindromic sequences.
A symmetrical TID is diagrammed in Figure 6. Two mechanisms have been suggested to explain TID formation and are outlined below. Once the basic TID is formed, it can amplify by recombinational exchanges between the direct-order repeats that flank the central inverse-order copy, much like the amplification drawn for standard tandem duplications in Figure 1B. Rearrangements of this type have been seen in two situations.
Formation of a tandem inversion duplication (TID). Template switching to the opposite strand by this replication track would be aided by a second palindrome or closely placed inverse repeat. Resolution or replication leaves three copies of the intervening region: two copies in direct order with a central third copy in inverse order.
This same process can in principle operate at a single-strand nick far from a replication fork. The product is a symmetrical TID (sTID) whose two junctions have short parental palindromes that have been extended in the sTID and may be prone to remodeling by deletion (Kugelberg et al.).
It is proposed that observed asymmetric join points form when deletions remove the initial palindrome and leave an asymmetric join point generated at the site of the deletion. A single large deletion that removes both junctions and the central inverse-order copy can generate a simple tandem repeat with a short-junction (SJ) sequence.
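The structure of a symmetrical TID and its collapse into a simple tandem repeat can be sketched with oriented segments. This is only an illustration of the copy arrangement, under the assumption that a region can be written as a list of `(marker, orientation)` pairs; the helper names are hypothetical, and the sketch says nothing about how the TID forms.

```python
from typing import List, Tuple

Segment = Tuple[str, int]  # (marker name, +1 direct / -1 inverse orientation)


def invert(region: List[Segment]) -> List[Segment]:
    """Reverse the marker order and flip every orientation."""
    return [(name, -orientation) for name, orientation in reversed(region)]


def make_symmetrical_tid(region: List[Segment]) -> List[Segment]:
    """Direct copy, central inverse copy, direct copy."""
    return region + invert(region) + region


def delete(array: List[Segment], start: int, end: int) -> List[Segment]:
    """Remove array[start:end], as a deletion between two join points."""
    return array[:start] + array[end:]


region = [("a", 1), ("b", 1), ("c", 1)]
tid = make_symmetrical_tid(region)

# Head-to-head junction (c+ next to c-) and tail-to-tail junction (a- next to a+).
head_to_head = (tid[2], tid[3])
tail_to_tail = (tid[5], tid[6])

# One large deletion removing the central inverse copy, and with it both
# junctions, leaves a simple head-to-tail tandem duplication.
tandem = delete(tid, 3, 6)
print(tandem == region + region)  # True
```

Deleting a smaller span around only one junction would instead leave a TID with an asymmetric join point at that junction.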
Another model achieves the same end point by template switching across two diverging replication forks (Brewer et al.). The same structures can be explained by the microhomology-mediated break-induced replication (MMBIR) model described below (Hastings et al.). The simplest example is a TID amplification found in yeast after generations of growth under selection for increased dosage of a sulfate transporter (Araya et al.).
The rearrangement has five tandem copies of the same chromosomal region in alternating orientations. The basic TID has two junction types, one between head-to-head copies and another between tail-to-tail copies. Each junction has a short quasipalindromic sequence that was present in the parent chromosome (see Fig.). In the symmetrical TID, these palindromes are extended through the entire inverse-order repeat. Two models to explain the origin of the TID are outlined below.
In this yeast example, the initial symmetrical TID was presumably amplified further by subsequent recombination between direct-order repeats within the TID. Unlike the symmetrical TID junctions seen in yeast, the asymmetrical bacterial TID junctions do not form extended symmetrical palindromes and have repeats in each orientation that are of different sizes (see Fig.). The asymmetric lac repeats can be explained as forming from an inferred initial symmetrical TID (as observed in yeast), but with junctions that are later remodeled by deletions that remove the original palindromic junctions and render the join points asymmetric (see the bottom of Fig.).
Formation of these deletions may be stimulated by the palindromic character of the junction (Sinden et al.). If the junctions are deleted individually, the final product is a TID with two asymmetric junctions (Kugelberg et al.).
The remodeling deletions arise between short direct repeats as is typical for deletion events. These short sequences were in inverse order in the parent but were brought into direct order by the TID. The model we favor to explain formation of the initial TID Fig.
Replication continues away from the fork until a switch is made back to the original leading-strand template (see Fig.). This produces a branched structure whose replication or breakage (where indicated) leaves a symmetrical TID of the type described above and amplified in yeast. The extended palindromic junctions are subject to frequent deletions, especially in bacteria, where torsional supercoiling may favor hairpin extrusion (Sinden et al.).
If a single deletion removes both TID junctions with the central repeat, the product is a simple tandem head-to-tail SJ duplication, which can amplify and become an unstable revertant with a standard direct-order tandem repeat. The key to the TID model is the formation of snap-back structures and their use in priming repair synthesis. Considerable evidence supports such replication at snap backs in phage, bacteria, and yeast (Ripley; Papanicolaou and Ripley; Butler et al.).
Similar synthesis from snap-back palindromic sequences has been suggested for the breakage-fusion-bridge (BFB) model described below. A head-to-head amplification resembling the TID structure described above was found in yeast following prolonged growth under selection (Araya et al.).
This replication extends away from the fork of origin and toward another fork moving away at the opposite end of a replication bubble. There it switches back to the original leading strand template with the aid of a second palindrome. Switches at both diverging forks produce a head-to-head dimeric circle, which can be extracted and integrated into the chromosome to yield a symmetrical TID.
Fork interactions at replication bubbles were also proposed to explain duplication by unequal translocation in human cancer cell lines (Howarth et al.). The formation of TID duplications has also been explained using a template-switching model that does not restrict template switches to replication fork regions (Hastings et al.). In several models, joining of dissimilar sequences is attributed to replication template switching.
Although these models can, in principle, explain the origin of TIDs with SJ sequences, they are mechanistically a bit tortuous and do little more than restate the features of the duplications they explain. A subsequent model, called microhomology-mediated break-induced replication (MMBIR), builds on break-induced replication (Anand et al.).
This replication start juxtaposes sequence from the priming strand with that of the newly synthesized template complement. This illegitimate initiation is said to become more likely during growth inhibition because of repression of the enzymes responsible for homologous recombination (Hastings et al.).
The fork made in this way is unstable and subject to collapse during subsequent replication. The process of RecA-independent replication initiation has been shown in vitro (Li and Marians; Kurth et al.). These models may account for the complex rearrangements inferred to occur in some metazoan genomes (Liu et al.). The models can also accommodate the TIDs found in bacteria and yeast.
Some time ago, Barbara McClintock suggested the BFB cycle as a way of forming inversion duplications and amplifications during mitosis (McClintock). In her model, a chromosome breaks before replication and the broken ends of two copies fuse to generate a dicentric chromosome (see Fig.).
At cell division, this dicentric breaks asymmetrically to form one chromosome with a terminal duplication and another with a corresponding deletion. The duplication-bearing chromosome lacks telomeres and replicates to form sisters that are subject to fusion and formation of another dicentric. Breakage then produces a chromosome with four copies of the repeat (two pairs of inverse-order repeats).
Multiple mechanisms have been suggested to explain the breakage, fusion, and stabilization of the final product, but repetition of the BFB cycle continues to generate higher and higher copy number amplifications of inverse-order repeats.
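The bookkeeping of the BFB cycle can be followed in a short sketch. It assumes a chromosome arm can be written as a list of markers running from the centromere (`"CEN"`) to a broken, telomere-free end; the function name `bfb_cycle` and the chosen break positions are illustrative assumptions, and the mechanisms of breakage and fusion are not modeled.

```python
from typing import List, Tuple


def bfb_cycle(arm: List[str], break_index: int) -> Tuple[List[str], List[str]]:
    """One breakage-fusion-bridge round on a telomere-free arm.

    The replicated sisters fuse at their broken ends to form a dicentric,
    which then breaks at `break_index` (counted along the dicentric).
    Both products are returned reading outward from their centromere.
    """
    dicentric = arm + arm[::-1]  # sister fusion, head to head
    if not len(arm) < break_index < len(dicentric):
        raise ValueError("break must fall beyond the fusion point (asymmetric)")
    gained = dicentric[:break_index]       # carries a terminal inverted duplication
    lost = dicentric[break_index:][::-1]   # carries the corresponding deletion
    return gained, lost


arm = ["CEN", "a", "b", "c"]
first_gain, first_loss = bfb_cycle(arm, break_index=6)
print(first_gain)  # ['CEN', 'a', 'b', 'c', 'c', 'b']

# The gained chromosome still lacks a telomere, so the cycle can repeat.
second_gain, _ = bfb_cycle(first_gain, break_index=10)
print(second_gain.count("c"))  # 4
```

After two rounds the marker `c` is present in four copies arranged as two inverse-order pairs, matching the outcome described above for repeated cycles.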
Based on the behavior of chromosomes during development of the Tetrahymena macronucleus, a model was proposed in which a palindromic sequence produces a break.
Some possible mechanisms are in Figure 8. A terminal snap back at this end can prime replication leading to the formation of a dicentric chromosome that initiates the BFB cycle. Asymmetric breakage of the dicentric at cell division leaves an inversion duplication centered on the palindrome.
The telomeric ends of the original chromosome are lost. The final product carries inverse-order repeats of regions of various sizes, whose junctions are symmetrical and subject to remodeling to form asymmetric junctions.
Palindrome-initiated amplification events of this type have been shown in yeast by Lobachev and coworkers (Lobachev et al.). It should be noted that snap-back primer extensions of the types suggested in Figure 8 may also occur at single-strand nicks that generate TIDs as described in Figure 7.
The breakage-fusion-bridge (BFB) cycle. Suggested many years ago by Barbara McClintock, this model explains the alternating orientation of copies seen in some amplification arrays.
Issues are the source of the initial breaks, the forces that break a dicentric, the mechanisms of end fusions, and the stabilization of an array by blocking further end fusions. Several of these issues have been solved conceptually by the behavior of palindromic sequences.
Use of palindromic sequences for induction of breaks and fusions in the breakage-fusion-bridge (BFB) model. The frequent association of palindromic sequences with amplifications in mammalian cells suggested various ways in which they might contribute to the events in the BFB model.
A break generated near a palindrome (left side, top) can leave ends whose snap-back primes repair synthesis and serves to generate a dicentric chromosome (left side). A cruciform structure can be cut to leave snap-back ends that can similarly prime replication to form a dicentric.
Heavy black lines denote duplex DNA and lighter black lines denote single strands. Ends lacking telomeres are likely to be subject to fusion and continued rounds of the cycle. Most models for duplication formation, like those for point mutations, propose a single discontinuous event or a cascade of immediately sequential events with intermediate structures that cannot be inherited. However, several other duplication models described here involve multistep processes in which intermediate forms are heritable and therefore subject to remodeling and selection over multiple cell generations.
This is notably true of the TID formation process in which the initial symmetrical duplication can be remodeled by deletion and amplified over multiple generations. This is also true of the BFB model in which multiple cell generations may be required to increase repeat copy number. In such processes, duplications can form over several generations.
Selection can progressively favor steps in their initial formation and later modification, as they lead to their higher amplification. The basic idea is that the initial duplication may form at a high rate and provide some modest selective benefit in excess of its cost.
Cell growth is further improved if secondary rearrangements reduce the fitness cost, perhaps by removing selectively unimportant parts of the repeated unit. Lower cost allows higher selective amplification of a tandem duplication or a TID, thus improving growth.
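The argument that a lower cost per copy permits higher amplification can be made concrete with a toy fitness function. Everything in this sketch is an assumption made for illustration: the benefit is taken to saturate with copy number, the cost is taken to grow linearly with each extra copy, and the parameter values are arbitrary rather than measured.

```python
def relative_fitness(copies: int, cost_per_copy: float,
                     max_benefit: float = 0.5, retention: float = 0.7) -> float:
    """Toy growth advantage of a cell carrying `copies` of the repeated unit.

    Benefit saturates with diminishing returns; cost is linear in extra copies.
    """
    extra = copies - 1
    benefit = max_benefit * (1 - retention ** extra)
    return 1 + benefit - cost_per_copy * extra


def best_copy_number(cost_per_copy: float, max_copies: int = 20) -> int:
    """Copy number with the highest toy fitness, searched up to `max_copies`."""
    return max(range(1, max_copies + 1),
               key=lambda n: relative_fitness(n, cost_per_copy))


costly_unit = best_copy_number(cost_per_copy=0.05)   # large repeated unit
trimmed_unit = best_copy_number(cost_per_copy=0.01)  # after a cost-reducing deletion
print(costly_unit, trimmed_unit)  # 5 9
```

Under these assumed parameters the favored copy number rises from 5 to 9 when the per-copy cost drops, which is the qualitative point of the paragraph above; the specific numbers carry no biological meaning.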
In the case of a symmetrical TID, junctions are extended quasi-palindromes that are subject to cutting. These junctions can be stabilized by deletions that render the junctions asymmetric or remove the entire central inverse repeat to form a simple tandem head-to-tail SJ duplication. Such nearly perfect palindromes are known to stimulate their own deletion (Sinden et al.).
These deletions reduce duplication cost and increase the stability of the array. The intermediates in this process are all replicable, allowing duplications to be completed over several cell generations. The observed duplication and amplification junction sequences may reflect these secondary remodeling events rather than initial duplication formation. Remodeling of palindromic junctions may be especially common in bacteria, where torsional supercoiling can drive hairpin formation and contribute to deletions of symmetrical TID junctions.
This may explain the TID amplifications observed in the Cairns system. In eukaryotic chromosomes, toroidal supercoiling may minimize the rate of palindrome remodeling and allow quasipalindromic inversion junctions to persist.
This could explain why yeast retains unmodified sTID junctions, whereas bacterial TID junctions are usually asymmetric. However, double hairpin structures are invoked in models for chromosome breakage at palindromes in yeast and mammalian cells (Akgun et al.). These results suggest that even in eukaryotes, palindromes may stimulate formation of deletions that remodel duplications and make them easier to amplify selectively. The selective remodeling of duplication junctions may make it difficult to infer formation mechanisms from the structure of ancient or even recent segmental duplications.
The spacing and relative sizes of the repeated copies are likely to depend on the source of the breaks and the extent of selective modification. Amplifications are prominent genomic features of many types of malignant cells, particularly those of solid tumors (Albertson). Similarly, amplifications often confer resistance to inhibitors used in cancer chemotherapy.
Understanding how these amplifications are initiated and how they expand under selection is important for cancer prevention and therapy and for predicting the ultimate course of the disease. The models described above emphasize genetic methods to isolate duplications and distinguish them from subsequent modification and amplification events.
In contrast, cancer-associated amplifications are discovered in final form in genomes of malignant cells. Their structures are characterized and interpreted to suggest and support models for their formation. Many of the interpreted structures are likely to have developed during many generations under selection for improved growth. Data on amplifications in cancer cells come from different tumor types and from resistance selections of different stringencies imposed on cultured cell lines.
These results may not all be interpretable in terms of a single consistent model. However, one hopes that some unifying principles may ultimately emerge.
More importantly, one hopes that an understanding of amplification in microorganisms will help clarify the metazoan process. This seems possible because structures observed and characterized in bacteria and yeast are often similar to those in mammalian cells. Much of the amplification literature on cancer and some on bacteria (Galhardo et al.). This general viewpoint may also prompt models for sudden formation of amplified arrays rather than stepwise expansion of duplications under selection (Hyrien et al.).
Haber and Debatisse pointed out that secondary rearrangements complicate the interpretation of cancer cell amplifications. This problem increases if one considers that these structures can develop over many generations under selection.
Selection would appear especially important for amplification in situations that allow frequent recombination between direct repeats in sister chromosomes or homologs. In these situations, amplifications are unstable and continuous copy loss is likely to restrict achievable copy number and be opposed by positive selection.
Selection may favor changes that minimize loss rates and fitness costs so as to allow higher amplification of copy number. In somatic cells, infrequent mitotic recombination may limit copy increases within a tandem array.
However, this lower rate of copy number increase may be balanced by similar reduction in loss rate. In bacteria, the frequency of cells with an unselected duplication and arguably the degree of amplification of any particular array comes to a steady state.
This occurs when the rate of copy gain on one hand balances the rate of copy loss and fitness cost on the other hand (Reams et al.). Selection essentially reduces the fitness cost of these arrays and allows the steady-state level to increase. Thus, short amplifications may arise and persist in somatic cells before selection and then show rapid expansion and modification when selection is imposed, as might occur during tumor progression.
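The balance described above can be written as a short worked equation. This is a simplified discrete-generation approximation offered as a sketch, not the formulation used in the cited work; the symbols $f_t$, $k_F$, $k_L$, and $s$ are introduced here as assumptions, and the approximation holds only while duplication-bearing cells remain rare and the rates are small.

```latex
% f_t : fraction of cells carrying the duplication in generation t (assumed small)
% k_F : formation rate, k_L : loss rate, s : fitness cost (all per generation)
\begin{align}
f_{t+1} &\approx (1-s)(1-k_L)\, f_t + k_F \\
f^{*}   &= \frac{k_F}{1-(1-s)(1-k_L)} \approx \frac{k_F}{k_L + s}
\end{align}
```

Setting $f_{t+1} = f_t = f^{*}$ in the first line gives the second. In this form, lowering the fitness cost $s$ shrinks the denominator and raises the steady-state frequency, consistent with the statement that reduced cost allows the steady-state level to increase.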
The effects of selection stringency and copy gain and loss rates may explain the evidence that amplifications are common in transformed cell lines but effectively absent from a normal somatic cell population (Tlsty; Wright et al.). We suspect that the selections used in these tests demand cells that already have many copies of the targeted gene. In transformed lines, a higher mitotic recombination rate or a higher level of preexisting duplications may allow cells to expand a short preexisting array and become fully resistant.
Cells unable to expand their arrays may not survive the selection. That is, cells with more breaks, or less apoptosis in response to breaks, may show better amplification in response to stringent selection. It has been claimed that amplifications in mammalian cells are only initiated after imposition of selection (Tlsty et al.). This conclusion was based on the negative result of fluctuation tests designed to show preexisting genetic changes (Luria). Unlike point mutations, duplications and short amplifications come to steady-state frequencies because of their high reversion rates and fitness cost as described above (Reams et al.).
The forces responsible for these steady states obscure frequency differences between cultures caused by differences in the timing of the initial duplication event. That is, any frequency elevation attributable to an early duplication event is returned to the steady state and fluctuation is not seen.
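Why a fluctuation test can miss preexisting duplications is illustrated by the sketch below. It compares two cultures that differ only in how early the first mutant arose, modeled as a high versus low starting frequency, and follows the ratio of their mutant frequencies. The recursion, the function names, and all rate values are assumptions chosen for illustration, not measurements.

```python
def next_frequency(f: float, formation: float, loss: float, cost: float) -> float:
    """One generation: selection against carriers, then gain and loss of the change."""
    selected = f * (1 - cost) / (f * (1 - cost) + (1 - f))
    return selected * (1 - loss) + (1 - selected) * formation


def jackpot_ratio(generations: int, formation: float, loss: float, cost: float,
                  early: float = 1e-3, late: float = 1e-6) -> float:
    """Ratio of mutant frequencies between an 'early event' culture and a
    'late event' culture after the given number of generations."""
    a, b = early, late
    for _ in range(generations):
        a = next_frequency(a, formation, loss, cost)
        b = next_frequency(b, formation, loss, cost)
    return a / b


# Point mutation: rare, stable, and cost-free, so an early event stays visible.
point_ratio = jackpot_ratio(300, formation=1e-8, loss=0.0, cost=0.0)

# Duplication: frequent formation, frequent loss, and a fitness cost, so both
# cultures relax to the same steady state and the early event leaves no trace.
duplication_ratio = jackpot_ratio(300, formation=1e-5, loss=1e-2, cost=0.05)

print(point_ratio > 100, duplication_ratio < 1.01)  # True True
```

In this toy setting the point-mutation cultures still differ by more than a hundredfold after 300 generations, whereas the duplication cultures have become indistinguishable, so the absence of fluctuation does not show that the duplications were absent before selection.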
This problem arose in the Cairns bacterial selection system (Cairns and Foster), where the absence of fluctuation led to the initial conclusion that new mutants are initiated under selection. These tests could not detect preexisting copy number variants, which now seem likely to be responsible for initiating revertants (Sano et al.).
The predominant model for gene duplication in mammalian cells is the BFB cycle described above in Figures 7 and 8. Support for this model reflects its ability to account for several troublesome features of mammalian gene amplifications. Mammalian gene amplifications are often tandem arrays of copies in alternating orientation (TID).
Moreover, the tandem duplicates (TDs) were also involved in the recognition of pollen and single organism reproductive processes, suggesting their potential roles in the process of self-incompatibility.
In addition, the proximal duplicates (PDs) were also related to immune response and stimulus or stress responses, implying roles in plant adaptation. For instance, TDs may play an important role in the expansion of some transcription factor families (Lehti-Shiu et al.). Thus, the increasing number of TD- and PD-derived genes after whole-genome duplication (WGD) can enhance the level of plant resistance to abiotic and biotic stresses. Gene dosage balance has been suggested as an important driving force in maintaining WGD genes and increasing morphological complexity (Freeling and Thomas; Birchler and Veitia). The purifying selection driven by dosage-balance constraints can eliminate deleterious mutations and protect both gene copies from functional divergence.
This result can be largely explained by the dosage-balance hypothesis, which suggests that purifying selection maintains the ancestral functions of two gene copies and prevents the divergence of duplicate genes to maintain the stoichiometric balance. In addition, the duplicated genes involved in signal transduction, transcriptional regulation, and macromolecular complexes tend to be preferentially retained after WGD, which can be attributed to the dosage constraint (Blanc and Wolfe; Paterson et al.).
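The stoichiometric reasoning behind the dosage-balance hypothesis can be shown with a minimal sketch. It assumes a complex built from subunits in a 1:1 ratio and assumes that subunit abundance is proportional to gene copy number; both are simplifications introduced here, and the function name is hypothetical.

```python
from typing import Dict, Tuple


def assembled_and_excess(subunit_copies: Dict[str, int]) -> Tuple[int, Dict[str, int]]:
    """Number of complete 1:1 complexes and the unpaired surplus of each subunit."""
    complexes = min(subunit_copies.values())
    excess = {name: count - complexes for name, count in subunit_copies.items()}
    return complexes, excess


ancestral = assembled_and_excess({"A": 1, "B": 1})
whole_genome = assembled_and_excess({"A": 2, "B": 2})   # both partners doubled
single_gene = assembled_and_excess({"A": 2, "B": 1})    # only A duplicated

print(whole_genome)  # (2, {'A': 0, 'B': 0})
print(single_gene)   # (1, {'A': 1, 'B': 0})
```

Doubling both partners keeps the complex balanced, while duplicating one partner alone leaves unpaired subunit. By the same arithmetic, losing one copy of a balanced pair after WGD would recreate the imbalance, which is one way to read the preferential retention of such genes.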
Here, the gene dosage-balance model is further supported by the enrichment in GO terms for regulatory and metabolic genes among the WGD duplicates detected in the pear genome. In summary, we identified the different modes of duplicated genes in the pear genome. Widespread sequence, expression, and regulatory divergence have occurred between duplicated genes over 30-45 million years of evolution after the recent WGD event in pear.
Different modes of duplicate genes exhibited biased functional roles. The results from this study enhance our understanding of the evolution and retention mechanisms of duplicated genes. SZ and XQ conceived and designed the experiments. XQ carried out the experimental design, data analysis, and drafted the manuscript.
JyW and JW contributed advice. SZ managed the research and experiments.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Arsovski, A. Evolution of cis-regulatory elements and regulatory networks in duplicated genes of Arabidopsis. Plant Physiol.
Ashburner, M. Gene ontology: tool for the unification of biology.
Bekaert, M. Two-phase resolution of polyploidy in the Arabidopsis metabolic network gives rise to relative and absolute dosage constraints. Plant Cell 23.
Birchler, J. The gene balance hypothesis: from classical genetics to modern genomics. Plant Cell 19.
Gene balance hypothesis: connecting issues of dosage sensitivity across biological disciplines.
Blanc, G. Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. Plant Cell 16.
Bolger, A. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30.
Bray, N. Near-optimal probabilistic RNA-seq quantification.
Cardoso-Moreira, M. Evidence for the fixation of gene duplications by positive selection in Drosophila. Genome Res.
Castillo-Davis, C.
Conant, G. Dosage, duplication, and diploidization: clarifying the interplay of multiple models for duplicate gene evolution over time. Plant Biol.
Dassanayake, M. The genome of the extremophile crucifer Thellungiella parvula.
Diss, G. Gene duplication can impart fragility, not robustness, in the yeast protein interaction network. Science.
Dodsworth, S. Is post-polyploidization diploidization the key to the evolutionary success of angiosperms?
Dong, X. The role of membrane-bound ankyrin-repeat protein ACD6 in programmed cell death and plant defense. STKE pe6.
Du, J. Pericentromeric effects shape the patterns of divergence, retention, and expression of duplicated genes in the paleopolyploid soybean. Plant Cell 24, 21.
Eddy, S. PLOS Comput.
Farre, D. Heterogeneous patterns of gene-expression diversification in mammalian gene duplicates.
Fawcett, J. Plants with double genomes might have had a better chance to survive the Cretaceous-Tertiary extinction event.
Finn, R. Pfam: the protein families database. Nucleic Acids Res.
Flagel, L. Gene duplication and evolutionary novelty in plants. New Phytol.
Force, A. Preservation of duplicate genes by complementary, degenerative mutations. Genetics.
Freeling, M. Bias in plant gene content following different sorts of duplication: tandem, whole-genome, segmental, or by transposition.
Many or most genes in Arabidopsis transposed after the origin of the order Brassicales.
Gene-balanced duplications, like tetraploidy, provide predictable drive to increase morphological complexity.
Gene Ontology Consortium. The gene ontology (GO) database and informatics resource.
Goodstein, D. Phytozome: a comparative platform for green plant genomics.
Gout, J. Maintenance and loss of duplicated genes by dosage subfunctionalization.
Grishkevich, V. Gene length and expression level shape genomic novelties.
Gu, Z. Rapid divergence in expression between duplicate genes inferred from microarray data. Trends Genet.
Guo, B. Pervasive indels and their evolutionary dynamics after the fish-specific genome duplication.
Ha, M. External factors accelerate expression divergence between duplicate genes.
Hahn, M. Distinguishing among evolutionary models for the maintenance of gene duplicates.
Hanada, K. Importance of lineage-specific expansion of plant tandem duplicates in the adaptive response to environmental stimuli.
He, X. Rapid subfunctionalization accompanied by prolonged and substantial neofunctionalization in duplicate gene evolution.
Huang, J. An ankyrin repeat-containing protein, characterized as a ubiquitin ligase, is closely associated with membrane-enclosed organelles and required for pollen germination and pollen tube growth in lily.
Hudson, C. Selection for higher gene copy number after different types of plant gene duplications. Genome Biol.
Huerta-Cepas, J. Evidence for short-time divergence and long-time conservation of tissue-specific expression after gene duplication.
Jiang, W. Prevalent role of gene features in determining evolutionary fates of whole-genome duplication duplicated genes in flowering plants.
Jung, S. The genome database for Rosaceae (GDR): year 10 update.