Retrotransposons as Models for the Replication of Retroviruses

  • Henry L. Levin, PhD, Head, Section on Eukaryotic Transposable Elements
  • Angela Atwood-Moore, BA, Senior Research Assistant
  • Atreyi Chatterjee, PhD, Visiting Fellow
  • Yujin Cui, BA, Volunteer
  • Robin Cutler, BA, Postbaccalaureate Fellow
  • Hirotaka Ebina, PhD, Visiting Fellow
  • Gang Feng, BA, Graduate Student
  • Yabin Guo, PhD, Visiting Fellow
  • Young-Eun Leem, PhD, Visiting Fellow
  • Anusua Majumdar, PhD, Visiting Fellow
  • Jung Min Park, BA, Postbaccalaureate Fellow
  • Tracy Ripmaster, PhD, Research Assistant

Diseases caused by retroviruses, such as AIDS and leukemia, have intensified the need to understand how these viruses replicate. Our primary objective is to understand how retroviral cDNAs are integrated into the genome of infected cells. Because of their similarities to retroviruses, long terminal repeat (LTR) retrotransposons are important models for retrovirus replication. The retrotransposon under study in our laboratory is the Tf1 element of the fission yeast Schizosaccharomyces pombe. We are particularly interested in Tf1 because of its strong preference for integrating into pol II promoters. This choice of target sites is similar to the strong preferences of human immunodeficiency virus 1 (HIV-1) and murine leukemia virus (MLV) for integrating into pol II transcription units. Little is known about how these viruses recognize their target sites. We therefore study the integration of Tf1 as a model system from which we hope to uncover general mechanisms of target site selection. An understanding of the mechanisms responsible for targeted integration may lead to new approaches for blocking the replication of HIV-1.

The chromodomain of Tf1 integrase promotes binding to cDNA and mediates target site selection

To understand how integration is directed to pol II promoters, we study integrase (IN), the protein of Tf1 that catalyzes the breaking and joining of DNA that occurs at the target site. INs of retroviruses and LTR-retrotransposons contain three distinct domains: the N-terminal domain binds to Zn, the central domain is the catalytic core, and the C-terminal domain possesses nonspecific DNA binding activity. Although the C-terminal domain is the least conserved, some INs have a conserved chromodomain (CHD) similar to those present in a variety of eukaryotic proteins. These CHDs act as interaction modules for methylation marks such as histone H3 methylated at lysine 9. The presence of CHDs in retrotransposon INs suggests that they play a role in the selection of integration sites. We undertook to test the CHD of Tf1 for functions in transposition. In an active transposon expressed in S. pombe, we made substitutions in conserved residues of the CHD and generated a version of Tf1 with an IN that lacked the CHD. The single amino acid substitutions V1290, Y1292, and W1305 and the IN lacking the CHD all resulted in transposons with low transposition activity in vivo. However, blot analyses revealed that each of the mutant transposons produced normal levels of IN and cDNA. In addition, assays that measure homologous recombination of Tf1 sequences revealed that the mutations in the CHD did not inhibit the import of cDNA into the nucleus. Taken together, these results indicated that the mutations in the CHD inhibited transposition by disrupting the process of integration. A critical step in the assembly of the integration complex is the binding of IN to the ends of the cDNA. The technique of chromatin immunoprecipitation (ChIP) now makes it possible to measure the binding of specific proteins to DNA sequences in vivo. We applied a modified ChIP technique to measure the binding of IN to the ends of the cDNA. 
Tools of yeast genetics allowed us, for the first time, to use ChIP to monitor the binding of any IN to cDNA in vivo. We compared wild-type Tf1 with the strain that expressed IN lacking the CHD. Truncation of the CHD resulted in a three-fold decrease in the binding of IN to the downstream LTR in the cDNA. Thus, it appears that the CHD plays an important role in mediating binding of IN to the cDNA.

Figure 1. The CHD is required to direct integration to the promoters of pol II genes.
(A) The plasmid-based targeting assay identified a 160-bp window of integration upstream of ade6. The 45 insertions were isolated for the wild-type (WT) Tf1 in a target plasmid that contained the bub1-ade6 sequence. (B) Transposition events for the Tf1 lacking the CHD no longer clustered within the 160-bp window.

To determine whether the mutations in IN caused defects in the recognition of pol II promoters, we used a plasmid-based targeting assay that measures the integration activity of specific sequences. The assay is based on a strain of S. pombe that contains both a plasmid that expresses Tf1 and one with the target sequence. The strong preference of Tf1 for integrating in the intergenic region containing the divergent promoters of ade6 and bub1 was established previously (Figure 1A). When the bub1-ade6 target plasmid was introduced into a strain of S. pombe that expresses wild-type Tf1, 95% (41 of 45) of the insertions in the plasmid occurred within a 160-bp window in the bub1-ade6 promoters. The strains with single amino acid mutations in the CHD exhibited modest defects in the proportion of their inserts that occurred in the promoter region. However, the Tf1 with IN lacking the CHD exhibited a dramatic defect in the targeting of integration (Figure 1B). To test whether the CHD functioned in directing integration at other promoters, we assayed two additional target plasmids. In both cases the Tf1 lacking the CHD showed a strong defect in integration in the promoter regions. These data indicate that Tf1 IN requires the CHD for target site preference.

The point mutants in the CHD and the IN lacking the CHD showed comparable reductions in transposition frequencies. However, only the truncated CHD caused a profound effect on target recognition. The disparity between truncation and point mutations in their ability to direct integration to the promoters indicates that the CHD has two separate functions. One function contributed significantly to the frequency of integration; the other, as revealed by the truncation, contributed to the selection of target sites.

Ultra-high throughput sequencing of transposon integration provides a saturated profile of target activity in Schizosaccharomyces pombe

The finding that integration in the genome of S. pombe is directed to the promoters of genes raises several key questions about the biology of Tf1 integration. Are all promoters recognized equally, or is integration directed to specific sets of promoters? If specific sets of promoters are preferred targets, what distinguishes them from the promoters not recognized by Tf1? To address these questions, large numbers of integrations throughout the genome of S. pombe must be sequenced. The new methods for ultra-high throughput sequencing make it possible to characterize extraordinarily large numbers of integration events.

To create a genome-wide profile of integration sites, we sequenced large numbers of Tf1 inserts using the pyrosequencing technology of 454 Life Sciences™. Cells were induced for expression of Tf1 containing neo (Tf1-neo) to select for the cells with integration events. We applied ligation-mediated PCR to generate libraries of Tf1-neo associated with the downstream flanking DNA. In this study, we performed four independent transposition experiments (Hap_Mse_1, Hap_Mse_2, Dip_Mse, and Dip_Hpy), which were named according to the strains (haploid or diploid) and restriction enzymes (Mse I or Hpy CH4 IV) used to digest the genomic DNA from the cells with integration events. The cut libraries of DNA were ligated to linkers and subjected to barcoded PCR. The amplified products, consisting of the downstream LTRs and their flanking DNA, were size-selected and submitted to 454 Life Sciences for sequencing.

We obtained a total of 599,760 high-quality sequence reads that were then analyzed with BLAST to determine the chromosomal location of the insertions. In all, we identified 73,125 independent Tf1 integration events in unique positions of the S. pombe genome. The BLAST results of sequences from our first integration library identified 21,848 independent insertions termed Hap_Mse_2. The insertions were broadly distributed across all three chromosomes. To examine the insertion data for preferences, we mapped all 21,848 insertion sites from the Hap_Mse_2 experiment relative to ORFs (Figure 2) and determined the distance from the insertions to the closest ORF. The integration from Hap_Mse_2 showed a clear preference for the first 500 nt upstream of ORFs.

Figure 2. The distance from Tf1 integration sites to the nearest ORF.
The X coordinate is the distance from the 5′ and 3′ ends of ORFs. The Y coordinate shows the number of integration events within bins of 100 bp.
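The mapping behind Figure 2 (distance from each insertion to the closest ORF, counted in 100-bp bins on the 5′ or 3′ side) can be illustrated with a minimal Python sketch. This is not the analysis pipeline used in the study; the function names, the `(start, end, strand)` tuple layout, and the toy coordinates are assumptions made for illustration only.

```python
from collections import Counter

def nearest_orf_offset(pos, orfs):
    """Return (side, distance) from an insertion to its closest ORF.

    orfs is a list of (start, end, strand) tuples (hypothetical layout).
    side is "5prime", "3prime", or "inside"; strand decides which ORF
    end counts as the 5' end.
    """
    best = None
    for start, end, strand in orfs:
        if start <= pos <= end:
            return ("inside", 0)
        if pos < start:
            dist = start - pos
            side = "5prime" if strand == "+" else "3prime"
        else:
            dist = pos - end
            side = "5prime" if strand == "-" else "3prime"
        if best is None or dist < best[1]:
            best = (side, dist)
    return best

def bin_offsets(positions, orfs, bin_size=100):
    """Count insertions per (side, bin start); distances 1..100 fall in bin 0."""
    counts = Counter()
    for pos in positions:
        side, dist = nearest_orf_offset(pos, orfs)
        if side == "inside":
            counts[("inside", 0)] += 1
        else:
            counts[(side, (dist - 1) // bin_size * bin_size)] += 1
    return counts

# Toy data: one ORF on each strand of a single chromosome.
toy_orfs = [(1000, 2000, "+"), (5000, 6000, "-")]
toy_insertions = [950, 900, 2100, 6050, 1500, 4800]
profile = bin_offsets(toy_insertions, toy_orfs)
```

A linear scan over all ORFs is adequate for a toy example; for tens of thousands of insertions, a sorted coordinate list with binary search would be the more practical choice.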

The profile of integration across the genome revealed substantial variation, with some intervals containing 35 to 40 insertions per kb while many others had zero to five insertions per kb. An analysis of integration density for intervals of 10 kb also showed high levels of bias that were incompatible with random selection. The key question about this variation in integration is whether it was due to intrinsic differences in integration efficiency between different sequences in the genome or whether the size of our cultures and the PCR amplification limited our ability to sample the integration potential of each sequence. To distinguish between these two possibilities, we tested whether the levels of integration in individual intergenic sequences were reproducible between two independent experiments. We compared the numbers of integration events in the intergenic regions of the Hap_Mse_2 experiment with the numbers of integration events from the Dip_Mse experiment. Each intergenic region was plotted using the number of integration events identified in the Hap_Mse_2 experiment as the X coordinate and the number of inserts recorded in the Dip_Mse experiment as the Y coordinate. Because each of the 5,045 intergenic regions was plotted, and many intergenic regions had the same X,Y coordinates, we used the Z coordinate to indicate the number of the intergenic regions that had the same X,Y coordinates. The planar distribution of the data points shows that the amount of integration in each intergenic region is similar between the two independent experiments. The R value for the data is 0.95 (R² = 0.91), indicating that there is strong correlation of the integration levels between the two experiments. We performed the comparison between all pairs of the four experiments, and the plots showed similar correlations.
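The comparison between two experiments can be sketched as follows: a Pearson correlation over per-region insertion counts, plus a tally of how many regions share the same X,Y pair (the Z coordinate described above). This is an illustrative sketch only; the function names and the toy counts are assumptions, and the text does not specify which correlation statistic or software was actually used.

```python
import math
from collections import Counter

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of counts."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / math.sqrt(sxx * syy)

def coincident_counts(xs, ys):
    """Z value per (X, Y) point: how many regions share that pair of counts."""
    return Counter(zip(xs, ys))

# Toy per-region insertion counts for two hypothetical experiments.
experiment_a = [0, 0, 3, 12, 40]
experiment_b = [0, 0, 4, 10, 37]
r = pearson_r(experiment_a, experiment_b)
z = coincident_counts(experiment_a, experiment_b)
```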

In the Hap_Mse_2 experiment, 76% of all the insertion events occurred in just 20% of the intergenic sequences. This strong bias is a consequence of the integration preference for a specific set of promoters. One possible explanation was that Tf1 integrated into the promoters with the highest transcription activity. We tested this hypothesis but found no correlation between transcription and integration. In another effort to determine what distinguishes promoters that had high levels of insertions from the promoters that did not, we asked whether the genes associated with the targeted promoters contributed to specific classes of biological function. The results of the gene ontology analysis suggested that genes regulated by environmental stress were among the strongest targets of integration. To examine this further, we sorted all the intergenic sequences from highest number of insertions to the lowest using the Hap_Mse_2 data. Using this order, the intergenic regions were placed into bins of 500 each. We then used published microarray data to tabulate how many of the intergenic regions in each bin contained promoters that are induced at least three-fold by conditions of stress. The bin containing the 500 intergenic regions with the most integration contained the highest number of genes induced by cadmium. The bins with successively lower amounts of integration contained fewer promoters that are induced by cadmium. This relationship indicates that integration has a preference for promoters that are induced by cadmium. Similar preferences were observed for genes induced when cells are treated with hydrogen peroxide or heat. Particularly strong preferences for integration into promoters induced by MMS or sorbitol were observed for the first bin of 500 intergenic regions. The targeting of Tf1 to stress-induced promoters represents a unique response that may function to specifically alter expression levels of stress response genes. 
Although there are no systematic data, integration of Tf1 into the promoter of ade6 and bub1 does stimulate transcription.
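The two calculations described in this paragraph (the share of insertions that fall in the most-targeted fraction of intergenic regions, and the number of stress-induced promoters in each rank-ordered bin) can be sketched in Python. The function names, region identifiers, and toy values are hypothetical; in the actual analysis the set of induced promoters came from published microarray data, which is not reproduced here.

```python
def share_in_top_fraction(counts, fraction=0.2):
    """Fraction of all insertions found in the top `fraction` of regions."""
    ranked = sorted(counts, reverse=True)
    top_n = int(len(ranked) * fraction)
    total = sum(ranked)
    return sum(ranked[:top_n]) / total if total else 0.0

def induced_per_bin(insertions_by_region, induced_regions, bin_size=500):
    """Rank regions by insertion count, split into bins, count induced ones.

    insertions_by_region maps a region id to its insertion count;
    induced_regions is the set of region ids whose promoters are induced
    by a given stress (hypothetical input).
    """
    ranked = sorted(insertions_by_region,
                    key=insertions_by_region.get, reverse=True)
    return [
        sum(1 for region in ranked[i:i + bin_size] if region in induced_regions)
        for i in range(0, len(ranked), bin_size)
    ]

# Toy example with four regions and bins of two.
toy_counts = {"igr_a": 9, "igr_b": 7, "igr_c": 2, "igr_d": 0}
toy_induced = {"igr_a", "igr_b", "igr_d"}
per_bin = induced_per_bin(toy_counts, toy_induced, bin_size=2)
```

A declining count from the first bin to the last is the pattern the text describes for cadmium-induced promoters.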

The size and number of the integration experiments reported here resulted in reproducible measures of integration for each intergenic region and ORF in the S. pombe genome. The reproducibility of the integration activity of each intergenic and ORF sequence from experiment to experiment demonstrates that we have saturated the full set of insertion sites that are actively targeted by Tf1. To our knowledge, this is the first time such a profile of integration data has been assembled.

Genome-wide footprinting of essential sequences in S. pombe

In previous work, we found that the “cut and paste” DNA transposon Hermes from the housefly is highly active in S. pombe. To measure transposition activity, the transposase was expressed in cells that also contained a plasmid-encoded neo flanked by the terminal inverted repeats (TIRs) of Hermes. The transposase cuts out neo with the TIRs and inserts this DNA into the S. pombe genome. The strains are then grown on agar plates containing G418 to select for cells that acquire transposed copies of Hermes. To accurately measure the level of transposition, we conducted a quantitative assay, which allowed us to calculate the number of transposition events per generation. We found that after approximately 25 cell generations, 1.5% to 2.75% of the cells contained insertions. Importantly, 54% of the insertions disrupted ORFs. The high rate of integration and the disruption of ORFs mean that Hermes is suitable for mutagenesis studies. Pilot studies designed to identify mutations in ade6 and ade7 confirmed that Hermes can disrupt genes with frequencies that are consistent with a random distribution of insertions.

The highly efficient mutagenesis mediated by Hermes and the new technology of ultra-high throughput sequencing provided us with the opportunity to develop a unique strategy for mapping essential sequences throughout the genome of S. pombe. This strategy, if successful, could be used more generally to identify genes that function in specific processes. We realized that with one Illumina® sequencing run and the strategy described above of ligation-mediated PCR, we could map the position of 7 million insertions. With a genome of 14 million base pairs, this would produce one insertion for every two base pairs of the genome. Given that S. pombe can be grown as a haploid, we predicted that insertions that disrupt essential sequences would be lethal and such events would not accumulate in our cultures. We have now tested this concept with one culture that contained 1 billion insertions. After five Illumina® runs, we obtained 50 million high-quality sequence reads. With these data, we identified 360,000 unique insertion events that divide the ORFs into one class that has a high average number of insertions and one that has significantly fewer insertions. Based on data available for genes that have been characterized, the class with low numbers of insertions comprises the essential ORFs. We are continuing to analyze the data to obtain a genome-wide profile of sequences essential for cell division. One compelling result that is emerging from our analysis is that this method may also provide quantitative measures of how much each gene contributes to cell division. Insertions that cause slow growth will be less prevalent in the cultures. We are now conducting a comprehensive test of this possibility.
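The coverage arithmetic and the idea of separating ORFs by insertion density can be sketched in Python. The function names, ORF labels, and the density cutoff below are hypothetical; the text does not state how the two classes of ORFs were actually separated.

```python
def mean_spacing_bp(genome_bp, n_insertions):
    """Average number of base pairs per mapped insertion."""
    return genome_bp / n_insertions

def insertions_per_kb(orf_length_bp, n_insertions):
    """Insertion density of one ORF, normalized to 1 kb."""
    return 1000 * n_insertions / orf_length_bp

def low_density_orfs(orfs, cutoff_per_kb):
    """Names of ORFs whose density falls below a chosen (hypothetical) cutoff.

    orfs maps an ORF name to a (length_bp, insertion_count) tuple.
    """
    return [
        name for name, (length_bp, count) in orfs.items()
        if insertions_per_kb(length_bp, count) < cutoff_per_kb
    ]

# Planned depth: 7 million insertions in a 14-million-bp genome.
planned = mean_spacing_bp(14_000_000, 7_000_000)   # 2.0 bp per insertion

# Toy ORFs with made-up lengths and counts; the cutoff is an assumption.
toy_orfs = {"orf_x": (1500, 60), "orf_y": (2000, 2), "orf_z": (900, 27)}
candidates = low_density_orfs(toy_orfs, cutoff_per_kb=5.0)
```

By the same arithmetic, the 360,000 unique insertions identified so far correspond to roughly one insertion per 39 bp on average, so per-ORF densities rather than single positions carry the signal at this depth.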

Additional Funding

  • NIH Intramural AIDS Targeted Antiviral Program (2009 and 2010)
  • Office of Intramural Research & Training AIDS Research Fellowship Award (2009)

Publications

  • Ebina H, Judson R, Levin H. The GP(Y/F) domain of Tf1 integrase multimerizes when present in a fragment, and substitutions in this domain reduce enzymatic activity of the full-length protein. J Biol Chem 2008 283:15965-15974.
  • Leem Y, Ripmaster T, Kelly F, Ebina H, Heincelman M, Zhang K, Grewal S, Hoffman C, Levin H. Retrotransposon Tf1 is targeted to pol II promoters by transcription activators. Mol Cell 2008 30:98-107.
  • Cam H, Noma K, Ebina H, Levin H, Grewal S. Host genome surveillance for retrotransposons by transposon-derived proteins. Nature 2008 451:431-436.
  • Park J, Evertts A, Levin H. The Hermes transposon of Musca domestica and its use as a mutagen of Schizosaccharomyces pombe. Methods 2009 49:243-247.
  • Chatterjee AG, Leem Y, Kelly F, Levin H. The chromodomain of Tf1 integrase promotes binding to cDNA and mediates target site selection. J Virol 2009 83:2675-2685.

Collaborators

  • Shiv Grewal, PhD, Laboratory of Biochemistry and Molecular Biology, NCI, Bethesda, MD
  • Charles Hoffman, PhD, Boston College, Boston, MA

Contact

For more information, email henry_levin@nih.gov or visit sete.nichd.nih.gov.
