The Biological Impact and Function of Transposable Elements

Henry L. Levin, PhD
  • Henry L. Levin, PhD, Head, Section on Eukaryotic Transposable Elements
  • Angela Atwood-Moore, BA, Senior Research Assistant
  • Atreyi Chatterjee, PhD, Visiting Fellow
  • Caroline Esnault, PhD, Visiting Fellow
  • Sudhir Rai, PhD, Visiting Fellow
  • Parmit Singh, PhD, Visiting Fellow
  • Amnon Hizi, PhD, Visiting Professor
  • Gang Feng, BA, Graduate Student
  • Stephen Hung, BA, Postbaccalaureate Fellow
  • Elizabeth Humes, BA, Postbaccalaureate Fellow

Inherently mutagenic, the integration of retroviral and retrotransposon DNA is responsible for many pathologies, including malignancy. Given that some chromosomal regions are virtually gene free while others encode genes essential for cellular processes, the position of integration has great significance. Recent studies show clearly that integration occurs into specific types of sequences and that the targeting patterns vary depending on which retrovirus or retrotransposon is inserted. Currently, there is great interest in such patterns, in part because understanding the mechanisms that position HIV-1 insertions may lead to new antiviral therapies. In addition, retrovirus-based vectors are now being used for gene therapy. Early gene therapy vectors had patterns of integration that activated oncogenes and caused leukemia in three patients. Therefore, to gauge the risks associated with new gene therapy vectors, it is essential that we characterize in detail the positions of integration and understand the mechanisms that position such integration.

Determinants of the integration pattern of retrotransposon Tf1 in the fbp1 promoter of Schizosaccharomyces pombe

Figure 1. A plasmid-based assay for integration of Tf1 in fbp1
A. Linear diagram representing the fbp1 promoter in construct pHL2679. The construct has a 2.9 kb fbp1 sequence that includes UAS1 (in red), UAS2 (in red), and the fbp1 ORF (yellow block arrow). Integration occurred in TW1 (green triangle) and TW2 (blue triangle). The coordinate numbers in parentheses are based on the wild-type (WT) fbp1 plasmid (pHL2679). The target plasmid assay was used to map integration in plasmids that contained WT fbp1 (B), TW1 deleted (C), TW2 deleted (D), UAS2 deleted (E), and ORF deleted (F). The positions of integration events are shown as black lines with plasmid coordinate numbers based on the sequence of WT fbp1. The positions of TW1 and TW2 are shown for each plasmid. Restriction sites for BseYI (Bse), SpeI (SpeI), NgoMIV (N), PacI (P), BglII (Bgl), SbfI (SbfI), and BsrGI (Bsr) are shown.

A goal of our research is to identify the mechanism that directs integration to regions containing pol II promoters. Because of their extensive similarity to retroviruses, long terminal repeat (LTR) retrotransposons are excellent models for the study of integration. Tf1, an LTR retrotransposon of Schizosaccharomyces pombe, is similar to the murine leukemia virus in that it integrates into the promoters of pol II transcribed genes (1). To study insertion patterns of Tf1 in specific genes, we developed a target plasmid assay. The assay demonstrated that integration into the promoter of fbp1 clustered within 10 bp of a transcription enhancer called upstream activating sequence 1 (UAS1). Integration into the promoter of fbp1 depended on the UAS1 sequence and Atf1p, a transcription activator that binds to UAS1. To identify the key determinants responsible for targeting integration in the fbp1 promoter, we conducted an extensive study of the promoter sequences (2). We found that two discrete target windows close to UAS1 were the only sequences in the promoter required for the pattern of integration (Figure 1). The two target windows functioned independently, with each sufficient to act as an efficient target of integration. Although Atf1p is necessary for directing integration to UAS1, it may be that, by activating transcription, Atf1p induces subsequent steps of transcription that are more directly responsible for directing integration. If the role of Atf1p in integration were indirect, other factors that promote fbp1 transcription would also influence integration at this promoter. However, other known factors that mediate fbp1 transcription—Pcr1p, Rst2p, and Tup11p/Tup12p—did not contribute to integration. UAS2 is an independent enhancer in the promoter of fbp1 and not a target of integration. Nevertheless, we found that UAS2 did promote efficient transcription of fbp1. 
In addition, we found that a synthetic promoter induced by lexA fused to the activator VP16 was not a target of Tf1 integration. Together, the data indicate that the transcription activity of a promoter is not sufficient to mediate integration and that Atf1p plays a direct and specific role in targeting integration to UAS1 of the fbp1 promoter.

Ultra-high throughput sequencing of transposon integration with serial number technology provides a saturated profile of target activity in Schizosaccharomyces pombe

The finding that integration in the genome of S. pombe is directed to the promoters of genes raises important questions about the biology of Tf1 integration. Are all promoters recognized equally, or is integration directed to specific sets of promoters? If specific sets of promoters are preferred targets, what distinguishes them from the promoters not recognized by Tf1? To address these questions, we sequenced large numbers of integration events throughout the genome of S. pombe (1). New methods for ultra-high throughput sequencing made it possible to characterize extraordinarily large numbers of integration events.

To select for cells with integration events, we induced expression of a copy of Tf1 containing neo (Tf1–neo). We applied ligation-mediated PCR to generate libraries of Tf1–neo associated with the downstream flanking DNA. The amplified products, consisting of the downstream long terminal repeats (LTRs) and their flanking DNA, were size-selected and submitted for sequencing.

Over 90% of the insertions occurred within intergenic sequences that contained promoters. The profile of integration into the promoter regions revealed substantial variation. We observed reproducibly high levels of integration in 20% of the intergenic sequences in S. pombe. The strong bias is a consequence of the integration preference for a specific set of promoters. We found that there was no correlation between the promoters with the highest transcription activity and the promoters that had high levels of integration. However, the results of a gene ontology analysis revealed that genes regulated by environmental stress are preferred targets of integration.
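
The tally of insertions inside ORFs versus intergenic sequence can be illustrated with a short sketch. This is only an illustration, not the pipeline used in the study: the function name `classify_insertions`, the coordinate convention (1-based, inclusive, non-overlapping ORF intervals on a single chromosome), and the toy positions are assumptions made for the example, and every position outside an ORF is simply counted as intergenic.

```python
from bisect import bisect_right


def classify_insertions(positions, orfs):
    """Return (in_orf, intergenic) counts for one chromosome.

    `positions` is a list of insertion coordinates and `orfs` is a list of
    non-overlapping (start, end) intervals, 1-based and inclusive. Both
    layouts are hypothetical and chosen only for this sketch.
    """
    orfs = sorted(orfs)
    starts = [start for start, _ in orfs]
    in_orf = 0
    for position in positions:
        # Index of the last ORF that starts at or before this position.
        index = bisect_right(starts, position) - 1
        if index >= 0 and position <= orfs[index][1]:
            in_orf += 1
    return in_orf, len(positions) - in_orf


# Toy data, invented for illustration only.
toy_orfs = [(100, 200), (500, 900)]
toy_positions = [50, 150, 300, 450, 950, 960, 970, 980, 990, 995]

in_orf, intergenic = classify_insertions(toy_positions, toy_orfs)
print(in_orf, intergenic)               # 1 9
print(intergenic / len(toy_positions))  # 0.9
```

Sorting the ORF starts once and using a binary search keeps each lookup fast, which matters when the number of mapped insertions is very large.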

The size and number of the integration experiments resulted in reproducible measures of integration for each intergenic region and ORF (open reading frame) in the S. pombe genome. However, to understand which factors could mediate integration within these promoters, we needed to know not only where the insertions occurred but also how often integration occurred at each nucleotide position. Until now, we could not determine this frequency, because independent integration events at the same nucleotide produce duplicate sequence reads that are indistinguishable from the duplicates generated during PCR amplification of the library. This year, we developed a technology that measures the number of independent integration events at single nucleotide positions. The technology, termed the serial number system, is based on randomizing eight base pairs in the tip of the Tf1 transposon. Each independent integration event is tagged with the "serial number" of the individual Tf1 element that was inserted. Because eight randomized positions give 4^8 = 65,536 possible sequences, we can now record as many as approximately 65,000 independent insertions at each nucleotide of the S. pombe genome. Our first application of this technique detected 1.2 million independent insertions and created a saturated and reproducible measure of integration at each nucleotide of S. pombe. The data are now being compared with the binding sites of transcription factors to identify factors that may mediate integration. Use of the serial number system can be generalized, and we are currently testing it to measure integration levels of retroviruses.
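
The logic of the serial number system can be sketched as follows. This is a minimal illustration under stated assumptions, not the lab's analysis code: the read layout (chromosome, position, strand, serial), the function name `count_independent_insertions`, and the example reads are hypothetical. Reads that share a site and a serial number are treated as PCR duplicates, while distinct serial numbers at one site are counted as independent events.

```python
from collections import defaultdict

SERIAL_LENGTH = 8                 # randomized base pairs per Tf1 element
MAX_SERIALS = 4 ** SERIAL_LENGTH  # number of possible serial numbers


def count_independent_insertions(reads):
    """Count distinct serial numbers observed at each insertion site.

    `reads` is an iterable of (chromosome, position, strand, serial)
    tuples; this layout is an assumption made for the sketch.
    """
    serials_at_site = defaultdict(set)
    for chromosome, position, strand, serial in reads:
        # Skip tags of the wrong length or with ambiguous bases.
        if len(serial) != SERIAL_LENGTH or set(serial) - set("ACGT"):
            continue
        serials_at_site[(chromosome, position, strand)].add(serial)
    return {site: len(serials) for site, serials in serials_at_site.items()}


# Invented example reads.
example_reads = [
    ("chr1", 1000, "+", "ACGTACGT"),
    ("chr1", 1000, "+", "ACGTACGT"),  # PCR duplicate of the read above
    ("chr1", 1000, "+", "TTGACCAA"),  # independent event at the same site
    ("chr2", 52, "-", "GGGGCCCC"),
]

independent = count_independent_insertions(example_reads)
print(MAX_SERIALS)                       # 65536
print(independent[("chr1", 1000, "+")])  # 2
```

Two independent events that happen to carry the same serial number at the same site would be collapsed into one by this counting scheme, so the count is a lower bound that becomes less accurate as a site approaches saturation.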

Transposon integration increases the expression of stress-response genes

Transcription of transposons can be activated when cells are under stress, and environmental stress has been shown to induce integration events. Barbara McClintock put forward the intriguing hypothesis that transposon insertions triggered by conditions of stress may benefit the host by improving survival. However, this model remains unsubstantiated.

The LTR retrotransposon Tf1 of Schizosaccharomyces pombe integrates into the promoters of pol II–transcribed genes. Saturated profiles of insertion sites revealed that Tf1 integrates with a preference for pol II promoters that are induced by environmental stresses (1). To determine the biological impact of integration, we examined the effect of Tf1 integration on the expression of the adjacent genes (3). We studied 32 genes often targeted by Tf1 and found that integration did not reduce their expression. In six cases, Tf1 insertion actually increased the expression of adjacent genes by enhancing the levels of the native transcripts. In other cases, host factors that participate in genome surveillance, such as Upf1 and Abp1, were found to restrict the expression of genes that would otherwise have been enhanced by Tf1 insertion. We found that Tf1 transcription was induced by heat treatment and, interestingly, only genes that themselves were induced by heat could be activated by Tf1 integration. We propose that it is the synergy of the Tf1 enhancer sequence with the stress-response elements of target promoters that results in gene activation. In support of this model, the motif identification software MEME identified a sequence that was present in the promoter of Tf1 and in the promoters of the six genes enhanced by Tf1 insertion. Importantly, this motif was not present in the 26 promoters that were unaffected by Tf1 insertion. Moreover, the motif is similar to the sequence known to be bound by Atf1, a stress-response transcription factor. Together, the findings indicate that Tf1 inserts can increase the expression of stress-response genes because Tf1 carries a copy of an enhancer that binds the same factor(s) that stimulate the stress-response genes. We therefore speculate that Tf1 integration has the potential to improve the survival of individual cells exposed to environmental stress.

Integration profiling: a whole-genome analysis of sequence function

Figure 2. Essential genes contained low numbers of integration events.
Few insertions (red lines) occurred in essential genes such as the cdc genes (yellow) compared with nonessential genes (green).

The existing genome-wide methods for testing gene function consist largely of microarray hybridization and deep sequencing of RNA, techniques that infer function based on patterns of gene expression. Despite the valuable information produced by these methods, they do not provide a direct demonstration of gene function. To address this need, we developed integration profiling, a simple method capable of directly probing the function of the single-copy sequences throughout the genome of a haploid eukaryote. With transposons that readily disrupt ORFs and sequencing technology that can position over 250 million insertions per reaction, the analysis of a single culture can identify which sequences in a eukaryotic genome are functional. In previous work, we found that the "cut and paste" DNA transposon Hermes from the housefly is highly active in S. pombe. The high rate of integration and the disruption of ORFs mean that Hermes is suitable for mutagenesis studies. With integration profiling, large populations of cells with transposon insertions are grown for many generations, depleting the culture of cells that have insertions in genes important for division. In one experiment, we passaged cells for 74 generations until 13.4% of the cells in the final culture contained an integrated copy of Hermes. We determined the positions of the insertions in the culture by ligation-mediated PCR followed by Illumina sequencing. We identified 360,000 unique insertion events, an average of one insertion for every 29 bp of the S. pombe genome. A survey of known essential genes revealed very few insertions per ORF, whereas neighboring nonessential ORFs had high numbers of insertions (Figure 2). Recently, a consortium systematically deleted the ORFs of S. pombe in heterozygous diploids and, after sporulation, designated which ORFs were essential (Kim et al. Nat Biotechnol 2010;28:617).
Using these designations, we plotted the distribution of integration densities separately for the nonessential and essential ORFs. We also graphed the integration densities of a subclass of nonessential genes that, when deleted, resulted in small colonies. The essential ORFs had significantly fewer insertions/kb than the nonessential ORFs, indicating that the integration profiles did indeed discriminate between essential and nonessential ORFs. Importantly, the nonessential ORFs required for full colony growth had intermediate densities of integration, indicating that intermediate levels of integration may be used to identify nonessential genes that nevertheless contribute to growth. The principal discrepancy between the consortium designations and the Hermes integration profile is a group of 200 ORFs that were designated nonessential but exhibited very low levels of integration. Using PCR and DNA blotting, we found that the majority of these consortium designations were incorrect because the genes had not been successfully deleted. The results validate integration profiling as an accurate method for measuring gene function.
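
The comparison of integration densities between ORF classes can be sketched in a few lines. This is a hedged illustration, not the analysis performed in the study: the record layout, the class labels, the function names, and all counts and lengths below are invented for the example. Density is expressed as insertions per kilobase of ORF, the unit used in the text above.

```python
from statistics import median


def insertions_per_kb(insertion_count, orf_length_bp):
    """Integration density of one ORF, in insertions per kilobase."""
    if orf_length_bp <= 0:
        raise ValueError("ORF length must be positive")
    return insertion_count * 1000 / orf_length_bp


def median_density_by_class(orfs):
    """Group ORFs by class label and return the median density of each."""
    densities = {}
    for orf in orfs:
        density = insertions_per_kb(orf["insertions"], orf["length_bp"])
        densities.setdefault(orf["class"], []).append(density)
    return {label: median(values) for label, values in densities.items()}


# Invented records: these are not measurements from the experiment.
toy_orfs = [
    {"class": "essential", "insertions": 2, "length_bp": 2000},
    {"class": "essential", "insertions": 0, "length_bp": 1500},
    {"class": "small colony", "insertions": 20, "length_bp": 2000},
    {"class": "nonessential", "insertions": 60, "length_bp": 2000},
    {"class": "nonessential", "insertions": 45, "length_bp": 1500},
]

summary = median_density_by_class(toy_orfs)
print(summary["essential"])     # 0.5
print(summary["small colony"])  # 10.0
print(summary["nonessential"])  # 30.0
```

Normalizing by ORF length matters because long ORFs collect more insertions than short ones even when both are equally dispensable.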

Additional Funding

  • NIH Intramural AIDS Targeted Antiviral Program (2011 and 2012)


Publications

  1. Guo Y, Levin H. High throughput sequencing of retrotransposon integration provides a saturated profile of target activity in Schizosaccharomyces pombe. Genome Res 2010;20:239-248.
  2. Majumdar A, Chatterjee A, Ripmaster T, Levin H. The determinants that specify the integration pattern of retrotransposon Tf1 in the fbp1 promoter of Schizosaccharomyces pombe. J Virol 2011;85:519-529.
  3. Feng G, Leem Y, Levin H. Transposon integration enhances expression of stress response genes. Nucl Acids Res 2012;submitted with favorable review.
  4. Levin H, Moran J. Dynamic interactions between transposable elements and their hosts. Nat Rev Genet 2011;12:615-627.
  5. Rhind N, Chen Z, Yassour M, Thompson DA, Haas BJ, Habib N, Wapinski I, Roy S, Lin MF, Heiman DI, Young SK, Furuya K, Guo Y, Pidoux A, Chen HM, Robbertse B, Goldberg JM, Aoki K, Bayne EH, Berlin AM, Desjardins CA, Dobbs E, Dukaj L, Fan L, FitzGerald MG, French C, Gujja S, Hansen K, Keifenheim D, Levin JZ, Mosher RA, Müller CA, Pfiffner J, Priest M, Russ C, Smialowska A, Swoboda P, Sykes SM, Vaughn M, Vengrova S, Yoder R, Zeng Q, Allshire R, Baulcombe D, Birren BW, Brown W, Ekwall K, Kellis M, Leatherwood J, Levin H, Margalit H, Martienssen R, Nieduszynski CA, Spatafora JW, Friedman N, Dalgaard JZ, Baumann P, Niki H, Regev A, Nusbaum C. Comparative functional genomics of the fission yeasts. Science 2011;332:930-936.


Collaborators

  • Nancy Craig, PhD, The Johns Hopkins Medical School, Baltimore, MD
  • Shiv Grewal, PhD, Laboratory of Biochemistry and Molecular Biology, NCI, Bethesda, MD

