National Institutes of Health

Eunice Kennedy Shriver National Institute of Child Health and Human Development

2017 Annual Report of the Division of Intramural Research

The Biological Impact and Function of Transposable Elements

Henry Levin
  • Henry L. Levin, PhD, Head, Section on Eukaryotic Transposable Elements
  • Angela Atwood-Moore, BA, Senior Research Assistant
  • Caroline Esnault, PhD, Visiting Fellow
  • Feng Li, PhD, Visiting Fellow
  • Si Young Lee, PhD, Postdoctoral Fellow
  • Zelia Worman, PhD, Postdoctoral Fellow
  • Oluwadamilola Bankole, BA, Postbaccalaureate Fellow
  • Arianna Lechsinska, BA, Postbaccalaureate Fellow
  • Michael Lee, BA, Postbaccalaureate Fellow

Inherently mutagenic, the integration of retroviral and retrotransposon DNA is responsible for many pathologies, including malignancy. Because some chromosomal regions are virtually gene-free while others encode genes essential for cellular processes, the position of integration has great significance. Recent studies show clearly that integration occurs into specific types of sequences and that the targeting patterns vary with the specific retrovirus or retrotransposon. There is currently great interest in such patterns, in part because understanding the mechanisms that position HIV-1 insertions may lead to new antiviral therapies. In addition, retrovirus-based vectors are now being used for gene therapy. Early gene therapy vectors had patterns of integration that activated oncogenes and caused leukemia in patients. It is therefore essential to understand the mechanisms that position integration. Our current work applies high-throughput sequencing to study dense integration patterns of model elements such as the long terminal repeat (LTR) retrotransposon Tf1 of Schizosaccharomyces pombe. This model element allows us to study integration mechanisms with the highly informative techniques of yeast genetics (Reference 1). For example, we developed a method that tags each integration event with a random serial number. With this method, we sequenced 500,000 independent integration events. The improvements we made to the sequencing methods are broadly applicable and allowed us to generate dense profiles of HIV-1 integration. Our analyses of these datasets have greatly improved our understanding of integration and of the mechanisms that select insertion sites.

Single nucleotide–specific targeting of the Tf1 retrotransposon promoted by the DNA–binding protein Sap1 of S. pombe

Our initial use of deep sequencing revealed that Tf1 integration favors the promoters of RNA polymerase II (RNA pol II)–transcribed genes, in particular the promoters of stress-response genes. As DNA sequencing methods improved, it became possible to map a million Tf1 integration events in S. pombe. A significant shortcoming of these dense maps is that they cannot measure repeated insertions at the same nucleotide position, because we and others discard duplicate sequence reads to avoid PCR-generated distortion. We addressed the problem by including a random eight-nucleotide serial number in the LTR of Tf1: reads that share both an insertion position and a serial number derive from a single integration event, whereas different serial numbers at the same position mark independent insertions. With this method we can count the number of independent insertions at single nucleotide positions. Although the serial number system identified specific positions with high integration efficiency, DNA sequence itself did not account for the selection of promoters. We had previously tested the transcription factors known to activate stress-response promoters and found that they do not contribute to the efficiency or position of Tf1 integration. However, a recent study of Switch-activating protein 1 (Sap1), an essential DNA–binding protein in S. pombe, showed that Sap1 binds to genomic positions where Tf1 integration occurs. To determine whether Sap1 plays a role in Tf1 retrotransposition, we studied S. pombe carrying the temperature-sensitive mutation sap1-1 (Reference 2). At the permissive temperature, Tf1 transposition was reduced 10-fold compared with wild-type sap1+, and the defect was not the result of lower levels of Tf1 proteins or cDNA. The data argue that Sap1 contributes to the integration of Tf1. A mutation that reduces integration 10-fold might also be expected to cause off-target integration. Indeed, serial number sequencing of integration in cells with the sap1-1 mutation showed changes in position for 10% of the integration events.
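The deduplication logic behind the serial number system can be sketched in Python. This is a minimal illustration, not the lab's pipeline: it assumes reads have already been mapped to an insertion coordinate and that the eight-nucleotide serial number has been extracted from each read; the record layout and the function name `count_independent_insertions` are hypothetical. With eight random nucleotides there are 4^8 = 65,536 possible serial numbers, so independent insertions at the same nucleotide only occasionally share one.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record layout: (chromosome, position, strand, serial_number).
Read = Tuple[str, int, str, str]
Site = Tuple[str, int, str]


def count_independent_insertions(reads: Iterable[Read]) -> Dict[Site, int]:
    """Count independent integration events at each nucleotide position.

    Reads that share a site and a serial number are treated as PCR copies
    of one event; each distinct serial number at a site counts as one
    independent insertion.
    """
    serials_at_site: Dict[Site, Set[str]] = defaultdict(set)
    for chrom, pos, strand, serial in reads:
        serials_at_site[(chrom, pos, strand)].add(serial)
    return {site: len(serials) for site, serials in serials_at_site.items()}


# Toy example with made-up reads.
reads = [
    ("chr1", 1000, "+", "ACGTACGT"),
    ("chr1", 1000, "+", "ACGTACGT"),  # PCR duplicate of the read above
    ("chr1", 1000, "+", "TTGACCAA"),  # second independent insertion, same site
    ("chr2", 500, "-", "GGGGCCCC"),
]
counts = count_independent_insertions(reads)
print(counts)
```

A production pipeline would also need to handle sequencing errors within the serial number itself, for example by merging serials that differ by one base; that step is omitted here.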

In another approach to determine whether Sap1 contributes to integration, we compared the integration data from the serial number system with previously published maps of Sap1 binding generated by ChIP-seq. Analysis of the ChIP-seq data showed that 6.85% of the S. pombe genome was bound by Sap1. Importantly, 73.4% of Tf1 insertions occurred within these Sap1–bound sequences (Reference 2). An example of this close association can be seen in a segment of chromosome 1 (Figure 1). We also observed a strong correlation between levels of integration in intragenic sequences and the amount of Sap1 bound. If Sap1 were directly responsible for positioning Tf1 integration, we would expect integration to take place at specific nucleotide positions relative to the nucleotides bound by Sap1. Using the ChIP-seq data, we identified a Sap1–binding motif that closely resembled previously published motifs. Genome-wide searches with the FIMO program of the MEME Suite identified 5,013 locations that matched this motif. Aligning integration events to all of these motif locations revealed that 82% of all integration events clustered within 1 kb of the motif. Moreover, 43% of all integrations occurred within 50 bp of the motif, at two dominant positions: 9 bp upstream and 19 bp downstream of the motif. Such clustering would be expected if Sap1 covers its binding site on the DNA and directs integration to either side of the protein. Thus far, we have been unable to detect a direct interaction between Sap1 and Tf1 integrase (IN) in pull-down assays; however, our two-hybrid assays detected a strong Sap1–IN interaction. The two-hybrid result, together with the tight alignment of integration sites with the Sap1 motif and the reduction of integration in the sap1-1 mutant, argues that Sap1 plays an important role in Tf1 integration.
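The two quantitative arguments in this paragraph, enrichment of insertions in Sap1-bound DNA and clustering of insertions at fixed offsets from the motif, can be sketched in Python. The enrichment is the fraction of insertions in bound sequence divided by the fraction of the genome that is bound: 73.4% / 6.85% ≈ 10.7-fold. The offset calculation is a hypothetical illustration: it assumes motif hits (such as FIMO output) have been converted to 0-based, half-open coordinates with a strand, measures each insertion's distance from the nearest motif edge in the motif's own orientation (negative = upstream, positive = downstream), and uses made-up coordinates. The exact offset convention used in Reference 2 may differ.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical motif record: (start, end, strand), 0-based, half-open.
Motif = Tuple[int, int, str]

# Enrichment of Tf1 insertions in Sap1-bound sequence (values from the text).
frac_insertions_bound = 0.734
frac_genome_bound = 0.0685
enrichment = frac_insertions_bound / frac_genome_bound  # about 10.7-fold


def signed_offset(pos: int, motif: Motif) -> int:
    """Distance (bp) from an insertion to the nearest motif edge.

    Negative values are upstream and positive values downstream, relative
    to the motif's orientation; 0 means the insertion lies inside the motif.
    """
    start, end, strand = motif
    if strand == "+":
        if pos < start:
            return pos - start
        if pos >= end:
            return pos - end + 1
        return 0
    # On the minus strand, "upstream" means higher coordinates.
    if pos >= end:
        return -(pos - end + 1)
    if pos < start:
        return start - pos
    return 0


def nearest_offset(pos: int, motifs: List[Motif]) -> Optional[int]:
    """Offset to the closest motif on the same chromosome, or None if there is none."""
    if not motifs:
        return None
    return min((signed_offset(pos, m) for m in motifs), key=abs)


# Toy data: two motif hits on one chromosome and four insertions.
motifs: Dict[str, List[Motif]] = {"chr1": [(100, 110, "+"), (5000, 5010, "-")]}
insertions = [("chr1", 91), ("chr1", 128), ("chr1", 5018), ("chr1", 3000)]

offsets = [nearest_offset(pos, motifs.get(chrom, [])) for chrom, pos in insertions]
near = [o for o in offsets if o is not None and abs(o) <= 50]
fraction_within_50bp = len(near) / len(offsets)
most_common_offset = Counter(near).most_common(1)[0]
```

With 5,013 genomic motif hits, a binary search over sorted motif starts (for example with Python's `bisect` module) would replace the linear scan in `nearest_offset`; the linear version is kept here for clarity.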

Figure 1

Figure 1. Serial number integration data correlates with the position of Sap1 enrichment from ChIP-seq data.

A representative segment of chromosome 1 is shown.

Host factors that promote retrotransposon integration are similar in distantly related eukaryotes

Retroviruses and LTR retrotransposons have distinct patterns of integration sites. The oncogenic potential of retrovirus-based vectors used in gene therapy depends on the selection of integration sites associated with promoters. The LTR retrotransposon Tf1 of Schizosaccharomyces pombe is studied as a model for oncogenic retroviruses because it integrates into the promoters of stress-response genes. Although the integrases (INs) encoded by retroviruses and LTR retrotransposons catalyze the insertion of cDNA into the host genome, distinct host factors are required for the efficiency and specificity of integration. Our finding that Sap1 is located at positions of integration but could not be shown to bind integrase directly in pull-down assays suggested that other host factors are required for integration. We tested this hypothesis with a genome-wide screen for host factors that promote Tf1 integration. By combining an assay for transposition with a genetic assay that measures cDNA present in the nucleus, we could identify factors that contribute to integration. We used this approach to test a collection of 3,004 S. pombe strains with single gene deletions (Reference 3). Using these screens and immunoblot measurements of Tf1 proteins, we identified a total of 61 genes that promote integration. The candidate integration factors participate in a range of processes, including nuclear transport, transcription, mRNA processing, vesicle transport, chromatin structure, and DNA repair. Two candidates, Rhp18 and the NineTeen complex, were tested in two-hybrid assays and found to interact with Tf1 IN. Surprisingly, a number of the pathways we identified had previously been found to promote integration of the LTR retrotransposons Ty1 and Ty3 in Saccharomyces cerevisiae, indicating that the contribution of host factors to integration is common among distantly related organisms. The DNA repair factors are of particular interest because they may identify the pathways that repair the single-stranded gaps opposite integration sites of LTR retroelements.
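The logic of combining the transposition assay with the nuclear cDNA assay and the immunoblots can be sketched as a simple decision rule. This is a hypothetical illustration only: the report does not give the cutoffs used, so `LOW` and `NORMAL` below are placeholder values, the strain names are made up, and all measurements are expressed relative to wild type. The idea is that a deletion that lowers transposition while leaving Tf1 protein and nuclear cDNA near wild-type levels points to a defect at the integration step itself.

```python
from dataclasses import dataclass
from typing import List

# Placeholder cutoffs (fractions of wild type); not the values used in the screen.
LOW = 0.5
NORMAL = 0.8


@dataclass
class StrainResult:
    gene: str             # hypothetical deletion strain name
    transposition: float  # transposition frequency relative to wild type
    nuclear_cdna: float   # nuclear cDNA signal relative to wild type
    tf1_protein: float    # Tf1 protein level (immunoblot) relative to wild type


def classify(r: StrainResult) -> str:
    """Assign a deletion strain to a coarse phenotype class."""
    if r.transposition >= LOW:
        return "no transposition defect"
    if r.tf1_protein < NORMAL:
        return "reduced Tf1 protein"
    if r.nuclear_cdna < NORMAL:
        return "reduced nuclear cDNA"
    # Transposition is low even though protein and nuclear cDNA are present.
    return "candidate integration factor"


results: List[StrainResult] = [
    StrainResult("geneA", 0.1, 1.0, 1.0),
    StrainResult("geneB", 0.1, 0.2, 1.0),
    StrainResult("geneC", 0.9, 1.0, 1.0),
    StrainResult("geneD", 0.2, 1.0, 0.3),
]
labels = {r.gene: classify(r) for r in results}
candidates = [g for g, label in labels.items() if label == "candidate integration factor"]
```

The order of the checks encodes the reasoning: a strain is called an integration candidate only after defects in Tf1 protein and in nuclear cDNA have been excluded.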

A long terminal repeat retrotransposon of Schizosaccharomyces japonicus integrates upstream of RNA pol III–transcribed genes

Transposable elements (TEs) are common constituents of centromeres, but it is not known what causes this relationship. Schizosaccharomyces japonicus contains 10 families of LTR retrotransposons, which cluster in centromeres and telomeres. In the related yeast Schizosaccharomyces pombe, by contrast, the LTR retrotransposons Tf1 and Tf2 are distributed in the promoter regions of RNA pol II–transcribed genes. Sequence analysis indicates that the retrotransposon Tj1 of S. japonicus is related to Tf1 and Tf2 and uses the same mechanism of self-primed reverse transcription. We therefore asked why these related retrotransposons localize to different regions of the genome.

To characterize the integration behavior of Tj1, we expressed it in S. pombe (Reference 4). We found that Tj1 was active and capable of generating de novo integrations in the chromosomes of S. pombe. Tj1 expression resembles that of type C retroviruses in that the stop codon at the end of gag must be present for efficient integration. Of seventeen sequenced insertions, thirteen occurred within 12 bp upstream of tRNA genes and three occurred at other RNA pol III–transcribed genes. The link between Tj1 integration and RNA pol III transcription is reminiscent of Ty3, an LTR retrotransposon of Saccharomyces cerevisiae that interacts with the transcription factor TFIIIB and integrates upstream of tRNA genes. The integration of Tj1 upstream of tRNA genes, together with the clustering of tRNA genes in the centromeres of S. japonicus, indicates that the centromeric clustering of this TE results from its unique pattern of integration (Reference 4).
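Scoring an insertion as "within 12 bp upstream" of a tRNA gene requires taking the gene's strand into account, because upstream lies at lower coordinates for plus-strand genes and at higher coordinates for minus-strand genes. A minimal sketch, assuming 0-based, half-open gene coordinates; the gene names and positions are made up, and only the 12 bp window comes from the text.

```python
from typing import List, Optional, Tuple

# Hypothetical gene record: (name, start, end, strand), 0-based, half-open.
Gene = Tuple[str, int, int, str]


def upstream_distance(pos: int, gene: Gene) -> Optional[int]:
    """Distance (bp) from an insertion to the gene's 5' end, or None if not upstream."""
    _, start, end, strand = gene
    if strand == "+":
        # 5' end is `start`; upstream positions are smaller.
        return start - pos if pos < start else None
    # Minus strand: 5' end is `end - 1`; upstream positions are larger.
    return pos - (end - 1) if pos >= end else None


def genes_within_upstream_window(pos: int, genes: List[Gene], window: int = 12) -> List[str]:
    """Names of genes whose 5' end lies at most `window` bp downstream of the insertion."""
    hits = []
    for gene in genes:
        d = upstream_distance(pos, gene)
        if d is not None and d <= window:
            hits.append(gene[0])
    return hits


trna_genes: List[Gene] = [("tRNA_a", 200, 272, "+"), ("tRNA_b", 400, 472, "-")]
insertions = [190, 480, 495, 300]
assignments = [genes_within_upstream_window(p, trna_genes) for p in insertions]
```

In this toy example, position 190 lies 10 bp upstream of the plus-strand gene and position 480 lies 9 bp upstream of the minus-strand gene, whereas 495 (24 bp upstream) and 300 (inside neither window) are not assigned.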

Retrotransposon Tf1 induces genetic adaptation to environmental stress

Schizosaccharomyces pombe has a compact genome and tightly restricts retrotransposon expression under normal growth conditions. However, when the retrotransposon Tf1 is expressed, it integrates into the promoters of RNA pol II–transcribed genes and, in many cases, increases transcription of the adjacent genes. This result, together with the Tf1 preference for stress-response promoters, led to the idea that Tf1 could benefit its host by creating a pool of new insertions that improve survival under environmental stress. We tested this hypothesis by measuring the fitness of cells with genomic insertions of Tf1 when exposed to stress. Cultures containing a diverse population of cells with Tf1 integrated at 42,000 positions were grown competitively in cobalt. The proportion of cells with Tf1 at 141 of these positions increased greatly, suggesting that these integrations improved growth in cobalt. Analysis of the positions and reconstruction of strains with single insertions indicated that Tf1 integration improved growth in cobalt by inducing key regulators of the TOR pathway. The results provide strong evidence that retrotransposons have the potential to promote evolution, and they identify mechanisms that mitigate the toxicity of cobalt.
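One way to identify insertions that became more frequent during competitive growth is to compare each insertion's share of sequencing reads before and after cobalt exposure. The sketch below computes a log2 fold change of normalized frequencies with a pseudocount. It is an illustration under stated assumptions, not the analysis used in the study: the read counts, the pseudocount of 1, and the 2-fold cutoff are all hypothetical, and a real analysis would also need biological replicates.

```python
import math
from typing import Dict


def log2_fold_changes(before: Dict[str, int], after: Dict[str, int],
                      pseudocount: float = 1.0) -> Dict[str, float]:
    """log2(frequency after / frequency before) for every insertion site.

    Counts are converted to frequencies within each sample; the pseudocount
    keeps sites that are absent from one sample finite.
    """
    sites = set(before) | set(after)
    denom_before = sum(before.values()) + pseudocount * len(sites)
    denom_after = sum(after.values()) + pseudocount * len(sites)
    changes = {}
    for site in sites:
        f_before = (before.get(site, 0) + pseudocount) / denom_before
        f_after = (after.get(site, 0) + pseudocount) / denom_after
        changes[site] = math.log2(f_after / f_before)
    return changes


# Made-up read counts per insertion site before and after growth in cobalt.
before = {"siteA": 99, "siteB": 99, "siteC": 99}
after = {"siteA": 799, "siteB": 99, "siteC": 99}

lfc = log2_fold_changes(before, after)
enriched = sorted(site for site, value in lfc.items() if value >= 1.0)  # >= 2-fold
```

Note that normalizing within each sample means an expanding clone lowers the frequencies of all other insertions, so siteB and siteC show negative fold changes even though their raw counts are unchanged.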

LEDGF/p75 interacts with mRNA splicing factors and targets HIV-1 integration to highly spliced genes

The promise of cancer immunotherapy based on gene therapy relies on retroviral vectors to stably integrate corrective or therapeutic sequences into the genomes of the patient's cells. First-generation gene therapy used vectors derived from gammaretroviruses, which successfully corrected X-linked severe combined immunodeficiency (SCID-X1). However, their integration pattern was biased toward promoter sequences, which resulted in the activation of proto-oncogenes and progression to T cell leukemia. Such adverse outcomes led to the use of lentivirus vectors in recent gene therapy treatments. This switch to HIV-1–based vectors has occurred despite a fundamental lack of information about integration levels at specific genes, including proto-oncogenes. Structural and biochemical data show that HIV-1 integrase (IN) interacts with the host factor LEDGF/p75 (a chromatin-binding protein and transcription coactivator), and that this interaction favors integration in the actively transcribed portions of genes (transcription units). However, little is known about how LEDGF/p75 recognizes transcribed sequences or whether cancer genes are favored.

To measure integration levels in individual transcription units and to identify the determinants of integration-site selection, we generated a high-density map of the integration sites of a single-round HIV-1 vector in HEK293T tissue culture cells (Reference 5). Improvements in sequencing methods allowed us to map 961,274 independent integration sites, most of which occurred in just 2,000 transcription units. Importantly, the 1,000 transcription units with the highest numbers of integration sites were highly enriched for cancer-associated genes, raising concerns about the safety of lentivirus vectors in gene therapy. Analysis of integration site densities in transcription units (integration sites per kb) revealed a striking bias in favor of transcription units that produce multiple spliced mRNAs and that contain high numbers of introns (Figures 2A and 2B) (Reference 5). These correlations were independent of transcription levels, transcription unit size, and intron length. Analysis of previously published HIV-1 integration site data showed that integration density in transcription units of mouse embryonic fibroblasts also correlated strongly with intron number, and that the correlation was absent in cells lacking LEDGF (Figures 2C and 2D). The data suggest that LEDGF/p75 not only tethers HIV-1 integrase to the chromatin of active transcription units but also interacts with mRNA splicing factors. To test this, our collaborators Matthew Plumb and Mamuka Kvaratskhelia used tandem mass spectrometry (MS/MS) to identify cellular proteins in nuclear extracts of HEK293T cells that interacted with GST-LEDGF/p75 (LEDGF/p75 tagged with glutathione S-transferase).
The proteomic experiments found that LEDGF/p75 interacted with many components of the splicing machinery, including the small nuclear ribonucleoprotein (snRNP) components SF3B1, SF3B2, and SF3B3 of U2 (a small nuclear RNA component of the spliceosome), the U2–associated proteins PRPF8 and U2SURP, a factor of the U5 snRNP (SNRNP200), and many hnRNPs (heterogeneous nuclear ribonucleoproteins) that are associated with alternative splicing. The broad range of interactions with splicing factors suggested that LEDGF/p75 might contribute to splicing reactions. To test this, we performed RNA-seq on HEK293T cells in which PSIP1, the gene encoding LEDGF/p75, had been truncated or deleted with TALEN endonucleases. Analysis of transcription units that produced two or more spliced mRNA products showed that bi-allelic deletion of LEDGF/p75 significantly changed the ratio of spliced products in large numbers of transcription units. These results, together with our finding that integration in highly spliced transcription units depended on LEDGF, provide strong support for a model in which LEDGF/p75 interacts with the splicing machinery and directs integration to highly spliced transcription units.
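For a transcription unit with two spliced products, a change in their ratio between wild-type and LEDGF/p75-deleted cells can be illustrated with a 2 × 2 test on read counts. This is a simplified sketch with made-up counts, not the statistical method used in Reference 5: it applies a Pearson chi-square test (1 degree of freedom, no continuity correction), whose p-value for 1 degree of freedom equals erfc(√(χ²/2)). A genome-wide analysis would additionally need to account for variation between replicates and to correct for testing many transcription units.

```python
import math
from typing import Tuple


def isoform_fraction(isoform1_reads: int, isoform2_reads: int) -> float:
    """Fraction of reads supporting spliced product 1."""
    return isoform1_reads / (isoform1_reads + isoform2_reads)


def chi_square_2x2(a_wt: int, b_wt: int, a_ko: int, b_ko: int) -> Tuple[float, float]:
    """Pearson chi-square test of independence on a 2 x 2 table.

    Rows are genotypes (wild type, knockout); columns are the two spliced
    products. Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    table = [[a_wt, b_wt], [a_ko, b_ko]]
    total = a_wt + b_wt + a_ko + b_ko
    row_sums = [a_wt + b_wt, a_ko + b_ko]
    col_sums = [a_wt + a_ko, b_wt + b_ko]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up read counts for the two spliced products of one transcription unit.
wt_fraction = isoform_fraction(80, 20)
ko_fraction = isoform_fraction(50, 50)
chi2, p = chi_square_2x2(80, 20, 50, 50)
```

Here the fraction of product 1 drops from 0.8 to 0.5, and the test indicates that such a shift is unlikely to arise by chance at these read depths.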

Figure 2

Figure 2. Integration density in transcription units correlates with amounts of splicing.

The number of HIV-1 integrations per kb in transcription units correlates with the amount of splicing (A and B). The preference for highly spliced transcription units depends on the host factor LEDGF (C and D). MEFs, mouse embryonic fibroblasts; MRC, matched random control.
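The density measure used in Figure 2 (integration sites per kb of transcription unit) and a rank correlation with intron number can be sketched in Python. The report does not state which correlation statistic was used, so Spearman's rank correlation here is an assumed, illustrative choice, and the transcription units below are made up. Normalizing by length matters because longer transcription units would otherwise collect more insertions simply by being larger.

```python
from typing import List, Sequence


def density_per_kb(n_sites: int, length_bp: int) -> float:
    """Integration sites per kilobase of transcription unit."""
    return n_sites / (length_bp / 1000)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up transcription units: (integration sites, length in bp, intron count).
units = [(10, 20_000, 2), (40, 20_000, 8), (25, 10_000, 12), (5, 50_000, 1)]
densities = [density_per_kb(sites, length) for sites, length, _ in units]
introns = [n_introns for _, _, n_introns in units]
rho = spearman(densities, introns)
```

Computing the same statistic for a matched random control (MRC) set of positions, as in Figure 2, helps distinguish a genuine preference from biases of the method. Showing that the correlation is independent of transcription level, transcription unit size, and intron length, as the text reports, would require stratification or partial correlation, which this sketch omits.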

Additional Funding

  • NIH Intramural AIDS Targeted Antiviral Program (2017 and 2018)

Publications
  1. Sangesland M, Atwood-Moore A, Rai SK, Levin HL. Qualitative and quantitative assays of transposition and homologous recombination of the retrotransposon Tf1 in Schizosaccharomyces pombe. Methods Mol Biol 2016 1400:117-130.
  2. Hickey A, Esnault C, Majumdar A, Chatterjee A, Iben J, McQueen P, Yang A, Mizuguchi T, Grewal S, Levin HL. Single nucleotide specific targeting of the Tf1 retrotransposon promoted by the DNA-binding protein Sap1 of Schizosaccharomyces pombe. Genetics 2015 201:905-924.
  3. Rai S, Sangesland M, Esnault C, Lee M, Chatterjee A, Levin HL. Host factors that promote retrotransposon integration are similar in distantly related eukaryotes. PLoS Genet 2017 13:e1006775.
  4. Guo Y, Singh P, Levin HL. A long terminal repeat retrotransposon of Schizosaccharomyces japonicus integrates upstream of RNA pol III transcribed genes. Mob DNA 2015 6:19.
  5. Singh PK, Plumb MR, Ferris AL, Iben JB, Wu X, Fadel HJ, Luke BT, Esnault C, Poeschla EM, Hughes SH, Kvaratskhelia M, Levin HL. LEDGF/p75 interacts with mRNA splicing factors and targets HIV-1 integration to highly spliced genes. Genes Dev 2015 29:2287-2297.

Collaborators
  • Nancy Craig, PhD, The Johns Hopkins Medical School, Baltimore, MD
  • Shiv Grewal, PhD, Laboratory of Biochemistry and Molecular Biology, NCI, Bethesda, MD
  • Stephen Hughes, PhD, Retroviral Replication Laboratory, HIV Drug Resistance Program, NCI, Frederick, MD
  • Mamuka Kvaratskhelia, PhD, Ohio State University, Columbus, OH
  • Philip McQueen, PhD, Mathematical and Statistical Computing Laboratory, CIT, NIH, Bethesda, MD
  • Matthew Plumb, BS, Ohio State University, Columbus, OH
  • Eric M. Poeschla, MD, University of Colorado, Aurora, CO

