
National Institutes of Health

Eunice Kennedy Shriver National Institute of Child Health and Human Development

2023 Annual Report of the Division of Intramural Research

The Arms Race between Transposable Elements and KRAB-ZFPs and its Impact on Mammals

  • Todd S. Macfarlan, PhD, Head, Section on Mammalian Development and Evolution
  • Dawn Watkins-Chow, PhD, Staff Scientist
  • Sherry Ralls, BA, Biologist
  • Melania Bruno, PhD, Visiting Fellow
  • Jinpu Jin, PhD, Visiting Fellow
  • Anna Dorothea Senft, PhD, Visiting Fellow
  • Rachel Cosby, PhD, Postdoctoral Intramural Research Training Award Fellow
  • Shelley Dolitsky, MD, Clinical Fellow
  • Meghan Yamasaki, MD, Clinical Fellow
  • Jada Gonzales, BS, Postbaccalaureate Fellow
  • Sharaf Maisha, BS, Postbaccalaureate Fellow

The central mission of the NICHD is to ensure that every human is born healthy. Despite much progress in understanding the many ways the mother interacts with the fetus during development, we still know little about the molecular changes that promoted the emergence of placental mammals from our egg-laying relatives over 100 million years ago, nor about those mechanisms that continue to drive phenotypic differences amongst mammals. One attractive hypothesis is that retroviruses and their endogenization into the genomes of our ancestors played an important role in eutherian evolution, by providing protein-coding genes such as syncytins (derived from retroviral env genes that cause cell fusions in placental trophoblasts) and novel gene-regulatory sequences that contributed to mammalian-specific traits, including the evolution of the placenta. Our primary interest is to explore the impact of such endogenous retroviruses (ERVs), which account for about 10% of our genomic DNA, on embryonic development and on the evolution of new traits in mammals. This has led us to examine the rapidly evolving Kruppel-associated box zinc-finger protein (KZFP) family, the single largest family of transcription factors (TFs) in most, if not all, mammalian genomes. Our hypothesis is that KZFP gene expansion and diversification was driven primarily by the constant onslaught of ERVs and other transposable elements (TEs) on the genomes of our ancestors, as a means to transcriptionally repress them. The hypothesis is supported by recent evidence demonstrating that the majority of KZFPs bind to TEs and that TEs and nearby genes are activated in KZFP–knockout mice. We will continue to explore the impacts of the TE/KZFP “arms race” on the evolution of mammals. We will also begin a new phase of our research to explore whether KZFPs play broader roles in genome regulation, beyond gene silencing, and how such functions impact mammalian development and evolution.

Kruppel-associated box zinc-finger proteins (KRAB-ZFPs)

Kruppel-associated box zinc-finger (ZF) proteins (KRAB-ZFPs) are rapidly evolving transcriptional repressors, which emerged in a common ancestor of coelacanths, birds, and tetrapods; they constitute the largest family of transcription factors in mammals (estimated at several hundred members each in mice and humans). Each species has its own unique repertoire of KRAB-ZFPs, with some shared by closely related species and others specific to each species. Remarkably, there was an explosion of KRAB-ZFP genes in the earliest mammals, many of which have been retained by purifying selection, but the functions of these genes (as well as of the hundreds of species-restricted KRAB-ZFPs) have been largely unexplored. KRAB-ZFPs consist of an N-terminal KRAB domain, which binds to the co-repressor KAP1 (also known as TRIM28), and a variable number of C-terminal C2H2 ZF domains that mediate sequence-specific DNA binding. Once bound to the KRAB domain, KAP1 recruits the histone methyltransferase (HMT) SETDB1 and heterochromatin protein 1 (HP1) to initiate heterochromatic silencing. Several lines of evidence point to a role for the KRAB-ZFP family in ERV silencing. First, the number of C2H2 ZF genes in mammals correlates with the number of ERVs. Second, the KRAB-ZFP ZFP809 was isolated based on its ability to bind to the primer-binding site for proline tRNA (PBSpro) of murine leukemia virus (MuLV). Third, deletion of Trim28 or Setdb1, the genes encoding the KRAB-ZFP co-repressors, leads to activation of many ERVs. We therefore began a systematic interrogation of KRAB-ZFP function as a potential adaptive repression system against ERVs.

To that end, we combined a medium-throughput ChIP-seq screen with functional genomics of KRAB-ZFP clusters and individual KRAB-ZFP genes. Our ChIP-seq data demonstrate that the majority of recently evolved KRAB-ZFPs bind to and repress distinct and partially overlapping sets of ERVs and other retrotransposons. This conclusion is strongly supported by the distinct ERV reactivation phenotypes we observed in mouse embryonic stem cell (ESC) lines, each lacking one of five of the largest KRAB-ZFP gene clusters. Furthermore, KRAB-ZFP cluster knockout (KO) mice are viable but have elevated rates of somatic retrotransposition of specific retrotransposon families, providing the first direct genetic link between KRAB-ZFP gene diversification and retrotransposon mobility. In contrast to the young (species-restricted) KRAB-ZFPs, we found that the older KRAB-ZFPs (those conserved across mammals) bind to genetic loci that have themselves undergone regulatory innovations during evolution. By systematically studying these KRAB-ZFP genes and their targets, we are uncovering regulatory innovations unique to placental mammals.

CTCF barrier–breaking by ZFP661 promotes protocadherin diversity in mammalian brains

Mammalian brains are larger and more densely packed with neurons than those of reptiles, but the genetic mechanisms underlying the increased connection complexity amongst neurons are unclear. The expression diversity of clustered protocadherins (Pcdhs), which is controlled by CTCF (CCCTC [DNA sequence]–binding factor) and cohesin, is crucial for proper dendritic arborization and cortical connectivity in vertebrates. We identified a highly conserved and mammalian-restricted KRAB-ZFP, ZFP661, that binds antagonistically at CTCF barriers at the Pcdh locus, preventing CTCF from trapping cohesin. ZFP661 balances the usage of Pcdh isoforms and increases Pcdh expression diversity. We demonstrated that loss of Zfp661 causes cortical dendritic arborization defects and autism-like social deficits in mice. Our study reveals both a novel mechanism that regulates the trapping of cohesin by CTCF and a mammalian adaptation that promoted Pcdh expression diversity to accompany the expanded mammalian brain.

Figure 1. ZFP661 binds adjacent to a small subset of CTCF sites within loop anchors

A. ZFP661 binding peaks overlap with a small subset of CTCF binding peaks.

B. Heat maps of ChIP-seq signal across ZFP661 binding sites indicate strong overlap with CTCF and the cohesin subunit Rad21.

C. ZFP661 binding is found adjacent to CTCF binding motifs, within loop anchors, at the same position at which cohesin is typically trapped.
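
The peak-overlap comparison summarized in panel A can be illustrated with a minimal Python sketch. It is not the analysis pipeline used in the study: the function names and all peak coordinates below are hypothetical and chosen only to show how one might ask what fraction of CTCF peaks is shared with ZFP661 peaks, assuming peaks are given as half-open (chromosome, start, end) intervals.

```python
from collections import defaultdict


def intervals_overlap(a_start, a_end, b_start, b_end):
    """Half-open intervals [start, end) overlap when each starts before the other ends."""
    return a_start < b_end and b_start < a_end


def peaks_overlapping(query_peaks, reference_peaks):
    """Return the query peaks that overlap at least one reference peak."""
    # Group reference peaks by chromosome so only same-chromosome pairs are compared.
    reference_by_chrom = defaultdict(list)
    for chrom, start, end in reference_peaks:
        reference_by_chrom[chrom].append((start, end))

    hits = []
    for chrom, start, end in query_peaks:
        candidates = reference_by_chrom.get(chrom, [])
        if any(intervals_overlap(start, end, r_start, r_end)
               for r_start, r_end in candidates):
            hits.append((chrom, start, end))
    return hits


# Hypothetical toy coordinates, not data from the study.
ctcf_peaks = [
    ("chr18", 100, 300),
    ("chr18", 1000, 1200),
    ("chr18", 5000, 5200),
    ("chr2", 400, 600),
]
zfp661_peaks = [
    ("chr18", 250, 450),
    ("chr7", 100, 300),
]

shared_ctcf = peaks_overlapping(ctcf_peaks, zfp661_peaks)
fraction_shared = len(shared_ctcf) / len(ctcf_peaks)
print(shared_ctcf, fraction_shared)
```

In this toy input, only one of the four CTCF peaks is shared with a ZFP661 peak, which mirrors the "small subset" relationship described in panel A. For real peak sets, a dedicated interval tool would scale better than this pairwise scan.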

Dual histone methyl readers ZCWPW2 and ZCWPW1 connect PRDM9 to DNA double-strand breaks and their repair during meiotic recombination

We also began a new exploration of the function of PRDM9, the most ancient KRAB-ZFP, which emerged in jawless fish and which plays a highly specialized role in meiotic recombination (MR). MR generates genetic diversity in sexually reproducing organisms and ensures proper synapsis and segregation of homologous chromosomes in gametes. Errors in MR that lead to mis-segregation of chromosomes are a leading cause of miscarriage and childhood disease. MR is initiated by programmed double-strand breaks (DSBs) in DNA that are distributed non-randomly at thousands of specific 1–2 kb regions called hotspots. In most mammals, hotspots are defined by PRDM9, a protein that contains a rapidly evolving DNA–binding ZF array and a specialized HMT (histone methyltransferase) activity that catalyzes dual trimethylation marks on histone H3 at lysine 4 and 36 (H3K4me3 and H3K36me3), both of whose activities are required for hotspot specification. Prdm9 loss-of-function causes sterility in mice, and PRDM9 mutations have been associated with male infertility in humans. In species lacking Prdm9, including yeast, plants, and birds, hotspots are located in H3K4me3–rich regions at gene promoters. Thus, the emergence of PRDM9 during evolution reshaped the MR landscape by relocating DSBs away from promoters to chromatin sites bound by the rapidly evolving PRDM9, which allowed for rapid interspecies hotspot diversification.

We set out to address whether other factors, in addition to PRDM9, are required to ‘re-engineer’ hotspot selection and how the DNA break and repair machinery is recruited to sites marked by PRDM9. We first identified the dual histone methylation reader Zcwpw1, which co-evolved with and is tightly co-expressed with Prdm9. Using a mouse model, we found that ZCWPW1 is an essential meiotic recombination factor required for efficient repair of PRDM9–dependent DSBs and for pairing homologous chromosomes in male mice. However, ZCWPW1 is not required for the initiation of DSBs at PRDM9 binding sites. Our results indicate that the evolution of a dual histone methylation writer (PRDM9) and reader (ZCWPW1) system in vertebrates remodeled genetic recombination hotspot selection from an ancestral static pattern near genes towards a flexible pattern controlled by the rapidly evolving DNA–binding activity of PRDM9. Since publishing these findings, we identified a Zcwpw1 paralog, which was initially mis-annotated in the mouse genome, called Zcwpw2. Importantly, in the past year, we found that Zcwpw2 is essential for both mouse meiosis and fertility in males and females, and that it is important for the efficient generation of double-strand breaks at hotspots relative to promoters. The studies have thus revealed a three-component system, comprising a rapidly evolving DNA–binding histone methyltransferase (PRDM9) and two dual histone methylation readers (ZCWPW2 and ZCWPW1), which play at least partially separable roles in mediating the PRDM9–dependent generation of DNA DSBs and their repair at meiotic recombination hotspots.

Additional Funding

  • NIGMS PRAT (Rachel Cosby)

Publications

  1. Hoge C, de Manuel M, Mahgoub M, Okami N, Fuller Z, Banerjee S, Baker Z, McNulty M, Andolfatto P, Macfarlan TS, Schumer M, Tzika AC, Przeworski M. Patterns of recombination in snakes reveal a tug of war between PRDM9 and promoter-like features. bioRxiv 2023 07.11.548536.
  2. Jin J, Ralls S, Wu E, Wolf G, Sun MA, Springer DA, Cosby RL, Senft AD, Macfarlan TS. CTCF barrier breaking by ZFP661 promotes protocadherin diversity in mammalian brains. bioRxiv 2023 https://doi.org/10.1101/2023.05.08.539838.
  3. Xie G, Lee JE, Senft AD, Park YK, Jang Y, Chakraborty S, Thompson JJ, McKernan K, Liu C, Macfarlan TS, Rocha PP, Peng W, Ge K. MLL3/MLL4 methyltransferase activities control early embryonic development and embryonic stem cell differentiation in a lineage-selective manner. Nat Genet 2023 55(4):693–705.
  4. Du C, Jiang J, Li Y, Yu M, Jin J, Chen S, Fan H, Macfarlan TS, Cao B, Sun MA. Regulation of endogenous retrovirus-derived regulatory elements by GATA2/3 and MSX2 in human trophoblast stem cells. Genome Res 2023 33(2):197–207.

Collaborators

  • Molly Przeworski, PhD, Columbia University, New York, NY

Contact

For more information, email todd.macfarlan@nih.gov or visit https://www.nichd.nih.gov/research/atNICHD/Investigators/macfarlan.
