Control of Gene Expression during Development
- Judith A. Kassis, PhD, Head, Section on Gene Expression
- J. Lesley Brown, PhD, Staff Scientist
During development and differentiation, genes become competent to be expressed or are stably silenced in an epigenetically heritable manner. The selective activation/repression of genes leads to differentiation of tissue types. Much evidence supports the model in which modifications of histones in chromatin contribute substantially to determining whether a gene is expressed. Two groups of genes, the Polycomb group (PcG) and Trithorax group (TrxG), are important for inheritance of the silenced and active chromatin state, respectively. In Drosophila, regulatory elements called Polycomb group response elements (PREs) are required for the recruitment of chromatin-modifying PcG protein complexes. TrxG proteins may act through the same or overlapping cis-acting sequences. Our group aims to understand how PcG and TrxG proteins are recruited to DNA. Toward that end, one major project in the lab has been to determine all sequences and DNA–binding proteins required for PRE activity. In the Drosophila genome, there are hundreds of PREs that regulate a similar number of genes, and it was not known whether all PREs are alike. Our data showed that there is functional and architectural diversity among PREs, suggesting that PREs adapt to the environment of the gene they regulate. PREs are made up of binding sites for several DNA–binding proteins. Over the years, our lab identified Pho, Pho-like, Spps, Crol, and Combgap as DNA–binding proteins that bind to PREs. Our recent genome-wide studies show that different PREs require distinct DNA–binding proteins. In addition, our work illustrates the combinatorial nature and redundancy of PcG recruitment in Drosophila.
A second major project in the lab is to determine how the PREs of the invected-engrailed (inv-en) gene complex control these genes in their native location. Surprisingly, we found that not all PREs are required in vivo, suggesting a redundancy in PRE function. To understand the interplay between PREs and enhancers (sequences important for activation of gene expression), we completed an analysis of the regulatory DNA of the inv-en gene complex. We found that regulatory sequences are spread throughout a region of at least 79 kb in that gene complex and that the same enhancers activate both engrailed and invected expression. In addition, we showed that a 79 kb transgene (HA-en79), which contains the en gene and flanking regulatory DNA, is able to rescue a deletion of the entire inv-en locus. Our current studies explore the effects of the chromosomal neighborhood on gene expression. We found that there are subtle differences in the gene expression program of the 79 kb transgene and the endogenous locus. Polycomb domains are flanked by active genes or by insulators that limit the size of the domain. We hypothesized that delimiting the size of a Polycomb domain contributes to the stability of both gene activation and repression, making gene expression reproducible and robust. We recently completed experiments showing that flanking the 79 kb transgene with insulator elements strengthens the expression of the transgene, providing evidence for our hypothesis. Thus, providing “ends” to the inv-en domain stabilizes both its “ON” and “OFF” transcriptional states.
We also recently completed an analysis of PcG protein binding, chromatin marks, and 3D chromatin structure of PcG target genes in the ON and OFF transcriptional states. One aspect of chromatin structure is the presence of loops between distant regulatory DNA. Our data show that PREs form loops with other PREs in both the ON and OFF transcriptional states. Further, our data show that PREs loop with enhancers, consistent with our genetic work showing that PREs can facilitate enhancer-promoter communication; for this reason, such elements have been called promoter-tethering elements (PTEs).
Polycomb group response elements (PREs)
PcG proteins act in protein complexes that repress gene expression by modifying chromatin [Reference 1]. The best studied PcG protein complexes are PRC1 and PRC2. PRC2 contains the histone methyltransferase Enhancer of Zeste, which tri-methylates lysine 27 on histone H3 (H3K27me3). The chromatin mark H3K27me3 is the signature of PRC2 function. At most well studied genes, PRC2 acts with PRC1, which binds to H3K27me3, inhibits chromatin remodeling, and compacts chromatin. In Drosophila, PRC1 and PRC2 are recruited to the DNA by PREs. We are interested in determining how this occurs, and, to that end, we defined all the DNA sequences and DNA–binding proteins required for the activity of a single 181-bp PRE of the Drosophila engrailed gene (PRE2). We found that binding sites for seven different proteins are required for the activity of PRE2 (Figure 1). There are several binding sites for some of these proteins. Different PREs have distinct architectures (Figure 1). Our laboratory identified four PRE DNA–binding proteins (Pho, Phol, Spps, and Combgap) and recently collaborated on the characterization of the zinc-finger protein Crol as another PRE DNA–binding protein [Reference 2]. Clearly, PREs are complex elements.
en PRE1 and 2 are from the engrailed gene; iab-7/Fab-7 PRE is from the Abd-B gene; eve PRE is from the even-skipped gene. The symbols represent consensus binding sites for the proteins indicated below (Figure reprinted from Brown JL, Kassis JA. Genetics 2013;195:433).
PRE activity can be studied in transgenes, where a single PRE can recruit PcG protein complexes and silence the expression of a reporter gene. In transgenes, mutation of binding sites for a single PRE–binding protein can obliterate its ability to recruit PcG proteins and to repress gene expression. Thus, transcriptional silencing by a single PRE in a transgene requires the combinatorial activity of many DNA–binding proteins. We wanted to determine what happens when one of the PRE DNA–binding proteins (the ‘recruiters’) is removed from the genome, so we examined PcG binding genome-wide in mutants that lack the recruiters Spps or Pho [Reference 3]. We found that PcG recruitment to some PREs was completely disrupted, whereas recruitment of PcG proteins was hardly diminished at most PREs. Most PcG target genes, which are covered by the chromatin mark H3K27me3, contain several PREs. We believe that the structure of the H3K27me3 domains stabilizes genomic PREs against the loss of one recruiter. However, there are different kinds of PREs, and some are uniquely sensitive to the loss of one recruiter. Our study highlights the complexity and diversity of PcG recruitment mechanisms.
We took another approach to address the function of PRE DNA–binding proteins and chromatin environment in PRE function. Early data showed that mutation of Pho binding sites in PREs in transgenes abrogated the ability of those PREs to repress gene expression. In contrast, genome-wide experiments in pho mutants or by Pho knockdown showed that PcG proteins can bind to PREs in the absence of Pho. What could account for these differences? We directly addressed the importance of Pho binding sites in two engrailed (en) PREs at the endogenous locus and in transgenes [Reference 4]. Our results showed that Pho binding sites are required for PRE activity in transgenes with a single PRE. In a transgene, two PREs together lead to stronger, more stable repression and confer some resistance to the loss of Pho binding sites. Making the same mutation in Pho binding sites has little effect on PcG–protein binding at the endogenous en gene. Overall, our data support the model that Pho is important for PcG binding but emphasize how multiple PREs and chromatin environment increase the ability of PREs to function in the absence of Pho. This supports the view that many mechanisms contribute to PcG recruitment in Drosophila.
The role of PREs at the en gene
The Drosophila engrailed (en) gene encodes a homeodomain protein that plays an important role in the development of many parts of the embryo, including formation of the segments, nervous system, head, and gut. By specifying the posterior compartment of each imaginal disc, en also plays a significant role in the development of the adult. Accordingly, en is expressed in a highly specific and complex manner in the developing organism. The en gene exists in a gene complex with invected (inv), an adjacent gene; inv encodes a protein with a nearly identical homeodomain; en and inv are co-regulated and express proteins with largely redundant functions. Unlike en, inv is dispensable for Drosophila viability in the laboratory.
The en and inv genes exist in a 113kb domain that is covered by the H3K27me3 chromatin mark (Figure 2). Within the en/inv domain there are four major PREs, which are strong peaks of PcG protein binding. One popular model posits that DNA–binding proteins bound to the PREs recruit PcG protein complexes and that PRC2 tri-methylates histone H3 throughout the domain until PRC2 comes to either an insulator or an actively transcribed gene. There are two PREs upstream of the en transcription unit, PRE1 and PRE2 (Figure 1). Both PREs reside within a 1.5kb fragment located from –1.9kb to –400bp upstream of the major en transcription start site. There are also two major inv PREs, one located at the promoter and another about 6kb upstream of that. Our laboratory showed that all these PREs have the functional properties attributed to PREs in transgenic assays. To test their function at the intact en-inv domain, we set out to delete these PREs from the genome. Given that PREs work as repressive elements, the predicted phenotype of a PRE deletion is gain-of-function ectopic expression. Unexpectedly, when we made a 1.5kb deletion removing PRE1 and PRE2, flies were viable and had a partial loss-of-function phenotype in the wing. Similarly, deletion of inv PREs yielded viable flies with no mis-expression of en or inv. Importantly, the H3K27me3 inv-en domain is not disrupted in either of these mutants.
The inv and en genes are covered with H3K27me3 and are transcriptionally silent. PcG proteins are associated with this domain. There are strong, constitutive PREs, as well as ‘weak,’ tissue-specific PREs. ‘Weak’ PREs often overlap enhancers and are active in some tissues but inactive in others. Actively transcribed genes remain segregated from the PcG domain and determine the limits of the PcG domain (Figure reprinted from Reference 3).
In Drosophila, PREs are easily recognizable in chromatin immunoprecipitation experiments as discrete peaks of PcG protein binding, but the H3K27me3 mark spreads throughout large regions. PcG proteins are conserved in mammals; however, PcG binding usually does not occur in sharp peaks, and PREs have been much harder to identify. We created a chromosome in which both the en and inv PREs are deleted. Surprisingly, the flies are viable, and there is no mis-expression of en or inv in embryos or larvae. The question arises as to how PcG proteins are recruited to the inv-en domain in the absence of these PREs. We performed chromatin-immunoprecipitation followed by Next-Gen sequencing (ChIP-seq) on the PcG proteins Pho and Polyhomeotic (Ph). The data showed that, in addition to the large Pho/Ph peaks at the known PREs, there are many smaller Pho/Ph peaks within the inv-en domain. We found that those peaks may also function as PREs. Thus, rather than a few PREs, there are many PREs controlling inv-en expression, and some may act in tissue-specific ways. Our work shows that there are two types of PREs in Drosophila: strong, constitutive PREs and tissue-specific PREs that tend to overlap with enhancers (Figure 2).
The inv-en gene complex is flanked by tou and E(Pc), two ubiquitously expressed genes (Figures 2 & 3). The H3K27me3 mark stops at these two genes. We believe that transcription of these genes blocks the spreading of the H3K27me3 mark and stabilizes the repression of inv and en by PcG proteins. To test this assumption, we made a large transgene marked by HA–tagged Engrailed protein. A 79-kb HA-en transgene was able to correctly express En and completely rescue inv-en double mutants. We inserted the transgene into other places in the Drosophila genome [Reference 4]. Our data showed that, while the information to form the H3K27me3 domain is contained within the 79-kb HA-en transgene, the structure of the H3K27me3 domains differs from that at the endogenous locus. Specifically, the H3K27me3 mark spreads beyond the transgene into flanking DNA. Further, enhancers within the 79-kb HA-en transgene could interact with some flanking genes and drive their expression in subsets of the En pattern. Also, removal of the PREs from the transgene led to loss of PcG silencing in the abdominal segments of the flies. These data provide evidence that the endogenous inv-en domain imparts stability to the locus and facilitates both transcriptional activation and silencing of these two developmentally important genes. Our recent experiments show that adding insulator elements that block the spreading of H3K27me3 and the activity of the inv-en enhancers stabilizes the 79-kb HA-en transgene, making it behave more like the endogenous locus [Reference 5].
Precise gene expression patterns are governed by a vast array of regulatory DNA
Genes that control development are often used at different times and places in a developing embryo. Transcription of these important genes must be tightly regulated; therefore, these genes often have large arrays of regulatory DNA. In Drosophila, discrete fragments of DNA (enhancers) can be identified, which turn genes on in patterns in the early embryo. In cells in which the genes are transcriptionally ON, there are active modifications on chromatin, setting later enhancers in a transcription-permissive environment. In cells in which the genes are OFF, repressive chromatin marks keep later enhancers inactive. Our studies on the regulatory DNA of the inv-en gene complex have been highly informative.
Enhancers are often located tens or even hundreds of kb away from their promoter, sometimes even closer to the promoters of genes other than the one they activate. Several years ago, we showed that en enhancers can act over large distances, even skipping over other transcription units, choosing the en promoter over promoters of neighboring genes. Such specificity is achieved in at least three ways. First, early-acting enhancers that drive engrailed expression in stripes exhibit promoter specificity. Second, a proximal promoter-tethering element is required for the action of the imaginal disc enhancer (IDE); our data point to two partially redundant promoter-tethering elements. Third, the long-distance action of en enhancers requires a combination of the en promoter and sequences within or closely linked to the promoter-proximal PREs. The data show that several mechanisms ensure proper enhancer-promoter specificity at the Drosophila en locus, providing one of the first detailed views of how promoter-enhancer specificity is achieved.
H3K27me3, a mark deposited by PcG protein complex PRC2, is bound from the 3′ end of the tou gene to the 3′ end of the E(Pc) gene. Arrows indicate the direction and extent of the transcription units for the genes shown. H3K36me3 is a mark of actively transcribed genes and is bound to E(Pc) and tou. Samples are from Drosophila 3rd instar larval brains and discs. In these tissues, at least 80% of the cells do not express inv or en (data from Reference 4).
As a follow-up to these studies, we located all the enhancers that regulate the transcription of en and the closely linked co-regulated inv gene (Figure 4). Our dissection of inv-en–regulatory DNA showed that most enhancers are spread throughout a 62kb region. We used two types of construct to analyze the function of this DNA: P-element–based reporter constructs with small pieces of DNA fused to the en promoter driving lacZ expression (Figure 4); and large constructs with HA–tagged en and inv inserted in the genome with the phiC31 integrase. In addition, we generated deletions of inv and en DNA in situ and assayed their effects on inv/en expression. Our results support and extend our knowledge of inv-en regulation. First, inv and en share regulatory DNA, most of which flanks the en transcription unit. In support of this finding, a 79-kb HA-en transgene can rescue inv en double mutants into viable, fertile adults. In contrast, an 84-kb HA-inv transgene lacks most of the enhancers for inv and en expression. Second, there are several enhancers for inv/en stripes in embryos; some may be redundant, but others play discrete roles at different stages of embryonic development. Finally, no small reporter construct gave expression in the posterior compartment of imaginal discs, a hallmark of inv/en expression. Robust expression of HA-en in the posterior compartment of imaginal discs is evident from the 79-kb HA-en transgene, while a 45-kb HA-en transgene gives weaker, variable imaginal disc expression. We suggest that the activity of the imaginal disc enhancer(s) depends on the chromatin structure of the inv-en domain.
A. P-element vector (P[en]), used to assay the function of en–regulatory DNA, contains the en promoter, 396bp of upstream sequences, and an untranslated leader fusion between the en transcript and the Adh-lacZ reporter gene. inv/en DNA fragments were added to this vector at the location of the triangle.
B. The extent of each fragment cloned into P[en] is shown as a black line with a letter above the inv/en genomic DNA map (indicated by a long black line with hatch marks at 10kb intervals; numbers are coordinates on chromosome 2R, Genome Release v5). Expression patterns in embryos or the wing imaginal disc (wd) are shown above or below the genomic DNA, with arrows pointing to the fragment(s) that generate(s) the pattern (Figure reprinted from Cheng Y et al. Dev Biol 2014;395:131).
In recent work, we studied the activity of two engrailed imaginal disc enhancers (IDEs) inside and outside the endogenous inv-en domain [Reference 5]. Inside the inv-en domain, IDEs drive expression of inv and en in the posterior compartment of imaginal discs (Figure 5). However, when the IDE is in a reporter gene located outside the inv-en domain, the reporter gene is expressed in the wrong part of the disc. We also showed that Engrailed itself binds to the IDE and represses its activity. Overall, our data show that the activity of enhancers can be greatly influenced by flanking regulatory DNA and the epigenetic state of their chromatin environment. In addition, Engrailed regulates its own expression level by binding to its own IDE.
Diagrams of a wing disc with expression (red shading) in either the posterior (P) or anterior (A) compartment.
Defining the ends of Polycomb domains in Drosophila
Actively transcribed genes flank many Polycomb domains, and previous genomic studies showed that inhibition of transcription using chemical inhibitors leads to a spreading of the chromatin mark H3K27me3 in the genome. We conducted a genome-wide analysis of Polycomb boundaries in Drosophila larvae [Reference 6]. We found six different types of Polycomb-domain boundaries, including those made by insulator proteins and actively transcribed genes. The inv-en Polycomb domain is flanked by two actively transcribed genes, E(Pc) and tou (Figure 3). Insertion of a transcriptional stop within the tou gene causes an extension of the H3K27me3 mark to the point of active transcription. We also suggest that active transcription limits the range of inv-en enhancers and that promoter specificity is important for inv-en enhancer activity [Reference 6].
Why is it important that H3K27me3 domains have ends? We addressed this question by adding boundaries to the ends of our 79kb engrailed transgene [Reference 5]. In that study, we showed that adding a boundary to the transgene confined the activity of the engrailed enhancers and increased the activity of the 79kb transgene. In addition, the boundary element strengthened Polycomb silencing of the transgene, making it more resilient to reduced PcG function. Overall, our data showed that a boundary element strengthened both the ON and OFF transcriptional states of the 79kb engrailed transgene.
PREs can also act as promoter-tethering elements (PTEs)
In the OFF transcriptional state, PREs recruit PcG protein complexes, including PRC2, which tri-methylates H3K27, forming large H3K27me3 domains. In addition, PREs make loops in chromatin, and the looping strengthens silencing. In a genome-wide study in two different cell types [Brown JL et al. bioRxiv 2023;11.02.565256], we addressed the questions of which PcG proteins bind to PREs when PcG target genes are expressed and whether PREs loop when these genes are ON. Our data show that the answers are PRE–specific, but general conclusions can be reached. First, within a PcG–target gene, some regulatory DNA can remain covered with H3K27me3, and PcG proteins remain bound to PREs in such regions. Second, when PREs are within H3K27ac domains, PcG binding decreases; however, this depends on the protein and PRE. The DNA–binding protein GAF and the PcG protein Ph remain at PREs, even when other PcG proteins are greatly depleted. In the ON state, PREs can still loop with each other, but also form loops with presumptive enhancers. These data support the model in which, in addition to their role in PcG silencing, PREs can act as “promoter-tethering elements,” mediating interactions between promoter-proximal PREs and distant enhancers. Further, our studies provide genetic evidence of the PTE activity of engrailed PRE2. Overall, our work shows the importance of PREs to both the ON and OFF transcriptional states.
Publications
- Kassis JA, Kennison JA, Tamkun JW. Polycomb and Trithorax group genes in Drosophila. Genetics 2017 206:1699–1725.
- Erokhin M, Brown JL, Lomaev D, Vorobyeva NE, Zhang L, Fab LV, Mazina MY, Kulakovskiy IV, Ziganshin RH, Schedl P, Georgiev P, Sun MA, Kassis JA, Chetverina D. Crol contributes to PRE-mediated repression and Polycomb group protein recruitment in Drosophila. Nucleic Acids Res 2023 51(12):6087–6100.
- Brown JL, Sun M, Kassis JA. Global changes of H3K27me3 domains and Polycomb group protein distribution in the absence of recruiters Spps or Pho. Proc Natl Acad Sci USA 2018 115(8):E1839–E1848.
- Brown JL, Price JD, Erokhin M, Kassis JA. Context-dependent role of Pho binding sites in Polycomb complex recruitment in Drosophila. Genetics 2023 224:iyad096.
- Cheng Y, Chan F, Kassis JA. The activity of engrailed imaginal disc enhancers is modulated epigenetically by chromatin and autoregulation. PLoS Genetics 2023 19(11):e1010826.
- De S, Gehred ND, Fujioka M, Chan FW, Jaynes JB, Kassis JA. Defining the boundaries of Polycomb domains in Drosophila. Genetics 2020 216:689–700.
Collaborator
- Karl Pfeifer, PhD, Section on Epigenetics, NICHD, Bethesda, MD
Contact
For more information, email jkassis@mail.nih.gov.