Multiplexed super-resolution imaging of genes
Curran Oi
The heat shock response is highly conserved across eukaryotes and critical for cell fitness. Moreover, cancer cells use the heat shock machinery to improve their robustness and virulence. Despite more than 50 years of extensive study, much remains unknown about the mechanism of the heat shock response. Chromatin conformation capture and fixed-cell imaging have shown that some genes upregulated during heat shock associate with one another under these conditions. This observation raises many important questions, including: How does heat shock reorganize the nucleus? How is this process coordinated across multiple genes? Are the interactions between genetic loci 1:1 or higher order? Are these interactions transient? And are the associated genes also bound to heat shock-related proteins?
To study nuclear reorganization of genes, we are adapting the LIVE-PAINT super-resolution imaging approach for labeling genes, as well as proteins. In LIVE-PAINT, a short peptide (1-5 kDa) is fused to a target protein and a peptide-binding protein is fused to a fluorescent protein and expressed separately. The fluorescent protein is transiently recruited to the target protein by this peptide-protein interaction. This binding event generates a “blink” that can be fit to a point spread function. Repeated binding events generate super-resolution localization events that are compiled to produce a super-resolution image.
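As a rough illustration of how a single blink becomes a sub-pixel localization, the sketch below estimates the centre of a synthetic spot. It uses an intensity-weighted centroid as a simple stand-in for a full point-spread-function fit, not the fitting procedure used in LIVE-PAINT analysis. The image size, spot width, and the function names `make_spot` and `localize` are assumptions chosen for the example.

```python
import math

def make_spot(size, cx, cy, sigma, amplitude=1000.0):
    """Synthetic blink: a 2D Gaussian sampled on a size x size pixel grid."""
    return [
        [amplitude * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]

def localize(image, background=0.0):
    """Estimate the sub-pixel centre of one blink as its intensity-weighted centroid.

    This is a simplification: a real analysis would fit a PSF model instead.
    """
    total = sum_x = sum_y = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            weight = max(value - background, 0.0)  # ignore pixels below background
            total += weight
            sum_x += weight * x
            sum_y += weight * y
    if total == 0:
        raise ValueError("no signal above background")
    return sum_x / total, sum_y / total

# Toy data: one noise-free blink centred between pixels (values are assumptions).
spot = make_spot(size=9, cx=3.3, cy=4.7, sigma=1.2)
x_hat, y_hat = localize(spot)
print(f"estimated centre: ({x_hat:.2f}, {y_hat:.2f}) pixels")
```

Compiling many such estimates from repeated binding events is what builds up the super-resolution image; with noisy data, a model-based fit would typically be preferred over the centroid.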
Figure 1. (A) LIVE-PAINT uses reversible peptide-protein interactions to transiently associate a fluorescent protein to a target protein. Diffraction-limited and super-resolution images of Cdc12 tagged and imaged in yeast are shown as an example. (B) LIVE-PAINT imaging of genes will be performed by integrating operator arrays (e.g. lacO) with n repeats near a gene of interest. The operator’s cognate repressor (LacI) will be expressed and fused to a peptide. Imaging is then performed as in (A).
To adapt this approach to visualize genes in live cells, we are integrating an operator array (e.g. lacO) with n repeats near a gene of interest. Then, we express the array’s cognate repressor protein (LacI) fused to a peptide and a peptide-binding protein fused to a fluorescent protein. Transient binding between the peptide and peptide-binding protein will generate super-resolution localizations at the gene of interest, just as in the original implementation of LIVE-PAINT.
We will then use LIVE-PAINT to visualize nuclear reorganization of genes during heat shock in yeast and distinguish between models of nuclear reorganization. We will utilize haploid, aneuploid, and diploid yeast to build a model for how genes associate during nuclear reorganization. Some possible models are outlined in the figure below.
Figure 2. Two genes (A and B) that associate A:B in a haploid strain may prefer self-association A:A and B:B in aneuploid and diploid strains, or may all associate together (e.g. A:A:B:B).
A processive CRISPR-guided base editing system
Ruth Groza
Efficient and precise in vivo mutagenesis of nucleic acids has been a longstanding objective of genetic engineering. Clustered regularly interspaced short palindromic repeat (CRISPR)-guided base editing, a technology that couples Cas9 nucleases to nucleotide-modifying enzymes, offers a promising method of achieving that goal. However, current CRISPR-guided editors are limited to targeting regions within 350 nucleotides of protospacer adjacent motifs (PAMs), hampering their ability to diversify full-length genes (Halperin, S.O. et al., Nature 2018). We are seeking to develop a base editing system in S. cerevisiae that addresses this issue by combining the high processivity of a DNA helicase with the targeting precision of CRISPR, precluding the need for more expensive and labor-intensive gene tiling.
Our system hinges upon the recruitment of a chimeric protein to Cas9. In this chimera, a processive DNA modifying enzyme is fused to a base editing enzyme that is itself fused to the MS2 phage coat protein. Recruitment of the chimera to the target site is achieved via the interaction of the MS2 coat protein with MS2 RNA hairpins incorporated into the single guide RNA (sgRNA). In addition to recruiting the chimera, the sgRNA activates Cas9 and specifies its genomic target site. After initiating at the target site, the processive DNA modifying enzyme should move along the DNA in concert with the fused base editor as it installs mutations.
Figure 1. Schematic depicting a processive CRISPR-guided base editing system. A fusion of a processive DNA modifying enzyme, a base editor, and an MS2 coat protein is used to mutagenize DNA downstream of the Cas9 target site.
Uncovering gene regulatory elements in the plant genome
Tobias Jores, Mike Dorrity, and Josh Cuperus
With climate change threatening global food security, crop plants with higher yields and improved response to abiotic stresses are required to satisfy the needs of our rapidly increasing population. Many beneficial traits in domesticated crops arose through mutations in enhancers and promoters, cis-regulatory elements that govern tissue- and condition-specific expression. Genetic engineering of such elements is a promising strategy for future crop improvement. However, our knowledge of plant cis-regulatory elements and their influence on gene expression is limited. To facilitate the genome-wide identification of active plant regulatory elements, we adapted STARR-seq – a technology for the high-throughput identification of enhancers (Arnold et al., 2013, Science) – for its use in plants.
In a classical STARR-seq approach, DNA fragments are inserted in the 3'-UTR of a gene under the control of a minimal promoter. If the DNA fragment has enhancer activity, it can interact with the minimal promoter and thereby increase its own transcription. The amount of mRNA produced is, therefore, a direct readout of the enhancer activity of the inserted DNA fragment. In initial experiments, we tested whether the cauliflower mosaic virus 35S core enhancer can stimulate the activity of the 35S minimal promoter (Fig. 1). When the 35S enhancer was inserted into the 3'-UTR of the reporter gene, it did not increase the activity of the minimal promoter. In contrast, when cloned upstream of the minimal promoter, the 35S enhancer did lead to increased transcription of the reporter gene.
Therefore, we used a modified version of STARR-seq. DNA fragments of interest were inserted upstream of the 35S minimal promoter with a barcode inside the reporter gene that is linked to the individual fragments and reports on their enhancer activity. To test if this approach can identify plant enhancers and distinguish them from non-regulatory DNA (Fig. 2), we fragmented a plasmid harboring four enhancers (the viral 35S enhancer as well as enhancers of the pea AB80 and rbcS-E9 genes and the wheat Cab-1 gene) and inserted the fragments into the plant STARR vector (pPSup; https://www.addgene.org/149416/). This fragment library was then subjected to STARR-seq in transiently transfected tobacco leaves. Fragments derived from the regions corresponding to the four enhancers showed the strongest enrichment. We performed these experiments with tobacco leaves that were grown in the light or dark and, as expected, observed light-dependent changes in the activity of the three plant enhancers (AB80, Cab-1, and rbcS-E9). The 35S enhancer was active in both conditions.
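One common way to turn barcode counts into an enrichment value is to compare each fragment's share of RNA reads with its share of input DNA reads. The sketch below assumes that definition (log2 of the RNA fraction over the DNA fraction, with a pseudocount); it is not taken from the analysis described here. The barcodes, fragment names, and counts are made up for illustration.

```python
import math
from collections import defaultdict

def fragment_enrichment(dna_counts, rna_counts, barcode_to_fragment, pseudocount=1.0):
    """Return log2(RNA fraction / DNA fraction) per fragment.

    Counts from all barcodes linked to the same fragment are pooled first.
    The pseudocount avoids taking the log of zero (an assumed convention).
    """
    dna = defaultdict(float)
    rna = defaultdict(float)
    for barcode, fragment in barcode_to_fragment.items():
        dna[fragment] += dna_counts.get(barcode, 0)
        rna[fragment] += rna_counts.get(barcode, 0)
    dna_total = sum(dna.values())
    rna_total = sum(rna.values())
    if dna_total == 0 or rna_total == 0:
        raise ValueError("need at least one DNA read and one RNA read")
    return {
        fragment: math.log2(
            ((rna[fragment] + pseudocount) / rna_total)
            / ((dna[fragment] + pseudocount) / dna_total)
        )
        for fragment in dna
    }

# Hypothetical toy library: two fragments, two barcodes each.
barcode_to_fragment = {"AAC": "fragment_A", "GTT": "fragment_A",
                       "CCA": "fragment_B", "TGG": "fragment_B"}
dna_counts = {"AAC": 100, "GTT": 100, "CCA": 100, "TGG": 100}
rna_counts = {"AAC": 300, "GTT": 340, "CCA": 80, "TGG": 80}

enrichment = fragment_enrichment(dna_counts, rna_counts, barcode_to_fragment)
for fragment, value in sorted(enrichment.items()):
    print(f"{fragment}: log2 enrichment = {value:+.2f}")
```

In this toy example, fragment_A is over-represented in RNA relative to its DNA input and gets a positive value, while fragment_B gets a negative one.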
We have begun applying the STARR-seq assay to DNA fragments derived from the genomes of Arabidopsis thaliana, maize and sorghum. One example from Arabidopsis is presented below (Fig. 3), where we see enrichment of intergenic regions. Some of these enrichment peaks overlap with known chromatin-accessible regions. Overall, the peaks in our STARR-seq library fall into genomic categories similar to those of hypersensitive sites, with the notable exception of transposons. This outcome is expected, given that transposons are heavily methylated in planta but are not methylated in STARR-seq.
In addition to our efforts to identify enhancers in the genomes of Arabidopsis, maize and sorghum, we are also interested in characterizing plant core promoters. We array-synthesized the putative promoter regions of all Arabidopsis, maize and sorghum genes and tested their transcriptional activity in an assay similar to the one used for enhancers (Fig. 4). To compare promoter strengths across species, we tested all libraries in transiently transformed tobacco leaves and maize protoplasts. The promoters spanned a wide range of activity, with over 250-fold difference between the strongest and weakest promoters. Overall, the promoters of the dicot Arabidopsis tended to perform better in the dicot tobacco system, while the promoters of the monocots maize and sorghum showed greater activity in protoplasts of the monocot maize.
We analyzed the sequences of the plant core promoters to identify the underlying elements that control their activity. We observed that core promoter elements, like the TATA-box, as well as promoter GC content and promoter-proximal transcription factor binding sites, influence promoter strength (Fig. 5). Using these data, we designed synthetic promoters by introducing core promoter elements into random sequences with a nucleotide composition optimized for the corresponding assay system.
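The design idea of placing a core promoter element into a random background of defined nucleotide composition can be sketched as follows. All specifics here are assumptions for illustration: the sequence length, the GC content, the TATA-box motif `TATAAA`, its position, and the function names do not come from the actual designs.

```python
import random

def random_background(length, gc_content, rng):
    """Random DNA in which each base is G or C with probability gc_content."""
    at = (1 - gc_content) / 2
    gc = gc_content / 2
    # Weights follow the order of the letters in "ACGT".
    return "".join(rng.choices("ACGT", weights=[at, gc, gc, at], k=length))

def insert_element(sequence, element, start):
    """Overwrite part of the sequence with a motif, keeping the total length."""
    if start < 0 or start + len(element) > len(sequence):
        raise ValueError("element does not fit at this position")
    return sequence[:start] + element + sequence[start + len(element):]

rng = random.Random(0)                  # fixed seed so the example is repeatable
background = random_background(length=170, gc_content=0.4, rng=rng)
promoter = insert_element(background, "TATAAA", start=110)  # assumed motif and position

print(promoter)
print("GC content:", round(sum(base in "GC" for base in promoter) / len(promoter), 2))
```

Generating many backgrounds with and without each element would then let the effect of the element be separated from the effect of the surrounding sequence.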
The regulatory landscape of Arabidopsis thaliana roots at single-cell resolution
Mike Dorrity
Single-cell genomics provides a view of the state of each cell individually, allowing us to account for factors that confound typical bulk genomics experiments. These factors include cell type, cell cycle progression and developmental stage. With single-cell measurements, each of these sources of heterogeneity in a typical biological sample can be deconvolved with the right tools. As we profile single cells for molecular phenotypes beyond transcription, there is even greater potential to resolve individual cell states, but this resolution depends on our ability to leverage computational tools that integrate different types of data.
We used profiling of open chromatin by single-cell ATAC-seq (scATAC-seq) alongside previously generated transcriptional profiling data from single-cell RNA sequencing (scRNA-seq) to discern cell states of the root of Arabidopsis thaliana, the continuously developing tissue that is responsible for the acquisition of nutrients and water required for plant growth. These cell states can be visualized in a UMAP dimensionality reduction plot (Figure 1A). The power of scATAC-seq to identify cell type-specific accessible sites can be visualized simply by “pseudobulking” cells of each type and examining average cut counts per site (Figure 1B).
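Pseudobulking amounts to grouping cells by their assigned type and averaging the per-site cut counts within each group. A minimal sketch is shown below, assuming the counts are available as a nested dictionary; the cell identifiers, site names, cell-type labels, and counts are toy values.

```python
from collections import defaultdict

def pseudobulk(cut_counts, cell_types):
    """Average per-site cut counts over all cells that share a cell-type label.

    cut_counts: {cell_id: {site: count}}; sites missing from a cell count as 0.
    cell_types: {cell_id: label}
    """
    sums = defaultdict(lambda: defaultdict(float))
    n_cells = defaultdict(int)
    for cell, sites in cut_counts.items():
        label = cell_types[cell]
        n_cells[label] += 1
        site_sums = sums[label]          # also registers labels with empty cells
        for site, count in sites.items():
            site_sums[site] += count
    return {
        label: {site: total / n_cells[label] for site, total in site_sums.items()}
        for label, site_sums in sums.items()
    }

# Toy data: four cells, two sites, two cell types (all values are made up).
cut_counts = {
    "cell_1": {"site_1": 2},
    "cell_2": {"site_1": 1, "site_2": 1},
    "cell_3": {"site_2": 3},
    "cell_4": {"site_2": 1},
}
cell_types = {"cell_1": "epidermis", "cell_2": "epidermis",
              "cell_3": "cortex", "cell_4": "cortex"}

profiles = pseudobulk(cut_counts, cell_types)
print(profiles["epidermis"])  # {'site_1': 1.5, 'site_2': 0.5}
print(profiles["cortex"])     # {'site_2': 2.0}
```

Here site_1 is accessible only in the epidermis pseudobulk, which is the kind of cell type-specific signal the averaged profiles are meant to expose.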
We combined these scATAC-seq data with previous root scRNA-seq data (Ryu et al. 2019 Plant Physiol 179, 1444) to link cell states in these two data types. Using dataset integration tools (Stuart et al. 2019 Cell 177, 1888), we generated a co-embedding of root cells from both scATAC-seq and scRNA-seq sources (Figure 2A). Using this co-embedding, we were able to predict gene expression levels in cells from our scATAC-seq experiment. With these integrated data, we asked whether we could link the expression level of transcription factors, the regulatory proteins responsible for switching genes on and off, to the accessible sites that “open” up in root epidermis cells. We found that transcription factors like TTG2, which has an epidermis-specific expression pattern, can be linked to individual accessible regions of the genome, providing a first step toward building stepwise models of gene regulatory programs that are deployed during development (Figure 2B).
Empirical Estimation of Robustness from Deep Mutational Scans
Bryan Andrews (Collaboration with Peter Conlin and Ben Kerr)
Are genes more robust to errors than expected by chance?
Because of the way the genetic code is structured, different codons for the same amino acid have different ‘mutational neighborhoods.’ That is, if a single-base substitution occurs at an arginine codon, for instance, the set of possible new amino acids is very different if the original codon was a ‘cgg’ versus an ‘agg’, and the same is true for half of the amino acids: A, G, I, K, L, P, R, S, T & V.
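The arginine example can be checked directly by enumerating the nine single-base substitutions of each codon under the standard genetic code. The sketch below does this; the function names are chosen for the example.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def neighbors(codon):
    """All nine codons that differ from the given codon at exactly one position."""
    codon = codon.upper()
    return [
        codon[:pos] + base + codon[pos + 1:]
        for pos in range(3)
        for base in BASES
        if base != codon[pos]
    ]

def neighborhood(codon):
    """Sorted amino acids (or '*') reachable by one substitution, excluding synonymous changes."""
    original = CODON_TABLE[codon.upper()]
    return sorted({CODON_TABLE[n] for n in neighbors(codon)} - {original})

print(neighborhood("cgg"))  # ['G', 'L', 'P', 'Q', 'W']
print(neighborhood("agg"))  # ['G', 'K', 'M', 'S', 'T', 'W']
```

Both codons encode arginine, yet only glycine and tryptophan are shared between the two neighborhoods. Four of the nine substitutions of cgg are synonymous, compared with two of nine for agg.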
As a result, the potential effects of a single-base substitution on a protein depend on which codons are used to encode that protein. In this project, in collaboration with Peter Conlin and Ben Kerr, we computationally reanalyzed published datasets in which proteins were subjected to Deep Mutational Scans, in order to test whether naturally occurring gene sequences are more robust to errors than would be expected by chance, a property we are calling ‘genetic robustness.’
In the above figure, we compare the robustness of the wild type sequences (green lines) to a distribution of 100,000 sequences encoding the same protein with randomly selected codons. With a few exceptions, we see broad evidence that actual genes are more robust than synonymous random encodings. Amino acids that are buried, that are empirically sensitive to mutation, and that are evolutionarily conserved all appear to be more robust than sites that tolerate mutations. Furthermore, gene-wide codon frequencies are not sufficient to explain the degree of robustness observed; most of the signal comes from where specific codons are used rather than how frequently they are used. We posit the existence of an evolutionary mechanism to select for sequences that are robust to errors, but cannot distinguish with our data whether selection acts to remove sequences that give rise to inherited errors (i.e. rare low-fitness offspring), or non-inherited errors such as mistranslation events.
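The comparison of a wild-type sequence against random synonymous encodings can be sketched as a simple resampling procedure. The version below is a toy under stated assumptions, not the analysis itself: robustness is taken to be the mean score over all single-base substitutions, the four-codon sequence is hypothetical, and `toy_score` is a placeholder for measured deep mutational scan values.

```python
import random
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

SYNONYMS = {}  # amino acid -> list of codons that encode it
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)

def translate(codons):
    return "".join(CODON_TABLE[c] for c in codons)

def robustness(codons, score):
    """Mean score over every single-base substitution of a coding sequence.

    score(site, amino_acid) gives the effect of that residue at that site
    (assumed scale: 1 = wild-type-like, 0 = null).
    """
    total, n = 0.0, 0
    for site, codon in enumerate(codons):
        for pos in range(3):
            for base in BASES:
                if base != codon[pos]:
                    mutant = codon[:pos] + base + codon[pos + 1:]
                    total += score(site, CODON_TABLE[mutant])
                    n += 1
    return total / n

def random_recoding(codons, rng):
    """Same protein, each codon drawn uniformly from its synonymous set."""
    return [rng.choice(SYNONYMS[CODON_TABLE[c]]) for c in codons]

def fraction_at_least_as_robust(wild_type, score, n_samples, rng):
    """Observed robustness and the share of random recodings that match or exceed it."""
    observed = robustness(wild_type, score)
    hits = sum(
        robustness(random_recoding(wild_type, rng), score) >= observed
        for _ in range(n_samples)
    )
    return observed, hits / n_samples

wild_type = ["ATG", "CGG", "CTG", "AGC"]   # hypothetical sequence encoding MRLS
protein = translate(wild_type)

def toy_score(site, amino_acid):
    # Placeholder scores: stop = 0, wild-type residue = 1, anything else = 0.5.
    if amino_acid == "*":
        return 0.0
    return 1.0 if amino_acid == protein[site] else 0.5

rng = random.Random(0)
observed, fraction = fraction_at_least_as_robust(wild_type, toy_score, 1000, rng)
print(f"wild-type robustness: {observed:.3f}")
print(f"fraction of recodings at least as robust: {fraction:.3f}")
```

A small fraction would indicate that the wild-type encoding is more robust than most synonymous alternatives. Drawing codons uniformly, as done here, ignores gene-wide codon frequencies, which a fuller analysis would need to control for.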