Malladi_2020: Total functional score of enhancer elements identifies lineage-specific enhancers that drive differentiation of pancreatic cells

keywords

Enhancer, epigenome, gene regulation, pancreas, tissue-specific transcription, transcription factor

Abstract

Introduction

Lineage specification depends on interactions between transcription factors (TFs) and chromatin states at enhancers

Enhancers have been shown to share several common features

Recent genomic assays have shown that active enhancers are bound by RNA polymerase II (Pol II) and are transcribed, producing noncoding RNAs known as enhancer RNAs (eRNAs)

Enhancer transcription (as measured by total RNA-seq, GRO-seq, or PRO-seq) can be used in the absence of any other genomic information to predict enhancer activity

Advances in technology have facilitated the large-scale functional characterization of enhancer activity and the annotation of TF-binding sites (TFBSs) genome-wide in various cell types and tissues

Analyses that predict TFBSs

by chance throughout the genome and that TF binding is cell type specific

TFBSs, which are usually 4 to 12 nucleotides in length
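
The note that short motifs occur by chance throughout the genome can be made concrete with a little arithmetic. The sketch below is a back-of-the-envelope illustration, not part of the paper's method: it assumes a random sequence with uniform base composition (each base with probability 1/4), a rough genome size of 3 billion bases, and counts both strands, which double-counts palindromic motifs. The function name and constant are hypothetical.

```python
def expected_chance_matches(genome_length: int, motif_length: int,
                            both_strands: bool = True) -> float:
    """Expected number of exact matches of one fixed k-mer in a random
    sequence where every base has probability 1/4 (a simplifying assumption)."""
    positions = genome_length - motif_length + 1  # possible start positions
    if positions <= 0:
        return 0.0
    per_position = 0.25 ** motif_length  # chance of a match at one position
    strands = 2 if both_strands else 1
    return positions * per_position * strands


GENOME_LENGTH = 3_000_000_000  # assumed, approximate genome size in bases

# Motif lengths spanning the 4 to 12 nucleotide range mentioned above.
for k in (4, 8, 12):
    print(k, round(expected_chance_matches(GENOME_LENGTH, k)))
```

Under these assumptions the expected count falls by a factor of 4 for every added nucleotide, so the shortest motifs match far more often by chance than the longest ones. This is one reason motif occurrence alone is a weak predictor of binding.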

Total Functional Score of Enhancer Elements (TFSEE)

Aimed to (1) evaluate TFSEE as an enhancer-calling algorithm and (2) understand the TF-driven transcriptional programs differentiating human embryonic stem cells (hESCs) into pancreatic cells

Figure 1.

Materials and Methods

Genomic data curation

Analysis of ChIP-seq data

Analysis of RNA-seq data

Analysis of GRO-seq data

Kernel density

Defining transcription start sites (TSSs) and promoters

Enhancer calling by GRO-seq

Calling a universe of transcripts from GRO-seq data

Calling active enhancers using GRO-seq-defined enhancer transcripts

Motif analyses for GRO-seq-defined enhancers

Enhancer calling by ChIP-seq

Calling active enhancers using histone modification ChIP-seq data

Motif analyses for ChIP-seq-defined enhancers

TODO Generating heatmaps and clusters

TODO Nearest neighboring gene analyses and box plots
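
As a placeholder for this TODO, here is a minimal sketch of one common way to assign an enhancer to its nearest gene: compare the enhancer midpoint with a sorted list of TSS positions on the same chromosome. Whether the paper uses the midpoint, the TSS, or a distance cutoff is not recorded in these notes, so all of that is assumed here. The function name and the gene names are hypothetical.

```python
from bisect import bisect_left


def nearest_gene(enhancer_mid: int, tss_positions: list[int],
                 gene_names: list[str]) -> tuple[str, int]:
    """Return (gene name, distance) for the TSS closest to an enhancer midpoint.

    tss_positions must be sorted ascending and refer to a single chromosome;
    gene_names[i] is the gene whose TSS is tss_positions[i].
    """
    if not tss_positions:
        raise ValueError("no TSS positions supplied")
    i = bisect_left(tss_positions, enhancer_mid)
    # Only the TSS just upstream and just downstream can be the nearest one.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(tss_positions)]
    best = min(candidates, key=lambda j: abs(tss_positions[j] - enhancer_mid))
    return gene_names[best], abs(tss_positions[best] - enhancer_mid)


# Toy coordinates and placeholder gene names, for illustration only.
tss = [100, 500, 900]
genes = ["GENE_A", "GENE_B", "GENE_C"]
print(nearest_gene(450, tss, genes))
```

On a tie the upstream TSS is returned, which is an arbitrary choice in this sketch.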

TODO Overlapping enhancer analysis
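
As a placeholder for this TODO, the sketch below shows one simple way to find enhancers shared between two call sets, for example the GRO-seq-based and ChIP-seq-based calls. It assumes half-open (chrom, start, end) intervals and treats any overlap of at least one base as shared. The paper's actual overlap criterion is not recorded in these notes. The names and toy coordinates are hypothetical, and the quadratic scan is only suitable for small inputs.

```python
Interval = tuple[str, int, int]  # (chromosome, start, end), half-open


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def shared_enhancers(set_a: list[Interval], set_b: list[Interval]) -> list[Interval]:
    """Intervals from set_a that overlap at least one interval in set_b."""
    return [x for x in set_a if any(overlaps(x, y) for y in set_b)]


# Toy call sets, for illustration only.
method1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
method2 = [("chr1", 150, 250), ("chr2", 300, 400)]
print(shared_enhancers(method1, method2))
```

For genome-scale call sets a sorted sweep or an interval tree would be the usual replacement for the nested scan.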

Results

The TFSEE model

Step 1 - Method 1: enhancer calling based on enhancer transcripts defined by GRO-seq

Enhancer calling by GRO-seq

Step 1 - Method 2: enhancer calling based on histone modifications defined by ChIP-seq

Enhancer calling by ChIP-seq

Step 2: Calculating enrichment and activity profiles

Figure 2. Data processing for Total Functional Score of Enhancer Elements (TFSEE) method

Steps 3 to 5: De novo motif searching and TF expression

Calculating the TFSEE score by data integration

Figure 3. Overview of Total Functional Score of Enhancer Elements (TFSEE) method

GRO-seq expression + H3K27ac enrichment + H3K4me1 enrichment = Enhancer Activity

Enhancer Activity x Motif prediction
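
The two relations above can be sketched with small matrices. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the three activity inputs are already normalized to comparable scales, and that matrices are oriented as cell types x enhancers (activity) and enhancers x TFs (motif prediction). It also assumes TF expression enters as an element-wise weight on the product, which is one plausible reading of "Steps 3 to 5" and "data integration". All function names are hypothetical.

```python
Matrix = list[list[float]]


def add_matrices(*matrices: Matrix) -> Matrix:
    """Element-wise sum of matrices with identical shapes."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*matrices)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product: (n x m) times (m x p) gives (n x p)."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def hadamard(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise product of two matrices with identical shapes."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def tfsee_sketch(gro: Matrix, h3k27ac: Matrix, h3k4me1: Matrix,
                 motif: Matrix, tf_expr: Matrix) -> Matrix:
    # Enhancer activity = GRO-seq + H3K27ac + H3K4me1 (cell types x enhancers).
    activity = add_matrices(gro, h3k27ac, h3k4me1)
    # Enhancer activity x motif prediction (cell types x TFs).
    weighted = matmul(activity, motif)
    # Assumed final step: weight each entry by TF expression (cell types x TFs).
    return hadamard(weighted, tf_expr)


# Toy data: 2 cell types, 2 enhancers, 2 TFs.
signal = [[1.0, 0.0], [0.0, 1.0]]   # reused for all three activity inputs
motif = [[1.0, 0.0], [0.0, 1.0]]    # enhancer 1 has TF 1's motif, enhancer 2 has TF 2's
tf_expr = [[2.0, 1.0], [1.0, 2.0]]
scores = tfsee_sketch(signal, signal, signal, motif, tf_expr)
print(scores)
```

In the toy data each cell type has one active enhancer carrying one TF's motif, so only that cell type and TF pair receives a non-zero score.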

Comparison of enhancer calls by methods 1 and 2

Figure 4. Comparison of approaches for genome-wide prediction of enhancers during pancreatic differentiation

Decided to focus on the enhancers identified based on enhancer transcription using GRO-seq data (method 1)

TFSEE identifies lineage-specific enhancers and their cognate TFs during pancreatic differentiation

TFSEE scores determined by using inputs from method 1

Figure 5. TFSEE identifies cell type-specific enhancers and their cognate TFs that drive gene expression during pancreatic differentiation

  1. Heatmap of the 5 stages of pancreatic differentiation

  2. Box plots of normalized TFSEE score for clusters identified in pancreatic differentiation

  3. Cluster TFSEE scores

Figure 6. TFSEE-predicted TFs are enriched in pre- and late pancreatic differentiation

  1. TF expression

  2. Enhancer Transcription

  3. Nearest Neighboring Gene Expression

  4. Cluster 3 Rank Order of Enriched TFs

  5. Cluster 4 Rank Order of Enriched TFs

TODO TFSEE scores determined using inputs from method 2

TODO Comparison of TF identification using inputs from method 1 or method 2

Discussion

Integrating additional genomic data into TFSEE

Conclusions