VRMOD CRM Track Hub Description

Genomic DNA encodes regulatory information that determines where, when, and to what extent genes are expressed. In theory, these transcriptional “instructions” should be identifiable from genomic DNA sequence alone, yet this capability has remained elusive. Here we present the Vertebrate Regulatory MOdule Detector (VRMOD), a method that accurately predicts gene expression regulatory sequences using only the query genomic sequence. We applied VRMOD to 309 genomes from the Ensembl database without adjusting any parameters, generating a compendium of high-resolution, genome-position-fixed cis-regulatory modules (CRMs). We performed extensive computational evaluation and experimental validation of VRMOD predictions. Notably, VRMOD predicted three sub-enhancers within the human enhancer hs52 at the FTO locus from the VISTA database, one of which was missed by existing methods. Using a chicken embryo system combined with 3D tissue imaging, we showed that each sub-enhancer exhibits restricted spatiotemporal activity within specific subsets of the tissues where the full enhancer is active. We further demonstrated the utility of VRMOD in identifying evolutionarily non-conserved enhancers, annotating regulatory sequences in non-model organisms, and identifying candidate disease-causal variants associated with Alzheimer’s disease. Collectively, our work provides a universal coordinate reference system for regulatory sequences across 309 genomes, analogous to annotated gene models for protein-coding sequences. VRMOD thus enables genome-wide annotation of non-coding regulatory elements in any vertebrate species using genomic sequence information alone. The predicted cis-regulatory modules of the 309 genomes represent a significant resource for the research community.

Methods

Credits

Data were generated and processed at Washington University School of Medicine, St. Louis, MO. For inquiries, please contact us at the following address: gzhao (at) wustl.edu

References

Gonçalves T.M., Stewart C.L., Baxley S.D., Xu J., George B., Li D., Yang C., Gabel H.W., Piao X., Cruchaga C., Li Y.E., Wang T., Avraham O., Zhao G., Unlocking cis-regulatory landscapes across 500 million years of evolution and disease mechanisms. (In Revision)

Data Access

VRMOD CRM mm10 Big Bed File Download: https://genome.ucsc.edu/hubspace/41/gzhao/VRMOD_mm10/Mouse_VRMOD_mm10.bigBed

The data are stored in the binary bigBed format. The bigBedToBed tool accepts either a local file or the URL above as input and converts it to plain-text BED.
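The conversion described above can be scripted. The following is a minimal Python sketch, not an official part of this hub: it assumes the UCSC bigBedToBed utility is installed and on the PATH, and it only assembles the command line and parses the resulting BED text. The helper names (build_command, parse_bed_line), the output file name, and the example chromosome chr19 are hypothetical choices for illustration. Only the first three BED columns (chromosome, start, end) are interpreted, because the layout of any further columns in this file is not documented here.

```python
# Sketch under stated assumptions: bigBedToBed (UCSC tools) is installed separately.
# Helper names and the example region are illustrative, not part of the hub.

HUB_URL = (
    "https://genome.ucsc.edu/hubspace/41/gzhao/VRMOD_mm10/"
    "Mouse_VRMOD_mm10.bigBed"
)


def build_command(source, out_path, chrom=None, start=None, end=None):
    """Assemble a bigBedToBed argument list.

    source   -- local bigBed path or URL
    out_path -- name of the BED text file to write
    chrom, start, end -- optional region restriction, passed through the
                         tool's -chrom=, -start= and -end= options
    """
    cmd = ["bigBedToBed"]
    if chrom is not None:
        cmd.append(f"-chrom={chrom}")
    if start is not None:
        cmd.append(f"-start={start}")
    if end is not None:
        cmd.append(f"-end={end}")
    cmd += [source, out_path]
    return cmd


def parse_bed_line(line):
    """Split one tab-separated BED line into its three required fields.

    Any additional columns are returned untouched in 'extra'.
    """
    fields = line.rstrip("\n").split("\t")
    return {
        "chrom": fields[0],
        "start": int(fields[1]),  # BED starts are 0-based
        "end": int(fields[2]),    # BED ends are exclusive
        "extra": fields[3:],
    }


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch has no side effects.
    print(" ".join(build_command(HUB_URL, "vrmod_mm10_chr19.bed", chrom="chr19")))
```

To run the conversion, the list returned by build_command can be passed to subprocess.run(cmd, check=True). Each line of the resulting file can then be read with parse_bed_line.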

If you have an experimentally defined CRM that you would like to include in the ExpCRM collection, please provide its genomic location (hg38), the conditions under which the CRM is active (e.g., cell type, tissue, or developmental stage), and the corresponding reference.