m:Explorer is a generic computational method for identifying process-specific gene regulators from high-throughput genomic data. It applies multinomial logistic regression models to select regulators whose target genes are highly predictive of process-related gene function. Target genes may be defined from heterogeneous data sources, and multiple process sub-classes are allowed. Some of the method’s predictions have been validated experimentally in budding yeast (Saccharomyces cerevisiae), an important model organism. Further details can be found in the publication below.
The method and the experimental validation of yeast gene regulators are the result of a collaboration between the University of Tartu (Department of Computer Science and Department of Molecular and Cell Biology, Estonia) and the EMBL European Bioinformatics Institute (United Kingdom). The primary contributors include Jüri Reimand, Anu Aun, Jaak Vilo, Juan M. Vaquerizas, Juhan Sedman and Nicholas M. Luscombe. Tambet Arak helped set up our homepage.
Jüri Reimand, Anu Aun, Jaak Vilo, Juan M. Vaquerizas, Juhan Sedman, Nicholas M. Luscombe: m:Explorer – multinomial regression models reveal positive and negative regulators of longevity in yeast quiescence. Genome Biol 2012, 13:R55; doi: 10.1186/gb-2012-13-6-r55.
New in version 1.0.0: a new function, prepare_gmt_input, has been added to the package to simplify pathway analyses. It reads a GMT file and produces the input data frame for m:Explorer. See the function documentation for details.
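A GMT file stores one gene set per tab-separated line: the set name, a description (often a URL), then the member genes. As a rough illustration of the kind of conversion involved, the Python sketch below parses GMT lines and builds a gene-by-set 0/1 membership table. This is not prepare_gmt_input itself: the names parse_gmt and membership_table are hypothetical, and the real function's output format may differ, so consult its documentation.

```python
def parse_gmt(lines):
    """Parse GMT lines into {set_name: set_of_genes}."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and sets without any genes
        name, _description, *genes = fields
        gene_sets[name] = {g for g in genes if g}
    return gene_sets


def membership_table(gene_sets):
    """Build a hypothetical {gene: {set_name: 0 or 1}} membership table."""
    genes = sorted(set().union(*gene_sets.values()))
    names = sorted(gene_sets)
    return {g: {n: int(g in gene_sets[n]) for n in names} for g in genes}


if __name__ == "__main__":
    example = ["PATH_A\tdesc\tYAL001C\tYBR002W", "PATH_B\tdesc\tYBR002W"]
    print(membership_table(parse_gmt(example)))
```

A gene-by-set table is only one plausible shape; a process profile P for a single pathway would correspond to one column of it.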
- Let P be a process profile and T_1..T_j be regulator profiles of TFs. P assigns genes to process-specific classes, while each T_i contains regulatory data linking a TF to its target genes.
- Fit an intercept-only multinomial generalised linear regression model M_0 = 'P ~ 1' to represent a uniform, TF-independent distribution of process-specific genes in P (the null model).
- Fit a single-predictor model M_i = 'P ~ T_i' for a TF to represent the TF-dependent distribution of the genes in P (the alternative model).
- To assess the significance of TF profile T_i in explaining P, compare the null model M_0 and the alternative model M_i using a likelihood ratio test: the difference in deviance between the two models is compared to a chi-square distribution whose degrees of freedom equal the difference in the number of model parameters.
- Repeat the model fitting and comparison for every TF profile T_1..T_j, then correct the resulting p-values for multiple testing.
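The steps above can be sketched in Python for the special case where each T_i is a single categorical profile (labels such as ‘.’, ‘up’ or ‘b’). In that case the maximum-likelihood fits have closed forms: M_0 reproduces the overall class frequencies of P and M_i reproduces the class frequencies within each TF label, so the deviance difference equals the G statistic of a contingency-table test with (K-1)(L-1) degrees of freedom for K process classes and L TF labels. This is a sketch, not the m:Explorer R implementation. It assumes profiles are dicts mapping gene IDs to labels, skips genes missing from a TF profile, and uses Benjamini-Hochberg correction as an assumed choice, since the text does not name the multiple-testing method; all function names are hypothetical.

```python
import math
from collections import Counter


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: Q(m, y) = exp(-y) * sum_{i=0}^{m-1} y^i / i!
        total = sum(math.exp(-y + i * math.log(y) - math.lgamma(i + 1))
                    for i in range(df // 2))
    else:
        # Odd df: Q(m + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{i=1}^{m} y^(i-1/2) / Gamma(i+1/2)
        total = math.erfc(math.sqrt(y))
        total += sum(math.exp(-y + (i - 0.5) * math.log(y) - math.lgamma(i + 0.5))
                     for i in range(1, (df - 1) // 2 + 1))
    return min(1.0, total)


def lrt_pvalue(process: dict, tf_profile: dict) -> float:
    """Likelihood ratio test of M_0 = 'P ~ 1' against M_i = 'P ~ T_i' for categorical T_i."""
    genes = [g for g in process if g in tf_profile]
    n = len(genes)
    n_c = Counter(process[g] for g in genes)      # genes per process class
    n_t = Counter(tf_profile[g] for g in genes)   # genes per TF label
    n_tc = Counter((tf_profile[g], process[g]) for g in genes)
    # Deviance difference 2 * (logL(M_i) - logL(M_0)) using the closed-form MLEs.
    deviance = 2.0 * sum(k * math.log(k * n / (n_t[t] * n_c[c]))
                         for (t, c), k in n_tc.items())
    df = (len(n_c) - 1) * (len(n_t) - 1)
    return chi2_sf(deviance, df) if df > 0 else 1.0


def benjamini_hochberg(pvals: dict) -> dict:
    """Benjamini-Hochberg step-up adjustment of {name: p-value}."""
    ranked = sorted(pvals.items(), key=lambda kv: kv[1])
    m = len(ranked)
    adjusted, running_min = {}, 1.0
    for rank in range(m, 0, -1):
        name, p = ranked[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[name] = running_min
    return adjusted


def screen_regulators(process: dict, tf_profiles: dict) -> dict:
    """Test every TF profile against P, then correct for multiple testing."""
    raw = {tf: lrt_pvalue(process, profile) for tf, profile in tf_profiles.items()}
    return benjamini_hochberg(raw)


if __name__ == "__main__":
    # Toy data: 8 genes, two process classes, two hypothetical TF profiles.
    genes = [f"g{i}" for i in range(1, 9)]
    process = {g: ("cell_cycle" if i < 4 else "other") for i, g in enumerate(genes)}
    tf_profiles = {
        "TF1": {g: ("b" if i < 4 else ".") for i, g in enumerate(genes)},  # tracks P
        "TF2": {g: ("b" if i % 2 else ".") for i, g in enumerate(genes)},  # independent of P
    }
    print(screen_regulators(process, tf_profiles))
```

With several predictors or continuous covariates the closed form no longer applies and the models must be fitted numerically.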
Yeast TF Dataset
The yeast dataset used here includes genome-wide targets for 285 yeast TFs, based on three types of evidence: (i) differentially expressed genes from TF perturbation experiments on microarrays, (ii) TF binding sites in gene promoters from ChIP-chip and protein-binding microarray (PBM) experiments and computational predictions, and (iii) nucleosome positioning measurements at TFBS loci. All measurements are discretised using cut-offs of statistical significance and grouped into categories such as ‘upregulated’, ‘nucleosome-depleted binding site’, or ‘no significant signal’. Two versions of the nucleosome positioning data are available: measured in rich medium (YPD) and in non-optimal medium (ethanol). We also provide subsets of these data from which some sources of evidence have been excluded.
All files are zipped, TAB-delimited text files. TF-gene associations use the following notation: ‘.’ not significant; ‘up’ upregulated in the TF deletion strain (del-TF); ‘down’ downregulated in del-TF; ‘b’ binding site in promoter; ‘b ypd’ or ‘b eth’ nucleosome-depleted binding site in promoter (YPD or ethanol nucleosome data). Combinations of regulatory features also occur, e.g., ‘down b eth’. Exceptional systematic gene IDs such as ‘YIL082W-A’ need to have the dash replaced with a period (‘YIL082W.A’).
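To make the notation concrete, here is a small Python sketch that builds labels such as ‘down b eth’ from per-pair measurements, parses them back into features, and applies the dash-to-period gene ID rewrite. The significance threshold alpha=0.05, the sign convention for log_fc (deletion versus wild type), and all function names are hypothetical; the cut-offs actually used for the compendium are not given here.

```python
def normalise_gene_id(gene_id: str) -> str:
    """Replace dashes with periods, e.g. 'YIL082W-A' -> 'YIL082W.A'."""
    return gene_id.replace("-", ".")


def make_label(log_fc, p_value, has_site, depleted=False, medium="ypd", alpha=0.05):
    """Build a label such as 'down b eth' from hypothetical per-pair measurements."""
    tokens = []
    if p_value < alpha and log_fc != 0:
        # Direction of expression change in the TF deletion strain (del-TF).
        tokens.append("up" if log_fc > 0 else "down")
    if has_site:
        tokens.append("b")
        if depleted:
            tokens.append(medium)  # 'ypd' or 'eth' nucleosome data
    return " ".join(tokens) if tokens else "."


def parse_label(label: str) -> dict:
    """Split a label back into its regulatory features."""
    features = {"expression": None, "binding": False, "depleted_in": None}
    for token in label.split():
        if token == ".":
            continue  # no significant signal
        if token in ("up", "down"):
            features["expression"] = token
        elif token == "b":
            features["binding"] = True
        elif token in ("ypd", "eth"):
            features["depleted_in"] = token
        else:
            raise ValueError(f"unknown token {token!r} in label {label!r}")
    return features


if __name__ == "__main__":
    label = make_label(-1.2, 0.001, has_site=True, depleted=True, medium="eth")
    print(label, parse_label(label), normalise_gene_id("YIL082W-A"))
```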
Full target dataset for all 285 TFs, including TF perturbation targets, binding sites and YPD nucleosome positioning data.
Full target dataset for all 285 TFs, including TF perturbation targets, binding sites and ethanol nucleosome positioning data.
Partial target dataset for all 285 TFs that only includes binding sites of two categories (nucleosome-depleted and non-depleted).
Partial target dataset for all 285 TFs that only includes binding sites, with no additional information about nucleosome positioning.
Partial target dataset for all 285 TFs that only includes upregulated and downregulated target genes from perturbation microarrays.
Small test dataset that includes full data for 15 cell cycle-related TFs (Mcm1, Mbp1, Swi4, Swi6, Fkh1, Fkh2, Swi5, Ace2, Ndd1, Stb1, Isw2, Ste12, Hms1, Hcm1, Yox1).
- Dataset details:
The TF target compendium for the budding yeast Saccharomyces cerevisiae was compiled from perturbation microarray data, TF-DNA binding profiles and nucleosome positioning measurements. Statistically significant target genes from TF deletion experiments were retrieved from our recent reanalysis of a previously published dataset. High-confidence TF binding site (TFBS) profiles were assembled from earlier chromatin immunoprecipitation and in silico [5,6] analyses, as well as more recent refinements with protein-binding microarrays. TFBS profiles were further combined with in vivo nucleosome positioning measurements to distinguish binding sites where lower nucleosome occupancy creates an open chromatin structure. Such sites have higher regulatory potential because they are accessible to DNA-binding transcription factors.
These three complementary types of regulatory features were compiled into a compendium of 285 regulator profiles that characterize the genome-wide binding preferences and perturbation signatures of these regulators. In total, the profiles contain 128,656 gene-TF pairs with statistically significant evidence. The compendium includes 107 profiles with knockout data only, 16 profiles with TFBS data only and 162 profiles with both types of evidence. In addition to 170 confirmed or putative TFs, we included data for cofactors, chromatin modifiers and other regulatory proteins. Consistent with previous observations, the agreement between TF perturbation and DNA-binding targets is sparse: only 1.5% of all regulator-gene associations are differentially expressed targets that also carry a TFBS. In all cases, we used explicit statistical cut-offs to separate regulatory features from insignificant noise. As a result, our regulator dataset is sparse and contains statistically meaningful regulatory signals for 7.2% of approximately 1.8 million TF-gene pairs.
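Statistics like those above (density of significant pairs, overlap of expression and binding evidence, evidence types per profile) can be computed directly from a compendium in the label notation. The Python sketch below is a minimal illustration, assuming a hypothetical in-memory layout {regulator: {gene: label}} and a known number of genes per profile. The denominator for the expression-and-TFBS fraction (significant pairs) is our assumption, and evidence types are inferred from the observed significant labels, which only approximates counts based on which experiments exist for each regulator.

```python
def compendium_summary(profiles: dict, n_genes: int) -> dict:
    """Summarise a compendium given as {regulator: {gene: label}}.

    Genes missing from a profile, or labelled '.', count as 'no significant signal'.
    """
    significant = both_pairs = 0
    evidence = {"knockout_only": 0, "tfbs_only": 0, "both": 0}
    for labels in profiles.values():
        has_ko = has_tfbs = False
        for label in labels.values():
            tokens = set(label.split())
            ko = bool(tokens & {"up", "down"})  # perturbation (knockout) evidence
            tfbs = "b" in tokens                 # binding-site evidence
            significant += ko or tfbs
            both_pairs += ko and tfbs
            has_ko |= ko
            has_tfbs |= tfbs
        if has_ko and has_tfbs:
            evidence["both"] += 1
        elif has_ko:
            evidence["knockout_only"] += 1
        elif has_tfbs:
            evidence["tfbs_only"] += 1
    total_pairs = len(profiles) * n_genes
    return {
        "significant_pairs": significant,
        "density": significant / total_pairs if total_pairs else 0.0,
        # Assumed denominator: significant pairs rather than all TF-gene pairs.
        "expression_and_tfbs_fraction": both_pairs / significant if significant else 0.0,
        **evidence,
    }


if __name__ == "__main__":
    toy = {"A": {"g1": "up", "g2": "b ypd"}, "B": {"g1": "down b"}, "C": {"g3": "b"}}
    print(compendium_summary(toy, n_genes=4))
```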
- References to data sources:
[1] J. Reimand, A. Aun, J. Vilo, J. M. Vaquerizas, J. Sedman, and N. M. Luscombe, “m:Explorer – multinomial regression models reveal positive and negative regulators of longevity in yeast quiescence,” Genome Biol., 2012.
[2] J. Reimand, J. M. Vaquerizas, A. E. Todd, J. Vilo, and N. M. Luscombe, “Comprehensive reanalysis of transcription factor knockout expression data in Saccharomyces cerevisiae reveals many new targets,” Nucleic Acids Res., 2010.
[3] Z. Hu, P. J. Killion, and V. R. Iyer, “Genetic reconstruction of a functional transcriptional regulatory network,” Nat. Genet., 2007.
[4] C. T. Harbison, D. B. Gordon, T. I. Lee, N. J. Rinaldi, K. D. Macisaac, T. W. Danford, N. M. Hannett, J. B. Tagne, D. B. Reynolds, J. Yoo, E. G. Jennings, J. Zeitlinger, D. K. Pokholok, M. Kellis, P. A. Rolfe, K. T. Takusagawa, E. S. Lander, D. K. Gifford, E. Fraenkel, and R. A. Young, “Transcriptional regulatory code of a eukaryotic genome,” Nature, 2004.
[5] K. D. MacIsaac, T. Wang, D. B. Gordon, D. K. Gifford, G. D. Stormo, and E. Fraenkel, “An improved map of conserved regulatory sites for Saccharomyces cerevisiae,” BMC Bioinformatics, 2006.
[6] I. Erb and E. van Nimwegen, “Statistical features of yeast’s transcriptional regulatory code,” IEEE Proceedings ICCSB, 2006.
[7] C. Zhu, K. J. Byers, R. P. McCord, Z. Shi, M. F. Berger, D. E. Newburger, K. Saulrieta, Z. Smith, M. V. Shah, M. Radhakrishnan, A. A. Philippakis, Y. Hu, F. De Masi, M. Pacek, A. Rolfs, T. Murthy, J. Labaer, and M. L. Bulyk, “High-resolution DNA-binding specificity analysis of yeast transcription factors,” Genome Res., 2009.
[8] N. Kaplan, I. K. Moore, Y. Fondufe-Mittendorf, A. J. Gossett, D. Tillo, Y. Field, E. M. LeProust, T. R. Hughes, J. D. Lieb, J. Widom, and E. Segal, “The DNA-encoded nucleosome organization of a eukaryotic genome,” Nature, 2009.
Jüri Reimand PhD
University of Toronto