Predicting gene expression in massively parallel reporter assays: A comparative study

Sunduz Keles, Ryan Tewhey, Michael A Beer, Yue Li, Sunyoung Shin, Nir Yosef, Manolis Kellis, Jonathan Goke, Yuchun Guo, David K. Gifford, Nikola S. Mueller, Anat Kreimer, Talal Bin Amin, Gökcen Eraslan, Nicholas A. Sinnott-Armstrong, Pardis C. Sabeti, Rahul Mohan, Rene Welch, Kevin Tian, Matthew D. Edwards, Anshul Kundaje, Michael Wainberg, Haoyang Zeng
In many human diseases, associated genetic changes tend to occur within noncoding regions, whose effect might be related to transcriptional control. A central goal in human genetics is to understand the function of such noncoding regions: given a region that is statistically associated with changes in gene expression (expression quantitative trait locus [eQTL]), does it in fact play a regulatory role? And if so, how is this role “coded” in its sequence? These questions were the subject of the Critical Assessment of Genome Interpretation (CAGI) eQTL challenge. Participants were given a set of sequences that flank eQTLs in humans and were asked to predict whether these are capable of regulating transcription (as evaluated by massively parallel reporter assays), and whether this capability changes between alternative alleles. Here, we report lessons learned from this community effort. By inspecting predictive properties in isolation, and conducting meta-analysis over the competing methods, we find that using chromatin accessibility and transcription factor binding as features in an ensemble of classifiers or regression models leads to the most accurate results. We then characterize the loci that are harder to predict, putting the spotlight on areas of weakness, which we expect to be the subject of future studies.

Deciphering the functionality of the noncoding genome has been the focus of many recent studies, aiming to annotate regulatory regions and understand their specific role in disease and other phenotypes. The goal of the CAGI eQTL challenge was to predict the activity of candidate genomic regions (experimentally evaluated by massively parallel reporter assays). Our meta-analysis of competing submissions highlights features and models that lead to accurate prediction and points to areas for improvement.

Publisher URL: http://onlinelibrary.wiley.com/resolve/doi

DOI: 10.1002/humu.23197
