rfPred: A new meta-score for functional prediction of missense variants in the human exome

Fabienne Jabot-Hanin1*, Hugo Varet1,2, Frédéric Tores3 and Jean-Philippe Jais1,2,3,4

1Université Paris Descartes, 2Hôpital Necker-Enfants Malades, 3Institut Imagine, 4INSERM UMR872
*contact: fabienne.jabothanin (at)

Exome sequencing is becoming a standard tool for gene mapping of monogenic diseases. Given the vast amount of data generated by Next Generation Sequencing techniques, identifying disease-causing variants is like finding a needle in a haystack. Assessing the impact of potentially pathogenic variants and prioritizing them is expected to reduce the work required for biological validation.

rfPred is based on five previously described algorithms (SIFT, Polyphen2, LRT, PhyloP and MutationTaster) whose scores are compiled in the dbNSFP database. A functional meta-score is derived from a random forest trained on a dataset of 61,500 non-synonymous SNPs. On two independent validation datasets, the random forest appears to perform better overall than each of the algorithms taken separately or combined in a logistic regression model. rfPred scores have been pre-calculated for more than 80 million positions of the human exome.
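To make the idea of a random-forest meta-score concrete, here is a minimal Python sketch. It is not the rfPred model: it uses bagged one-split trees (decision stumps), each on one randomly chosen feature, as a much-simplified stand-in for a real random forest. The data are synthetic, and the sketch assumes all five component scores have been rescaled to [0, 1] so that higher means more damaging. All function names are hypothetical.

```python
import random

# The five component scores named in the text; all values below are synthetic.
FEATURES = ["SIFT", "Polyphen2", "LRT", "PhyloP", "MutationTaster"]


def make_toy_data(n, rng):
    """Synthetic variants: deleterious ones score higher on average (assumption)."""
    rows, labels = [], []
    for _ in range(n):
        label = int(rng.random() < 0.5)
        center = 0.7 if label else 0.3
        rows.append([min(1.0, max(0.0, rng.gauss(center, 0.2))) for _ in FEATURES])
        labels.append(label)
    return rows, labels


def fit_stump(rows, labels, rng):
    """Fit a one-split tree on one randomly chosen feature."""
    feature = rng.randrange(len(FEATURES))
    best_threshold, best_errors = 0.5, len(rows) + 1
    for threshold in [i / 20 for i in range(1, 20)]:
        # Count training errors when predicting 'deleterious' above the threshold.
        errors = sum(int(row[feature] > threshold) != y for row, y in zip(rows, labels))
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return feature, best_threshold


def fit_forest(rows, labels, n_trees=101, seed=0):
    """Train each stump on a bootstrap resample of the training set (bagging)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest


def meta_score(forest, row):
    """Fraction of trees voting 'deleterious': a value in [0, 1]."""
    votes = sum(row[feature] > threshold for feature, threshold in forest)
    return votes / len(forest)


if __name__ == "__main__":
    rng = random.Random(42)
    rows, labels = make_toy_data(500, rng)
    forest = fit_forest(rows, labels)
    print(meta_score(forest, [0.9, 0.8, 0.85, 0.7, 0.95]))
```

The point of the sketch is only the shape of the computation: five per-variant scores go in, and the share of trees voting "deleterious" comes out as a single score between 0 and 1.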

Figure: rfPred ROC curve
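One common way to summarize a ROC curve as a single number is the area under the curve (AUC), which equals the probability that a randomly chosen deleterious variant receives a higher score than a randomly chosen neutral one. The Python sketch below computes it by direct pairwise comparison. The page does not state which summary statistic the authors used, so this is an illustration only, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    labels: 1 for deleterious, 0 for neutral.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of variants, which is fine for small validation sets; a rank-based formulation would be preferable for large ones.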

Using rfPred:

  • Download rfPred from Bioconductor, or here as .tar.gz or .zip;
  • Use the rfPred R package according to the rfPred vignette;
  • Optional: download the exome variant database (Tabix file and index, about 3.3 GB) to your computer to run your queries locally (faster than sending them over the Internet);
  • Optional: download the rfPred random forest model to compute rfPred scores from your own SIFT, Polyphen2, LRT, PhyloP and MutationTaster scores.

Creative Commons License
rfPred by F JABOT-HANIN, H VARET & JP JAIS is licensed under a Creative Commons Attribution 4.0 International License.