1, How was the FunRich database created?

For protein domains, the SMART1 database was used for the entire human proteome. For Gene Ontology annotations (biological process, cellular component and molecular function), the Gene Ontology database, HPRD2, Entrez Gene3 and UniProt4 were used. For protein-protein interactions, the BioGRID5, IntAct6, Human Proteinpedia7 and HPRD datasets were downloaded and mapped to Entrez Gene or UniProt accession identifiers. The respective datasets were parsed with customized Perl scripts.

Data on sites of expression (cell lines, normal and disease tissues) were collected from the HPRD, UniProt, Human Protein Atlas8, Human Proteome Browser9, Human Proteome Map10, ProteomicsDB11 and Human Proteinpedia7 databases. Protein annotations pertaining to pathways were collected from the Reactome12, NCI13, Cell Map14, HumanCyc15 and NCI13 Nature databases. Transcription factor data were collected from 29 mammalian genome projects16, while for disease terms the clinical synopsis phenotypic terms were downloaded from the OMIM database17. In addition, the semi-quantitative human proteome datasets compiled by Human Proteome Map and ProteomicsDB were collated. Similarly, protein annotations were downloaded from mass spectrometry (PRIDE18, PeptideAtlas19, Peptidome20 and Human Proteinpedia), immunohistochemistry (HPA and Human Proteinpedia), exosome (ExoCarta21, Vesiclepedia22), colorectal cancer (Colorectal Cancer Database), plasma (Plasma Proteome Database23) and post-translational modification (PhosphositePlus24, HPRD, Human Proteinpedia and UniProt) databases and parsed with customized Perl scripts. As no two databases provided download files in the same format or with the same accession identifiers, a Perl script was customized for every single database file.
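
To illustrate why each download needed its own parser, here is a minimal sketch of the idea in Python. It is not the FunRich Perl code. The source names, column names and the `parse_annotations` helper are hypothetical, and the small identifier map is supplied by hand for the example. Each source declares its own delimiter and column layout, and every record is normalized to one common accession (an Entrez Gene identifier in this sketch).

```python
import csv
import io

# Hypothetical per-source layouts: each database file has its own delimiter
# and column names, so each needs its own parser configuration.
SOURCE_FORMATS = {
    "source_a": {"delimiter": "\t", "id_column": "uniprot_acc", "term_column": "annotation"},
    "source_b": {"delimiter": ",", "id_column": "gene_id", "term_column": "term"},
}


def parse_annotations(text, source, id_map=None):
    """Return (common_id, term) pairs parsed from one source file.

    id_map translates the source's accession into the common identifier.
    If id_map is None, the accession is assumed to be the common one already.
    """
    fmt = SOURCE_FORMATS[source]
    reader = csv.DictReader(io.StringIO(text), delimiter=fmt["delimiter"])
    records = []
    for row in reader:
        raw_id = row[fmt["id_column"]].strip()
        common_id = raw_id if id_map is None else id_map.get(raw_id)
        if common_id is None:
            continue  # accession could not be mapped, so the record is skipped
        records.append((common_id, row[fmt["term_column"]].strip()))
    return records


# Illustrative input only: one UniProt-keyed file and one Entrez Gene-keyed file.
uniprot_to_entrez = {"P04637": "7157"}  # TP53
file_a = "uniprot_acc\tannotation\nP04637\tNucleus\nQ00000\tCytoplasm\n"
file_b = "gene_id,term\n7157,Apoptosis\n"

merged = parse_annotations(file_a, "source_a", uniprot_to_entrez)
merged += parse_annotations(file_b, "source_b")
```

After both calls, `merged` holds annotations from the two differently formatted files under a single identifier scheme, and the unmapped accession from the first file is dropped.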


2, What statistical methods are used in FunRich?

FunRich uses the hypergeometric test to compute enrichment p-values, with the Benjamini-Hochberg (BH) and Bonferroni methods to correct for multiple testing.
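
As a rough illustration of these three methods, the sketch below computes a one-sided hypergeometric p-value and applies the Bonferroni and Benjamini-Hochberg (BH) adjustments using only the Python standard library. It shows the textbook definitions, not FunRich's own implementation. The function names and the example numbers are assumptions made for this example.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """Upper-tail p-value P(X >= k) for the hypergeometric distribution.

    N: genes in the background, K: background genes annotated with the term,
    n: genes in the query list, k: query genes annotated with the term.
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """BH-adjusted p-values, returned in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is no larger
    # than the adjusted values of all higher-ranked p-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical example: 3 of 20 query genes carry a term that annotates
# 50 of 10000 background genes.
p = hypergeom_pvalue(k=3, n=20, K=50, N=10000)

raw = [0.01, 0.04, 0.03, 0.005]  # made-up p-values for four terms
bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)
```

Bonferroni is the more conservative of the two corrections, since it controls the family-wise error rate, whereas BH controls the false discovery rate.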