False discovery rate

From TheGPMWiki

Revision as of 18:54, 5 November 2011

False Positive Rate

The False Positive Rate (FPR) is a conventional statistical measure of the likelihood that chance results are present in a given set of results. If the expectation (e) that a particular result is stochastic can be determined, then the FPR (in percent) for N results is simply FPR = 100×(∑e)/N. The FPR does not require the use of thresholds or additional Monte Carlo calculations, as it is a property of any set of results that have valid E-values (or p-values) assigned.
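As an illustration only, the formula FPR = 100×(∑e)/N can be sketched in a few lines of Python. The function name and the example expectation values are hypothetical and are not taken from any search engine output; the sketch assumes each result already has a valid expectation value assigned.

```python
def false_positive_rate(e_values):
    """Return the FPR in percent: 100 * (sum of e) / N.

    e_values: one expectation value per result, i.e. the expectation
    that the result is stochastic (assumed to be already computed).
    """
    n = len(e_values)
    if n == 0:
        raise ValueError("at least one result is required")
    return 100.0 * sum(e_values) / n


# Hypothetical expectation values for four results (illustrative only).
example_e_values = [0.01, 0.02, 0.05, 0.001]
example_fpr = false_positive_rate(example_e_values)
print(f"FPR = {example_fpr:.3f}%")
```

No threshold appears anywhere in the calculation: the value depends only on the expectation values of the results in the set.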

False Discovery Rate

The peptide False Discovery Rate (FDR) is a calculated number frequently used in the proteomics literature. This number is generated by performing a spectrum-to-peptide sequence assignment process using peptide sequences that are very unlikely to be correct assignments and a very large number of spectra (DECOY). The resulting "scores" obtained from this process are compared with those obtained from an assignment process in which the true peptide sequences are available (TRUE). The two distributions are compared by first setting an arbitrary "threshold" score for the TRUE values and taking as a hypothesis that all assignments with a score greater than this threshold are correct (T assignments). The same threshold is then applied to the DECOY distribution, and all assignments above it are assumed to be false (D assignments). The FDR (in percent) is calculated as FDR = 100×D/T. Many variations of this simple method are in use, and there is currently no single accepted way to calculate this number.
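The simple threshold method described above can be sketched as follows. This is a minimal illustration of FDR = 100×D/T only, not any of the variants in use; the function name, the scores and the threshold are all hypothetical, and higher scores are assumed to indicate better assignments.

```python
def decoy_fdr(true_scores, decoy_scores, threshold):
    """Return the FDR in percent: 100 * D / T.

    T: number of TRUE assignments scoring above the threshold.
    D: number of DECOY assignments scoring above the same threshold.
    """
    t = sum(1 for score in true_scores if score > threshold)
    d = sum(1 for score in decoy_scores if score > threshold)
    if t == 0:
        raise ValueError("no TRUE assignments above the threshold")
    return 100.0 * d / t


# Hypothetical scores (illustrative only).
example_true = [5.1, 4.2, 3.9, 3.5, 2.8, 2.1, 1.7, 1.2]
example_decoy = [3.6, 2.2, 1.9, 1.5, 1.1, 0.9, 0.8, 0.4]
example_fdr = decoy_fdr(example_true, example_decoy, threshold=3.0)
print(f"FDR = {example_fdr:.1f}%")  # T = 4, D = 1, so 25.0%
```

Changing the threshold changes the result, which reflects the arbitrary choice of threshold noted in the description of the method.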

The FDR calculated in this way has no direct interpretation in terms of statistical theory and should be treated as an unreliable estimate of the number of false positive peptide sequence assignments in a large data set. It is prone to significant errors in many of the special cases that are commonly found in proteomics, and it has no implications regarding the validity of any particular result: it is a property of the full set of results only. It should only be used when the search engine SEQUEST is used and for some reason it is impossible to use proper statistical tests, such as those made available by PeptideProphet. If statistical tests have been done, the False Positive Rate should always be used instead.

There is no such thing as a protein False Discovery Rate when using peptide identification to assign protein sequences (bottom-up proteomics). It can be applied to top-down proteomics results, but only if appropriate statistical methods are unobtainable for some reason.
