GPMDB evidence codes

From TheGPMWiki

Revision as of 17:01, 22 October 2014

GPMDB uses an evidence code system to rate the current observation status of individual protein sequences. The same system is used for gene loci, where applicable. The ratings are as follows:

Code   | Level | Evidence | Meaning
black  | 1     | none     | no observation of a protein has reached a set scoring threshold
red    | 2     | poor     | at least one observation of a protein has exceeded the scoring threshold
yellow | 3     | modest   | multiple observations of a protein have a set of common peptides and the distribution of scores for at least one peptide in that set exceeds a minimum test for non-randomness
green  | 4     | good     | the set of common peptides contains at least one peptide with a scoring distribution that exceeds a stricter test for non-randomness

The method for calculating these evidence codes in Perl can be found in the subroutine "gpmdbProteinEvidence" in gpmdb_rest.pl.

The evidence code associated with any protein accession number can be retrieved from the GPMDB REST interface with a URL call. For example, the evidence code for the mouse protein ENSMUSP00000026459 can be obtained at:

http://gpmdb.thegpm.org/1/protein/evidence/acc=ENSMUSP00000026459
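
The same call can be made from a script. The following is a minimal Python sketch, assuming only the URL pattern shown above. The function names evidence_url and fetch_evidence are hypothetical, and because the format of the response body is not described here, the sketch returns it as raw text without parsing it.

```python
from urllib.request import urlopen

# URL prefix taken from the example above.
BASE_URL = "http://gpmdb.thegpm.org/1/protein/evidence"


def evidence_url(accession):
    """Build the REST URL for a protein accession number."""
    return f"{BASE_URL}/acc={accession}"


def fetch_evidence(accession, timeout=10):
    """Return the raw response body for an accession.

    The response format is not documented on this page, so no
    parsing is attempted here.
    """
    with urlopen(evidence_url(accession), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling fetch_evidence("ENSMUSP00000026459") would request the URL given above and requires network access to gpmdb.thegpm.org.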

The current version of the algorithm for determining these evidence codes for a protein (NBS v.2) is summarized below.

  1. Retrieve all of the observed E-values for peptides assigned to the protein of interest;
  2. Generate a frequency histogram for each of the peptides, using bins based on the observations' log(E);
    • all appropriate observations are binned using the values floor(log(E)) = -2, -3, -4, ... -14 (for example, GRAP2);
  3. For any peptide that has not been observed with an E-value < 0.01, report EC = 1;
  4. Calculate the weighted mean, skew and excess kurtosis for each of the resulting histograms;
  5. For each peptide that has been observed ≥ 5 times:
    1. If the skew and excess kurtosis for the peptide are both ≤ 1.5 or the weighted mean ≤ -5.5, report EC = 4;
    2. If either the skew or excess kurtosis for the peptide is ≤ 1.5 or the weighted mean ≤ -3.5, report EC = 3;
  6. For all other peptides, report EC = 2;
  7. Report the highest peptide EC as the protein EC.
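
The steps above can be sketched in Python as follows. This is an illustration, not the Perl code in gpmdbProteinEvidence, and it rests on several assumptions that the summary does not state: log(E) is taken as base 10; observations with floor(log(E)) above -2 are dropped and those below -14 are placed in the -14 bin; skew and excess kurtosis are the standard population moments of the histogram; "observed ≥ 5 times" counts binned observations; and a histogram with zero variance fails the shape test. All function names are hypothetical, and every E-value is assumed to be greater than zero.

```python
import math
from collections import Counter

MIN_BIN, MAX_BIN = -14, -2  # bin range given in step 2


def histogram(e_values):
    """Step 2: bin observations by floor(log10(E)) (base 10 is assumed)."""
    bins = Counter()
    for e in e_values:
        b = math.floor(math.log10(e))
        if b > MAX_BIN:
            continue  # assumption: weaker observations are not binned
        bins[max(b, MIN_BIN)] += 1  # assumption: clamp into the -14 bin
    return bins


def moments(bins):
    """Step 4: weighted mean, skew and excess kurtosis of a histogram."""
    n = sum(bins.values())
    mean = sum(b * c for b, c in bins.items()) / n
    m2, m3, m4 = (
        sum(c * (b - mean) ** k for b, c in bins.items()) / n
        for k in (2, 3, 4)
    )
    if m2 == 0:
        return mean, None, None  # shape is undefined for a single bin
    return mean, m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


def peptide_ec(e_values):
    """Steps 3, 5 and 6: evidence code for one peptide."""
    if not any(e < 0.01 for e in e_values):
        return 1
    bins = histogram(e_values)
    if sum(bins.values()) >= 5:
        mean, skew, kurt = moments(bins)
        # assumption: an undefined skew or kurtosis fails the test
        shape = [s is not None and s <= 1.5 for s in (skew, kurt)]
        if all(shape) or mean <= -5.5:
            return 4
        if any(shape) or mean <= -3.5:
            return 3
    return 2


def protein_ec(peptides):
    """Step 7: the protein EC is the highest peptide EC."""
    return max((peptide_ec(v) for v in peptides.values()), default=1)
```

Here protein_ec takes a dictionary that maps each peptide sequence to the list of E-values observed for it. A protein with no peptides is given EC = 1 in this sketch, which is also an assumption.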