GPMDB evidence codes

From TheGPMWiki

Revision as of 15:37, 5 November 2013

GPMDB uses an evidence code system to rate the current observation status of individual protein sequences. The same system is used for gene loci, where applicable. The ratings are as follows:

Colour   Level (EC)   Meaning
black    1            no observation of the protein has reached a set scoring threshold
red      2            at least one observation of the protein has exceeded the scoring threshold
yellow   3            multiple observations of the protein share a set of common peptides, and the score distribution of at least one peptide in that set passes a minimum test for non-randomness
green    4            the set of common peptides contains at least one peptide whose score distribution passes a stricter test for non-randomness
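Code that consumes GPMDB results can represent these four levels as an ordered enumeration, so that levels compare numerically while keeping their colour names. A minimal Python sketch; the names `EvidenceCode` and `colour` are hypothetical and not part of any GPMDB interface.

```python
from enum import IntEnum


class EvidenceCode(IntEnum):
    """GPMDB evidence code levels, named after the colour used for each.

    Hypothetical helper type; the numeric values follow the table above.
    """
    BLACK = 1   # no observation has reached the scoring threshold
    RED = 2     # at least one observation has exceeded the scoring threshold
    YELLOW = 3  # a common peptide passes a minimum test for non-randomness
    GREEN = 4   # a common peptide passes a stricter test for non-randomness


def colour(level: int) -> str:
    """Return the colour name for a numeric evidence code (raises ValueError if unknown)."""
    return EvidenceCode(level).name.lower()
```

Because `IntEnum` members are integers, `EvidenceCode.GREEN > EvidenceCode.RED` holds, which is convenient when keeping the best code seen across several proteins.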

The Perl code that calculates these evidence codes is in the subroutine "gpmdbProteinModel" in gpmdb_rest.pl.

The current version of the algorithm for a protein is summarized below.

  1. Retrieve the lowest E-value assigned to the protein of interest;
  2. If that E-value is below a threshold (currently log(E) = -15, i.e., E = 10^-15), report EC = 4;
  3. Retrieve all of the observed E-values for peptides assigned to the protein of interest;
  4. If none of the peptides have an observation with an E-value less than 0.01, report EC = 1;
  5. Generate a frequency histogram for each of the peptides, using bins based on the observations' log(E);
    • e.g., each relevant observation is placed in the bin floor(log(E)), with bins -2, -3, -4, ..., -11;
  6. Calculate the skew and excess kurtosis of the resulting histograms;
  7. If the skew and excess kurtosis for any peptide are both less than 1.5, report EC = 4;
  8. If either the skew or excess kurtosis for any peptide is less than 1.5, report EC = 3;
  9. Report EC = 2.
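The nine steps can be sketched in Python as below. This is not the gpmdbProteinModel subroutine; it is a literal reading of the summary under several assumptions: log(E) is taken as log10; observations whose bin lies above -2 are ignored and those below -11 are clamped into the -11 bin; "skew" and "excess kurtosis" are the population (moment-based) statistics of each peptide's binned log(E) values; and peptides with fewer than two binned observations, or with all observations in one bin, are skipped because their shape statistics are undefined. The function names, constants and input layout (a protein E-value plus a mapping from peptide to its observed E-values) are hypothetical.

```python
from __future__ import annotations

import math
from typing import Mapping, Sequence

PROTEIN_LOG_E_THRESHOLD = -15.0  # step 2 threshold on log10(E)
PEPTIDE_E_THRESHOLD = 0.01       # step 4 threshold on E
SHAPE_THRESHOLD = 1.5            # steps 7 and 8
HIGHEST_BIN, LOWEST_BIN = -2, -11  # bin range from the example in step 5


def log_e_bins(e_values: Sequence[float]) -> list[int]:
    """Step 5: place each observation in the bin floor(log10(E)).

    Assumption: bins above -2 are dropped; bins below -11 are clamped to -11.
    """
    bins = []
    for e in e_values:
        b = math.floor(math.log10(e))
        if b > HIGHEST_BIN:
            continue
        bins.append(max(b, LOWEST_BIN))
    return bins


def skew_and_excess_kurtosis(values: Sequence[float]) -> tuple[float, float] | None:
    """Step 6: population skewness m3/m2^1.5 and excess kurtosis m4/m2^2 - 3.

    Assumption: returns None (peptide skipped) for fewer than two values or zero variance.
    """
    n = len(values)
    if n < 2:
        return None
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        return None
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def evidence_code(protein_e: float, peptide_e_values: Mapping[str, Sequence[float]]) -> int:
    """Return EC 1-4 for one protein, following steps 1-9 of the summary."""
    # Steps 1-2: a strong enough best protein E-value alone gives EC = 4.
    if math.log10(protein_e) < PROTEIN_LOG_E_THRESHOLD:
        return 4
    # Steps 3-4: no peptide observation with E < 0.01 gives EC = 1.
    if not any(e < PEPTIDE_E_THRESHOLD for es in peptide_e_values.values() for e in es):
        return 1
    # Steps 5-6: shape statistics of each peptide's log(E) histogram.
    shapes = []
    for es in peptide_e_values.values():
        stats = skew_and_excess_kurtosis(log_e_bins(es))
        if stats is not None:
            shapes.append(stats)
    # Step 7: both statistics below 1.5 for any peptide gives EC = 4.
    if any(sk < SHAPE_THRESHOLD and ku < SHAPE_THRESHOLD for sk, ku in shapes):
        return 4
    # Step 8: either statistic below 1.5 for any peptide gives EC = 3.
    if any(sk < SHAPE_THRESHOLD or ku < SHAPE_THRESHOLD for sk, ku in shapes):
        return 3
    # Step 9: otherwise EC = 2.
    return 2
```

For example, `evidence_code(1e-5, {"PEPTIDEA": [5e-6] * 5 + [5e-4]})` returns 3: the binned values are five at -6 and one at -4, giving a skew of about 1.79 and an excess kurtosis of 1.2, so only the kurtosis falls below 1.5.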