Technical Overview, peptide word index table

From TheGPMWiki

Revision as of 19:24, 16 February 2009

The peptide_word_index table is used for peptide residue subsequence searches. It is an on-disk index that records, for each four-residue string, the peptides in which that string can be found, so it is very large. A full worked example of the operation of this table can be found at http://www.thegpm.org/GPMDB/subsequence_searches.pdf
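The idea of a four-residue word index can be sketched in a few lines of Python. This is only an in-memory illustration, not GPMDB's implementation: the function names, the dictionary layout, and the sample peptides are all assumptions made for the example, and the real procedure is the one described in the PDF above. The sketch splits each peptide into overlapping four-residue words, records which peptides contain each word, and answers a query by intersecting the peptide sets for the query's words before confirming each candidate with a direct substring check.

```python
from collections import defaultdict

WORD_LEN = 4  # the table indexes four-residue words


def words(sequence):
    """Return every overlapping four-residue word in a sequence."""
    return [sequence[i:i + WORD_LEN]
            for i in range(len(sequence) - WORD_LEN + 1)]


def build_index(peptides):
    """Map each word to the set of peptide ids containing it.

    `peptides` is a hypothetical {pepid: sequence} dictionary.
    """
    index = defaultdict(set)
    for pepid, sequence in peptides.items():
        for word in words(sequence):
            index[word].add(pepid)
    return index


def search(index, peptides, query):
    """Return the sorted ids of peptides containing `query`."""
    query_words = words(query)
    if not query_words:
        raise ValueError("query must be at least four residues long")
    # Peptides that contain every word of the query are candidates.
    candidates = set.intersection(
        *(index.get(word, set()) for word in query_words))
    # Sharing all words does not guarantee they are contiguous,
    # so confirm each candidate against the full sequence.
    return sorted(p for p in candidates if query in peptides[p])


# Made-up sample data for illustration only.
peptides = {123: "ACDEFGHIK", 456: "DEFGHLMNK", 789: "CDEFGAAAK"}
index = build_index(peptides)
hits_short = search(index, peptides, "DEFG")
hits_long = search(index, peptides, "DEFGH")
```

With the sample data, the four-residue query matches all three peptides, while the five-residue query drops peptide 789, which contains DEFG but not EFGH.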

The contents of this table are updated incrementally as part of the update procedure that runs daily.

Columns:

  • keyid: the unique identifier for a given row of data. A single keyid will be associated with a single four-residue word, but a four-residue word can be associated with many keyids.
  • word: the four-residue word. The only letters that will not appear in this field are J and O; U appears for selenocysteine, and all three placeholder residue letters (B, X, and Z) are also stored in some records in GPMDB.
  • pepid_list: a list of peptide identifiers, at most 65,000 characters long, in which each identifier is enclosed between a pair of pipe ("|") characters; e.g. "|123||456||789|".
  • ts_created: the system timestamp for the addition or updating of a row of data. This is used only in troubleshooting; it has no effect on the usage of the table or the search results.
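The pepid_list format described above can be handled with a few small helpers. The following is a minimal sketch, not code from GPMDB: the function names are hypothetical, and the assumption that a word is spread over several rows (and therefore several keyids) because of the 65,000-character limit is an inference from the column descriptions rather than something the source states.

```python
MAX_LIST_CHARS = 65_000  # maximum pepid_list length given in the column description


def format_pepid_list(pepids):
    """Enclose each identifier in pipes, e.g. [123, 456] -> '|123||456|'."""
    return "".join(f"|{pepid}|" for pepid in pepids)


def parse_pepid_list(pepid_list):
    """Recover the integer identifiers from a pepid_list string."""
    return [int(token) for token in pepid_list.split("|") if token]


def contains_pepid(pepid_list, pepid):
    """Test membership by searching for the fully enclosed identifier."""
    return f"|{pepid}|" in pepid_list


def split_into_rows(pepids, max_chars=MAX_LIST_CHARS):
    """Pack identifiers into as few pepid_list strings as the limit allows.

    Assumption: each returned string would be stored as its own row.
    """
    rows, current = [], ""
    for pepid in pepids:
        entry = f"|{pepid}|"
        if current and len(current) + len(entry) > max_chars:
            rows.append(current)
            current = entry
        else:
            current += entry
    if current:
        rows.append(current)
    return rows


example = format_pepid_list([123, 456, 789])
```

Because every identifier has a pipe on both sides, a plain substring test for "|45|" cannot accidentally match inside "|456|", which is one plausible reason for the enclosing format.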