Technical Overview, peptide word index table

From TheGPMWiki
Revision as of 19:24, 16 February 2009 by WikiSysop (Talk | contribs)

The peptide_word_index table is used for peptide residue subsequence searches. It is an on-disk index that records, for each four-residue string, the peptides in which that string can be found, so it is very large. A full worked example of the operation of this table can be found at http://www.thegpm.org/GPMDB/subsequence_searches.pdf
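
As a rough illustration of what indexing by four-residue strings involves, the sketch below lists the four-residue words contained in a sequence. The function name four_residue_words is hypothetical, and the use of overlapping windows is an assumption made for illustration; the worked example linked above describes the procedure GPMDB actually follows.

```python
WORD_LENGTH = 4  # the table indexes four-residue words


def four_residue_words(sequence):
    """Return every overlapping four-residue word in a sequence, in order.

    Overlapping windows are an assumption of this sketch, not a
    documented property of the GPMDB index.
    """
    seq = sequence.upper()
    # A sequence shorter than four residues yields an empty list.
    return [seq[i:i + WORD_LENGTH] for i in range(len(seq) - WORD_LENGTH + 1)]


if __name__ == "__main__":
    print(four_residue_words("PEPTIDEK"))
```

Under this assumption, a sequence of n residues contributes n - 3 words, which helps explain why the table is so large.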

The contents of this table are updated incrementally as part of the daily update procedure.

Columns:

  • keyid: the unique identifier for a given row of data. Each keyid is associated with exactly one four-residue word, but a single four-residue word can be associated with many keyids.
  • word: the four-residue word. The only letters that will not appear in this field are J and O. U appears for selenocysteine, and all three placeholder residue letters (B, X and Z) also appear, because they are stored in some records in GPMDB.
  • pepid_list: a list of peptide identifiers, at most 65,000 characters long, in which each identifier is enclosed in a pair of pipe ("|") characters; e.g. "|123||456||789|".
  • ts_created: the system timestamp for the addition or updating of a row of data. This is used only in troubleshooting; it has no effect on the usage of the table or the search results.
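
The column layout above can be read with a few lines of code. The following is a minimal sketch, not GPMDB's own implementation: the function names are hypothetical, rows are assumed to arrive as (keyid, word, pepid_list) tuples, and peptide identifiers are assumed to be integers, as in the example "|123||456||789|".

```python
def parse_pepid_list(pepid_list):
    """Split a string such as "|123||456||789|" into integer identifiers.

    Splitting on "|" leaves empty strings between adjacent pipes;
    those are skipped. Integer identifiers are an assumption.
    """
    return [int(token) for token in pepid_list.split("|") if token]


def format_pepid_list(pepids):
    """Enclose each identifier in its own pair of pipes."""
    return "".join(f"|{pepid}|" for pepid in pepids)


def peptides_for_word(rows, word):
    """Collect the identifiers for one word across all of its rows.

    rows is an iterable of (keyid, word, pepid_list) tuples (assumed
    shape). A word may occupy several keyids, so their lists are merged.
    """
    found = set()
    for _keyid, row_word, pepid_list in rows:
        if row_word == word:
            found.update(parse_pepid_list(pepid_list))
    return found


if __name__ == "__main__":
    example_rows = [
        (1, "PEPT", "|123||456|"),
        (2, "PEPT", "|789|"),
        (3, "EPTI", "|456|"),
    ]
    print(sorted(peptides_for_word(example_rows, "PEPT")))
```

One practical effect of enclosing every identifier in its own pipes is that a plain substring test for "|123|" cannot accidentally match a longer identifier such as 1234.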