Technical Overview, dblist keyword.pl

From TheGPMWiki
Revision as of 00:23, 18 February 2009 by WikiSysop

This script searches for protein descriptions that contain a specific word or phrase. The database searched is EnspMapDB, which is installed alongside GPMDB in a standard installation.

There are two required arguments: keyword, which can be one or more words of at least four letters in length; and db, which is chosen from the dropdown list. By default, "homo sapiens - ensembl" is chosen for searching.
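As a rough illustration of the two required arguments, the following Python sketch checks the four-letter rule and builds a URL query string. The function names are hypothetical, and it is an assumption that db is submitted as the dropdown label; the real form may submit a different value. How the script itself treats shorter words is not stated here, so the sketch simply reports them.

```python
from urllib.parse import urlencode

MIN_WORD_LENGTH = 4                      # from the text: at least four letters
DEFAULT_DB = "homo sapiens - ensembl"    # default dropdown choice named in the text


def short_words(keyword):
    """Return the words in `keyword` that have fewer than four letters."""
    return [w for w in keyword.split() if len(w) < MIN_WORD_LENGTH]


def required_query(keyword, db=DEFAULT_DB):
    """Build a URL query string from the two required arguments.

    Assumption: db is sent as the dropdown label shown in the text.
    """
    too_short = short_words(keyword)
    if too_short:
        # Hypothetical behaviour: refuse rather than guess what the script does.
        raise ValueError(f"words shorter than {MIN_WORD_LENGTH} letters: {too_short}")
    return urlencode({"keyword": keyword, "db": db})


if __name__ == "__main__":
    print(required_query("kinase"))
```

The resulting string would be appended to the script's URL, which depends on the local installation.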

Optional arguments are proex, page, db_index, offset and sort.

  • proex: used in constructing links to protein-level information.
  • page: the number of results to display per page, with an internal minimum of five.
  • db_index: the numeric version of the database to search.
  • offset: the page number of results to display; it defaults to zero, which displays the first page.
  • sort: the column used to sort the results. A value of "totals" sorts by the descending number of identifications; "expects" sorts by ascending expect value; "scores" sorts by the descending score of the keyword search, as calculated by the storage engine. The default value is "expects".
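The paging and sorting defaults can be sketched in Python as follows. This is not the script's own code. The function name is hypothetical, and several behaviours are assumptions: that a page value below five is raised to five, that an unrecognised sort value falls back to "expects", that the default page size is the 20 mentioned under Display Elements, and that the first row shown is offset multiplied by the page size.

```python
VALID_SORTS = ("totals", "expects", "scores")
DEFAULT_SORT = "expects"    # default stated in the text
MIN_PAGE_SIZE = 5           # internal minimum stated in the text
DEFAULT_PAGE_SIZE = 20      # assumption, taken from the display description


def normalize_options(page=None, offset=None, sort=None):
    """Apply defaults and limits to the page, offset and sort arguments."""
    # Assumption: values below the minimum are raised to it.
    page_size = DEFAULT_PAGE_SIZE if page is None else max(int(page), MIN_PAGE_SIZE)
    # offset is a page number; zero means the first page.
    page_number = 0 if offset is None else max(int(offset), 0)
    # Assumption: unknown sort values fall back to the default.
    sort_key = sort if sort in VALID_SORTS else DEFAULT_SORT
    return {
        "page": page_size,
        "offset": page_number,
        "sort": sort_key,
        "first_row": page_number * page_size,  # zero-based index of first result
    }


if __name__ == "__main__":
    print(normalize_options(page=3, offset=2, sort="totals"))
```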

The keyword search is a fulltext index search, so the results can be modified by including some special characters. By default, multiple keywords generate a result set that contains either keyword, but results containing both keywords are ranked more highly. A word can be given higher importance by prefixing it with a greater-than sign (>), or lower importance by prefixing it with a less-than sign (<). A word can be excluded by prefixing it with a minus sign (-). An exact phrase can be searched for by enclosing it in double quotes (").
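To make the operators concrete, here is a small Python sketch that assembles a keyword string from the four modifiers described above. The helper and its parameter names are hypothetical; only the prefix characters and the quoting come from the text.

```python
def build_keyword(plain=(), boost=(), demote=(), exclude=(), phrases=()):
    """Assemble a keyword string using the fulltext modifiers.

    plain   : words searched with default weighting
    boost   : words given higher importance (prefix >)
    demote  : words given lower importance (prefix <)
    exclude : words that must not appear (prefix -)
    phrases : exact phrases (wrapped in double quotes)
    """
    parts = list(plain)
    parts += [">" + word for word in boost]
    parts += ["<" + word for word in demote]
    parts += ["-" + word for word in exclude]
    parts += ['"' + phrase + '"' for phrase in phrases]
    return " ".join(parts)


if __name__ == "__main__":
    print(build_keyword(plain=["kinase"], boost=["tyrosine"],
                        exclude=["receptor"], phrases=["heat shock"]))
```

The string produced this way would be passed as the keyword argument.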

Display Elements

  • a list of result navigation links, with 20 results per page.
  • #: the result counter.
  • Accession: the accession number associated with the description.
  • total: the total number of times this protein has been identified in the GPM.
  • log(e): the lowest (meaning best) expect value for any identification of this protein in the GPM, shown as a base-10 logarithm. The expect value estimates the probability that a protein identification was actually a random match. E.g., if log(e) is -6.0, the chance of this identification being a random match is 1 in 1 million.
  • description: the description of the protein.
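The log(e) column can be converted back to odds with a one-line calculation. The Python sketch below reproduces the example given for that column; the function name is hypothetical.

```python
def random_match_odds(log_e):
    """Convert a log(e) value into (probability, one_in).

    probability is 10 raised to log_e; one_in is its reciprocal,
    i.e. the "1 in N" figure.
    """
    probability = 10.0 ** log_e
    return probability, 1.0 / probability


if __name__ == "__main__":
    probability, one_in = random_match_odds(-6.0)
    print(f"probability = {probability:.1e}, about 1 in {round(one_in):,}")
```

A log(e) of -6.0 therefore corresponds to a probability of about 10^-6, matching the 1 in 1 million figure above.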

This script uses the following tables: from GPMDB, protein, proseq, result and best_expect; from EnspMapDB, map.
