Updating Protein Omega Counts

From TheGPMWiki

The protein_omega_count table is part of the peakdb database. It has five columns:

  • label: the accession number of a protein, found in the proseq table in GPMDB
  • seq: the residue sequence of a peptide identified as part of that protein, found in the peptide table in GPMDB
  • z1total: an integer for the number of times this peptide has been identified in the corresponding protein in a singly charged state
  • z2total: an integer for the number of times this peptide has been identified in the corresponding protein in a doubly charged state
  • z3total: an integer for the number of times this peptide has been identified in the corresponding protein in a triply charged state

The table is updated by two scripts: /thegpm/scripts/refresh_protein_omega_count_table.pl and /thegpm/scripts/dump_protein_omega_count_script.sql. Executing the Perl script renames the existing dump file of the table's contents (if one exists), executes the SQL commands to create a new dump file, and then imports the newly dumped information into the database.
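The order of operations described above (rename, dump, import) can be sketched as follows. This is only an illustration in Python, not the content of the Perl script: the function names, the timestamp suffix, and the run_dump and import_dump callbacks are all hypothetical stand-ins for whatever the real script does.

```python
from datetime import datetime
from pathlib import Path
from typing import Callable, Optional


def rotate_dump(dump_path: Path, now: Optional[datetime] = None) -> Optional[Path]:
    """Rename an existing dump file so that a fresh dump can take its place.

    Returns the new path of the old dump, or None if there was nothing to rename.
    The timestamp suffix is an assumption; the real script may name backups differently.
    """
    if not dump_path.exists():
        return None
    stamp = (now or datetime.now()).strftime("%Y%m%d%H%M%S")
    backup = dump_path.with_name(f"{dump_path.name}.{stamp}")
    dump_path.rename(backup)
    return backup


def refresh(
    dump_path: Path,
    run_dump: Callable[[Path], None],
    import_dump: Callable[[Path], None],
) -> None:
    """Run the three steps in order: rename old dump, create new dump, import it."""
    rotate_dump(dump_path)   # step 1: move the previous dump out of the way
    run_dump(dump_path)      # step 2: hypothetical hook that writes the new dump file
    import_dump(dump_path)   # step 3: hypothetical hook that loads it into the database
```

Passing the dump and import steps as callbacks keeps the sketch independent of any particular database client.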

The SQL script builds the full content of the table with a single SQL command from the content of the peptide and proseq tables in GPMDB. By default the dump file is created in the /gpmdb/ directory, and its name is hard-coded into both the Perl and SQL scripts. The total runtime for the process is between three and four hours with a peptide table size of 17 GiB.
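To illustrate how one statement can produce all three charge-state totals, here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL. The toy schema is an assumption made for the example: the proseqid and z columns, and the direct join between peptide and proseq, are hypothetical and much simpler than the real GPMDB tables. Only the five protein_omega_count columns are taken from the description above.

```python
import sqlite3

# Single statement that counts identifications per (protein, peptide) and charge.
# Conditional sums put the z = 1, 2, 3 counts into separate columns.
BUILD_COUNTS = """
INSERT INTO protein_omega_count (label, seq, z1total, z2total, z3total)
SELECT s.label,
       p.seq,
       SUM(CASE WHEN p.z = 1 THEN 1 ELSE 0 END),
       SUM(CASE WHEN p.z = 2 THEN 1 ELSE 0 END),
       SUM(CASE WHEN p.z = 3 THEN 1 ELSE 0 END)
FROM peptide AS p
JOIN proseq AS s ON s.proseqid = p.proseqid
GROUP BY s.label, p.seq
"""


def make_toy_db() -> sqlite3.Connection:
    """Create an in-memory database with a simplified, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE proseq (proseqid INTEGER PRIMARY KEY, label TEXT);
        CREATE TABLE peptide (proseqid INTEGER, seq TEXT, z INTEGER);
        CREATE TABLE protein_omega_count (
            label TEXT, seq TEXT,
            z1total INTEGER, z2total INTEGER, z3total INTEGER
        );
        """
    )
    return conn


def build_counts(conn: sqlite3.Connection) -> None:
    """Rebuild protein_omega_count from the peptide and proseq tables."""
    with conn:  # commit on success, roll back on error
        conn.execute("DELETE FROM protein_omega_count")
        conn.execute(BUILD_COUNTS)
```

In this sketch, identifications with a charge above 3 still produce a row for their (label, seq) pair but add nothing to the three totals.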

Once the process is complete, the newly-created dump file can be transferred to any machine with a peakdb installation. From the MySQL command prompt, a user with the proper permissions can delete the existing content and load the new set.
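The delete-and-load step can be sketched as below, again with Python's sqlite3 standing in for the MySQL prompt. The sketch assumes the dump is a tab-separated text file with one row per line and the five columns in table order; the actual dump format is determined by the SQL script and may differ. The function name reload_counts is hypothetical.

```python
import csv
import sqlite3
from pathlib import Path


def reload_counts(conn: sqlite3.Connection, dump_path: Path) -> int:
    """Replace the table's content with the rows from a tab-separated dump file.

    Returns the number of rows loaded. The file is parsed before anything is
    deleted, and the delete and inserts share one transaction, so a malformed
    file leaves the existing content untouched.
    """
    with open(dump_path, newline="") as handle:
        rows = [
            (label, seq, int(z1), int(z2), int(z3))
            for label, seq, z1, z2, z3 in csv.reader(handle, delimiter="\t")
        ]
    with conn:  # commit on success, roll back on error
        conn.execute("DELETE FROM protein_omega_count")
        conn.executemany(
            "INSERT INTO protein_omega_count VALUES (?, ?, ?, ?, ?)", rows
        )
    return len(rows)
```

The sketch reads the whole file into memory for simplicity; for a dump of the size discussed here, a bulk loader provided by the database would be the more practical choice.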
