GPMDB Data Sources

From TheGPMWiki
Revision as of 19:13, 6 January 2022 by WikiSysop

GPMDB was originally constructed to serve as a reference work for all publicly available proteomics data generated using tandem mass spectrometry. Public data is downloaded and reanalyzed using the current version of X! Tandem. The result files generated by the reanalysis, together with the relevant metadata, are imported into the database and made available through the associated web site, FTP site and REST interfaces.


Current Public Data Sources

The following public data repositories are checked daily for new raw data suitable for reanalysis:

  1. ProteomeXchange/PRIDE;
  2. jPOST;
  3. MassIVE;
  4. PeptideAtlas/PASSEL;
  5. ProteomicsDB;
  6. The Chorus Project; and
  7. iProX.

Data released by specific large projects, such as CPTAC or the Human Proteome Atlas, are also included as they become available. Every effort is made to ensure that reanalyzed results from all data sources are available within 48 hours of the source data being released. In addition, data from laboratory web sites and FTP sites, as well as direct contributions made through the GPM sites available to researchers, are imported into GPMDB as part of a daily incremental update process.

Previous Data Sources

GPMDB has been in operation since Jan. 1, 2004. Since that time, several large data repositories have come into existence and later ceased activity. All of the data from those repositories (e.g., TRANCHE, Peptidome) were reanalyzed and stored in GPMDB, and the results remain available even though the source repository sites are no longer active.

Review process

Making data publicly available does not guarantee that it will be included in GPMDB. The data must be approved by our quality control AI for initial acceptance, and it may be rejected subsequently because of concerns about either quality or originality.

CAUTION: Many papers contain serious errors in their Methods sections. When using data from the literature, be skeptical of any experimental parameter (cell line, tissue type, modification reagents, quantitation methods, etc.) that may affect your use of the data. We have tried to correct any obvious errors, but there is no way to guarantee that we found them all. When attempting to analyze or reproduce results, keep in mind that even key parts of the experimental methods may have been recorded incorrectly in the associated manuscript, as methods are rarely reviewed properly in the current journal publication process.

Data from publications

The following is a list of data sets, with their associated PubMed IDs, that have supplied data to the GPMDB Project through the data sources mentioned above. The list was current as of Jan. 2, 2022.
