X! P3

From TheGPMWiki
Revision as of 21:43, 18 April 2008

Project

The idea of using "proteotypic peptides" is a relatively new notion in protein/peptide identification. It is simply the recognition that if you cleave a protein into peptides, not all of the peptides are equally likely to be detected by current mass spectrometry-based techniques. Some peptides from a particular protein sequence are detected easily, while others are very difficult to find. The peptides from a sequence that are consistently detected are called proteotypic, i.e., those peptides alone are indicative of the presence of a particular protein.
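
Cleaving a protein into peptides is normally modelled in silico with a simple enzyme rule. The sketch below assumes trypsin's textbook specificity (cleave after K or R, but not before P) plus an optional number of missed cleavages. The function name `tryptic_digest` is hypothetical and not part of X! P3. It only enumerates candidate peptides; which of them are actually proteotypic has to be learned from observed data.

```python
import re


def tryptic_digest(sequence: str, missed_cleavages: int = 0) -> list:
    """Return tryptic peptides of `sequence` (hypothetical helper, textbook trypsin rule)."""
    # Cleavage sites: the position just after each K or R that is not followed by P.
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", sequence)]
    # Peptide boundaries: sequence start, every internal site, sequence end.
    bounds = [0] + [s for s in sites if s < len(sequence)] + [len(sequence)]
    peptides = []
    for i in range(len(bounds) - 1):
        # Join up to `missed_cleavages` adjacent fragments to model skipped sites.
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            peptides.append(sequence[bounds[i]:bounds[j]])
    return peptides


print(tryptic_digest("AKPLRGGKCC"))  # ['AKPLR', 'GGK', 'CC']
```

Note that the K in "AKP" is not a cleavage site because it is followed by P.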

This idea suggests that it should be possible to scan through a set of data, for example an LC/MS/MS run, looking only for the known proteotypic peptides for a particular organism. Finding those proteotypic peptides is enough to establish that the protein was present in the original sample. Because each protein has only a few proteotypic peptides, the set of candidate peptides to test is much smaller, which should improve both the speed and accuracy of the resulting protein identifications.
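
To illustrate why a small library helps, the sketch below compares observed precursor m/z values against a list of proteotypic peptides within a ppm tolerance. This is a deliberate simplification: X! P3 scores MS/MS fragment spectra using the X! TANDEM code, whereas this example looks only at precursor m/z. The names `match_precursors`, `peptide_mass`, the tolerance and the charge states are assumptions for illustration; the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in daltons.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide for the termini
PROTON = 1.007276   # mass of each charge-carrying proton


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def match_precursors(observed, library, tol_ppm=10.0, charges=(1, 2, 3)):
    """Return (spectrum_id, protein, peptide, charge) for every precursor within tol_ppm.

    observed: iterable of (spectrum_id, precursor_mz)
    library:  iterable of (protein, proteotypic_peptide)  -- hypothetical layout
    """
    library = list(library)
    hits = []
    for spectrum_id, mz in observed:
        for protein, peptide in library:
            mass = peptide_mass(peptide)
            for z in charges:
                theoretical_mz = (mass + z * PROTON) / z
                if abs(mz - theoretical_mz) / theoretical_mz * 1e6 <= tol_ppm:
                    hits.append((spectrum_id, protein, peptide, z))
    return hits


library = [("P1", "GGK"), ("P1", "AKPLR")]
spectra = [("scan_7", (peptide_mass("GGK") + 2 * PROTON) / 2)]
print(match_precursors(spectra, library))  # [('scan_7', 'P1', 'GGK', 2)]
```

The cost of the loop grows with the library size, so restricting it to proteotypic peptides is what keeps the first pass cheap.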

The X! P3 (Proteotypic Peptide Profiler) project is the first publicly available search engine that takes advantage of this idea. Built on X! TANDEM's refinement strategy and the open source X! TANDEM code, X! P3 takes the proteotypic peptide idea to its logical conclusion by adding a few simple steps. Rather than stopping at protein identification, X! P3 uses proteotypic peptides to find the protein sequences present and then applies refinement to the full spectrum data set to find all of the peptides present, including post-translational modifications, point mutations and unanticipated peptide cleavages. It works this way:

  1. In the first round, the spectrum data set is examined for the presence of proteotypic peptides.
  2. The full protein sequences of the proteins identified in the first round are then pulled from a sequence library.
  3. Using this small set of full sequences, multiple rounds of refinement are performed to extract all of the non-proteotypic peptides from the full spectrum data set.
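
The three steps above can be sketched as a toy pipeline. Everything here is a stand-in chosen for illustration: spectra are reduced to (id, neutral mass) pairs, "matching" is a mass comparison within a fixed tolerance, and each refinement round simply allows more missed tryptic cleavages. X! P3's real refinement rounds also consider post-translational modifications, point mutations and unanticipated cleavages, and score fragment spectra. The function names are hypothetical.

```python
import re

# Monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def digest(sequence, missed_cleavages):
    """Trypsin rule (after K/R, not before P), allowing missed cleavages."""
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", sequence)]
    bounds = [0] + [s for s in sites if s < len(sequence)] + [len(sequence)]
    return [sequence[bounds[i]:bounds[j]]
            for i in range(len(bounds) - 1)
            for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds)))]


def match(spectra, library, tol=0.01):
    """Assign each (spectrum_id, neutral_mass) to the first library peptide within tol Da."""
    hits = {}
    for sid, observed in spectra:
        for protein, peptide in library:
            if abs(mass(peptide) - observed) <= tol:
                hits[sid] = (protein, peptide)
                break
    return hits


def p3_style_search(spectra, proteotypic, full_sequences, refinement_rounds=(0, 1)):
    # Step 1: search every spectrum against the small proteotypic library only.
    hits = match(spectra, proteotypic)
    proteins = {protein for protein, _ in hits.values()}
    # Step 2: pull the full sequences of just the proteins found in step 1.
    candidates = {p: full_sequences[p] for p in proteins if p in full_sequences}
    # Step 3: refinement rounds over the full data set, each with a more permissive
    # digestion (here: more missed cleavages), trying only still-unassigned spectra.
    for missed in refinement_rounds:
        remaining = [(sid, m) for sid, m in spectra if sid not in hits]
        library = [(p, pep) for p, seq in candidates.items() for pep in digest(seq, missed)]
        hits.update(match(remaining, library))
    return proteins, hits


full = {"P1": "AKPLRGGKCC", "P2": "MWRHHK"}
proteotypic = [("P1", "GGK"), ("P2", "HHK")]
spectra = [("s1", mass("GGK")), ("s2", mass("AKPLR")),
           ("s3", mass("GGKCC")), ("s4", mass("MWR"))]
proteins, hits = p3_style_search(spectra, proteotypic, full)
print(proteins)  # {'P1'}
print(hits)      # {'s1': ('P1', 'GGK'), 's2': ('P1', 'AKPLR'), 's3': ('P1', 'GGKCC')}
```

In this toy run, s4 comes from protein P2, but because none of P2's proteotypic peptides was observed, P2 is never pulled into refinement and s4 stays unassigned. That is the trade-off the proteotypic first pass makes.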

The obvious obstacle to this approach is obtaining a good set of proteotypic peptides to use. This has been solved through GPMDB, which is the largest collection of proteomics data available to the public. By querying GPMDB for the peptides that best represent each protein, it is now possible to produce high-quality libraries of these peptides for two model organisms, Homo sapiens and Saccharomyces cerevisiae, as well as for several commonly observed experimental artifacts, such as BSA and trypsin. The sequence libraries are updated daily from GPMDB, so the system can learn about new proteotypic peptides as they are generated by the Global Proteome Machine as a whole.
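
The actual GPMDB selection criteria are not described on this page. As a minimal sketch, assuming that observation frequency across many experiments is a reasonable proxy for "best representative peptide", a library could be built by keeping each protein's most frequently observed peptides. The input layout, `top_n` and `min_count` are hypothetical.

```python
from collections import Counter, defaultdict


def build_proteotypic_library(observations, top_n=3, min_count=2):
    """Keep, per protein, the most frequently observed peptides.

    observations: iterable of (protein, peptide) pairs, one per confident
    peptide identification (a hypothetical export, not GPMDB's actual schema).
    """
    counts = defaultdict(Counter)
    for protein, peptide in observations:
        counts[protein][peptide] += 1
    library = {}
    for protein, peptide_counts in counts.items():
        # most_common() sorts by count, highest first; drop rarely seen peptides.
        best = [pep for pep, n in peptide_counts.most_common() if n >= min_count][:top_n]
        if best:
            library[protein] = best
    return library


obs = [("P1", "GGK")] * 3 + [("P1", "AKPLR")] * 2 + [("P1", "CC"), ("P2", "HHK")]
print(build_proteotypic_library(obs))  # {'P1': ['GGK', 'AKPLR']}
```

Under this assumption, a daily update amounts to rerunning the function on the enlarged set of observations, so peptides that start being seen often enter the library automatically.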

Latest release

This release is the first of the new TORNADO versions of X! P3, which aim to use available external annotation information to improve the performance of sequence identification. The 2007.04.01 release started this effort by adding annotation of single-nucleotide-induced amino acid polymorphisms to searches. TORNADO adds the ability to set the potential modifications tested on a sequence-by-sequence basis, controlled by a BIOML annotation file.
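
A sketch of what per-sequence potential modifications mean for the search: each protein gets its own set of residue mass shifts, and every occurrence of a modifiable residue may or may not carry its shift. The precedence rule in `mods_for` (annotated entries extend or override the global defaults) is an assumption, as are the function names; the oxidation and phosphorylation deltas are standard monoisotopic values.

```python
from itertools import product

# Monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def mods_for(protein, annotations, defaults):
    """Potential modifications to test for one sequence.

    ASSUMPTION: annotated residues extend or override the global defaults;
    the real TORNADO precedence rules are not described here.
    """
    return {**defaults, **annotations.get(protein, {})}


def candidate_masses(peptide, mods):
    """All neutral masses reachable when each modifiable residue is
    independently modified or left unmodified (potential, not fixed, mods)."""
    base = sum(MONO[aa] for aa in peptide) + WATER
    deltas = [mods[aa] for aa in peptide if aa in mods]
    masses = set()
    for choice in product((False, True), repeat=len(deltas)):
        shift = sum(d for d, used in zip(deltas, choice) if used)
        masses.add(round(base + shift, 5))
    return sorted(masses)


defaults = {"M": 15.994915}               # oxidation, tested on every sequence
annotations = {"P1": {"S": 79.966331}}    # phosphorylation, P1 only (hypothetical)
print(len(candidate_masses("MSK", mods_for("P1", annotations, defaults))))  # 4
print(len(candidate_masses("MSK", mods_for("P2", annotations, defaults))))  # 2
```

Because the number of variants doubles with every modifiable residue, restricting an expensive modification to the sequences annotated for it keeps the search space far smaller than applying it globally.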

System level changes

  1. A fix for the method to force the use of specific file formats (made by Patrick Lacasse)
  2. Addition of a class to handle sequence annotation files in BIOML format (saxmodhandler).
  3. Addition of a method to load the annotation file information into an STL map, in the class mprocess.
  4. When compiling on Linux platforms, several alternative makefiles are provided. The default makefile works with GCC version 4, with the expat libraries dynamically linked. The other makefiles are all in the src directory, with names of the form "Makefile_XXX", where XXX is a descriptive name indicating the situation in which that file is appropriate. To use one of these files, pass it to make with the -f option, for example:
     >make -f Makefile_GCCv3
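
Items 2 and 3 describe reading a BIOML annotation file (via the saxmodhandler class) into an STL map in mprocess. The Python sketch below mirrors that idea with the standard library's ElementTree instead of a SAX handler, and a dict standing in for the STL map. The element and attribute names (`protein label`, `mod residue`, `delta`) and the key/value types are hypothetical; the layout actually read by X! P3 may differ.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL annotation layout, used only for illustration.
EXAMPLE = """<?xml version="1.0"?>
<bioml>
  <protein label="P1">
    <mod residue="S" delta="79.966331"/>
    <mod residue="T" delta="79.966331"/>
  </protein>
  <protein label="P2">
    <mod residue="M" delta="15.994915"/>
  </protein>
</bioml>"""


def load_annotations(xml_text):
    """Map protein label -> {residue: mass delta}, a dict standing in for the STL map."""
    root = ET.fromstring(xml_text)
    table = {}
    for protein in root.iter("protein"):
        label = protein.get("label")
        if label is None:
            continue  # skip unlabeled entries rather than guessing an identifier
        mods = table.setdefault(label, {})
        for mod in protein.iter("mod"):
            mods[mod.get("residue")] = float(mod.get("delta"))
    return table


print(load_annotations(EXAMPLE))
# {'P1': {'S': 79.966331, 'T': 79.966331}, 'P2': {'M': 15.994915}}
```

A keyed map suits this job because the search only needs a fast lookup of each sequence's settings by its label.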