GPMDB REST


Latest revision as of 19:13, 16 March 2023

GPMDB REST API v.1

This document is a request for comment on a set of proposed new REST services for GPMDB. The RFC process began on June 11, 2012 and ended on September 10, 2012. This version (1.0) of the interface is complete and adopted.

These services are available to anyone involved in computational, biological or biomedical research. We, the providers, retain the right to block access by any user or system that we feel is using the information or infrastructure inappropriately.

The purpose of the RFC is to define a set of straightforward REST (REpresentational State Transfer) services that provide access to commonly required information based on the data in GPMDB. These services provide simple, non-SQL methods to extract this information and make it available over the Internet to anyone wishing to use it as part of a network-aware application.

Conventions

The following statements apply to all methods and descriptions given below:

  1. all GETs are made to the base URL "http://rest.thegpm.org/1";
  2. all protein and peptide sequences are expressed in single letter code;
  3. protein accession numbers use the same format as normally used in GPMDB; and
  4. all return values are in JSON notation (RFC 4627) as either an ARRAY or OBJECT.
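
The conventions above can be sketched as a small Python client. This is only an illustration: the helper names `build_url` and `fetch` are hypothetical and are not part of GPMDB or of the demonstration client mentioned in this document. The sketch assumes that every service follows the `/service/method/key=value&key=value` path style seen in the examples, and that the JSON body is UTF-8 text.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://rest.thegpm.org/1"  # base URL from convention 1


def build_url(service, method, **params):
    """Join service, method and key=value pairs in the path style of the examples."""
    query = "&".join(f"{key}={value}" for key, value in params.items())
    path = f"{BASE_URL}/{service}/{method}"
    return f"{path}/{query}" if query else path


def fetch(url, timeout=30):
    """GET the URL and decode the JSON body (ARRAY -> list, OBJECT -> dict)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building URLs needs no network access; fetch() is only called on demand.
print(build_url("interface", "version"))
print(build_url("protein", "count", acc="ENSMUSP00000026459"))
```

Calling `fetch(build_url("interface", "version"))` would then return a Python list, since that service is documented as returning an ARRAY.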

/interface/ services

Interface services give the programmer access to information about the current state of the REST interface.

GET /interface/help => [ARRAY]

 input: none
return: [ARRAY - "string"] a text description of the current REST interface

Example: find help information through the interface

http://rest.thegpm.org/1/interface/help

GET /interface/version => [ARRAY]

 input: none
return: [ARRAY - "string"] the version number of the REST interface

Example: find the current REST interface version number

http://rest.thegpm.org/1/interface/version

/model/ services

Model services give access to information about a calculated model for a particular data set.

GET /model/metadata/gpm=GPM => {OBJECT}

 input: a GPM model accession number (GPMddddddddddd)
return: {OBJECT - "string":{"string":"string"}} objects containing text descriptions of how, when and what was modeled

Example: list sample, analytical parameters and result values for the data set GPM10100159682

http://rest.thegpm.org/1/model/metadata/gpm=GPM10100159682
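
A minimal Python sketch of how the nested metadata OBJECT might be handled. It assumes that the pattern "GPMddddddddddd" means "GPM" followed by eleven digits, as in the example accession GPM10100159682. The function names are hypothetical, and the sample payload is made up purely to mimic the documented `"string":{"string":"string"}` shape; it is not real GPMDB output.

```python
import re

# Assumption: "GPMddddddddddd" means "GPM" followed by eleven digits.
GPM_ACCESSION = re.compile(r"GPM\d{11}")


def model_metadata_url(gpm):
    """Build the /model/metadata URL, rejecting malformed model accessions."""
    if GPM_ACCESSION.fullmatch(gpm) is None:
        raise ValueError(f"not a GPM model accession: {gpm!r}")
    return f"http://rest.thegpm.org/1/model/metadata/gpm={gpm}"


def flatten_metadata(metadata):
    """Turn {"section": {"key": "value"}} into sorted (section, key, value) rows."""
    return sorted(
        (section, key, value)
        for section, fields in metadata.items()
        for key, value in fields.items()
    )


# Made-up payload with the documented shape only; not real GPMDB output.
example = {"section_a": {"field_1": "text"}, "section_b": {"field_2": "more text"}}
for row in flatten_metadata(example):
    print(row)
```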

GET /db/metadata/gpm=GPM => {OBJECT}

 input: a GPM model accession number (GPMddddddddddd)
return: {OBJECT - "string":"string"} metadata abstract text stored in GPMDB

Example: list metadata abstract text information for the data set GPM10100159682

http://rest.thegpm.org/1/db/metadata/gpm=GPM10100159682

GET /model/proteins/gpm=GPM => [ARRAY]

 input: a GPM model accession number (GPMddddddddddd)
return: [ARRAY - "string"] accession numbers for primary proteins in the model

Example: list proteins found in GPM10100159682

http://rest.thegpm.org/1/model/proteins/gpm=GPM10100159682

GET /model/protein_modifications/gpm=GPM&acc=ACC => [ARRAY]

 input: a GPM model accession number (GPMddddddddddd)
        accession number for the protein of interest (ACC)
return: [ARRAY - "string"] all residue modifications assigned to ACC, as specified in RFC GPM-2011.12.14 §2.3.

Example: list residue modifications found in GPM10100159682 for the protein ENSP00000324804

http://rest.thegpm.org/1/model/protein_modifications/gpm=GPM10100159682&acc=ENSP00000324804

GET /model/protein_peptides/gpm=GPM&acc=ACC => {OBJECT}

 input: a GPM model accession number (GPMddddddddddd)
        accession number for the protein of interest (ACC)
return: {OBJECT - "string":int} of {peptide sequence:number of observations} pairs for the peptides assigned to ACC

Example: list peptides found in GPM10100159682 for the protein ENSP00000324804

http://rest.thegpm.org/1/model/protein_peptides/gpm=GPM10100159682&acc=ENSP00000324804
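
As an illustration of working with a `"string":int` OBJECT, the following Python sketch ranks peptides by how often they were observed. The function names are hypothetical, and the counts in the sample dictionary are invented for the example; only the shape follows the description above.

```python
def protein_peptides_url(gpm, acc):
    """Build the /model/protein_peptides URL for one model and one protein."""
    return f"http://rest.thegpm.org/1/model/protein_peptides/gpm={gpm}&acc={acc}"


def rank_peptides(observations, minimum=1):
    """Return (sequence, count) pairs, most observed first, ties ordered by sequence."""
    kept = [(seq, count) for seq, count in observations.items() if count >= minimum]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


# Invented counts with the documented {"sequence": int} shape; not real data.
example = {"LDATTVLSR": 3, "AAQASAAPK": 7, "SPSSVEPVADMLMGLFFR": 1}
print(rank_peptides(example, minimum=2))
```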

GET /model/protein_savs/gpm=GPM&acc=ACC => [ARRAY]

 input: a GPM model accession number (GPMddddddddddd)
        accession number for the protein of interest (ACC)
return: [ARRAY - "string"] all single amino acid variants (SAVs) assigned to ACC, as specified in RFC GPM-2011.12.14 §2.5. 

NOTE: this method was originally "protein_polymorphisms", which still functions as a URL.

Example: list SAVs found in GPM10100159682 for the protein ENSP00000331514

http://rest.thegpm.org/1/model/protein_savs/gpm=GPM10100159682&acc=ENSP00000331514

GET /model/protein_sequence/gpm=GPM&acc=ACC => [ARRAY]

 input: a GPM model accession number (GPMddddddddddd)
        accession number for the protein of interest (ACC)
return: [ARRAY - "string"] the sequence of protein ACC identified in the model.

Example: get the protein sequence of ENSP00000331514 found in GPM10100159682

http://rest.thegpm.org/1/model/protein_sequence/gpm=GPM10100159682&acc=ENSP00000331514

GET /model/protein_species/acc=ACC1,ACC2,ACC3,... => {OBJECT}

 input: a comma-separated list of protein accession numbers
return: {OBJECT - "string":"string"} the accession numbers and associated binomial species names.

Example: get the species associated with the protein accession numbers ENSMUSP00000026459, ENSP00000339186, gi|240103100|, and sp|ALBU_BOVIN|

http://rest.thegpm.org/1/protein/species/acc=ENSMUSP00000026459,ENSP00000339186,gi|240103100|,sp|ALBU_BOVIN|
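
Accession numbers such as gi|240103100| contain the "|" character, which some HTTP clients refuse to send unescaped. The Python sketch below joins a list of accessions with commas and optionally percent-encodes each one. Two assumptions are made here: that the server accepts percent-encoded accessions, which this document does not state, and that the path is the one used in the example URL above. The function name is hypothetical.

```python
from urllib.parse import quote


def species_url(accessions, encode=True):
    """Join accessions with commas, optionally percent-encoding each one."""
    parts = [quote(acc, safe="") if encode else acc for acc in accessions]
    return "http://rest.thegpm.org/1/protein/species/acc=" + ",".join(parts)


accessions = ["ENSMUSP00000026459", "ENSP00000339186", "gi|240103100|", "sp|ALBU_BOVIN|"]
print(species_url(accessions))                 # "|" sent as %7C
print(species_url(accessions, encode=False))   # literal form, as in the example URL
```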


/peptide/ services

Peptide services give the programmer access to information about the peptides currently stored in GPMDB. All peptide services require an appropriate peptide amino acid sequence to identify the target peptide.

GET /peptide/accessions/seq=SEQUENCE => {OBJECT}

 input: peptide sequence of interest
return: {OBJECT - "string":int} of {accession number:number of observations} pairs

Example: retrieve the accession numbers and observations of SPSSVEPVADMLMGLFFR

http://rest.thegpm.org/1/peptide/accessions/seq=SPSSVEPVADMLMGLFFR

GET /peptide/count/seq=SEQUENCE => [ARRAY]

 input: peptide sequence of interest
return: [ARRAY - int] the total number of observations

Example: retrieve the total observations of SPSSVEPVADMLMGLFFR

http://rest.thegpm.org/1/peptide/count/seq=SPSSVEPVADMLMGLFFR

GET /peptide/count_z/seq=SEQUENCE => {OBJECT}

 input: peptide sequence of interest
return: {OBJECT - "string":int} of {parent ion charge:number of observations} pairs

Example: retrieve the total observations of SPSSVEPVADMLMGLFFR by parent ion charge

http://rest.thegpm.org/1/peptide/count_z/seq=SPSSVEPVADMLMGLFFR
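
The peptide services above can be sketched in Python as follows. The sequence check assumes that only the twenty standard single-letter residue codes are valid, which may be stricter than what GPMDB accepts. The function names are hypothetical, and the per-charge counts are invented to show the `"string":int` shape.

```python
# Assumption: only the twenty standard residues; GPMDB may accept other letters.
STANDARD_RESIDUES = frozenset("ACDEFGHIKLMNPQRSTVWY")


def peptide_url(method, seq):
    """Build a /peptide/ URL after checking the single-letter sequence."""
    unknown = sorted(set(seq) - STANDARD_RESIDUES)
    if not seq or unknown:
        raise ValueError(f"unexpected residue codes in {seq!r}: {unknown}")
    return f"http://rest.thegpm.org/1/peptide/{method}/seq={seq}"


def counts_by_charge(by_charge):
    """Convert {"2": 4} style keys to integer charges."""
    return {int(charge): count for charge, count in by_charge.items()}


# Invented payload with the documented {"charge": int} shape; not real data.
example = {"2": 4, "3": 1}
print(peptide_url("count_z", "SPSSVEPVADMLMGLFFR"))
print(counts_by_charge(example), sum(example.values()))
```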

GET /peptide/xx/acc=ACCESSION&pos=1-N&w=n => {OBJECT}

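 input: accession number for the protein of interest
        range of protein coordinates to report (pos=1-N)
        the w flag (n in the example below)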
return: {OBJECT - "string":int} of {protein coordinate:number of observations} pairs for select modifications

"xx" can be any of the following:

  1. "pf"- S/T/Y phosphorylation;
  2. "af"- K acetylation;
  3. "uf"- K ubiquitination;
  4. "di"- R dimethylation;
  5. "ox"- P/K oxidation;
  6. "su"- K sumoylation;
  7. "nq"- N/Q deamidation;
  8. "ol"- S/T O-linked glycosylation;
  9. "ct"- R citrullination;
  10. "sc"- K succinylation;
  11. "qc"- Q cyclization; and
  12. "gl"- E gamma-carboxylation.

Example: retrieve the total observations of phosphorylation for ENSP00000384660

http://gpmdb.thegpm.org/1/peptide/pf/acc=ENSP00000384660&pos=1-706&w=n
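
The twelve two-letter codes lend themselves to a lookup table. The Python sketch below rejects unknown codes before building the URL. The function name is hypothetical. The default host and the value w="n" are copied from the example URL above, because this document does not define the w parameter or say whether the service is also served from rest.thegpm.org.

```python
MODIFICATION_CODES = {
    "pf": "S/T/Y phosphorylation",
    "af": "K acetylation",
    "uf": "K ubiquitination",
    "di": "R dimethylation",
    "ox": "P/K oxidation",
    "su": "K sumoylation",
    "nq": "N/Q deamidation",
    "ol": "S/T O-linked glycosylation",
    "ct": "R citrullination",
    "sc": "K succinylation",
    "qc": "Q cyclization",
    "gl": "E gamma-carboxylation",
}


def modification_url(code, acc, last_residue, w="n", base="http://gpmdb.thegpm.org/1"):
    """Build /peptide/xx/acc=...&pos=1-N&w=... for a known modification code."""
    if code not in MODIFICATION_CODES:
        raise ValueError(f"unknown modification code: {code!r}")
    return f"{base}/peptide/{code}/acc={acc}&pos=1-{last_residue}&w={w}"


print(MODIFICATION_CODES["pf"])
print(modification_url("pf", "ENSP00000384660", 706))
```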

/protein/ services

Protein services give the programmer access to information about the proteins currently stored in GPMDB. All protein services require an appropriate protein accession number to identify the protein sequence being queried.

GET /protein/best_e/acc=ACCESSION => [ARRAY]

 input: accession number for the protein of interest
return: [ARRAY - float] the log10(E) for the best observation of ACCESSION

Example: retrieve the lowest E value for ENSMUSP00000026459

http://rest.thegpm.org/1/protein/best_e/acc=ENSMUSP00000026459
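
Because the value is returned as log10(E), it usually needs converting before it can be compared with an expectation value. A short Python sketch, assuming that the returned ARRAY holds exactly one float; the function names are hypothetical and the sample value is invented.

```python
def unwrap_single(array):
    """Return the only element of a one-element ARRAY (assumed layout)."""
    if len(array) != 1:
        raise ValueError(f"expected one value, got {len(array)}")
    return array[0]


def expectation_from_log10(log_e):
    """Convert log10(E) back to the expectation value E."""
    return 10.0 ** log_e


def passes_threshold(log_e, max_log_e=-2.0):
    """True when the observation is at least as good as the cutoff (lower is better)."""
    return log_e <= max_log_e


# Invented response with the documented [float] shape; not real data.
log_e = unwrap_single([-12.5])
print(expectation_from_log10(log_e), passes_threshold(log_e))
```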

GET /protein/count/acc=ACCESSION => [ARRAY]

 input: accession number for the protein of interest
return: [ARRAY - int] the total number of observations of ACCESSION

Example: retrieve the number of times ENSMUSP00000026459 has been observed

http://rest.thegpm.org/1/protein/count/acc=ENSMUSP00000026459

GET /protein/description/acc=ACCESSION => [ARRAY]

 input: accession number for the protein of interest
return: [ARRAY - "string"] text description of the protein ACCESSION

Example: retrieve a text description of ENSMUSP00000026459

http://rest.thegpm.org/1/protein/description/acc=ENSMUSP00000026459

GET /protein/evidence/acc=ACCESSION => {OBJECT}

 input: accession number for the protein of interest
return: {OBJECT - "string":"string"} evidence code information about status of the protein

Example: retrieve the evidence codes for ENSMUSP00000026459

http://rest.thegpm.org/1/protein/evidence/acc=ENSMUSP00000026459

GET /protein/keyword/key=KEYWORD(&filter=FILTER) => {OBJECT}

 input: a keyword in a protein description
        a string to filter the accession numbers (default = "human")
return: {OBJECT - "string":"string"} information about the proteins matching the keyword

Example: list human proteins with the keyword "actinin"

http://rest.thegpm.org/1/protein/keyword/key=actinin
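
The parentheses in the URL template mark the filter as optional. A minimal Python sketch of a URL builder that appends the filter only when one is supplied. Percent-encoding the keyword with urllib.parse.quote is an assumption about what the server accepts, and the filter value used in the usage note is hypothetical.

```python
from typing import Optional
from urllib.parse import quote

BASE_URL = "http://rest.thegpm.org/1"  # base URL from the Conventions section


def keyword_url(keyword: str, accession_filter: Optional[str] = None) -> str:
    """Build a /protein/keyword URL; the filter is omitted unless given."""
    url = f"{BASE_URL}/protein/keyword/key={quote(keyword)}"
    if accession_filter is not None:
        url += f"&filter={quote(accession_filter)}"
    return url
```

Calling keyword_url("actinin") reproduces the example URL above, which relies on the server default of "human". Calling keyword_url("actinin", "mouse") would append "&filter=mouse", where "mouse" is only a placeholder for whatever filter strings the service accepts.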

GET /protein/modifications/acc=ACCESSION&mod=MODMASS&res=LIST&maxe=MAXEXPECT => {OBJECT}

 input: accession number for the protein of interest
        mass (Da) of the modification, e.g., phosphorylation = 80
        list of residue types that may be modified, e.g., phosphorylation = STY
        log10(E) of the maximum allowed expectation value for a valid observation
return: {OBJECT - "string":int} of {modified site:number of observations} for ACCESSION

Example: retrieve the phosphorylated S, T or Y residues for YKL112W

http://rest.thegpm.org/1/protein/modifications/acc=YKL112W&mod=80&res=STY&maxe=-2.0

Example: retrieve all acetylated residues for ENSP00000281111

http://rest.thegpm.org/1/protein/modifications/acc=ENSP00000281111&mod=42&res=&maxe=-2.0
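
A minimal Python sketch of how the four parameters combine into a request, with a helper that ranks the returned sites by number of observations. The function names are illustrative, and the format of the site keys in the response is not assumed.

```python
BASE_URL = "http://rest.thegpm.org/1"  # base URL from the Conventions section


def modifications_url(accession: str, mod_mass: int, residues: str, max_log_e: float) -> str:
    """Build a /protein/modifications URL; an empty residue list leaves "res=" blank."""
    return (
        f"{BASE_URL}/protein/modifications/acc={accession}"
        f"&mod={mod_mass}&res={residues}&maxe={max_log_e}"
    )


def most_observed_sites(site_counts: dict, top: int = 5) -> list:
    """Return the (site, count) pairs with the highest counts, largest first."""
    ranked = sorted(site_counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]
```

Passing an empty residue string, as in the acetylation example, leaves "res=" blank, which the example above uses to request all residues.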

GET /protein/omega/acc=ACCESSION&seq=SEQUENCE(,SEQUENCE2,...) => {OBJECT}

 input: accession number for the protein of interest
        comma separated list of peptide sequences in single letter code
return: {OBJECT - "string":[{"string":[float,float,float,float]}]}
        {"ACCESSION":[{"SEQUENCE":[ω(z=1),ω(z=2),ω(z=3),ω(z=4)]}]}
 added: 2013-05-05

Example: retrieve the omega peptide frequencies for LDATTVLSR and AAQASAAPK observed for ENSMUSP00000026459

http://rest.thegpm.org/1/protein/omega/acc=ENSMUSP00000026459&seq=LDATTVLSR,AAQASAAPK
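
The nested return structure can be flattened into a mapping from peptide sequence to charge state. A minimal Python sketch, assuming the decoded response follows the layout given above, with the four floats ordered z=1 to z=4; the function name is illustrative.

```python
def omega_by_charge(response: dict, accession: str) -> dict:
    """Map each peptide sequence to {charge: omega} for charges 1 to 4."""
    result = {}
    for entry in response[accession]:
        # Each entry is an object of SEQUENCE -> [omega(z=1), ..., omega(z=4)].
        for sequence, omegas in entry.items():
            result[sequence] = dict(enumerate(omegas, start=1))
    return result
```

With this helper, omega_by_charge(response, "ENSMUSP00000026459")["LDATTVLSR"][2] would give the ω value of LDATTVLSR at z=2.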

GET /protein/peptide_count/acc=ACCESSION&seq=SEQUENCE => {OBJECT}

 input: accession number for the protein of interest
        peptide sequence in single letter code
return: {OBJECT - "string":int} of {parent ion charge:number of observations} of SEQUENCE in ACCESSION

Example: retrieve the number of times LDATTVLSR was observed for ENSMUSP00000026459

http://rest.thegpm.org/1/protein/peptide_count/acc=ENSMUSP00000026459&seq=LDATTVLSR

GET /protein/peptide_sequences/acc=ACCESSION => [ARRAY]

 input: accession number for the protein of interest
return: [ARRAY - "string"] of all peptides observed for ACCESSION

Example: retrieve the peptide sequences observed for ENSMUSP00000026459

http://rest.thegpm.org/1/protein/peptide_sequences/acc=ENSMUSP00000026459

GET /protein/peptides_z/acc=ACCESSION => {OBJECT}

 input: accession number for the protein of interest
return: {OBJECT - "string":int} of the {parent ion charge:number of peptide observations} for ACCESSION

Example: retrieve the number of peptide observations by parent ion charge state for ENSMUSP00000026459

http://rest.thegpm.org/1/protein/peptides_z/acc=ENSMUSP00000026459

GET /protein/peptides_total/acc=ACCESSION => [ARRAY]

 input: accession number for the protein of interest
return: [ARRAY - int] number of observations of peptides associated with ACCESSION

Example: retrieve the total number of peptide observations for ENSMUSP00000026459

http://rest.thegpm.org/1/protein/peptides_total/acc=ENSMUSP00000026459

GET /protein/savs/acc=ACCESSION => [ARRAY of OBJECT "string":int]

 input: accession number for the protein of interest
return: [ARRAY - OBJECT "string":int] an array of observed single amino acid variants (SAVs) and the number of observations for the protein ACCESSION

NOTE: this method was originally "/protein/polymorphisms", which still functions as a URL.

Example: retrieve a summary of SAVs found for ENSMUSP00000026459

http://rest.thegpm.org/1/protein/savs/acc=ENSMUSP00000026459
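
Since the result is an array of objects rather than a single object, a client may find it convenient to merge the entries into one mapping. A minimal Python sketch, assuming each array element is a JSON object of SAV label to observation count; the label format is not assumed, and the function name is illustrative.

```python
def merge_sav_counts(response: list) -> dict:
    """Merge an array of {SAV: count} objects into a single dictionary."""
    merged = {}
    for entry in response:
        for variant, count in entry.items():
            # Add counts together if the same label appears more than once.
            merged[variant] = merged.get(variant, 0) + count
    return merged
```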

GET /protein/sequence/acc=ACCESSION => [ARRAY]

 input: accession number for the protein of interest
return: [ARRAY - "string"] the amino acid sequence of ACCESSION

Example: retrieve the protein sequence for ENSMUSP00000026459

http://rest.thegpm.org/1/protein/sequence/acc=ENSMUSP00000026459
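
The sequence returned here can be combined with the array from /protein/peptide_sequences to estimate how much of the protein has been observed. A minimal Python sketch of that calculation, assuming both responses have already been fetched and decoded; the function name is illustrative, and the matching is a plain substring search.

```python
def sequence_coverage(protein: str, peptides: list) -> float:
    """Return the fraction of residues in protein covered by any peptide."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue
        # Mark every occurrence of the peptide, including overlapping ones.
        start = protein.find(peptide)
        while start != -1:
            for index in range(start, start + len(peptide)):
                covered[index] = True
            start = protein.find(peptide, start + 1)
    return sum(covered) / len(protein)
```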

Source code

These services are implemented in PERL, the language that is used to provide most of the web page creation logic for the GPMDB system. The code is available at ftp://ftp.thegpm.org/repos/gpmdb_rest. The files currently available are as follows:

  1. rest.pl - provides an interface between Apache CGI calls and the database layer;
  2. gpmdb_rest.pl - a PERL module for most requests;
  3. gpm_rest.pl - a PERL module for "model" requests; and
  4. rest_client.pl - a demonstration command-line client that provides examples of how to use the services with PERL.
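
For programmers who do not use PERL, the same requests can be made from any language with an HTTP client. The following Python sketch is not a translation of rest_client.pl; it is a generic helper that assumes every service returns JSON, and the function names are illustrative.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://rest.thegpm.org/1"  # base URL from the Conventions section


def service_url(path: str) -> str:
    """Join a service path such as "protein/count/acc=..." to the base URL."""
    return f"{BASE_URL}/{path.lstrip('/')}"


def get_json(path: str, timeout: float = 30.0):
    """GET a service path and decode the JSON response (requires network access)."""
    with urlopen(service_url(path), timeout=timeout) as response:
        return json.load(response)
```

A call such as get_json("protein/count/acc=ENSMUSP00000026459") would then return the decoded array for that protein.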

Comments and suggestions

Anyone interested in making suggestions or commenting on the ideas in this document should send them by email to Ron Beavis, rbeavis@thegpm.org.

Manuscripts and Presentations

A manuscript describing this API has been published: The GPMDB REST Interface, David Fenyö; Ronald C. Beavis, Bioinformatics 2015; doi: 10.1093/bioinformatics/btv107.

A poster presenting the API was shown at the US HUPO Conference (2015) in Tempe. You can download a PDF of the presentation here.

Revision date and status

Reference name   Revision date   Document status       Stable URL
GPM-2012.6.11    2015.02.19      final specification   http://icex.ca/ice.0.m