G2PDB REST

From TheGPMWiki

Revision as of 22:07, 26 November 2014

The purpose of this document is to define a set of straightforward REST (REpresentational State Transfer) services that provide access to commonly required information based on the data in g2pDB, a collection of post-translational modification (PTM) acceptor site mappings to the human genome. These services provide simple, non-SQL methods to extract this information and make it available over the Internet to anyone wishing to use it as part of a network-aware application.

Conventions

The following statements apply to all methods and descriptions given below:

  1. all GETs are made to the base URL "http://rest.thegpm.org/2";
  2. all bases are expressed in single letter code (A, C, T, G);
  3. modifications allowed are Phosphoryl, Acetyl or GlyGly (ubiquitinyl);
  4. protein accession numbers use ENSEMBL 70; and
  5. all return values are in JSON notation (RFC 4627) as either an ARRAY or OBJECT.
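The conventions above can be sketched as a few constants and checks on the client side. This is only an illustration in Python (the services themselves are written in PERL); the helper names `check_base`, `check_modification` and `parse_response` are hypothetical and not part of the interface.

```python
import json

BASE_URL = "http://rest.thegpm.org/2"                           # convention 1
BASES = frozenset("ACTG")                                       # convention 2
MODIFICATIONS = frozenset({"Phosphoryl", "Acetyl", "GlyGly"})   # convention 3


def check_base(base):
    """Return base unchanged if it is one of A, C, T or G (hypothetical helper)."""
    if base not in BASES:
        raise ValueError(f"unsupported base: {base!r}")
    return base


def check_modification(mod):
    """Return mod unchanged if it is one of the allowed modification names."""
    if mod not in MODIFICATIONS:
        raise ValueError(f"unsupported modification: {mod!r}")
    return mod


def parse_response(text):
    """Decode a response body and confirm it is a JSON ARRAY or OBJECT (convention 5)."""
    value = json.loads(text)
    if not isinstance(value, (list, dict)):
        raise ValueError("expected a JSON ARRAY or OBJECT")
    return value
```

Checking inputs before a request is a client-side design choice; how the server itself responds to invalid values is not described in this document.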

/interface/ services

Interface services give the programmer access to information about the current state of the REST interface.

GET /help => [ARRAY]

 input: none
return: [ARRAY - "string"] a text description of the current REST interface

Example: find help information through the interface

http://rest.thegpm.org/2/help

GET /interface/version => [ARRAY]

 input: none
return: [ARRAY - "string"] the version number of the REST interface

Example: find the current REST interface version number

http://rest.thegpm.org/2/interface/version
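A minimal sketch of calling the two interface services from Python, using only the standard library. The function names and the injectable `opener` parameter are assumptions made for illustration (the injection allows the code to be exercised without network access); the sketch does not assume the service is currently reachable and shows no expected output.

```python
import json
import urllib.request

BASE_URL = "http://rest.thegpm.org/2"  # base URL from the conventions


def get_json(url, opener=urllib.request.urlopen, timeout=30):
    """GET url and decode the JSON body (hypothetical helper).

    opener defaults to urllib.request.urlopen; a stand-in can be passed
    for testing without a network connection.
    """
    with opener(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def interface_help(opener=urllib.request.urlopen):
    """Call GET /help, documented as returning an ARRAY of strings."""
    return get_json(f"{BASE_URL}/help", opener=opener)


def interface_version(opener=urllib.request.urlopen):
    """Call GET /interface/version, documented as returning an ARRAY of strings."""
    return get_json(f"{BASE_URL}/interface/version", opener=opener)
```

Calling `interface_version()` with no arguments performs a real HTTP request to the URL given in the example above.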

/dna/ services

DNA services give access to PTM-linked information about particular genomic locations.

GET /dna/CHR/POS => {OBJECT}

 input: CHR = human chromosome (1-22, MT, X, or Y), POS = base position on chromosome
return: {OBJECT} object containing text descriptions of all modifications linked to the specified base

Example: list PTM information linked to base 8925354 on Chromosome 1

http://rest.thegpm.org/2/g2pdb/dna/1/8925354

GET /dna/CHR/POS/mod=MOD => [ARRAY]

 input: CHR = human chromosome (1-22, MT, X, or Y), POS = base position on chromosome, MOD = modification
return: {OBJECT} object containing text descriptions of the specified modification linked to the specified base

Example: list acetylation modification information linked to base 8925354 on Chromosome 1

http://rest.thegpm.org/2/g2pdb/dna/1/8925354/mod=Acetyl


GET /dna/CHR/POS/snp=BASE => {OBJECT}

 input: CHR = human chromosome (1-22, MT, X, or Y), POS = base position on chromosome, BASE = variant base
return: {OBJECT} object containing text descriptions of modifications changed by the variant BASE

Example: list modification changes caused by the variant base A at position 8030998 on Chromosome 1

http://rest.thegpm.org/2/g2pdb/dna/1/8030998/snp=A
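The three /dna/ request forms can be assembled by one small URL builder, sketched here in Python. The function name `dna_url` is hypothetical. The `/g2pdb` path prefix is taken from the example URLs rather than from the stated base URL, and treating POS as a positive integer and `mod`/`snp` as mutually exclusive are assumptions, since the document does not describe either point.

```python
# Prefix as it appears in the example URLs (assumption: the examples,
# not the bare base URL, reflect the path the service expects).
G2PDB_URL = "http://rest.thegpm.org/2/g2pdb"
CHROMOSOMES = {str(n) for n in range(1, 23)} | {"MT", "X", "Y"}
MODIFICATIONS = {"Phosphoryl", "Acetyl", "GlyGly"}
BASES = {"A", "C", "T", "G"}


def dna_url(chrom, pos, mod=None, snp=None):
    """Build a /dna/ request URL (hypothetical helper).

    chrom: 1-22, MT, X or Y; pos: base position on the chromosome;
    mod: optional modification name; snp: optional variant base.
    """
    chrom = str(chrom)
    if chrom not in CHROMOSOMES:
        raise ValueError(f"unknown chromosome: {chrom!r}")
    pos = int(pos)
    if pos < 1:  # assumption: positions are positive integers
        raise ValueError("position must be a positive integer")
    if mod is not None and snp is not None:
        # assumption: the document only shows mod= and snp= separately
        raise ValueError("give either mod or snp, not both")
    url = f"{G2PDB_URL}/dna/{chrom}/{pos}"
    if mod is not None:
        if mod not in MODIFICATIONS:
            raise ValueError(f"unsupported modification: {mod!r}")
        url += f"/mod={mod}"
    if snp is not None:
        if snp not in BASES:
            raise ValueError(f"unsupported base: {snp!r}")
        url += f"/snp={snp}"
    return url


# The builder reproduces the example URLs given in this section.
print(dna_url(1, 8925354))
print(dna_url(1, 8925354, mod="Acetyl"))
print(dna_url(1, 8030998, snp="A"))
```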

/protein/ services

Protein services give access to all of the genomic mapping information associated with PTMs for a particular protein.

GET /protein/ACC => {OBJECT}

 input: ACC = ENSEMBL protein accession number
return: {OBJECT} of PTM mapping information

Example: retrieve the PTM mappings for ENSP00000234590

http://rest.thegpm.org/2/g2pdb/protein/ENSP00000234590

GET /protein/ACC/mod=MOD => [ARRAY]

 input: ACC = ENSEMBL protein accession number, MOD = modification
return: {OBJECT} of PTM mapping information for MOD only

Example: retrieve the ubiquitinylation mappings for ENSP00000234590

http://rest.thegpm.org/2/g2pdb/protein/ENSP00000234590/mod=GlyGly
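A corresponding sketch in Python for the two /protein/ request forms. The function name `protein_url` is hypothetical, the `/g2pdb` prefix follows the example URLs, and the accession check (the letters ENSP followed by 11 digits, the usual form of a human ENSEMBL protein identifier) is an assumption added for illustration rather than a rule stated by the service.

```python
import re

# Prefix as it appears in the example URLs (assumption).
G2PDB_URL = "http://rest.thegpm.org/2/g2pdb"
MODIFICATIONS = {"Phosphoryl", "Acetyl", "GlyGly"}
# Assumption: human ENSEMBL protein accessions look like ENSP + 11 digits.
ENSP_PATTERN = re.compile(r"ENSP\d{11}")


def protein_url(acc, mod=None):
    """Build a /protein/ request URL (hypothetical helper).

    acc: ENSEMBL protein accession number;
    mod: optional modification name to restrict the PTM mappings.
    """
    if not ENSP_PATTERN.fullmatch(acc):
        raise ValueError(f"not an ENSEMBL protein accession: {acc!r}")
    url = f"{G2PDB_URL}/protein/{acc}"
    if mod is not None:
        if mod not in MODIFICATIONS:
            raise ValueError(f"unsupported modification: {mod!r}")
        url += f"/mod={mod}"
    return url


# The builder reproduces the example URLs given in this section.
print(protein_url("ENSP00000234590"))
print(protein_url("ENSP00000234590", mod="GlyGly"))
```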

Source code

These services are being implemented in PERL, the language that is used to provide most of the web page creation logic for the GPMDB system. The code is available at ftp://ftp.thegpm.org/repos/g2p_rest. The files currently available are as follows:

  1. rest.pl - provides an interface between Apache CGI calls and the database layer; and
  2. g2pdb_rest.pl - a PERL module for most requests.

Comments and suggestions

Anyone interested in making suggestions or commenting on the ideas in this document should send them by email to Ron Beavis, rbeavis@thegpm.org.

Revision date and status

Reference name   Revision date   Document status       Stable URL
GPM-2014.11.26   2014.11.26      final specification   http://wiki.thegpm.org/wiki/G2PDB_REST