G2PDB REST

Latest revision as of 17:53, 10 June 2016

GPMDB REST API channel 2

The purpose of this document is to define a set of straightforward REST (REpresentational State Transfer) services that give access to commonly required information based on the data in the novel database g2pDB. g2pDB is a collection of auto-curated post-translational modification (PTM) acceptor sites mapped to their associated genomic codons (in chromosome coordinates) and stored in a MongoDB database. The current version has mappings for the most reproducible human STY-phosphorylation, K-ubiquitinylation and K-acetylation acceptor sites found using the data available in the GPMDB system. The details of the construction of the database and REST interface have been published (doi:10.1021/acs.jproteome.5b01018).

These services will provide simple, non-SQL methods to extract this information and make it available over the Internet to anyone wishing to use it as part of a network-aware application. The services are available to anyone involved in computational, biological or biomedical research. We, the providers, retain the right to block access by any user or system that we feel is using the information or infrastructure inappropriately.

The methods described here use the following base URLs:

  1. http://openslice.fenyolab.org/g2pdb; or
  2. http://g2pdb.thegpm.org/g2pdb.

Anyone interested in direct access to the production database of experimental results that was used to construct g2pDB should look at the GPMDB REST API.

Web site

A preliminary web site that shows the contents of g2pDB for a particular gene, transcript or protein is available at http://g2pdb.thegpm.org/search.

Conventions

The following statements apply to all methods and descriptions given below:

  1. all GETs are made to the base URL, which is either:
    1. "http://openslice.fenyolab.org/g2pdb"; or
    2. "http://g2pdb.thegpm.org/g2pdb";
  2. all bases are expressed in single letter code (A, C, T, G);
  3. modifications allowed are "Phospho", "Acetyl" or "GlyGly" (ubiquitinyl);
  4. genome coordinates refer to Genome Reference Consortium Human genome builds (GRCh37 and GRCh38);
  5. protein accession numbers use ENSEMBL numbers for the appropriate build;
  6. all return values are in JSON notation (RFC 4627) as either an ARRAY or OBJECT; and
  7. protein post-translational modifications currently covered in g2pDB are S, T and Y phosphorylation (MOD=Phospho), lysine ubiquitinylation (MOD=GlyGly) and lysine acetylation (MOD=Acetyl).

Available sequence builds

The protein and genome sequence dependent information requires the specification of particular ENSEMBL and GRCh sequence assemblies. The assemblies currently available are as follows:

  1. /grch37/ensembl_70; and
  2. /grch38/ensembl_76.
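
The way a base URL and a sequence build combine into the root of every request can be sketched as follows. This is only an illustration: the function name `service_root` and the constants are hypothetical helpers, not part of the g2pDB code; the URLs and build paths are the ones listed in this document.

```python
# Base URLs and sequence builds as listed in this document.
BASE_URLS = (
    "http://openslice.fenyolab.org/g2pdb",
    "http://g2pdb.thegpm.org/g2pdb",
)
BUILDS = ("grch37/ensembl_70", "grch38/ensembl_76")


def service_root(build, base=BASE_URLS[0]):
    """Return the root URL for one genome/proteome build.

    `service_root` is a hypothetical helper name used for illustration.
    """
    if build not in BUILDS:
        raise ValueError(f"unknown build {build!r}; expected one of {BUILDS}")
    return f"{base.rstrip('/')}/{build}"
```

For example, `service_root("grch38/ensembl_76", BASE_URLS[1])` gives the root for the GRCh38 build on the second server; the method paths described below are appended to that root.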

/interface/ services

Interface services give the programmer access to information about the current state of the REST interface.

GET / => [ARRAY]

 input: none
return: [ARRAY - "string"] a text description of the current REST interface

Example: find help information through the interface

http://openslice.fenyolab.org/g2pdb/grch37/ensembl_70/

/dna/ services

DNA services give access to PTM-linked information about particular genomic locations.

GET /grch37/ensembl_70/dna/CHR/POS => [{OBJECT1},{OBJECT2},...]

 input: CHR = human chromosome (1-22, MT, X, or Y), POS = base position on chromosome
return: ARRAY of objects containing text descriptions of all modifications linked to the specified base

Example: list PTM information linked to base 8925354 on Chromosome 1

http://openslice.fenyolab.org/g2pdb/grch37/ensembl_70/dna/1/8925354
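
A minimal client sketch for this method, using only the Python standard library, might look like the following. The helper names `dna_url` and `fetch_json` are hypothetical, and the check that POS is a positive integer assumes 1-based chromosome coordinates, which this document does not state explicitly.

```python
import json
from urllib.request import urlopen

# Root for the GRCh37 / ENSEMBL 70 build, as used in the examples here.
ROOT = "http://openslice.fenyolab.org/g2pdb/grch37/ensembl_70"
# Chromosome names accepted for CHR: 1-22, MT, X and Y.
CHROMOSOMES = {str(n) for n in range(1, 23)} | {"MT", "X", "Y"}


def dna_url(chrom, pos, root=ROOT):
    """Build the /dna/CHR/POS URL (hypothetical helper)."""
    chrom = str(chrom)
    if chrom not in CHROMOSOMES:
        raise ValueError(f"unknown chromosome {chrom!r}")
    pos = int(pos)
    if pos < 1:  # assumption: positions are 1-based
        raise ValueError("POS must be a positive integer")
    return f"{root}/dna/{chrom}/{pos}"


def fetch_json(url, timeout=30):
    """GET a URL and decode the JSON ARRAY or OBJECT it returns."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_json(dna_url("1", 8925354))` would request the example above and, if the server is reachable, return a Python list of dictionaries, one per modification linked to that base.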

GET /grch37/ensembl_70/dna/CHR/POS/mod=MOD => [{OBJECT1},{OBJECT2},...]

 input: CHR = human chromosome (1-22, MT, X, or Y), POS = base position on chromosome, MOD = modification
return: ARRAY of objects containing text description of the specified modification linked to the specified base

Example: list acetylation modification information linked to base 8925354 on Chromosome 1

http://openslice.fenyolab.org/g2pdb/grch37/ensembl_70/dna/1/8925354/mod=Acetyl
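
Because MOD must be one of the three codes given under Conventions, a client may want to validate it before sending a request. The sketch below maps PTM names to MOD codes and builds the `mod=MOD` path segment; `mod_filter` and `MOD_CODES` are hypothetical names, and accepting lower-case PTM names is a convenience of this sketch, not a feature of the service.

```python
# PTM name -> MOD code, taken from the Conventions section.
MOD_CODES = {
    "phosphorylation": "Phospho",
    "ubiquitinylation": "GlyGly",
    "acetylation": "Acetyl",
}


def mod_filter(name):
    """Return the 'mod=MOD' path segment for a PTM name or MOD code."""
    if name in MOD_CODES.values():
        code = name  # already a documented MOD code
    else:
        try:
            code = MOD_CODES[name.lower()]
        except KeyError:
            raise ValueError(f"unsupported modification {name!r}") from None
    return f"mod={code}"
```

The segment is appended to a /dna/CHR/POS URL after a slash, as in the example above. The codes are passed exactly as written in this document, since it does not say whether the server treats them as case-sensitive.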


GET /grch37/ensembl_70/dna/CHR/POS/snp=BASE => [[{OBJECT1},{OBJECT2},...],[{OBJECT1},{OBJECT2},...]]

 input: CHR = human chromosome (1-22, MT, X, or Y), POS = base position on chromosome, BASE = variant base
return: ARRAY of objects containing text descriptions of the modifications linked to the canonical BASE at the given position, followed by an ARRAY of objects containing text descriptions of the modifications linked to that position after substitution of the variant BASE

Example: list modification changes caused by the variant base A at position 8030998 on Chromosome 1 (no change)
Note: First and second ARRAYs are identical due to no change as a result of variant base

http://openslice.fenyolab.org/g2pdb/grch37/ensembl_70/dna/1/8030998/snp=A

Example: list modification changes caused by the variant base C at position 8030998 on Chromosome 1 (removes modification)
Note: Second ARRAY is empty due to removal of modification as a result of the variant base

http://openslice.fenyolab.org/g2pdb/grch37/ensembl_70/dna/1/8030998/snp=C
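
The two notes above suggest a simple way to interpret the pair of ARRAYs returned by the snp method. The sketch below is one possible reading, not the service's own logic: `classify_snp` is a hypothetical function, it treats the objects as opaque, and the "no modification" and "changed" branches cover cases this document does not describe.

```python
def classify_snp(result):
    """Summarise a decoded [[canonical...], [variant...]] snp response."""
    if not (isinstance(result, list) and len(result) == 2):
        raise ValueError("expected a two-element ARRAY of ARRAYs")
    canonical, variant = result
    if not canonical:
        # Assumption: nothing mapped to this position in the first place.
        return "no modification at this position"
    if not variant:
        # Second ARRAY empty: the variant base removes the modification.
        return "modification removed"
    if canonical == variant:
        # Identical ARRAYs: the variant base changes nothing.
        return "no change"
    # Assumption: any other difference is reported as a change.
    return "modification changed"
```

With a decoded response in hand, `classify_snp(response)` returns a short label that can be used, for example, to flag variants that remove an acceptor site.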

/protein/ services

Protein services give access to all of the genomic mapping information associated with PTMs for a particular protein.

GET /grch37/ensembl_70/protein/ACC => [{OBJECT1},{OBJECT2},...]

 input: ACC = ENSEMBL protein accession number
return: ARRAY of PTM mapping information objects

Example: retrieve the PTM mappings for ENSP00000234590

http://openslice.fenyolab.org/g2pdb/grch37/ensembl_70/protein/ENSP00000234590

GET /grch37/ensembl_70/protein/ACC/mod=MOD => [{OBJECT1},{OBJECT2},...]

 input: ACC = ENSEMBL protein accession number, MOD = modification
return: ARRAY of PTM mapping information objects for MOD only

Example: retrieve the ubiquitinylation mappings for ENSP00000234590

http://openslice.fenyolab.org/g2pdb/grch37/ensembl_70/protein/ENSP00000234590/mod=GlyGly
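
Both protein methods can be covered by one small URL builder, sketched below. The name `protein_url` is hypothetical, and the accession check (ENSP followed by 11 digits, the usual form of human ENSEMBL protein identifiers) is a client-side assumption rather than a documented rule of the service.

```python
import re

ROOT = "http://openslice.fenyolab.org/g2pdb/grch37/ensembl_70"
# Human ENSEMBL protein accessions: 'ENSP' followed by 11 digits.
ENSP_PATTERN = re.compile(r"^ENSP\d{11}$")
# MOD codes listed under Conventions.
ALLOWED_MODS = ("Phospho", "Acetyl", "GlyGly")


def protein_url(acc, mod=None, root=ROOT):
    """Build /protein/ACC or /protein/ACC/mod=MOD (hypothetical helper)."""
    if not ENSP_PATTERN.match(acc):
        raise ValueError(f"{acc!r} does not look like an ENSP accession")
    url = f"{root}/protein/{acc}"
    if mod is not None:
        if mod not in ALLOWED_MODS:
            raise ValueError(f"unsupported modification {mod!r}")
        url += f"/mod={mod}"
    return url
```

`protein_url("ENSP00000234590", "GlyGly")` reproduces the ubiquitinylation example above; omitting the second argument gives the URL for all PTM mappings of the protein.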

Source code

These services and the code for creating g2pDB were implemented in Python. The code is available at ftp://ftp.thegpm.org/projects/g2pDB/human/code.

Data sources

The information used to construct g2pDB can be found at ftp://ftp.thegpm.org/projects/g2pDB. The data directories are as follows:

  1. /projects/g2pDB/human/ptm - contains the files mapping the three PTM types in proteome and genome coordinates;
  2. /projects/g2pDB/human/gff - contains the corresponding GFF3 genome feature annotation files; and
  3. /projects/g2pDB/human/mongodb - the database files used by MongoDB.

Comments and suggestions

Anyone interested in making suggestions or commenting on the ideas in this document should send them by email to Ron Beavis, rbeavis@thegpm.org.

Revision date and status

  1. 2014.11.26 - first draft available
  2. 2014.11.27 - revised URLs to include proteome and genome version information
  3. 2015.03.04 - updated text with expanded discussion of g2pDB
  4. 2015.10.26 - updated text to better explain some methods

Reference name    Revision date    Document status        Stable URL
GPM-2014.11.27    2015.10.26       draft specification    http://g2pdb.org/rest