G2PDB REST

From TheGPMWiki

Latest revision as of 17:53, 10 June 2016

GPMDB REST API channel 2

The purpose of this document is to define a set of straightforward REST (REpresentational State Transfer) services that provide access to commonly required information based on the data in the novel database g2pDB. g2pDB is a MongoDB database of auto-curated post-translational modification (PTM) acceptor sites mapped to their associated genomic codons (in chromosome coordinates). The current version has mappings for the most reproducible human STY-phosphorylation, K-ubiquitinylation and K-acetylation acceptor sites found using the data available in the GPMDB system. The details of the construction of the database and REST interface have been published (doi:10.1021/acs.jproteome.5b01018).

These services provide simple, non-SQL methods for extracting this information and make it available over the Internet to anyone wishing to use it as part of a network-aware application. The services are available to anyone involved in computational, biological or biomedical research. We, the providers, reserve the right to block access by any user or system that we believe is using the information or infrastructure inappropriately.

The methods described here use the following base URLs:

  1. http://openslice.fenyolab.org/g2pdb; or
  2. http://g2pdb.thegpm.org/g2pdb.

Anyone interested in direct access to the production database of experimental results that was used to construct g2pDB should look at the GPMDB REST API.


Web site

A preliminary web site that shows the contents of g2pDB for a particular gene, transcript or protein is available at http://g2pdb.thegpm.org/search.

Conventions

The following statements apply to all methods and descriptions given below:

  1. all GETs are made to the base URL, which is either:
    1. "http://openslice.fenyolab.org/g2pdb"; or
    2. "http://g2pdb.thegpm.org/g2pdb";
  2. all bases are expressed in single letter code (A, C, T, G);
  3. modifications allowed are "Phospho", "Acetyl" or "GlyGly" (ubiquitinyl);
  4. genome coordinates refer to Genome Reference Consortium Human genome builds (GRCh37 and GRCh38);
  5. protein accession numbers use ENSEMBL numbers for the appropriate build;
  6. all return values are in JSON notation (RFC 4627) as either an ARRAY or OBJECT; and
  7. protein post-translational modifications currently covered in g2pDB are S, T and Y phosphorylation (MOD=Phospho), lysine ubiquitinylation (MOD=GlyGly) and lysine acetylation (MOD=Acetyl).

Available sequence builds

Information that depends on protein and genome sequences requires the specification of a particular pair of ENSEMBL and GRCh sequence assemblies. The assemblies currently available are as follows:

  1. /grch37/ensembl_70; and
  2. /grch38/ensembl_76.
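Every method below is addressed as base URL + sequence build + method path. The following is a minimal sketch of composing that prefix; the hypothetical `build_root` helper and the hard-coded `BASE_URLS` and `BUILDS` tuples are assumptions copied from the lists on this page, not part of the g2pDB code, and will need updating if new builds are added.

```python
# Hypothetical helper: compose base URL + sequence build into a URL prefix.
# The values below are copied from this page and are assumptions, not an API.
BASE_URLS = (
    "http://openslice.fenyolab.org/g2pdb",
    "http://g2pdb.thegpm.org/g2pdb",
)
BUILDS = ("grch37/ensembl_70", "grch38/ensembl_76")


def build_root(build: str = "grch37/ensembl_70", base: str = BASE_URLS[0]) -> str:
    """Return the URL prefix for one GRCh/ENSEMBL build, e.g. .../g2pdb/grch37/ensembl_70."""
    build = build.strip("/")  # accept "/grch38/ensembl_76/" as written in the list above
    if build not in BUILDS:
        raise ValueError(f"unknown build {build!r}; expected one of {BUILDS}")
    return f"{base.rstrip('/')}/{build}"


print(build_root())
print(build_root("/grch38/ensembl_76/", BASE_URLS[1]))
```

Both base URLs serve the same methods, so the choice of `base` should only matter if one host is unavailable.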

/interface/ services

Interface services give the programmer access to information about the current state of the REST interface.

GET / => [ARRAY]

 input: none
return: [ARRAY - "string"] a text description of the current REST interface

Example: find help information through the interface

http://openslice.fenyolab.org/g2pdb/grch37/ensembl_70/
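Since every method returns JSON (an ARRAY or OBJECT), one small fetch function is enough for all of them. The sketch below uses only the Python standard library; the function name `get_json` and the 30-second timeout are illustrative choices, and the assumption that the server sends UTF-8 when no charset is declared is ours, not stated by the service.

```python
import json
import urllib.request

# Help URL taken from the example above.
HELP_URL = "http://openslice.fenyolab.org/g2pdb/grch37/ensembl_70/"


def get_json(url: str, timeout: float = 30.0):
    """GET a g2pDB URL and decode the JSON body into Python lists/dicts."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Fall back to UTF-8 if the response does not declare a charset (assumption).
        charset = resp.headers.get_content_charset() or "utf-8"
        return json.loads(resp.read().decode(charset))
```

Calling `get_json(HELP_URL)` should return the help text as a Python list of strings; the same function can be reused unchanged for the /dna/ and /protein/ URLs below.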

/dna/ services

DNA services give access to PTM-linked information about particular genomic locations.

GET /grch37/ensembl_70/dna/CHR/POS => [{OBJECT1},{OBJECT2},...]

 input: CHR = human chromosome (1-22, MT, X or Y), POS = base position on chromosome
return: ARRAY of objects containing text descriptions of all modifications linked to the specified base

Example: list PTM information linked to base 8925354 on Chromosome 1

http://openslice.fenyolab.org/g2pdb/grch37/ensembl_70/dna/1/8925354

GET /grch37/ensembl_70/dna/CHR/POS/mod=MOD => [{OBJECT1},{OBJECT2},...]

 input: CHR = human chromosome (1-22, MT, X or Y), POS = base position on chromosome, MOD = modification
return: ARRAY of objects containing text description of the specified modification linked to the specified base

Example: list acetylation modification information linked to base 8925354 on Chromosome 1

http://openslice.fenyolab.org/g2pdb/grch37/ensembl_70/dna/1/8925354/mod=Acetyl
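The two /dna/ methods above differ only by the optional /mod=MOD suffix. A minimal sketch of a URL builder that checks CHR and MOD against the conventions on this page before any request is sent; the function `dna_url` is hypothetical, and the allowed values are copied from this document rather than queried from the server.

```python
from typing import Optional, Union

# Allowed values copied from the conventions above (assumed, not fetched).
CHROMOSOMES = {str(n) for n in range(1, 23)} | {"MT", "X", "Y"}
MODS = {"Phospho", "Acetyl", "GlyGly"}


def dna_url(root: str, chrom: Union[int, str], pos: int, mod: Optional[str] = None) -> str:
    """Build ROOT/dna/CHR/POS, or ROOT/dna/CHR/POS/mod=MOD when mod is given."""
    chrom = str(chrom).upper()
    if chrom not in CHROMOSOMES:
        raise ValueError(f"unsupported chromosome {chrom!r}")
    if int(pos) < 1:
        raise ValueError("POS must be a positive base position")
    url = f"{root.rstrip('/')}/dna/{chrom}/{int(pos)}"
    if mod is not None:
        if mod not in MODS:
            raise ValueError(f"MOD must be one of {sorted(MODS)}")
        url += f"/mod={mod}"
    return url


ROOT = "http://openslice.fenyolab.org/g2pdb/grch37/ensembl_70"
print(dna_url(ROOT, 1, 8925354))
print(dna_url(ROOT, "1", 8925354, mod="Acetyl"))
```

The two printed URLs should match the two examples above; MOD is case-sensitive here because this page spells the values as "Phospho", "Acetyl" and "GlyGly".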


GET /grch37/ensembl_70/dna/CHR/POS/snp=BASE => [[{OBJECT1},{OBJECT2},...],[{OBJECT1},{OBJECT2},...]]

 input: CHR = human chromosome (1-22, MT, X or Y), POS = base position on chromosome, BASE = variant base
return: ARRAY of objects containing text descriptions of the modifications linked to the reference (canonical) base
        at POS, followed by an ARRAY of objects containing text descriptions of the modifications present when
        the variant BASE is substituted

Example: list modification changes caused by the variant base A at position 8030998 on Chromosome 1 (no change)
Note: the first and second ARRAYs are identical because the variant base does not change the modification

http://openslice.fenyolab.org/g2pdb/grch37/ensembl_70/dna/1/8030998/snp=A

Example: list modification changes caused by the variant base C at position 8030998 on Chromosome 1 (removes modification)
Note: the second ARRAY is empty because the variant base removes the modification

http://openslice.fenyolab.org/g2pdb/grch37/ensembl_70/dna/1/8030998/snp=C
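The snp= reply is a pair of ARRAYs, and the two notes above show how to read it: identical ARRAYs mean no change, and an empty second ARRAY means the modification is removed. A minimal sketch of that comparison follows; `snp_url` and `snp_effect` are hypothetical names, the "modification changed" catch-all for every other case is our assumption, and the test data uses placeholder objects because the fields inside each OBJECT are not specified on this page.

```python
from typing import Any, Sequence, Union


def snp_url(root: str, chrom: Union[int, str], pos: int, base: str) -> str:
    """Build ROOT/dna/CHR/POS/snp=BASE for a single-base variant."""
    base = base.upper()
    if base not in {"A", "C", "G", "T"}:
        raise ValueError("BASE must be one of A, C, G, T")
    return f"{root.rstrip('/')}/dna/{str(chrom).upper()}/{int(pos)}/snp={base}"


def snp_effect(response: Sequence[Sequence[Any]]) -> str:
    """Summarise a snp= reply: [reference-base objects, variant-base objects]."""
    reference, variant = response  # raises ValueError if not exactly two ARRAYs
    if list(reference) == list(variant):
        return "no change"  # first example above (variant base A)
    if reference and not variant:
        return "modification removed"  # second example above (variant base C)
    return "modification changed"  # any other difference (assumed interpretation)


ROOT = "http://openslice.fenyolab.org/g2pdb/grch37/ensembl_70"
print(snp_url(ROOT, 1, 8030998, "C"))
```

In practice the decoded JSON from the snp= URL would be passed straight to `snp_effect`.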

/protein/ services

Protein services give access to all of the genomic mapping information associated with PTMs for a particular protein.

GET /grch37/ensembl_70/protein/ACC => [{OBJECT1},{OBJECT2},...]

 input: ACC = ENSEMBL protein accession number
return: ARRAY of PTM mapping information objects

Example: retrieve the PTM mappings for ENSP00000234590

http://openslice.fenyolab.org/g2pdb/grch37/ensembl_70/protein/ENSP00000234590

GET /grch37/ensembl_70/protein/ACC/mod=MOD => [{OBJECT1},{OBJECT2},...]

 input: ACC = ENSEMBL protein accession number, MOD = modification
return: ARRAY of PTM mapping information objects for MOD only

Example: retrieve the ubiquitinylation mappings for ENSP00000234590

http://openslice.fenyolab.org/g2pdb/grch37/ensembl_70/protein/ENSP00000234590/mod=GlyGly
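The /protein/ methods follow the same pattern as /dna/, with an ENSEMBL protein accession in place of CHR/POS. A minimal sketch of a URL builder; the name `protein_url` is hypothetical, and the accession check assumes unversioned human ENSEMBL protein identifiers ("ENSP" followed by 11 digits, as in ENSP00000234590), so versioned accessions such as ENSP00000234590.3 would be rejected.

```python
import re
from typing import Optional

MODS = {"Phospho", "Acetyl", "GlyGly"}  # copied from the conventions above
ENSP = re.compile(r"ENSP\d{11}")  # assumed format of unversioned human ENSP accessions


def protein_url(root: str, acc: str, mod: Optional[str] = None) -> str:
    """Build ROOT/protein/ACC, or ROOT/protein/ACC/mod=MOD when mod is given."""
    if not ENSP.fullmatch(acc):
        raise ValueError(f"{acc!r} does not look like an unversioned ENSP accession")
    url = f"{root.rstrip('/')}/protein/{acc}"
    if mod is not None:
        if mod not in MODS:
            raise ValueError(f"MOD must be one of {sorted(MODS)}")
        url += f"/mod={mod}"
    return url


ROOT = "http://openslice.fenyolab.org/g2pdb/grch37/ensembl_70"
print(protein_url(ROOT, "ENSP00000234590"))
print(protein_url(ROOT, "ENSP00000234590", mod="GlyGly"))
```

Note that the accession must come from the ENSEMBL release that matches the build in ROOT (ensembl_70 for grch37, ensembl_76 for grch38).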

Source code

The code for these services and for creating g2pDB was written in Python. It is available at ftp://ftp.thegpm.org/projects/g2pDB/human/code.

Data sources

The information used to construct g2pDB can be found at ftp://ftp.thegpm.org/projects/g2pDB. The data directories are as follows:

  1. /projects/g2pDB/human/ptm - contains the files mapping the three PTM types in proteome and genome coordinates;
  2. /projects/g2pDB/human/gff - contains the corresponding GFF3 genome feature annotation files; and
  3. /projects/g2pDB/human/mongodb - the database files used by MongoDB.

Comments and suggestions

Anyone interested in making suggestions or commenting on the ideas in this document should send them by email to Ron Beavis, rbeavis@thegpm.org.

Revision date and status

  1. 2014.11.26 - first draft available
  2. 2014.11.27 - revised URLs to include proteome and genome version information
  3. 2015.03.04 - updated text with expanded discussion of g2pDB
  4. 2015.10.26 - updated text to better explain some methods

Reference name | Revision date | Document status     | Stable URL
GPM-2014.11.27 | 2015.10.26    | draft specification | http://g2pdb.org/rest