JSMS (the JSON MS/MS file format) is a proposed JSON Lines alternative to the commonly used MGF file format for exchanging MS/MS information in proteomics. JSMS uses the widely supported JavaScript Object Notation to simplify parsing of the MS/MS information. It also allows straightforward extensions of the file format without requiring changes in parsers that are only interested in the data.
The MGF file format (Mascot Generic Format) was introduced by Matrix Science Inc. in the 1990s as an alternative to the then widely used DTA format. Each DTA file contains data from a single LC-MS/MS scan, so an MS/MS run would result in many DTA files. MGF allowed any number of scans to be included in a single file, making it a simpler alternative for sending all of the data from an LC-MS/MS run across a network for analysis. Both the DTA and MGF formats produce simple, structured ASCII text files that can be easily read using a text editor.
An example of a simple MGF file for a single MS/MS spectrum is shown here:
BEGIN IONS
PEPMASS=413.2661
CHARGE=1+
TITLE=MS/MS scan
189.48956 1.9
283.62076 3.4
301.22977 66.3
311.08008 1.3
399.99106 2.3
END IONS
The simplicity of this format has led to its widespread use in proteomics informatics projects.
The purpose of JSMS is to modernize this format and introduce the benefits of using JSON for simplified parser and generator development. Extending the language is a simple matter of either adding application-specific objects or additional keys to the three types of JSMS-specific objects, taking care not to redefine or reuse the language's reserved key names.
JSMS is:
JSMS is NOT:
A JSMS example that records the information present in the MGF presented above is shown below. It contains the three types of objects defined in JSMS:
{"format": "jsms 1.0", "source": "test.mgf", "created": "2019-02-24 13:16:33.306856"}
{"lv": 2, "pm": 413.2661, "pz": 1, "ti": "MS/MS scan", "sc": 1, "np": 5, "ms": [189.48956, 283.62076, 301.22977, 311.08008, 399.99106], "is": [1.9, 3.4, 66.3, 1.3, 2.3]}
{"validation": "sha256", "value": "42c2b93928c7d4306aa2f4fc6c817efcdb3cbdc4b308b73985bbf28a9cf7604f"}
Each line is a separate JSON object, which can be parsed without reference to the other lines. This feature of JSON Lines is important in practice, because MS/MS data can be composed of millions of individual spectra. Representing that many spectra as a single JSON object can cause problems with some JSON parsers, and the resulting data objects can be cumbersome.
The order of objects/lines in a file has no meaning. Any ordering information must be contained within the objects.
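As a minimal sketch of line-by-line parsing, the following Python 3 code reads the three example lines shown above one at a time and classifies each object by the keys it contains. The helper names `classify` and `read_jsms` are hypothetical and are not part of the JSMS project's scripts; the classification rule (testing for the "format", "validation" and "lv" keys) is an assumption based on the key tables in this document.

```python
import json

# The three example lines from this document, as they would appear in a file.
JSMS_TEXT = """\
{"format": "jsms 1.0", "source": "test.mgf", "created": "2019-02-24 13:16:33.306856"}
{"lv": 2, "pm": 413.2661, "pz": 1, "ti": "MS/MS scan", "sc": 1, "np": 5, "ms": [189.48956, 283.62076, 301.22977, 311.08008, 399.99106], "is": [1.9, 3.4, 66.3, 1.3, 2.3]}
{"validation": "sha256", "value": "42c2b93928c7d4306aa2f4fc6c817efcdb3cbdc4b308b73985bbf28a9cf7604f"}
"""


def classify(obj):
    """Guess the JSMS object type from its keys (hypothetical helper)."""
    if "format" in obj:
        return "format"
    if "validation" in obj:
        return "validation"
    if "lv" in obj:
        return "spectrum"
    return "other"  # application-specific extension object


def read_jsms(lines):
    """Yield (kind, object) pairs, parsing each line independently."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        obj = json.loads(line)
        yield classify(obj), obj


parsed = list(read_jsms(JSMS_TEXT.splitlines()))
kinds = [kind for kind, _ in parsed]
spectra = [obj for kind, obj in parsed if kind == "spectrum"]
print(kinds)
print(len(spectra), "spectrum object(s)")
```

Because each line is parsed on its own, the same loop works on an open file handle and never needs to hold more than one spectrum in memory. Since line order carries no meaning, the loop does not assume that the "format" object comes first.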
The type column values in the tables below have the following meanings:
string: a JSON string, e.g. "MS/MS scan"
float: a JSON number, e.g. 123.12
In the following tables, a key can either be required (R) or optional (O). Both types of keys are reserved for use by JSMS only: extensions to the language may not use those keys for any purpose.
Only a single key:value pair is required for this type of object. Only one "format" object can be present in a file.
key name | type | R/O | value | description |
---|---|---|---|---|
"format" | string | R | "jsms 1.0" | identifies the version of JSMS used for the file |
This type of object contains the MS/MS data equivalent to an MGF spectrum: the information between the BEGIN IONS and END IONS lines. Any number of these objects may be present in a valid JSMS file.
key name | type | R/O | value | description |
---|---|---|---|---|
"lv" | float | R | MS/MS level | MS/MS data = 2, MS/MS/MS = 3, MS/MS/MS/MS = 4, etc. |
"pm" | float | R | PEPMASS | parent ion m/z |
"pz" | float | O | CHARGE | parent ion z |
"pi" | float | O | intensity | parent ion intensity for quantitation |
"qs" | [float] | O | intensity | array of quantitation values |
"ti" | string | O | TITLE | description of spectrum |
"sc" | float | O | scan number | the instrument's scan number for the spectrum |
"rt" | float | O | seconds | a chromatographic retention time |
"np" | float | R | array dimension | the length of the "ms", "is" and "zs" arrays |
"ms" | [float] | R | m/z | array of measured fragment ion m/z values |
"is" | [float] | R | intensity | array of measured fragment ion intensities |
"zs" | [float] | O | charge | array of measured fragment ion charges |
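To illustrate how the MGF fields map onto the keys in this table, here is a minimal Python 3 sketch that converts the single-spectrum MGF example from this document into a spectrum object. It is not the project's jsms_from_mgf.py; the function name `mgf_block_to_spectrum` is hypothetical, the MS level is assumed to be 2 because MGF does not record it, and only the simple "1+" style of CHARGE value is handled.

```python
import json

# The single-spectrum MGF example from this document.
MGF_TEXT = """\
BEGIN IONS
PEPMASS=413.2661
CHARGE=1+
TITLE=MS/MS scan
189.48956 1.9
283.62076 3.4
301.22977 66.3
311.08008 1.3
399.99106 2.3
END IONS
"""


def mgf_block_to_spectrum(text, level=2):
    """Convert one BEGIN IONS/END IONS block to a JSMS-style dict.

    Assumptions: one block per call, MS level supplied by the caller,
    and CHARGE given as a single value such as "1+" or "2-".
    """
    spectrum = {"lv": level}
    masses, intensities = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line in ("BEGIN IONS", "END IONS"):
            continue
        if "=" in line:
            key, value = line.split("=", 1)
            value = value.strip()
            if key == "PEPMASS":
                # PEPMASS may be followed by an intensity; keep the m/z only.
                spectrum["pm"] = float(value.split()[0])
            elif key == "CHARGE":
                sign = -1 if value.endswith("-") else 1
                spectrum["pz"] = sign * int(value.rstrip("+-"))
            elif key == "TITLE":
                spectrum["ti"] = value
        else:
            # Peak line: fragment m/z followed by intensity.
            parts = line.split()
            masses.append(float(parts[0]))
            intensities.append(float(parts[1]))
    spectrum["np"] = len(masses)
    spectrum["ms"] = masses
    spectrum["is"] = intensities
    return spectrum


spectrum = mgf_block_to_spectrum(MGF_TEXT)
print(json.dumps(spectrum))  # one spectrum object, written as a single line
```

The sketch sets "np" from the parsed peak list, so the "ms" and "is" arrays always have the stated length. Optional keys such as "sc" and "rt" are omitted because the MGF example does not carry that information.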
This type of object contains overall file validation information. Only one of these objects may be present in a valid JSMS file.
key name | type | R/O | value | description |
---|---|---|---|---|
"validation" | string | R | hash type | currently only SHA-256 |
"value" | string | R | hash | hexadecimal representation of the file's hash |
The hash value for a file is calculated as follows:
Any number of other objects can be added to a JSMS file, so long as they conform to the JSON Lines idea (one line per complete JSON object) and they do not use the keys used by JSMS. These new objects must be included in the hash value calculation.
Any number of additional key/value pairs may be added to any of the JSMS objects described above, so long as the final objects result in a valid JSON Lines format.
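The extension rules can be sketched in Python 3 as a small check that refuses to overwrite reserved keys. The set `RESERVED_KEYS` is assembled from the key tables in this document, and the helper `add_extension` and the key "x_instrument" are hypothetical names chosen for illustration; they are not defined by JSMS.

```python
import json

# Reserved keys, collected from the three key tables in this document.
RESERVED_KEYS = {
    "format",
    "lv", "pm", "pz", "pi", "qs", "ti", "sc", "rt", "np", "ms", "is", "zs",
    "validation", "value",
}


def add_extension(obj, extra):
    """Return a copy of obj with application-specific keys added.

    Raises ValueError if any added key is reserved by JSMS.
    """
    clashes = RESERVED_KEYS.intersection(extra)
    if clashes:
        raise ValueError("reserved JSMS keys: " + ", ".join(sorted(clashes)))
    extended = dict(obj)
    extended.update(extra)
    return extended


spectrum = {"lv": 2, "pm": 413.2661, "np": 2,
            "ms": [189.48956, 283.62076], "is": [1.9, 3.4]}

# Adding a hypothetical application-specific key is allowed.
extended = add_extension(spectrum, {"x_instrument": "example instrument"})
line = json.dumps(extended)  # still one complete JSON object on one line

# Trying to redefine a reserved key is rejected.
try:
    add_extension(spectrum, {"pm": "not a mass"})
    rejected = False
except ValueError:
    rejected = True

print(line)
print("reserved key rejected:", rejected)
```

A parser that only needs the MS/MS data can ignore "x_instrument" entirely, which is the point of the extension mechanism.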
An FTP site containing examples demonstrating how to create and validate JSMS files is located at ftp://ftp.thegpm.org/projects/jsms. This folder contains software and demo files for the JSMS project. Small files are also hosted on GitHub. All of the Python files (.py) are written in Python 3.
jsms_from_mgf.py - creates a JSMS formatted file from an MGF formatted file
>python jsms_from_mgf.py MGF_FILE_NAME JSMS_FILE_NAME
jsms_from_mzml.py - creates a JSMS formatted file from an mzML formatted file
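>python jsms_from_mzml.py MZML_FILE_NAME JSMS_FILE_NAME
If JSMS_FILE_NAME ends with ".gz", a gzip-compressed file is written; otherwise a plain text file is written.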
jsms_from_bioml.py - creates a JSMS formatted file from a BIOML formatted file (X! Tandem's default output)
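>python jsms_from_bioml.py BIOML_FILE_NAME JSMS_FILE_NAME
If JSMS_FILE_NAME ends with ".gz", a gzip-compressed file is written; otherwise a plain text file is written.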
jsms_validator.py - validates an existing JSMS formatted file
>python jsms_validator.py JSMS_FILE_NAME
jsms_min_parser.py - parses an existing JSMS formatted file
>python jsms_min_parser.py JSMS_FILE_NAME
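The conversion scripts are described as writing gzip-compressed output when the output file name ends with ".gz". A minimal Python 3 sketch of handling both cases with the same naming convention is shown below. The helper `open_jsms` and the file names are hypothetical; this is not the code used by the project's scripts.

```python
import gzip
import json
import os
import tempfile


def open_jsms(path, mode="rt"):
    """Open a JSMS file as text, using gzip when the name ends in '.gz'."""
    if path.endswith(".gz"):
        return gzip.open(path, mode, encoding="utf-8")
    return open(path, mode, encoding="utf-8")


# Two JSON Lines records used only for this demonstration.
lines = [
    '{"format": "jsms 1.0"}',
    '{"lv": 2, "pm": 413.2661, "np": 1, "ms": [189.48956], "is": [1.9]}',
]

results = {}
with tempfile.TemporaryDirectory() as tmp:
    for name in ("demo.jsms", "demo.jsms.gz"):
        path = os.path.join(tmp, name)
        # Write one JSON object per line.
        with open_jsms(path, "wt") as out:
            for line in lines:
                out.write(line + "\n")
        # Read the file back, parsing each line independently.
        with open_jsms(path, "rt") as src:
            results[name] = [json.loads(l) for l in src if l.strip()]

print(results["demo.jsms"] == results["demo.jsms.gz"])
```

Opening both variants in text mode keeps the rest of a parser identical for compressed and uncompressed files.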
little_test.mgf - a one spectrum MGF file for testing
msconvert_test.mgf - a multiple spectra MGF file for testing
msconvert_test.mzml - a multiple spectra mzML file for testing
little_test.mgf.jsms - little_test.mgf converted to JSMS using jsms_from_mgf.py
msconvert_test.mgf.jsms - msconvert_test.mgf converted to JSMS using jsms_from_mgf.py
msconvert_test.mzml.jsms - msconvert_test.mzml converted to JSMS using jsms_from_mzml.py
If you are interested in helping with the development and use of JSMS, please contact the project coordinator Ron Beavis.
This document was last revised on 2019-01-01T17:53:00Z.
The current JSMS version is "jsms 1.0".