Jsms

From TheGPMWiki
Revision as of 20:42, 24 February 2019

JSMS (the JSON MS file format) is a proposed JSON Lines (http://jsonlines.org/) alternative to the commonly used MGF file format for exchanging MS/MS information in proteomics. JSMS uses the widely supported JavaScript Object Notation (JSON, https://www.json.org/) to simplify parsing of the MS/MS information. It also allows the file format to be extended without requiring changes to parsers that are only interested in the data.

Introduction

The MGF file format (Mascot Generic Format) was introduced by Matrix Science Inc. in the 1990s as an alternative to the then widely used DTA format. Each DTA file contains data from a single LC-MS/MS scan, so an MS/MS run would result in many DTA files. MGF allowed any number of scans to be included in a single file, making it a simpler way to send all of the data from an LC-MS/MS run across a network for analysis. Both DTA and MGF are simple, structured ASCII text formats that can be easily read using a text editor.

An example of a simple MGF file for a single MS/MS spectrum is shown here:

BEGIN IONS
PEPMASS=413.2661
CHARGE=1+
TITLE=MS/MS scan
189.48956 1.9
283.62076 3.4
301.22977 66.3
311.08008 1.3
399.99106 2.3
END IONS

The simplicity of this format has led to its widespread use in proteomics informatics projects.
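To illustrate how little is needed to read the MGF example above, here is a minimal Python sketch. It is not a complete MGF reader: the function name `parse_mgf` and the dictionary layout (`params`, `peaks`) are illustrative choices, and the sketch assumes only `BEGIN IONS`/`END IONS` blocks containing `KEY=value` lines and whitespace-separated m/z and intensity pairs.

```python
def parse_mgf(text):
    """Parse MGF text into a list of spectrum dicts (illustrative sketch only)."""
    spectra = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "BEGIN IONS":
            # Start a new spectrum record.
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is None:
            # Ignore anything outside a BEGIN IONS / END IONS block.
            continue
        elif "=" in line and not line[0].isdigit():
            # Header line such as PEPMASS=413.2661; values are kept as strings.
            key, value = line.split("=", 1)
            current["params"][key] = value
        else:
            # Peak line: m/z followed by intensity.
            mz, intensity = line.split()[:2]
            current["peaks"].append((float(mz), float(intensity)))
    return spectra


example = """BEGIN IONS
PEPMASS=413.2661
CHARGE=1+
TITLE=MS/MS scan
189.48956 1.9
283.62076 3.4
301.22977 66.3
311.08008 1.3
399.99106 2.3
END IONS
"""

spectra = parse_mgf(example)
print(spectra[0]["params"]["PEPMASS"], len(spectra[0]["peaks"]))
```

Even in this small sketch the parser has to know MGF's own conventions, such as the block delimiters and the difference between header lines and peak lines.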

The purpose of JSMS is to modernize this format and introduce the benefits of using JSON for lightweight data transfer.

JSMS is:

  1. UTF-8 compatible, by definition. There are no problems associated with using non-ASCII characters, which can be useful when recording scientific symbols or non-English information.
  2. Easily extensible. Additional lines of JSON objects can be used to record application-specific information, and additional keys can be added to each spectrum object, for example spectrum-specific data handling directives.
  3. Easy to validate on the fly. JSMS files are required to contain a hash value that can be used to determine whether the file has been correctly transmitted across a network, prior to use.
  4. Easy to parse. Most programming languages provide simple mechanisms to unpack the data in a JSON object: no file-specific code is required.
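The properties listed above can be sketched with Python's standard library. This is a sketch under stated assumptions, not the JSMS specification: the key names (`pepmass`, `charge`, `title`, `mz`, `intensity`, `directive`, `sha256`), the choice of SHA-256, and the convention of putting the hash in a final trailer line are all hypothetical, chosen only to show the idea of one JSON object per line plus a hash check.

```python
import hashlib
import json


def write_lines(records):
    """Serialize records as JSON Lines and append a trailer line with a hash.

    Hypothetical layout: the trailer is {"sha256": <hex digest of all
    preceding lines joined by newlines, encoded as UTF-8>}.
    """
    lines = [json.dumps(r, ensure_ascii=False) for r in records]
    digest = hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()
    lines.append(json.dumps({"sha256": digest}))
    return "\n".join(lines) + "\n"


def read_lines(text):
    """Check the trailer hash, then return the decoded records."""
    *body, trailer = text.rstrip("\n").split("\n")
    expected = json.loads(trailer)["sha256"]
    actual = hashlib.sha256("\n".join(body).encode("utf-8")).hexdigest()
    if actual != expected:
        raise ValueError("hash mismatch: the file may have been corrupted")
    # No format-specific parsing code: each line is an ordinary JSON object.
    return [json.loads(line) for line in body]


records = [
    {
        "pepmass": 413.2661,
        "charge": 1,
        "title": "MS/MS scan, β-casein digest",  # non-ASCII text is fine in UTF-8
        "mz": [189.48956, 283.62076, 301.22977, 311.08008, 399.99106],
        "intensity": [1.9, 3.4, 66.3, 1.3, 2.3],
        "directive": "skip-deisotoping",  # extra key; readers may ignore it
    }
]

text = write_lines(records)
decoded = read_lines(text)
print(decoded[0]["title"], len(decoded[0]["mz"]))
```

A reader that only cares about the peak data can ignore the extra `directive` key, which is the sense in which additional keys do not force changes in existing parsers.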

JSMS is NOT:

  1. A replacement for existing XML file formats. mzXML and mzML are both commonly used to archive the MS/MS information from vendor-specific raw data files. Files in these formats serve as a source of information for the creation of a JSMS file.
  2. An archival file format. JSMS is intended for rapid, simple data exchange between software APIs.