Nomenclature for the use of gene symbols

From TheGPMWiki
Revision as of 20:08, 2 September 2012 by WikiSysop (Talk | contribs)

This document is a request for comment (RFC) on a proposed new notation for the use of gene symbols to describe polynucleotide and polypeptide sequences. The RFC process began on Sept 1, 2012 and will end on Sept 14, 2012. If the RFC is successful, the notation will be adopted by GPM and GPMDB.

Introduction

Gene symbols, such as those assigned by the HUGO Gene Nomenclature Committee (HGNC), are commonly used as identifiers for a locus on genomic DNA. These symbols are often descriptive, and they are frequently used as informal descriptors for all of the macromolecules produced from the information in a particular locus, e.g. messenger RNA or proteins. In the context of GPM and GPMDB, it is important that the associated displays and annotation be clear about the specific type of macromolecule being referenced by a gene symbol. Existing methods for increasing the specificity of gene symbols, such as using italic letters or mixtures of upper and lower case letters to refer to specific macromolecules, can be difficult to represent in databases.

Nomenclature

The proposed notation for referring to specific types of macromolecules follows the conventions laid out by the Human Genome Variation Society (HGVS) in its notation for the consequences of gene sequence mutations. The gene symbol is written as usual, followed by a colon (":"), followed by one of these symbols for the macromolecule being represented:

  1. c – for cDNAs;
  2. g – for genomic DNAs;
  3. r – for mRNAs; and
  4. p – for proteins.

For example, to refer to molecules derived from the APOB locus:

  1. APOB:c – any cDNA sequence derived from APOB;
  2. APOB:g – the genomic DNA sequence of the APOB locus;
  3. APOB:r – any messenger RNA transcript derived from APOB; and
  4. APOB:p – any protein sequence derived from APOB.
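
The notation is simple enough to be checked mechanically. The following Python code is a minimal sketch, not part of the proposal: the names MoleculeReference and parse_reference are hypothetical, and the pattern used for the gene symbol itself (letters, digits and hyphens) is an assumption, since this document does not define the syntax of gene symbols.

```python
import re
from dataclasses import dataclass

# Molecule-type codes listed in this proposal.
MOLECULE_TYPES = {
    "c": "cDNA",
    "g": "genomic DNA",
    "r": "mRNA",
    "p": "protein",
}

# Assumption: a gene symbol is a run of letters, digits and hyphens.
# The proposal itself does not restrict the characters in a symbol.
_REFERENCE = re.compile(r"([A-Za-z0-9][A-Za-z0-9-]*):([cgrp])")


@dataclass(frozen=True)
class MoleculeReference:
    """A gene symbol paired with a one-letter molecule-type code."""

    symbol: str
    code: str

    @property
    def molecule(self) -> str:
        # Human-readable name of the molecule type.
        return MOLECULE_TYPES[self.code]

    def __str__(self) -> str:
        # Render back into the symbol:code form.
        return f"{self.symbol}:{self.code}"


def parse_reference(text: str) -> MoleculeReference:
    """Split a string such as 'APOB:p' into its symbol and type code."""
    match = _REFERENCE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid symbol:type reference: {text!r}")
    return MoleculeReference(symbol=match.group(1), code=match.group(2))
```

With these definitions, parse_reference("APOB:p").molecule evaluates to "protein", and a string with an unlisted code such as "APOB:x" raises ValueError.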

Notes

This notation is to be used for general references to the associated macromolecules only. Any specific reference that requires sequence information, such as the identification of alternate splice variants or residue locations, should be made using the appropriate sequence accession numbers. If desirable, this notation may be used to clarify what type of molecule an accession number refers to in a particular context.

Comments and suggestions

Anyone interested in making suggestions or commenting on the ideas in this document should send them by email to Ron Beavis, rbeavis@thegpm.org.

Revision date and status

Reference name    Revision date    Document status    Stable URL
GPM-2012.09.01    2012.09.01       In progress        http://wiki.thegpm.org/wiki/index.php?title=Nomenclature_for_the_use_of_gene_symbols