FTP site layout for the cHPP

This document is a request for comment (RFC) on a proposed FTP site layout for the chromosome-based Human Proteome Project, cHPP. The RFC process began on February 23, 2011 and ended on March 24, 2012. This RFC has been adopted by the GPM and GPMDB.

There have been a number of efforts to construct novel repositories for proteomics data and documents over the course of the last decade. This RFC does not seek to duplicate those efforts. Instead, it is a proposal for how to construct standard, navigable FTP sites that can be used for short- and long-term information storage and exchange by participants in the cHPP and other interested parties.


Required folders

Project base folder

The base folder for any particular project will be:

 (1.1) chrNn

where Nn is the number of the human chromosome associated with the project, padded with a leading zero so that Nn is two characters long. For example, the base folder for chromosome 1 would be "chr01" and for chromosome 20 it would be "chr20". For chromosomes with no number, the following special names will apply:

  1. chrX - X chromosome;
  2. chrY - Y chromosome;
  3. chrMT - mitochondrial chromosome; and
  4. chrOther - all information that cannot be readily assigned to a chromosome.
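
The naming rule above can be sketched in a few lines of Python. This is only an illustration, not part of the specification: the function name base_folder_name and the accepted input labels ("1" to "22", "X", "Y", "MT", "other") are assumptions made for the example.

```python
def base_folder_name(chromosome: str) -> str:
    """Return the base folder name for a chromosome label.

    Hypothetical helper: numbered chromosomes are zero-padded to two
    digits; unnumbered ones use the special names listed above.
    """
    special = {"X": "chrX", "Y": "chrY", "MT": "chrMT", "OTHER": "chrOther"}
    label = chromosome.strip().upper()
    if label in special:
        return special[label]
    number = int(label)  # raises ValueError for anything non-numeric
    if not 1 <= number <= 22:
        raise ValueError(f"not a numbered human chromosome: {chromosome!r}")
    return f"chr{number:02d}"
```

With this sketch, base_folder_name("1") gives "chr01", base_folder_name("20") gives "chr20" and base_folder_name("mt") gives "chrMT".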

This folder should contain a file named "chrNn/contact_info.txt" that has the name and email address of a suitable contact person for this project folder.

Other base folders

The site may contain any other self-descriptive folder names necessary for a particular site's purpose. It is advisable to have a folder containing any licensing information and/or disclosure information named "legal". A terms-of-use file in the root directory of the site is also advisable, but not required.

First level folders

The following self-descriptive folders will be contained within the base folder:

  1. chrNn/annotation - sequence annotation files;
  2. chrNn/data - raw and processed data files;
  3. chrNn/documents - text documents and associated files;
  4. chrNn/libraries - any spectrum or peptide libraries;
  5. chrNn/presentations - presentations and associated files;
  6. chrNn/results - information derived from data;
  7. chrNn/sequences - FASTA and other protein sequence files; and
  8. chrNn/temp - scratch folder for temporary files.

Second level folders

Each of the first level folders should contain folders named after the groups associated with the project. For example, if the chromosome Nn project is using data from the Bob, Ted and Carol labs, the "chrNn/data" folder should contain:

  1. chrNn/data/bob - data from the Bob lab;
  2. chrNn/data/ted - data from the Ted lab; and
  3. chrNn/data/carol - data from the Carol lab.
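
The required layout (base folder, first level folders, second level group folders and the contact file) could be created on a local staging directory with a short script such as the one below. It is a sketch under assumptions: the function name create_project_skeleton, its parameters and the idea of building the tree locally before uploading are not part of this document, and the contact text is supplied by the caller.

```python
from pathlib import Path

# First level folder names, taken from the list in this document.
FIRST_LEVEL = (
    "annotation", "data", "documents", "libraries",
    "presentations", "results", "sequences", "temp",
)


def create_project_skeleton(root, base_folder, groups, contact):
    """Create base_folder under root with first and second level folders.

    Hypothetical helper: `groups` is a list of group folder names such as
    ["bob", "ted", "carol"]; `contact` is the text for contact_info.txt.
    """
    base = Path(root) / base_folder
    for first in FIRST_LEVEL:
        # Create the first level folder even when no groups are given.
        (base / first).mkdir(parents=True, exist_ok=True)
        for group in groups:
            (base / first / group).mkdir(exist_ok=True)
    contact_file = base / "contact_info.txt"
    if not contact_file.exists():  # never overwrite an existing contact file
        contact_file.write_text(contact + "\n", encoding="utf-8")
    return base
```

Calling create_project_skeleton("staging", "chr01", ["bob", "ted", "carol"], "Name, email address") would produce folders such as staging/chr01/data/bob and the file staging/chr01/contact_info.txt, which can then be uploaded with any FTP client.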

General naming conventions

The following are suggestions for naming additional folders and files on the FTP site.

  1. No spaces: use underscore characters rather than spaces. For example, rather than "my file.txt" use "my_file.txt".
  2. Use long names: use as few abbreviated filenames as possible. Try to be concise, but describe the contents of a file or folder in its name with sufficient clarity that both readers and depositors will understand what is contained with minimal explanation.
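
The "no spaces" suggestion is easy to check mechanically before uploading. The following sketch uses hypothetical helper names (without_spaces and names_with_spaces) and assumes the files are first staged in a local directory; the "long names" suggestion is a matter of judgment and is not checked.

```python
from pathlib import Path


def without_spaces(name: str) -> str:
    """Replace each run of whitespace in a name with one underscore.

    Leading and trailing whitespace is dropped.
    """
    return "_".join(name.split())


def names_with_spaces(root):
    """List files and folders under root whose own name contains a space."""
    top = Path(root)
    return sorted(
        path.relative_to(top).as_posix()
        for path in top.rglob("*")
        if " " in path.name
    )
```

For the example given above, without_spaces("my file.txt") returns "my_file.txt".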

Open data licensing

To ensure that all users of a compliant FTP site are informed of the status of the material on the site, it is suggested that the "HUPO C-HPP Open Data Licence v1.0" (full text here) or another appropriate license be declared, with the full text of the license placed in the "legal" base folder. If you are interested in why such an agreement should be present, please consult open source/data advocacy sites, such as the Open Knowledge Foundation.

Demonstration site

As a demonstration, an FTP site has been established at ftp.proteomecentral.org using the principles described in this document. This site will be modified to stay in compliance with this RFC as changes are suggested.

To start your own site:

  1. go to the templates directory;
  2. copy the folder chrNn onto your FTP site; and
  3. rename that folder, using the guideline above for base folders.
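
If the templates directory has already been downloaded to a local disk, the copy and rename steps above can be combined into one operation. This is a minimal sketch, assuming a local copy of the template; the function name start_project_from_template and its parameters are made up for the example, and uploading the result to your own FTP site is left to your FTP client.

```python
import shutil
from pathlib import Path


def start_project_from_template(template_dir, site_root, base_folder):
    """Copy the template folder "chrNn" into site_root under its new name.

    Hypothetical helper: template_dir is a local copy of the templates
    directory and base_folder is a name such as "chr01" or "chrX".
    """
    source = Path(template_dir) / "chrNn"
    target = Path(site_root) / base_folder
    # copytree raises FileExistsError if the target already exists,
    # so an existing project folder is never overwritten.
    shutil.copytree(source, target)
    return target
```

For example, start_project_from_template("templates", "my_site", "chr07") would create my_site/chr07 with the same contents as templates/chrNn.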

Any cHPP project wishing to use this site for data storage should send an email to contact@thegpm.org.

Comments and suggestions

Anyone interested in making suggestions or commenting on the ideas in this document should send them by email to Ron Beavis, rbeavis@thegpm.org.

Revision date and status

  Reference name   Revision date   Document status          Stable URL
  GPM-2012.02.23   2012.03.24      Accepted specification   http://icex.ca/ice.0.d