FTP site layout for the cHPP

From TheGPMWiki

Revision as of 21:01, 23 February 2012

This document is a request for comment on a proposed FTP site layout for the chromosome-based Human Proteome Project, cHPP. The RFC process began on February 23, 2011 and will end on March 24, 2012.

There have been a number of efforts to construct novel repositories for proteomics data and documents over the course of the last decade. This RFC does not seek to duplicate those efforts. Instead, it is a proposal for how to construct standard, navigable FTP sites that can be used for short- and long-term information storage and exchange by participants in the cHPP and other interested parties.

Required folders

Project base folder

The base folder for any particular project will be:

 (1.1) chrNn

where Nn is the number of the human chromosome associated with the project, padded with a leading zero so that Nn is two characters long. For example, chromosome 1 is chr01 and chromosome 20 is chr20. For chromosomes with no number, the following special names will apply:

  1. chrX - X chromosome;
  2. chrY - Y chromosome;
  3. chrMT - mitochondrial chromosome; and
  4. chrOther - all information that cannot be readily assigned to a chromosome.
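
The naming rule (1.1) and the special names above can be sketched in a few lines of Python. This is only an illustration, not part of the proposal: the function name base_folder_name and the SPECIAL_NAMES table are hypothetical, and the check that numbered chromosomes fall in the range 1 to 22 is an assumption based on the human autosome count.

```python
# Hypothetical helper illustrating rule (1.1); not part of the RFC itself.
SPECIAL_NAMES = {
    "X": "chrX",          # X chromosome
    "Y": "chrY",          # Y chromosome
    "MT": "chrMT",        # mitochondrial chromosome
    "OTHER": "chrOther",  # information not readily assigned to a chromosome
}


def base_folder_name(chromosome):
    """Return the base folder name for a label such as 1, "20", "X" or "MT"."""
    label = str(chromosome).strip()
    special = SPECIAL_NAMES.get(label.upper())
    if special is not None:
        return special
    if not (label.isascii() and label.isdigit()):
        raise ValueError(f"unrecognised chromosome label: {chromosome!r}")
    number = int(label)
    # Assumption: numbered human chromosomes run from 1 to 22.
    if not 1 <= number <= 22:
        raise ValueError(f"chromosome number out of range: {number}")
    # Pad with a leading zero so that Nn is always two characters long.
    return f"chr{number:02d}"


print(base_folder_name(1))     # chr01
print(base_folder_name("20"))  # chr20
print(base_folder_name("mt"))  # chrMT
```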

This folder should contain a file named "contact_info.txt" that has the name and email address of a suitable contact person for this project folder.
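
As a rough sketch, the contact file could be generated as shown here. The proposal fixes only the file name and the two pieces of information it must hold; the "Name:" and "Email:" line format, the function name write_contact_info, and the example contact details are all assumptions made for illustration.

```python
from pathlib import Path


def write_contact_info(base_folder, name, email):
    """Write contact_info.txt into the given project base folder.

    The two-line "Name:" / "Email:" layout is an assumption; the proposal
    only requires that the file hold a name and an email address.
    """
    folder = Path(base_folder)
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / "contact_info.txt"
    target.write_text(f"Name: {name}\nEmail: {email}\n", encoding="utf-8")
    return target


if __name__ == "__main__":
    # Placeholder contact details, for illustration only.
    path = write_contact_info("chr01", "Jane Doe", "jane.doe@example.org")
    print(path.read_text(encoding="utf-8"))
```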

Other base folders

The site may contain any other self-descriptive folder names necessary for a particular site's purpose. It is advisable to have a folder containing any licensing information and/or disclosure information named "legal". A terms-of-use file in the root directory of the site is also advisable, but not required.

First level folders

The following self-descriptive folders will be contained within the base folder:

  1. chrNn/data - will contain all raw and processed data files;
  2. chrNn/documents - will contain all text documents and associated files; and
  3. chrNn/presentations - will contain all presentations and associated files.
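
The three first level folders listed above could be created on a local staging copy of a site with a sketch like the following. The function name create_first_level_folders and the idea of a local site root directory are assumptions; only the three folder names come from this document.

```python
from pathlib import Path

# Folder names taken from the list above.
FIRST_LEVEL_FOLDERS = ("data", "documents", "presentations")


def create_first_level_folders(site_root, base_folder):
    """Create data, documents and presentations under site_root/base_folder."""
    created = []
    for name in FIRST_LEVEL_FOLDERS:
        path = Path(site_root) / base_folder / name
        # exist_ok=True makes the call safe to repeat on an existing tree.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    for folder in create_first_level_folders("site", "chr01"):
        print(folder.as_posix())
```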

Second level folders

Each of the first level folders should contain folders named after the groups associated with the project. For example, if the chromosome Nn project is using data from the Bob, Ted and Carol labs, the "chrNn/data" folder should contain:

  1. chrNn/data/bob
  2. chrNn/data/ted
  3. chrNn/data/carol
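
Because every first level folder is expected to hold one folder per group, the full set of second level paths follows mechanically from the group list. The sketch below illustrates this; the function name group_folder_paths is hypothetical, and lower-casing the group names is an assumption inferred from the example above ("Bob" becomes "bob").

```python
from pathlib import PurePosixPath

FIRST_LEVEL_FOLDERS = ("data", "documents", "presentations")


def group_folder_paths(base_folder, groups):
    """List the second level folder paths for each first level folder."""
    paths = []
    for first_level in FIRST_LEVEL_FOLDERS:
        for group in groups:
            # Assumption: group folder names are lower-cased, as in the example.
            paths.append(str(PurePosixPath(base_folder, first_level, group.lower())))
    return paths


for path in group_folder_paths("chr07", ["Bob", "Ted", "Carol"])[:3]:
    print(path)
# chr07/data/bob
# chr07/data/ted
# chr07/data/carol
```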

Open data licensing

To ensure that all users of a compliant FTP site are informed of the status of the material on the site, it is suggested that the "Public Domain Dedication and License v1.0" (described here) or another appropriate license be declared, with the full text of the license placed in the "legal" base folder. If you are interested in why such an agreement should be present, please consult open source/data advocacy sites, such as the Open Knowledge Foundation.

Demonstration site

To demonstrate by example, an FTP site has been established at ftp.proteomecentral.org using the principles described in this document. This site will be modified to stay in compliance with this RFC as changes are suggested.

Any cHPP project wishing to use this site for data storage should send an email to contact@thegpm.org.

Comments and suggestions

Anyone interested in making suggestions or commenting on the ideas in this document should send them by email to Ron Beavis, rbeavis@thegpm.org.

Revision date and status

Reference name    Revision date    Document status    Stable URL
GPM-2012.02.23    2012.02.23       in development     http://icex.ca/ice.0.d