==Scripts==
The scripts that update the various ontologies are available in <tt>/thegpm/scripts/ont_builder/</tt>. Some manual editing of the scripts must currently be done to either update or refresh the gene ontology pages.

The scripts that perform the refreshing and updating work are:
Download a tab-delimited file for any of the species to be updated. The column order for the data files for updating the GO category displays is as follows:
*Ensembl Protein ID
*GO Term Accession (bp)
*GO Term Accession (cc)
*GO Term Accession (mf)
The GO Group fields can be in any order, but the protein identifier must be first. Any subsequent fields will be ignored.
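The column rules above can be checked before a file is passed to the update script. The following is a minimal sketch in Python, not part of the GPM scripts: the function name <tt>check_go_rows</tt>, the assumption that the first line of the download is a header, and the default <tt>ENSP</tt> prefix are illustrative choices. GO accessions are assumed to have the usual form of <tt>GO:</tt> followed by seven digits.

```python
import csv
import io
import re

# A GO accession is "GO:" followed by seven digits.
GO_ACCESSION = re.compile(r"^GO:\d{7}$")


def check_go_rows(handle, id_prefix="ENSP", has_header=True):
    """Return (line_number, message) pairs for rows that break the column rules.

    Column 1 must be the protein identifier; columns 2-4 are GO accessions
    (or empty); any further columns are ignored, as described above.
    """
    problems = []
    for line_number, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row or (has_header and line_number == 1):
            continue  # skip blank lines and the assumed header line
        protein_id, go_fields = row[0], row[1:4]
        if not protein_id.startswith(id_prefix):
            problems.append((line_number, f"unexpected protein ID {protein_id!r}"))
            continue
        for value in go_fields:
            if value and not GO_ACCESSION.match(value):
                problems.append((line_number, f"not a GO accession: {value!r}"))
    return problems


if __name__ == "__main__":
    # Made-up rows for illustration only.
    sample = (
        "Ensembl Protein ID\tGO Term Accession (bp)\tGO Term Accession (cc)\tGO Term Accession (mf)\n"
        "ENSP00000000001\tGO:0008150\t\tGO:0003674\textra column\n"
        "GO:0008150\tENSP00000000002\t\t\n"
    )
    for line_number, message in check_go_rows(io.StringIO(sample)):
        print(line_number, message)
```

For species whose protein identifiers carry no common prefix, an empty <tt>id_prefix</tt> disables the identifier check and leaves only the GO accession check.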
Download a tab-delimited file for any of the species to be updated. The column order for the data files for updating the chromosome displays is as follows:
*Chromosome Name
*Gene start (bp)
*Ensembl Protein ID
Any subsequent fields will be ignored.
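As an illustration of the chromosome file layout, the sketch below reads the first three columns and drops the rest. It is written in Python and is not part of the GPM scripts; the function name <tt>read_chromosome_rows</tt> and the assumption that the download starts with a header line are hypothetical.

```python
import csv
import io


def read_chromosome_rows(handle, has_header=True):
    """Yield (chromosome, gene_start, protein_id) tuples from a tab-delimited file.

    Only the first three columns are used; any subsequent fields are ignored.
    """
    for line_number, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row or (has_header and line_number == 1):
            continue  # skip blank lines and the assumed header line
        if len(row) < 3:
            raise ValueError(
                f"line {line_number}: expected at least 3 columns, got {len(row)}"
            )
        chromosome, gene_start, protein_id = row[:3]
        # int() raises ValueError if the gene start is not a whole number.
        yield chromosome, int(gene_start), protein_id


if __name__ == "__main__":
    # Made-up row for illustration only.
    sample = (
        "Chromosome Name\tGene start (bp)\tEnsembl Protein ID\n"
        "X\t100\tENSP00000000001\tignored field\n"
    )
    for record in read_chromosome_rows(io.StringIO(sample)):
        print(record)
```

Reading the whole file this way before running the update is a cheap check that the columns were exported in the expected order.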
===Chromosomes===
====Human====
#Edit the script '''ont_builder_update_chromosome_proteins.pl'''. The value of the substitution found below the line marked "EDIT BELOW" must be ENSP.
#Run the script as <tt>perl ont_builder_update_chromosome_proteins.pl <nowiki>[source file]</nowiki> HS</tt> where <nowiki>[source file]</nowiki> is the path and name of the tab-delimited file from ENSEMBL.
====Mouse====
#Edit the script '''ont_builder_update_chromosome_proteins.pl'''. The value of the substitution found below the line marked "EDIT BELOW" must be ENSMUSP.
#Run the script as <tt>perl ont_builder_update_chromosome_proteins.pl <nowiki>[source file]</nowiki> MM</tt> where <nowiki>[source file]</nowiki> is the path and name of the tab-delimited file from ENSEMBL.
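Because the species code on the command line and the prefix edited into the script must agree, a small pre-flight check can catch a mismatch before the update is run. The Python sketch below is illustrative only: <tt>EXPECTED_PREFIX</tt> and <tt>update_command</tt> are hypothetical names, and the check does not replace the manual edit of the script described above.

```python
# Species code -> protein ID prefix, as given in the steps above.
EXPECTED_PREFIX = {"HS": "ENSP", "MM": "ENSMUSP"}

SCRIPT = "ont_builder_update_chromosome_proteins.pl"


def update_command(source_file, species_code, protein_ids):
    """Return the argument list for the update run, or raise on a prefix mismatch.

    protein_ids is any iterable of identifiers taken from the source file.
    """
    prefix = EXPECTED_PREFIX[species_code]  # KeyError for an unknown species code
    mismatched = [p for p in protein_ids if not p.startswith(prefix)]
    if mismatched:
        raise ValueError(
            f"{len(mismatched)} identifier(s) do not start with {prefix}, "
            f"e.g. {mismatched[0]!r}"
        )
    return ["perl", SCRIPT, source_file, species_code]


if __name__ == "__main__":
    # Made-up identifier and file name for illustration only.
    print(update_command("mouse_chromosomes.txt", "MM", ["ENSMUSP00000000001"]))
```

The returned list can be passed to <tt>subprocess.run</tt> from the directory that holds the script, or simply used to confirm the command before typing it.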
===GO categories===
====Human====
#Run the script <tt>ont_builder_update_go_proteins.pl <nowiki>[source file]</nowiki> HS</tt> where <nowiki>[source file]</nowiki> is the path and name of the tab-delimited file from ENSEMBL.
====Mouse====
#Run the script <tt>ont_builder_update_go_proteins.pl <nowiki>[source file]</nowiki> MM</tt> where <nowiki>[source file]</nowiki> is the path and name of the tab-delimited file from ENSEMBL.
====Yeast====
#Run the script <tt>ont_builder_update_go_proteins.pl <nowiki>[source file]</nowiki> SC</tt> where <nowiki>[source file]</nowiki> is the path and name of the tab-delimited file from ENSEMBL.
==Refreshing ontology content==
===Chromosomes===
====Human====
#'''ont_builder_batch.pl'''
##Ensure the value of the global scalar $CONFIG_FILE_LOCATION is <tt>'''drive://thegpm/scripts/ont_builder/'''</tt>.
#'''ont_builder.pl'''
##Ensure the value of the global scalar $CONFIG_FILE_LOCATION is <tt>'''drive://thegpm/scripts/ont_builder/'''</tt>.
##Find the string "EDIT BELOW". Comment out the following if() statement; this will force all proteins to end up in the output files, not just proteins with identifications.
#Run the script <tt>ont_builder_batch.pl HS HTML</tt>. This will create new versions of all three file types (.html, .txt and .xls). Running with the second argument as either 'txt' or 'xls' will generate new versions of only the corresponding file type.
====Mouse====
#'''ont_builder_batch.pl'''
##Ensure the value of the global scalar $CONFIG_FILE_LOCATION is <tt>'''drive://thegpm/scripts/ont_builder/'''</tt>.
#'''ont_builder.pl'''
##Ensure the value of the global scalar $CONFIG_FILE_LOCATION is <tt>'''drive://thegpm/scripts/ont_builder/'''</tt>.
##Find the string "EDIT BELOW". Comment out the following if() statement; this will force all proteins to end up in the output files, not just proteins with identifications.
#Run the script <tt>ont_builder_batch.pl MM HTML</tt>. This will create new versions of all three file types (.html, .txt and .xls). Running with the second argument as either 'txt' or 'xls' will generate new versions of only the corresponding file type.
===GO categories===
====Human====
#'''ont_builder_batch.pl'''
##Ensure the value of the global scalar $CONFIG_FILE_LOCATION is <tt>'''drive://thegpm/scripts/ont_builder/human_go/'''</tt>.
#'''ont_builder.pl'''
##Ensure the value of the global scalar $CONFIG_FILE_LOCATION is <tt>'''drive://thegpm/scripts/ont_builder/human_go/'''</tt>.
##Find the string "EDIT BELOW". Restore ("de-comment") the following if() statement; this will result in only identified proteins being present in the display files.
#Run the script <tt>ont_builder_batch.pl GO HTML</tt>. This will create new versions of all three file types (.html, .txt and .xls). Running with the second argument as either 'txt' or 'xls' will generate new versions of only the corresponding file type.
====Mouse====
#'''ont_builder_batch.pl'''
##Ensure the value of the global scalar $CONFIG_FILE_LOCATION is <tt>'''drive://thegpm/scripts/ont_builder/mouse_go/'''</tt>.
#'''ont_builder.pl'''
##Ensure the value of the global scalar $CONFIG_FILE_LOCATION is <tt>'''drive://thegpm/scripts/ont_builder/mouse_go/'''</tt>.
##Find the string "EDIT BELOW". Restore ("de-comment") the following if() statement; this will result in only identified proteins being present in the display files.
#Run the script <tt>ont_builder_batch.pl GO HTML</tt>. This will create new versions of all three file types (.html, .txt and .xls). Running with the second argument as either 'txt' or 'xls' will generate new versions of only the corresponding file type.
====Yeast====
#'''ont_builder_batch.pl'''
##Ensure the value of the global scalar $CONFIG_FILE_LOCATION is <tt>'''drive://thegpm/scripts/ont_builder/yeast_go/'''</tt>.
#'''ont_builder.pl'''
##Ensure the value of the global scalar $CONFIG_FILE_LOCATION is <tt>'''drive://thegpm/scripts/ont_builder/yeast_go/'''</tt>.
##Find the string "EDIT BELOW". Restore ("de-comment") the following if() statement; this will result in only identified proteins being present in the display files.
#Run the script <tt>ont_builder_batch.pl GO HTML</tt>. This will create new versions of all three file types (.html, .txt and .xls). Running with the second argument as either 'txt' or 'xls' will generate new versions of only the corresponding file type.
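The refresh settings differ only in the configuration folder, the first argument and whether the if() statement is active, so they can be summarised in one lookup table. The Python sketch below is a reading aid built from the steps above, not part of the GPM scripts: <tt>REFRESH_TARGETS</tt> and <tt>refresh_plan</tt> are hypothetical names, the <tt>drive://</tt> placeholder is kept as written, and the leading <tt>perl</tt> in the command is an assumption carried over from the update steps.

```python
BASE = "drive://thegpm/scripts/ont_builder/"

# (ontology, species) -> (config subfolder, first argument, if() statement active)
REFRESH_TARGETS = {
    ("chromosome", "human"): ("", "HS", False),
    ("chromosome", "mouse"): ("", "MM", False),
    ("go", "human"): ("human_go/", "GO", True),
    ("go", "mouse"): ("mouse_go/", "GO", True),
    ("go", "yeast"): ("yeast_go/", "GO", True),
}

# 'HTML' rebuilds .html, .txt and .xls; 'txt' or 'xls' rebuild only that type.
FILE_TYPES = {"HTML", "txt", "xls"}


def refresh_plan(ontology, species, file_type="HTML"):
    """Return the settings to check by hand and the command to run afterwards."""
    if file_type not in FILE_TYPES:
        raise ValueError(f"file_type must be one of {sorted(FILE_TYPES)}")
    subfolder, first_argument, identified_only = REFRESH_TARGETS[(ontology, species)]
    return {
        "config_location": BASE + subfolder,  # value for $CONFIG_FILE_LOCATION
        "identified_only": identified_only,   # True: if() statement left active
        "command": ["perl", "ont_builder_batch.pl", first_argument, file_type],
    }


if __name__ == "__main__":
    print(refresh_plan("go", "yeast", "xls"))
```

A combination that the steps do not cover, such as yeast chromosomes, raises a <tt>KeyError</tt> rather than guessing at settings.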
The contents of the files viewable through http://gpmdb.thegpm.org/go/index.html can be both refreshed and updated on a standard installation of the GPM. Below is a set of directions on how to refresh the existing files, how to obtain new data and how to update ontologies with the new proteins.
The display information is stored in /thegpm/go/, and any refreshing of the information will result in new versions of the files available in that directory being created.
The source files for the lists of proteins in a given ontology are stored in /thegpm/scripts/ont_builder/ and three subfolders: human_go/, mouse_go/ and yeast_go/.
Data files for both chromosome and GO ontologies are downloaded from BioMart on the ENSEMBL website. Use the most recent Ensembl release, and then choose the appropriate species below.
Data files for the BTO ontologies are generated separately. The source protein lists are generated periodically through the Normal Clinical Tissue Alliance and updated at that time.
Updating the list of proteins associated with an ontology will modify the source XML files from which the display files are built. This step must be done first if there is new information available from Ensembl (e.g., after a new data release). However, if refreshing the best expect and identification counts for each protein in an ontology is all that is required, this step can be skipped.
Refreshing the content of an ontology will recalculate the identification counts and best expect scores for each of the proteins defined as belonging to the ontology in the source XML files.