Cyclone

From TheGPMWiki

1 Purpose
2 Methods: search engines
3 Methods: GPM
4 Methods: Hardware
5 History

Purpose

The GPM Cyclone project was the successor to the Tornado project. Its goal was to improve the overall performance of the X! Tandem search engine and the GPM interface.

Methods: search engines

The initial stage of the project was a full code review of the search engines X! Tandem and P3. The review checked whether the storage type used for each variable was appropriate, concentrating on the portions of the code that account for most of the memory used with large data sets. The information storage classes for proteins, peptides and amino acids were modified and rationalized to provide the same functionality as before with a minimal memory requirement.
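
As a rough illustration of why storage types matter at this scale, the Python sketch below compares the per-element cost of a few fixed-width types using the standard `array` module. The workload (one residue code and one mass per amino acid) and all names are assumptions chosen for illustration; they are not taken from the X! Tandem or P3 source code.

```python
from array import array

# Hypothetical workload: one code and one mass stored per residue.
N_RESIDUES = 1_000_000


def bytes_needed(typecode: str, count: int) -> int:
    """Return the bytes used by `count` elements of the given array type."""
    return array(typecode).itemsize * count


# A residue code fits in one unsigned byte ('B'). A mass can be stored as a
# double ('d') or, if the precision is sufficient, as a single float ('f').
codes_bytes = bytes_needed("B", N_RESIDUES)
mass_double_bytes = bytes_needed("d", N_RESIDUES)
mass_float_bytes = bytes_needed("f", N_RESIDUES)

print(codes_bytes, mass_double_bytes, mass_float_bytes)
```

Whether a narrower type is acceptable depends on the precision each variable actually needs, which is the kind of question a code review of this sort has to settle case by case.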

The second stage of the project involved changes to the X! Tandem and P3 code to make better use of computational resources beyond the conventional processor registers. This included improved use of SIMD registers and GPU cores, as well as a reduction in memory page faults. The GPU code is not currently part of the general release of the search engines because of the platform compatibility issues that are still common with GPU programming methods.

Methods: GPM

The primary tactic used to speed up the GPM display rendering process has been the introduction of specialized cache files created by the display software the first time that a display is created. All of the GPM displays are generated on-the-fly from either a data file that has been generated by a search engine or by accessing databases. Most GPM pages require data from multiple servers, which may be either within the GPM system or external to it.

The new file caching system is used for displays that are primarily created from search engine files (BIOML format). When the display software first loads information from a data file, it creates an abstract of the details needed for the particular display and records that abstract in a cache file. The cache file is named according to the data file being examined and the specific display being created. The next time that a user calls up the same display from that data file, only the cache file is loaded, which avoids the time needed to extract information from what can be very large XML data files.
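
The caching behaviour described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the GPM implementation: the cache naming scheme, the JSON cache format, the function names and the simplified XML elements are all hypothetical.

```python
import json
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def cache_path(data_file: Path, display: str) -> Path:
    """Name the cache after the data file and the display (assumed scheme)."""
    return data_file.with_name(f"{data_file.name}.{display}.cache.json")


def extract_abstract(data_file: Path) -> dict:
    """Slow path: parse the full XML and keep only what the display needs."""
    root = ET.parse(data_file).getroot()
    return {"proteins": [p.get("label") for p in root.iter("protein")]}


def load_for_display(data_file: Path, display: str) -> tuple[dict, bool]:
    """Return (abstract, cache_hit), writing the cache file on first use."""
    cache = cache_path(data_file, display)
    if cache.exists():
        return json.loads(cache.read_text()), True
    abstract = extract_abstract(data_file)
    cache.write_text(json.dumps(abstract))
    return abstract, False


# Small stand-in for a search engine result file.
workdir = Path(tempfile.mkdtemp())
result_file = workdir / "result.xml"
result_file.write_text(
    '<bioml><protein label="P1"/><protein label="P2"/></bioml>'
)

first, first_hit = load_for_display(result_file, "protein_list")
second, second_hit = load_for_display(result_file, "protein_list")
```

The first call parses the XML and writes the cache file; the second call reads only the cache. A production version would also need a rule for discarding a cache when its data file changes, which this sketch omits.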

The design philosophy behind the cache files was to abstract all of the information necessary to create a view, rather than simply storing the view itself. Many of the displays in GPM can be customized by the user, so storing only the view could result in a prohibitively complicated set of cache files. Also, the GPM was designed to store as little view information as possible on the server side: each display is determined uniquely by the CGI request and does not depend on any state stored on the server.
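
To illustrate the difference between caching an abstract and caching a view, the Python sketch below builds two differently customized views from one abstract. The record fields, the parameter names and the function `render_view` are hypothetical; the request parameters stand in for values that would arrive in a CGI request.

```python
def render_view(abstract: list[dict], max_expect: float, sort_by: str) -> list[str]:
    """Build one customized view from the cached abstract.

    The result depends only on the arguments, so nothing about the view
    has to be stored between requests.
    """
    rows = [r for r in abstract if r["expect"] <= max_expect]
    rows.sort(key=lambda r: r[sort_by])
    return [r["label"] for r in rows]


# One cached abstract (hypothetical fields) serves every customization.
abstract = [
    {"label": "P1", "expect": 1e-8},
    {"label": "P2", "expect": 1e-2},
    {"label": "P3", "expect": 1e-5},
]

strict = render_view(abstract, max_expect=1e-4, sort_by="expect")
loose = render_view(abstract, max_expect=1.0, sort_by="label")
```

Caching the rendered views instead would require a separate cache entry for every combination of filter and sort order that a user might request.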

Methods: Hardware

There is only so much additional speed that can be obtained from coding: at some point the hardware will be the limiting factor. The public GPM search servers have all been upgraded to 64-bit AMD Phenom II X6 1090T processors with 8 GB of 1333 MHz memory. The off-line search servers, used to analyze data obtained from TRANCHE, PeptideAtlas and PRIDE, have been upgraded to Intel i7-2600 processors.

The GPMDB servers now use solid-state drives (SSDs) for database files, significantly improving their performance. Our testing showed that faster processors resulted in only marginal improvements in database speed, because disk performance was the limiting factor for most large queries. Replacing the disks with PCIe-based SSDs resulted in as much as a 10-fold improvement in information retrieval, removing a significant performance bottleneck in our curation and data availability efforts.
