DSEARCH
Sensitive database searching using distributed computing
Database searching is one of the most fundamental tasks in bioinformatics. The goal of database searching is to identify similar regions in DNA, RNA, or protein sequences. The Needleman-Wunsch and Smith-Waterman alignment algorithms are widely acknowledged as the most accurate search techniques. However, these algorithms have high time and space complexity, O(nm), where n and m are the lengths of the sequences being compared, so for large sequence databases it is not feasible to perform all searches on a single processor. We have developed a distributed database search application, called DSEARCH, which implements these algorithms. Using our distributed computing platform, DSEARCH allows the user to distribute the task of searching large sequence databases over a set of semi-idle processors. We have completed a full performance analysis that demonstrates the potential of DSEARCH to speed up long search computations. For those in an academic or corporate environment with hundreds of idle desktop machines, we have shown how DSEARCH can deliver a 'free' database search supercomputer.
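To make the O(nm) cost concrete, here is a minimal Java sketch of Smith-Waterman local alignment scoring. It is an illustration only and is not taken from DSEARCH or NeoBio: the class name SmithWatermanSketch, the linear gap penalty, and the MATCH, MISMATCH and GAP values are all assumptions chosen for the example (a real protein search would use a substitution matrix such as Blosum62).

```java
// Illustrative sketch only; not DSEARCH or NeoBio code.
final class SmithWatermanSketch {
    // Hypothetical scoring parameters, chosen purely for illustration.
    static final int MATCH = 2;
    static final int MISMATCH = -1;
    static final int GAP = -2;

    /** Returns the best local alignment score of a and b (linear gap penalty). */
    static int score(String a, String b) {
        int n = a.length();
        int m = b.length();
        // The (n+1) x (m+1) table is the O(nm) space; filling it is the O(nm) time.
        int[][] h = new int[n + 1][m + 1];
        int best = 0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int sub = a.charAt(i - 1) == b.charAt(j - 1) ? MATCH : MISMATCH;
                int diag = h[i - 1][j - 1] + sub; // align a[i-1] with b[j-1]
                int up = h[i - 1][j] + GAP;       // gap in b
                int left = h[i][j - 1] + GAP;     // gap in a
                // Local alignment: a cell never drops below zero.
                h[i][j] = Math.max(0, Math.max(diag, Math.max(up, left)));
                best = Math.max(best, h[i][j]);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Four matching residues at +2 each.
        System.out.println(score("ACGT", "ACGT")); // prints 8
    }
}
```

Every query-to-database comparison fills one such table, so the total work grows with the query length multiplied by the combined length of all database sequences.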
Fig. 1: Sample alignment output produced by DSEARCH using Blosum62 scoring matrix
DSEARCH is freely available under the terms of the GNU General Public Licence. The application is implemented entirely in Java and is completely platform independent. DSEARCH uses the NeoBio Java alignment library to perform the alignment of sequences.
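The idea of spreading one search over several processors can be sketched in Java as follows. This is a rough single-machine analogy and not the DSEARCH platform: local threads stand in for remote semi-idle machines, and the names ChunkedSearchSketch, Hit, searchChunk and search are hypothetical. The scoring function is passed in as a parameter, so any alignment scorer could be plugged in; no NeoBio API is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToIntBiFunction;

// Illustrative sketch only: threads stand in for remote, semi-idle machines.
final class ChunkedSearchSketch {

    /** A database hit: position of the sequence in the database and its score. */
    static final class Hit {
        final int index;
        final int score;
        Hit(int index, int score) { this.index = index; this.score = score; }
    }

    /** Scores the query against db[from, to) and returns the best hit in that slice. */
    static Hit searchChunk(String query, List<String> db, int from, int to,
                           ToIntBiFunction<String, String> scorer) {
        Hit best = new Hit(-1, Integer.MIN_VALUE);
        for (int i = from; i < to; i++) {
            int s = scorer.applyAsInt(query, db.get(i));
            if (s > best.score) best = new Hit(i, s);
        }
        return best;
    }

    /** Splits the database into roughly equal chunks, one per worker, and merges the results. */
    static Hit search(String query, List<String> db, int workers,
                      ToIntBiFunction<String, String> scorer) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int chunk = (db.size() + workers - 1) / workers; // ceiling division
            List<Future<Hit>> pending = new ArrayList<>();
            for (int from = 0; from < db.size(); from += chunk) {
                final int lo = from;
                final int hi = Math.min(db.size(), from + chunk);
                pending.add(pool.submit(() -> searchChunk(query, db, lo, hi, scorer)));
            }
            Hit best = new Hit(-1, Integer.MIN_VALUE);
            for (Future<Hit> f : pending) {
                Hit h = f.get(); // wait for this chunk to finish
                if (h.score > best.score) best = h;
            }
            return best;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("search failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each database sequence is scored independently of the others, the chunks need no communication until the final merge, which is what makes this kind of search well suited to loosely coupled machines.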
Downloads
Recent Publications
Distributed Monte Carlo Simulation of Light Transportation in Tissue (April 2006)
This paper is to appear at the 8th International Workshop on Java for Parallel and Distributed Computing.
Framework for task scheduling in heterogeneous distributed computing using genetic algorithms (November 2005)
This journal paper is to appear in Artificial Intelligence Review. It describes a distributed task scheduling scheme based on genetic algorithms.
DPRml: Distributed Phylogeny Reconstruction by Maximum Likelihood (March 2005)
This journal paper appeared in Bioinformatics. It describes a distributed phylogeny application.
DSEARCH: sensitive database searching using distributed computing (March 2005)
A distributed bioinformatics database searching application that uses the most accurate search algorithms. It is to appear in Bioinformatics.
Adaptive Scheduling Across a Distributed Computation Platform (June 2004)
This paper was presented at ISPDC'04, and describes an adaptive scheduler for a Java distributed computation system.