MPI-Blast is a program from LANL which parallelizes the NCBI Blast algorithms using the Message Passing Interface (MPI) library. The version of MPI-Blast included with Rocks is v1.5.0-pio, patched and compiled against NCBI Blast 2.2.19.
MPI-Blast is used in a similar manner to NCBI Blast and relies on the same environment variables that are available for NCBI Blast (such as BLASTDB and BLASTMAT).
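For example, an interactive session can export these variables before running any of the commands below. This is a minimal sketch; the paths shown are the defaults used by the SGE script later in this section and may differ on your installation.

# Sketch only: these paths are taken from the SGE example below; adjust them for your cluster.
export BLASTDB=$HOME/bio/ncbi/db/      # directory holding the formatted BLAST databases
export BLASTMAT=/opt/bio/ncbi/data/    # directory holding the BLAST scoring matrices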
There are 3 steps to running MPI-Blast.
Download a FASTA database to $BLASTDB. For this example, we will download the ecoli.nt nucleotide database.
[nostromo@xxx ~]$ cd $BLASTDB
[nostromo@xxx ~]$ wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/ecoli.nt.gz
--17:06:23--  ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/ecoli.nt.gz
           => `ecoli.nt.gz'
Resolving ftp.ncbi.nlm.nih.gov... 165.112.7.10
Connecting to ftp.ncbi.nlm.nih.gov|165.112.7.10|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD /blast/db/FASTA ... done.
==> PASV ... done.    ==> RETR ecoli.nt.gz ... done.
Length: 1,438,199 (1.4M) (unauthoritative)

100%[========================================================>] 1,438,199    610.14K/s

17:06:27 (607.91 KB/s) - `ecoli.nt.gz' saved [1438199]
Format the database using mpiformatdb. A good rule is to split the database into at least 4 fragments, one per processor, as follows.
[nostromo@xxx ~]$ gunzip ecoli.nt.gz
[nostromo@xxx ~]$ ls
ecoli.nt
[nostromo@xxx ~]$ mpiformatdb --nfrags=4 -i ecoli.nt -pF --quiet
Reading input file
Done, read 58882 lines
Reordering 400 sequence entries
Breaking ecoli.nt into 4 fragments
Executing: formatdb -p F -i /tmp/reorderncq8B1 -N 4 -n /home/nostromo/bio/ncbi/db/ecoli.nt -o T
Removed /tmp/reorderncq8B1
Created 4 fragments.
[nostromo@xxx ~]$ ls
ecoli.nt          ecoli.nt.000.nsq  ecoli.nt.001.nsq  ecoli.nt.002.nsq  ecoli.nt.003.nsq
ecoli.nt.000.nhr  ecoli.nt.001.nhr  ecoli.nt.002.nhr  ecoli.nt.003.nhr  ecoli.nt.mbf
ecoli.nt.000.nin  ecoli.nt.001.nin  ecoli.nt.002.nin  ecoli.nt.003.nin  ecoli.nt.nal
ecoli.nt.000.nnd  ecoli.nt.001.nnd  ecoli.nt.002.nnd  ecoli.nt.003.nnd  formatdb.log
ecoli.nt.000.nni  ecoli.nt.001.nni  ecoli.nt.002.nni  ecoli.nt.003.nni
ecoli.nt.000.nsd  ecoli.nt.001.nsd  ecoli.nt.002.nsd  ecoli.nt.003.nsd
ecoli.nt.000.nsi  ecoli.nt.001.nsi  ecoli.nt.002.nsi  ecoli.nt.003.nsi
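The same command applies to other databases. As an illustrative sketch (the database name and fragment count below are hypothetical), a larger database on a bigger cluster can be split into more fragments:

# Hypothetical example: split a larger FASTA database into 8 fragments.
# -pF marks the input as nucleotide sequences; a protein database would use -pT instead.
mpiformatdb --nfrags=8 -i mydb.fasta -pF --quiet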
Now create a test sequence file (terminate the cat input below with Ctrl-D) and run mpiblast on the sequence against the formatted database.
[nostromo@xxx ~]$ cat > test.txt
>Test
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
TTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAA
TATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACC
ATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAG
CCCGCACCTGACAGTGCGGGCTTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCATGCGAGTGTTGAA
GTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCC
AGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCACCTGGTGGCGATGATTG
AAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTATTTTTGCCGAACTTTT

[nostromo@xxx mpiblast]$ /opt/openmpi/bin/mpirun -np 4 /opt/bio/mpiblast/bin/mpiblast -d ecoli.nt -i $HOME/test.txt -p blastn -o $HOME/result.txt
After mpirun terminates, result.txt contains the result of your computation.
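For a quick sanity check, the report can be inspected with standard tools, for example:

# Show the beginning of the BLAST report produced by mpiblast
head -n 40 $HOME/result.txt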
This section gives a brief overview of running MPI-Blast with SGE.
Create a simple SGE submission script called mpiblast_sge.sh with the following contents:
#!/bin/bash
#$ -cwd
#$ -j y
#$ -S /bin/bash
export MPI_DIR=/opt/openmpi/
export BLASTDB=$HOME/bio/ncbi/db/
export BLASTMAT=/opt/bio/ncbi/data/
$MPI_DIR/bin/mpirun -np $NSLOTS \
    /opt/bio/mpiblast/bin/mpiblast \
    -d ecoli.nt -i $HOME/test.txt \
    -p blastn -o $HOME/result.txt
Run the script by submitting it to SGE:
[nostromo@xxx ~]$ qsub -pe orte 4 mpiblast_sge.sh
Your job 11 ("mpiblast_sge.sh") has been submitted
The results of your computation will be present in $HOME/result.txt.
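While the job is waiting or running, its state can be checked with standard SGE commands, for example (job ID 11 matches the submission above):

# List your queued and running jobs ("qw" = queued, "r" = running)
qstat
# Show detailed information about the job submitted above
qstat -j 11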
Please note that an MPI-Blast job requires at least 3 processors to run. The argument to mpirun specifying the number of processors should be a factor of the number of fragments the BLAST database was divided into. If you are running on a cluster with only 2 processors, SGE, by default, will not schedule a job which requires more than 2 slots to run.
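If a submission stays queued because of the slot requirement described above, it can help to check how many slots are available. A sketch using standard SGE commands (the parallel environment name orte matches the qsub example above):

# Show the configuration of the "orte" parallel environment, including its slot limit
qconf -sp orte
# List the execution hosts and the number of processors SGE sees on each
qhost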
Further information about using mpiblast can be found at the MPI-Blast home page.
For support, please join the mpiblast mailing list.