3.3. Blast job

Now that you know how scripts work, here is a slightly more useful one. It runs a Blast search on the nucleotide sequences contained in a fasta file, nuc.fasta. The search is run against the nt database (already formatted), and the result is written to a file, results.txt:

#!/bin/bash

# Job name
#$ -N Blastn-NT

# Load the cluster profile to get the appropriate PATH
source /share/apps/Profiles/share-profile.sh

# Location of the formatted databases
export BLASTDB=/scratch/BlastDB/

OutFile="results.txt"

# Search the nt database with the sequences in nuc.fasta
blastn -db nt -query nuc.fasta -out "$OutFile"
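
Because a search against nt can take a while, it may be worth checking the query file before calling blastn. The following is only a sketch: the function name count_fasta_records is hypothetical and is not part of the original script. It assumes that each sequence in nuc.fasta starts with a header line beginning with '>'.

```shell
# Hypothetical helper, not part of the original job script.
# Prints the number of FASTA records in a file.
# Returns non-zero if the file is missing, empty, or contains no header line.
count_fasta_records() {
    local query=$1
    if [ ! -s "$query" ]; then
        echo "error: query file '$query' is missing or empty" >&2
        return 1
    fi
    # Each FASTA record starts with a header line beginning with '>'.
    # grep -c prints the count and returns non-zero when the count is 0.
    grep -c '^>' "$query"
}

# Possible use inside the job script, before the blastn line:
#   count_fasta_records nuc.fasta || exit 1
```

With this check in place, a missing or empty query file makes the job stop immediately instead of starting a search with nothing to look for.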
      

Submit the job and check its status with qstat. Once the job no longer appears in the queue, inspect the result file:

$ qsub blast.sh 
Your job 250 ("Blastn-NT") has been submitted
$ qstat -j 250
==============================================================
job_number:                 250
exec_file:                  job_scripts/250
submission_time:            Tue Feb 16 17:05:40 2016
owner:                      jerome
uid:                        1865
group:                      usmb
gid:                        503
sge_o_home:                 /home/jerome
sge_o_log_name:             jerome
sge_o_path:                 /opt/openmpi/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/bio/ncbi/bin:/opt/bio/mpiblast/bin:/opt/bio/EMBOSS/bin:/opt/bio/clustalw/bin:/opt/bio/tcoffee/bin:/opt/bio/hmmer/bin:/opt/bio/phylip/exe:/opt/bio/mrbayes:/opt/bio/fasta:/opt/bio/glimmer/bin:/opt/bio/glimmer/scripts:/opt/bio/gromacs/bin:/opt/bio/gmap/bin:/opt/bio/tigr/bin:/opt/bio/autodocksuite/bin:/opt/bio/wgs/bin:/opt/ganglia/bin:/opt/ganglia/sbin:/usr/java/latest/bin:/opt/maven/bin:/opt/pdsh/bin:/opt/rocks/bin:/opt/rocks/sbin:/opt/gridengine/bin/linux-x64:/home/jerome/bin
sge_o_shell:                /bin/bash
sge_o_workdir:              /home/jerome
sge_o_host:                 teopanzolco
account:                    sge
mail_list:                  jerome@teopanzolco.local
notify:                     FALSE
job_name:                   Blastn-NT
jobshare:                   0
env_list:                   
script_file:                Scripts/script-test.sh
  ../..
$ qstat -j 250
Following jobs do not exist: 
250
$ head -30 results.txt
BLASTN 2.3.0+


Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
Miller (2000), "A greedy algorithm for aligning DNA sequences", J
Comput Biol 2000; 7(1-2):203-14.



Database: nt
           34,389,867 sequences; 110,383,924,525 total letters



Query= gi|457872547|ref|XM_004223145.1| Plasmodium cynomolgi strain B
hypothetical protein (PCYB_112670) mRNA, partial cds

Length=1190
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

  gi|457872547|ref|XM_004223145.1| Plasmodium cynomolgi strain B ...  2198    0.0  
  gi|672188562|ref|XM_008816634.1| Plasmodium inui San Antonio 1 ...  1173    0.0  
  gi|156101254|ref|XM_001616271.1| Plasmodium vivax SaI-1 hypothe...  1123    0.0  
  gi|817745457|ref|XM_012482260.1| Plasmodium fragile hypothetica...  1098    0.0  
  gi|194247187|emb|AM910993.1| Plasmodium knowlesi strain H chrom...  1098    0.0  
  gi|221057699|ref|XM_002261322.1| Plasmodium knowlesi strain H h...  1096    0.0  


> gi|457872547|ref|XM_004223145.1| Plasmodium cynomolgi strain 
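
Instead of running qstat by hand until the job disappears, the check can be automated. The sketch below relies only on what the transcript above shows: while the job exists, qstat -j prints a job_number line, and once it has finished it prints "Following jobs do not exist". The function name wait_for_job and the POLL_SECONDS variable are hypothetical names chosen for this example.

```shell
# Hypothetical helper: blocks until the given job is no longer known to the scheduler.
# POLL_SECONDS (assumed name) sets the delay between checks; the default is 30 seconds.
wait_for_job() {
    local job_id=$1
    local interval=${POLL_SECONDS:-30}
    # While the job exists, `qstat -j` prints a line starting with "job_number:".
    while qstat -j "$job_id" 2>&1 | grep '^job_number:' > /dev/null; do
        sleep "$interval"
    done
}

# Possible use on the cluster front end:
#   wait_for_job 250 && head -30 results.txt
```

Polling too often puts needless load on the scheduler, so a delay of several seconds or more is a reasonable choice for jobs that run for minutes.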
      

For the curious, usage statistics are available once the job has finished. The qacct command reports information such as memory, CPU time, and other details:

$ qacct -j 250
==============================================================
qname        all.q               
hostname     compute-0-4.local   
group        usmb                
owner        jerome              
project      NONE                
department   defaultdepartment   
jobname      Blastn-NT           
jobnumber    250                 
taskid       undefined
account      sge                 
priority     0                   
qsub_time    Tue Feb 16 17:05:40 2016
start_time   Tue Feb 16 17:05:52 2016
end_time     Tue Feb 16 17:08:32 2016
granted_pe   NONE                
slots        1                   
failed       0    
exit_status  0                   
ru_wallclock 160          
ru_utime     50.350       
ru_stime     109.798      
ru_maxrss    3844740             
ru_ixrss     0                   
ru_ismrss    0                   
ru_idrss     0                   
ru_isrss     0                   
ru_minflt    6840092             
ru_majflt    12                  
ru_nswap     0                   
ru_inblock   45616               
ru_oublock   64                  
ru_msgsnd    0                   
ru_msgrcv    0                   
ru_nsignals  0                   
ru_nvcsw     492                 
ru_nivcsw    16236               
cpu          160.149      
mem          512.448           
io           0.000             
iow          0.000             
maxvmem      3.836G
arid         undefined
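
Most of these fields are rarely needed. As a rough sketch, the few that usually matter (wall-clock time, CPU time, peak virtual memory, and exit status) can be extracted with awk. The function name qacct_summary is hypothetical; the field names are the ones shown in the output above.

```shell
# Hypothetical helper: reads `qacct -j <id>` output on stdin and prints a one-line summary.
qacct_summary() {
    awk '
        $1 == "ru_wallclock" { wall = $2 }    # elapsed time, in seconds
        $1 == "cpu"          { cpu = $2 }     # CPU time, in seconds
        $1 == "maxvmem"      { vmem = $2 }    # peak virtual memory
        $1 == "exit_status"  { status = $2 }  # 0 means the job script ended normally
        END {
            printf "wallclock=%ss cpu=%ss maxvmem=%s exit_status=%s\n", wall, cpu, vmem, status
        }
    '
}

# Possible use on the cluster front end:
#   qacct -j 250 | qacct_summary
```

For the job above, this would report a wall-clock time of 160 seconds, a CPU time of 160.149 seconds, and a peak of 3.836G of virtual memory.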