Now that you know how scripts work, here is a slightly more useful one. It runs a BLAST search on the nucleotide sequences contained in a FASTA file, nuc.fasta. The search is run against the nt database (already formatted), and the result is written to a file named results.txt.
#!/bin/bash
# Job name
#$ -N Blastn-NT
# Load the appropriate PATH
source /share/apps/Profiles/share-profile.sh
# Location of the formatted databases
export BLASTDB=/scratch/BlastDB/
OutFile="results.txt"
# Quote the variable so the file name survives spaces or special characters
blastn -db nt -query nuc.fasta -out "$OutFile"
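Before submitting, it can be worth confirming that the query file is readable and contains the number of sequences you expect. The following is only a minimal sketch: the helper name count_fasta_records is hypothetical, and it assumes nuc.fasta sits in the directory from which the job is submitted.

```shell
#!/bin/bash
# Hypothetical helper (not part of BLAST or Grid Engine):
# count the records in a FASTA file. Every record starts with
# a header line whose first character is '>'.
count_fasta_records() {
    local fasta="$1"
    if [ ! -r "$fasta" ]; then
        echo "Cannot read $fasta" >&2
        return 1
    fi
    # grep -c prints the number of matching lines; '|| true' keeps the
    # function's exit status at 0 when the count is 0.
    grep -c '^>' "$fasta" || true
}

# Usage, with the query file used by the job script:
# count_fasta_records nuc.fasta
```

A count of 0, or an error message, suggests fixing the input before the job spends time in the queue.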
Submit the job and check its status until it finishes, then inspect the result file:
$ qsub blast.sh
Your job 250 ("Blastn-NT") has been submitted
$ qstat -j 250
==============================================================
job_number: 250
exec_file: job_scripts/250
submission_time: Tue Feb 16 17:05:40 2016
owner: jerome
uid: 1865
group: usmb
gid: 503
sge_o_home: /home/jerome
sge_o_log_name: jerome
sge_o_path: /opt/openmpi/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/bio/ncbi/bin:/opt/bio/mpiblast/bin:/opt/bio/EMBOSS/bin:/opt/bio/clustalw/bin:/opt/bio/tcoffee/bin:/opt/bio/hmmer/bin:/opt/bio/phylip/exe:/opt/bio/mrbayes:/opt/bio/fasta:/opt/bio/glimmer/bin:/opt/bio/glimmer/scripts:/opt/bio/gromacs/bin:/opt/bio/gmap/bin:/opt/bio/tigr/bin:/opt/bio/autodocksuite/bin:/opt/bio/wgs/bin:/opt/ganglia/bin:/opt/ganglia/sbin:/usr/java/latest/bin:/opt/maven/bin:/opt/pdsh/bin:/opt/rocks/bin:/opt/rocks/sbin:/opt/gridengine/bin/linux-x64:/home/jerome/bin
sge_o_shell: /bin/bash
sge_o_workdir: /home/jerome
sge_o_host: teopanzolco
account: sge
mail_list: jerome@teopanzolco.local
notify: FALSE
job_name: Blastn-NT
jobshare: 0
env_list:
script_file: Scripts/script-test.sh
../..
$ qstat -j 250
Following jobs do not exist:
250
$ head -30 results.txt
BLASTN 2.3.0+
Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
Miller (2000), "A greedy algorithm for aligning DNA sequences", J
Comput Biol 2000; 7(1-2):203-14.
Database: nt
34,389,867 sequences; 110,383,924,525 total letters
Query= gi|457872547|ref|XM_004223145.1| Plasmodium cynomolgi strain B
hypothetical protein (PCYB_112670) mRNA, partial cds
Length=1190
Score E
Sequences producing significant alignments: (Bits) Value
gi|457872547|ref|XM_004223145.1| Plasmodium cynomolgi strain B ... 2198 0.0
gi|672188562|ref|XM_008816634.1| Plasmodium inui San Antonio 1 ... 1173 0.0
gi|156101254|ref|XM_001616271.1| Plasmodium vivax SaI-1 hypothe... 1123 0.0
gi|817745457|ref|XM_012482260.1| Plasmodium fragile hypothetica... 1098 0.0
gi|194247187|emb|AM910993.1| Plasmodium knowlesi strain H chrom... 1098 0.0
gi|221057699|ref|XM_002261322.1| Plasmodium knowlesi strain H h... 1096 0.0
> gi|457872547|ref|XM_004223145.1| Plasmodium cynomolgi strain
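The submit, check and inspect sequence shown above can also be scripted. What follows is a sketch under stated assumptions, not a definitive recipe: the helper names job_id_from_qsub and wait_for_job are hypothetical, the parser expects the "Your job N (...) has been submitted" message shown in the session, and the loop assumes that qstat -j exits with a non-zero status once the scheduler no longer knows the job (the "Following jobs do not exist" case).

```shell
#!/bin/bash
# Extract the numeric job ID from the message printed by qsub on stdin,
# e.g. 'Your job 250 ("Blastn-NT") has been submitted' gives 250.
job_id_from_qsub() {
    sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}

# Hypothetical helper: block until the job has left the queue.
# ASSUMPTION: `qstat -j ID` fails once the job no longer exists.
# The second argument is the polling interval in seconds (default 30).
wait_for_job() {
    local job_id="$1" interval="${2:-30}"
    while qstat -j "$job_id" >/dev/null 2>&1; do
        sleep "$interval"
    done
}

# Usage on the cluster (requires qsub and qstat):
# job_id=$(qsub blast.sh | job_id_from_qsub)
# wait_for_job "$job_id" && head -30 results.txt
```

A generous polling interval keeps the load on the scheduler low; a BLAST search against nt takes minutes, so checking every 30 seconds loses very little.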
For the curious, usage statistics for the job are also available. The qacct command reports information such as memory, CPU time, and other details:
$ qacct -j 250
==============================================================
qname all.q
hostname compute-0-4.local
group usmb
owner jerome
project NONE
department defaultdepartment
jobname Blastn-NT
jobnumber 250
taskid undefined
account sge
priority 0
qsub_time Tue Feb 16 17:05:40 2016
start_time Tue Feb 16 17:05:52 2016
end_time Tue Feb 16 17:08:32 2016
granted_pe NONE
slots 1
failed 0
exit_status 0
ru_wallclock 160
ru_utime 50.350
ru_stime 109.798
ru_maxrss 3844740
ru_ixrss 0
ru_ismrss 0
ru_idrss 0
ru_isrss 0
ru_minflt 6840092
ru_majflt 12
ru_nswap 0
ru_inblock 45616
ru_oublock 64
ru_msgsnd 0
ru_msgrcv 0
ru_nsignals 0
ru_nvcsw 492
ru_nivcsw 16236
cpu 160.149
mem 512.448
io 0.000
iow 0.000
maxvmem 3.836G
arid undefined
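The report above is long, and often only a few fields are of interest. As a rough sketch, the two helpers below read qacct -j output on standard input; their names, qacct_field and cpu_efficiency, are hypothetical, and they assume the "name value" line layout shown above.

```shell
#!/bin/bash
# Hypothetical helper: print the value of one field from `qacct -j`
# output read on stdin. Only the first word of the value is returned,
# so it suits single-word fields (maxvmem, cpu, exit_status, ...),
# not dates such as qsub_time.
qacct_field() {
    awk -v key="$1" '$1 == key { print $2; exit }'
}

# Hypothetical helper: ratio of CPU time to wallclock time per slot.
# A value close to 1 means the job kept its slot busy.
cpu_efficiency() {
    awk '
        $1 == "cpu"          { cpu = $2 }
        $1 == "ru_wallclock" { wall = $2 }
        $1 == "slots"        { slots = $2 }
        END {
            if (slots == 0) slots = 1   # default when the field is absent
            if (wall > 0) printf "%.2f\n", cpu / (wall * slots)
        }'
}

# Usage on the cluster (requires qacct):
# qacct -j 250 | qacct_field maxvmem
# qacct -j 250 | cpu_efficiency
```

With the figures reported for job 250 (cpu 160.149, ru_wallclock 160, slots 1), the ratio comes out at about 1.00.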