Now that you know how scripts work, here is a slightly more useful one. It runs a Blast search on the nucleotide sequences contained in a FASTA file, nuc.fasta. The search runs against the nt database (already formatted), and the result is written to a file named results.txt.
    #!/bin/bash
    # Job name
    #$ -N Blastn-NT
    # Load the appropriate PATH
    source /share/apps/Profiles/share-profile.sh
    # Location of the formatted databases.
    export BLASTDB=/scratch/BlastDB/
    OutFile="results.txt"
    blastn -db nt -query nuc.fasta -out "$OutFile"
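A search against nt takes time, so it can be worth checking the query file before the job reaches the blastn call. The following is only a minimal sketch: `count_fasta_records` is a hypothetical helper, not part of the original script, and it assumes that every sequence in nuc.fasta starts with a standard `>` header line.

```bash
#!/bin/bash
# Hypothetical helper (not part of the original script): count the FASTA
# records in a file by counting header lines, which start with '>'.
count_fasta_records() {
    local file="$1"
    # Fail early when the file is missing or empty.
    [ -s "$file" ] || return 1
    # grep -c prints the number of matching lines; it exits with status 1
    # when that number is 0, so a file without headers also counts as a failure.
    grep -c '^>' "$file"
}

# Possible use inside the job script, just before the blastn line:
#   n=$(count_fasta_records nuc.fasta) || { echo "nuc.fasta is missing or has no sequences" >&2; exit 1; }
#   echo "Searching with $n query sequence(s)"
```

With this guard in place the job stops with a non-zero exit status instead of running blastn on an unusable query file.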
Submit the job and check its status until it finishes, then inspect the result file:
    $ qsub blast.sh
    Your job 250 ("Blastn-NT") has been submitted
    $ qstat -j 250
    ==============================================================
    job_number:                 240
    exec_file:                  job_scripts/250
    submission_time:            Tue Feb 16 17:05:40 2016
    owner:                      jerome
    uid:                        1865
    group:                      usmb
    gid:                        503
    sge_o_home:                 /home/jerome
    sge_o_log_name:             jerome
    sge_o_path:                 /opt/openmpi/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/bio/ncbi/bin:/opt/bio/mpiblast/bin:/opt/bio/EMBOSS/bin:/opt/bio/clustalw/bin:/opt/bio/tcoffee/bin:/opt/bio/hmmer/bin:/opt/bio/phylip/exe:/opt/bio/mrbayes:/opt/bio/fasta:/opt/bio/glimmer/bin:/opt/bio/glimmer/scripts:/opt/bio/gromacs/bin:/opt/bio/gmap/bin:/opt/bio/tigr/bin:/opt/bio/autodocksuite/bin:/opt/bio/wgs/bin:/opt/ganglia/bin:/opt/ganglia/sbin:/usr/java/latest/bin:/opt/maven/bin:/opt/pdsh/bin:/opt/rocks/bin:/opt/rocks/sbin:/opt/gridengine/bin/linux-x64:/home/jerome/bin
    sge_o_shell:                /bin/bash
    sge_o_workdir:              /home/jerome
    sge_o_host:                 teopanzolco
    account:                    sge
    mail_list:                  jerome@teopanzolco.local
    notify:                     FALSE
    job_name:                   Blastn-NT
    jobshare:                   0
    env_list:
    script_file:                Scripts/script-test.sh
    ../..
    $ qstat -j 250
    Following jobs do not exist:
    250
    $ head -30 results.txt
    BLASTN 2.3.0+

    Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
    Miller (2000), "A greedy algorithm for aligning DNA sequences", J
    Comput Biol 2000; 7(1-2):203-14.

    Database: nt
               34,389,867 sequences; 110,383,924,525 total letters

    Query= gi|457872547|ref|XM_004223145.1| Plasmodium cynomolgi strain B
    hypothetical protein (PCYB_112670) mRNA, partial cds

    Length=1190
                                                                          Score     E
    Sequences producing significant alignments:                          (Bits)  Value

    gi|457872547|ref|XM_004223145.1| Plasmodium cynomolgi strain B ...    2198    0.0
    gi|672188562|ref|XM_008816634.1| Plasmodium inui San Antonio 1 ...    1173    0.0
    gi|156101254|ref|XM_001616271.1| Plasmodium vivax SaI-1 hypothe...    1123    0.0
    gi|817745457|ref|XM_012482260.1| Plasmodium fragile hypothetica...    1098    0.0
    gi|194247187|emb|AM910993.1| Plasmodium knowlesi strain H chrom...    1098    0.0
    gi|221057699|ref|XM_002261322.1| Plasmodium knowlesi strain H h...    1096    0.0

    > gi|457872547|ref|XM_004223145.1| Plasmodium cynomolgi strain
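The submit, poll, and inspect steps of the session above can be scripted. This is a rough sketch rather than a recommended workflow: `parse_job_id` and `wait_for_job` are hypothetical helper names, the parsing assumes the exact "Your job N (...) has been submitted" message shown above, and the polling assumes that `qstat -j` returns a non-zero exit status once the job no longer exists, which should be checked on your own cluster.

```bash
#!/bin/bash
# Extract the numeric job ID from the line printed by qsub, e.g.
#   Your job 250 ("Blastn-NT") has been submitted
# Prints nothing when the message does not have that form.
parse_job_id() {
    sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p' <<< "$1"
}

# Hypothetical helper: block until qstat no longer knows the job.
# Assumption: "qstat -j ID" fails (non-zero exit status) once the job
# has left the queue, as suggested by "Following jobs do not exist".
wait_for_job() {
    local job_id="$1"
    local interval="${2:-30}"   # seconds between two checks
    while qstat -j "$job_id" > /dev/null 2>&1; do
        sleep "$interval"
    done
}

# Possible use on the front-end node:
#   job_id=$(parse_job_id "$(qsub blast.sh)")
#   wait_for_job "$job_id" 60
#   head -30 results.txt
```

A long polling interval is kinder to the scheduler than calling qstat in a tight loop.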
For the curious, usage statistics for the job are also available. The qacct command reports information such as memory, CPU time, and other details:
    $ qacct -j 250
    ==============================================================
    qname        all.q
    hostname     compute-0-4.local
    group        usmb
    owner        jerome
    project      NONE
    department   defaultdepartment
    jobname      Blastn-NT
    jobnumber    250
    taskid       undefined
    account      sge
    priority     0
    qsub_time    Tue Feb 16 17:05:40 2016
    start_time   Tue Feb 16 17:05:52 2016
    end_time     Tue Feb 16 17:08:32 2016
    granted_pe   NONE
    slots        1
    failed       0
    exit_status  0
    ru_wallclock 160
    ru_utime     50.350
    ru_stime     109.798
    ru_maxrss    3844740
    ru_ixrss     0
    ru_ismrss    0
    ru_idrss     0
    ru_isrss     0
    ru_minflt    6840092
    ru_majflt    12
    ru_nswap     0
    ru_inblock   45616
    ru_oublock   64
    ru_msgsnd    0
    ru_msgrcv    0
    ru_nsignals  0
    ru_nvcsw     492
    ru_nivcsw    16236
    cpu          160.149
    mem          512.448
    io           0.000
    iow          0.000
    maxvmem      3.836G
    arid         undefined
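When only one or two of these values matter, they can be filtered out of the record. The sketch below assumes the "name value" line layout of the qacct record above; `qacct_field` is a hypothetical helper name, not an SGE command.

```bash
#!/bin/bash
# Hypothetical helper: print the value of one field from "qacct -j"
# output read on standard input. Each record line is "name value...".
qacct_field() {
    # Match on the first column, drop it, trim the leading blank that
    # remains, and stop at the first matching line.
    awk -v key="$1" '$1 == key { $1 = ""; sub(/^ +/, ""); print; exit }'
}

# Possible use:
#   qacct -j 250 | qacct_field ru_wallclock   # elapsed time in seconds
#   qacct -j 250 | qacct_field maxvmem        # peak virtual memory
```

For the job shown here, these two calls would return 160 and 3.836G. Note that ru_utime plus ru_stime (50.350 + 109.798) is very close to the cpu value of 160.149.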