LSF: Submitting Lots of Short Jobs (Job arrays)
Do you need to run a program or application thousands of times on different data sets? Instead of submitting each instance as a single LSF job, think about grouping them such that each LSF job runs multiple instances of the program. There are a few ways to do this.
1) Simply use Linux shell commands to run the program multiple times in each LSF job. Your batch script would look something like this:
#!/bin/sh
#BSUB -u hptc@harvard
#BSUB -J prog
#BSUB -o prog_lsf.out
#BSUB -e prog_lsf.err
#BSUB -q short_serial
./prog data_1 1> data_1.out
./prog data_2 1> data_2.out
./prog data_3 1> data_3.out
./prog data_4 1> data_4.out
# ...and so on for the remaining data sets
How many times you run the program in each LSF job depends on how long each run takes and how many runs you have in total. You can of course script this using a while or for loop; there are many references on the web explaining how to do this. You are also welcome to send email to rchelp and we can write this for you.
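As a rough illustration of the loop approach, the script below runs the program over a range of data sets from a single LSF job. It is only a sketch: the function name run_range, the job name prog_loop and the range 1-50 are made up for this example, and it assumes the data sets are named data_1, data_2, and so on, as above.

```shell
#!/bin/sh
#BSUB -J prog_loop
#BSUB -o prog_loop_lsf.out
#BSUB -e prog_loop_lsf.err
#BSUB -q short_serial

# run_range PROGRAM FIRST LAST  (hypothetical helper, not part of LSF)
# Runs PROGRAM once for each of data_FIRST .. data_LAST, writing each
# run's standard output to data_N.out. A failed run is reported on
# standard error and the loop moves on to the next data set.
run_range() {
    i=$2
    while [ "$i" -le "$3" ]; do
        "$1" "data_$i" > "data_$i.out" || echo "run on data_$i failed" >&2
        i=$((i + 1))
    done
}

# ./prog and the range 1-50 are placeholders; adjust both to your workload.
run_range ./prog 1 50
```

Pick the range so that the whole loop fits comfortably within the run time limit of the queue you submit to.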
2) Use LSF job arrays.
bsub -J "name[1-10]" submits 10 sub-jobs under one numeric job ID. %I stands for the sub-job index and can be used in the bsub options (such as -o and -e) to distinguish each run within the job array. Inside the script itself, use the environment variable LSB_JOBINDEX instead. Here is a sample script that runs 4 jobs in a job array:
#!/bin/sh
#BSUB -u hptc@harvard
#BSUB -J prog[1-4]
# %I will be replaced by 1, 2, etc. in -e and -o
#BSUB -e prog_lsf_%I.err
#BSUB -o prog_lsf_%I.out
#BSUB -q short_serial
./prog data_${LSB_JOBINDEX} > data_${LSB_JOBINDEX}.out
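The two approaches can also be combined, so that each element of a job array works through its own block of data sets. The sketch below assumes 1000 data sets split over 40 array elements of 25 runs each; the helper chunk_range, the CHUNK variable and those numbers are illustrative choices, not LSF features. Only LSB_JOBINDEX comes from LSF.

```shell
#!/bin/sh
#BSUB -J prog[1-40]
#BSUB -e prog_lsf_%I.err
#BSUB -o prog_lsf_%I.out
#BSUB -q short_serial

CHUNK=25   # runs per array element (assumed: 40 elements x 25 = 1000 data sets)

# chunk_range INDEX SIZE  (hypothetical helper)
# Prints the first and last data set number handled by array element INDEX
# when every element handles SIZE consecutive data sets.
chunk_range() {
    echo "$(( ($1 - 1) * $2 + 1 )) $(( $1 * $2 ))"
}

# LSB_JOBINDEX is set by LSF for each array element; the fallback to 1 is
# only there so the script can be tried outside LSF.
set -- $(chunk_range "${LSB_JOBINDEX:-1}" "$CHUNK")
i=$1
while [ "$i" -le "$2" ]; do
    ./prog "data_$i" > "data_$i.out" || echo "run on data_$i failed" >&2
    i=$((i + 1))
done
```

With these numbers, array element 3 would process data_51 through data_75.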
Some nodes used for short_serial and normal_serial have less memory than others (8GB vs 16GB). If your jobs crash nodes because they run out of memory, then you can use:
#BSUB -R "rusage[mem=15000]"
which tells the scheduler that your job will need about 15GB of RAM (15000MB) during its run.
