LSM Throttling
Our Local Site Mover (lsm) uses ssh and scp to stage files in and out for jobs.
When we jumped from 500 job slots to 2040, we ran into the 1024 ulimit on the number of concurrent processes on the atlas.bu.edu gatekeeper:
bash: fork: Resource temporarily unavailable
The ulimit could be raised, but we thought it wiser to throttle the number of concurrent processes spawned by the lsm.
All plots and examples below correspond to running 12 parallel lsm-gets of a 600 MiB file, unless otherwise specified.
Most nodes have 12 job slots, and this assumes a worst case scenario.
The processes spawned by each individual job are sequential.
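To see how close an account is to the process limit, the limit and the current process count can be compared on the gatekeeper. The following is a rough sketch, assuming bash and a procps-style ps; the helper name proc_headroom is hypothetical and not part of the lsm.

```shell
#!/bin/bash
# Hypothetical helper: print how many more processes fit under a limit.
# $1 = process limit, $2 = current process count
proc_headroom() {
    local limit=$1 current=$2
    echo $(( limit - current ))
}

# Example use on the gatekeeper (commented out so the sketch has no side effects).
# Note that "ulimit -u" may print "unlimited", which this helper does not handle.
#   limit=$(ulimit -u)                              # max user processes
#   current=$(ps -u "$(id -un)" -o pid= | wc -l)    # processes owned by this user
#   proc_headroom "$limit" "$current"
```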
Using an SSH ControlMaster
One thought was to use a single ssh ControlMaster connection for the node.
This reduces the number of processes on the gatekeeper by almost a factor of three (36 -> 14 for 12 transfers: one scp each, plus the two sshd processes for the ControlMaster).
I.e. instead of this:
root 25464 0.0 0.0 75860 2852 ? Ss 13:22 0:00 \_ sshd: harvard [priv]
harvard 25515 14.7 0.0 76140 1832 ? R 13:22 0:07 | \_ sshd: harvard@notty
harvard 25520 2.6 0.0 37344 1388 ? Ss 13:22 0:01 | \_ scp -f /gpfs1/jab/testing/random_600_MiB
root 25466 0.0 0.0 75860 2860 ? Ss 13:22 0:00 \_ sshd: harvard [priv]
harvard 25518 14.7 0.0 76140 1844 ? S 13:22 0:07 | \_ sshd: harvard@notty
harvard 25522 2.6 0.0 37344 1392 ? Ss 13:22 0:01 | \_ scp -f /gpfs1/jab/testing/random_600_MiB
root 25473 0.0 0.0 75860 2856 ? Ss 13:22 0:00 \_ sshd: harvard [priv]
harvard 25525 14.6 0.0 76140 1840 ? R 13:22 0:07 | \_ sshd: harvard@notty
harvard 25534 2.6 0.0 37344 1392 ? Ss 13:22 0:01 | \_ scp -f /gpfs1/jab/testing/random_600_MiB
root 25481 0.0 0.0 75860 2852 ? Ss 13:22 0:00 \_ sshd: harvard [priv]
harvard 25553 14.5 0.0 76140 1832 ? S 13:22 0:07 | \_ sshd: harvard@notty
harvard 25572 2.7 0.0 37344 1388 ? Ss 13:22 0:01 | \_ scp -f /gpfs1/jab/testing/random_600_MiB
root 25482 0.0 0.0 75860 2856 ? Ss 13:22 0:00 \_ sshd: harvard [priv]
harvard 25565 15.1 0.0 76140 1840 ? S 13:22 0:08 | \_ sshd: harvard@notty
harvard 25578 2.7 0.0 37344 1388 ? Ss 13:22 0:01 | \_ scp -f /gpfs1/jab/testing/random_600_MiB
root 25483 0.0 0.0 75860 2852 ? Ss 13:22 0:00 \_ sshd: harvard [priv]
harvard 25574 14.6 0.0 76204 1896 ? S 13:22 0:07 | \_ sshd: harvard@notty
harvard 25591 2.5 0.0 37344 1392 ? Ss 13:22 0:01 | \_ scp -f /gpfs1/jab/testing/random_600_MiB
root 25484 0.0 0.0 75860 2856 ? Ss 13:22 0:00 \_ sshd: harvard [priv]
harvard 25655 14.1 0.0 76172 1872 ? S 13:22 0:07 | \_ sshd: harvard@notty
harvard 25660 2.5 0.0 37344 1388 ? Ss 13:22 0:01 | \_ scp -f /gpfs1/jab/testing/random_600_MiB
root 25496 0.0 0.0 75860 2860 ? Ss 13:22 0:00 \_ sshd: harvard [priv]
harvard 25570 14.3 0.0 76140 1844 ? S 13:22 0:07 | \_ sshd: harvard@notty
harvard 25587 2.5 0.0 37344 1384 ? Ss 13:22 0:01 | \_ scp -f /gpfs1/jab/testing/random_600_MiB
root 25499 0.0 0.0 76752 2856 ? Ss 13:22 0:00 \_ sshd: harvard [priv]
harvard 25627 14.1 0.0 76752 1808 ? S 13:22 0:07 | \_ sshd: harvard@notty
harvard 25630 2.5 0.0 37344 1388 ? Ss 13:22 0:01 | \_ scp -f /gpfs1/jab/testing/random_600_MiB
root 25500 0.0 0.0 75860 2856 ? Ss 13:22 0:00 \_ sshd: harvard [priv]
harvard 25636 15.0 0.0 76140 1840 ? R 13:22 0:08 \_ sshd: harvard@notty
harvard 25645 2.6 0.0 37344 1388 ? Ss 13:22 0:01 \_ scp -f /gpfs1/jab/testing/random_600_MiB
we can have this:
root 24853 0.0 0.0 75860 2852 ? Ss 14:02 0:00 \_ sshd: harvard [priv]
harvard 24858 0.3 0.0 79172 4704 ? S 14:02 0:00 | \_ sshd: harvard@notty
harvard 7683 0.2 0.0 37344 1392 ? Ss 14:05 0:00 | \_ scp -f /gpfs1/jab/testing/random_600_MiB
harvard 7694 0.2 0.0 37344 1392 ? Ss 14:05 0:00 | \_ scp -f /gpfs1/jab/testing/random_600_MiB
harvard 7799 0.2 0.0 37344 1392 ? Ss 14:05 0:00 | \_ scp -f /gpfs1/jab/testing/random_600_MiB
harvard 7800 0.4 0.0 37344 1388 ? Ss 14:05 0:00 | \_ scp -f /gpfs1/jab/testing/random_600_MiB
harvard 7801 0.4 0.0 37344 1388 ? Ss 14:05 0:00 | \_ scp -f /gpfs1/jab/testing/random_600_MiB
harvard 7803 0.4 0.0 37344 1392 ? Ss 14:05 0:00 | \_ scp -f /gpfs1/jab/testing/random_600_MiB
harvard 7806 0.4 0.0 38236 1388 ? Ss 14:05 0:00 | \_ scp -f /gpfs1/jab/testing/random_600_MiB
harvard 7902 0.2 0.0 37344 1392 ? Ss 14:05 0:00 | \_ scp -f /gpfs1/jab/testing/random_600_MiB
harvard 7926 0.2 0.0 37344 1392 ? Ss 14:05 0:00 | \_ scp -f /gpfs1/jab/testing/random_600_MiB
harvard 7927 0.2 0.0 37344 1388 ? Ss 14:05 0:00 | \_ scp -f /gpfs1/jab/testing/random_600_MiB
(Only 10 transfers appear in each listing above because two of the 12 parallel lsm-gets were failing for other reasons.)
The number of processes on the worker node remains the same (three per lsm call).
The ControlMaster reduces the bandwidth by almost a factor of 10, though:

I suppose this is the well-known downside of connection multiplexing and inappropriately sized buffers (http://www.psc.edu/networking/projects/hpn-ssh/faq.php).
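For reference, a shared master connection of this kind can be set up with standard OpenSSH options. This is a minimal sketch, not the lsm's actual code: the function names, the control socket path, and the harvard@atlas.bu.edu target are assumptions made for illustration.

```shell
#!/bin/bash
# Assumed values for illustration only; override them in the environment.
LSM_REMOTE="${LSM_REMOTE:-harvard@atlas.bu.edu}"
LSM_CONTROL_PATH="${LSM_CONTROL_PATH:-/tmp/lsm-ssh-%r@%h-%p}"

# Start one background master connection for the whole node.
# -M: master mode, -N: no remote command, -f: background after authentication.
lsm_master_start() {
    ssh -M -N -f -o ControlPath="$LSM_CONTROL_PATH" "$LSM_REMOTE"
}

# Copy a remote file through the existing master connection.
# $1 = remote path, $2 = local destination
lsm_get_multiplexed() {
    scp -o ControlPath="$LSM_CONTROL_PATH" "$LSM_REMOTE:$1" "$2"
}

# Ask the master to exit once the node no longer needs it.
lsm_master_stop() {
    ssh -O exit -o ControlPath="$LSM_CONTROL_PATH" "$LSM_REMOTE"
}
```

Every scp that names the same ControlPath reuses the master's single TCP connection, which is why the gatekeeper sees one pair of sshd processes plus one scp per transfer.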
Using a Per-Node Exclusive Lock
The second approach was to make each lsm call acquire an exclusive lock on the node before contacting the BU gatekeeper.
This was accomplished by prefixing each shell command with the following, which waits up to 1200 seconds for an exclusive lock on /scratch/lsm.lock:
flock -x -w 1200 /scratch/lsm.lock
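The prefix can also be wrapped in a small shell function so that every transfer command goes through the same lock. This is a sketch under assumptions: the function name lsm_locked and the environment overrides are hypothetical, while the lock file and timeout are the ones given above.

```shell
#!/bin/bash
# Lock file and timeout as used on this page; overridable for testing.
LSM_LOCK_FILE="${LSM_LOCK_FILE:-/scratch/lsm.lock}"
LSM_LOCK_TIMEOUT="${LSM_LOCK_TIMEOUT:-1200}"

# Hypothetical wrapper: run a command while holding the node-wide exclusive lock.
# flock exits non-zero (1 by default) if the lock is not obtained within the
# timeout; otherwise it returns the exit status of the wrapped command.
lsm_locked() {
    flock -x -w "$LSM_LOCK_TIMEOUT" "$LSM_LOCK_FILE" "$@"
}

# Example (remote path from the listings on this page; user and host are assumptions):
#   lsm_locked scp harvard@atlas.bu.edu:/gpfs1/jab/testing/random_600_MiB /scratch/
```

The lock is held only while the wrapped command runs, so the 12 job slots on a node take turns and at most one of them has a connection open to the gatekeeper at any moment.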
The flock approach had less of an impact on bandwidth:

Conclusion
The flock approach is much more attractive.
It should guarantee that the lsm accounts for at most three processes per node on the gatekeeper, i.e. 3 * 192 = 576 in total.
This should be sufficiently below the 1024 ulimit.
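The worst-case arithmetic can be checked with a few lines of shell. This is a sketch using only figures from this page (three gatekeeper processes per lsm call, 2040 job slots, 192 nodes, a limit of 1024); the variable and function names are made up for the example.

```shell
#!/bin/bash
# Worst-case gatekeeper process counts, using figures from this page.
procs_per_call=3     # sshd [priv], sshd@notty, scp
job_slots=2040
nodes=192
limit=1024

# Print "fits" or "exceeds" for a given process count against the limit.
check_limit() {
    if [ "$1" -lt "$limit" ]; then echo fits; else echo exceeds; fi
}

unthrottled=$(( procs_per_call * job_slots ))   # every job slot transferring at once
with_flock=$(( procs_per_call * nodes ))        # one transfer per node at a time

echo "unthrottled: $unthrottled ($(check_limit "$unthrottled"))"
echo "with flock:  $with_flock ($(check_limit "$with_flock"))"
```

With these numbers the unthrottled worst case is 6120 processes, far above the limit, while the flock bound of 576 leaves 448 processes of headroom for everything else the account runs on the gatekeeper.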
