Odyssey 2.0 Transition FAQ

As of August 2013, we have transitioned to an upgraded cluster, running the CentOS 6 operating system and the SLURM queuing system, which we are calling "Odyssey 2.0". The previous system, running CentOS 5 and LSF, is now "legacy."

Before consulting this FAQ, we advise you to read the following two pages about how to use the new system:

 Odyssey Quick Start Guide

 SLURM: An Overview

Along with the software and hardware upgrades, we also transitioned cluster home directories to a larger and faster filesystem. You can find more on the new home directories here:

 Home Directory FAQ

Now on to the Frequently Asked Questions. If you do not find your question answered below, or are still confused, please contact rchelp@fas.harvard.edu.

How do I use the new CentOS 6 systems and SLURM queues?

Our standard name for cluster ssh access, login.rc.fas.harvard.edu, already points to the new systems. Continue connecting there as usual. See the above documentation links for using SLURM. LSF has been disabled on these new systems.
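
As a rough illustration of what a SLURM submission can look like, the sketch below writes a small batch script and submits it only if sbatch is available. The file name example_job.sh and the resource values are placeholders chosen for this example, not site recommendations, and no partition is named, which assumes the site default is acceptable; consult the guides linked above for the real options.

```shell
# Write a minimal SLURM batch script. All values are illustrative assumptions.
cat > example_job.sh <<'EOF'
#!/bin/bash
# One task
#SBATCH -n 1
# Time limit in minutes
#SBATCH -t 10
# Memory request in MB
#SBATCH --mem=100
# Standard output file; %j expands to the job ID
#SBATCH -o example_%j.out

hostname
EOF

# Submit only where SLURM is available (the new login nodes).
if command -v sbatch >/dev/null 2>&1; then
  sbatch example_job.sh
else
  echo "sbatch not found; this does not look like a SLURM login node"
fi
```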

How do I use the legacy CentOS 5 systems and LSF queues?

Our old scheduler LSF and old OS CentOS 5 are being gradually retired. We encourage everyone to migrate over to the new infrastructure. However, if you still need to use the old infrastructure, you may do so without changing anything. All you need to do is ssh directly to legacy.rc.fas.harvard.edu instead of going to login.rc.fas.harvard.edu. For example:

ssh USERNAME@legacy.rc.fas.harvard.edu

Then you should be able to use all the old queues that existed in LSF. None of the new hardware is available on LSF, so we encourage all to move over. SLURM has been disabled on these legacy systems.

My software won't work/compile on the new OS/hardware/SLURM queues.

You are likely using software that was built specifically for CentOS 5. Because the locations and versions of certain key files changed in CentOS 6, software built on CentOS 5 may not work. If your code does not work, take a look at the modules you are loading to build your software. First, if you are building from source, make sure that you build it completely from scratch. That way you won't have any leftover hooks into the old OS. Second, look on the modules page for modules that have the centos6/ prefix. Try to use those modules when building your code. For instance, if you were using hpc/gcc-4.7.3 before and you notice your code isn't building, upgrade to centos6/gcc-4.8.0. Odds are that will fix your issue.

If the issue persists, make sure that you have no duplicate modules loaded. For instance, do not load both the hpc/gcc-4.7.3 module and the centos6/gcc-4.8.0 module at the same time; this will cause conflicts. Also simplify your software stack: load only those modules that are required to build the program.
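
One way to spot a mixed environment is to scan the list of loaded modules for both prefixes. The helper below is a hypothetical sketch, not a site-provided tool: it flags any mix of hpc/ and centos6/ modules, which is a coarser test than checking for true duplicates, and it assumes that "module -t list" prints one module name per line on standard error.

```shell
# Hypothetical helper: reads module names on stdin, one per line, and
# succeeds only if both hpc/ and centos6/ modules appear in the list.
has_mixed_module_prefixes() {
  modules=$(cat)
  printf '%s\n' "$modules" | grep -q '^hpc/' &&
    printf '%s\n' "$modules" | grep -q '^centos6/'
}

# Assumption: "module -t list" writes the terse module list to stderr.
if command -v module >/dev/null 2>&1; then
  if module -t list 2>&1 | has_mixed_module_prefixes; then
    echo "Both hpc/ and centos6/ modules are loaded."
    echo "Consider running 'module purge' and loading only centos6/ modules."
  fi
fi
```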

If you are using a module that isn't in the centos6/ section and it is not working, contact rchelp@fas.harvard.edu. We will build you a new module to use for CentOS 6.

My software works on the new systems but not on the old systems.

This is not necessarily bad -- the new hardware is the new standard and old hardware will soon be retired. The sooner you stop using it, the better.

However, if you do still need to use the legacy systems, you need to build CentOS 5 specific code on CentOS 5 hosts (just as above, with CentOS 6). If you wish to build for CentOS 5 and LSF, please use legacy.rc.fas.harvard.edu instead of login.rc.fas.harvard.edu.

I can't submit to the LSF/SLURM queues.

First make sure you are on the right login node for the scheduler you want to use. For LSF, make sure you are on rclogin01 or rclogin02. For SLURM, make sure you are using login.rc or rclogin03-14. To find out which host you are on, run:

hostname

That will tell you where you are.
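
The mapping from host name to scheduler described above can also be expressed as a small shell function. This is only a sketch: the function name scheduler_for_host is made up for this example, and it assumes the login nodes report names of the form rcloginNN, optionally followed by a domain.

```shell
# Hypothetical helper: print the scheduler served by a given login node name.
scheduler_for_host() {
  # Strip any domain suffix, e.g. rclogin05.rc.fas.harvard.edu -> rclogin05
  short="${1%%.*}"
  case "$short" in
    rclogin01|rclogin02)          echo "LSF" ;;
    rclogin0[3-9]|rclogin1[0-4])  echo "SLURM" ;;
    *)                            echo "unknown" ;;
  esac
}

# Report the scheduler for the node you are currently logged in to.
scheduler_for_host "$(hostname)"
```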

If you are still having problems, you may not be in the correct group to access those queues. Contact rchelp@fas.harvard.edu to gain access.

I get an error like: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.9' not found (required by ./main.x)

You are attempting to run software built on CentOS 6 in the LSF queues, which only target CentOS 5 nodes. This is related to gcc and likely to be a problem; see item #3 in this FAQ.
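
To confirm that a node's libstdc++ lacks the version tag named in the error, you can search the library file for its embedded GLIBCXX strings. The function names below are invented for this sketch, and it assumes GNU grep (for the -a and -o options) and the library path shown in the error message.

```shell
# Hypothetical helper: list the GLIBCXX version tags embedded in a library.
glibcxx_versions() {
  grep -a -o 'GLIBCXX_[0-9][0-9.]*' "$1" | sort -u
}

# Hypothetical helper: succeed if library $1 provides exactly version tag $2.
provides_glibcxx() {
  glibcxx_versions "$1" | grep -q -x -F "$2"
}

lib=/usr/lib64/libstdc++.so.6
if [ -r "$lib" ]; then
  if provides_glibcxx "$lib" GLIBCXX_3.4.9; then
    echo "$lib provides GLIBCXX_3.4.9"
  else
    echo "$lib does not provide GLIBCXX_3.4.9"
  fi
fi
```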

I get the error: Unknown colorls variable `rs'.

You submitted to the legacy LSF system from one of the current login nodes. This is not supported. Submit to SLURM, or ssh to legacy.rc.fas.harvard.edu (instead of login.rc.fas.harvard.edu) if you really need to use a consistent legacy setup.

I get the error: submission failed: Invalid partition name specified

You attempted to submit to the SLURM system from one of the legacy login nodes. This is not supported. SSH to login.rc.fas.harvard.edu (instead of legacy.rc.fas.harvard.edu) to use SLURM.

I get the error: sbatch: error: Batch job submission failed: User's group not permitted to use this partition

You must contact rchelp@fas.harvard.edu and ask to be added to the cluster_users group. This membership is added by default on all new cluster accounts.
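
Before writing in, you can check whether your account already has the membership. A minimal sketch, assuming the standard id command is available; the has_group function is a hypothetical helper written for this example.

```shell
# Hypothetical helper: succeed if group $1 appears as a whole word in the
# space-separated group list $2.
has_group() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *)        return 1 ;;
  esac
}

# "id -nG" prints the names of all groups the current user belongs to.
if has_group cluster_users "$(id -nG)"; then
  echo "You are in cluster_users."
else
  echo "You are not in cluster_users; contact rchelp@fas.harvard.edu."
fi
```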

I get the error: Exceeded job memory limit

See here for information on choosing an appropriate --mem value.
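
As a rough starting point, one approach is to take the peak memory a previous run actually used and request somewhat more. The sketch below converts a peak figure in KB into a --mem value in MB (the default unit for --mem) with 20% headroom, rounded up. The function name and the 20% margin are assumptions made for this example, not site policy.

```shell
# Hypothetical helper: convert a peak memory figure in KB into a --mem
# request in MB, adding 20% headroom and rounding up to a whole MB.
mem_request_mb() {
  echo $(( ($1 * 12 / 10 + 1023) / 1024 ))
}

# Example: a job that peaked at 1048576 KB (1 GiB).
mem_request_mb 1048576   # prints 1229
```

The peak figure for a finished job can usually be read from the MaxRSS column of "sacct -j JOBID --format=JobID,MaxRSS", where JOBID is a placeholder for your job's ID. The result would then be passed as, for example, --mem=1229.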

CC BY-NC 4.0 This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Permissions beyond the scope of this license may be available at Attribution.