FAQ

a. Login and Authentication (7)

How do I get a Research Computing account?

Please click here (https://account.rc.fas.harvard.edu/request/) to request an account for resources operated by Research Computing (Odyssey cluster, storage, software downloads, workstation access, instrument sign-up, etc.).

If you are unsure whether you qualify for an RC account, please see Qualifications and Affiliations.

Once you've submitted the request, the following happens:

  1. RC personnel check that the request is complete and meets affiliation requirements.
  2. Once RC approves, an email is sent to your PI asking them to approve or reject the request.
  3. Your PI approves or rejects the request.
  4. If approved, we finalize the account on our side.
  5. Once finalized, you receive an automated confirmation with your new account information.

You can then proceed to set up your OpenAuth token and get connected to Odyssey. The turnaround time is usually one business day, but may take longer if your PI does not approve the request immediately.

NOTE! You are required to attend the Introduction to Odyssey course within 45 days of your account being issued; otherwise your account will automatically expire.

Last Updated: October 2, 2014

CC BY-NC 4.0
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Permissions beyond the scope of this license may be available at Attribution.

Can I share an account? – Account Security Policies

The sharing of passwords or login credentials is not allowed under RC and Harvard information security policies. Please bear in mind that this policy also protects the end-user. Sharing credentials removes plausible deniability for the account holder in case of account misuse. Accounts which are in violation of this policy may be disabled or otherwise limited.

If you find that you need to share resources among multiple individuals, please contact us and we will be happy to assist you with finding a safe and secure way to do so.

How do I log in to Odyssey?

Step 0: Ensure that you've requested an account, your PI has approved the account request, and that you've received an Account Approved notice from RC.

Step 1: Launch the OpenAuth application. For instructions on how to install and launch OpenAuth, please see here.

Step 2: Launch a Terminal application.

Step 3: In your Terminal application, connect to login.rc.fas.harvard.edu using ssh. If you are running Linux or Mac OS X, this is as simple as running: ssh USERNAME@login.rc.fas.harvard.edu

USERNAME is the name you were assigned when you received your Research Computing account. (Add -Y if you have an X11 server installed and desire graphics support.) If you are on Windows, download PuTTY or your favorite ssh software and connect to login.rc.fas.harvard.edu.

You will be asked for your Research Computing password and OpenAuth verification code upon connecting. The hostname login.rc.fas.harvard.edu is a round-robin alias for several of our hosts named rclogin##.rc.fas.harvard.edu, so one of those names is what you will see in your shell prompt once connected.

Note: In certain instances you will need to be logged on to the Research Computing VPN to access Odyssey. Please see the VPN setup page for instructions on how to log on to the Research Computing VPN.

For more details on access to the Odyssey cluster see the Access and Login page.

How do I reset my Research Computing account password?

Please click here to reset your Research Computing account password.

How do I unlock my locked Research Computing account?

A locked account unlocks automatically after 15 minutes.

How do I install and launch OpenAuth?

Please click here to set up OpenAuth.

The site will prompt you for your Harvard FAS Research Computing username and password. If you don’t yet have an account, you can request one here. Since the site uses email verification to authenticate you, you must also have a valid email address on record with Research Computing. All OpenAuth tokens are software-based, and you will choose whether to use a smartphone or a Java desktop app to generate your verification codes. Java 1.6 is required for the desktop app.

You must close your browser to log out of the site when you’re done. Once you have logged out, launch OpenAuth like any other application. You will need OpenAuth when accessing the Research Computing VPN and the Odyssey cluster.

How do I log on to the Research Computing VPN?

Please see our VPN setup guide here.

Linux users please see our guide to using OpenVPN here.

b. Filesystems and Authorization (7)

How do I access my Odyssey home directory from my laptop?

Odyssey home directories are available through Samba and so can be mounted as a network drive on Mac, Windows, and Linux computers. See the Access and Login page for specific instructions on how to mount the directory.

If you do not need a persistent connection to your home directory, you can also transfer files using SFTP.

How do I check how much space I’ve used?

The standard Linux tool du shows how much disk space is being used by individual files and directories. For example, the command:

du -x --max-depth 1 .

will print how much space is used by each directory in your current working directory, plus a total at the end.
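If you want the biggest directories to stand out, one option is to pipe du through sort. This is a minimal sketch assuming GNU du and sort (as on the Linux login nodes); the function name space_by_dir is hypothetical, and sizes are reported in KB.

```shell
#!/bin/bash
# Hypothetical helper: per-directory usage (in KB) under a directory,
# sorted so the largest entries come last. The final line is the total.
space_by_dir() {
  local dir=${1:-.}
  # -x: stay on one filesystem; -k: sizes in KB; depth 1: immediate subdirectories
  du -x -k --max-depth=1 "$dir" | sort -n
}
```

For example, space_by_dir ~ lists the directories in your home directory from smallest to largest, followed by the overall total.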

Note: With the legacy home directories, the command df showed your personal quota details; this does not work with our current configuration of home directories on the Isilon filesystem.

How much space do I have in my home directory?

You are given 40 GB in your home directory, twice as much as in the legacy home directories. This size limit is referred to as your quota.

Sorry, but we cannot increase this allotment. Please use disk shares associated with your lab or one of our scratch filesystems if you require more space.

Please see our Storage document for more information.
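To get a rough idea of how close you are to the 40 GB quota, you can compare du's total against it. This sketch rests on two assumptions: the quota is treated as 40 GiB (40 × 1024 × 1024 KB), and du's total approximates what the filesystem counts against you, which may differ somewhat on Isilon. The function names are hypothetical.

```shell
#!/bin/bash
# Assumption: the 40 GB quota interpreted as 40 GiB, expressed in KB.
QUOTA_KB=$(( 40 * 1024 * 1024 ))

# Hypothetical helper: integer percentage of the quota used by a KB count.
quota_percent() {
  local used_kb=$1
  echo $(( used_kb * 100 / QUOTA_KB ))
}

# Hypothetical helper: measure a directory (default: $HOME) and report usage.
home_quota_report() {
  local dir=${1:-$HOME} used_kb
  # du -s prints "KB<TAB>path"; keep only the number
  used_kb=$(du -s -k -x "$dir" | cut -f1)
  echo "$dir: ${used_kb} KB used, $(quota_percent "$used_kb")% of quota"
}
```

Running home_quota_report with no argument measures your home directory. Note that du may take a while on a directory with many files.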

I accidentally deleted my data, how do I get it back?

Your home directory has both checkpoints and backups. Checkpoints are snapshots of your data from various recent points in time that you can access yourself. They are kept in a hidden directory named .snapshot inside every directory in your home directory. (In the legacy home directories, it was named .ckpt.) The command ls -a will not show these, but you can run ls .snapshot directly, and cd .snapshot to move into them. For more on checkpoints, see our restoring files document.
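Restoring from a checkpoint amounts to copying the file back out of .snapshot. The sketch below assumes each snapshot appears as a subdirectory of .snapshot holding a copy of the directory as it was at that time; pass whatever snapshot name ls .snapshot shows. The function name is hypothetical. It writes to a new name instead of overwriting, so you can compare before replacing anything.

```shell
#!/bin/bash
# Hypothetical helper: restore FILE (relative to the current directory) from
# the snapshot named SNAPSHOT, writing it alongside the original as FILE.restored.
# Assumed layout: .snapshot/<snapshot-name>/<file>
restore_from_snapshot() {
  local snapshot=$1 file=$2
  local src=".snapshot/$snapshot/$file"
  if [ ! -e "$src" ]; then
    echo "not found: $src (try: ls .snapshot)" >&2
    return 1
  fi
  # -p keeps the original timestamps and permissions; the live file is untouched
  cp -p "$src" "$file.restored"
}
```

Typical use: cd into the directory that held the file, run ls .snapshot to pick a point in time, then restore_from_snapshot SNAPSHOT_NAME myfile.txt.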

We also make regular backups of your home directory to separate hardware in a different data center. If what you're looking for is not contained in a snapshot directory, please contact RC Help to see if we have an offline backup. See the above link for more on our backup schedule.

Lab directory backups are for disaster recovery only: they are handled separately and do not have snapshot capabilities, so we cannot recover files accidentally deleted from lab directories. Please contact RC Help if you have any questions.

Please also see our Storage document for more info.

Last Updated: October 2, 2014

Why are all my files executable?

You may notice that the x (execute) bit is set on all your files:

[username@rclogin01 ~]# ls -l myfile.txt
-rwxr--r-- 1 username groupname 3029 Aug 20 03:10 myfile.txt

Furthermore, chmod does not remove it:

[username@rclogin01 ~]# chmod u-x myfile.txt
[username@rclogin01 ~]# ls -l myfile.txt
-rwxr--r-- 1 username groupname 3029 Aug 20 03:10 myfile.txt

This is expected behavior: the storage system applies a mix of Unix-style and Windows-style permissions. If this is causing a problem for you, please contact rchelp@fas.harvard.edu.
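If you are unsure whether a particular directory applies chmod in the usual Unix way, a quick probe is to create a scratch file, set its mode, and read the mode back. This is a minimal sketch assuming GNU stat (for stat -c %a); the function name is hypothetical. On a filesystem that keeps the execute bit as described above, it should report that chmod is not honored.

```shell
#!/bin/bash
# Hypothetical helper: returns 0 if DIR applies chmod as standard Unix would.
chmod_honored() {
  local dir=${1:-.} probe mode
  probe=$(mktemp "$dir/.chmod_probe.XXXXXX") || return 2
  chmod 644 "$probe"
  # %a prints the octal permission bits, e.g. 644
  mode=$(stat -c %a "$probe")
  rm -f "$probe"
  [ "$mode" = "644" ]
}
```

For example: chmod_honored ~ && echo "standard permissions" || echo "chmod not fully honored here".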

Why does my umask not work?

You may also notice that your umask setting does not work as expected:

[username@rclogin01 ~]# umask 002
[username@rclogin01 ~]# touch newfile.txt
[username@rclogin01 ~]# ls -l newfile.txt
-rwx------ 1 username groupname 0 Aug 20 03:10 newfile.txt

On a standard Linux filesystem, a umask of 002 would give -rw-rw-r--. If this is causing a problem for you, please contact rchelp@fas.harvard.edu.
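The standard result comes from how umask works: a new regular file starts from mode 666, and the umask bits are cleared from it, so 666 with 002 removed is 664, i.e. -rw-rw-r--. The sketch below computes that expected mode and compares it with what a given directory actually produces. The function names are hypothetical, and GNU stat is assumed.

```shell
#!/bin/bash
# Expected mode of a new regular file: 0666 with the umask bits cleared.
expected_file_mode() {
  local mask=$1
  printf '%03o\n' $(( 0666 & ~0$mask ))
}

# Actual mode of a file created in DIR under umask MASK (GNU stat assumed).
actual_file_mode() {
  local dir=$1 mask=$2 f
  f="$dir/umask_probe.$$"
  # Subshell, so the umask change does not leak into the calling shell
  ( umask "$mask"; : > "$f" )
  stat -c %a "$f"
  rm -f "$f"
}
```

If actual_file_mode ~ 002 prints something other than expected_file_mode 002 (664), you are seeing the behavior described above.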

Is my home directory available as a network filesystem share?

Yes, your cluster home directory is available as a network filesystem share to which you can directly connect your own desktop or laptop. The protocol for this is called CIFS (also known as SMB), and Samba is the server software that provides it, so you will often hear us use either name. On Windows, this is also referred to as mapping a network drive, and on a Mac it is called connecting to a server.

In all cases, you need your RC username, password, server name, and path. Please see the Mounting Storage document for detailed information.

c. Jobs and SLURM (15)

How do I know what memory limit to put on my job?

Add to your job submission:

#SBATCH --mem X

where X is the maximum amount of memory your job will use per node, in MB. The larger your working data set, the larger this needs to be, but the smaller the number the easier it is for the scheduler to find a place to run your job. To determine an appropriate value, start relatively large (job slots on average have about 4000 MB per core, but that’s much larger than needed for most jobs) and then use sacct to look at how much your job is actually using or used:

sacct -o MaxRSS -j JOBID

where JOBID is the job you’re interested in. The number is in KB, so divide by 1024 to get a rough idea of what to use with --mem (set it a little larger than that, since you’re defining a hard upper limit).
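The arithmetic above can be scripted. This sketch takes a MaxRSS value as sacct prints it and suggests a --mem value in MB. Assumptions: the value is a whole number with an optional K, M or G suffix (a bare number is treated as KB, as described above), and the 10% headroom is an arbitrary illustrative margin, not an RC recommendation. The function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: turn a MaxRSS value (e.g. 2048000K) into a suggested
# --mem value in MB, with about 10% headroom on top.
suggest_mem_mb() {
  local rss=$1 num mb
  num=${rss%[KkMmGg]}              # strip a trailing unit suffix, if present
  case $rss in
    *[Gg]) mb=$(( num * 1024 )) ;;
    *[Mm]) mb=$num ;;
    *)     mb=$(( num / 1024 )) ;; # K suffix or bare number: KB
  esac
  echo $(( mb + mb / 10 ))         # illustrative ~10% headroom
}
```

For example, suggest_mem_mb 2048000K prints 2200, which you could then use as #SBATCH --mem 2200.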

For more information see here.

Can I query SLURM programmatically?

I'm writing code to keep an eye on my jobs. How can I query SLURM programmatically?

If you are writing a meta-scheduler or otherwise interrogating SLURM from scripts, we highly recommend using the squeue and sacct commands, and we strongly recommend that your code run these queries no more often than once every 60 seconds. These commands contact the master controller directly, the same process responsible for scheduling all work on the cluster. Polling more frequently, especially across all users on the cluster, slows down response times and may bring scheduling to a crawl. Please don't.
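A polling loop that respects the 60-second guideline might look like the sketch below. It asks squeue for a single job's state (-h drops the header, -j selects the job, -o %T prints the state) and stops once the job is no longer listed. The function names and the clamping idea are illustrative, not an RC-provided tool; a real script would also consult sacct for the job's final state.

```shell
#!/bin/bash
MIN_POLL_SECONDS=60   # minimum interval recommended above

# Never poll faster than MIN_POLL_SECONDS, whatever the caller asks for.
clamp_interval() {
  local requested=${1:-$MIN_POLL_SECONDS}
  if (( requested < MIN_POLL_SECONDS )); then
    echo "$MIN_POLL_SECONDS"
  else
    echo "$requested"
  fi
}

# Hypothetical helper: print JOBID's state until it leaves the queue.
watch_job() {
  local jobid=$1 interval state
  interval=$(clamp_interval "$2")
  while state=$(squeue -h -j "$jobid" -o %T 2>/dev/null) && [ -n "$state" ]; do
    echo "$(date +%T) job $jobid: $state"
    sleep "$interval"
  done
  echo "job $jobid is no longer in the queue; see sacct -j $jobid for its final state"
}
```

On the cluster you would run, for example, watch_job JOBID 300 to check every five minutes. Asking for less than 60 seconds still polls every 60.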

SLURM also has an API, documented on the website of our development partner, SchedMD.com.

How do I fairly manage dual/multiple lab affiliations for work on Odyssey?

We're really glad you asked us this question! There are two levels to this question, the first concerning filesystem rights and the second SLURM submissions.

For filesystem rights, set your primary group ID to your primary lab group and ask us to add a secondary group membership in Active Directory. To switch to the other group for work, use the newgrp 2NDGROUPNAME command.

In SLURM, make sure your primary group membership is set to the appropriate lab, and ask us to add a secondary group affiliation in SLURM. All resource usage from your SLURM jobs is charged to your primary SLURM group. To submit jobs for the other group, use the --account=2NDGROUPNAME option with the sbatch or srun command.
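Before switching groups, you can check which groups your account already belongs to with id -Gn. The sketch below wraps that check. The function name is hypothetical, and 2NDGROUPNAME remains a placeholder for the group name RC sets up for you. The commented lines show the two commands from the answers above.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the current user is a member of GROUP.
in_group() {
  # id -Gn lists all group names on one line; match GROUP as a whole word
  id -Gn | tr ' ' '\n' | grep -qxF "$1"
}

# Example use on the cluster (2NDGROUPNAME is a placeholder):
#   in_group 2NDGROUPNAME && newgrp 2NDGROUPNAME   # filesystem work as that group
#   sbatch --account=2NDGROUPNAME RUNSCRIPT        # charge the job to that group
```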

How do I submit a batch job to the Odyssey SLURM queue?

Step 1: Log in to Odyssey through your Terminal window. Please see the Access and Login page for login instructions.

Step 2: Run a batch job by typing: sbatch RUNSCRIPT. Replace RUNSCRIPT with the batch script (a text file) you will use to run your code.

The batch script should contain #SBATCH comments that tell SLURM how to run the job.

#!/bin/bash

#SBATCH -n 1 #Number of cores
#SBATCH -N 1 #Run on 1 node
#SBATCH -t 5 #Runtime in minutes
#SBATCH -p serial_requeue #Partition to submit to
#SBATCH --mem-per-cpu=100 #Memory per cpu in MB (see also --mem)
#SBATCH -o hostname.out #File to which standard out will be written
#SBATCH -e hostname.err #File to which standard err will be written
#SBATCH --mail-type=END #Type of email notification- BEGIN,END,FAIL,ALL
#SBATCH --mail-user=ajk@123.com #Email to which notifications will be sent

hostname

See the batch submission section of the Running Jobs page for detailed instructions and sample batch submission scripts.

Note: You must declare how much memory and how many cores your job needs. By default, SLURM assumes your job needs 100 MB of memory. The script runs in the directory from which you submitted it and will load your .bashrc.
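If a script needs the job ID after submitting, it can either parse sbatch's usual "Submitted batch job N" message or ask sbatch for machine-readable output with its --parsable option. The helper below is a hypothetical sketch of the parsing approach.

```shell
#!/bin/bash
# Hypothetical helper: extract the numeric job ID from sbatch's normal output,
# e.g. "Submitted batch job 12345" -> 12345.
parse_jobid() {
  local msg=$1
  if [[ $msg =~ Submitted\ batch\ job\ ([0-9]+) ]]; then
    echo "${BASH_REMATCH[1]}"
  else
    echo "unexpected sbatch output: $msg" >&2
    return 1
  fi
}

# Usage on the cluster:
#   jobid=$(parse_jobid "$(sbatch RUNSCRIPT)")
# or, without parsing (prints "jobid" or "jobid;cluster"):
#   jobid=$(sbatch --parsable RUNSCRIPT)
```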

How do I submit an interactive job to the Odyssey SLURM queue?

Step 1: Log in to Odyssey through your Terminal window. Please see here for login instructions.

Step 2: Run an interactive job by typing: srun -p interact --pty MYPROGRAM

This will open up an interactive run for you to use. If you want a bash prompt, type: srun --mem 500 -p interact --pty bash

If you need X11 forwarding type: srun --mem 500 -p interact --pty --x11=first MYPROGRAM

This will initiate an X11 tunnel to the first node of your allocation. The --x11 option also accepts batch, first, last and all.

See also the interactive jobs section of the Running Jobs page.

How do I view or monitor a submitted job?

Step 1: Log in to Odyssey through your Terminal window. Please see the Access and Login page for login instructions.

Step 2: From the command line type one of three options: smap, squeue, or showq-slurm

If you want more details about your job, from the command line type: sacct -j JOBID

To view the runtime and memory usage of a past job, type: sacct -j JOBID --format=JobID,JobName,MaxRSS,Elapsed (replace JOBID with the numeric ID of that job).
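To compare a finished job's Elapsed value with the time you requested, it can help to convert it to seconds. This sketch assumes the [D-]HH:MM:SS form that sacct typically prints; the function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: convert an Elapsed value such as 1-02:03:04 or
# 00:10:00 into seconds. Assumes the [D-]HH:MM:SS form.
elapsed_to_seconds() {
  local t=$1 days=0 h m s
  if [[ $t == *-* ]]; then
    days=${t%%-*}   # part before the dash
    t=${t#*-}       # HH:MM:SS after it
  fi
  IFS=: read -r h m s <<< "$t"
  # 10# forces base 10 so fields like "08" are not read as octal
  echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}
```

For example, elapsed_to_seconds 1-02:03:04 prints 93784.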

See the Running Jobs page for more details on job monitoring.

My job is PENDING. How can I fix this?

How soon a job is scheduled depends on a combination of factors: the time requested, the resources requested (e.g. RAM, number of cores), the partition, and your FairShare score.

Also, if a cluster maintenance window is approaching and you did not specify the -t parameter at submission, SLURM assumes your job needs the maximum time allowed for that partition. That probably means the job cannot finish before the maintenance starts, so it stays PENDING, which is probably not what you want.

Consider shortening the runtime or reducing the requested resources to increase the likelihood that your job will start sooner.

Please see this document for more information.

SLURM Errors: Device or resource busy

What's up? My SLURM output file terminates early with the following error:

"slurmstepd: error: _slurm_cgroup_destroy: problem deleting step cgroup
path /cgroup/freezer/slurm/uid_57915/job_25009017/step_batch: Device or
resource busy"

Usually this means your job is trying to write to a network storage device that is busy, probably overloaded by someone doing heavy I/O (input/output) where they shouldn't, typically on low-throughput storage such as home directories or lab disk shares.

Please contact RCHelp about this problem, giving us the jobID, the filesystem you are working on, and additional details that may be relevant. We'll use this info to track down the problem (and, perhaps, the problem user(s)).

(If you know who it is, tap them on the shoulder and show them our Odyssey Storage page.)

Last Updated: October 2, 2014

SLURM errors: Job cancelled due to preemption

If you've submitted a job to the serial_requeue partition, it is quite likely to be scheduled on an idle node that another group purchased. If the node's owner submits jobs, SLURM will kill your job and automatically requeue it. This message then appears in the standard output or standard error file you specified with the -o or -e option. It is simply an informative message from SLURM.

SLURM Errors: Memory limit

Job <jobid> exceeded <mem> memory limit, being killed:

Your job is attempting to use more memory than you requested for it. Either increase the amount of memory requested with --mem or --mem-per-cpu or, if possible, reduce the amount your application is trying to use. For example, many Java programs set heap space using the -Xmx JVM option, which could potentially be reduced.

For jobs that require truly large amounts of memory (>256 GB), you may need to use the bigmem SLURM partition. Genome and transcript assembly tools commonly fall into this category.

See this FAQ on determining how much memory your completed batch job used under SLURM.

SLURM Errors: Node Failure

JOB <jobid> CANCELLED AT <time> DUE TO NODE FAILURE:

This message can arise for a variety of reasons, but it indicates that SLURM can no longer contact the host on which your job was running. Not a good sign. Contact RCHelp for help with this problem.

SLURM errors: Socket timed out. What?

If the SLURM master (the process that listens for SLURM requests) is busy, you might receive the following error:

[bfreeman@rclogin12 ~]$ squeue -u bfreeman
squeue: error: slurm_receive_msg: Socket timed out on send/recv operation
slurm_load_jobs error: Socket timed out on send/recv operation

SLURM schedules roughly 1 job every 2 seconds, on top of working out where to place each job among approximately 1000 compute nodes, so it is going to be a bit busy at times. Don't worry. Get up, stretch, pet your cat, grab a cup of coffee, and try again.

Last Updated: October 2, 2014
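If you call squeue or sacct from a script, you can build the "try again" advice in by retrying a few times with a growing pause. This is a minimal sketch: the function name, the 30-second starting delay and the doubling are illustrative choices, and BASE_DELAY is a hypothetical knob for overriding the first pause. Keep the pauses long so that retries do not add to the controller's load.

```shell
#!/bin/bash
# Hypothetical helper: run a command, retrying with a growing pause if it fails.
# BASE_DELAY (default 30 seconds) sets the first pause; each later pause doubles.
retry_with_backoff() {
  local max_attempts=$1; shift
  local delay=${BASE_DELAY:-30}
  local attempt=1
  until "$@"; do
    if (( attempt >= max_attempts )); then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$(( delay * 2 ))
    attempt=$(( attempt + 1 ))
  done
}
```

For example, retry_with_backoff 4 squeue -u USERNAME tries up to four times, pausing 30, 60 and 120 seconds between attempts.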

SLURM Errors: Time limit

JOB <jobid> CANCELLED AT <time> DUE TO TIME LIMIT:

Either you did not specify enough time in your batch submission script, or you didn't specify the amount of time and SLURM assigned the default time of 10 minutes. The -t option sets time in minutes or can also take D-HH:MM form (0-12:30 for 12.5 hours). Submit your job again with a longer time window.
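Converting a runtime in minutes into the D-HH:MM form is simple arithmetic (1440 minutes per day, 60 per hour). A hypothetical helper, as a sketch:

```shell
#!/bin/bash
# Hypothetical helper: format a number of minutes in the D-HH:MM form accepted by -t.
minutes_to_slurm_time() {
  local total=$1
  local days=$(( total / 1440 ))          # 1440 minutes per day
  local hours=$(( total % 1440 / 60 ))    # leftover hours within the day
  local mins=$(( total % 60 ))            # leftover minutes within the hour
  printf '%d-%02d:%02d\n' "$days" "$hours" "$mins"
}
```

For example, minutes_to_slurm_time 750 prints 0-12:30, so sbatch -t "$(minutes_to_slurm_time 750)" RUNSCRIPT requests 12.5 hours.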

Last Updated: October 2, 2014

What is Fair-Share?

FairShare is a score that determines what priority you have in the scheduling queue for your jobs. The more jobs you run, the lower your score becomes, temporarily. A number of factors are used to determine this score -- please read this section in our Running Jobs document for more information.

To find out what your score is, enter `sshare -u USERNAME` in your Odyssey terminal session. In general, a score of 0.5 or above means you have higher priority for scheduling.

For further information, see the SchedMD.com fair-share document.

d. Software (3)

How do I load a module or software on Odyssey?

Step 1: Log in to Odyssey through your Terminal window. Please see here for login instructions.

Step 2: Load a module/software by typing: module load MODULENAME. Replace MODULENAME with the specific software you want to use. A complete listing of modules can be found on the module list page. Only the modules that begin with Centos6 are supported on the current cluster.

To see what modules you have loaded type: module list

To unload a module type: module unload MODULENAME

Details can be found in the modules section of the Running Jobs page.

How do I run a Matlab script on Odyssey?

To run a Matlab script (with no graphical interface component) on the Odyssey cluster, login using your preferred terminal application then activate the application by loading the module.

module load centos6/matlab-R2013a

Then, assuming your script is named calc.m, either run it through an interactive session

srun --pty --mem 1000 -p interact matlab -nojvm -nodisplay -nosplash < calc.m

or use the matlab command in a batch script

#!/bin/bash
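#SBATCH -o calc.out #File to which standard out will be written
#SBATCH -e calc.err #File to which standard err will be written
#SBATCH -p serial_requeue #Partition to submit to
#SBATCH -n 1 #Number of cores
#SBATCH -N 1 #Run on 1 node
#SBATCH --mem 1000 #Memory per node in MB
#SBATCH -t 1000 #Runtime in minutes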

matlab -nojvm -nodisplay -nosplash < calc.m

Make sure that `calc.m` finishes with an `exit` command. Otherwise, the process will hang waiting for further input.
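A quick pre-submission check can catch a missing exit. This hypothetical helper looks only at the last non-blank line of the script and recognizes only exit (not quit or other ways of ending MATLAB), so treat a failure as a prompt to look, not a verdict.

```shell
#!/bin/bash
# Hypothetical pre-submit check: succeed if the last non-blank line of a
# MATLAB script is an exit command (e.g. "exit" or "exit;").
ends_with_exit() {
  grep -v '^[[:space:]]*$' "$1" | tail -n 1 \
    | grep -Eq '^[[:space:]]*exit([[:space:];(]|$)'
}
```

For example: ends_with_exit calc.m || echo "calc.m does not end with exit; MATLAB may hang" >&2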

Perl modules: Can’t locate XX.pm in @INC

Perl modules have been developed over the past 15 to 20 years, and the installation method has changed significantly. Unfortunately, you might run into a program that needs to install a really old Perl module, and its installation is just not behaving properly under the new installation methods. You might see something like the following:

[bfreeman@rclogin12 PfamScan]$ ./pfam_scan.pl --help
Can't locate Data/Printer.pm in @INC (@INC contains: /n/sw/fasrcsw/apps/Core/perl-modules.....

The remedy can be rather simple:
1. Follow our new lmod - Perl instructions here on setting up your home directory for installing Perl modules 'locally'.

Note that the export PERL5LIB command must include both $LOCALPERL and its subdirectory $LOCALPERL/lib/perl5, as some installation routines honor one and some the other.

2. Sometimes, you might need to install the module manually. Try both the Makefile.PL build and the Build.PL build if one or the other doesn't work.

3. In CPAN, you can use this manual install method without handling the download yourself:

cpan
look Data::Printer

This latter command will download the module and unpack it for you, and leave you at the shell, where you can try either the Makefile.PL or Build.PL build process.
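The PERL5LIB requirement and the two build routes can be sketched in shell. Assumptions: LOCALPERL is whatever directory the lmod - Perl instructions have you set up (the default below, $HOME/apps/perl, is a hypothetical placeholder), and build_perl_module is a hypothetical helper you would run inside an unpacked module directory, such as the shell that look leaves you in. INSTALL_BASE (for Makefile.PL) and --install_base (for Build.PL) both install modules under $LOCALPERL/lib/perl5.

```shell
#!/bin/bash
# Assumption: LOCALPERL is your personal Perl prefix; this default is hypothetical.
LOCALPERL=${LOCALPERL:-$HOME/apps/perl}

# Put both the prefix and its lib/perl5 subdirectory on the search path,
# keeping any existing PERL5LIB entries after them.
export PERL5LIB="$LOCALPERL:$LOCALPERL/lib/perl5${PERL5LIB:+:$PERL5LIB}"

# Hypothetical helper: run inside an unpacked module directory.
# Tries the Makefile.PL build first, then falls back to Build.PL.
build_perl_module() {
  if [ -f Makefile.PL ] &&
     perl Makefile.PL INSTALL_BASE="$LOCALPERL" &&
     make && make install; then
    return 0
  fi
  if [ -f Build.PL ]; then
    perl Build.PL --install_base "$LOCALPERL" && ./Build && ./Build install
    return
  fi
  echo "no Makefile.PL or Build.PL build succeeded" >&2
  return 1
}
```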
