In December 2014, we had to perform an emergency repair on one of the servers in the regal scratch filesystem. Although those repairs were successful, it appears that the node in question had more problems than we or our vendor suspected. Because so many of our researchers depend on the regal scratch filesystem to do their work effectively and efficiently, we unfortunately need to perform another emergency maintenance on this system. (We considered doing this last week but felt there was not sufficient time to notify our constituents.)
Our tentative plan is to have our vendor replace the drive shelf (a simple reseat did not fix the problem) on Friday, January 16 at 10 AM. The work will take approximately 2 hours, and regal should return to full production capacity by approximately noon. While this node is being worked on, users of regal may, as before, find certain files temporarily unavailable. A typical directory listing during the work looks like this:
-rw-r--r-- 1 brw rc_admin 1170 Jan 16 2014 deep1-usage.txt
-rw-r--r-- 1 brw rc_admin 1031 Jan 16 2014 ????????
-rw-r--r-- 1 brw rc_admin 597 Jan 16 2014 deep3-usage.txt
-rw-r--r-- 1 brw rc_admin 376 Jan 16 2014 deep4-usage.txt
-rwxr-xr-x 1 root root 120 Dec 2 11:02 ????????
-rw-r--r-- 1 brw rc_admin 24971 Sep 19 14:11 dell_warranty.txt
drwxr-xr-x 3 brw rc_admin 25 Jan 29 2014 desktop
-rwx------ 1 brw rc_admin 690 Oct 22 15:51 get_holyoke_warranties.sh
-rwx------ 1 brw rc_admin 704 Oct 22 15:56 get_service_tag.sh
-rw-r--r-- 1 brw rc_admin 2146951 Dec 2 10:26 label.jpg
-rw-r--r-- 1 brw root 5834 Nov 13 14:30 query.txt
The files that show up as "????????" are on the server that is temporarily unavailable for servicing. As soon as it is brought back online, the file listing will return to its normal appearance and you will have access to your files again. These files may be unavailable for the entire maintenance window, depending on the work being done on regal at that time.
Note that older client systems that access this scratch filesystem may return "I/O error" instead of the display above. As with the above, this is no cause for concern; as soon as the work completes, the file listing will again appear normal. We will be upgrading the Lustre client software in one of the upcoming planned maintenance windows.
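If you would like to check which files in one of your regal directories are affected during the work, something like the following Python sketch may help. It is only an illustration, not a tool we provide: the function name find_unavailable is made up for this example, and it assumes that reading a file's metadata fails with an error (such as the "I/O error" mentioned above) rather than waiting until the server returns, which may not hold on every client.

```python
import errno
import os


def find_unavailable(directory):
    """Return (name, reason) pairs for entries whose metadata cannot be read.

    Hypothetical helper for illustration only. It assumes that lstat()
    fails with an OSError for files on the server being serviced; on
    some clients the call may block instead of failing.
    """
    unavailable = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        try:
            # lstat() reads metadata only; it never opens or modifies the file.
            os.lstat(path)
        except OSError as exc:
            # Translate the numeric error code into its symbolic name, e.g. "EIO".
            reason = errno.errorcode.get(exc.errno, str(exc.errno))
            unavailable.append((name, reason))
    return unavailable
```

Calling find_unavailable on a directory returns an empty list when every entry can be read. Any entries it does report during the maintenance window should become readable again once the work completes.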
None of this work will delete data.
Please contact us with any concerns or questions. Again, we thank you for your cooperation and patience. We are planning on replacing this problematic node in the near future.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.