System and Infrastructure Status News
Delta Projects file system maintenance Thursday March 14th, 2024
Infrastructure News Type: Outage Partial
Affected Infrastructure: delta-cpu.ncsa.access-ci.org, delta-gpu.ncsa.access-ci.org
Start Date: March 14, 2024, 1:00 p.m.
End Date: March 15, 2024, 10:00 p.m.
Update: The maintenance on the Taiga servers that provide the /projects file system on Delta encountered an issue early last night. Delta staff are working with the vendors to resolve the issue and return the file system to service. No estimate is currently available for when /projects will return to service; we will provide updates as new information becomes available. The Delta system continues to run jobs that do not indicate a need for the /projects file system (see below).

Original notice: The Delta projects (/projects) file system will be unavailable on March 14th from 8AM to 8PM for server-side maintenance. The upgraded software will allow quotas to be enforced on the projects file system. We recommend that research teams that are over quota in /projects start cleaning up now to get back under quota.

Two days before the maintenance day, queued jobs that list the projects file system as a Slurm Feature or Constraint will be put on hold. Jobs that do not list /projects or /taiga as a Feature will be allowed to run. If your job requires files in /projects, be sure to add the appropriate constraint as shown below. Instructions for adding the Feature to jobs that have already been submitted are also given below.

Please send questions to help@ncsa.illinois.edu and be sure to mention Delta in the subject or message body.

--Delta Project Office

To indicate at submission time that a job uses the /projects file system, add the --constraint option to your batch script:

#SBATCH --constraint="projects"

or add it on the command line:

$ sbatch --constraint="projects" ...

For a job that has already been submitted but is not yet running, use:

$ scontrol update job=JOBID Features="projects"

replacing JOBID with the appropriate job ID. For more information, see https://docs.ncsa.illinois.edu/systems/delta/en/latest/user_guide/architecture.html#file-system-dependency-specification-for-jobs
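For users with several queued jobs that need /projects, the scontrol step in the notice can be applied in a loop. The function below is a hypothetical sketch, not an NCSA-provided tool: the name `tag_projects_feature` and the `DRY_RUN` switch are assumptions, and it only issues the same `scontrol update job=JOBID Features="projects"` command the notice gives, so it applies only to jobs that are not yet running.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an NCSA tool): add the "projects" Slurm feature to
# one or more pending jobs, using the scontrol syntax from the Delta notice.
# Set DRY_RUN=1 to print the commands instead of running them.
tag_projects_feature() {
  local jobid
  for jobid in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Preview only: show the command that would be run.
      echo "scontrol update job=${jobid} Features=projects"
    else
      scontrol update job="${jobid}" Features="projects"
    fi
  done
}
```

For example, `DRY_RUN=1 tag_projects_feature 1234567 1234568` (hypothetical job IDs) prints the two scontrol commands; without `DRY_RUN=1` they are executed, and `scontrol show job 1234567` can then be used to check the job's Features field.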
Posted: March 20, 2026
Bridges-2 Maintenance Wednesday March 13, 2024
Infrastructure News Type: Outage Full
Affected Infrastructure: bridges2-em.psc.access-ci.org, bridges2-gpu.psc.access-ci.org, bridges2-rm.psc.access-ci.org, bridges2-ocean.psc.access-ci.org
Start Date: March 13, 2024, 8:13 p.m.
End Date: March 13, 2024, 10:30 p.m.
Due to a power outage in the machine room, Bridges-2 is unavailable. Thank you for your patience while we restore full functionality.
Anvil Cluster Maintenance
Infrastructure News Type: Outage Full
Affected Infrastructure: anvil.purdue.access-ci.org, anvil-gpu.purdue.access-ci.org
Start Date: March 12, 2024, 12:00 p.m.
End Date: March 13, 2024, 12:05 a.m.
Update as of March 12th, 2024, 8:05pm EDT: The maintenance has concluded and Anvil has been returned to service. Please submit a ticket through the ACCESS Help Desk at https://support.access-ci.org/open-a-ticket if you have any questions.

Update as of March 12th, 2024, 5:42pm EDT: Engineers are experiencing multiple outages at this time due to a power disruption. As a result, the scheduled Anvil cluster maintenance is being extended until 10pm on Tuesday, March 12th. Please submit a ticket through the ACCESS Help Desk at https://support.access-ci.org/open-a-ticket if you have any questions.

Original News: The Anvil cluster will be unavailable beginning Tuesday, March 12th, 2024 at 8:00am for scheduled maintenance, during which rack and power maintenance will be performed. It will return to full production by 5pm on Tuesday, March 12th. Jobs whose requested walltime would extend past 8:00am on Tuesday, March 12th, 2024 will not start and will remain in the queue until the maintenance is complete. Please submit a ticket through the ACCESS Help Desk at https://support.access-ci.org/open-a-ticket if you have any questions.
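The walltime rule in this notice is a simple comparison: a job can start before the maintenance only if its start time plus its requested walltime does not pass the maintenance start. The sketch below checks that for a hypothetical job. It assumes GNU `date` (for `-d`), walltime given in whole minutes, and both times given in the same time zone; the function name is illustrative, and in practice the decision is made by the Slurm scheduler, which also weighs resource availability.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if a job starting at $2 with a walltime
# of $1 minutes would finish no later than a maintenance window starting at $3.
# Assumes GNU date; both times are parsed the same way (as UTC) so the
# difference between them is what matters.
fits_before_maintenance() {
  local walltime_min=$1 start_s maint_s end_s
  start_s=$(date -u -d "$2" +%s) || return 2
  maint_s=$(date -u -d "$3" +%s) || return 2
  end_s=$(( start_s + walltime_min * 60 ))
  # Exit status 0 means the job ends at or before the maintenance start.
  [ "$end_s" -le "$maint_s" ]
}
```

For example, with maintenance starting at 2024-03-12 08:00, `fits_before_maintenance 240 "2024-03-12 02:00" "2024-03-12 08:00"` succeeds because the job would end at 06:00, while a 480-minute walltime fails because the job would run until 10:00.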
Bridges-2 Maintenance Tuesday, March 5, 2024
Infrastructure News Type: Outage Full
Affected Infrastructure: bridges2-em.psc.access-ci.org, bridges2-gpu.psc.access-ci.org, bridges2-rm.psc.access-ci.org, bridges2-ocean.psc.access-ci.org
Start Date: March 5, 2024, 7:15 p.m.
End Date: March 5, 2024, 8:44 p.m.
Bridges-2 is currently undergoing maintenance to address some filesystem issues. Our team is working on returning the system to full service.
Kerberos Outage
Infrastructure News Type: Outage Full
Affected Infrastructure: kerberos.access-ci.org
Start Date: March 5, 2024, 7:00 p.m.
End Date: March 5, 2024, 10:00 p.m.
The master Kerberos KDC will be upgraded to RHEL 8. During this time, users will not be able to create accounts or change passwords. Authentication should not be affected.
ACCESS Ticket System degraded Feb. 28, 2024 through Mar. 1, 2024
Infrastructure News Type: Outage Partial
Affected Infrastructure: tickets.access-ci.org
Start Date: February 28, 2024, 12:00 p.m.
End Date: March 1, 2024, 4:00 p.m.
A Jira Service Management issue affecting automation rules degraded the ACCESS ticket system. Details: https://jira-service-management.status.atlassian.com/incidents/rrhk1kk4m0kv
Anvil SLURM intermittent issues - Issue isolated, Outage resolved
Infrastructure News Type: Outage Partial
Affected Infrastructure: anvil.purdue.access-ci.org, anvil-gpu.purdue.access-ci.org
Start Date: February 27, 2024, 1:00 p.m.
End Date: February 27, 2024, 10:00 p.m.
Update: The underlying issue has been isolated and the outage is resolved.

Original message: The Anvil cluster began experiencing issues with Slurm scheduling this past week. Engineers are currently diagnosing the root cause and working to identify a fix. Scheduling is still enabled at this time. You may experience periodic Slurm outages during which commands will be unable to connect to the Slurm controller. This can cause jobs to take longer than normal and, in some instances, to fail. In addition, Open OnDemand relies on Slurm to run applications; when these Slurm issues occur, the menu in Open OnDemand may appear empty or non-functional. We will provide another update by 5PM EST today.
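During intermittent controller outages like the one described here, a read-only Slurm query can simply be tried again. The `retry` function below is a generic, hypothetical sketch (not an Anvil-provided tool) that re-runs a command a fixed number of times with a pause between attempts. It is best suited to commands such as `squeue` that are safe to repeat; retrying `sbatch` could submit a job twice if a submission went through but the client still reported an error.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, retrying up to <attempts> times with
# <pause_seconds> between attempts. Returns 0 on the first success, 1 if every
# attempt fails.
# Usage: retry <attempts> <pause_seconds> <command> [args...]
retry() {
  local attempts=$1 pause=$2 n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "retry: '$*' failed after ${n} attempts" >&2
      return 1
    fi
    sleep "$pause"
    n=$(( n + 1 ))
  done
}
```

For example, `retry 5 30 squeue -u "$USER"` tries `squeue` up to five times, 30 seconds apart.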
SDSC Expanse maintenance, 8AM-4PM (PT), Monday, 02/12/2024 [Completed]
Infrastructure News Type: Outage Full
Affected Infrastructure: expanse.sdsc.access-ci.org, expanse-gpu.sdsc.access-ci.org, expanse-ps.sdsc.access-ci.org
Start Date: February 12, 2024, 4:00 p.m.
End Date: February 13, 2024, 12:00 a.m.
>>> Update
The Expanse maintenance is complete, the reservation has been released, and jobs are running. Slurm has been updated to version 23.02.7. Please contact us via the ACCESS ticketing system if you have any questions.

>>> Original message
We will have a maintenance period on Expanse 8AM-5PM (PT), Feb 12, 2024, during which we will update the Slurm scheduler. A reservation is in place to prevent jobs from running during this period. For jobs that cannot finish before the maintenance begins, the "squeue" output will show "ReqNodeNotAvail, Reserved for maintenance". These jobs will run after we release the maintenance reservation.
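To see which of your pending jobs are waiting on the maintenance reservation, the reason column of `squeue` can be filtered for the message quoted in the notice. This sketch assumes the standard `squeue` options `-h` (no header), `-u`, `-t PENDING`, and `-o "%i %R"` (job ID and reason); the filter name `held_for_maintenance` is hypothetical, and the exact reason text could differ slightly between Slurm versions.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read "JOBID REASON" lines on stdin and print the IDs of
# jobs whose reason mentions the maintenance reservation.
held_for_maintenance() {
  grep -F 'Reserved for maintenance' | awk '{ print $1 }'
}

# On a Slurm system, feed it the current user's pending jobs:
#   squeue -h -u "$USER" -t PENDING -o "%i %R" | held_for_maintenance
```

Keeping the filter separate from the `squeue` call means it can be checked against saved output without access to the cluster.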
Anvil Scheduled Maintenance Wednesday, February 7th, 2024
Infrastructure News Type: Outage Full
Affected Infrastructure: anvil.purdue.access-ci.org, anvil-gpu.purdue.access-ci.org
Start Date: February 7, 2024, 1:00 p.m.
End Date: February 8, 2024, 3:00 a.m.
Update: Maintenance is now complete and Anvil has been returned to service.

Update: Anvil maintenance has been extended. We will provide another update by 10PM.

Original message: The Anvil system will be unavailable Wednesday, February 7th, 2024 from 8:00am to 6:00pm EST for scheduled maintenance. Slurm jobs whose requested walltime would extend past 8:00am EST on Wednesday, February 7th, 2024 will not start and will remain in the queue until the maintenance is complete. Anvil will return to full production by 6:00pm EST on Wednesday, February 7th, 2024. Please submit a ticket through the ACCESS Help Desk at https://support.access-ci.org/open-a-ticket if you have any questions.
ACES Scheduled Reconfiguration/Partial Outage Feb 5-15, 2024
Infrastructure News Type: Reconfiguration
Affected Infrastructure: aces.tamu.access-ci.org
Start Date: February 5, 2024, 3:00 p.m.
End Date: February 16, 2024, 12:00 a.m.
From 9am on Monday, February 5 to 6pm on Thursday, February 15, 60 compute nodes, all 30 NVIDIA H100 GPUs, and all Intel PVC GPUs will be unavailable while we redeploy that hardware into four new hardware composability fabrics. The remaining 50 compute nodes on the other four composability fabrics will remain online during this time. All other ACES services, such as account access, data transfer, and job submission, will remain available during this period.
Delta Notice: Delta maintenance 01-23-2024 - 01-25-2024
Infrastructure News Type: Reconfiguration
Affected Infrastructure: delta-cpu.ncsa.access-ci.org, delta-gpu.ncsa.access-ci.org
Start Date: January 23, 2024, 2:00 p.m.
End Date: January 25, 2024, 11:00 p.m.
The Delta resource will undergo maintenance starting at 8:00AM on Tuesday, January 23rd, 2024. During the maintenance, Delta compute nodes will be upgraded with the HPE Cassini network interface card and will boot with an OS image updated to support the new Slingshot11 communication software stack. Please see the Delta Network Upgrade page at https://wiki.ncsa.illinois.edu/display/DSC/Delta+Network+Upgrade for information on the forthcoming changes.

During the maintenance period:
• Jobs will continue to be scheduled to run.
• Compute nodes will be upgraded in batches on Tuesday and Wednesday.
• The dt-login.delta.ncsa.illinois.edu and login.delta.ncsa.illinois.edu aliases will point to the upgraded dt-login03 and dt-login04 login nodes.
• dt-login01 will be rebooted but will remain available as a Slingshot10-configured login node.

Delta resources will be available during the maintenance period:

Delta login nodes
• On Tuesday, users are encouraged to use the dt-login.delta.ncsa.illinois.edu ssh alias, or dt-login03.delta.ncsa.illinois.edu and dt-login04.delta.ncsa.illinois.edu in particular, to begin using compute nodes moved to the Slingshot11 configuration.
• Jobs submitted from dt-login03 and dt-login04 will automatically be assigned to run on upgraded compute nodes.
• dt-login01 will remain available in the Slingshot10 configuration for addressing any porting issues discovered during the upgrade.
• Jobs submitted from dt-login01 will automatically be assigned to run on non-upgraded compute nodes.

Delta compute nodes
• On Tuesday and Wednesday, half of each node type will be upgraded and moved to the Slingshot11 configuration.
• The pool of available upgraded nodes will grow during the day as nodes are returned to service.
• Two compute nodes of each type, except for the gpuA100x8 nodes and the gpuMI100x8 node, will remain on Slingshot10 for addressing any porting issues discovered during the upgrade.

Delta services
• Open OnDemand: will move to supporting Slingshot11 on Tuesday morning after a security software update. Expect a 30-60 minute OnDemand outage between 8:00AM and 9:00AM.
• Delta Globus Online endpoint: available.

Reminder:
• Codes that use OpenMPI or similar on the Slingshot10 nodes will need to be rebuilt to run on the upgraded Slingshot11 nodes.
• Jobs submitted from dt-login01 will only run on the remaining non-upgraded Slingshot10 compute nodes.

A follow-up message will be sent once maintenance is complete. Please send questions to help@ncsa.illinois.edu and be sure to mention Delta in the subject.
Georgia Tech Hive Gateway Scheduled Downtime
Infrastructure News Type: Outage Full
Affected Infrastructure: hive.gatech.access-ci.org
Start Date: January 23, 2024, 12:00 p.m.
End Date: January 26, 2024, 5:59 a.m.
The PACE quarterly maintenance period is scheduled to begin at 6:00AM on Tuesday, 01/23/2024, and to conclude by 11:59PM on Friday, 01/26/2024. As usual, the scheduler will hold jobs whose resource requests would have them running during the maintenance period until after the maintenance period. During the maintenance period, all PACE-managed computational and storage resources will be unavailable. The list of activities to be completed is posted at https://blog.pace.gatech.edu/?p=7778
ACCESS XDMoD Scheduled Downtime
Infrastructure News Type: Outage Full
Affected Infrastructure: xdmod.access-ci.org
Start Date: January 16, 2024, 1:00 p.m.
End Date: January 16, 2024, 11:00 p.m.
ACCESS XDMoD (https://xdmod.access-ci.org/) will be unavailable from approximately 7:00AM to 5:00PM EDT on Tuesday, January 16th, 2024 during a scheduled monthly downtime. The downtime will cause a full outage of both XDMoD and the ACCESS Metrics site, although these services are expected to be unavailable for only a couple of minutes within the full-day downtime window. A follow-up message will be sent when the downtime is complete.
ACCESS XDMoD Scheduled Downtime
Infrastructure News Type: Outage Full
Affected Infrastructure: xdmod.access-ci.org
Start Date: December 19, 2023, 6:00 p.m.
End Date: December 20, 2023, 12:00 a.m.
UPDATE 12/19/23 15:56 EDT: The downtime is complete and ACCESS XDMoD is back up. Thank you for your patience.

ACCESS XDMoD (https://xdmod.access-ci.org/) will be unavailable from approximately 12:00PM to 6:00PM EDT on Tuesday, December 19th, 2023 during a scheduled infrastructure update. This will temporarily be a full outage of the service. A follow-up message will be sent when the update is complete.
Bridges-2 Outage December 19-20
Infrastructure News Type: Outage Full
Affected Infrastructure: bridges2-em.psc.access-ci.org, bridges2-gpu.psc.access-ci.org, bridges2-rm.psc.access-ci.org, bridges2-ocean.psc.access-ci.org
Start Date: December 19, 2023, 12:00 p.m.
End Date: December 21, 2023, 12:00 a.m.
Beginning on Tuesday, December 19 at 6AM Eastern time, the entire PSC machine room (all machines, VMs and filesystems) will be unreachable due to a major networking upgrade. We anticipate that this outage will last until 6PM Eastern time on Wednesday December 20.
SDSC Expanse Maintenance 7AM-Midnight (PT), Monday, Dec 18, 2023 [Completed]
Infrastructure News Type: Outage Full
Affected Infrastructure: expanse.sdsc.access-ci.org, expanse-gpu.sdsc.access-ci.org, expanse-ps.sdsc.access-ci.org
Start Date: December 18, 2023, 3:00 p.m.
End Date: December 19, 2023, 2:30 a.m.
The Slurm scheduler upgrade has been completed on Expanse and the machine is available for use. Slurm was upgraded from version 21.08.8 to 23.02.6. Please note that with this upgrade, the default behaviour of srun has changed. Details of release-specific changes are available at https://github.com/SchedMD/slurm/blob/slurm-23-02-6-1/NEWS

One change in particular might affect some users: srun no longer reads SLURM_CPUS_PER_TASK. This means that the --cpus-per-task value set in the #SBATCH specification will not automatically be picked up by srun commands within the script. Users can either add the --cpus-per-task option to their srun commands or set the following variable before the srun commands:

export SRUN_CPUS_PER_TASK=${SLURM_CPUS_PER_TASK}

No changes are required if your script uses Intel MPI and mpirun. Please contact us either via the ACCESS ticketing system or by email to consult@sdsc.edu if you have any questions.

>>>>>>>>

Dear Expanse User,

We will have a maintenance period on Expanse 7AM-Midnight (PT), Dec 18, 2023. During this maintenance, we will be updating the Slurm scheduler. A reservation is in place to prevent jobs from running during this period. For jobs that cannot finish before the maintenance begins, the "squeue" output will show "ReqNodeNotAvail, Reserved for maintenance". These jobs will run after we release the maintenance reservation.

Thanks,
SDSC User Support Staff
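The SRUN_CPUS_PER_TASK workaround from the notice can be placed in a batch script as sketched below. The resource values and `./my_app` are hypothetical, and the `SLURM_JOB_ID` guard is an assumption added only so the file can be sourced outside a job without launching `srun`; inside a job, sbatch sets `SLURM_CPUS_PER_TASK` from `--cpus-per-task` and the function copies it into the variable srun reads.

```shell
#!/usr/bin/env bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

# Since the Slurm upgrade described above, srun no longer reads
# SLURM_CPUS_PER_TASK. Copy the value sbatch set into SRUN_CPUS_PER_TASK,
# which srun does read. Does nothing if SLURM_CPUS_PER_TASK is unset.
propagate_cpus_per_task() {
  if [ -n "${SLURM_CPUS_PER_TASK:-}" ]; then
    export SRUN_CPUS_PER_TASK="${SLURM_CPUS_PER_TASK}"
  fi
}

# Only launch the (hypothetical) application when running inside a Slurm job.
if [ -n "${SLURM_JOB_ID:-}" ]; then
  propagate_cpus_per_task
  # Equivalent alternative from the notice: pass the option explicitly, e.g.
  #   srun --cpus-per-task="${SLURM_CPUS_PER_TASK}" ./my_app
  srun ./my_app
fi
```

Exporting the variable once covers every srun line that follows in the script, whereas the explicit-option approach has to be repeated on each srun command.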
ACCESS XDMoD Scheduled Downtime
Infrastructure News Type: Outage Partial
Affected Infrastructure: xdmod.access-ci.org
Start Date: December 14, 2023, 2:00 p.m.
End Date: December 14, 2023, 5:00 p.m.
UPDATE 12/14/23 13:39 EDT: ACCESS XDMoD is now up; however, some features might not be available for a couple more hours. Thank you for your patience.

ACCESS XDMoD (https://xdmod.access-ci.org/) will be unavailable from approximately 10:00AM to 1:00PM EDT on Thursday, December 14th, 2023 during a scheduled infrastructure update. This will temporarily be a full outage of the service. A follow-up message will be sent when the update is complete.
ACCESS Web Login Partial Outage October 31, 2023
Infrastructure News Type: Outage Partial
Affected Infrastructure: identity.access-ci.org
Start Date: October 31, 2023, 2:15 p.m.
End Date: November 2, 2023, 1:00 p.m.
ACCESS Web Login was failing for some users due to LDAP corruption. The issue has been fixed.
Anvil Unplanned Outage
Infrastructure News Type: Outage Full
Affected Infrastructure: anvil.purdue.access-ci.org, anvil-gpu.purdue.access-ci.org
Start Date: October 17, 2023, 1:45 p.m.
End Date: October 17, 2023, 2:55 p.m.
Dear Anvil user, Anvil nodes experienced a brief outage this morning. The problem is resolved and nodes are online now. Please check the status of your job and resubmit if necessary.
Delta /projects file system temporarily unavailable
Infrastructure News Type: Outage Partial
Affected Infrastructure: delta-storage.ncsa.access-ci.org
Start Date: October 9, 2023, 6:30 p.m.
End Date: October 9, 2023, 9:05 p.m.
The Delta /projects file system currently has an issue that has taken part of it down, rendering it unresponsive. NCSA is working with the vendor to determine the problem and resolution but does not yet have an ETA for repair. We have removed the projects and taiga constraints from the scheduling configuration, so new jobs requesting those constraints will not start. At this time, any attempt to access files on /projects will hang, which may also affect logins. A follow-up message will be sent once the repair is complete.