System and Infrastructure Status News

Delta and DeltaAI file system outage: /projects and /taiga

Published

Infrastructure News Type: Outage Partial

Affected Infrastructure: delta-cpu.ncsa.access-ci.org, delta-gpu.ncsa.access-ci.org

Start Date: July 17, 2025, 12:00 p.m.

End Date: July 18, 2025, 3:00 a.m.

On July 17, 2025, from 7:00 AM to 10:00 PM, the /projects and /taiga file systems will not be available due to planned maintenance. Files in /projects or /taiga will not be accessible during the maintenance window.

As the maintenance day approaches, we recommend that jobs which do not need access to files in /projects or /taiga use a special reservation. Jobs that are submitted to the special "no projects or taiga" reservation and do not specify the /projects or /taiga file systems as a Feature or constraint will be allowed to run during the /projects and /taiga file system maintenance. Jobs in this reservation can run before the reservation becomes active next week. We recommend using the no_projects_taiga_requirements reservation in advance of the maintenance day.

To submit new jobs to the special reservation, add the following to the job script:

    #SBATCH --reservation=no_projects_taiga_requirements

or use the command line option, as in:

    $ sbatch --reservation=no_projects_taiga_requirements ...

For jobs that are already submitted but not running, and that might be scheduled to run during the maintenance day, use scontrol as follows to add the job to the reservation:

    $ scontrol update reservation=no_projects_taiga_requirements job=JOBID

where JOBID is the Slurm job ID of the existing job. To verify the change, use scontrol again as follows:

    $ scontrol show job JOBID | grep -i Reservation

and you should see:

    Reservation=no_projects_taiga_requirements

In general, we recommend using Slurm's constraint and feature options to tell the job scheduler which jobs depend on a given file system, including the projects or Taiga file system. Jobs that have the projects or taiga file system as a Slurm Feature or constraint will be put on hold 2 days before the maintenance start time. See below for information on how to specify file systems as a job constraint for new jobs and as a Feature for already submitted jobs.
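When several pending jobs need to be moved, the scontrol step above can be scripted. The following is a minimal sketch, not an official NCSA tool: the function name print_reservation_updates is hypothetical, and it only prints the scontrol command quoted in the notice for each numeric job ID so the commands can be reviewed before anything is changed.

```shell
#!/usr/bin/env bash
# Sketch: print (do not run) the scontrol command from the notice for each
# job ID passed as an argument. The reservation name is taken from the notice;
# the function name is a hypothetical helper, not part of Slurm.
RESERVATION="no_projects_taiga_requirements"

print_reservation_updates() {
    local jobid
    for jobid in "$@"; do
        # Accept only plain numeric job IDs; report anything else on stderr.
        case "$jobid" in
            ''|*[!0-9]*)
                echo "skipping invalid job id: $jobid" >&2
                continue
                ;;
        esac
        echo "scontrol update reservation=${RESERVATION} job=${jobid}"
    done
}
```

For example, `print_reservation_updates 1234567 1234568` prints two commands (the job IDs here are made up). Once reviewed, the output can be piped to `sh` on a login node to apply the changes. Array-job IDs that are not purely numeric are skipped by this sketch and would need to be handled by hand.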

Posted: March 20, 2026

Bridges-2 Maintenance

Published

Infrastructure News Type: Outage Partial

Affected Infrastructure: bridges2-gpu.psc.access-ci.org

Start Date: July 17, 2025, 2:00 a.m.

End Date: July 17, 2025, 5:00 p.m.

Due to a lightning storm in the area, Bridges-2 has experienced some issues this evening. Most of the machine has been restored but some partitions will remain unavailable until the morning when we have additional staff on site. Thank you for your patience while we restore all services.

Posted: March 20, 2026

ACES filesystem issues

Published

Infrastructure News Type: Outage Partial

Affected Infrastructure: aces.tamu.access-ci.org

Start Date: July 11, 2025, 2:55 p.m.

End Date: July 11, 2025, 6:00 p.m.

UPDATE: The degraded storage server was recovered around 11:50 a.m. We are continuing to monitor the ACES filesystem for any further issues.

We are currently seeing degradation on one of the Lustre storage servers. This is leading to slow filesystem access and impacting the responsiveness of the Slurm job scheduler. We will update once the issue is resolved.

Posted: March 20, 2026

SDSC Expanse Lustre filesystem OSS issue [Resolved]

Published

Infrastructure News Type: Outage Partial

Affected Infrastructure: expanse.sdsc.access-ci.org, expanse-gpu.sdsc.access-ci.org, expanse-ps.sdsc.access-ci.org

Start Date: July 8, 2025, 1:00 a.m.

End Date: July 8, 2025, 5:00 p.m.

>>> Update

Dear Expanse User,

We resolved the Lustre OSS issues yesterday and have been monitoring the filesystem for any further issues. Separately, this morning (7/9/2025) we had a short disruption of Slurm job submissions due to a down system service, and that has also been resolved.

Thanks,
SDSC User Services

>>>

Dear Expanse User,

We are currently seeing high load and timeouts on one of the Expanse Lustre filesystem object storage servers (OSSs). This might lead to access issues on files that are striped onto storage targets on this OSS. We are looking into the issue and will update once it is resolved.

Thanks,
SDSC User Services Staff

Posted: March 20, 2026

ACES Lustre filesystem issues

Published

Infrastructure News Type: Outage Partial

Affected Infrastructure: aces.tamu.access-ci.org

Start Date: July 1, 2025, 8:00 p.m.

End Date: July 2, 2025, 12:05 a.m.

We are currently seeing degradation on one of the Lustre storage servers. This is leading to slow filesystem access and impacting the responsiveness of the Slurm job scheduler. We will update once the issue is resolved.

The Lustre filesystem has been recovered. We are monitoring the storage for any further issues.

Posted: March 20, 2026

Premium and Enterprise plans for all Atlassian products will see full rollout of new navigation on July 7.

Published

Infrastructure News Type: Reconfiguration

Affected Infrastructure: tickets.access-ci.org

Start Date: June 25, 2025, 6:00 p.m.

End Date: July 7, 2025, 1:00 p.m.

Premium and Enterprise plans for all Atlassian products will see full rollout of new navigation on July 7 (https://support.atlassian.com/navigation/docs/manage-the-navigation-rollout/).

Posted: March 20, 2026

idp.access-ci.org Updated

Published

Infrastructure News Type: Reconfiguration

Affected Infrastructure: identity.access-ci.org

Start Date: June 23, 2025, 6:00 p.m.

End Date: June 23, 2025, 6:30 p.m.

On June 23, 2025, the ACCESS Identity Provider (https://idp.access-ci.org/idp) (idp.access-ci.org) was updated to address several critical Tomcat vulnerabilities (https://www.cert-in.org.in/s2cMainServlet?pageid=PUBVLNOTES01&VLCODE=CIVN-2025-0129).

Posted: March 20, 2026

Delta/DeltaAI full system outage 6/18

Published

Infrastructure News Type: Outage Full

Affected Infrastructure: delta-cpu.ncsa.access-ci.org, delta-gpu.ncsa.access-ci.org

Start Date: June 18, 2025, 12:00 p.m.

End Date: June 19, 2025, 6:00 p.m.

Underlying network configuration issues encountered during maintenance are resolved and the systems are back in service.

----

The network maintenance on Delta encountered an issue with the network that was discovered at final checkout. The Delta login nodes will be available but could become unavailable at any time. The maintenance reservation will prevent the scheduler from running jobs.

----

Delta and DeltaAI users,

ALL Delta and DeltaAI services will be down next Wednesday, 6/18, from 7AM to 7PM central time. NO Delta or DeltaAI services will be available during the outage, including: logins, computes, data transfer nodes, Open OnDemand.

During the outage the core high-speed network will have a software upgrade and reconfiguration to fully integrate the last compute hardware added to the system into the proper intended configuration. This upgrade will address some underlying issues in the network fabric to improve its performance and reliability, but does not include software changes on the clients, so it is expected to be transparent to users.

If you have questions, please open a ticket with https://help.ncsa.illinois.edu/ or help@ncsa.illinois.edu.

The Delta Project

Posted: March 20, 2026

SDSC Expanse Lustre filesystem issues

Published

Infrastructure News Type: Outage Partial

Affected Infrastructure: expanse.sdsc.access-ci.org, expanse-gpu.sdsc.access-ci.org, expanse-ps.sdsc.access-ci.org

Start Date: June 18, 2025, 8:00 a.m.

End Date: June 18, 2025, 4:00 p.m.

Dear Expanse User, We are currently seeing connectivity issues to one of the Lustre filesystem object storage servers (OSSs). This is leading to timeouts and access issues for files that are striped onto this OSS. We will update once the issue is resolved. Thanks SDSC User Services Staff

Posted: March 20, 2026

Bridges-2 and Neocortex Maintenance Monday, June 16, 2025

Published

Infrastructure News Type: Outage Full

Affected Infrastructure: bridges2-em.psc.access-ci.org, bridges2-gpu.psc.access-ci.org, bridges2-rm.psc.access-ci.org, bridges2-ocean.psc.access-ci.org, neocortex-sdflex.psc.access-ci.org

Start Date: June 16, 2025, 1:00 p.m.

End Date: June 16, 2025, 11:00 p.m.

Bridges-2, including all VMs and filesystems, as well as Neocortex, will be unavailable due to scheduled maintenance starting on Monday, June 16, 2025 at 8am Eastern Time. We anticipate that the system will return by 6pm Eastern Time. During this time, you will be unable to access the system. The Slurm queue will be preserved and queued jobs will begin running once the machine has returned to service. Please direct any questions to help@psc.edu and our team will be happy to assist you. Thank you, PSC

Posted: March 20, 2026

Anvil power outage

Published

Infrastructure News Type: Outage Full

Affected Infrastructure: anvil.purdue.access-ci.org, anvil-gpu.purdue.access-ci.org

Start Date: June 14, 2025, 4:30 p.m.

End Date: June 14, 2025, 7:40 p.m.

Update 3:40PM EDT, June 14, 2025: Our engineers have brought Anvil back to full service. If you have any questions about this outage, please submit a ticket through ACCESS Help Desk at https://support.access-ci.org/help-ticket. Thank you.

Original Post: Shortly after 12:30pm June 14, 2025, we had a major power outage at our data center at Purdue. Anvil has been impacted and will be offline until power is resumed. Our engineering team is working closely with campus power engineers to bring the power and Anvil back. We apologize for any inconvenience it might have caused. There is no ETA yet, but we will provide an update as soon as we have one. Please submit a ticket through ACCESS Help Desk at https://support.access-ci.org/help-ticket if you have any questions.

Posted: March 20, 2026

Decommission ACCESS User Registry LDAP Servers

Published

Infrastructure News Type: Reconfiguration

Affected Infrastructure: registry.access-ci.org

Start Date: June 11, 2025, 1:00 p.m.

End Date: June 11, 2025, 2:00 p.m.

On June 11, 2025, the LDAP servers previously used by the ACCESS User Registry (https://registry.access-ci.org/) will be decommissioned. No downtime is expected. (For previous related news, see https://operations.access-ci.org/node/842 .)

On April 9, 2025, the ACCESS SSH Pubkey Downloader was reconfigured to use DynamoDB instead of LDAP. On April 24, 2025, the LDAP provisioner was removed from the ACCESS User Registry (https://registry.access-ci.org/) in favor of storing user attributes in DynamoDB (https://aws.amazon.com/dynamodb/). Thus, the ACCESS LDAP servers are no longer necessary.

The LDAP servers for the DEV (https://registry-dev.access-ci.org/) and TEST (https://registry-test.access-ci.org/) user registries have already been decommissioned. This final step will decommission the LDAP servers for the PROD user registry (https://registry.access-ci.org/).

Posted: March 20, 2026

Bridges-2 Unscheduled Maintenance, Wednesday June 11, 2025

Published

Infrastructure News Type: Outage Full

Affected Infrastructure: bridges2-em.psc.access-ci.org, bridges2-gpu.psc.access-ci.org, bridges2-rm.psc.access-ci.org, bridges2-ocean.psc.access-ci.org

Start Date: June 11, 2025, 5:01 a.m.

End Date: June 11, 2025, 5:05 p.m.

Bridges-2 and Neocortex returned to service shortly after noon Eastern time.

===============================================================

At around midnight Eastern time, the Pittsburgh Supercomputing Center machine room experienced a power disruption which caused many of the systems to fail. Our team is working to restore all systems to full capacity. Jobs which failed due to the incident will be automatically refunded. Thank you for your patience while we work through this issue. If you have any specific questions or problems, please contact us by sending email to help@psc.edu.

Posted: March 20, 2026

[Resolved] Anvil Network Interruption

Published

Infrastructure News Type: Outage Partial

Affected Infrastructure: anvil.purdue.access-ci.org, anvil-gpu.purdue.access-ci.org

Start Date: June 10, 2025, 2:00 p.m.

End Date: June 10, 2025, 11:30 p.m.

Starting at 10:00 AM EST, Anvil began experiencing a network interruption. This may impact access to services such as Open OnDemand and Remote Desktop. Additional impacts are still being assessed. At this time, we do not have an estimated time for service restoration. We will provide updates as more information becomes available. Thank you for your patience and understanding.

Updates: The issue was resolved at 7:30 PM EST on June 10. All services should now be functioning normally. If you still experience any problems, please feel free to reach out to the Anvil support team through ACCESS Help Desk (https://support.access-ci.org/help-ticket).

Posted: March 20, 2026

ACES Maintenance - June 4-5

Published

Infrastructure News Type: Outage Full

Affected Infrastructure: aces.tamu.access-ci.org

Start Date: June 4, 2025, 2:00 p.m.

End Date: June 5, 2025, 5:00 p.m.

The ACES cluster will be unavailable during maintenance from 9am to 8pm CDT on Wednesday June 4. A reservation is in place to prevent jobs from running past the start time of the maintenance period. The maintenance is currently extended to 12pm CDT Thursday.

Posted: March 20, 2026

SDSC Machine room power outage [Expanse, Voyager returned to production]

Published

Infrastructure News Type: Outage Full

Affected Infrastructure: expanse.sdsc.access-ci.org, expanse-gpu.sdsc.access-ci.org, expanse-ps.sdsc.access-ci.org

Start Date: May 31, 2025, 9:30 p.m.

End Date: June 2, 2025, 6:30 a.m.

>>>>> Update 1

Dear Expanse User,

Expanse was put back in production after recovery from the power outage we had in the machine room (and UCSD wide). The machine is available for use and running jobs. Voyager has also been brought back into production.

Thanks,
SDSC User Services Staff

>>>>>

There was a power outage at UCSD that impacted the SDSC machine room. The systems at SDSC (Expanse, Voyager) are currently down. We will update once they are brought up and accessible.

Posted: March 20, 2026

Upcoming Changes to the Ticketing System Portal Forms

Published

Infrastructure News Type: Outage Partial

Affected Infrastructure: tickets.access-ci.org

Start Date: May 29, 2025, 2:00 p.m.

End Date: June 5, 2025, 2:00 p.m.

Hi Everyone,

The ACCESS Operations team is working on optimizing the ticketing system by streamlining the dropdown options available on portal forms. These improvements are intended to enhance the user experience and make ticket categorization more intuitive.

We are planning to roll out these changes on June 6, and the implementation window will take place from May 29 to June 5. During this period, there may be minor outages or brief disruptions.

Updated Operational Support Issues:
- Allocations (includes AMIE, XRAS, etc.)
- Support (includes OnDemand, Pegasus, Knowledge Base, Affinity Groups, Events, Announcements, Ask.CI, etc.)
- Security and Authentication (includes IAM, policies, etc.)
- Networking and Data Transfer (includes SSL, DNS, CONECTnet, etc.)
- Operations Infrastructure (includes monitoring, logging, GitHub, etc.)
- Operations Software and Online Services (includes CiDeR, portal, dashboard, API, etc.)
- Metrics
- ACCESS Communications and Collaboration Tools
- Ticket System
- Resource Integration
- Some Other Question

Please note that minor updates to queues and watcher groups will also be included as part of this change.

We appreciate your patience and understanding as we work to improve the system. We'll keep you informed with any further updates.

Thanks & Regards,
Dinuka (On Behalf of ACCESS Operations)

Posted: March 20, 2026

SDSC Expanse Lustre filesystem issues (update)

Published

Infrastructure News Type: Outage Partial

Affected Infrastructure: expanse.sdsc.access-ci.org, expanse-gpu.sdsc.access-ci.org, expanse-ps.sdsc.access-ci.org

Start Date: May 25, 2025, 10:45 p.m.

End Date: May 29, 2025, 12:00 a.m.

>>> Update #3

Dear Expanse User,

We have remounted the Lustre filesystem on Expanse excluding the OST that is having problems. This should prevent the hung tasks that were causing higher loads on the login nodes. We continue to work on the OST with issues and will update once it is returned to service. In the interim, any old files that were striped onto the OST will fail on reads. New I/O to the filesystem will target healthy OSTs.

Thanks,
SDSC User Services Staff

>>> Update #2

Dear Expanse User,

We are still working on the one OST from the Expanse Lustre filesystem that is failing to mount. This is making all files/directories that are striped onto this OST unavailable. Please note that this will also cause full file listings to hang, so please refrain from doing full metadata listings on Lustre directories. All other OSTs on the filesystem are usable and new I/O will automatically avoid the problem OST. We will update once the OST with issues is restored.

Thanks,
SDSC User Services Staff

>>> Update

Dear Expanse User,

We brought the two OSSs online last night but there is still one storage target on one of them that needs more work to recover. We are continuing to look at the issue and will update again once the filesystem is back.

Thanks,
SDSC User Services

>>>

Dear Expanse User,

We are currently seeing problems with two object storage servers (OSSs) that are part of the Expanse Lustre filesystem. This is causing access issues on files that are striped on these servers. Please refrain from doing full metadata listings on Lustre directories, as chances are you will access a file that is on one of the OSSs and the commands might hang. We are working on resolving the problem and will update once the OSSs are back in service.

Thanks,
SDSC User Services Staff

Posted: March 20, 2026

All Jira products experiencing degradations

Published

Infrastructure News Type: Degraded

Affected Infrastructure: tickets.access-ci.org

Start Date: May 20, 2025, 5:30 p.m.

End Date: May 23, 2025, 1:30 p.m.

Atlassian is investigating cases of degraded performance for all Jira Work Management, Jira Service Management, Jira, and Jira Product Discovery Cloud customers. Check their status page (https://jira-service-management.status.atlassian.com/incidents/nmg7dw0vwtr7) for current info.

Posted: March 20, 2026

Launch Maintenance - May 7

Published

Infrastructure News Type: Outage Full

Affected Infrastructure: launch.tamu.access-ci.org

Start Date: May 7, 2025, 1:00 p.m.

End Date: May 7, 2025, 10:00 p.m.

The TAMU Launch cluster will be down for maintenance on Wednesday, May 7 from 8:00AM CDT to 5:00PM CDT.

Posted: March 20, 2026