Data Centers: Managing Operational Risks

by John Jenkins

March 27, 2025

The boom in AI is being powered by unparalleled growth in data centers, and it’s essential that those facilities remain online and fully functional. Unfortunately, data centers face a wide range of operational risks. These include sudden power loss that corrupts data, damage to circuits and components, other hardware failures or malfunctions, cooling system interruptions, fires, and unplanned outages that cause extended service disruptions. This excerpt from a recent WTW memo offers recommendations for mitigating some of these data center operating risks:

– Backup power systems should be designed to match the reliability standards of hospitals or nuclear facilities, where electric supply reliability is similarly critical. Redundant power supplies should be incorporated to avoid single points of failure by using separately fed grid supplies from independent sources to minimize the occurrence of simultaneous supply failures. Emergency backup power sources, such as internal combustion engine-generators or simple cycle gas turbine generators, should be connected in parallel to the primary supply to ensure uninterrupted power during primary supply outages.

– Fire risk can be mitigated by minimizing the use of combustible materials in the design, using clean agents for fire suppression to avoid damaging data center equipment, and strictly maintaining electrical systems.

– Cooling systems should be designed with redundant, dual systems to prevent single points of failure. Uninterruptible Power Supplies (UPS) should be provided for short-term power (less than an hour) to allow controlled shutdowns, minimizing data loss or equipment issues if both primary and backup power are lost.

– UPS and backup power systems should be tested regularly, and the local grid supplier should be audited periodically. Continuous monitoring and alarm systems should be implemented both locally and offsite to ensure prompt response to any issues.

– Cybersecurity measures should be implemented, especially during construction and assembly when equipment is easily accessed, and applicable parts of NERC’s CIP cybersecurity standards should be considered. Finally, periodic cyber risk audits should be conducted, and staff should receive ongoing cybersecurity training so that they stay aware of the latest threats and best practices.