Set up emergency shutdown for AC failure #87

Open
opened 2025-02-24 12:41:06 +01:00 by rarias · 0 comments
Owner

Since 2025-02-23 at 20:00, the temperature of the compute room has been increasing, from the usual 21 degrees to 30 in 11 hours, as can be seen in the front air temperature graph:

![image](/attachments/03cd959f-dd18-4fc4-b420-e0b763f23d36)

https://jungle.bsc.es/grafana/d/EKkKVvLVz/hut?orgId=1&from=2025-02-22T15:16:38.091Z&to=2025-02-24T11:36:02.906Z&timezone=browser&refresh=5s

As the temperature of the nodes has stayed under the limit, no alarm has fired. We should add an emergency mechanism that reports a problem with the AC and shuts down the nodes to prevent heat damage to components.

No report has been issued from UPC/BSC.
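
A minimal sketch of the decision logic such a mechanism could use, assuming it is fed the front air temperature readings shown in the graph. The names `Limits`, `Action` and `decide`, the 25 and 30 degree thresholds, and the three-reading window are all hypothetical placeholders chosen around the values in this issue, not an agreed design. Reading the sensor, sending the report and actually powering off the nodes are left out on purpose.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    OK = "ok"              # nothing to do
    ALERT = "alert"        # report a probable AC problem
    SHUTDOWN = "shutdown"  # power off the nodes


@dataclass(frozen=True)
class Limits:
    # Hypothetical thresholds, in the same unit as the room temperature graph.
    alert_at: float = 25.0
    shutdown_at: float = 30.0
    # Consecutive readings that must exceed a threshold, so that a single
    # noisy sample does not trigger an alert or a shutdown.
    consecutive: int = 3


def decide(readings, limits=Limits()):
    """Return the action for a series of room temperature readings (oldest first)."""
    recent = list(readings)[-limits.consecutive:]
    if len(recent) < limits.consecutive:
        # Not enough data yet to make a decision.
        return Action.OK
    if all(t >= limits.shutdown_at for t in recent):
        return Action.SHUTDOWN
    if all(t >= limits.alert_at for t in recent):
        return Action.ALERT
    return Action.OK
```

The point of keying on the room air temperature rather than the node temperature is that it would have fired in this incident, where the nodes stayed under their own limit. Keeping the decision as a pure function makes the thresholds easy to test before wiring it to anything that can power off machines.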

rarias added the hw label 2025-02-24 12:41:06 +01:00
Reference: rarias/jungle#87