Node owl2 reaches high temperatures due to slow fans #70

Open
opened 2024-07-19 16:56:29 +02:00 by rarias · 2 comments
Owner

The node owl2 is reaching a high temperature when running some workloads (in yellow):

image

This seems to be caused by the slow speed of the fans, which top out at around 5000 RPM instead of ramping up to the same level as on the other nodes, around 8000 RPM (the maximum speed is 20000 RPM).
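To compare fan speeds across nodes without reading the graphs, the fan readings can be pulled from the BMC. The following is a minimal sketch that assumes the readings come from `ipmitool sensor`, which prints pipe-separated columns beginning with name, reading, and unit. The sensor names and values in `SAMPLE` are made up for illustration and are not taken from owl2.

```python
def parse_fan_rpm(sensor_output: str) -> dict[str, float]:
    """Extract fan readings (in RPM) from `ipmitool sensor`-style text."""
    fans = {}
    for line in sensor_output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        # Expect at least: name | reading | unit
        if len(fields) < 3 or fields[2] != "RPM":
            continue
        try:
            fans[fields[0]] = float(fields[1])
        except ValueError:
            # The reading is "na" when the sensor is absent or unreadable.
            continue
    return fans


def slow_fans(fans: dict[str, float], expected_rpm: float) -> list[str]:
    """Return the names of fans spinning below expected_rpm, sorted."""
    return sorted(name for name, rpm in fans.items() if rpm < expected_rpm)


# Hypothetical sample output; real sensor names differ between boards.
SAMPLE = """\
System Fan 1 | 5040.000 | RPM       | ok | na | 1000.000 | na | na | na | na
System Fan 2 | 4980.000 | RPM       | ok | na | 1000.000 | na | na | na | na
System Fan 3 | na       | RPM       | na | na | 1000.000 | na | na | na | na
CPU1 Temp    | 78.000   | degrees C | ok | na | na       | na | na | na | na
"""

if __name__ == "__main__":
    readings = parse_fan_rpm(SAMPLE)
    # 8000 RPM is the level the other nodes reach under load.
    print(slow_fans(readings, expected_rpm=8000))
```

On a real node the text would come from running `ipmitool sensor` (for example through `subprocess.run`) instead of `SAMPLE`.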

After increasing the lower critical limit of one fan, the BMC puts all fans at maximum speed, which drastically reduces the temperature, but it wastes a lot of power and is not required:

image
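For reference, the threshold workaround above can be sketched as follows. This assumes the limit was changed with `ipmitool sensor thresh <sensor> lcr <value>`; the sensor name and the 6000 RPM value are hypothetical. Presumably the BMC goes to maximum speed because the fan reading then falls below its lower critical limit, but that is a guess. The helper only builds the command, it does not run it.

```python
import shlex


def lower_critical_cmd(sensor: str, rpm: int) -> list[str]:
    """Build the ipmitool command that sets a sensor's lower critical limit."""
    if rpm < 0:
        raise ValueError("RPM threshold must not be negative")
    return ["ipmitool", "sensor", "thresh", sensor, "lcr", str(rpm)]


if __name__ == "__main__":
    # Hypothetical sensor name and value, chosen above the ~5000 RPM
    # that the fans reach on owl2.
    cmd = lower_critical_cmd("System Fan 1", 6000)
    print(shlex.join(cmd))  # shell-quoted form, ready to paste
```

Returning an argument list rather than a string keeps sensor names with spaces intact if the command is later passed to `subprocess.run`.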

I suspect there may be a BIOS setting for this node that makes the cooling more aggressive, but I haven't tested it: https://www.intel.com/content/www/us/en/support/articles/000060057/server-products/server-boards.html

rarias added the hw label 2024-07-19 16:56:29 +02:00
Author
Owner

So, this seems to be the problem:

/------------------------------------------------------------------------------\
|                System Acoustic and Performance Configuration                 |
\------------------------------------------------------------------------------/

   Set Fan Profile            <Acoustic>                 [Performance] - Fan
   Fan PWM Offset             [0]                        control provides
                                                         primary system
                                                         cooling before
                                                         attempting to
                                                         throttle memory.
                                                         [Acoustic] - The
                                                         system will favor
                                                         using throttling of
                                                         memory over boosting
                                                         fans to cool the
                                                         system if thermal
                                                         thresholds are met.



/------------------------------------------------------------------------------\
|                         F10=Save Changes          F9=Reset to Defaults       |
| ^v=Move Highlight       <Enter>=Select Entry      Esc=Exit                   |
\-----------------Copyright (c) 2010-2016, Intel Corporation-------------------/

It was configured as "Acoustic". Let's try switching it to "Performance" and see if it helps. If it doesn't, the PWM offset can also be increased:

/------------------------------------------------------------------------------\
|                System Acoustic and Performance Configuration                 |
\------------------------------------------------------------------------------/

   Set Fan Profile            <Acoustic>                 Valid Offset 0-100.
   Fan PWM Offset             [0]                        This number is added
                                                         to the calculated PWM
                                                         value to increase Fan
                                                         Speed











/------------------------------------------------------------------------------\
| +/- =Adjust Value       F10=Save Changes          F9=Reset to Defaults       |
| ^v=Move Highlight       <Enter>=Select Entry      Esc=Exit                   |
\-----------------Copyright (c) 2010-2016, Intel Corporation-------------------/
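The help text for Fan PWM Offset says the number is added to the calculated PWM value. A minimal sketch of that arithmetic, assuming the PWM value is a duty-cycle percentage and that the result is capped at 100 (the BIOS screen states neither, so both are assumptions):

```python
def apply_pwm_offset(calculated_pwm: int, offset: int) -> int:
    """Add the BIOS fan PWM offset to the value computed by the fan control.

    Assumes both values are percentages and that the sum saturates at 100.
    """
    if not 0 <= offset <= 100:
        raise ValueError("offset must be in the valid range 0-100")
    if not 0 <= calculated_pwm <= 100:
        raise ValueError("calculated PWM must be a percentage (assumption)")
    return min(100, calculated_pwm + offset)


if __name__ == "__main__":
    # Hypothetical values: the fan control asks for 40 %, the offset adds 20.
    print(apply_pwm_offset(40, 20))
```

Under these assumptions, an offset raises the fan speed at every temperature, whereas the fan profile changes when the fans are boosted relative to memory throttling.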
Author
Owner

Setting it to "Performance" makes the fans stay at 10000 RPM, and the temperature is fine under load:

image

Reference: rarias/jungle#70