
Check and Monitor CPU Temperature

Publication Date: 27 Sep 2024
WHAT?

Step-by-step explanation of how to configure CPU temperature monitoring.

WHY?

You want to reduce your electricity bill and make sure that the hardware runs optimally.

EFFORT

10 minutes to read the article and 20 minutes to install and configure the required tool.

GOAL

Put in place a mechanism for checking and monitoring CPU temperature.

Requirements
  • Root permissions to install the required package

  • The package sensors

1 Introduction

Checking and monitoring CPU temperature has several benefits.

  • Energy savings and cost reduction. When a CPU runs at full speed, it consumes more energy than when it is idling. Keeping CPUs cool is also a significant cost factor, especially in data centers.

  • Identifying and monitoring processes that consume too much CPU power. Doing that can help to free your CPU resources and increase the CPU's responsiveness.

  • Easier detection of cooling issues. If the CPU temperature reaches 80°C or higher, it may indicate a problem with the cooling system or the fan, or that the thermal paste was not applied correctly.

  • A long-term reduction of the carbon footprint can be achieved by adjusting the cooling parameters.

Note: Specific architectures

The sensors package is available on all architectures except IBM Z.

2 Installing and configuring hardware sensors

To measure the CPU temperature, install and configure the sensors tool that can access and read the hardware sensors.

  1. Install the required package:

    > sudo zypper install sensors
  2. To detect all the sensors in the system, run the following command as root:

    > sudo sensors-detect --auto

    The --auto option runs the script non-interactively and accepts the default answer to every question, so you do not have to confirm each probe one by one. When finished, the script shows a summary of what chips were detected:

    Now follows a summary of the probes I have just done.
    
    Driver `coretemp':
     * Chip `Intel digital thermal sensor' (confidence: 9)
    
    Driver `to-be-written':
     * ISA bus, address 0xa40
       Chip `ITE IT8686E Super IO Sensors' (confidence: 9)
    
    Do you want to generate /etc/sysconfig/lm_sensors? (YES/no):
  3. Confirm the generation of the file /etc/sysconfig/lm_sensors. After confirmation, the script creates a systemd service (/usr/lib/systemd/system/lm_sensors.service) that is enabled by default.

Check the status of the systemd service:

> sudo systemctl status lm_sensors
● lm_sensors.service - Initialize hardware monitoring sensors
  Loaded: loaded (/usr/lib/systemd/system/lm_sensors.service; enabled; vendor preset: disabled)
  Active: active (exited) since Fri 2021-09-10 16:57:55 CEST; 2min 23s ago
 Process: 32552 ExecStart=/usr/bin/sensors -s (code=exited, status=0/SUCCESS)
 Process: 32551 ExecStart=/sbin/modprobe -qab $BUS_MODULES $HWMON_MODULES (code=exited, status=0/SUCCESS)
Main PID: 32552 (code=exited, status=0/SUCCESS)
   Tasks: 0
  CGroup: /system.slice/lm_sensors.service

Sep 10 16:57:55 edison systemd[1]: Starting Initialize hardware monitoring sensors...
Sep 10 16:57:55 edison systemd[1]: Started Initialize hardware monitoring sensors.

After you have completed these steps, your computer has detected all sensors and has started to monitor them.
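
The status output above shows the service loading the modules named in $BUS_MODULES and $HWMON_MODULES. As a minimal sketch, assuming that /etc/sysconfig/lm_sensors consists of shell-style variable assignments, the following helper lists the modules the file configures. The function name configured_hwmon_modules is hypothetical and not part of the sensors package.

```shell
# Hypothetical helper: print the kernel modules configured by sensors-detect,
# one per line. Assumes the file contains shell-style assignments such as
# HWMON_MODULES="coretemp".
configured_hwmon_modules() {
    conf=${1:-/etc/sysconfig/lm_sensors}
    [ -r "$conf" ] || return 1
    # Source the file in a subshell so its variables do not leak into the caller.
    # The variables are left unquoted on purpose to split them into words.
    ( . "$conf" && printf '%s\n' ${BUS_MODULES-} ${HWMON_MODULES-} )
}
```

Called without arguments, the function reads the default file. Pass a different path as the first argument to inspect a copy.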

3 Getting temperature data

To obtain a snapshot of the current temperature, run the following command:

> sensors
 [...]
 nvme-pci-0700 (1)
 Adapter: PCI adapter (2)
 Composite:    +36.9°C  (low  = -273.1°C, high = +83.8°C) (3)
                        (crit = +83.8°C)
 Sensor 1:     +36.9°C  (low  = -273.1°C, high = +65261.8°C) (4)
 Sensor 2:     +43.9°C  (low  = -273.1°C, high = +65261.8°C) (5) (6)
 
 Adapter: ACPI device
 temp1:        +16.8°C  (crit = +18.8°C)
 temp2:        +27.8°C  (crit = +119.0°C)
 temp3:        +29.8°C  (crit = +119.0°C)
 
 coretemp-isa-0000
 Adapter: ISA adapter
 Package id 0:  +43.0°C  (high = +82.0°C, crit = +100.0°C)
 Core 0:        +41.0°C  (high = +82.0°C, crit = +100.0°C)
 Core 1:        +41.0°C  (high = +82.0°C, crit = +100.0°C)
 Core 2:        +43.0°C  (high = +82.0°C, crit = +100.0°C)
 Core 3:        +41.0°C  (high = +82.0°C, crit = +100.0°C)
 Core 4:        +41.0°C  (high = +82.0°C, crit = +100.0°C)
 Core 5:        +40.0°C  (high = +82.0°C, crit = +100.0°C)

(1) The specific hardware component or sensor chip being monitored.

(2) The adapter, that is, the type of bus through which the sensor chip is accessed.

(3) Aggregate temperature measurement from several sensors. The values low = -273.1°C and high = +83.8°C define the range within which the reading should ideally stay. If the temperature goes above 83.8 degrees Celsius (crit = +83.8°C), it is deemed critical and may cause hardware issues.

(4) An individual sensor on the NVMe device that is currently reading 36.9 degrees Celsius.

(5) Another individual sensor on the same device, reading 43.9 degrees Celsius.

(6) The value of +65261.8°C is a placeholder or a default maximum value, indicating that no meaningful high limit is set for this sensor. Since the actual readings (+36.9°C and +43.9°C) are far below this value, you can ignore the anomalously high maximum.
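
One plausible explanation for the extreme limits, stated here as an assumption rather than taken from the sensors documentation, is that the driver holds thresholds in Kelvin and converts them to millidegrees Celsius. Under that assumption, a raw value of 65535 K (the largest 16-bit value) and a raw value of 0 K yield the two figures from the output. The helper name kelvin_to_millicelsius is hypothetical.

```shell
# Convert whole Kelvin to millidegrees Celsius using integer arithmetic
# (0 °C corresponds to 273.15 K, that is, 273150 millikelvin).
kelvin_to_millicelsius() {
    echo $(( $1 * 1000 - 273150 ))
}

kelvin_to_millicelsius 65535   # prints 65261850
kelvin_to_millicelsius 0       # prints -273150
```

The results, 65261.85°C and -273.15°C, are consistent with the high = +65261.8°C and low = -273.1°C values in the output.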

Note: Output depends on the type of hardware

The output of the sensors command depends on the type of hardware installed on your machine, as different hardware components have different sensors.
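
For scripting, the snapshot can be reduced to a single number. The following sketch assumes the coretemp-style layout from the example, where per-core lines start with Core N:, and does not match chips that label their readings differently. The function name max_core_temp is hypothetical.

```shell
# Hypothetical helper: read `sensors` output on standard input and print the
# highest per-core temperature in degrees Celsius. Returns 1 if no line of
# the form "Core N:" is found.
max_core_temp() {
    awk -F: '
        /^[ \t]*Core [0-9]+:/ {
            t = $2 + 0            # numeric prefix of "  +41.0°C  (high = ...)"
            if (!seen || t > max) { max = t; seen = 1 }
        }
        END {
            if (!seen) { exit 1 }
            printf "%.1f\n", max
        }
    '
}
```

With the function loaded in your shell, pipe the command output into it:

> sensors | max_core_temp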

4 Monitoring CPU temperature in real time

To monitor the temperature in real time, run the watch command:

> watch sensors

The watch command is a standard Linux utility that runs a given command repeatedly at regular intervals and displays its output. Combining it with sensors is useful if you need to keep an eye on your system's temperatures or voltages. The result looks as follows:

Every 2.0s: sensors                                                 

iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:        +56.0°C
  
k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +57.8°C
    
amdgpu-pci-0600
Adapter: PCI adapter
vddgfx:       +0.73 V
vddnb:        +0.74 V
edge:         +50.0°C
PPT:           0.00 W

By default, the watch command updates the output every two seconds. You can change this interval by using the -n option followed by the number of seconds. For example, to change the interval to 5 seconds, use:

> watch -n 5 sensors

Press Ctrl+C to stop the watch command.
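
The watch command only displays the latest values. To keep a record over time, a small loop can prefix each reading with a timestamp. This is a minimal sketch: log_readings is a hypothetical helper, and the command it runs is up to you.

```shell
# Hypothetical helper: run a command COUNT times, INTERVAL seconds apart,
# and print each result prefixed with a timestamp.
# Usage: log_readings COUNT INTERVAL COMMAND [ARGS...]
log_readings() {
    count=$1
    interval=$2
    shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s %s\n' "$(date +%Y-%m-%dT%H:%M:%S)" "$("$@")"
        i=$((i + 1))
        # Do not sleep after the final reading.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, assuming that thermal_zone0 exists on your system, the following call takes 12 readings at 5-second intervals and appends them to a file:

> log_readings 12 5 cat /sys/class/thermal/thermal_zone0/temp >> cpu-temp.log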

5 Troubleshooting

This part describes potential problems when monitoring CPU temperatures and their solutions.

5.1 No sensors were detected

On laptops, the sensors-detect command may provide the following output:

Sorry, no sensors were detected.
This is relatively common on laptops, where thermal management is
handled by ACPI rather than the OS.

This message is displayed when sensors-detect cannot find any hardware sensors. This is common on laptops, because most of them handle thermal management through ACPI (Advanced Configuration and Power Interface) rather than through the operating system.

Note: The sensors command

Despite the message about the failure to detect sensors, the sensors command may still work and provide expected results.

You can check the CPU temperature using the tools that read from the ACPI interface.

  1. Check if the acpi package is installed. This package provides an interface for the hardware's embedded controller via ACPI, allowing you to check battery status, thermal zone temperature, and more. To install, run the command:

    > sudo zypper install acpi

    After the installation, run acpi -t to display the thermal zone temperatures.
  2. Check the CPU temperature directly from the /sys file system. The CPU temperature is located in /sys/class/thermal/thermal_zone*/temp. Below is an example of the command with its output:

    > cat /sys/class/thermal/thermal_zone*/temp
    41000

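    The temperature is displayed in millidegrees Celsius. To get the temperature in degrees Celsius, divide the output by 1000. In our example, the result is 41°C. If the system has several thermal zones, the command prints one value per zone.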
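
The division by 1000 can be scripted. The following sketch walks over all thermal zones and prints each reading in degrees Celsius together with the content of the zone's type file, which names the zone. The function name print_thermal_zones and its optional directory argument are hypothetical and exist only to make the example easy to try on a copy of the data.

```shell
# Hypothetical helper: print every readable thermal zone as
# "<zone> (<type>): <temperature>°C".
print_thermal_zones() {
    base=${1:-/sys/class/thermal}
    for zone in "$base"/thermal_zone*; do
        # Skip zones whose temperature cannot be read.
        millideg=$(cat "$zone/temp" 2>/dev/null) || continue
        type=$(cat "$zone/type" 2>/dev/null || echo unknown)
        # The kernel reports millidegrees Celsius; divide by 1000.
        awk -v z="${zone##*/}" -v t="$type" -v m="$millideg" \
            'BEGIN { printf "%s (%s): %.1f°C\n", z, t, m / 1000 }'
    done
}
```

A reading of 41000 in a zone of type x86_pkg_temp is printed as thermal_zone0 (x86_pkg_temp): 41.0°C. The type name in this example is illustrative and differs between machines.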

For more information about ACPI, refer to https://documentation.suse.com/sles/html/SLES-all/cha-power-mgmt.html#sec-power-mgmt-acpi.

Note: ACPI may not be available on mainframes

Mainframes do not have the same power management needs as desktops, laptops and servers, and so they do not typically use ACPI. Instead, mainframes use different architectures and technologies for their configuration and management.

5.2 The displayed temperatures are unrealistic

If you suspect that the displayed temperature is too low or too high, you can try the following:

  • Check whether the sensors are detected correctly: Rerun the sensors-detect command to redetect the sensors.

    > sudo sensors-detect

    Then, run the sensors command again to see if the temperature readings are more realistic.

  • Check the raw thermal data in the /sys/class/thermal/ directory. See whether the raw data matches the output of the sensors command.

    > cat /sys/class/thermal/thermal_zone*/temp
  • Use a different tool to read the CPU temperature, for example, Hardinfo, which is a system profiler and benchmark tool. It can gather information about your system's hardware and operating system, perform benchmarks, and generate printable reports. It can also show the CPU temperature. To install Hardinfo, use the following command:

    > sudo zypper install hardinfo

    Then, you can launch Hardinfo from the app menu.

If none of these recommendations solves the issue, the problem might be due to unsupported or faulty hardware. In this case, you need to seek help from your hardware manufacturer.

5.3 The displayed temperature is too high

If the CPU temperature is too high, take the following actions:

  • Verify that the CPU cooling system, such as the fan or heat sink, works correctly. Ensure that the fan is spinning properly and that the heat sink is making proper contact with the CPU. If necessary, you may need to replace the thermal paste between the CPU and the heat sink to improve heat transfer.

  • Adjust the power settings on your system to reduce heat generation. Lowering the CPU frequency or enabling power-saving features can help keep the temperature in check. For more information about lowering CPU frequency, see https://documentation.suse.com/sbp/all/single-html/SBP-performance-tuning/index.html#sec-cpupower-tool.

  • Monitor the system load and CPU usage. High CPU usage for extended periods can lead to increased temperatures. Identify any resource-intensive processes and consider optimizing or limiting their usage. For more information, refer to https://www.suse.com/support/kb/doc/?id=000016916.
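
To decide in a script whether a reading counts as too high, it can be compared against a limit. The following sketch uses 80°C as the default, taken from the rule of thumb in the introduction. The function name is_too_hot is hypothetical, and the appropriate limit depends on your hardware.

```shell
# Hypothetical helper: succeed if a reading is at or above the limit.
# $1: reading in millidegrees Celsius, as found in /sys/class/thermal
# $2: limit in whole degrees Celsius (default: 80)
is_too_hot() {
    [ "$1" -ge $(( ${2:-80} * 1000 )) ]
}

if is_too_hot 85000; then
    echo "Temperature at or above the limit"
fi
```

The example call passes a fixed value of 85000 for illustration. In practice, pass a value read from one of the temp files instead.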

6 For more information