Overheating and reboots since Ubuntu 11.10
Symptoms
Since Ubuntu 11.10, my T500 has had reduced battery life (around 3 hours with low screen brightness, 2 hours under normal use). It also reboots under heavy CPU load because it overheats (above 100 °C). This is clearly a software bug, as I did not have this behavior with Ubuntu 11.04. It first appeared in 11.10, the first Ubuntu release with a Linux 3.x kernel.
Trying to find a solution...
Fan control
To set the fan to maximum speed:

    sudo rmmod thinkpad_acpi
    sudo modprobe thinkpad_acpi fan_control=1
    echo "level 127" | sudo tee /proc/acpi/ibm/fan
However, fan control is not the problem. Whatever the fan speed, my ThinkPad T500 still reboots after less than one minute of high CPU load. This includes the fan disengaged, i.e. set manually to full speed with level 127. I did not have this problem before Ubuntu 11.10.
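To see how quickly the temperature climbs before a reboot, it can help to log it together with the fan speed while the CPU is busy. Below is a minimal sketch. It assumes the standard sysfs thermal interface (temperatures in millidegrees Celsius) and the /proc/acpi/ibm/fan file provided by thinkpad_acpi. The function names max_temp_c and fan_speed, and their optional path arguments, are hypothetical and exist only for illustration and testing.

```shell
#!/bin/sh
# Hypothetical helpers: report the hottest thermal zone and the fan speed.
# Paths default to the usual sysfs/procfs locations; the optional
# arguments only exist so the functions can be tried on fake files.

# Print the highest temperature, in whole degrees C, over all thermal zones.
max_temp_c() {
    root=${1:-/sys/class/thermal}
    max=""
    for f in "$root"/thermal_zone*/temp; do
        [ -r "$f" ] || continue
        t=$(( $(cat "$f") / 1000 ))   # sysfs reports millidegrees C
        if [ -z "$max" ] || [ "$t" -gt "$max" ]; then
            max=$t
        fi
    done
    echo "$max"
}

# Print the value of the "speed:" line reported by thinkpad_acpi (RPM).
fan_speed() {
    awk '$1 == "speed:" { print $2 }' "${1:-/proc/acpi/ibm/fan}"
}
```

For example, `while sleep 5; do echo "$(date +%T) $(max_temp_c) C, fan $(fan_speed) RPM"; done` run alongside a CPU-heavy job shows the last readings before the machine goes down.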
Temporary fix
When the laptop runs on battery, the temperature does not exceed 75 °C, so it does not reboot from overheating.
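Since the behavior depends on the power source, temperature logs are easier to interpret if each reading records whether the AC adapter was plugged in. Below is a sketch assuming the sysfs power_supply class, where the adapter has type "Mains" and an "online" file containing 1 or 0. The function name on_battery and its optional argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: exit status 0 when no mains adapter is online
# (i.e. running on battery), 1 when an AC adapter is plugged in.
# Note: it also returns 0 if no adapter is found at all.
on_battery() {
    root=${1:-/sys/class/power_supply}
    for dev in "$root"/*; do
        [ -r "$dev/type" ] || continue
        if [ "$(cat "$dev/type")" = "Mains" ] &&
           [ "$(cat "$dev/online" 2>/dev/null)" = "1" ]; then
            return 1
        fi
    done
    return 0
}
```

Usage: `if on_battery; then echo battery; else echo AC; fi`.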
Device power saving disabled
The Tunables tab of powertop marks many entries as Bad, i.e. power-saving features that are not enabled (no energy-saving mode?):
- Bad: Enable SATA link power management for /dev/sda
- Bad: NMI watchdog should be turned off
- Bad: Power Aware CPU scheduler
- Bad: VM writeback timeout
- Bad: Enable Audio codec power management
- Bad: Autosuspend for USB device Fingerprint Sensor [4-1]
- Bad: Autosuspend for USB device USB Receiver (Logitech)
- Bad: Autosuspend for USB device Android Phone (HTC)
- Bad: Runtime PM for PCI Device Intel Corporation Mobile 4 Series Chipset Memory Controller Hub
- Bad: Runtime PM for PCI Device Ricoh Co Ltd R5C832 IEEE 1394 Controller
- Bad: Runtime PM for PCI Device Intel Corporation Mobile 4 Series Chipset MEI Controller
- Bad: Runtime PM for PCI Device Intel Corporation 82567LF Gigabit Network Connection
- Bad: Runtime PM for PCI Device Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4
- Bad: Runtime PM for PCI Device Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2
- Bad: Runtime PM for PCI Device Intel Corporation 82801I (ICH9 Family) HD Audio Controller
- Bad: Runtime PM for PCI Device Ricoh Co Ltd xD-Picture Card Controller
- Bad: Runtime PM for PCI Device Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter
- Bad: Runtime PM for PCI Device Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter
- Bad: Runtime PM for PCI Device Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller
- Bad: Runtime PM for PCI Device Intel Corporation Ultimate N WiFi Link 5300
- Bad: Runtime PM for PCI Device Ricoh Co Ltd RL5c476 II
- Bad: Runtime PM for PCI Device Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6
- Bad: Runtime PM for PCI Device Intel Corporation 82801I (ICH9 Family) PCI Express Port 4
- Bad: Runtime PM for PCI Device Intel Corporation 82801I (ICH9 Family) PCI Express Port 5
- Bad: Runtime PM for PCI Device Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1
- Bad: Runtime PM for PCI Device Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2
- Bad: Runtime PM for PCI Device Intel Corporation 82801I (ICH9 Family) PCI Express Port 2
- Bad: Runtime PM for PCI Device Intel Corporation 82801I (ICH9 Family) PCI Express Port 3
- Bad: Runtime PM for PCI Device Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1
- Bad: Runtime PM for PCI Device Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3
- Bad: Runtime PM for PCI Device Intel Corporation 82801IBM/IEM (ICH9M/ICH9M-E) 4 port SATA Controller [AHCI mode]
- Bad: Runtime PM for PCI Device Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5
- Bad: Runtime PM for PCI Device Intel Corporation 82801I (ICH9 Family) PCI Express Port 1
I don't know whether this is a clue...
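To test whether these Bad entries matter, the Runtime PM and Autosuspend items listed above can be switched on by hand. Powertop's toggles for these entries write "auto" to each device's power/control file in sysfs, letting the kernel suspend the device when idle; "on" keeps it always powered. Below is a sketch of that. The function name enable_runtime_pm and its optional bus-directory argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write "auto" to every power/control file under a
# bus directory (default: PCI) and print how many devices were switched.
# Must run as root against the real sysfs; settings are lost at reboot.
enable_runtime_pm() {
    root=${1:-/sys/bus/pci/devices}
    n=0
    for ctl in "$root"/*/power/control; do
        [ -w "$ctl" ] || continue
        echo auto > "$ctl" && n=$((n + 1))
    done
    echo "$n"
}
```

As root, run `enable_runtime_pm` for PCI devices and `enable_runtime_pm /sys/bus/usb/devices` for USB autosuspend. Autosuspend on USB input devices such as the Logitech receiver may make them lag, so those might need to be set back to "on".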
Turbo mode
The processor consistently spends about 25% of its time in Turbo Mode:
State | Package | CPU 0 | CPU 1 |
---|---|---|---|
Turbo Mode | 24.4% | 21.7% | 24.1% |
2.81 GHz | 1.8% | 1.6% | 1.8% |
2.14 GHz | 0.9% | 0.9% | 0.9% |
1.60 GHz | 3.3% | 3.3% | 3.2% |
800 MHz | 57.5% | 55.2% | 54.5% |
Idle | 12.1% | 17.4% | 15.5% |
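The frequency residency can also be cross-checked from the kernel's cpufreq statistics. The time_in_state file holds one "frequency_kHz time" pair per line, and this requires a kernel built with CONFIG_CPU_FREQ_STAT. The figures will likely not match the ones above exactly. There is no separate Idle row, and idle time is probably counted toward whatever frequency was set. Below is a sketch. The function name freq_residency and its optional path argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "<freq> kHz <percent>%" for each line of a
# cpufreq time_in_state file (default: CPU 0).
freq_residency() {
    f=${1:-/sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state}
    # First pass (NR == FNR) sums the times; second pass prints each share.
    awk 'NR == FNR { total += $2; next }
         total > 0 { printf "%d kHz %.1f%%\n", $1, 100 * $2 / total }' "$f" "$f"
}
```

Usage: `freq_residency`, or pass another CPU's file, e.g. `freq_residency /sys/devices/system/cpu/cpu1/cpufreq/stats/time_in_state`.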
ASPM
Linux 3.x has a known ASPM bug with similar symptoms, but ASPM is not enabled in my case:

    $ dmesg | grep ASPM
    [ 0.160380] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
If I add pcie_aspm=force to the kernel command line, I get the following output:

    $ dmesg | grep ASPM
    [ 0.000000] PCIe ASPM is forcibly enabled
    [ 0.197865] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
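Besides dmesg, the ASPM state can be inspected through sysfs and the kernel command line. Below is a sketch assuming the usual pcie_aspm policy file, which lists all policies with the active one in brackets (e.g. "[default] performance powersave"). When the firmware disables ASPM, as the FADT message above says, this file may still show a policy without ASPM being active on any link. The function names aspm_policy and aspm_forced are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for checking ASPM; path arguments are optional
# and only exist so the functions can be tried on fake files.

# Print the bracketed (active) entry of the ASPM policy file.
aspm_policy() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "${1:-/sys/module/pcie_aspm/parameters/policy}"
}

# Exit status 0 if pcie_aspm=force appears on the kernel command line.
aspm_forced() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep -qx 'pcie_aspm=force'
}
```

Usage: `aspm_policy; aspm_forced && echo "pcie_aspm=force is set"`.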
Power Aware CPU scheduler
Changing the CPU policy to powersave does not solve the problem.
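Below is a sketch of one way to make that change on every CPU. It assumes "CPU policy" here means the cpufreq scaling governor. The powertop "Power Aware CPU scheduler" tunable is a different setting and is not covered by this sketch. The function name set_governor and its optional sysfs root argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: set the cpufreq governor on every CPU.
# Usage: set_governor GOVERNOR [CPU_SYSFS_ROOT]   (run as root)
set_governor() {
    gov=$1
    root=${2:-/sys/devices/system/cpu}
    for g in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -w "$g" ] || continue
        # Only write a governor the kernel lists as available for this CPU.
        if grep -qw -- "$gov" "${g%/*}/scaling_available_governors" 2>/dev/null; then
            echo "$gov" > "$g"
        else
            echo "governor $gov not available for ${g%/cpufreq/*}" >&2
        fi
    done
}
```

Usage: `sudo sh -c '. ./governor.sh; set_governor powersave'`, where governor.sh is a hypothetical file holding the function.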
ACPI OSI
Adding the kernel parameter acpi_osi=Linux (as suggested [here]) does not fix the problem.
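On Ubuntu, a kernel parameter such as acpi_osi=Linux or pcie_aspm=force is usually made persistent by adding it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub. You then run `sudo update-grub` and reboot. Below is a sketch that does the edit in place with GNU sed. The function name add_kernel_param is hypothetical, and the parameter must not contain "|", "&" or "\", which are special in this sed expression.

```shell
#!/bin/sh
# Hypothetical helper: append a kernel parameter to
# GRUB_CMDLINE_LINUX_DEFAULT unless it is already there.
# Usage: add_kernel_param PARAM [GRUB_DEFAULT_FILE]   (run as root)
add_kernel_param() {
    param=$1
    file=${2:-/etc/default/grub}
    # Do nothing if the parameter is already on that line.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote.
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 $param\"|" "$file"
}
```

For example, running `add_kernel_param acpi_osi=Linux` turns `GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"` into `GRUB_CMDLINE_LINUX_DEFAULT="quiet splash acpi_osi=Linux"`.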
ACPI
A kernel bug with similar behavior has been fixed: https://bugzilla.kernel.org/show_bug.cgi?id=42858
The patch linked in comment 5 of that bug report (http://marc.info/?l=linux-acpi&m=132854533918079&w=2) is already included in the latest 12.04 kernel, so it is not the solution.
Strangely, though, the symptoms are very similar. Maybe the same bug is present in thinkpad-acpi?
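Since that bug is about thermal throttling not triggering, one thing worth checking is whether the thermal zones define a "passive" trip point below the "critical" one. The kernel is meant to start throttling the CPU at the passive trip and to shut the system down at the critical trip. Below is a sketch using the standard sysfs thermal files, where temperatures are in millidegrees Celsius. The function name list_trip_points and its optional argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "<zone> <trip type> <temp> C" for every trip
# point of every thermal zone.
list_trip_points() {
    root=${1:-/sys/class/thermal}
    for zone in "$root"/thermal_zone*; do
        [ -d "$zone" ] || continue
        for t in "$zone"/trip_point_*_type; do
            [ -r "$t" ] || continue
            temp=$(cat "${t%_type}_temp")   # matching trip_point_N_temp file
            echo "${zone##*/} $(cat "$t") $((temp / 1000)) C"
        done
    done
}
```

If `list_trip_points` prints only a critical trip around 100 °C and no passive one, nothing would be expected to throttle the CPU before that limit is reached.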
Data
Model | BIOS version | Bug present in |
---|---|---|
T500 (type 2082) | 3.24 (latest) | Ubuntu 11.10 (32-bit), Ubuntu 12.04 (32-bit), Arch Linux (September 2012) |
References
- Launchpad bug report: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/751689
- Bug 42858 - thermal throttling doesn't seem to trigger on Thinkpad T420s: https://bugzilla.kernel.org/show_bug.cgi?id=42858
- Ask Ubuntu entry about the problem: http://askubuntu.com/questions/133030/overheating-and-reboot-with-ubuntu-11-10-and-12-04-on-thinpad-t500