[Letux-kernel] thermal madness

H. Nikolaus Schaller hns at goldelico.com
Sat Sep 14 10:22:51 CEST 2019


Hi,

> Am 13.09.2019 um 22:27 schrieb Andreas Kemnade <andreas at kemnade.info>:
> 
> On Fri, 13 Sep 2019 21:51:40 +0200
> "H. Nikolaus Schaller" <hns at goldelico.com> wrote:
> 
>>> Am 13.09.2019 um 21:44 schrieb Andreas Kemnade <andreas at kemnade.info>:
>>> 
>>> Hi,
>>> 
>>> I was experimenting a bit with the thermal sensor:
>>> 
>>> Fresh after rebooting, autoidling the UARTs and loading some modules,
>>> I got the letux3704 device down to 32 mA, so I expect the temperature
>>> to be low.
>> 
>> Indeed. I usually have the GTA04 up and running with X11 etc. so it is
>> significantly warmer.
>> 
>>> Reading the thermal gives this:
>>> 
>>> root@(none):/# cat /sys/devices/virtual/thermal/thermal_zone0/temp 
>>> 58500
>>> root@(none):/# cat /sys/devices/virtual/thermal/thermal_zone0/temp 
>>> 47000
>>> root@(none):/# cat /sys/devices/virtual/thermal/thermal_zone0/temp 
>>> 47000
>>> root@(none):/# cat /sys/devices/virtual/thermal/thermal_zone0/temp 
>>> 47000
>>> root@(none):/# cat /sys/devices/virtual/thermal/thermal_zone0/temp 
>>> 47000
>>> root@(none):/# cat /sys/devices/virtual/thermal/thermal_zone0/temp 
>>> 47000
>>> root@(none):/# cat /sys/devices/virtual/thermal/thermal_zone0/temp 
>>> 48500
>>> root@(none):/# cat /sys/devices/virtual/thermal/thermal_zone0/temp 
>>> 48500
>>> 
>>> That is just the opposite of what Nikolaus was getting: here it jumps
>>> down instead of up and then stays stable.
>> 
>> Oops!!!
>> 
>>> My conclusion: the measurements are buffered somewhere/somehow and we
>>> are getting a stale value here.
>> 
>> Or there is some other bug in the code...
>> 
>> I have tried to understand the code a little, but it just reads some
>> registers and translates the ADC values to degrees Celsius.
>> 
>> There is also a feature to handle temperature trends, which seems
>> to read multiple registers.
>> 
>> And in some cases it may not be possible to read a value, and then
>> it returns the previous one.
>> 
>> Hm. What if that situation applies to the very first read, where the
>> "previous" value is just random?
>> 
>> Another test: I also ran the first cat command in a loop:
>> 
>> for i in 1 2 3 4 5 6 7 8 9 10
>> do
>> 	cat /sys/devices/virtual/thermal/thermal_zone0/temp
>> 	sleep 0.1
>> done
>> 
> Some more testing here:
> root@(none):/# cpufreq-info 
> cpufrequtils 008: cpufreq-info (C) Dominik Brodowski 2004-2009
> Report errors and bugs to cpufreq at vger.kernel.org, please.
> analyzing CPU 0:
>  driver: cpufreq-dt
>  CPUs which run at the same hardware frequency: 0
>  CPUs which need to have their frequency coordinated by software: 0
>  maximum transition latency: 300 us.
>  hardware limits: 300 MHz - 1000 MHz
>  available frequency steps: 300 MHz, 600 MHz, 800 MHz, 1000 MHz
>  available cpufreq governors: conservative, userspace, powersave, ondemand, performance
>  current policy: frequency should be within 300 MHz and 1000 MHz.
>                  The governor "ondemand" may decide which speed to use
>                  within this range.
>  current CPU frequency is 600 MHz (asserted by call to hardware).
>  cpufreq stats: 300 MHz:94.86%, 600 MHz:2.49%, 800 MHz:0.92%, 1000 MHz:1.73%  (51)
> root@(none):/# cd /sys/bus/platform/drivers/omap_uart/
> root@(none):/sys/bus/platform/drivers/omap_uart# for name in */power/autosuspend_delay_ms ; do echo 3000 >$name ; done
> root@(none):/sys/bus/platform/drivers/omap_uart# cd
> root@(none):/# sleep 16 ; cat /sys/class/power_supply/bq27000-battery/current_now
> 32844
> root@(none):/# for i in 0 1 2 3 4 5 6 7 8 9 ; do cat /sys/devices/virtual/thermal/thermal_zone0/temp ; sleep 0.1 ; done
> 58500
> 47000
> 47000
> 48500
> 48500
> 48500
> 48500
> 48500
> 48500
> 48500
> root@(none):/# 
> 
> That is with
> commit d71fc15bce98abf24226f451f192df07ab9d089b
> We are seeing 1 GHz here on the letux3704 without any boost.
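
Andreas' idle-current procedure quoted above could be collected into a small script, so that every thermal test starts from the same low-power state. This is only a sketch: the function names idle_uarts/idle_current and the UART_DIR/BAT overrides are made up for illustration; the sysfs paths, the 3000 ms autosuspend delay and the 16 s wait are the ones from the commands above. current_now is reported in µA by the power_supply class, so 32844 means about 32.8 mA.

```shell
#!/bin/sh
# Sketch of the idle-current measurement quoted above (hypothetical helper
# names). Paths default to the ones used in this thread and can be overridden.
UART_DIR=${UART_DIR:-/sys/bus/platform/drivers/omap_uart}
BAT=${BAT:-/sys/class/power_supply/bq27000-battery}

# Let every omap_uart device runtime-suspend after 3000 ms of inactivity.
# Unmatched globs and non-writable entries are skipped.
idle_uarts() {
	for f in "$UART_DIR"/*/power/autosuspend_delay_ms; do
		if [ -w "$f" ]; then
			echo 3000 > "$f"
		fi
	done
}

# Idle the UARTs, wait (default 16 s) for them to suspend, then print the
# battery current in µA as reported by current_now.
idle_current() {
	idle_uarts
	sleep "${1:-16}"
	cat "$BAT/current_now"
}
```

Usage would be something like ". ./idle.sh ; idle_current" before starting the thermal_zone0 reads.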

I have studied the TRM: we can also read the bandgap sensor directly
through devmem2, and that indeed indicates some strange effect caused
by the driver:

root at letux:~# /usr/bin/arm-linux-gnueabihf/devmem2 0x48002524
/dev/mem opened.
Memory mapped at address 0xb6fe7000.
Value at address 0x48002524 (0xb6fe7524): 0x38
root at letux:~# /usr/bin/arm-linux-gnueabihf/devmem2 0x48002524
/dev/mem opened.
Memory mapped at address 0xb6f09000.
Value at address 0x48002524 (0xb6f09524): 0x38
root at letux:~# /usr/bin/arm-linux-gnueabihf/devmem2 0x48002524
/dev/mem opened.
Memory mapped at address 0xb6fbd000.
Value at address 0x48002524 (0xb6fbd524): 0x38
root at letux:~# cat /sys/devices/virtual/thermal/thermal_zone0/temp
58500
root at letux:~# /usr/bin/arm-linux-gnueabihf/devmem2 0x48002524
/dev/mem opened.
Memory mapped at address 0xb6fb8000.
Value at address 0x48002524 (0xb6fb8524): 0x34
root at letux:~# /usr/bin/arm-linux-gnueabihf/devmem2 0x48002524
/dev/mem opened.
Memory mapped at address 0xb6fc9000.
Value at address 0x48002524 (0xb6fc9524): 0x34
root at letux:~# /usr/bin/arm-linux-gnueabihf/devmem2 0x48002524
/dev/mem opened.
Memory mapped at address 0xb6f90000.
Value at address 0x48002524 (0xb6f90524): 0x34
root at letux:~# cat /sys/devices/virtual/thermal/thermal_zone0/temp
52000
root at letux:~# /usr/bin/arm-linux-gnueabihf/devmem2 0x48002524
/dev/mem opened.
Memory mapped at address 0xb6f1a000.
Value at address 0x48002524 (0xb6f1a524): 0x34
root at letux:~# cat /sys/devices/virtual/thermal/thermal_zone0/temp
52000
root at letux:~# 

This time I also saw the temperature going down!?

Well, this was a boot where not all modules were loaded and
there was no display (I haven't checked why).

It looks as if the thermal_zone value is only updated after it has
been read twice:

root at letux:~# /usr/bin/arm-linux-gnueabihf/devmem2 0x48002524
/dev/mem opened.
Memory mapped at address 0xb6f48000.
Value at address 0x48002524 (0xb6f48524): 0x37
root at letux:~# ./temperatures 
Sat Sep 14 07:50:13 UTC 2019 57° 800MHz
root at letux:~# /usr/bin/arm-linux-gnueabihf/devmem2 0x48002524
/dev/mem opened.
Memory mapped at address 0xb6f88000.
Value at address 0x48002524 (0xb6f88524): 0x37
root at letux:~# cat /sys/devices/virtual/thermal/thermal_zone0/temp
57000
root at letux:~# cat /sys/devices/virtual/thermal/thermal_zone0/temp
52000
root at letux:~# /usr/bin/arm-linux-gnueabihf/devmem2 0x48002524
/dev/mem opened.
Memory mapped at address 0xb6fe6000.
Value at address 0x48002524 (0xb6fe6524): 0x34
root at letux:~# 
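
To check this "one read behind" effect more systematically, the raw register reads and the sysfs reads could be interleaved by a script instead of by hand. A sketch, assuming devmem2 at the path used above and 0x48002524 as the bandgap register, as read in these sessions; parse_devmem2/read_raw/compare and the DEVMEM2/ZONE overrides are hypothetical names. It only prints the raw register value and does not try to convert it to °C.

```shell
#!/bin/sh
# Sketch: alternate raw bandgap register reads (via devmem2) with reads of
# thermal_zone0, to see whether sysfs lags behind the register.
DEVMEM2=${DEVMEM2:-/usr/bin/arm-linux-gnueabihf/devmem2}
ZONE=${ZONE:-/sys/devices/virtual/thermal/thermal_zone0/temp}
REG=0x48002524

# Extract the value from devmem2's "Value at address ... (...): 0x38" line;
# all other output lines ("/dev/mem opened." etc.) are dropped.
parse_devmem2() {
	sed -n 's/^Value at address .*: \(0x[0-9A-Fa-f]*\)[[:space:]]*$/\1/p'
}

# Raw register content, e.g. 0x34
read_raw() {
	"$DEVMEM2" "$REG" | parse_devmem2
}

# Read the register first, then thermal_zone0, and print both; repeat n times
compare() {
	n=${1:-5}
	i=0
	while [ "$i" -lt "$n" ]; do
		raw=$(read_raw)
		temp=$(cat "$ZONE")
		echo "raw=$raw zone=$temp"
		i=$((i + 1))
	done
}
```

Sourcing it and running "compare 10" should show whether each zone reading matches the register value seen just before it or the one seen just after it.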

And it seems that my ./temperatures script creates some processor load
that increases the temperature, which then goes down again quickly:

root at letux:~# cat /sys/devices/virtual/thermal/thermal_zone0/temp
52000
root at letux:~# cat /sys/devices/virtual/thermal/thermal_zone0/temp
52000
root at letux:~# cat /sys/devices/virtual/thermal/thermal_zone0/temp
52000
root at letux:~# cat /sys/devices/virtual/thermal/thermal_zone0/temp
52000
root at letux:~# cat /sys/devices/virtual/thermal/thermal_zone0/temp
52000
root at letux:~# ./temperatures 
Sat Sep 14 07:52:16 UTC 2019 52° 800MHz
root at letux:~# ./temperatures 
Sat Sep 14 07:52:19 UTC 2019 57° 800MHz
root at letux:~# cat /sys/devices/virtual/thermal/thermal_zone0/temp
57000
root at letux:~# cat /sys/devices/virtual/thermal/thermal_zone0/temp
53500
root at letux:~# cat /sys/devices/virtual/thermal/thermal_zone0/temp
53500
root at letux:~# cat /sys/devices/virtual/thermal/thermal_zone0/temp
52000
root at letux:~# cat /sys/devices/virtual/thermal/thermal_zone0/temp
52000
root at letux:~# cat /sys/devices/virtual/thermal/thermal_zone0/temp
52000
root at letux:~# cat /sys/devices/virtual/thermal/thermal_zone0/temp
52000
root at letux:~# cat /sys/devices/virtual/thermal/thermal_zone0/temp
52000
root at letux:~#

Measured with a thermal camera, the surface of the PoP is ca. 30 °C.

This brings me to another idea: we could also read the value from
0x48002524 in U-Boot, maybe after manually triggering a conversion.

Well, of course all kernel tests should be done with thermal throttling
turned off, so that the governors do not read the bandgap sensor all
the time.
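
If the zone exposes the optional mode attribute of the thermal sysfs interface, something like the following might stop the periodic polling during such tests. Assumptions: that the TI bandgap zone on this kernel provides a writable mode file, that writing "disabled" actually stops the governor's reads, and that sysfs temp reads keep working afterwards; all three need to be verified on the device. disable_zone is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: disable a thermal zone through its optional "mode" attribute
# (accepted values: enabled / disabled). Returns 1 if the attribute is
# missing or not writable.
disable_zone() {
	zone=${1:-/sys/class/thermal/thermal_zone0}
	if [ -w "$zone/mode" ]; then
		echo disabled > "$zone/mode"
		echo "$zone: mode=$(cat "$zone/mode")"
	else
		echo "$zone: no writable mode attribute" >&2
		return 1
	fi
}
```

Writing "enabled" back to the same file would restore normal operation after the test.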

BR,
Nikolaus



