[Letux-kernel] [RFC] ARM: dts: omap36xx: Enable thermal throttling

Fri Sep 13 22:11:13 CEST 2019

On 13/09/2019 20:46, Adam Ford wrote:
> On Fri, Sep 13, 2019 at 12:18 PM Daniel Lezcano
> <daniel.lezcano at linaro.org> wrote:
>>
>> On 13/09/2019 18:51, H. Nikolaus Schaller wrote:
>>
>> [ ... ]
>>
>>>> Good news (I think)
>>>>
>>>> With cooling-device = <&cpu 1 2> setup, I was able to ask the max
>>>> frequency and it returned 600MHz.
>>>>
>>>> # cat /sys/devices/virtual/thermal/thermal_zone0/temp
>>>> 58500
>>>> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies
>>>> 300000 600000 800000
>>>> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_m
>>>> scaling_max_freq  scaling_min_freq
>>>> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
>>>> 600000
>>>
>>> looks good!
>>> But we have to understand what the <&cpu 1 2> exactly means...
>>>
>>> Hopefully someone reading your RFCv2 can answer...
>>
> Daniel,
> 
> Thank you for replying.
> 
>> I may have missed the question :)
>>
>> These are the states allowed for the cooling device (the one you can see
>> in /sys/class/thermal/cooling_device0/max_state). As the logic is
>> inverted for cpufreq, that can be confusing.
> 
> I think that's what has me confused.
> 
>>
>> If it was a fan with, let's say 5 speeds, you would use <&fan 0 5>, so
>> when the mitigation begins the cooling device state is 0 and then the
>> thermal governor increases the state until it sees a cooling effect.
>>
>> If <&fan 0 2> is set, the governor won't set a state above 2 even if the
>> temperature increases.
> 
> I am not sure I know what you mean by 'state' in this context.

The thermal framework manages a thermal zone as the combination of:
 - a sensor
 - a governor
 - a cooling device

The governor gets the temperature via the sensor and depending on the
temperature it will increase or decrease the cooling effect of the
cooling device. With a fan, that means it will increase or decrease its
speed. With cpufreq, it will decrease or increase the OPP.

The governor sets the cooling effect through discrete values. The state
is one of these values (the current speed or the current OPP index).

The number of states differs depending on the cooling device.

In the context above, the fan cooling device can be stopped (state=0),
running (state=1), or running faster (state=2).

As the node says to use no more than 2, the governor will never go to
running much faster (state=3). (That's an example.)

>> When the cooling driver is able to return the number of states it
>> supports, it is safe to set the states to THERMAL_NO_LIMIT and let the
>> governor find the balance point.
> 
> If the cooling driver is using cpufreq, is the number of supported
> states equal to the number of operating points given to cpufreq?

Yes, absolutely, if THERMAL_NO_LIMIT is set [1] (which is what is done
in most cases). Otherwise it will use the boundaries set in <&cpu
limit_low limit_high>.

When changing the limits, a state=1 has a different meaning.

For example: 7 OPPs available

<&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT> : state=[0..7]

<&cpu 0 2> : state=[0..2] (1, 2)

<&cpu 5 7> : state=[0..3] (5, 6, 7)

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/thermal/cpu_cooling.c#n334

>> Now if the cooling device is cpufreq, the state order is inverted,
>> because the cooling effect happens when decreasing the OPP.
>>
>> If the boards support 7 OPPs, the state 0 is 7 - 0, so no mitigation, if
>> the state is 1, the cpufreq is throttled to the 6th OPP, 2 to the 5th OPP
>> etc.
> 
> I am not sure how the state would be set to 2.

That is a governor decision. Let me give an example with a hikey960
board which has very fast temperature transitions, so it is simpler to
illustrate the behavior. The trip point is 75°C.

Imagine the CPU gets loaded 100%: cpufreq sets the OPP to the max
(2.36GHz). As the temperature is still under 75°C, there is no
mitigation yet, so the cooling device state is 0.

Within a few seconds the temperature reaches 75°C, which triggers the
monitoring of the thermal zone, and the mitigation begins. The
temperature then continues to increase very quickly to 78°C; the
governor sees we are above the trip point and increments the cooling
device state (state=>1). That leads to an OPP change from 2.36GHz to
2.11GHz.

The governor continues to read the temperature and sees it is still
increasing (even if that happens more slowly), so it increases the
state again (state=>2). That leads to an OPP change from 2.11GHz to
1.8GHz.

The governor continues to read the temperature and sees it decreasing
while still above the trip point, so it does nothing.

The governor continues to read the temperature, sees it has decreased
below 75°C, and decreases the state (state=>1); the OPP changes back to
2.11GHz.

The temperature then increases, etc ...

Actually the governors do more than that, but this is enough for the
example.

So it is a bad idea to set boundaries for the cooling device state, as
that may prevent the governor from taking the right decision for the
cooling effect. Imagine that, in the example above, we set the max state
to 1 for the cooling device: the governor would not be able to stop the
temperature from increasing, thus ending up with a hard reboot.
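
The walk-through above can be replayed with a toy model. This is only a
sketch of the simplified behavior described in this mail, not the
kernel's step_wise governor: the function names, the temperature samples
and the three-entry frequency table (only the frequencies quoted above)
are assumptions made for illustration.

```python
TRIP_MC = 75000  # trip point from the example, in millidegrees Celsius

# Frequencies named in the example, indexed by cooling state (state 0 =
# no mitigation). Hypothetical, truncated table: the real board has more
# OPPs than these three.
FREQ_GHZ_BY_STATE = [2.36, 2.11, 1.8]


def next_state(state, prev_temp, temp, lower, upper):
    """One simplified governor step, clamped to the [lower, upper] limits."""
    if temp >= TRIP_MC and temp > prev_temp:
        state += 1   # above the trip point and still heating: cool more
    elif temp < TRIP_MC:
        state -= 1   # back under the trip point: relax the mitigation
    # above the trip point but cooling down: keep the current state
    return max(lower, min(upper, state))


def simulate(temps, lower=0, upper=2):
    """Return the cooling state chosen after each temperature sample."""
    state, prev, states = lower, temps[0], []
    for temp in temps:
        state = next_state(state, prev, temp, lower, upper)
        states.append(state)
        prev = temp
    return states


# Made-up samples: idle, fast rise, slower rise, slow fall, below trip.
samples = [70000, 78000, 80000, 79000, 74000]
for upper in (2, 1):
    states = simulate(samples, upper=upper)
    print(upper, states, [FREQ_GHZ_BY_STATE[s] for s in states])
```

With upper=1 the state stays at 1 while the temperature keeps rising,
which is the situation where the governor can no longer add cooling.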

>> Now the different combinations:
>>
>> <&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT> the governor will use the state
>> 0 to 7.
>>
>> <&cpu THERMAL_NO_LIMIT 2> the governor will use the state 0 to 2
> 
> What would be the difference between  <&cpu THERMAL_NO_LIMIT 2>  and
> <&cpu 0 2> ?
> (if there is any)

There is no difference.
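
A minimal sketch of why the two forms are equivalent, assuming the
framework resolves a "no limit" lower bound to state 0 and a "no limit"
upper bound to the device's max state. The sentinel value, the function
name and the max_state of 3 (a 4-OPP board with states 0..3) are
placeholders for illustration, not the kernel's code.

```python
THERMAL_NO_LIMIT = -1  # stand-in sentinel, chosen for this sketch only


def resolve_limits(lower, upper, max_state):
    """Turn the two cooling-device cells into concrete state bounds."""
    lo = 0 if lower == THERMAL_NO_LIMIT else lower
    hi = max_state if upper == THERMAL_NO_LIMIT else upper
    if lo > hi or hi > max_state:
        raise ValueError("limits outside the cooling device's state range")
    return lo, hi


# <&cpu THERMAL_NO_LIMIT 2> and <&cpu 0 2> resolve to the same bounds.
print(resolve_limits(THERMAL_NO_LIMIT, 2, max_state=3))
print(resolve_limits(0, 2, max_state=3))
# Both cells unlimited: the governor may use every state of the device.
print(resolve_limits(THERMAL_NO_LIMIT, THERMAL_NO_LIMIT, max_state=3))
```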

>> <&cpu 1 2> the governor will use the state 1 and 2. That means there is
>> always the cooling effect as the governor won't set it to zero thus
>> stopping the mitigation.
> 
> For the purposes of the board in question, we have 4 operating points,
> 300MHz, 600MHz, 800MHz and 1GHz.  Once the board reaches 90C, we need
> them to cease operation at 800MHz and 1GHz and only permit operation
> at 300MHz and 600MHz.  I am going under the assumption that the cpu
> index[0] would be for 300MHz, index[1] = 600MHz, etc.
> 
> If I am interpreting your comment correctly, I should set <&cpu
> THERMAL_NO_LIMIT 2> which would allow it to either not cool and run up
> to 600MHz and not exceed, is that correct?

Nope, it will mean the cooling device can only reduce to 800MHz and to
600MHz to mitigate.
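
To make the inverted ordering concrete, here is a small sketch using the
four OPPs you listed. It assumes all four OPPs (including 1GHz) are
enabled and that states count from 0, so 4 OPPs give states 0..3; the
helper names are made up for the illustration.

```python
# OPPs from this thread in kHz, lowest to highest.
OPPS_KHZ = [300000, 600000, 800000, 1000000]


def state_to_freq(state, opps=OPPS_KHZ):
    """State 0 is the highest OPP; each extra state drops one OPP."""
    if not 0 <= state < len(opps):
        raise ValueError("state outside the cooling device's range")
    return opps[len(opps) - 1 - state]


def allowed_caps(lower, upper, opps=OPPS_KHZ):
    """Maximum frequencies the governor can impose within [lower, upper]."""
    return [state_to_freq(state, opps) for state in range(lower, upper + 1)]


# <&cpu THERMAL_NO_LIMIT 2>, i.e. states 0..2: no cap, 800MHz or 600MHz.
print(allowed_caps(0, 2))
```

Under these assumptions 300MHz is only reachable at state 3, which the
upper limit of 2 excludes.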

Actually, neither the thermal framework nor the kernel is designed to
handle this case. They assume the OPPs are stable whatever the thermal
situation.

That is the reason why I think it is a very interesting use case because
it introduces a temperature constraint in addition to a duration for a
certain OPP. IMO, that could be an extension of the turbo-mode.

With what we have now, I doubt it is feasible.

The best we can do is prevent the temperature from reaching 90°C, which
removes the OPP temperature constraint. I suppose 85°C is a safe
temperature to stick to.

And in order to give the governor a free hand, use:

<&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT>

I don't think that will have a significant impact on performance
compared to being able to run at a higher temperature with fewer OPPs.

>> Does it clarify the DT spec?
>>
> 
> I think your reply to my inquiry might.  If possible, it would be nice
> to get this documented into the bindings doc for others in the future.
> I can do it, but someone with a better understanding of the concept
> may be more qualified.  I can totally understand why some may want to
> integrate this into their SoC device trees to slow the processor when
> hot.
> 
> Thank you for taking the time to review this.  I appreciate it.
> 
> adam
> 
>>
>>
>>
>>> What happens with trip point 60000?
>>> (unfortunately one has to reboot in between or can you kexec between two kernel/dtb versions?)
>>>
>>> BR,
>>> Nikolaus
>>>
>>
>>
>> --
>>  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
>>
>> Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
>> <http://twitter.com/#!/linaroorg> Twitter |
>> <http://www.linaro.org/linaro-blog/> Blog
>>
