[Letux-kernel] Strange things happening with latest kernels

H. Nikolaus Schaller hns at goldelico.com
Sat Jun 15 18:46:01 CEST 2019


Hi Andreas,

> On 15.06.2019 at 18:16, Andreas Kemnade <andreas at kemnade.info> wrote:
> 
> On Sat, 15 Jun 2019 17:16:20 +0200
> Andreas Kemnade <andreas at kemnade.info> wrote:
> 
>> On Thu, 13 Jun 2019 06:58:17 +0200
>> Andreas Kemnade <andreas at kemnade.info> wrote:
>> 
>>> On Wed, 12 Jun 2019 23:09:48 +0200
>>> Andreas Kemnade <andreas at kemnade.info> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> On Wed, 12 Jun 2019 20:35:42 +0200
>>>> Andreas Kemnade <andreas at kemnade.info> wrote:
>>>> 
>>>>> On Tue, 11 Jun 2019 21:21:22 +0200
>>>>> "H. Nikolaus Schaller" <hns at goldelico.com> wrote:
>>>>> 
>>>>>> I think the latest kernels 4.19.49 and 5.2-rc4 have bugs...
>>>>>> 
>>>>>> A) 4.19.49 seems to make the OMAP3 run ca. 70% of the time
>>>>>> at 800MHz driving the GTA04A4 hot (have seen 90°) although
>>>>>> I have only 10% system load and not too many or strange
>>>>>> interrupts.
>>>>>> 
>>>>>> Idle modes broken?
>>>>>> 
>>>>>> B) 5.2-rc4 seems to have broken cpufreq-info (also GTA04A4)
>>>>>> 
>>>>> That looks interesting
>>>>> 
>>>>> [    3.140655] core: _opp_supported_by_regulators: OPP minuV: 1012500 maxuV: 1012500, not supported by regulator
>>>>> [    3.152709] cpu cpu0: _opp_add: OPP not supported by regulators (300000000)
>>>>> [    3.160278] core: _opp_supported_by_regulators: OPP minuV: 1200000 maxuV: 1200000, not supported by regulator
>>>>> [    3.171142] cpu cpu0: _opp_add: OPP not supported by regulators (600000000)
>>>>> [    3.178710] core: _opp_supported_by_regulators: OPP minuV: 1325000 maxuV: 1325000, not supported by regulator
>>>>> [    3.189483] cpu cpu0: _opp_add: OPP not supported by regulators (800000000)
>>>>> 
>>>>> letux-5.2-rc4 does not fully boot here, neither on gta04a5 nor on
>>>>> letux3704, still investigating.
>>>>> 
>>>> 
>>>> in
>>>> int regulator_is_supported_voltage(struct regulator *regulator,
>>>>                                   int min_uV, int max_uV)
>>>> 
>>>> the following if fails:
>>>>     /* Any voltage within constrains range is fine? */
>>>>        if (rdev->desc->continuous_voltage_range) {
>>>> 
>>>> 
>>>> this did the trick:
>>>> 
>>>> diff --git a/drivers/regulator/twl-regulator.c b/drivers/regulator/twl-regulator.c
>>>> index 6fa15b2d6fb3..f7bfdf53701d 100644
>>>> --- a/drivers/regulator/twl-regulator.c
>>>> +++ b/drivers/regulator/twl-regulator.c
>>>> @@ -478,6 +478,7 @@ static const struct twlreg_info TWL4030_INFO_##label = { \
>>>> 		.type = REGULATOR_VOLTAGE, \
>>>> 		.owner = THIS_MODULE, \
>>>> 		.enable_time = turnon_delay, \
>>>> +		.continuous_voltage_range = true, \
>>>> 		.of_map_mode = twl4030reg_map_mode, \
>>>> 		}, \
>>>> 	}
>>>> 
>>>> I am not sure if it is really ok, but seems to work.
>>>> 
>>> analyzed a bit further:
>>> last ok: next-20190503
>>> first fail: next-20190506
>>> 
>> To find the commit (and author) that introduced this, I am running:
>> git bisect run ~/kerneltest/test-grep-kernel.sh '_opp_add: OPP not supported by regulator' root=/dev/mmcblk0p2

^^^ yes, that simplifies things a lot if you can autocompile+download+boot+log and then just look for problems...

>> 
>> Hopefully it finds something interesting.
>> ... besides too hot cpu during build
>> [636288.334384] CPU2: Core temperature above threshold, cpu clock throttled (total events = 146)
>> [636288.334386] CPU3: Package temperature above threshold, cpu clock throttled (total events = 155)
>> [636288.334387] CPU1: Package temperature above threshold, cpu clock throttled (total events = 155)
>> [636288.334388] CPU0: Core temperature above threshold, cpu clock throttled (total events = 146)
>> [636288.334390] CPU0: Package temperature above threshold, cpu clock throttled (total events = 155)
>> [636288.334394] CPU2: Package temperature above threshold, cpu clock throttled (total events = 155)
>> 
>> sounds like airplane starts soon ;-)

Wow. Seems that the GTA04 overheating issue infects other machines through e-mail...
I still don't have a better understanding of my issue here. But I think I have an L3704
with an SD card that carries a slightly older setup for comparison.

>> 
>> seems to be 10 minutes per step.
> which gets lower later

Yes, I have observed that quite often with bisect if you do not do a make clean.
The reason is that the number of diffs between commits to be tested is going down
and less and less has to be recompiled. Sometimes even nothing if commits affect
areas not configured into our kernel.

> 
> drwxr-xr-x 2 andi andi   4096 Jun 15 16:09 v5.1
> drwxr-xr-x 2 andi andi   4096 Jun 15 16:21 v5.2-rc1
> drwxr-xr-x 2 andi andi   4096 Jun 15 16:32 v5.1-6352-g2646719a48c2
> drwxr-xr-x 2 andi andi   4096 Jun 15 16:42 v5.1-3125-g8b35ad6232c4
> drwxr-xr-x 2 andi andi   4096 Jun 15 16:53 v5.1-1572-gb4dd05dee0db
> drwxr-xr-x 2 andi andi   4096 Jun 15 17:03 v5.1-747-g59df1c2bdecb
> drwxr-xr-x 2 andi andi   4096 Jun 15 17:12 v5.1-1169-g81ff5d2cba4f
> drwxr-xr-x 2 andi andi   4096 Jun 15 17:22 v5.1-1345-g61be53f9ef37
> drwxr-xr-x 2 andi andi   4096 Jun 15 17:32 v5.1-rc1-88-g2564002abcde
> drwxr-xr-x 2 andi andi   4096 Jun 15 17:34 v5.1-rc1-130-g498209445124
> drwxr-xr-x 2 andi andi   4096 Jun 15 17:36 v5.1-rc1-109-g87dbc5eb3cff
> drwxr-xr-x 2 andi andi   4096 Jun 15 17:38 v5.1-rc1-119-gc7e3ddd129d5
> drwxr-xr-x 2 andi andi   4096 Jun 15 17:39 v5.1-rc1-124-g0ae3b061df30
> drwxr-xr-x 2 andi andi   4096 Jun 15 17:41 v5.1-rc1-127-g7bcbdbe01fa8
> drwxr-xr-x 2 andi andi   4096 Jun 15 17:43 v5.1-rc1-129-gfd2f02f9724c
> 
> # first bad commit: [498209445124920b365ef43aac93d6f1acbaa1b7] regulator: core: simplify return value on suported_voltage
> 
> ok,
> 1. opp code does not do error checks there
> 2. twl regulator has neither a voltage list nor the
>   continuous flag set
> -> voltage invalid.
> 
> now things are clear. I am about to submit an official patch.

Great work!
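
For the archive, here is a minimal userspace sketch of why the three OPPs get
rejected. It is only a simplified model of the check you describe, not the
kernel code: the struct, the model_* function names and the constraint values
used below are made up for illustration. Only the continuous_voltage_range flag
name and the three OPP voltages come from the patch and the log above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a regulator description. */
struct model_regulator {
	bool continuous_voltage_range;	/* same name as the flag in the patch */
	int constraint_min_uV;		/* assumed constraint range */
	int constraint_max_uV;
	const int *voltages_uV;		/* discrete voltage list, may be NULL */
	size_t n_voltages;
};

/* Returns 1 if some voltage in [min_uV, max_uV] is available, else 0. */
static int model_is_supported_voltage(const struct model_regulator *r,
				      int min_uV, int max_uV)
{
	size_t i;

	assert(min_uV <= max_uV);

	/* Any voltage within the constraints range is fine? */
	if (r->continuous_voltage_range)
		return min_uV >= r->constraint_min_uV &&
		       max_uV <= r->constraint_max_uV;

	/* Otherwise only a listed discrete voltage can match. */
	for (i = 0; i < r->n_voltages; i++)
		if (r->voltages_uV[i] >= min_uV && r->voltages_uV[i] <= max_uV)
			return 1;

	/* No flag and no (matching) list entry: not supported. */
	return 0;
}

/* Counts how many of the three OPP voltages from the log are accepted. */
static int model_count_supported_opps(const struct model_regulator *r)
{
	static const int opp_uV[] = { 1012500, 1200000, 1325000 };
	size_t i;
	int n = 0;

	for (i = 0; i < sizeof(opp_uV) / sizeof(opp_uV[0]); i++)
		n += model_is_supported_voltage(r, opp_uV[i], opp_uV[i]);

	return n;
}
```

With neither the flag nor a voltage list, every OPP falls through to the final
return 0, which matches the "not supported by regulator" messages. With the
flag set and an assumed constraint range that covers 1012500..1325000 uV, all
three OPPs pass in this model.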

BR,
Nikolaus


