[Letux-kernel] weird problem with pwm_bl on omap3
H. Nikolaus Schaller
hns at goldelico.com
Thu May 31 08:11:14 CEST 2018
Hi Tony,
> On 31.05.2018 at 01:20, Tony Lindgren <tony at atomide.com> wrote:
>
> * H. Nikolaus Schaller <hns at goldelico.com> [180529 15:44]:
>> I have restarted hunting the issue but it is very
>> elusive. Every time I try to test by another method
>> it disappears, and if I remove my printk statements or
>> /etc/modprobe/blacklist.conf it comes back.
>
> Hmm yeah those kind of bugs are annoying :)
>
>> And I tried a bisect but I might not have properly
>> detected good/bad, and the result points to a commit
>> that is very unlikely to cause the problem (some patch
>> for file system flags).
>>
>> It seems to depend on the probe/deferred probe
>> sequence, so it is clearly some race somewhere, and I
>> am not sure whether pwm_bl is really the source of the
>> problem or just happens to be probing at the same moment
>> when something else goes wrong.
>>
>> I also had another strange effect that sometimes
>> only 6 or 7 kernel modules were loaded and shown
>> by lsmod. And when I did "modprobe omapdss" it did
>> try to load the pwm_bl again several times.
>>
>> Now after running several more boot sequences I did
>> several times have problems in the generic-adc-battery
>> driver and just some minutes ago in the bq27xxx
>> driver as well. Both are problems in power_supply_changed_work.
>>
>> Every time it is an unexpected NULL pointer dereference
>> happening around the moment where pwm_bl is probed.
>
> I guess it could be also some power supply issue?
It is exactly the same on two different boards and power supplies,
and with different compilers.
But it disappears if an older kernel (before 4.17-rc1) is installed.
Maybe it is already there in older kernels, just not that visible.
It seems to have something to do with two drivers probing
in parallel (which might depend a little on hardware fluctuations),
and common to all error reports is that some worker is running.
But we can't find a mechanism that leads to the oopses we see.
By delaying a scheduled worker in the generic-adc-battery driver
I was able to shift the problem into the twl4030 sound driver probing...
>
>> At the moment it looks as if *all* such problems occur
>> in some worker_thread...
>>
>> Next I'll try the CONFIG_DEBUG_SLAB and POISON options.
>
> Yeah that might catch it.
I did try it but it did not show anything :(
To me it looks like a missing lock which garbles some linked
list.
BR and thanks,
Nikolaus