[Letux-kernel] [Gta04-owner] New LetuxOS Kernels
H. Nikolaus Schaller
hns at goldelico.com
Wed Jun 20 06:55:17 CEST 2018
Hi Tony,
> On 20.06.2018 at 06:26, Tony Lindgren <tony at atomide.com> wrote:
>>>>
>>>> I did a quick boot - and on first boot I also got a strcmp(NULL).
>>>> From the SD card which I had used for extensive testing yesterday.
>>>>
>>>> What the hell is going on here?
One more observation: the strcmp(NULL) has moved from
pinctrl_get_group_selector() to pinctrl_generic_add_group(),
i.e. now happens in pinctrl_generic_group_name_to_selector().
>>>
>>> Maybe it is still a bug to devm_kzalloc something and store in the radix
>>> tree and leave it there, even if the driver is detached?
>
> Or are you guys using an older version of the patches?
We use these:
http://git.goldelico.com/?p=gta04-kernel.git;a=shortlog;h=refs/heads/work/letux-base/hacks
http://git.goldelico.com/?p=gta04-kernel.git;a=commit;h=d73fac5da046cc09fe082d20f57d8955c4d58ec7
I.e. from Fri, 15 Jun 2018 13:11:37 +0200 (04:11 -0700).
> The check for not
> allowing to add NULL named entries was added. Not sure how you would
> end up with NULL names though unless some parts are still freed on
> deferred probe. Care to try with the updated patches and add dump_stack
> for NULL names?
IMHO this does not solve the real problem. It would just hide it, mostly.
Problem seems to be:
1. before driver is probed pinmux does a group = devm_kzalloc
2. this is added to radix tree
3. driver probe fails for some reason
4. devres_release_all(driver) ==> does a kfree(group)
5. someone reuses the memory area defined by kzalloc
6. other driver is probed and wants to check if selector exists
7. scans through radix tree (in pinctrl_generic_group_name_to_selector)
8. finds NULLified memory area [or maybe other stale data!]
9. strcmp(NULL)
This happens despite our checking for duplicates.
So we should not fix #9 but #4: properly remove the group from the radix
tree when it is freed, so that it is skipped when scanning for selectors
(skipping is all a NULL test at #9 would do; the test alone isn't enough).
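
The idea of fixing #4 can be illustrated with a small userspace sketch. This
is not kernel code: a fixed-size array stands in for the radix tree,
calloc/free stand in for devm_kzalloc and the devres release, and all names
(struct group, group_table, add_group, remove_group, name_to_selector,
"uart1_pins") are hypothetical. It only shows the ordering: unlink the entry
from the lookup structure before freeing it, so a later scan never touches
freed memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_GROUPS 8

/* Hypothetical stand-in for a pinctrl group descriptor. */
struct group {
	char *name;
};

/* Hypothetical stand-in for the radix tree: selector -> group. */
static struct group *group_table[MAX_GROUPS];

/* Steps 6-7: scan for a name; empty slots are skipped. Returns -1 if absent. */
static int name_to_selector(const char *name)
{
	for (int sel = 0; sel < MAX_GROUPS; sel++) {
		struct group *g = group_table[sel];

		if (g && g->name && strcmp(g->name, name) == 0)
			return sel;
	}
	return -1;
}

/* Steps 1-2: allocate a group and store it in the table (no duplicates). */
static int add_group(const char *name)
{
	int existing;

	if (!name)
		return -1;
	existing = name_to_selector(name);
	if (existing >= 0)
		return existing;

	for (int sel = 0; sel < MAX_GROUPS; sel++) {
		struct group *g;

		if (group_table[sel])
			continue;
		g = calloc(1, sizeof(*g));
		if (!g)
			return -1;
		g->name = malloc(strlen(name) + 1);
		if (!g->name) {
			free(g);
			return -1;
		}
		strcpy(g->name, name);
		group_table[sel] = g;
		return sel;
	}
	return -1; /* table full */
}

/* Proposed counterpart to step 4: unlink from the table, then free. */
static void remove_group(int sel)
{
	struct group *g;

	if (sel < 0 || sel >= MAX_GROUPS || !group_table[sel])
		return;
	g = group_table[sel];
	group_table[sel] = NULL; /* unlink first, so no stale pointer remains */
	free(g->name);
	free(g);
}

/* Simulates a failed probe followed by a lookup from another driver. */
static int failed_probe_demo(void)
{
	int sel = add_group("uart1_pins");

	assert(sel >= 0);
	remove_group(sel); /* probe failed: release unlinks and frees */
	return name_to_selector("uart1_pins");
}
```

With the entry unlinked on release, failed_probe_demo() returns -1 and the
slot can be reused by the next add_group(). Without the unlink, the scan would
dereference freed memory, which is the situation described in #5 to #9.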
Andy already pointed to a location where the cleanup should take place:
> I think there is a simple way to clean up pinctrl stuff on failed probe. See
> https://elixir.bootlin.com/linux/v4.18-rc1/source/drivers/base/dd.c#L416
>
> We only bind pins, and do not perform any actions when failure happens later on.
Or have we missed this patch?
>
>>> Then we still try to access this memory region by scanning the tree.
>>>
>>> For test purposes we could replace the devm_kzalloc by kzalloc. This
>>> would leak a little memory, but my hope is that the problem disappears.
>>>
>>> Do you have a repeatable (at least >some%) scenario to reproduce the
>>> bug?
>>
>> unfortunately not, maybe we should pass init=/modprobe-mess.sh to
>> kernel commandline, and create a worst case modprobe scenario there.
>> So we can control probing order more.
>
> Funny how I have not seen these. Probably because I got rid of that
> PID 1 software years ago.
I think it depends on which drivers are modprobed and how they defer probing.
If all drivers succeed immediately, one doesn't see it.
BR,
Nikolaus