[Letux-kernel] [Gta04-owner] New LetuxOS Kernels

H. Nikolaus Schaller hns at goldelico.com
Wed Jun 20 06:55:17 CEST 2018


Hi Tony,

> On 20.06.2018 at 06:26, Tony Lindgren <tony at atomide.com> wrote:
>>>> 
>>>> I did a quick boot - and on first boot I also got a strcmp(NULL).
>>>> From the SD card which I had used for extensive testing yesterday.
>>>> 
>>>> What the hell is going on here?  

One more observation: the strcmp(NULL) has moved from
pinctrl_get_group_selector() to pinctrl_generic_add_group(),
i.e. it now happens in pinctrl_generic_group_name_to_selector().

>>> 
>>> Maybe it is still a bug to devm_kzalloc something and store in the radix
>>> tree and leave it there, even if the driver is detached?
> 
> Or are you guys using an older version of the patches?

We use these:

http://git.goldelico.com/?p=gta04-kernel.git;a=shortlog;h=refs/heads/work/letux-base/hacks
http://git.goldelico.com/?p=gta04-kernel.git;a=commit;h=d73fac5da046cc09fe082d20f57d8955c4d58ec7

I.e. from Fri, 15 Jun 2018 13:11:37 +0200 (04:11 -0700).

> The check for not
> allowing to add NULL named entries was added. Not sure how you would
> end up with NULL names though unless some parts are still freed on
> deferred probe. Care to try with the updated patches and add dump_stack
> for NULL names?

IMHO this does not solve the real problem. It would just hide it, mostly.

Problem seems to be:

1. before driver is probed pinmux does a group = devm_kzalloc
2. this is added to radix tree
3. driver probe fails for some reason
4. devres_release_all(driver) ==> does a kfree(group)
5. someone reuses the memory area defined by kzalloc
6. other driver is probed and wants to check if selector exists
7. scans through radix tree (in pinctrl_generic_group_name_to_selector)
8. finds NULLified memory area [or maybe other stale data!]
9. strcmp(NULL)

This happens despite our checking for duplicates.

So we should not fix #9 but #4 to properly remove groups from radix tree
and make sure that it is skipped during scanning for selectors (this is
what a NULL test #9 finally would do - the test alone isn't enough).
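
To illustrate the intended ordering, here is a minimal userspace sketch, not
kernel code. It assumes a plain array as a stand-in for the radix tree, and
all names in it (struct group, group_tree, add_group, remove_group,
group_name_to_selector) are hypothetical and only mirror the steps above.
The point is that the entry is dropped from the lookup structure at the same
time the memory is freed, so a later scan never sees a stale pointer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_GROUPS 8

/* Hypothetical stand-in for the pinctrl group descriptor. */
struct group {
    const char *name;
};

/* Hypothetical stand-in for the radix tree: selector -> group pointer. */
static struct group *group_tree[MAX_GROUPS];

/* Steps 1-2: allocate a zeroed group and store it under the first free selector. */
static int add_group(const char *name)
{
    for (int sel = 0; sel < MAX_GROUPS; sel++) {
        if (group_tree[sel] == NULL) {
            struct group *g = calloc(1, sizeof(*g));
            if (g == NULL)
                return -1;
            g->name = name;
            group_tree[sel] = g;
            return sel;
        }
    }
    return -1;
}

/* Step 4 as proposed: free the group AND clear its slot in the tree. */
static void remove_group(int sel)
{
    assert(sel >= 0 && sel < MAX_GROUPS);
    free(group_tree[sel]);
    group_tree[sel] = NULL;
}

/* Steps 6-7: scan the tree, skipping empty slots and unnamed entries. */
static int group_name_to_selector(const char *name)
{
    for (int sel = 0; sel < MAX_GROUPS; sel++) {
        const struct group *g = group_tree[sel];
        if (g != NULL && g->name != NULL && strcmp(g->name, name) == 0)
            return sel;
    }
    return -1;
}
```

In this sketch the NULL test in the scan only protects against empty slots.
It cannot detect a pointer to memory that was freed behind the tree's back,
which is why the removal in step 4 is the part that matters.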

Andy already pointed to a location where the cleanup should take place:

> I think there is a simple way to clean up pinctrl stuff on failed probe. See
> https://elixir.bootlin.com/linux/v4.18-rc1/source/drivers/base/dd.c#L416
> 
> We only bind pins, and do not perform any actions when failure happens later on.

Or have we missed this patch?

> 
>>> Then we still try to access this memory region by scanning the tree.
>>> 
>>> For test purposes we could replace the devm_kzalloc by kzalloc. This
>>> would leak a little memory, but my hope is that the problem disappears.
>>> 
>>> Do you have a repeatable (at least >some%) scenario to reproduce the
>>> bug?
>> 
>> unfortunately not, maybe we should pass init=/modprobe-mess.sh to
>> kernel commandline, and create a worst case modprobe scenario there.
>> So we can control probing order more.
> 
> Funny how I have not seen these. Probably because I got rid of that
> PID 1 software years ago.

I think it depends on which drivers are modprobed and how they defer probing.
If all drivers probe successfully right away, one doesn't see it.

BR,
Nikolaus


