[Letux-kernel] New LetuxOS Kernels - strcmp(NULL)
Andreas Kemnade
andreas at kemnade.info
Sun Jun 24 13:00:08 CEST 2018
On Sun, 24 Jun 2018 12:54:38 +0200
"H. Nikolaus Schaller" <hns at goldelico.com> wrote:
> > On 24.06.2018 at 11:38, Andreas Kemnade <andreas at kemnade.info> wrote:
> >
> > On Sun, 24 Jun 2018 09:52:53 +0200
> > "H. Nikolaus Schaller" <hns at goldelico.com> wrote:
> >
> >> Hi,
> >>
> >>> On 24.06.2018 at 09:11, Andreas Kemnade <andreas at kemnade.info> wrote:
> >>>
> >>> On Sat, 23 Jun 2018 12:13:11 +0200
> >>> "H. Nikolaus Schaller" <hns at goldelico.com> wrote:
> >>>
> >>>
> >>>>
> >>>> So the issue is that "backlight_pins_pinmux" are searched for a NULL record before they
> >>>> are properly stored. Or someone punches a NULL into the radix_tree.
> >>>>
> >>>> Hope this sheds some new light on the problem.
> >>>>
> >>> hmm, the next question is whether the NULL is *always* there, so even
> >>> in the successful boots. Is that still with mainline sources + minimal
> >>> set of things?
> >>>
> >>> Can we infer any bad order of module loading from that output?
> >>> Probably the thing that inserts the NULL should be loaded last for
> >>> successful boots or first for failed boots
> >>>
> >>> Or should we remove stuff from dtb piece by piece to see if that helps?
> >>
> >> The problem is that you can remove almost anything and "it helps". So
> >> it is very fragile to have a system that runs into this bug. If
> >> you change a little piece, the problem disappears but you don't know if
> >> it is really the reason or just a factor that enables/disables the real
> >> problem to appear/disappear.
> >
> > I think we can do it the other way round: remove driver stuff that
> > does not make the problem go away, and consider it not guilty.
>
> Indeed, there are some.
>
> I have identified these:
> omap3isp
> omapdss
>
> >
> >
> >>
> >> For example you can blacklist some modules and it is gone. You
> >
> > What do you mean: it is gone? How many times do you test to consider it
> > gone?
>
> Well, I have a scenario where I have it in 4 to 5 out of 5 tests.
> And if I make a change and it is 0 of 3, I can consider it as "gone".
>
> Well, this is not statistically valid, only a good guess.
>
> I have found another printk that makes it go away:
>
> @@ -390,6 +390,8 @@ static int really_probe(struct device *dev, struct device_driver *drv)
>
> + printk("%s: driver %s\n", __func__, drv->name);
>
> Then, I can see the probe order (there are some more drivers which are probed
> without notice in the log), but I have no strcmp(NULL).
>
> > The next question is whether we really need that much concurrency here.
> > Can we do:
> >
> > create_random_list_of_modules
> > log that list
> > while(not_everything_loaded) {
> >     insmod first_module_in_list.ko && remove_module_from_list
> >     sleep ?
> > }
> >
> > Will we get some pattern? Or maybe just get a full list of drivers
> > which need not be loaded to enable the problem.
>
> Hm. It is difficult to predict the outcome. Especially since deferred
> driver probing makes it a little independent of the list order.
>
But at least we can rule out problems in the drivers which are not
loaded before the Oops (if we insert a sleep there).
Regards,
Andreas
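
The randomized, serialized module loading discussed above could be sketched roughly as follows. This is only an illustration, not something that was run on the affected board. The function name `load_in_random_order`, its parameters, and the retry limit `max_rounds` are assumptions made for the example. Unlike the pseudocode, a module whose `insmod` fails is moved to the end of the queue and retried in a later pass, so that a missing dependency does not stall the loop.

```python
import random
import subprocess
import time


def load_in_random_order(modules, insmod=None, delay=1.0, seed=None, max_rounds=5):
    """Load modules one at a time in a shuffled order and log that order.

    All names here are hypothetical. `modules` is assumed to be a list of
    paths to .ko files. Returns (loaded, still_pending).
    """
    if insmod is None:
        # Assumption: the insmod binary is on PATH and we run as root.
        def insmod(path):
            return subprocess.run(["insmod", path]).returncode == 0

    rng = random.Random(seed)
    pending = list(modules)
    rng.shuffle(pending)
    # Log the order first so it is on the console before any Oops.
    print("load order:", " ".join(pending), flush=True)

    loaded = []
    for _ in range(max_rounds):
        if not pending:
            break
        retry = []
        for mod in pending:
            if insmod(mod):
                loaded.append(mod)
                print("loaded:", mod, flush=True)
            else:
                # Probably a missing dependency; try again in the next pass.
                retry.append(mod)
            # Give probing time to finish before the next module is loaded.
            time.sleep(delay)
        pending = retry
    return loaded, pending
```

If the Oops appears, the last "loaded:" line on the console bounds which modules were present at that point; everything later in the logged order was not yet loaded. Passing a fixed `seed` should allow a suspicious order to be replayed.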