[Letux-kernel] New LetuxOS Kernels - strcmp(NULL)

Andreas Kemnade andreas at kemnade.info
Sun Jun 24 13:00:08 CEST 2018


On Sun, 24 Jun 2018 12:54:38 +0200
"H. Nikolaus Schaller" <hns at goldelico.com> wrote:

> > Am 24.06.2018 um 11:38 schrieb Andreas Kemnade <andreas at kemnade.info>:
> > 
> > On Sun, 24 Jun 2018 09:52:53 +0200
> > "H. Nikolaus Schaller" <hns at goldelico.com> wrote:
> >   
> >> Hi,
> >>   
> >>> Am 24.06.2018 um 09:11 schrieb Andreas Kemnade <andreas at kemnade.info>:
> >>> 
> >>> On Sat, 23 Jun 2018 12:13:11 +0200
> >>> "H. Nikolaus Schaller" <hns at goldelico.com> wrote:
> >>> 
> >>>   
> >>>> 
> >>>> So the issue is that "backlight_pins_pinmux" are searched for a NULL record before they
> >>>> are properly stored. Or someone punches a NULL into the radix_tree.
> >>>> 
> >>>> Hope this sheds some new light on the problem.
> >>>>   
> >>> Hmm, the next question is whether the NULL is *always* there, even
> >>> in the successful boots. Is that still with mainline sources + a minimal
> >>> set of things?
> >>> 
> >>> Can we infer any bad order of module loading from that output?
> >>> Probably the thing that inserts the NULL should be loaded last for
> >>> successful boots or first for failed boots.
> >>> 
> >>> Or should we remove stuff from dtb piece by piece to see if that helps?    
> >> 
> >> The problem is that you can remove almost anything and "it helps". So
> >> a setup that runs into this bug is very fragile. If
> >> you change a little piece, the problem disappears but you don't know if
> >> that piece is really the cause or just a factor that makes the real
> >> problem appear or disappear.  
> > 
> > I think we can do it the other way round: remove driver stuff whose
> > removal does not make the problem go away, and consider it not guilty.   
> 
> Indeed, there are some.
> 
> I have identified these:
> omap3isp
> omapdss
> 
> > 
> >   
> >> 
> >> For example you can blacklist some modules and it is gone. You  
> > 
> > What do you mean: it is gone? How many times do you test to consider it
> > gone?   
> 
> Well, I have a scenario where I have it in 4 to 5 out of 5 tests.
> And if I make a change and it is 0 of 3, I can consider it as "gone".
> 
> Well, this is not statistically valid, only a good guess.
> 
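
To put a rough number on the "0 of 3" guess: if a boot still failed
with probability p after the change, the chance of n clean boots in a
row would be (1 - p)^n. A minimal sketch of that arithmetic, assuming
the boots are independent; the function name is made up for
illustration:

```shell
# Chance of seeing n clean boots in a row if each boot
# still fails with probability p (boots assumed independent).
clean_run_probability() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.4f\n", (1 - p) ^ n }'
}

# Example: failure rate 4 of 5 (p = 0.8), three clean boots observed.
clean_run_probability 0.8 3
```

With p = 0.8 this prints 0.0080, so three clean boots are unlikely to
be pure luck. The 0.8 itself is only estimated from five boots, though,
so this stays a rough guide.
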
> I have found another printk that makes it go away:
> 
> @@ -390,6 +390,8 @@ static int really_probe(struct device *dev, struct device_driver *drv)
> 
> +       printk("%s: driver %s\n", __func__, drv->name);
> 
> Then, I can see the probe order (there are some more drivers which are probed
> without notice in the log), but I have no strcmp(NULL).
> 
> > The next question is whether we really need that much concurrency here.
> > Can we do:
> > 
> > create_random_list_of_modules
> > log that list
> > while(not_everything_loaded) {
> >  insmod first_module_in_list.ko && remove_module_from_list
> >  sleep ?
> > }
> > 
> > Will we get some pattern? Or maybe just a full list of drivers
> > which need not be loaded for the problem to appear.  
> 
> Hm. It is difficult to predict the outcome. Especially since deferred
> driver probing makes it a little independent of the list order.
> 
But at least we can rule out problems in the drivers which have not
been loaded before the Oops (if we insert a sleep there).
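
A minimal sketch of the sequential loading loop from above, not a
tested procedure. Assumptions for illustration: the names LOADER, DELAY
and load_in_random_order are made up, the modules are plain *.ko files
below one directory, and shuf from GNU coreutils is available. A module
whose insmod fails (for example because a dependency is not loaded yet)
is retried in the next pass.

```shell
#!/bin/sh
# Hypothetical knobs: the command used to load a module and the pause
# after each successful load.
LOADER="${LOADER:-insmod}"
DELAY="${DELAY:-1}"

# load_in_random_order DIR LOGFILE
# Shuffle all *.ko files below DIR, write the order to LOGFILE, then
# load them one at a time. Returns 1 if a full pass makes no progress.
load_in_random_order() {
    dir="$1"
    log="$2"
    pending=$(mktemp)
    find "$dir" -name '*.ko' | shuf > "$pending"
    cp "$pending" "$log"          # keep the order so a bad run can be replayed

    while [ -s "$pending" ]; do
        retry=$(mktemp)
        while IFS= read -r module; do
            if $LOADER "$module" < /dev/null; then
                sleep "$DELAY"    # let probing settle before the next module
            else
                echo "$module" >> "$retry"
            fi
        done < "$pending"

        # Nothing could be loaded in this pass: give up instead of looping forever.
        if [ "$(wc -l < "$retry")" -eq "$(wc -l < "$pending")" ]; then
            rm -f "$pending" "$retry"
            return 1
        fi
        mv "$retry" "$pending"
    done
    rm -f "$pending"
}
```

It would be called with a module directory and a log file of your
choice, e.g. load_in_random_order "$moddir" /tmp/module-order.log. The
sketch uses insmod rather than modprobe because modprobe pulls in
dependencies by itself, which would change the order behind our back.
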

Regards,
Andreas
