[Letux-kernel] accelerometer bridge madness

Andreas Kemnade andreas at kemnade.info
Fri Jul 27 12:55:44 CEST 2018


On Fri, 27 Jul 2018 08:48:31 +0200
"H. Nikolaus Schaller" <hns at goldelico.com> wrote:

> Hi,
> 
> > Am 27.07.2018 um 07:21 schrieb Andreas Kemnade <andreas at kemnade.info>:
> > 
> > Hi,
> > 
> > On Wed, 25 Jul 2018 07:00:48 +0200
> > "H. Nikolaus Schaller" <hns at goldelico.com> wrote:
> >   
> >> Hi,
> >>   
> >>> Am 24.07.2018 um 22:26 schrieb Andreas Kemnade
> >>> <andreas at kemnade.info>:
> >>> 
> >>> Hi,
> >>> 
> >>>   
> >>>> BTW: did you see the kernel hickups also with evtest on the
> >>>> accelerometer?  
> >>> [   60.439422] ------------[ cut here ]------------
> >>> [   60.444305] WARNING: CPU: 0 PID: 3140 at kernel/workqueue.c:1513
> >>> __queue_delayed_work+0xd8/0x13c  
> >> 
> >> https://elixir.bootlin.com/linux/v4.18-rc6/source/kernel/workqueue.c#L1513
> >> 
> >> So some list is not empty which should be. No idea why that happens...
> >> 
> >> 		INIT_DELAYED_WORK(&input_work, inputbridge_work);
> >> 
> >> should have initialized work->entry through
> >>   
> > well, it is called *after* the device registration. So a there is a
> > little race  
> 
> Ah, ok! That could already be the main reason for problems.
> 
> > condition. Not sure if that can be exploited.  
> 
> I have experienced that some drivers start interrups and handlers before
> returning from probe().
> 
> > 
> > [...]  
> >>>> Maybe /lib/udev/accelerometer is slightly incompatible with kernel
> >>>> API and the iio-bridge doesn't catch such situations properly?  
> >>>   
> > I also tried with my stretched partition, then I have a process named
> > systemd-udevd running. So there is no single binary to remove/rename.
> > Booting is again slowed down horribly, and also the time between kernel
> > switching off unused regulators (and therefore stops automatic
> > charging) and usb enumeration (when charging starts again). So this is
> > critical. Somehow I do not like a system where failed accelerometer
> > reads lead to boot problems. That is too fragile.
> >   
> >>> also plain reads are enough to get it, like a simple
> >>> xxd /dev/input/eventX.
> >>> 
> >>> Are you testing with A4 or A5?  
> >> 
> >> Both.
> >> 
> >> I have also looked into the code of the stack trace and could you
> >> please try to change accel_open() in
> >> drivers/iio/industrialio-inputbridge.c
> >> 
> >> #if POLLING
> >> 	schedule_delayed_work(&input_work,
> >> 		msecs_to_jiffies(0));	// start now
> >> #else
> >> 
> >> to
> >> 
> >> #if POLLING
> >> 	schedule_delayed_work(&input_work,
> >> 		msecs_to_jiffies(1));	// start now
> >> #else
> >> 
> >> Maybe immediately scheduling a delayed work ins't allowed in open()
> >> in some situations?
> >>   
> > I even tried 100 and it did not help.
> > But what helps:
> > moving the initialisation of the workqueue to accel_open().  
> 
> That is unexpected. As long as we INIT_DELAYED_WORK first and then
> schedule with a long delay, nothing bad should happen, IMHO.
> 
Well, the main question is whether iio-bridge is the victim or the
suspect. Perhaps we have the same situation as with pinctl and gab.
So we will find a lot of minor issues here and the real problem is
somewhere else.

> Ah, ok. This could indeed be the race: If input_register_device()
> stops the thread and waits until all udev activities are finished,
> udev will indirectly call accel_open() before INIT_DELAYED_WORK.

Well, udev first has to create a node and then something can open it (and I still do not
understand why stretch udev messes with the evdev). 
I will try to enable the printk in open to see what really happens.
I hope the bug will not disappear.
> 
> Unfortunately Linux is missing any documentation about such concurrency
> issues. At least I never know what is thread-safe and what isn't...
> 
> Moving INIT_DELAYED_WORK to accel_open() wipes out an existing workqueue
> which may lead to problems with the second open().
> 
Well, if the work is not scheduled anymore (due to a close(), it is probably no problem,
but again opening it multiple times at the same time might be dangerous.

> So I have attached a patch to move INIT_DELAYED_WORK before input_register_device().
> 
I already tried it yesterday. It did not help. I might try again.

> > 
> > Then evtest works. But that can be dangerous if the device is opened
> > multiple times. Is the current code safe for that anyways? I guess not,
> > canceling the work should only happen at last close() and scheduling it
> > at the first open()  
> 
> Ok, this is a situation I have never thought of. So we should count open()
> and close().
> 
> I have added another patch.
> 
which adds another set of race conditions, probably mainly on multicore systems.
kref_get() and kref_put() are the tools to use afaik.

But I will first check how often open() is called.

Regards,
Andreas
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <http://lists.goldelico.com/pipermail/letux-kernel/attachments/20180727/071611f9/attachment.asc>


More information about the Letux-kernel mailing list