[Letux-kernel] accelerometer bridge madness
Andreas Kemnade
andreas at kemnade.info
Fri Jul 27 12:55:44 CEST 2018
On Fri, 27 Jul 2018 08:48:31 +0200
"H. Nikolaus Schaller" <hns at goldelico.com> wrote:
> Hi,
>
> > Am 27.07.2018 um 07:21 schrieb Andreas Kemnade <andreas at kemnade.info>:
> >
> > Hi,
> >
> > On Wed, 25 Jul 2018 07:00:48 +0200
> > "H. Nikolaus Schaller" <hns at goldelico.com> wrote:
> >
> >> Hi,
> >>
> >>> Am 24.07.2018 um 22:26 schrieb Andreas Kemnade
> >>> <andreas at kemnade.info>:
> >>>
> >>> Hi,
> >>>
> >>>
> >>>> BTW: did you see the kernel hickups also with evtest on the
> >>>> accelerometer?
> >>> [ 60.439422] ------------[ cut here ]------------
> >>> [ 60.444305] WARNING: CPU: 0 PID: 3140 at kernel/workqueue.c:1513
> >>> __queue_delayed_work+0xd8/0x13c
> >>
> >> https://elixir.bootlin.com/linux/v4.18-rc6/source/kernel/workqueue.c#L1513
> >>
> >> So some list is not empty which should be. No idea why that happens...
> >>
> >> INIT_DELAYED_WORK(&input_work, inputbridge_work);
> >>
> >> should have initialized work->entry through
> >>
> > well, it is called *after* the device registration. So a there is a
> > little race
>
> Ah, ok! That could already be the main reason for problems.
>
> > condition. Not sure if that can be exploited.
>
> I have experienced that some drivers start interrups and handlers before
> returning from probe().
>
> >
> > [...]
> >>>> Maybe /lib/udev/accelerometer is slightly incompatible with kernel
> >>>> API and the iio-bridge doesn't catch such situations properly?
> >>>
> > I also tried with my stretched partition, then I have a process named
> > systemd-udevd running. So there is no single binary to remove/rename.
> > Booting is again slowed down horribly, and also the time between kernel
> > switching off unused regulators (and therefore stops automatic
> > charging) and usb enumeration (when charging starts again). So this is
> > critical. Somehow I do not like a system where failed accelerometer
> > reads lead to boot problems. That is too fragile.
> >
> >>> also plain reads are enough to get it, like a simple
> >>> xxd /dev/input/eventX.
> >>>
> >>> Are you testing with A4 or A5?
> >>
> >> Both.
> >>
> >> I have also looked into the code of the stack trace and could you
> >> please try to change accel_open() in
> >> drivers/iio/industrialio-inputbridge.c
> >>
> >> #if POLLING
> >> schedule_delayed_work(&input_work,
> >> msecs_to_jiffies(0)); // start now
> >> #else
> >>
> >> to
> >>
> >> #if POLLING
> >> schedule_delayed_work(&input_work,
> >> msecs_to_jiffies(1)); // start now
> >> #else
> >>
> >> Maybe immediately scheduling a delayed work ins't allowed in open()
> >> in some situations?
> >>
> > I even tried 100 and it did not help.
> > But what helps:
> > moving the initialisation of the workqueue to accel_open().
>
> That is unexpected. As long as we INIT_DELAYED_WORK first and then
> schedule with a long delay, nothing bad should happen, IMHO.
>
Well, the main question is whether iio-bridge is the victim or the
suspect. Perhaps we have the same situation as with pinctl and gab.
So we will find a lot of minor issues here and the real problem is
somewhere else.
> Ah, ok. This could indeed be the race: If input_register_device()
> stops the thread and waits until all udev activities are finished,
> udev will indirectly call accel_open() before INIT_DELAYED_WORK.
Well, udev first has to create a node and then something can open it (and I still do not
understand why stretch udev messes with the evdev).
I will try to enable the printk in open to see what really happens.
I hope the bug will not disappear.
>
> Unfortunately Linux is missing any documentation about such concurrency
> issues. At least I never know what is thread-safe and what isn't...
>
> Moving INIT_DELAYED_WORK to accel_open() wipes out an existing workqueue
> which may lead to problems with the second open().
>
Well, if the work is not scheduled anymore (due to a close(), it is probably no problem,
but again opening it multiple times at the same time might be dangerous.
> So I have attached a patch to move INIT_DELAYED_WORK before input_register_device().
>
I already tried it yesterday. It did not help. I might try again.
> >
> > Then evtest works. But that can be dangerous if the device is opened
> > multiple times. Is the current code safe for that anyways? I guess not,
> > canceling the work should only happen at last close() and scheduling it
> > at the first open()
>
> Ok, this is a situation I have never thought of. So we should count open()
> and close().
>
> I have added another patch.
>
which adds another set of race conditions, probably mainly on multicore systems.
kref_get() and kref_put() are the tools to use afaik.
But I will first check how often open() is called.
Regards,
Andreas
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <http://lists.goldelico.com/pipermail/letux-kernel/attachments/20180727/071611f9/attachment.asc>
More information about the Letux-kernel
mailing list