[Letux-kernel] accelerometer bridge madness

H. Nikolaus Schaller hns at goldelico.com
Fri Jul 27 08:48:31 CEST 2018


Hi,

> Am 27.07.2018 um 07:21 schrieb Andreas Kemnade <andreas at kemnade.info>:
> 
> Hi,
> 
> On Wed, 25 Jul 2018 07:00:48 +0200
> "H. Nikolaus Schaller" <hns at goldelico.com> wrote:
> 
>> Hi,
>> 
>>> Am 24.07.2018 um 22:26 schrieb Andreas Kemnade
>>> <andreas at kemnade.info>:
>>> 
>>> Hi,
>>> 
>>> 
>>>> BTW: did you see the kernel hickups also with evtest on the
>>>> accelerometer?
>>> [   60.439422] ------------[ cut here ]------------
>>> [   60.444305] WARNING: CPU: 0 PID: 3140 at kernel/workqueue.c:1513
>>> __queue_delayed_work+0xd8/0x13c
>> 
>> https://elixir.bootlin.com/linux/v4.18-rc6/source/kernel/workqueue.c#L1513
>> 
>> So some list is not empty which should be. No idea why that happens...
>> 
>> 		INIT_DELAYED_WORK(&input_work, inputbridge_work);
>> 
>> should have initialized work->entry through
>> 
> well, it is called *after* the device registration. So a there is a
> little race

Ah, ok! That could already be the main reason for problems.

> condition. Not sure if that can be exploited.

I have experienced that some drivers start interrups and handlers before
returning from probe().

> 
> [...]
>>>> Maybe /lib/udev/accelerometer is slightly incompatible with kernel
>>>> API and the iio-bridge doesn't catch such situations properly?
>>> 
> I also tried with my stretched partition, then I have a process named
> systemd-udevd running. So there is no single binary to remove/rename.
> Booting is again slowed down horribly, and also the time between kernel
> switching off unused regulators (and therefore stops automatic
> charging) and usb enumeration (when charging starts again). So this is
> critical. Somehow I do not like a system where failed accelerometer
> reads lead to boot problems. That is too fragile.
> 
>>> also plain reads are enough to get it, like a simple
>>> xxd /dev/input/eventX.
>>> 
>>> Are you testing with A4 or A5?
>> 
>> Both.
>> 
>> I have also looked into the code of the stack trace and could you
>> please try to change accel_open() in
>> drivers/iio/industrialio-inputbridge.c
>> 
>> #if POLLING
>> 	schedule_delayed_work(&input_work,
>> 		msecs_to_jiffies(0));	// start now
>> #else
>> 
>> to
>> 
>> #if POLLING
>> 	schedule_delayed_work(&input_work,
>> 		msecs_to_jiffies(1));	// start now
>> #else
>> 
>> Maybe immediately scheduling a delayed work ins't allowed in open()
>> in some situations?
>> 
> I even tried 100 and it did not help.
> But what helps:
> moving the initialisation of the workqueue to accel_open().

That is unexpected. As long as we INIT_DELAYED_WORK first and then
schedule with a long delay, nothing bad should happen, IMHO.

Ah, ok. This could indeed be the race: If input_register_device()
stops the thread and waits until all udev activities are finished,
udev will indirectly call accel_open() before INIT_DELAYED_WORK.

Unfortunately Linux is missing any documentation about such concurrency
issues. At least I never know what is thread-safe and what isn't...

Moving INIT_DELAYED_WORK to accel_open() wipes out an existing workqueue
which may lead to problems with the second open().

So I have attached a patch to move INIT_DELAYED_WORK before input_register_device().

> 
> Then evtest works. But that can be dangerous if the device is opened
> multiple times. Is the current code safe for that anyways? I guess not,
> canceling the work should only happen at last close() and scheduling it
> at the first open()

Ok, this is a situation I have never thought of. So we should count open()
and close().

I have added another patch.

> 
> But somehow if initializing input_work earlier causes trouble then it
> means it is damaged somehow. That all feels like memory corruption by a
> random driver. So now what is the right printk to enable to
> fix the problem ;-)
> Well, nightmare ahead...
> 
> Maybe I should give the letux 3704 a torture night again so maybe we
> know whether the problem is in common code or not.

So let's see what you find.

BR,
Nikolaus

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-iio-input-bridge-initialize-delayed-work-before-regi.patch
Type: application/octet-stream
Size: 1565 bytes
Desc: not available
URL: <http://lists.goldelico.com/pipermail/letux-kernel/attachments/20180727/d14fd412/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-iio-input-bridge-count-open-close-to-schedule-pollin.patch
Type: application/octet-stream
Size: 1518 bytes
Desc: not available
URL: <http://lists.goldelico.com/pipermail/letux-kernel/attachments/20180727/d14fd412/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <http://lists.goldelico.com/pipermail/letux-kernel/attachments/20180727/d14fd412/attachment.asc>


More information about the Letux-kernel mailing list