[Letux-kernel] accelerometer bridge madness
H. Nikolaus Schaller
hns at goldelico.com
Fri Jul 27 08:48:31 CEST 2018
Hi,
> Am 27.07.2018 um 07:21 schrieb Andreas Kemnade <andreas at kemnade.info>:
>
> Hi,
>
> On Wed, 25 Jul 2018 07:00:48 +0200
> "H. Nikolaus Schaller" <hns at goldelico.com> wrote:
>
>> Hi,
>>
>>> Am 24.07.2018 um 22:26 schrieb Andreas Kemnade
>>> <andreas at kemnade.info>:
>>>
>>> Hi,
>>>
>>>
>>>> BTW: did you see the kernel hickups also with evtest on the
>>>> accelerometer?
>>> [ 60.439422] ------------[ cut here ]------------
>>> [ 60.444305] WARNING: CPU: 0 PID: 3140 at kernel/workqueue.c:1513
>>> __queue_delayed_work+0xd8/0x13c
>>
>> https://elixir.bootlin.com/linux/v4.18-rc6/source/kernel/workqueue.c#L1513
>>
>> So some list is not empty which should be. No idea why that happens...
>>
>> INIT_DELAYED_WORK(&input_work, inputbridge_work);
>>
>> should have initialized work->entry through
>>
> well, it is called *after* the device registration. So a there is a
> little race
Ah, ok! That could already be the main reason for problems.
> condition. Not sure if that can be exploited.
I have experienced that some drivers start interrups and handlers before
returning from probe().
>
> [...]
>>>> Maybe /lib/udev/accelerometer is slightly incompatible with kernel
>>>> API and the iio-bridge doesn't catch such situations properly?
>>>
> I also tried with my stretched partition, then I have a process named
> systemd-udevd running. So there is no single binary to remove/rename.
> Booting is again slowed down horribly, and also the time between kernel
> switching off unused regulators (and therefore stops automatic
> charging) and usb enumeration (when charging starts again). So this is
> critical. Somehow I do not like a system where failed accelerometer
> reads lead to boot problems. That is too fragile.
>
>>> also plain reads are enough to get it, like a simple
>>> xxd /dev/input/eventX.
>>>
>>> Are you testing with A4 or A5?
>>
>> Both.
>>
>> I have also looked into the code of the stack trace and could you
>> please try to change accel_open() in
>> drivers/iio/industrialio-inputbridge.c
>>
>> #if POLLING
>> schedule_delayed_work(&input_work,
>> msecs_to_jiffies(0)); // start now
>> #else
>>
>> to
>>
>> #if POLLING
>> schedule_delayed_work(&input_work,
>> msecs_to_jiffies(1)); // start now
>> #else
>>
>> Maybe immediately scheduling a delayed work ins't allowed in open()
>> in some situations?
>>
> I even tried 100 and it did not help.
> But what helps:
> moving the initialisation of the workqueue to accel_open().
That is unexpected. As long as we INIT_DELAYED_WORK first and then
schedule with a long delay, nothing bad should happen, IMHO.
Ah, ok. This could indeed be the race: If input_register_device()
stops the thread and waits until all udev activities are finished,
udev will indirectly call accel_open() before INIT_DELAYED_WORK.
Unfortunately Linux is missing any documentation about such concurrency
issues. At least I never know what is thread-safe and what isn't...
Moving INIT_DELAYED_WORK to accel_open() wipes out an existing workqueue
which may lead to problems with the second open().
So I have attached a patch to move INIT_DELAYED_WORK before input_register_device().
>
> Then evtest works. But that can be dangerous if the device is opened
> multiple times. Is the current code safe for that anyways? I guess not,
> canceling the work should only happen at last close() and scheduling it
> at the first open()
Ok, this is a situation I have never thought of. So we should count open()
and close().
I have added another patch.
>
> But somehow if initializing input_work earlier causes trouble then it
> means it is damaged somehow. That all feels like memory corruption by a
> random driver. So now what is the right printk to enable to
> fix the problem ;-)
> Well, nightmare ahead...
>
> Maybe I should give the letux 3704 a torture night again so maybe we
> know whether the problem is in common code or not.
So let's see what you find.
BR,
Nikolaus
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-iio-input-bridge-initialize-delayed-work-before-regi.patch
Type: application/octet-stream
Size: 1565 bytes
Desc: not available
URL: <http://lists.goldelico.com/pipermail/letux-kernel/attachments/20180727/d14fd412/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-iio-input-bridge-count-open-close-to-schedule-pollin.patch
Type: application/octet-stream
Size: 1518 bytes
Desc: not available
URL: <http://lists.goldelico.com/pipermail/letux-kernel/attachments/20180727/d14fd412/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <http://lists.goldelico.com/pipermail/letux-kernel/attachments/20180727/d14fd412/attachment.asc>
More information about the Letux-kernel
mailing list