[Letux-kernel] accelerometer bridge madness
H. Nikolaus Schaller
hns at goldelico.com
Fri Jul 27 09:00:47 CEST 2018
> Am 27.07.2018 um 08:48 schrieb H. Nikolaus Schaller <hns at goldelico.com>:
>
> Hi,
>
>> Am 27.07.2018 um 07:21 schrieb Andreas Kemnade <andreas at kemnade.info>:
>>
>> Hi,
>>
>> On Wed, 25 Jul 2018 07:00:48 +0200
>> "H. Nikolaus Schaller" <hns at goldelico.com> wrote:
>>
>>> Hi,
>>>
>>>> Am 24.07.2018 um 22:26 schrieb Andreas Kemnade
>>>> <andreas at kemnade.info>:
>>>>
>>>> Hi,
>>>>
>>>>
>>>>> BTW: did you see the kernel hickups also with evtest on the
>>>>> accelerometer?
>>>> [ 60.439422] ------------[ cut here ]------------
>>>> [ 60.444305] WARNING: CPU: 0 PID: 3140 at kernel/workqueue.c:1513
>>>> __queue_delayed_work+0xd8/0x13c
>>>
>>> https://elixir.bootlin.com/linux/v4.18-rc6/source/kernel/workqueue.c#L1513
>>>
>>> So some list is not empty which should be. No idea why that happens...
>>>
>>> INIT_DELAYED_WORK(&input_work, inputbridge_work);
>>>
>>> should have initialized work->entry through
>>>
>> well, it is called *after* the device registration. So a there is a
>> little race
>
> Ah, ok! That could already be the main reason for problems.
>
>> condition. Not sure if that can be exploited.
>
> I have experienced that some drivers start interrups and handlers before
> returning from probe().
>
>>
>> [...]
>>>>> Maybe /lib/udev/accelerometer is slightly incompatible with kernel
>>>>> API and the iio-bridge doesn't catch such situations properly?
>>>>
>> I also tried with my stretched partition, then I have a process named
>> systemd-udevd running. So there is no single binary to remove/rename.
>> Booting is again slowed down horribly, and also the time between kernel
>> switching off unused regulators (and therefore stops automatic
>> charging) and usb enumeration (when charging starts again). So this is
>> critical. Somehow I do not like a system where failed accelerometer
>> reads lead to boot problems. That is too fragile.
>>
>>>> also plain reads are enough to get it, like a simple
>>>> xxd /dev/input/eventX.
>>>>
>>>> Are you testing with A4 or A5?
>>>
>>> Both.
>>>
>>> I have also looked into the code of the stack trace and could you
>>> please try to change accel_open() in
>>> drivers/iio/industrialio-inputbridge.c
>>>
>>> #if POLLING
>>> schedule_delayed_work(&input_work,
>>> msecs_to_jiffies(0)); // start now
>>> #else
>>>
>>> to
>>>
>>> #if POLLING
>>> schedule_delayed_work(&input_work,
>>> msecs_to_jiffies(1)); // start now
>>> #else
>>>
>>> Maybe immediately scheduling a delayed work ins't allowed in open()
>>> in some situations?
>>>
>> I even tried 100 and it did not help.
>> But what helps:
>> moving the initialisation of the workqueue to accel_open().
>
> That is unexpected. As long as we INIT_DELAYED_WORK first and then
> schedule with a long delay, nothing bad should happen, IMHO.
>
> Ah, ok. This could indeed be the race: If input_register_device()
> stops the thread and waits until all udev activities are finished,
> udev will indirectly call accel_open() before INIT_DELAYED_WORK.
>
> Unfortunately Linux is missing any documentation about such concurrency
> issues. At least I never know what is thread-safe and what isn't...
Maybe my wrong model is that activities of a driver start not before
successfully returning from probe(). But the engine may start at
100% thrust before a register() function called by probe() returns...
>
> Moving INIT_DELAYED_WORK to accel_open() wipes out an existing workqueue
> which may lead to problems with the second open().
>
> So I have attached a patch to move INIT_DELAYED_WORK before input_register_device().
>
>>
>> Then evtest works. But that can be dangerous if the device is opened
>> multiple times. Is the current code safe for that anyways? I guess not,
>> canceling the work should only happen at last close() and scheduling it
>> at the first open()
>
> Ok, this is a situation I have never thought of. So we should count open()
> and close().
>
> I have added another patch.
>
>>
>> But somehow if initializing input_work earlier causes trouble then it
>> means it is damaged somehow. That all feels like memory corruption by a
>> random driver. So now what is the right printk to enable to
>> fix the problem ;-)
>> Well, nightmare ahead...
>>
>> Maybe I should give the letux 3704 a torture night again so maybe we
>> know whether the problem is in common code or not.
>
> So let's see what you find.
>
> BR,
> Nikolaus
>
> <0001-iio-input-bridge-initialize-delayed-work-before-regi.patch><0002-iio-input-bridge-count-open-close-to-schedule-pollin.patch>_______________________________________________
> http://projects.goldelico.com/p/gta04-kernel/
> Letux-kernel mailing list
> Letux-kernel at openphoenux.org
> http://lists.goldelico.com/mailman/listinfo.cgi/letux-kernel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <http://lists.goldelico.com/pipermail/letux-kernel/attachments/20180727/f23eec81/attachment-0001.asc>
More information about the Letux-kernel
mailing list