[Letux-kernel] accelerometer bridge madness

H. Nikolaus Schaller hns at goldelico.com
Fri Jul 27 09:00:47 CEST 2018


> Am 27.07.2018 um 08:48 schrieb H. Nikolaus Schaller <hns at goldelico.com>:
> 
> Hi,
> 
>> Am 27.07.2018 um 07:21 schrieb Andreas Kemnade <andreas at kemnade.info>:
>> 
>> Hi,
>> 
>> On Wed, 25 Jul 2018 07:00:48 +0200
>> "H. Nikolaus Schaller" <hns at goldelico.com> wrote:
>> 
>>> Hi,
>>> 
>>>> Am 24.07.2018 um 22:26 schrieb Andreas Kemnade
>>>> <andreas at kemnade.info>:
>>>> 
>>>> Hi,
>>>> 
>>>> 
>>>>> BTW: did you see the kernel hickups also with evtest on the
>>>>> accelerometer?
>>>> [   60.439422] ------------[ cut here ]------------
>>>> [   60.444305] WARNING: CPU: 0 PID: 3140 at kernel/workqueue.c:1513
>>>> __queue_delayed_work+0xd8/0x13c
>>> 
>>> https://elixir.bootlin.com/linux/v4.18-rc6/source/kernel/workqueue.c#L1513
>>> 
>>> So some list is not empty which should be. No idea why that happens...
>>> 
>>> 		INIT_DELAYED_WORK(&input_work, inputbridge_work);
>>> 
>>> should have initialized work->entry through
>>> 
>> well, it is called *after* the device registration. So a there is a
>> little race
> 
> Ah, ok! That could already be the main reason for problems.
> 
>> condition. Not sure if that can be exploited.
> 
> I have experienced that some drivers start interrups and handlers before
> returning from probe().
> 
>> 
>> [...]
>>>>> Maybe /lib/udev/accelerometer is slightly incompatible with kernel
>>>>> API and the iio-bridge doesn't catch such situations properly?
>>>> 
>> I also tried with my stretched partition, then I have a process named
>> systemd-udevd running. So there is no single binary to remove/rename.
>> Booting is again slowed down horribly, and also the time between kernel
>> switching off unused regulators (and therefore stops automatic
>> charging) and usb enumeration (when charging starts again). So this is
>> critical. Somehow I do not like a system where failed accelerometer
>> reads lead to boot problems. That is too fragile.
>> 
>>>> also plain reads are enough to get it, like a simple
>>>> xxd /dev/input/eventX.
>>>> 
>>>> Are you testing with A4 or A5?
>>> 
>>> Both.
>>> 
>>> I have also looked into the code of the stack trace and could you
>>> please try to change accel_open() in
>>> drivers/iio/industrialio-inputbridge.c
>>> 
>>> #if POLLING
>>> 	schedule_delayed_work(&input_work,
>>> 		msecs_to_jiffies(0));	// start now
>>> #else
>>> 
>>> to
>>> 
>>> #if POLLING
>>> 	schedule_delayed_work(&input_work,
>>> 		msecs_to_jiffies(1));	// start now
>>> #else
>>> 
>>> Maybe immediately scheduling a delayed work ins't allowed in open()
>>> in some situations?
>>> 
>> I even tried 100 and it did not help.
>> But what helps:
>> moving the initialisation of the workqueue to accel_open().
> 
> That is unexpected. As long as we INIT_DELAYED_WORK first and then
> schedule with a long delay, nothing bad should happen, IMHO.
> 
> Ah, ok. This could indeed be the race: If input_register_device()
> stops the thread and waits until all udev activities are finished,
> udev will indirectly call accel_open() before INIT_DELAYED_WORK.
> 
> Unfortunately Linux is missing any documentation about such concurrency
> issues. At least I never know what is thread-safe and what isn't...

Maybe my wrong model is that activities of a driver start not before
successfully returning from probe(). But the engine may start at
100% thrust before a register() function called by probe() returns...

> 
> Moving INIT_DELAYED_WORK to accel_open() wipes out an existing workqueue
> which may lead to problems with the second open().
> 
> So I have attached a patch to move INIT_DELAYED_WORK before input_register_device().
> 
>> 
>> Then evtest works. But that can be dangerous if the device is opened
>> multiple times. Is the current code safe for that anyways? I guess not,
>> canceling the work should only happen at last close() and scheduling it
>> at the first open()
> 
> Ok, this is a situation I have never thought of. So we should count open()
> and close().
> 
> I have added another patch.
> 
>> 
>> But somehow if initializing input_work earlier causes trouble then it
>> means it is damaged somehow. That all feels like memory corruption by a
>> random driver. So now what is the right printk to enable to
>> fix the problem ;-)
>> Well, nightmare ahead...
>> 
>> Maybe I should give the letux 3704 a torture night again so maybe we
>> know whether the problem is in common code or not.
> 
> So let's see what you find.
> 
> BR,
> Nikolaus
> 
> <0001-iio-input-bridge-initialize-delayed-work-before-regi.patch><0002-iio-input-bridge-count-open-close-to-schedule-pollin.patch>_______________________________________________
> http://projects.goldelico.com/p/gta04-kernel/
> Letux-kernel mailing list
> Letux-kernel at openphoenux.org
> http://lists.goldelico.com/mailman/listinfo.cgi/letux-kernel

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <http://lists.goldelico.com/pipermail/letux-kernel/attachments/20180727/f23eec81/attachment-0001.asc>


More information about the Letux-kernel mailing list