[Letux-kernel] accelerometer bridge madness

H. Nikolaus Schaller hns at goldelico.com
Fri Jul 27 16:58:08 CEST 2018


Hi,

> Am 27.07.2018 um 12:55 schrieb Andreas Kemnade <andreas at kemnade.info>:
> 
>> 
>> That is unexpected. As long as we INIT_DELAYED_WORK first and then
>> schedule with a long delay, nothing bad should happen, IMHO.
>> 
> Well, the main question is whether iio-bridge is the victim or the
> suspect. Perhaps we have the same situation as with pinctl and gab.

Yes. There we have found bugs in pinmux - but they were not the reason
the real problemy...

> So we will find a lot of minor issues here and the real problem is
> somewhere else.

But wrong INIT for the scheduled work is one of these minor ones...

> 
>> Ah, ok. This could indeed be the race: If input_register_device()
>> stops the thread and waits until all udev activities are finished,
>> udev will indirectly call accel_open() before INIT_DELAYED_WORK.
> 
> Well, udev first has to create a node and then something can open it (and I still do not
> understand why stretch udev messes with the evdev).

Well, it wants to create "virtual" input devices. This lib/udev/accelerometer
reads the accelerometer to provide new events for position changes.

> I will try to enable the printk in open to see what really happens.
> I hope the bug will not disappear.

Yes, I keep fingers crossed!

>> 
>> Unfortunately Linux is missing any documentation about such concurrency
>> issues. At least I never know what is thread-safe and what isn't...
>> 
>> Moving INIT_DELAYED_WORK to accel_open() wipes out an existing workqueue
>> which may lead to problems with the second open().
>> 
> Well, if the work is not scheduled anymore (due to a close(), it is probably no problem,
> but again opening it multiple times at the same time might be dangerous.

Indeed.

> 
>> So I have attached a patch to move INIT_DELAYED_WORK before input_register_device().
>> 
> I already tried it yesterday. It did not help. I might try again.
> 
>>> 
>>> Then evtest works. But that can be dangerous if the device is opened
>>> multiple times. Is the current code safe for that anyways? I guess not,
>>> canceling the work should only happen at last close() and scheduling it
>>> at the first open()
>> 
>> Ok, this is a situation I have never thought of. So we should count open()
>> and close().
>> 
>> I have added another patch.
>> 
> which adds another set of race conditions, probably mainly on multicore systems.
> kref_get() and kref_put() are the tools to use afaik.

Ok, you are right a counter++ or counter-- isn't really multicore-safe.

> 
> But I will first check how often open() is called.

Yes, that is the important thing. And when it is called. You might add some
printk to the probe function to identify the sequence.

BR,
Nikolaus

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <http://lists.goldelico.com/pipermail/letux-kernel/attachments/20180727/9176b139/attachment.asc>


More information about the Letux-kernel mailing list