[Letux-kernel] [Tinkerphones] I2C bus issues on Pandora

H. Nikolaus Schaller hns at goldelico.com
Sun Jan 2 19:38:27 CET 2022


Hi,
it looks as if the lists were dropped from the reply. I have added them back.

> On 02.01.2022 at 19:17, Grond <grond66 at riseup.net> wrote:
> 
> 
> On Thu, Dec 02, 2021 at 07:23:50PM +0100, H. Nikolaus Schaller wrote:
> 
> [snip]
> 
>>> 
>>>> 
>>>> But there is a known bug with the bandgap sensor inside the omap3530 (600 MHz).
>>>> And something about an I2C bus timeout which hasn't been seen on dm3730 (1 GHz)
>>>> based devices. 
>>>> 
>>> 
>>> Is there any more detail on these issues anywhere?
>> 
>> Basically I see after a while:
>> 
>> ti-soc-thermal 48002524.bandgap: eocz timed out waiting high
>> 
>> and / or (not directly related)
>> 
>> [ 2061.283721] omap_i2c 48060000.i2c: timeout waiting on XUDF bit
>> 
>> The first one may be a silicon bug - at least that is what the OMAP
>> maintainer thought. He recommended turning it off. On the other hand I know
>> that it did work quite well in kernels at least before 4.0.
>> 
>> The second one comes from i2c3, to which the bq27500 fuel gauge and the
>> nubs are connected. In fact, for me the nubs stop working as soon as this
>> message appears.
>> 
>> It may be either a protocol or a speed error. Or something in the drivers
>> of these chips which makes them block the bus. Or even a power management
>> issue in the bandgap and i2c controllers on the omap3530 SoC.
>> 
>> In both cases I have planned (but not found time) to experiment with
>> different older kernels to find out in which kernel release it appeared
>> first. Then we may have to git bisect to understand. This is quite time-consuming...
>> 
>> BR,
>> Nikolaus
>> 
> 
> Having disabled the bandgap driver through the DT, the i2c bug still
> shows up. This tends to make me believe that the two are not (directly)
> related. Also I have observed the i2c timeout occurring a few seconds
> before the bandgap timeout error message, so it seems less than likely
> that the bandgap issue can be *causing* the i2c one.
> 
> Somewhat more alarmingly, when I tried to use the Pandora's 3.2 kernel to
> get a working baseline for the i2c adaptor, I discovered to my horror
> that the bug had persisted across reboots (and kernel versions). This
> raises the specter of something in the letux kernel causing hardware
> damage to this particular component.
> 
> I have tried probing the i2c bus while it was exhibiting the bug, and
> the behaviour that I observe is SCL getting stuck in a low state, and
> the adaptor seems to be timing out because of that. Proceeding on the
> theory that one of the ATMEGA microcontrollers running the nubs might be
> responsible for jamming the bus, I tried asserting their reset GPIO and
> killing their power via a regulator (both confirmed via probing at
> runtime). Neither was successful in unsticking the bus. That leaves the
> MMA7455L accelerometer,

AFAIR it is not installed, at least not on my unit.

> the PCA9306 level shifter chip and the BQ27500
> fuel gauge as the only remaining non-SoC devices which could be jamming
> the bus. However, since all but the BQ27500 are powered via the same
> regulator as the ATMEGA controllers, I don't see how any of them are
> likely to be the source of the jam (or indeed what else I could do to
> confirm that they are). As for the BQ27500, while it has non-volatile
> flash memory

Yes, I think there was a flashing tool. Basically it defines the nominal
capacity of the battery.

> (which could cause the problem to persist across resets) I
> don't understand how it could have become corrupted from running a newer
> kernel. And even if its firmware/runtime data have been corrupted in
> such a way that makes it jam its i2c interface, it looks (from the
> datasheet) like the firmware is inspected and updated via i2c, which
> means that if this is the source of the problem, it will be difficult to
> impossible to fix.
> 
> With that in mind, I'm going to assume (hope) that the issue is inside
> the OMAP itself, and is persisting across reboots in some way that I do
> not understand (hopefully not hardware damage). I'm planning to try
> using the OMAP's i2c debug functionality to run SCL high and see what
> that does. Failing that, pinmuxing should allow me to route a GPIO
> to that pin instead of SCL, which should at least allow me to confirm
> that the OMAP is creating the jam condition.
> 
> Has anyone else on this list encountered this bug before? If you have,
> could you take a moment to check if the bug also turns up with the
> original OpenPandora kernel? If it does, the kernel will have a dmesg
> error which looks something like:
> 
> omap_i2c.3: controller timed out

Oops. Never observed. But I can try.
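
For checking a saved log, here is a minimal sketch in Python that looks for
the messages quoted in this thread. The function name scan_dmesg and the
labels are made up for illustration; the message fragments are the ones from
the logs above, and the timestamp prefix is assumed to be the usual
"[ seconds ]" format printed by dmesg.

```python
import re

# Message fragments quoted in this thread; the labels are hypothetical.
PATTERNS = {
    "i2c_xudf": "timeout waiting on XUDF bit",
    "i2c_controller": "controller timed out",
    "bandgap_eocz": "eocz timed out waiting high",
    "hdq_tx": "TX irqstatus not cleared",
}

# Optional "[ 2061.283721] " style prefix (assumed dmesg timestamp format).
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*")


def scan_dmesg(lines):
    """Return (timestamp or None, label, text) for every matching line."""
    hits = []
    for line in lines:
        line = line.strip()
        m = TIMESTAMP.match(line)
        stamp = float(m.group(1)) if m else None
        text = line[m.end():] if m else line
        for label, fragment in PATTERNS.items():
            if fragment in text:
                hits.append((stamp, label, text))
                break
    return hits


# Sample lines copied from the messages in this thread.
sample = [
    "[ 2061.283721] omap_i2c 48060000.i2c: timeout waiting on XUDF bit",
    "ti-soc-thermal 48002524.bandgap: eocz timed out waiting high",
    "omap_i2c.3: controller timed out",
]

hits = scan_dmesg(sample)
for stamp, label, text in hits:
    print(stamp, label, text)
```

Feeding it a saved log instead of the sample (for example
scan_dmesg(open("dmesg.txt")), where the file name is only a placeholder)
would also show which of the timeouts appears first.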

In any case hardware can get old :(

> If anyone on the list has opinions/relevant knowledge about what I've
> said above (especially concerning the assumptions I've made), please
> feel free to make them known.

What I assume is that something upstream has changed. I have not
yet invested time to try some older kernels where I know that it is not
there.
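
To get a feeling for the effort: git bisect roughly halves the range of
candidate commits with every kernel that is built and tested, so the number
of test boots grows only with the logarithm of the range. A rough sketch
(the function name and the commit count are made-up illustrations, and real
kernel history is not linear, so git's own estimate can differ by a step or
so):

```python
def bisect_steps(candidate_commits):
    """Estimate how many kernels need to be built and tested during a bisect.

    Assumes a plain binary search over a linear range of commits, which is
    only an approximation of what git bisect does on merge-heavy history.
    """
    if candidate_commits < 1:
        raise ValueError("need at least one candidate commit")
    # ceil(log2(n)) computed with integers only
    return (candidate_commits - 1).bit_length()


# Hypothetical range size between a known-good and a known-bad kernel.
print(bisect_steps(15000))
```

So even a range spanning several releases needs a manageable number of
builds; the time goes into building, booting and waiting for the message to
show up on each one.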

BTW: there is a similar bug on the GTA04 - but related to HDQ. It
sometimes reports: 

[  574.006469] omap_hdq 480b2000.1w: TX irqstatus not cleared (01)

This wasn't seen in kernels before ca. v5.10.

What we generally must be aware of is that things get backported so that
5.4.y may inherit a bug from mainline.

Regarding your issue: did you try to remove the battery to make a real
cold boot? It may also be related to a stuck PMU.

One more thought: do you also use a 600MHz Pandora? It has the omap3530
and not the DM3730. Maybe there is another silicon erratum which should
be applied.

BR,
Nikolaus


