[Letux-kernel] [Tinkerphones] I2C bus issues on Pandora

Grond grond66 at riseup.net
Sun Jan 2 21:00:12 CET 2022


On Sun, Jan 02, 2022 at 07:38:27PM +0100, H. Nikolaus Schaller wrote:
> Hi,
> looks as if the lists were forgotten for reply. I have added them.
Yup. Thanks for catching this.

> 
> > Am 02.01.2022 um 19:17 schrieb Grond <grond66 at riseup.net>:
> > 
> > 
> > On Thu, Dec 02, 2021 at 07:23:50PM +0100, H. Nikolaus Schaller wrote:
> > 
> > [snip]
> > 
> >>> 
> >>>> 
> >>>> But there is a known bug with the bandgap sensor inside the omap3530 (600 MHz).
> >>>> And something about an I2C bus timeout which hasn't been seen on dm3730 (1 GHz)
> >>>> based devices. 
> >>>> 
> >>> 
> >>> Is there any more detail on these issues anywhere?
> >> 
> >> Basically I see after a while:
> >> 
> >> ti-soc-thermal 48002524.bandgap: eocz timed out waiting high
> >> 
> >> and / or (not directly related)
> >> 
> >> [ 2061.283721] omap_i2c 48060000.i2c: timeout waiting on XUDF bit
> >> 
> >> The first one may be a silicon bug - at least that is what the OMAP
> >> maintainer thought. He recommended turning it off. On the other hand I know
> >> that it did work quite well in kernels at least before 4.0.
> >> 
> >> The second one comes from i2c3, where the bq27500 fuel gauge and the nubs
> >> are connected. In fact for me the nubs stop working as soon as this
> >> message appears.
> >> 
> >> It may be either a protocol or a speed error. Or something in the drivers
> >> of these chips which make them block the bus. Or even a power management
> >> issue in the bandgap and i2c controllers on the omap3530 SoC.
> >> 
> >> In both cases I have planned (but not found time) to experiment with
> >> different older kernels to find out in which kernel release it appeared
> >> first. Then we may have to git bisect to understand. This is quite time-consuming...
> >> 
> >> BR,
> >> Nikolaus
> >> 
> > 
> > Having disabled the bandgap driver through the DT, the i2c bug still
> > shows up. This tends to make me believe that the two are not (directly)
> > related. Also I have observed the i2c timeout occurring a few seconds
> > before the bandgap timeout error message, so it seems less than likely
> > that the bandgap issue can be *causing* the i2c one.
> > 
> > Somewhat more alarmingly, when I tried to use the Pandora's 3.2 kernel to
> > get a working baseline for the i2c adaptor, I discovered to my horror
> > that the bug had persisted across reboots (and kernel versions). This
> > raises the specter of something in the letux kernel causing hardware
> > damage to this particular component.
> > 
> > I have tried probing the i2c bus while it was exhibiting the bug, and
> > the behaviour that I observe is SCL getting stuck in a low state, and
> > the adaptor seems to be timing out because of that. Proceeding on the
> > theory that one of the ATMEGA microcontrollers running the nubs might be
> > responsible for jamming the bus, I tried asserting their reset GPIO and
> > killing their power via a regulator (both confirmed via probing at
> > runtime). Neither was successful in unsticking the bus. That leaves the
> > MMA7455L accelerometer,
> 
> AFAIR it is not installed, at least not on my unit.
> 
> > the PCA9306 level shifter chip and the BQ27500
> > fuel gauge as the only remaining non-SoC devices which could be jamming
> > the bus. However, since all but the BQ27500 are powered via the same
> > regulator as the ATMEGA controllers, I don't see how any of them are
> > likely to be the source of the jam (or indeed what else I could do to
> > confirm that they are). As for the BQ27500, while it has non-volatile
> > flash memory
> 
> Yes, I think there was a flashing tool. Basically it defines the nominal
> capacity of the battery.
> 
> > (which could cause the problem to persist across resets) I
> > don't understand how it could have become corrupted from running a newer
> > kernel. And even if its firmware/runtime data have been corrupted in
> > such a way that makes it jam its i2c interface, it looks (from the
> > datasheet) like the firmware is inspected and updated via i2c, which
> > means that if this is the source of the problem, it will be difficult to
> > impossible to fix.
> > 
> > With that in mind, I'm going to assume (hope) that the issue is inside
> > the OMAP itself, and is persisting across reboots in some way that I do
> > not understand (hopefully not hardware damage). I'm planning to try
> > using the OMAP's i2c debug functionality to run SCL high and see what
> > that does. Failing that, pinmuxing should allow me to route a GPIO
> > to that pin instead of SCL, which should at least allow me to confirm
> > that the OMAP is creating the jam condition.
> > 
> > Has anyone else on this list encountered this bug before? If you have,
> > could you take a moment to check if the bug also turns up with the
> > original OpenPandora kernel? If it does, the kernel will have a dmesg
> > error which looks something like:
> > 
> > omap_i2c.3: controller timed out
> 
> Oops. Never observed. But I can try.
Thanks. This problem is really mystifying, any more data points will
help.
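
For anyone collecting data points, here is a rough sketch of how a saved
dmesg log could be scanned for the messages quoted in this thread. The
patterns are copied from the log lines above; the label names and the
helper functions are made up for this sketch, and I am assuming the usual
"[ seconds.micros]" printk timestamp prefix. It is untested against real
Pandora logs.

```python
import re

# Error signatures quoted in this thread. The dictionary keys are
# hypothetical labels chosen for this sketch, not anything the kernel prints.
SIGNATURES = {
    "i2c_xudf": re.compile(r"omap_i2c \S+: timeout waiting on XUDF bit"),
    "i2c_legacy": re.compile(r"omap_i2c\.\d+: controller timed out"),
    "bandgap": re.compile(r"ti-soc-thermal \S+: eocz timed out waiting high"),
}

# Optional printk timestamp prefix such as "[ 2061.283721]" (assumed format).
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def scan_dmesg(lines):
    """Return a (timestamp or None, label, line) tuple for each matching line."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                match = TIMESTAMP.match(line)
                stamp = float(match.group(1)) if match else None
                hits.append((stamp, label, line))
                break
    return hits


def first_seen(hits):
    """Map each label to the earliest timestamp at which it appeared."""
    first = {}
    for stamp, label, _line in hits:
        if stamp is None:
            continue  # lines without a timestamp cannot be ordered
        if label not in first or stamp < first[label]:
            first[label] = stamp
    return first
```

Passing an open log file (or any list of lines) to scan_dmesg() and then
the result to first_seen() gives the first occurrence of each message,
which should show whether the i2c timeout precedes the bandgap one on
other units as well.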

> 
> In any case hardware can get old :(
It can, but this issue cropped up right after the first time I booted
letux kernels. (I use the pandora as a daily driver, so I definitely
would have noticed if the fuel gauge stopped working.)

> 
> > If anyone on the list has opinions/relevant knowledge about what I've
> > said above (especially concerning the assumptions I've made), please
> > feel free to make them known.
> 
> What I assume is that something upstream has changed. I did not
> yet invest time to try some older kernels where I know that it is not
> there.
> 
> BTW: there is a similar bug on the GTA04 - but related to HDQ. It
> sometimes reports: 
> 
> [  574.006469] omap_hdq 480b2000.1w: TX irqstatus not cleared (01)
> 
> This wasn't seen in kernels before ca. v5.10.
> 
> What we generally must be aware of is that things get backported so that
> 5.4.y may inherit a bug from mainline.
> 
> Regarding your issue: did you try to remove the battery to make a real
> cold boot? It may also be related to a stuck PMU.
That's what I tried originally. And no luck. By "stuck PMU" you mean the
TWL4030 erroneously keeping power to the OMAP on, right?

> 
> One more thought: do you also use a 600MHz Pandora? It has the omap3530
> and not the DM3730. Maybe there is another silicon erratum which should
> be applied.
I am using the 600MHz. So maybe? Looking through the known errata for
this device, I haven't found one that fits, exactly (I'll keep looking).

> 
> BR,
> Nikolaus
> 

Thanks,
	--Grond

-- 

Attached is my PGP public key.
Primary key fingerprint: B7C7 AD66 D9AF 4348 0238  168E 2C53 D8FA 55D8 9FD9

If you have a PGP key (and a minute to spare)
please send it in reply to this email.

If you have no idea what PGP is, feel free
to ignore all this gobbledegook.