[Letux-kernel] Pandora: XUDF and other issues

Grond grond66 at riseup.net
Sun Feb 20 08:21:19 CET 2022

On Sat, Feb 19, 2022 at 05:25:00PM +0100, H. Nikolaus Schaller wrote:
> Hi Grond,
> > On 18.02.2022 at 22:47, H. Nikolaus Schaller <hns at goldelico.com> wrote:
> > 
> >> 
> >> So let's check if reverting this commit makes a difference. If not the
> >> bisect was on the wrong track (due to randomness effects).
> > 
> > Hm. It was a false alarm.
> > 
> > The file drivers/firmware/ti_sci.c is not even compiled
> > since we don't have CONFIG_TI_SCI_PROTOCOL in the defconfig.
> > 
> > So I'll repeat the bisect.
> Well, I did repeat it.
> With a really difficult to explain result:
> this time, the first bad commit is the one I initially specified as bad (v5.3).
> This means the XUDF problem was not reproducible any more
> over ca. 15 installs of new kernel variants, rebooting, waiting,
> running high-load for 30 seconds and then checking dmesg for signs
> of XUDF issues.
> After a while of experimentation I realized I had made one more
> change: I had enabled charging with 800mA over USB because
> the battery was only at ca. 12%. And there is plenty of time while
> git bisect compiles a new kernel. So why not take the chance to
> recharge the battery during this idle time?
> This brought me to another experiment. I ran ./high-load >/dev/null &
> and experimented with /sys/class/power_supply/twl4030_usb/input_current_limit.
> Indeed, increasing it to 800000 made the XUDF issues go away and
> 500000 made them appear occasionally. The backlight going off also
> made them go away, and pressing a key turned the backlight back on and
> XUDF was back.
> So this explains a lot: it is a hardware issue or limitation.
> It seems to depend on total power consumption (i.e. backlight and
> processor load - which can be different in different kernel versions)
> and battery charge level and how much the USB power supply can
> compensate.
> Maybe the batteries are 10 years old and getting weak? Although they
> reach full capacity and seem to charge normally.
> This may also explain why you first saw it when running Letux
> kernels (which can have higher power demand) and why it still occurred
> when booting the Pandora OS from NAND.
> Please can you try to find out if it depends on USB charging, backlight
> or similar effects on your unit?
I will try. But there are a few caveats with this. We appear to be
experiencing two different sets of symptoms. For you, the bus timeout/
XUDF issue occurs when power consumption goes above some threshold. On
my unit it appears to be caused by something totally jamming the SCL
line (it is completely pulled down starting at boot time, and never goes
high). It is constant (100% of bus transactions fail) and happens
reliably across kernel versions. Right now, I have the battery pulled
for a few days in the probably vain hope that this changes anything, but
when I'm done with that experiment, I'll give this a try.
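
For that test, here is a minimal sketch of how the comparison could be
scripted so that both units are checked the same way. Assumptions, none
of them confirmed in this thread: the kernel log lines of interest
contain the literal string "XUDF", dmesg is readable, and the script
runs as root with ./high-load already started separately. The helper
names (count_matches, set_limit, run_trial) are made up for
illustration; only the sysfs path and the 500000/800000 values come
from the quoted mail.

```python
#!/usr/bin/env python3
"""Sketch: count new XUDF reports in dmesg at a given USB input current limit."""
import subprocess
import time
from pathlib import Path

# Path as given in the quoted mail; the values written to it are in microamps.
LIMIT_PATH = Path("/sys/class/power_supply/twl4030_usb/input_current_limit")


def count_matches(log_text, needle="XUDF"):
    """Return the number of log lines that contain `needle`.

    Matching on the bare substring "XUDF" is an assumption about the
    log format, not something taken from the driver source.
    """
    return sum(1 for line in log_text.splitlines() if needle in line)


def set_limit(microamps, path=LIMIT_PATH):
    """Write a new input current limit (needs root on a real device)."""
    path.write_text(f"{microamps}\n")


def read_dmesg():
    """Return the current kernel log as one string."""
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    )
    return result.stdout


def run_trial(microamps, seconds=30):
    """Set the limit, wait `seconds`, and return how many new matches appeared."""
    set_limit(microamps)
    before = count_matches(read_dmesg())
    time.sleep(seconds)
    return count_matches(read_dmesg()) - before
```

Calling run_trial(500000) and then run_trial(800000) gives the number of
new matching lines per setting. The count is only approximate if the
dmesg ring buffer wraps during a trial.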

What happens on your unit when the battery is removed and it is powered
via the barrel jack? If this is related to the battery being unable to
provide enough power, this should presumably keep the symptoms from
manifesting by running the whole system in constant voltage mode...

> So it seems to be nothing we can solve by bisecting for a kernel bug.
> I just started to cross-check letux-5.17-rc4. At the moment it only shows
> the
> [  330.002105] ti-soc-thermal 48002524.bandgap: eocz timed out waiting high
> This does not appear to depend on input_current_limit or anything else.
> But it also occurs in a high-load situation.
> This looks like a "real bug" - which hopefully can be bisected more
> repeatably.
> BR and thanks,
> Nikolaus


Attached is my PGP public key.
Primary key fingerprint: B7C7 AD66 D9AF 4348 0238  168E 2C53 D8FA 55D8 9FD9

If you have a PGP key (and a minute to spare)
please send it in reply to this email.

If you have no idea what PGP is, feel free
to ignore all this gobbledegook.
