[Letux-kernel] 4.16 boot hickups

H. Nikolaus Schaller hns at goldelico.com
Mon Apr 16 22:16:57 CEST 2018


Hi,

> On 13.04.2018 at 22:48, H. Nikolaus Schaller <hns at goldelico.com> wrote:
> 
> 
>> On 13.04.2018 at 19:04, Andreas Kemnade <andreas at kemnade.info> wrote:
>> 
>> On Fri, 13 Apr 2018 17:51:38 +0200
>> "H. Nikolaus Schaller" <hns at goldelico.com> wrote:
>> 
>>> Hi,
>>> 
>>>> On 13.04.2018 at 10:43, H. Nikolaus Schaller <hns at goldelico.com> wrote:
>>>> 
>>>> I wasn't able to mount NAND on this device manually either.
>>>> 
>>>> So I'll reformat it and then let's see if the problem is still there.
>>> 
>>> Well, the ubi0 error: scan_peb: bad image sequence number 1311990011 in PEB 1980, expected 1795890576
>>> is gone, but the hiccup is still there.
>>> 
>>> GTA04A5:	boots 10 of 10 attempts fine with boot after battery-insert
>>> GTA04A4:	boots 2-3 attempts of 10 fine with boot after battery-insert
>>> 		no problem after force-shutdown and power-on
>>> 
>>> With letux-4.16-rc6 I have 10/10 on GTA04A4.
>>> 
>>> So either the OneNAND fix breaks the GTA04A4 or there is some other change
>>> between letux-4.16-rc6 and letux-4.16 that is NOT active on GTA04A5. Maybe
>>> some peripheral driver?
>>> 
>> Maybe you could simply disable NAND in the kernel config completely (or
>> compile it as modules) and compare?
> 
> Ah, good idea to check if it is NAND driver related or not.
> 
> But it turned out that neither CONFIG_MTD=n nor CONFIG_MTD_NAND=n
> helped (except that it occasionally gets past the hang and boots).
> 
> So the only known factor so far is the offmode patch. But there must
> be another one that differentiates between A4 and A5.
> 
> Something for the weekend to ponder on...

Well, pondering didn't yield a result, but by sheer luck
I found that this patch solves the problem:

From d6481ad8b1291e063078e60937b1c6a9d2ddc7d6 Mon Sep 17 00:00:00 2001
From: "H. Nikolaus Schaller" <hns at goldelico.com>
Date: Mon, 16 Apr 2018 20:51:56 +0200
Subject: [PATCH] hack that makes GTA04A4 boot again

Without this, the kernel often (but not always) hangs after printing:
[    3.079132] Initializing XFRM netlink socket
[    3.083648] NET: Registered protocol family 17
[    3.088470] NET: Registered protocol family 15
[    3.093200] Key type dns_resolver registered
[    3.099060] omap2_set_init_voltage: unable to find boot up OPP for vdd_core
[    3.106628] omap2_set_init_voltage: unable to set vdd_core

With this patch it boots reliably:
[    3.066986] Initializing XFRM netlink socket
[    3.071472] NET: Registered protocol family 17
[    3.076293] NET: Registered protocol family 15
[    3.081024] Key type dns_resolver registered
[    3.086914] omap2_set_init_voltage: unable to find boot up OPP for vdd_core
[    3.094451] omap2_set_init_voltage: unable to set vdd_core
[    3.101837] omap3_pm_off_mode_enable(1)
[    3.106781] ThumbEE CPU extension supported.
[    3.111297] mmc0: host does not support reading read-only switch, assuming write-enable
[    3.119781] Registering SWP/SWPB emulation handler
---
 arch/arm/mach-omap2/pm34xx.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm/mach-omap2/pm34xx.c b/arch/arm/mach-omap2/pm34xx.c
index 9516e203ca67..9222f2ff5ba0 100644
--- a/arch/arm/mach-omap2/pm34xx.c
+++ b/arch/arm/mach-omap2/pm34xx.c
@@ -366,6 +366,8 @@ void omap3_pm_off_mode_enable(int enable)
        struct power_state *pwrst;
        u32 state;
 
+printk("%s(%d)\n", __func__, enable);
+
        if (enable)
                state = PWRDM_POWER_OFF;
        else
-- 
2.12.2

But I have no idea why...

I found this hack by adding several printk() calls to arch/arm/mach-omap2/pm34xx.c.
It turned out that a printk() at the beginning of omap3_pm_init() would also
work.

Maybe there is a race between omap2_set_init_voltage() and omap3_pm_off_mode_enable()?
This would explain why the hiccup does not happen on all GTA04 boards, does not
happen on every boot, and does not happen after a warm reboot.

Or a different idea: printk() has some side effect on omap3_pm_off_mode_enable().
This effect could be missing if we do not call printk() right before omap_set_pwrdm_state().

Or is the pwrst list changed by some other activity running in a parallel thread?

Is omap3_pm_off_mode_enable() called a little too early?

Some more tests could be to move the printk() to omap_set_pwrdm_state() and/or
to print the list. Or to try replacing the printk() with a udelay().
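
To make the delay/print-the-list experiment concrete, here is a rough user-space
sketch. It is not kernel code: struct power_state is reduced to a name and a
next_state, the list is a plain singly linked list instead of a list_head, and
mdelay() and set_pwrdm_state_stub() are stand-ins I made up for illustration.
The 5 ms value is only a guess, taken from the spacing of the log timestamps
above. The domain names in the usage note are just examples.

```c
#include <assert.h>
#include <stdio.h>

/* Same numeric values as the kernel's PWRDM_POWER_* constants. */
enum { PWRDM_POWER_OFF = 0, PWRDM_POWER_RET = 1 };

/* Simplified stand-in for the pwrst entries in pm34xx.c (hypothetical layout). */
struct power_state {
	const char *name;          /* stand-in for pwrst->pwrdm->name */
	unsigned int next_state;
	struct power_state *next;  /* stand-in for the kernel's list node */
};

static unsigned long delayed_ms;   /* total delay requested, for inspection */

/* Stub: only records the requested delay instead of busy-waiting. */
static void mdelay(unsigned long ms)
{
	delayed_ms += ms;
}

/* Stub for omap_set_pwrdm_state(): only logs what would be programmed. */
static int set_pwrdm_state_stub(struct power_state *pwrst, unsigned int state)
{
	printf("  set %s -> %u\n", pwrst->name, state);
	return 0;
}

/*
 * Model of the experiment: delay instead of printk(), then dump each list
 * entry before changing it. Returns the number of entries visited.
 */
static int off_mode_enable_model(struct power_state *head, int enable)
{
	unsigned int state;
	struct power_state *pwrst;
	int count = 0;

	assert(enable == 0 || enable == 1);

	mdelay(5);   /* guess: about the time one log line takes in the traces above */

	state = enable ? PWRDM_POWER_OFF : PWRDM_POWER_RET;

	for (pwrst = head; pwrst; pwrst = pwrst->next) {
		printf("pwrst[%d] %s next_state=%u\n",
		       count, pwrst->name, pwrst->next_state);
		pwrst->next_state = state;
		set_pwrdm_state_stub(pwrst, pwrst->next_state);
		count++;
	}
	return count;
}
```

Calling off_mode_enable_model() on a two-entry list (say "mpu_pwrdm" and
"core_pwrdm") prints both entries and returns 2. In the real kernel the same
idea would go into omap3_pm_off_mode_enable() itself; if the delay alone fixes
the boot, that points to timing rather than a printk() side effect.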

So for the moment we can add the hack to letux-4.16 (an additional log line doesn't
do much harm) and independently investigate further. The reason why I prefer to add
such a hack is that 4.16.2 is already upstream and 4.17-rc1 is out and waiting
to be debugged.

BR,
Nikolaus


