[Letux-kernel] 4.16 boot hiccups
H. Nikolaus Schaller
hns at goldelico.com
Mon Apr 16 22:16:57 CEST 2018
Hi,
> On 13.04.2018 at 22:48, H. Nikolaus Schaller <hns at goldelico.com> wrote:
>
>
>> On 13.04.2018 at 19:04, Andreas Kemnade <andreas at kemnade.info> wrote:
>>
>> On Fri, 13 Apr 2018 17:51:38 +0200
>> "H. Nikolaus Schaller" <hns at goldelico.com> wrote:
>>
>>> Hi,
>>>
>>>> On 13.04.2018 at 10:43, H. Nikolaus Schaller <hns at goldelico.com> wrote:
>>>>
>>>> I wasn't able to mount NAND on this device manually either.
>>>>
>>>> So I'll reformat it and then let's see if the problem is still there.
>>>
>>> Well, the ubi0 error: scan_peb: bad image sequence number 1311990011 in PEB 1980, expected 1795890576
>>> is gone, but the hiccup is still there.
>>>
>>> GTA04A5: boots 10 of 10 attempts fine with boot after battery-insert
>>> GTA04A4: boots 2-3 attempts of 10 fine with boot after battery-insert
>>> no problem after force-shutdown and power-on
>>>
>>> With letux-4.16-rc6 I have 10/10 on GTA04A4.
>>>
>>> So either the OneNAND fix breaks the GTA04A4 or there is some other change
>>> between letux-4.16-rc6 and letux-4.16 that is NOT active on GTA04A5. Maybe
>>> some peripheral driver?
>>>
>> Maybe you simply disable NAND in the kernel config completely (or compile
>> it as modules) and compare?
>
> Ah, good idea to check if it is NAND driver related or not.
>
> But it turned out that neither CONFIG_MTD=n nor CONFIG_MTD_NAND=n
> helped (apart from the kernel sporadically getting past the hang and booting).
>
> So the only known factor so far is the offmode patch. But there must
> be another one that differentiates between A4 and A5.
>
> Something for the weekend to ponder on...
Well, pondering didn't yield a result,
but by sheer luck I found that this patch solves the
problem:
From d6481ad8b1291e063078e60937b1c6a9d2ddc7d6 Mon Sep 17 00:00:00 2001
From: "H. Nikolaus Schaller" <hns at goldelico.com>
Date: Mon, 16 Apr 2018 20:51:56 +0200
Subject: [PATCH] hack that makes GTA04A4 boot again
Without this, the kernel often (but not always) hangs after printing:
[ 3.079132] Initializing XFRM netlink socket
[ 3.083648] NET: Registered protocol family 17
[ 3.088470] NET: Registered protocol family 15
[ 3.093200] Key type dns_resolver registered
[ 3.099060] omap2_set_init_voltage: unable to find boot up OPP for vdd_core
[ 3.106628] omap2_set_init_voltage: unable to set vdd_core
With this patch it boots reliably:
[ 3.066986] Initializing XFRM netlink socket
[ 3.071472] NET: Registered protocol family 17
[ 3.076293] NET: Registered protocol family 15
[ 3.081024] Key type dns_resolver registered
[ 3.086914] omap2_set_init_voltage: unable to find boot up OPP for vdd_core
[ 3.094451] omap2_set_init_voltage: unable to set vdd_core
[ 3.101837] omap3_pm_off_mode_enable(1)
[ 3.106781] ThumbEE CPU extension supported.
[ 3.111297] mmc0: host does not support reading read-only switch, assuming write-enable
[ 3.119781] Registering SWP/SWPB emulation handler
---
arch/arm/mach-omap2/pm34xx.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm/mach-omap2/pm34xx.c b/arch/arm/mach-omap2/pm34xx.c
index 9516e203ca67..9222f2ff5ba0 100644
--- a/arch/arm/mach-omap2/pm34xx.c
+++ b/arch/arm/mach-omap2/pm34xx.c
@@ -366,6 +366,8 @@ void omap3_pm_off_mode_enable(int enable)
 	struct power_state *pwrst;
 	u32 state;
 
+printk("%s(%d)\n", __func__, enable);
+
 	if (enable)
 		state = PWRDM_POWER_OFF;
 	else
--
2.12.2
But I have no idea why...
I found this hack by adding several printk() calls to arch/arm/mach-omap2/pm34xx.c.
It turned out that a printk() at the beginning of omap3_pm_init() would also
work.
Maybe there is a race between omap2_set_init_voltage() and omap3_pm_off_mode_enable()?
This would explain why the hiccup does not happen on all GTA04 boards, not on every boot,
and not after a warm reboot.
Or a different idea: printk() has some side effect on omap3_pm_off_mode_enable().
This effect could be missing if we do not call printk() right before omap_set_pwrdm_state().
Or is the pwrst list changed by some other activity running in a parallel thread?
Is omap3_pm_off_mode_enable() called a little too early?
Further tests could be to move the printk() into omap_set_pwrdm_state() and/or
to print the list, or to replace the printk() with a udelay().
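For the udelay() experiment it may help to know roughly how long the printk() itself takes. The following is only a sketch, under the assumptions that the console is a synchronous serial port with 8N1 framing (10 bits on the wire per character) and that the helper name console_line_usecs() is purely illustrative; it is a userspace calculation, not kernel code. It estimates the time needed to transmit one log line at a given baud rate:

```c
#include <string.h>

/*
 * Estimate how long a synchronous serial console needs to transmit one
 * log line, in microseconds (rounded up).
 *
 * ASSUMPTION: 8N1 framing, i.e. 10 bits per character
 * (1 start bit + 8 data bits + 1 stop bit).
 * ASSUMPTION: the line is written out synchronously; a buffered or
 * not yet registered console would behave differently.
 */
unsigned long console_line_usecs(const char *line, unsigned long baud)
{
	const unsigned long long bits_per_char = 10;
	unsigned long long nbits;

	if (baud == 0)
		return 0;	/* no meaningful answer without a baud rate */

	nbits = (unsigned long long)strlen(line) * bits_per_char;

	/* microseconds = bits * 1e6 / baud, rounded up */
	return (unsigned long)((nbits * 1000000ULL + baud - 1) / baud);
}
```

If the console runs at 115200 baud (an assumption, it is not stated above), a line of around 40 characters including the timestamp prefix would occupy the port for a few milliseconds, so a udelay() or mdelay() in that range would be a plausible first value to try.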
So for the moment we can add the hack to letux-4.16 (an additional log line doesn't
do much harm) and investigate further independently. The reason I prefer to add such
a hack is that 4.16.2 is already upstream and 4.17-rc1 is here, waiting
to be debugged.
BR,
Nikolaus