[Letux-kernel] OMAP5: Debugging kernel not starting...

H. Nikolaus Schaller hns at goldelico.com
Mon Sep 10 22:27:21 CEST 2018


Hi Tony,

> Am 10.09.2018 um 19:09 schrieb Tony Lindgren <tony at atomide.com>:
> 
> * H. Nikolaus Schaller <hns at goldelico.com> [180909 08:46]:
>> So what is the canonical way to debug such issues
>> with modern kernels? Is it possible to get something
>> like printascii()? Or make the kernel log go to a
>> fixed address? Or should I hack printk() to write
>> a copy to a private buffer at a static address?
> 
> First try adding debug earlycon to your kernel command
> line. This assumes you have stdout-path in the dts file,
> see patch "ARM: dts: Add chosen for earlycon UART" I sent
> for pyra last year at some point.

Ah, yes. This was missing in my testing tree.

> 
> If that does not get you output early enough, there's
> the DEBUG_LL option but that's rarely needed nowadays.
> 
> And then there's the pstore console support which is
> really nice as you have the buffer available after
> a warm reset on watchdog reboot etc. But sounds like
> that may not help here if you're having trouble with
> reboot :)

Indeed.

Now, I get something for further analysis, which indicates how the kernel
is stumbling...

root at letux:~# dmesg|fgrep early
[    0.000000] Malformed early option 'console'
[    0.000000] earlycon: omapserial0 at MMIO 0x48020000 (options '')

^^^ ok, earlycon is working :)

[    0.000000] Kernel command line: console= root=PARTUUID=7d45bc20-02 rw rootfstype=ext4 rootwait console=ttyO2,115200n8 vram=12M omapfb.vram=0:8M,1:4M omapfb.rotate_type=0 omapdss.def_disp=lcd rootwait twl4030_charger.allow_usb=1 log_buf_len=8M ignore_loglevel earlyprintk earlycon
[    0.000000] early log buf free: 63696(97%)
root at letux:~# reboot

Broadcast message from root at letux (console) (Mon Sep 10 20:05:16 2018):

The system is going down for reboot NOW!

....

[  164.453708] reboot: Restarting system

U-Boot SPL 2016.11-00292-g12720df (Aug 12 2018 - 18:32:51)
OMAP5432-GP ES2.0

....

## Booting kernel from Legacy Image at 82000000 ...
   Image Name:   Linux-4.18.0-letux+
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    4901632 Bytes = 4.7 MiB
   Load Address: 80008000
   Entry Point:  80008000
   Verifying Checksum ... OK
## Flattened Device Tree blob at 88000000
   Booting using the fdt blob at 0x88000000
   Loading Kernel Image ... OK
   Loading Device Tree to 8ffed000, end 8fffff57 ... OK

Starting kernel ...

[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 4.18.0-letux+ (hns at iMac.fritz.box) (gcc version 4.9.2 (GCC)) #2664 SMP PREEMPT Mon Sep 10 20:46:49 CEST 2018
[    0.000000] CPU: ARMv7 Processor [412fc0f2] revision 2 (ARMv7), cr=10c5387d
[    0.000000] CPU: div instructions available: patching division code
[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
[    0.000000] OF: fdt: Machine model: Pyra-Handheld-V5.1.2ff
[    0.000000] Malformed early option 'console'
[    0.000000] debug: ignoring loglevel setting.
[    0.000000] earlycon: omapserial0 at MMIO 0x48020000 (options '')
[    0.000000] bootconsole [omapserial0] enabled
[    0.000000] Memory policy: Data cache writealloc
[    0.000000] cma: Reserved 16 MiB at 0xfd800000
[    0.000000] OMAP4: Map 0xfee00000 to (ptrval) for dram barrier
[    0.000000] On node 0 totalpages: 519680
[    0.000000]   Normal zone: 1536 pages used for memmap
[    0.000000]   Normal zone: 0 pages reserved
[    0.000000]   Normal zone: 196608 pages, LIFO batch:31
[    0.000000]   HighMem zone: 323072 pages, LIFO batch:31
[    0.000000] OMAP5432 ES2.0
[    0.000000] random: get_random_bytes called from start_kernel+0x80/0x470 with crng_init=0
[    0.000000] percpu: Embedded 17 pages/cpu @(ptrval) s40320 r8192 d21120 u69632
[    0.000000] pcpu-alloc: s40320 r8192 d21120 u69632 alloc=17*4096
[    0.000000] pcpu-alloc: [0] 0 [0] 1 
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 518144
[    0.000000] Kernel command line: console= root=PARTUUID=7d45bc20-02 rw rootfstype=ext4 rootwait console=ttyO2,115200n8 vram=12M omapfb.vram=0:8M,1:4M omapfb.rotate_type=0 omapdss.def_disp=lcd rootwait twl4030_charger.allow_usb=1 log_buf_len=8M ignore_loglevel earlyprintk earlycon
[    0.000000] log_buf_len: 8388608 bytes
[    0.000000] early log buf free: 63696(97%)
[    0.000000] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
[    0.000000] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
[    0.000000] Memory: 2025568K/2078720K available (6823K kernel code, 653K rwdata, 1824K rodata, 1024K init, 221K bss, 36768K reserved, 16384K cma-reserved, 1275904K highmem)
[    0.000000] Virtual kernel memory layout:
[    0.000000]     vector  : 0xffff0000 - 0xffff1000   (   4 kB)
[    0.000000]     fixmap  : 0xffc00000 - 0xfff00000   (3072 kB)
[    0.000000]     vmalloc : 0xf0800000 - 0xff800000   ( 240 MB)
[    0.000000]     lowmem  : 0xc0000000 - 0xf0000000   ( 768 MB)
[    0.000000]     pkmap   : 0xbfe00000 - 0xc0000000   (   2 MB)
[    0.000000]     modules : 0xbf000000 - 0xbfe00000   (  14 MB)
[    0.000000]       .text : 0x(ptrval) - 0x(ptrval)   (7816 kB)
[    0.000000]       .init : 0x(ptrval) - 0x(ptrval)   (1024 kB)
[    0.000000]       .data : 0x(ptrval) - 0x(ptrval)   ( 654 kB)
[    0.000000]        .bss : 0x(ptrval) - 0x(ptrval)   ( 222 kB)
[    0.000000] Preemptible hierarchical RCU implementation.
[    0.000000]  RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=2.
[    0.000000]  Tasks RCU enabled.
[    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
[    0.000000] NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
[    0.000000] GIC: Using split EOI/Deactivate mode
[    0.000000] clock: dpll_abe_ck failed transition to 'locked'
[    0.000000] OMAP clockevent source: timer1 at 32768 Hz
[    0.000000] arch_timer: cp15 timer(s) running at 6.14MHz (phys).
[    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x16ac02862, max_idle_ns: 440795202218 ns
[    0.000005] sched_clock: 56 bits at 6MHz, resolution 162ns, wraps every 4398046511085ns
[    0.008218] Switching to timer-based delay loop, resolution 162ns
[    0.015171] clocksource: 32k_counter: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 58327039986419 ns
[    0.025273] OMAP clocksource: 32k_counter at 32768 Hz
[    0.031722] Console: colour dummy device 80x30
[    0.036209] Calibrating delay loop (skipped), value calculated using timer frequency.. 12.28 BogoMIPS (lpj=61440)
[    0.046753] pid_max: default: 32768 minimum: 301
[    0.051557] Security Framework initialized
[    0.055723] Mount-cache hash table entries: 2048 (order: 1, 8192 bytes)
[    0.062482] Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes)
[    0.070125] CPU: Testing write buffer coherency: ok
[    0.075087] CPU0: Spectre v2: firmware did not set auxiliary control register IBE bit, system vulnerable
[    0.084971] /cpus/cpu at 0 missing clock-frequency property
[    0.090363] /cpus/cpu at 1 missing clock-frequency property
[    0.095794] CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
[    0.101610] smp: CPU1 parked within kernel, needs reset (0x0 0x801248a0)
[    0.161732] Setting up static identity map for 0x80100000 - 0x80100060
[    0.168443] Hierarchical SRCU implementation.
[    0.212907] smp: Bringing up secondary CPUs ...
[    0.273213] CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
[    0.273219] CPU1: Spectre v2: using ICIALLU workaround
[    0.284313] smp: Brought up 1 node, 2 CPUs
[    0.288434] SMP: Total of 2 processors activated (24.57 BogoMIPS).
[    0.294807] CPU: All CPU(s) started in HYP mode.
[    0.299509] CPU: Virtualization extensions available.
[    0.305759] devtmpfs: initialized
[    0.319718] VFP support v0.3: implementor 41 architecture 4 part 30 variant f rev 0
[    0.327882] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.337934] futex hash table entries: 512 (order: 3, 32768 bytes)
[    0.344594] pinctrl core: initialized pinctrl subsystem
[    0.350243] really_probe: driver reg-dummy for reg-dummy bus platform
[    0.357023] really_probe: driver reg-dummy for reg-dummy bus platform ok (1)
[    0.364644] NET: Registered protocol family 16
[    0.370796] DMA: preallocated 256 KiB pool for atomic coherent allocations
[    0.378566] omap_hwmod: l3_main_3 using broken dt data from ocp
[    0.385175] omap_hwmod: l3_main_2 using broken dt data from ocp
[    1.723766] clock: dpll_abe_ck failed transition to 'locked'

^^^ this seems to be different

[    3.028640] clock: dpll_abe_ck failed transition to 'locked'
[    4.333541] clock: dpll_abe_ck failed transition to 'locked'
[    5.638412] clock: dpll_abe_ck failed transition to 'locked'
[    6.939455] clock: dpll_abe_ck failed transition to 'locked'
[    6.947796] ------------[ cut here ]------------
[    6.952460] WARNING: CPU: 0 PID: 1 at drivers/clk/clk.c:814 clk_core_disable_lock+0x18/0x24
[    6.961024] aess_fclk already disabled
[    6.964864] Modules linked in:
[    6.967995] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-letux+ #2664
[    6.974954] Hardware name: Generic OMAP5 (Flattened Device Tree)
[    6.981136] [<c01112c8>] (unwind_backtrace) from [<c010c87c>] (show_stack+0x10/0x14)
[    6.989075] [<c010c87c>] (show_stack) from [<c078f894>] (dump_stack+0x7c/0x9c)
[    6.996489] [<c078f894>] (dump_stack) from [<c0132848>] (__warn+0xd8/0x108)
[    7.003632] [<c0132848>] (__warn) from [<c01328bc>] (warn_slowpath_fmt+0x44/0x6c)
[    7.011313] [<c01328bc>] (warn_slowpath_fmt) from [<c049d44c>] (clk_core_disable_lock+0x18/0x24)
[    7.020335] [<c049d44c>] (clk_core_disable_lock) from [<c01200b8>] (_disable_clocks+0x18/0x70)
[    7.029177] [<c01200b8>] (_disable_clocks) from [<c012116c>] (_enable+0x1ec/0x230)
[    7.036947] [<c012116c>] (_enable) from [<c012259c>] (_setup.part.16+0xc8/0x404)
[    7.044537] [<c012259c>] (_setup.part.16) from [<c01215c8>] (_setup+0x14/0x1c)
[    7.051949] [<c01215c8>] (_setup) from [<c01216b4>] (omap_hwmod_for_each+0x3c/0x64)
[    7.059810] [<c01216b4>] (omap_hwmod_for_each) from [<c0a0db90>] (__omap_hwmod_setup_all+0xa4/0xc0)
[    7.069099] [<c0a0db90>] (__omap_hwmod_setup_all) from [<c0102eb4>] (do_one_initcall+0xc0/0x268)
[    7.078121] [<c0102eb4>] (do_one_initcall) from [<c0a01050>] (kernel_init_freeable+0x220/0x36c)
[    7.087050] [<c0a01050>] (kernel_init_freeable) from [<c07a25dc>] (kernel_init+0x8/0x110)
[    7.095443] [<c07a25dc>] (kernel_init) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
[    7.103207] Exception stack(0xed8adfb0 to 0xed8adff8)
[    7.108390] dfa0:                                     00000000 00000000 00000000 00000000
[    7.116786] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    7.125180] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    7.131971] ---[ end trace 02345015e7dc8e1f ]---
[    7.136720] omap_hwmod: aess: _wait_target_ready failed: -16
[    7.142511] omap_hwmod: aess: cannot be enabled for reset (3)
[    8.447580] clock: dpll_abe_ck failed transition to 'locked'
[    9.752444] clock: dpll_abe_ck failed transition to 'locked'

--- and then everything hangs

If I enforce a warm RESET, the Pyra boots again up to the same point of failure.
I have to power-off and on again through Palmas to get it booting again.

Searching for "clock: dpll_abe_ck failed transition to 'locked'" leads to:

	https://e2e.ti.com/support/omap/f/885/t/328485?dpll-abe-ck-failed-transition-to-locked-

And:

	https://www.pyra-handheld.com/pipermail/kernel/2016-May/000877.html

So we had discussed this already a while ago but did not find a solution...

Maybe I should try to disable ABE/AESS for the next experiment (not today).
An try the same on OMAP5EVM (to find out if there are hardware dependencies involved).

BR and thanks,
Nikolaus



More information about the Letux-kernel mailing list