[Letux-kernel] SMP issue between LX16 and LX20 found...
Paul Boddie
paul at boddie.org.uk
Sun Jun 15 18:53:34 CEST 2025
On Sunday, 15 June 2025 16:40:11 CEST H. Nikolaus Schaller wrote:
>
> So I currently see a mix of three intermixed issues:
> a) two lx20 boards not booting reliably
> b) letux-6.16-rc1 less reliable than letux-6.15.2
> c) non-SMP compiled kernel works everywhere
>
> Maybe the dual-edge IRQ symptom also plays a role since IRQs
> are used everywhere (but dual-edge only for the WKUP gpio-key)...
Checking the GPIO registers would confirm whether the IRQs are misconfigured.
I have tried to modify your jzgpio script to work with the X2000, which was
complicated by the UART misbehaving again. The results do seem to suggest that
the GPIOs are indeed set to dual-edge triggering, which is a pretty bad default.
> Looks as if I should fix that and the other issues (second µSD, USB)
> first. On a kernel w/o SMP...
I'm aiming to investigate the SDHCI and USB issues. I did enable debugging for
the DWC2 OTG controller, and it produces reports related to the endpoints, as
well as indicating that it is in peripheral mode, but the PHY state is
possibly crucial, as you noted previously.
[...]
> PS: I looked into /proc/interrupts and it appears as if
> the IRQs are sometimes assigned to a CPU (ttyS2, mmc0) and
> sometimes not (mailbox, OST):
>
> root@letux:~# cat /proc/interrupts
>            CPU0       CPU1
>   2:       4177          0      MIPS   2  SoC intc cascade interrupt
>   3:       1923       3490      MIPS   3  core mailbox
>   4:     113075       3308      MIPS   4  OST event timer
>   8:          0          0       TCU   0  TCU0
>   9:          0          0       TCU   1  TCU1
>  10:          0          0      INTC   3  13420000.dma-controller
>  31:          1          0     GPIOE  31  WAKEUP
>  32:          0          0      INTC  32  10003000.rtc
>  45:        594          0      INTC  45  ttyS2
>  48:       3584          0      INTC  48  mmc0
> ERR:          0
I would expect IRQs to be distributed to different CPUs. As long as they do
not get distributed to both at once, and as long as the handling is done in a
critical section, it should probably work fine.
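To make the distribution easier to see, here is a minimal sketch in Python that classifies each line of /proc/interrupts by which CPUs have actually counted the interrupt. The function names (parse_interrupts, classify) and the embedded sample are illustrative only; the sample is copied from the quoted output above, and the script assumes the usual layout of one count column per CPU.

```python
# Sketch: classify IRQs by the CPUs that have serviced them.
# SAMPLE is copied from the quoted /proc/interrupts output; on a live
# system one would read the file "/proc/interrupts" instead.
SAMPLE = """\
CPU0 CPU1
2: 4177 0 MIPS 2 SoC intc cascade interrupt
3: 1923 3490 MIPS 3 core mailbox
4: 113075 3308 MIPS 4 OST event timer
8: 0 0 TCU 0 TCU0
9: 0 0 TCU 1 TCU1
10: 0 0 INTC 3 13420000.dma-controller
31: 1 0 GPIOE 31 WAKEUP
32: 0 0 INTC 32 10003000.rtc
45: 594 0 INTC 45 ttyS2
48: 3584 0 INTC 48 mmc0
ERR: 0
"""


def parse_interrupts(text):
    """Return (cpus, rows); each row is (irq, [count per CPU], description)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    cpus = lines[0].split()  # header line: CPU0 CPU1 ...
    rows = []
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        fields = rest.split()
        counts = fields[:len(cpus)]
        # Summary rows such as ERR carry a single total, not per-CPU counts.
        if len(counts) < len(cpus) or not all(c.isdigit() for c in counts):
            continue
        description = " ".join(fields[len(cpus):])
        rows.append((label.strip(), [int(c) for c in counts], description))
    return cpus, rows


def classify(cpus, counts):
    """Describe which CPUs have a non-zero count for one IRQ."""
    active = [cpus[i] for i, c in enumerate(counts) if c > 0]
    if not active:
        return "idle"
    if len(active) == 1:
        return "only " + active[0]
    return "multiple CPUs"


if __name__ == "__main__":
    cpus, rows = parse_interrupts(SAMPLE)
    for irq, counts, description in rows:
        print(irq, classify(cpus, counts), description, sep="\t")
```

On the sample, the mailbox and OST timer lines show counts on both CPUs, whereas ttyS2 and mmc0 have only been counted on CPU0. The counters only record where an interrupt has been handled so far; they say nothing about how the interrupt controller is configured to route it.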
I have been looking at one of the stack traces you sent, specifically these
entries:
[ 21.169691] [<80029cd0>] show_stack+0x38/0x118
[ 21.169707] [<800200d8>] dump_stack_lvl+0x74/0xb0
[ 21.169720] [<8094f7b0>] nmi_cpu_backtrace+0x13c/0x144
[ 21.169732] [<800263d8>] handle_backtrace+0x10/0x54
[ 21.169740] [<801019ec>] __flush_smp_call_function_queue+0x174/0x360
[ 21.169751] [<80021da8>] ingenic_xburst2_mbox_handler+0x94/0xc8
[ 21.169759] [<800b96ec>] handle_percpu_devid_irq+0xc0/0x194
[ 21.169770] [<800b2a90>] handle_irq_desc+0x78/0x90
[ 21.169777] [<80971cb0>] do_IRQ+0x18/0x24
In this case, it seems appropriate for the per-CPU mechanism to be used, since
the mailbox interrupts are indeed specific to a given CPU. The handler can be
found in arch/mips/ingenic/smp.c and calls the function listed directly above
it in the trace, found in kernel/smp.c, which may be the one that crashes.
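For reading such traces, here is a small Python sketch that extracts the symbol, offset and function size from each entry and reports which function appears directly above a given one. It assumes the usual ordering, with the innermost frame first, so the entry above a function is the one it called (possibly via inlined helpers that do not show up). The names parse_trace and callee_of are made up for this illustration.

```python
import re

# Entries copied from the stack trace quoted above.
TRACE = """\
[ 21.169691] [<80029cd0>] show_stack+0x38/0x118
[ 21.169707] [<800200d8>] dump_stack_lvl+0x74/0xb0
[ 21.169720] [<8094f7b0>] nmi_cpu_backtrace+0x13c/0x144
[ 21.169732] [<800263d8>] handle_backtrace+0x10/0x54
[ 21.169740] [<801019ec>] __flush_smp_call_function_queue+0x174/0x360
[ 21.169751] [<80021da8>] ingenic_xburst2_mbox_handler+0x94/0xc8
[ 21.169759] [<800b96ec>] handle_percpu_devid_irq+0xc0/0x194
[ 21.169770] [<800b2a90>] handle_irq_desc+0x78/0x90
[ 21.169777] [<80971cb0>] do_IRQ+0x18/0x24
"""

# Matches "[<address>] symbol+0xoffset/0xsize".
ENTRY = re.compile(r"\[<([0-9a-f]+)>\]\s+([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_trace(text):
    """Return a list of (symbol, address, offset, size), innermost frame first."""
    frames = []
    for match in ENTRY.finditer(text):
        addr, symbol, offset, size = match.groups()
        frames.append((symbol, int(addr, 16), int(offset, 16), int(size, 16)))
    return frames


def callee_of(frames, symbol):
    """Return the symbol listed directly above `symbol`, or None if there is none."""
    names = [frame[0] for frame in frames]
    index = names.index(symbol)
    return names[index - 1] if index > 0 else None


if __name__ == "__main__":
    frames = parse_trace(TRACE)
    print(callee_of(frames, "ingenic_xburst2_mbox_handler"))
```

For the trace above, the entry directly above ingenic_xburst2_mbox_handler is __flush_smp_call_function_queue.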
Then again, maybe the interrupt itself is the problem and the stack trace is
just indicating the act of receiving it. If so, that might suggest that the
CPU initialisation isn't done entirely correctly.
But I am very unfamiliar with this code, and still need to acquire a coherent
picture of the situation, so all of this is speculation.
Paul