[Letux-kernel] SMP issue between LX16 and LX20 found...

Paul Boddie paul at boddie.org.uk
Sun Jun 15 18:53:34 CEST 2025


On Sunday, 15 June 2025 16:40:11 CEST H. Nikolaus Schaller wrote:
> 
> So I currently see a mix of three intermixed issues:
> a) two lx20 boards not booting reliably
> b) letux-6.16-rc1 less reliable than letux-6.15.2
> c) non-SMP compiled kernel works everywhere
> 
> Maybe the dual-edge IRQ symptom also plays a role since IRQs
> are used everywhere (but dual-edge only for the WKUP gpio-key)...

Checking the GPIO registers would confirm whether the IRQs are misconfigured. 
I have tried to adapt your jzgpio script to work with the X2000, a task 
complicated by the UART misbehaving again. The results do seem to suggest that 
the GPIOs are indeed set to dual-edge triggering, which is a pretty bad default.
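To avoid reading hex dumps by eye, the register check could be reduced to a small decoding step, sketched below in C. It is not the jzgpio script and does not touch the hardware; it only interprets a snapshot of one port's registers. The bit meanings are assumptions based on how I understand the mainline pinctrl-ingenic driver: INT selects the interrupt function, PAT1 chooses edge (1) or level (0), PAT0 chooses rising/high (1) or falling/low (0), and the X2000 adds an EDG register for dual-edge triggering. These should be checked against the X2000 programming manual. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Trigger modes a GPIO pin can be configured for (names are illustrative). */
enum trigger {
    NOT_IRQ,        /* INT bit clear: pin is not in interrupt mode */
    LEVEL_LOW,
    LEVEL_HIGH,
    EDGE_FALLING,
    EDGE_RISING,
    EDGE_BOTH,
};

/* Snapshot of one GPIO port's registers, as read from hardware or a dump.
   The meaning of each bit is an ASSUMPTION to verify against the manual. */
struct gpio_port_regs {
    uint32_t intr;  /* INT:  1 = interrupt function */
    uint32_t pat1;  /* PAT1: 1 = edge, 0 = level (in interrupt mode) */
    uint32_t pat0;  /* PAT0: 1 = rising/high, 0 = falling/low */
    uint32_t edg;   /* EDG (X2000): 1 = trigger on both edges */
};

/* Return bit 'pin' of a 32-bit register value. */
static int bit(uint32_t reg, unsigned pin)
{
    return (reg >> pin) & 1u;
}

/* Decode the trigger mode of one pin (0..31) from a register snapshot. */
static enum trigger decode_trigger(const struct gpio_port_regs *r, unsigned pin)
{
    assert(pin < 32);
    if (!bit(r->intr, pin))
        return NOT_IRQ;
    if (bit(r->edg, pin))
        return EDGE_BOTH;   /* assumed to take precedence over PAT1/PAT0 */
    if (bit(r->pat1, pin))
        return bit(r->pat0, pin) ? EDGE_RISING : EDGE_FALLING;
    return bit(r->pat0, pin) ? LEVEL_HIGH : LEVEL_LOW;
}
```

For the WKUP gpio-key (GPIOE pin 31 in the /proc/interrupts listing quoted below), one would fill in port E's register values and call decode_trigger(&regs, 31); an EDGE_BOTH result would confirm the dual-edge setting.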

> Looks as if I should fix that and the other issues (second µSD, USB)
> first. On a kernel w/o SMP...

I'm aiming to investigate the SDHCI and USB issues. I enabled debugging for the 
DWC2 OTG controller, and it reports on the endpoints and indicates that it is 
in peripheral mode. However, as you noted previously, the PHY state may well be 
crucial.

[...]

> PS: I looked into /proc/interrupts and it appears as if
> the IRQs are sometimes assigned to a CPU (ttyS2, mmc0) and
> sometimes not (mailbox, OST):
> 
> root@letux:~# cat /proc/interrupts
>            CPU0       CPU1
>   2:       4177          0     MIPS   2  SoC intc cascade interrupt
>   3:       1923       3490     MIPS   3  core mailbox
>   4:     113075       3308     MIPS   4  OST event timer
>   8:          0          0      TCU   0  TCU0
>   9:          0          0      TCU   1  TCU1
>  10:          0          0     INTC   3  13420000.dma-controller
>  31:          1          0 GPIOE  31  WAKEUP
>  32:          0          0     INTC  32  10003000.rtc
>  45:        594          0     INTC  45  ttyS2
>  48:       3584          0     INTC  48  mmc0
> ERR:          0

I would expect IRQs to be distributed across the CPUs. As long as a given IRQ 
is not delivered to both CPUs at once, and as long as its handling is done in a 
critical section, this should probably work fine.
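To see which CPUs have actually serviced each IRQ without comparing columns by eye, the /proc/interrupts rows could be parsed. The following is a minimal C sketch, assuming the two-CPU layout quoted above; the struct and function names are hypothetical. Note that these counts only show where interrupts were handled, not the configured affinity, which the kernel exposes separately in /proc/irq/<n>/smp_affinity.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CPUS 8

/* One parsed row of /proc/interrupts: the IRQ label and per-CPU counts. */
struct irq_row {
    char label[16];                   /* e.g. "45" or "ERR" */
    unsigned long counts[MAX_CPUS];   /* interrupts handled on each CPU */
    int ncpus;                        /* number of count columns parsed */
};

/* Parse one row, reading at most ncpus count columns after "label:".
   Returns 0 on success, -1 if the row has no "label:" prefix (e.g. the header). */
static int parse_irq_row(const char *line, int ncpus, struct irq_row *row)
{
    int used = 0;

    assert(ncpus > 0 && ncpus <= MAX_CPUS);
    memset(row, 0, sizeof(*row));
    if (sscanf(line, " %15[^:]:%n", row->label, &used) != 1 || used == 0)
        return -1;
    line += used;
    for (int cpu = 0; cpu < ncpus; cpu++) {
        char *end;
        unsigned long value = strtoul(line, &end, 10);
        if (end == line)
            break;              /* fewer columns, as in the ERR row */
        row->counts[cpu] = value;
        row->ncpus++;
        line = end;
    }
    return 0;
}

/* Bitmask of CPUs that have handled at least one interrupt on this row. */
static unsigned cpus_serving(const struct irq_row *row)
{
    unsigned mask = 0;

    for (int cpu = 0; cpu < row->ncpus; cpu++)
        if (row->counts[cpu] != 0)
            mask |= 1u << cpu;
    return mask;
}
```

Applied to the listing above, the ttyS2 and mmc0 rows would yield a mask of 0x1 (CPU0 only), while the core mailbox and OST event timer rows would yield 0x3.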

I have been looking at one of the stack traces you sent, specifically these 
entries:

[   21.169691] [<80029cd0>] show_stack+0x38/0x118
[   21.169707] [<800200d8>] dump_stack_lvl+0x74/0xb0
[   21.169720] [<8094f7b0>] nmi_cpu_backtrace+0x13c/0x144
[   21.169732] [<800263d8>] handle_backtrace+0x10/0x54
[   21.169740] [<801019ec>] __flush_smp_call_function_queue+0x174/0x360
[   21.169751] [<80021da8>] ingenic_xburst2_mbox_handler+0x94/0xc8
[   21.169759] [<800b96ec>] handle_percpu_devid_irq+0xc0/0x194
[   21.169770] [<800b2a90>] handle_irq_desc+0x78/0x90
[   21.169777] [<80971cb0>] do_IRQ+0x18/0x24

In this case, it seems appropriate for the per-CPU mechanism to be used, since 
the mailbox interrupts are indeed specific to a given CPU. The handler can be 
found in arch/mips/ingenic/smp.c and calls the function listed directly above 
it in the trace, __flush_smp_call_function_queue, found in kernel/smp.c, which 
may be the one that crashes.
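To illustrate why a per-CPU handler fits, here is a small user-space model of mailbox-based IPIs in C. It is not the code in arch/mips/ingenic/smp.c: the message bits, array layout and function names are all hypothetical, and C11 atomics stand in for the hardware mailbox registers. What it shows is that each CPU owns one mailbox, senders set bits in the target CPU's mailbox, and the receiving CPU atomically takes and clears only its own pending bits before dispatching them (in the kernel, one such dispatch would be flushing the call-function queue).

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS_MODEL 2

/* Hypothetical message bits; the real driver's encoding may differ. */
enum { MSG_RESCHEDULE = 1u << 0, MSG_CALL_FUNCTION = 1u << 1 };

/* One mailbox word per CPU (zero-initialised): each CPU drains only its own. */
static atomic_uint mailbox[NR_CPUS_MODEL];

/* Counters standing in for the real work done on each CPU. */
static unsigned resched_seen[NR_CPUS_MODEL], calls_seen[NR_CPUS_MODEL];

/* Sender side: post a message to a target CPU.
   Real hardware would also raise that CPU's mailbox interrupt. */
static void send_ipi(int cpu, unsigned msg)
{
    atomic_fetch_or(&mailbox[cpu], msg);
}

/* Receiver side: runs on 'cpu' only, like a per-CPU (percpu_devid) handler.
   Atomically takes all pending bits so none are lost or handled twice. */
static unsigned mbox_handler(int cpu)
{
    unsigned pending = atomic_exchange(&mailbox[cpu], 0u);

    if (pending & MSG_RESCHEDULE)
        resched_seen[cpu]++;
    if (pending & MSG_CALL_FUNCTION)
        calls_seen[cpu]++;   /* the kernel would flush its call-function queue here */
    return pending;
}
```

The atomic exchange is what prevents a message from being lost or handled twice if another CPU posts to the same mailbox while the handler runs; a plain read followed by a separate write of zero would leave a window for exactly that.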

Then again, maybe the interrupt itself is the problem, and the stack trace 
merely reflects the act of receiving it. If so, that might suggest that the 
CPU initialisation is not done entirely correctly.

But I am very unfamiliar with this code, and still need to acquire a coherent 
picture of the situation, so all of this is speculation.

Paul



