[Letux-kernel] mmc1 errors on Beaglebone Black since 5.17-rc3

H. Nikolaus Schaller hns at goldelico.com
Thu Feb 17 20:00:20 CET 2022


Hi Jean,

> On 17.02.2022 at 11:10, H. Nikolaus Schaller <hns at goldelico.com> wrote:
> 
> 
>> On 15.02.2022 at 10:41, Jean Rene Dawin <jdawin at math.uni-bielefeld.de> wrote:
>> 
>> Hi,
>> 
>> Since kernel 5.17-rc1 I have noticed slower eMMC performance on the BeagleBone
>> Black, but didn't check the logs.
>> When I tried to run 5.17.0-rc3-letux+, it booted fine, but during IO
>> traffic there were messages like
>> 
>> [  662.529584] mmc1: error -110 doing runtime resume
>> [  669.293590] mmc1: Card stuck being busy! __mmc_poll_for_busy
>> 
>> [  739.076072] mmc1: Timeout waiting for hardware interrupt.
>> [  739.145676] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
>> [  739.231053] mmc1: sdhci: Sys addr:  0x00000000 | Version:  0x00003101
>> [  739.316472] mmc1: sdhci: Blk size:  0x00000200 | Blk cnt:  0x00000400
>> [  739.401937] mmc1: sdhci: Argument:  0x00342d30 | Trn mode: 0x00000023
>> [  739.487439] mmc1: sdhci: Present:   0x01f70000 | Host ctl: 0x00000000
>> [  739.573007] mmc1: sdhci: Power:     0x0000000f | Blk gap:  0x00000000
>> [  739.658609] mmc1: sdhci: Wake-up:   0x00000000 | Clock:    0x00003c07
>> [  739.744224] mmc1: sdhci: Timeout:   0x00000007 | Int stat: 0x00000002
>> [  739.829896] mmc1: sdhci: Int enab:  0x027f000b | Sig enab: 0x027f000b
>> [  739.915623] mmc1: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000001
>> [  740.001394] mmc1: sdhci: Caps:      0x07e10080 | Caps_1:   0x00000000
>> [  740.087208] mmc1: sdhci: Cmd:       0x0000193a | Max curr: 0x00000000
>> [  740.173051] mmc1: sdhci: Resp[0]:   0x00000900 | Resp[1]:  0x00000000
>> [  740.258928] mmc1: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
>> [  740.344854] mmc1: sdhci: Host ctl2: 0x00000000
>> [  740.402796] mmc1: sdhci: ============================================
>> 
>> and finally IO errors and a corrupted filesystem.
>> 
>> 5.17.0-rc4-letux+ shows the same behaviour.
>> 
>> Has anyone seen this, too?
> 
> I just upgraded some BBBs to 5.17-rc2 and yes, they seem to boot very sluggishly (>90 seconds until login).
> This happens even though I boot from an SD card, so it is not your eMMC but the MMC interface.
> 
> I got e.g.:
> 
> [  121.908241] mmc1: Card stuck being busy! __mmc_poll_for_busy
> [  121.914472] mmc1: error -110 doing runtime resume
> [  122.294220] mmc1: Card stuck being busy! __mmc_poll_for_busy
> [  122.300332] I/O error, dev mmcblk1, sector 0 op 0x0:(READ) flags 0x4000 phys_seg 21 prio class 2
> 
> ...
> 
> We have to run a git bisect.
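
For reference, the "-110" in these messages is -ETIMEDOUT on Linux, and the register dump in Jean's report can be decoded by hand. Below is a minimal Python sketch for doing that; decode_command() and decode_transfer_mode() are hypothetical helpers of my own, and the bit layout is my reading of the SD Host Controller Simplified Specification, so treat the result as a sanity check rather than a diagnosis:

```python
# Hypothetical helpers, not part of any kernel tool: split two 16-bit
# SDHCI registers from the dump into fields. Bit layout assumed from
# the SD Host Controller Simplified Specification.

# Response Type Select, bits 1:0 of the Command register.
RESPONSE_TYPES = {
    0b00: "no response",
    0b01: "136-bit",
    0b10: "48-bit",
    0b11: "48-bit with busy",
}


def decode_command(reg: int) -> dict:
    """Split the Command register ("Cmd:" in the dump) into fields."""
    return {
        "index": (reg >> 8) & 0x3F,  # command number CMDn, bits 13:8
        "type": (reg >> 6) & 0x3,  # 0 = normal command
        "data_present": bool(reg & (1 << 5)),
        "index_check": bool(reg & (1 << 4)),
        "crc_check": bool(reg & (1 << 3)),
        "response": RESPONSE_TYPES[reg & 0x3],
    }


def decode_transfer_mode(reg: int) -> dict:
    """Split the Transfer Mode register ("Trn mode:" in the dump)."""
    return {
        "dma": bool(reg & (1 << 0)),
        "block_count_enable": bool(reg & (1 << 1)),
        "auto_cmd": (reg >> 2) & 0x3,  # 1 = Auto CMD12, 2 = Auto CMD23
        "direction": "read" if reg & (1 << 4) else "write",
        "multi_block": bool(reg & (1 << 5)),
    }


if __name__ == "__main__":
    # Values copied from the register dump quoted above.
    print(decode_command(0x193A))
    print(decode_transfer_mode(0x0023))
    blk_size, blk_cnt = 0x200, 0x400
    print(f"transfer size: {blk_size * blk_cnt // 1024} KiB")
```

If that layout is right, Cmd 0x193a is CMD25 (WRITE_MULTIPLE_BLOCK) with a 48-bit response and Trn mode 0x23 is a multi-block DMA write, so with Blk size 0x200 and Blk cnt 0x400 the transfer that timed out was 512 KiB.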

Fortunately, I was able to set up a git bisect that runs unattended (by installing each new kernel over Ethernet and running "reboot" instead of manually swapping µSD cards), and got:

76bfc7ccc2fa9d382576f6013b57a0ef93d5a722 is the first bad commit
commit 76bfc7ccc2fa9d382576f6013b57a0ef93d5a722
Author: Huijin Park <huijin.park at samsung.com>
Date:   Thu Nov 4 15:32:31 2021 +0900

   mmc: core: adjust polling interval for CMD1

   In mmc_send_op_cond(), loops are continuously performed at the same
   interval of 10 ms.  However the behaviour is not good for some eMMC
   which can be out from a busy state earlier than 10 ms if normal.

   Rather than fixing about the interval time in mmc_send_op_cond(),
   let's instead convert into using the common __mmc_poll_for_busy().

   The reason for adjusting the interval time is that it is important
   to reduce the eMMC initialization time, especially in devices that
   use eMMC as rootfs.

   Test log(eMMC:KLM8G1GETF-B041):

   before: 12 ms (0.311016 - 0.298729)
   [    0.295823] mmc0: starting CMD0 arg 00000000 flags 000000c0
   [    0.298729] mmc0: starting CMD1 arg 40000080 flags 000000e1<-start
   [    0.311016] mmc0: starting CMD1 arg 40000080 flags 000000e1<-finish
   [    0.311336] mmc0: starting CMD2 arg 00000000 flags 00000007

   after: 2 ms (0.301270 - 0.298762)
   [    0.295862] mmc0: starting CMD0 arg 00000000 flags 000000c0
   [    0.298762] mmc0: starting CMD1 arg 40000080 flags 000000e1<-start
   [    0.299067] mmc0: starting CMD1 arg 40000080 flags 000000e1
   [    0.299441] mmc0: starting CMD1 arg 40000080 flags 000000e1
   [    0.299879] mmc0: starting CMD1 arg 40000080 flags 000000e1
   [    0.300446] mmc0: starting CMD1 arg 40000080 flags 000000e1
   [    0.301270] mmc0: starting CMD1 arg 40000080 flags 000000e1<-finish
   [    0.301572] mmc0: starting CMD2 arg 00000000 flags 00000007

   Signed-off-by: Huijin Park <huijin.park at samsung.com>
   Link: https://lore.kernel.org/r/20211104063231.2115-3-huijin.park@samsung.com
   Signed-off-by: Ulf Hansson <ulf.hansson at linaro.org>
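
The commit replaces the fixed 10 ms CMD1 polling interval in mmc_send_op_cond() with the common __mmc_poll_for_busy(). To illustrate the trade-off its message describes, here is a toy Python model, not kernel code: poll_fixed() and poll_backoff() are hypothetical, and the back-off constants (start at 32 µs, double up to 32768 µs) and the 1 s timeout are illustrative assumptions rather than values taken from the kernel source:

```python
# Toy model, not kernel code: a card leaves the busy state at ready_us.
# Each function returns (time the host sees "not busy", number of polls),
# or (None, polls) if the timeout expires first, which is where the
# kernel would report "Card stuck being busy!" and -ETIMEDOUT (-110).


def poll_fixed(ready_us: int, interval_us: int = 10_000,
               timeout_us: int = 1_000_000):
    """Poll at a constant interval (the pre-commit behaviour)."""
    t, polls = 0, 0
    while t <= timeout_us:
        polls += 1
        if t >= ready_us:
            return t, polls
        t += interval_us
    return None, polls


def poll_backoff(ready_us: int, start_us: int = 32, max_us: int = 32_768,
                 timeout_us: int = 1_000_000):
    """Poll with a delay that doubles after each busy answer, up to a cap."""
    t, polls, delay = 0, 0, start_us
    while t <= timeout_us:
        polls += 1
        if t >= ready_us:
            return t, polls
        t += delay
        delay = min(delay * 2, max_us)
    return None, polls


if __name__ == "__main__":
    for ready in (300, 2_000, 12_000):
        print(ready, poll_fixed(ready), poll_backoff(ready))
```

For a card that becomes ready after 2 ms, the fixed policy notices at 10 ms after 2 polls, while the back-off policy notices at 2.016 ms after 7 polls. The growing gaps between the CMD1 lines in the "after" log (roughly 0.3, 0.37, 0.44, 0.57 and 0.82 ms) look consistent with some kind of back-off. The model only shows the trade-off; it does not by itself explain why the BBB ends up "stuck being busy".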

Reverting this commit makes letux-5.17-rc[1-4] work.

Maybe you can try and confirm?
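
For anyone who wants to repeat the unattended bisect: `git bisect run` treats exit code 0 as good, 125 as "skip this commit", and any other code from 1 to 127 as bad. Below is a sketch of such a helper in Python. The board address, the make targets, the scp install step, the 180 s wait and the `--run` flag are all assumptions for illustration, not my actual setup:

```python
#!/usr/bin/env python3
# Hypothetical `git bisect run` helper for the mmc1 regression.
# BOARD, the make targets, the scp install step and the wait time are
# assumptions; ARCH/CROSS_COMPILE are expected in the environment.
import subprocess
import sys
import time

BOARD = "root@bbb.local"  # hypothetical address of the test board
BAD_PATTERNS = (
    "Card stuck being busy",
    "error -110 doing runtime resume",
    "Timeout waiting for hardware interrupt",
)
GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by `git bisect run`


def classify(dmesg: str) -> int:
    """Map a captured kernel log to a bisect verdict."""
    if not dmesg.strip():
        return SKIP  # nothing to judge: let git pick another commit
    return BAD if any(p in dmesg for p in BAD_PATTERNS) else GOOD


def run(cmd: list) -> bool:
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(cmd).returncode == 0


def main() -> int:
    # A commit that does not build is skipped rather than marked bad.
    if not run(["make", "-j4", "zImage", "dtbs"]):
        return SKIP
    # Hypothetical install step: copy the new kernel to the running board.
    if not run(["scp", "arch/arm/boot/zImage", f"{BOARD}:/boot/zImage"]):
        return SKIP
    run(["ssh", BOARD, "reboot"])  # ssh may fail as the connection drops
    time.sleep(180)  # assumption: enough for a slow boot plus some I/O
    log = subprocess.run(
        ["ssh", "-o", "ConnectTimeout=30", BOARD, "dmesg"],
        capture_output=True, text=True,
    )
    if log.returncode != 0:
        return BAD  # assumption: a board that does not come back is bad
    return classify(log.stdout)


if __name__ == "__main__" and "--run" in sys.argv[1:]:
    sys.exit(main())
```

It would be started with `git bisect start <bad> <good>` followed by `git bisect run ./bisect_mmc.py --run`. Counting an unreachable board as bad is a deliberate choice, since a broken kernel may never come back up, but a flaky network would then produce false "bad" verdicts.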


BTW: while testing the bisect setup, I think I found some other (unrelated) issues that are already present in letux-5.16:
* ifconfig does not show "usb0" but a "sit0" interface, which I have not seen elsewhere.
* it appears that the Chipsee touch screen is broken: there is no /dev/input event device for it.

BR,
Nikolaus



More information about the Letux-kernel mailing list