[Letux-kernel] weird problem with pwm_bl on omap3
H. Nikolaus Schaller
hns at goldelico.com
Tue May 29 19:29:38 CEST 2018
Hi,
> On 29.05.2018 at 18:36, H. Nikolaus Schaller <hns at goldelico.com> wrote:
>
> Hi Andreas,
>
>> On 29.05.2018 at 18:27, Andreas Kemnade <andreas at kemnade.info> wrote:
>>
>> On Tue, 29 May 2018 17:41:53 +0200
>> "H. Nikolaus Schaller" <hns at goldelico.com> wrote:
>>
>>> Hi Tony,
>>>
>>>> On 17.05.2018 at 20:08, Tony Lindgren <tony at atomide.com> wrote:
>>>>
>>>> * H. Nikolaus Schaller <hns at goldelico.com> [180517 12:17]:
>>>>> Hi Tony,
>>>>> we have been using the dmtimer/pwm for a long time on
>>>>> the GTA04 to drive the display panel backlight.
>>>>>
>>>>> Starting a while ago (I am not sure when, but it may
>>>>> be 4.17-rc1), the device randomly fails to boot
>>>>> with a NULL pointer dereference in strcmp().
>>>>> Booting again usually runs fine.
>>>>
>>>> Hmm, maybe enable CONFIG_DEBUG_SLAB=y and the POISON options
>>>> and see if that catches something. It might be some
>>>> array out-of-bounds type of issue.
>>>>
>>>> Regards,
>>>>
>>>> Tony
>>>
>>>
>>> I have restarted hunting the issue but it is very
>>> elusive. Every time I try a different test method
>>> it disappears, and when I remove my printk calls or
>>> /etc/modprobe/blacklist.conf it comes back.
>>>
>> Have you tried loading the modules one by one before running udev,
>> by booting the kernel with an init=/something.sh script,
>> so that you have the load order under control?
>
> Well, if I load modules manually everything is fine.
> The problem appears to be the concurrency of loading
> and deferred probing of modules.
>
> If I block pwm_bl and omap3dss and some others and load
> them all manually, there was never a problem :(
>
>>
>> Another thing: Are our fb patches evil again?
>
> It also occurs with omapdss blacklisted.
>
>>
>> The offmode patch? Maybe it uncovers some other bug.
>
> Hm. I am not sure if I have included it. I am running with
> mainline with just a handful of our recent patches (not
> everything).
>
> But I can try it as soon as I find time (kernel compiling has
> got much slower in recent years so that I have new ideas
> faster than a test result :).
>
>>
>> Just to sort out some non-mainline stuff
>>
>> I am compiling and will try to boot with my scripts this evening.
>
> Yes, please. At least to check if you can observe the same issue.
> It seems to happen more often on a GTA04A5 than on a GTA04A4.
>
> And the strcmp(NULL, "backlight_pins_pinmux") occurred
> first in 4.17-rc1. But it might just be a more visible
> symptom and the bug may be older.
Latest boot log:
[ 8.381225] pwm_backlight_probe
[ 8.384613] pwm-backlight backlight: backlight supply power not found, using dummy regulator
[ 8.503112] pwm_backlight_probe probe error -517
[ 8.517517] wwan_on_off_init: wwan_on_off_init
[ 8.549804] (NULL device *): hwmon: 'gta04-battery' is not a valid name attribute, please fix
[ 8.565521] pps_core: LinuxPPS API ver. 1 registered
[ 8.570709] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti at linux.it>
[ 8.604522] pinctrl_get_group_selector: strcmp: (null) backlight_pins_pinmux
[ 8.612243] iio_charge:-749
[ 8.622192] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 8.641662] pgd = (ptrval)
[ 8.644897] [00000000] *pgd=00000000
[ 8.652252] Internal error: Oops: 17 [#1] PREEMPT SMP ARM
[ 8.657897] Modules linked in: pps_core(+) encoder_opa362(+) wwan_on_off(+) snd_soc_gtm601 connector_analog_tv pwm_omap_dmtimer omapdss_base generic_adc_battery pwm_bl wlcore_sdio bmp280_spi bq27xxx_battery_hdq bq27xxx_battery omap_hdq omap2430 ov9655 v4l2_fwnode v4l2_common snd_soc_omap_mcbsp snd_soc_omap snd_pcm_dmaengine bmp280_i2c bmp280 videodev bmc150_magn_i2c tsc2007 bmc150_accel_i2c bmc150_magn at24 bmc150_accel_core leds_tca6507 industrialio_triggered_buffer media kfifo_buf phy_twl4030_usb gpio_twl4030 musb_hdrc twl4030_pwrbutton twl4030_vibra snd_soc_twl4030 twl4030_madc twl4030_charger industrialio gnss_w2sg0004 w2cbw003_bluetooth gnss ehci_omap
[ 8.718811] CPU: 0 PID: 917 Comm: kworker/0:2 Not tainted 4.17.0-rc3-letux+ #2356
[ 8.726623] Hardware name: Generic OMAP36xx (Flattened Device Tree)
[ 8.733184] Workqueue: events deferred_probe_work_func
[ 8.738555] PC is at strcmp+0x0/0x34
[ 8.742309] LR is at pinctrl_get_group_selector+0x6c/0xa8
[ 8.747955] pc : [<c071a6e0>] lr : [<c0436880>] psr: 60000013
[ 8.754516] sp : ee6a9e08 ip : 00000000 fp : 0000001a
[ 8.759979] r10: 00000017 r9 : 0000001a r8 : c075b144
[ 8.765472] r7 : 00000000 r6 : ee3e4d80 r5 : ef7c4a20 r4 : 00000017
[ 8.772308] r3 : 00000000 r2 : 00000002 r1 : ef7c4a20 r0 : 00000000
[ 8.779144] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 8.786621] Control: 10c5387d Table: ad02c019 DAC: 00000051
[ 8.792633] Process kworker/0:2 (pid: 917, stack limit = 0x(ptrval))
[ 8.799285] Stack: (0xee6a9e08 to 0xee6aa000)
[ 8.803833] 9e00: ee6a9e2c ee3e4d80 ef7c4a20 00000000 ed0cce10 00000000
[ 8.812408] 9e20: ed071140 c0437848 ed0d8210 00000001 00000002 ed265d00 ed071180 ed0711c0
[ 8.820953] 9e40: ed0cce10 00000000 00000000 ed071140 00000000 c0436080 00000014 ee26e4c0
[ 8.829498] 9e60: c0b5a0b0 c0890054 ee2b8010 00000000 ed265d50 ee2b8010 c0bccf44 fffffdfb
[ 8.838073] 9e80: bf1ac014 00000029 c0b95730 c0436248 00000000 ee2b8010 ed071290 c04c8cac
[ 8.846618] 9ea0: ee2b8010 00000000 c0bccf48 c04acd3c 00000000 ee6a9ee8 c04ad048 ee2b8044
[ 8.855194] 9ec0: ef7baf00 c0b03d00 00000000 c04ab59c ee020e6c ed146b38 ee2b8010 c0b64550
[ 8.863769] 9ee0: 00000001 c04acbbc ee2b8010 00000001 00000000 ee2b8010 c0b64550 ee2b8010
[ 8.872314] 9f00: c0b9f060 c04ac178 ee2b8010 c0b6433c c0b64358 c04ac688 c04ac59c ee6caa80
[ 8.880859] 9f20: c0b64370 ef7b7c40 00000000 ef7baf00 c0b03d00 c01444f0 ee6caa80 c0b64370
[ 8.889404] 9f40: ffff8e2b ee6caa80 ef7b7c40 ef7b7c40 ee6a8000 ef7b7c58 c0b03d00 ee6caa98
[ 8.897979] 9f60: 00000008 c0144cc4 ee223280 ee374180 ee374240 00000000 ee6caa80 c0144a04
[ 8.906555] 9f80: ee0bbef0 ee37419c 00000000 c0148cbc ee374240 c0148b88 00000000 00000000
[ 8.915100] 9fa0: 00000000 00000000 00000000 c01010e8 00000000 00000000 00000000 00000000
[ 8.923645] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 8.932220] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[ 8.940795] [<c071a6e0>] (strcmp) from [<c0436880>] (pinctrl_get_group_selector+0x6c/0xa8)
[ 8.949462] [<c0436880>] (pinctrl_get_group_selector) from [<c0437848>] (pinmux_map_to_setting+0x158/0x1a0)
[ 8.959655] [<c0437848>] (pinmux_map_to_setting) from [<c0436080>] (create_pinctrl+0x1f0/0x2f8)
[ 8.968749] [<c0436080>] (create_pinctrl) from [<c0436248>] (devm_pinctrl_get+0x2c/0x6c)
[ 8.977233] [<c0436248>] (devm_pinctrl_get) from [<c04c8cac>] (pinctrl_bind_pins+0x3c/0x138)
[ 8.986083] [<c04c8cac>] (pinctrl_bind_pins) from [<c04acd3c>] (driver_probe_device+0xe8/0x318)
[ 8.995178] [<c04acd3c>] (driver_probe_device) from [<c04ab59c>] (bus_for_each_drv+0x84/0x94)
[ 9.004119] [<c04ab59c>] (bus_for_each_drv) from [<c04acbbc>] (__device_attach+0x88/0xfc)
[ 9.012695] [<c04acbbc>] (__device_attach) from [<c04ac178>] (bus_probe_device+0x28/0x80)
[ 9.021240] [<c04ac178>] (bus_probe_device) from [<c04ac688>] (deferred_probe_work_func+0xec/0x120)
[ 9.030731] [<c04ac688>] (deferred_probe_work_func) from [<c01444f0>] (process_one_work+0x244/0x464)
[ 9.040313] [<c01444f0>] (process_one_work) from [<c0144cc4>] (worker_thread+0x2c0/0x3ec)
[ 9.048889] [<c0144cc4>] (worker_thread) from [<c0148cbc>] (kthread+0x134/0x150)
[ 9.056610] [<c0148cbc>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
[ 9.064178] Exception stack(0xee6a9fb0 to 0xee6a9ff8)
[ 9.069458] 9fa0: 00000000 00000000 00000000 00000000
[ 9.078002] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 9.086578] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 9.093505] Code: e3520000 e5e32001 1afffffb e12fff1e (e4d03001)
[ 9.563262] ---[ end trace 27838669a01b24aa ]---
[ 11.037506] cfg80211: Loading compiled-in X.509 certificates for regulatory database
[ 11.078430] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
[ 11.100646] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
[ 11.119567] cfg80211: failed to load regulatory.db
[ 22.575958] random: crng init done
This was with the following options enabled:
CONFIG_HIBERNATION=y
CONFIG_DEBUG_KERNEL=y
CONFIG_PAGE_POISONING=y
CONFIG_PAGE_POISONING_NO_SANITY=y
CONFIG_DEBUG_SLAB=y
The GTA04A5 did eventually boot to a login prompt:
root at letux:~# lsmod
Module Size Used by
bnep 20480 2
bluetooth 282624 5 bnep
ecdh_generic 24576 1 bluetooth
usb_f_ecm 16384 1
g_ether 16384 0
usb_f_rndis 20480 2 g_ether
u_ether 20480 3 usb_f_ecm,g_ether,usb_f_rndis
libcomposite 36864 3 usb_f_ecm,g_ether,usb_f_rndis
configfs 32768 4 usb_f_ecm,usb_f_rndis,libcomposite
ipv6 319488 16
wl18xx 86016 1
wlcore 163840 1 wl18xx
mac80211 503808 2 wl18xx,wlcore
cfg80211 491520 3 wl18xx,wlcore,mac80211
panel_tpo_td028ttec1 16384 0
pps_gpio 16384 1
snd_soc_simple_card 16384 1
snd_soc_simple_card_utils 16384 1 snd_soc_simple_card
snd_soc_omap_twl4030 16384 1
pps_core 16384 1 pps_gpio
encoder_opa362 16384 1
wwan_on_off 16384 1
snd_soc_gtm601 16384 0
connector_analog_tv 16384 0
pwm_omap_dmtimer 16384 0
omapdss_base 16384 3 connector_analog_tv,encoder_opa362,panel_tpo_td028ttec1
generic_adc_battery 16384 0
pwm_bl 16384 0
wlcore_sdio 16384 0
bmp280_spi 16384 0
bq27xxx_battery_hdq 16384 0
bq27xxx_battery 20480 1 bq27xxx_battery_hdq
omap_hdq 16384 0
omap2430 16384 0
ov9655 20480 0
v4l2_fwnode 16384 1 ov9655
v4l2_common 16384 1 ov9655
snd_soc_omap_mcbsp 24576 0
snd_soc_omap 16384 1 snd_soc_omap_mcbsp
snd_pcm_dmaengine 16384 1 snd_soc_omap
bmp280_i2c 16384 0
bmp280 20480 2 bmp280_spi,bmp280_i2c
videodev 139264 3 v4l2_fwnode,v4l2_common,ov9655
bmc150_magn_i2c 16384 0
tsc2007 16384 0
bmc150_accel_i2c 16384 0
bmc150_magn 16384 1 bmc150_magn_i2c
at24 20480 0
bmc150_accel_core 20480 1 bmc150_accel_i2c
leds_tca6507 16384 0
industrialio_triggered_buffer 16384 2 bmc150_accel_core,bmc150_magn
media 24576 2 videodev,ov9655
kfifo_buf 16384 1 industrialio_triggered_buffer
phy_twl4030_usb 16384 3
gpio_twl4030 16384 0
musb_hdrc 106496 2 omap2430,phy_twl4030_usb
twl4030_pwrbutton 16384 0
twl4030_vibra 16384 0
snd_soc_twl4030 49152 0
twl4030_madc 16384 0
twl4030_charger 20480 0
industrialio 53248 9 bmc150_accel_core,tsc2007,generic_adc_battery,twl4030_charger,bmp280,bmc150_magn,industrialio_triggered_buffer,twl4030_madc,kfifo_buf
gnss_w2sg0004 16384 0
w2cbw003_bluetooth 16384 0
gnss 16384 1 gnss_w2sg0004
ehci_omap 16384 0
root at letux:~# modprobe omapdss
[ 319.651580] omapdss: unknown parameter 'def_disp' ignored
^C^C^C^C^C^C^C
^C^C^C^C
but then it hung.
I am starting to think that something in the pinctrl group list management is
missing a mutex, so that the list searched by pinctrl_get_group_selector() gets
corrupted. That would fit the log above, where strcmp() is called with a NULL
group name (r0 = 00000000).
Debugging concurrency issues like this is very, very difficult...
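To make the suspicion a bit more concrete, here is a minimal userspace sketch of
the kind of locking I have in mind. It is NOT the kernel's pinctrl code: struct
group_table and the group_table_*() helpers are made up for illustration, and a
pthread mutex stands in for whatever lock the kernel would use. The NULL check
in the lookup only hides the symptom seen in the oops; the mutex is the part
that would stop a lookup from racing with a registration.

```c
/* Userspace model only; all names here are hypothetical, not kernel API. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MAX_GROUPS 32

struct group_table {
	pthread_mutex_t lock;          /* serialises registration and lookup */
	const char *names[MAX_GROUPS]; /* a damaged entry shows up as NULL */
	size_t count;
};

static void group_table_init(struct group_table *t)
{
	pthread_mutex_init(&t->lock, NULL);
	for (size_t i = 0; i < MAX_GROUPS; i++)
		t->names[i] = NULL;
	t->count = 0;
}

/* Register a group name; returns its selector, or -1 if the table is full. */
static int group_table_add(struct group_table *t, const char *name)
{
	int sel = -1;

	pthread_mutex_lock(&t->lock);
	if (t->count < MAX_GROUPS) {
		sel = (int)t->count;
		t->names[t->count++] = name;
	}
	pthread_mutex_unlock(&t->lock);
	return sel;
}

/* Look up a group by name; returns its selector, or -1 if not found. */
static int group_table_find(struct group_table *t, const char *wanted)
{
	int sel = -1;

	assert(wanted != NULL);
	pthread_mutex_lock(&t->lock);
	for (size_t i = 0; i < t->count; i++) {
		const char *name = t->names[i];

		if (!name)	/* strcmp(NULL, ...) is what oopsed; skip it */
			continue;
		if (strcmp(name, wanted) == 0) {
			sel = (int)i;
			break;
		}
	}
	pthread_mutex_unlock(&t->lock);
	return sel;
}

/* Demo: a NULL entry sits in the table before the wanted group. */
static int demo_lookup(void)
{
	struct group_table t;
	int sel;

	group_table_init(&t);
	group_table_add(&t, "uart1_pins");
	group_table_add(&t, NULL);	/* simulates the damaged entry */
	group_table_add(&t, "backlight_pins_pinmux");
	sel = group_table_find(&t, "backlight_pins_pinmux");
	pthread_mutex_destroy(&t.lock);
	return sel;
}
```

Calling demo_lookup() from a small main() and building with -pthread is enough
to try it. Whether the real list needs exactly this kind of lock is still only
my guess.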
BR,
Nikolaus