[Letux-kernel] OMAP5: Debugging kernel not starting... (and clock: dpll_abe_ck failed transition to 'locked')

Tue Sep 25 08:47:36 CEST 2018

Hi Tero,
sorry for the delayed response.

> On 17.09.2018 at 09:30, Tero Kristo <t-kristo at ti.com> wrote:
> 
> On 15/09/18 18:50, H. Nikolaus Schaller wrote:
>> Hi,
>>> On 13.09.2018 at 12:16, H. Nikolaus Schaller <hns at goldelico.com> wrote:
>>> 
>>> 
>>> I have done further tests on the Pyra and
>>> 
>>> a) the problem with pwm_bl goes away if I comment out the keyboard backlight (timer8) from DT and keep timer9 only.
>>>   It can still independently be triggered by modprobing snd_soc_omap_abe_twl6040
>>> 
>>> b) it appears that the problem is already triggered by modprobe and does not go away with modprobe -r, so it
>>>   is unlikely to be a cleanup problem before reboot.
>>> 
>>> All this indicates that some clock dividers or switches are not properly controlled.
>>> 
>>> But I have still not found the significant difference between the Pyra and uEVM setups.
>> Having done more experiments, I found the following.
>> I see the problem on Pyra even when only modprobing pwm_omap_dmtimer but not pwm_bl.
>> So the problem is indeed in omap timer/pwm management and not outside.
>> This means the backlight pwm isn't even active and I have checked that there are
>> no calls to pwm_omap_dmtimer_config() or pwm_omap_dmtimer_enable() because pwm_bl
>> isn't loaded.
>> A simple run through pwm_omap_dmtimer_probe() by modprobe pwm_omap_dmtimer suffices
>> to trigger the reboot problem afterwards.
>> Next, I tried to inject printk() into pwm_omap_dmtimer_probe() to isolate where the problem starts.
>> It turns out that disabling the call
>> 	dm_timer = pdata->request_by_node(timer);
>> in drivers/pwm/pwm-omap-dmtimer.c suffices to make the problem go away. Of course, PWM then
>> doesn't work, even if I modprobe pwm_bl.
>> This seems to call _omap_dm_timer_request().
>> The deepest level I could identify is the call to
>> 	pm_runtime_get_sync(&timer->pdev->dev);
>> in omap_dm_timer_enable().
>> If I make omap_dm_timer_enable() return 0 before this call, I can reboot without problems.
>> So far, the call sequence known to trigger the problem is:
>> pwm_omap_dmtimer_probe()
>> 	dm_timer = pdata->request_by_node(timer);
>> 		omap_dm_timer_request_by_node();
>> 			_omap_dm_timer_request();
>> 				omap_dm_timer_enable()
>> 					pm_runtime_get_sync(&timer->pdev->dev);
>> So it is a pm_runtime problem for timer8 (and others), but not really one in the
>> pwm-omap-dmtimer and timer-ti-dm drivers themselves: it collides with something else
>> (which is present on the Pyra but not on the OMAP5EVM).
>> I had thought about adding printk() to pm_runtime_get_sync(), but that is no longer
>> specific to the dm-timers. The same applies to printk() in the timer hwmods.
>> Any suggestions how to find out what the second factor could be?
> 
> Have you traced the reset signals on your board? How is sys_nreswarm connected, for example, compared to the omap5uevm board?

I have done this comparison, and there are indeed differences. On the Pyra, the tca6424 is not connected to nreswarm but to a separate GPIO, which is activated by U-Boot. And there is another, potentially very significant difference: we have added a 1 nF capacitor from nreswarm to GND.

The reason is that the nreswarm line was found to be extremely sensitive to external noise on a nanosecond time scale. Tapping the nreswarm pad of the RESET button with a 30 cm long measurement cable, with nothing connected to its other end, suffices to trigger a reboot. So the tiny capacitive energy of the cable seems to trigger a reset. And this happens even though nreswarm has a 10 kOhm pull-up to 1.8 V, which should protect it sufficiently against ESD. I noticed this effect two years ago when trying to look at the nreswarm signal on the Pyra with an oscilloscope.

But before we conclude that this is the cause of the problem: I have installed this capacitor on one EVM as well, and a second EVM has none. Both reboot fine, while the Pyra doesn't. I have not yet tried to reboot a Pyra with this capacitor removed.

And tonight I received a private message through the openpandora forum from someone who tried this on the igep5 and sees the same reboot problem:

> Blacklisting snd_soc_omap_abe_twl6040 also solves the reboot issue on the igep.

I asked about the config and setup:

> I used the 4.19-rc5 mainline with only the omap2plus_defconfig, no option changed, GCC-8.2.0.

So if you happen to own an igep5, please check whether you can reproduce it there with a mainline kernel.

> What happens if you replace the warm reset with a cold reset? Try replacing the warm reset in omap4_prminst_global_warm_sw_reset() with a cold reset (just change the bit used to reset the device).

I have this on my to-do list.

At first glance, there seems to be a delay constant MAX_MODULE_HARDRESET_WAIT of 10 µs. Well, 10 kOhm and 1 nF give an RC time constant of 10 µs. So the reset pulse and wait time may be too short for the Pyra (and long enough for the EVM by pure chance)...

> 
> It looks like, in the case of the Pyra, the resets are not passed in properly, leaving some pieces of HW (like the ABE) in a bad state. In an ideal world, none of the register-level settings should affect the result of a warm reset; however, there are some warm reset errata for the device which call for proper routing of reset signals on the board itself, and these might have an impact.
> 
> Have you checked the errata documents to see whether any of them apply to your design?

I have this on my to-do list as well.

BR and thanks,
Nikolaus