[Letux-kernel] 1.5GHz problems

Belisko Marek marek.belisko at gmail.com
Sat Jul 30 22:14:41 CEST 2016


Hi Nikolaus,

On Fri, Jul 29, 2016 at 4:00 PM, H. Nikolaus Schaller <hns at goldelico.com> wrote:
> Hi all,
> we know that the Pyra CPU boards (at least the 3 units we have running)
> make problems when we use OPP to allow for 1.5GHz. The kernel suddenly
> hangs without obvious and repeatable error messages.
What about posting this message to LKML and adding some TI people? Just an idea.
I only have experience with single-core CPUs, and there a hang is usually
caused by the CPU frequency being high while the voltage isn't set properly.
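
To illustrate what I mean by "voltage isn't properly set": the usual rule is
that the supply is raised before the clock when scaling up, and the clock is
lowered before the supply when scaling down. A minimal sketch of that ordering
follows; set_voltage and set_frequency are hypothetical callbacks used only
for illustration, not a real kernel or driver API.

```python
def switch_opp(cur_hz, new_hz, new_uv, set_voltage, set_frequency):
    """Switch OPP so the supply is never too low for the running clock.

    set_voltage / set_frequency are hypothetical callbacks (assumption).
    """
    if new_hz > cur_hz:
        set_voltage(new_uv)      # scaling up: raise the voltage first
        set_frequency(new_hz)
    else:
        set_frequency(new_hz)    # scaling down: lower the clock first
        set_voltage(new_uv)


# Record the order of operations for a 1.0 GHz -> 1.5 GHz transition.
calls = []
switch_opp(1_000_000_000, 1_500_000_000, 1_250_000,
           lambda uv: calls.append(("volt", uv)),
           lambda hz: calls.append(("freq", hz)))
print(calls)
```

If the order is reversed somewhere (or the regulator has not settled yet), the
core briefly runs at the high clock with the low voltage, which would look
like a random hang.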
>
> At 1.0 GHz (or 1.5 GHz and disabling the second core) the OMAP5432 works.
>
> To get some more insights I have done some tests.
>
> * Board M4+C19 w/o display
> * Kernel: letux-4.7.0
> * 500MHz + 750MHz OPP runs according to default DT
> * 1.0GHz OPP in DT modified to check what happens
> * temperature driven by /root/high-load (prints 3 temperature hwmon values
> every second)
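
I don't know what /root/high-load does internally, so the following is only a
sketch of the temperature-reading part, assuming the sensors show up under the
standard hwmon sysfs path and report millidegrees Celsius in temp*_input. The
function names are made up for illustration.

```python
import glob


def read_hwmon_temps(pattern="/sys/class/hwmon/hwmon*/temp*_input"):
    """Return {path: degrees Celsius} for every readable hwmon sensor."""
    temps = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                # hwmon reports temperatures in millidegrees Celsius
                temps[path] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # sensor missing or temporarily unreadable
    return temps


def format_temps(temps):
    """Format the readings as one line, one value per sensor."""
    return " ".join(f"{t:.1f}" for t in temps.values())

# Usage (prints one line per second, like the test described above):
#   import time
#   while True:
#       print(format_temps(read_hwmon_temps()))
#       time.sleep(1)
```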
>
> A) 1.0GHz at 1060000uV
> kernel boot: ok
> high-load: runs unlimited
>
> Comment:
> This is what has worked since March 2016.
>
>
> B) 1.1GHz at 1060000uV
> kernel boot: ok
> high-load: reaches 97°C after 25 min
> cpufreq-info: 96%@1.1 GHz
>
> Comment:
> I remember that temperature was ~92°C at 1.0 GHz so this drives
> the temperature up by 5K.
>
>
> C1) 1.3GHz at 1060000uV
> kernel boot: hangs during initial boot
>
> Note: hang means the CPU isn't responding on serial interface and status
> LEDs are no longer blinking
>
> Comment:
> obviously the voltage is too low for 1.3 GHz.
>
> C3) 1.0GHz at 1060000uV + 1.3GHz at 1150000uV
> kernel boot: ok
> high-load: hangs after 15 seconds after reaching 63°C
>
> repeated boot attempts:
> C3a) high-load: hangs after some seconds at 64°C
>
> C3b) high-load: runs >15 min
> ramps up to 100-103°C, then suddenly drops to 82-95°C.
> reaches >100°C again after ca. 10 sec.
> As if some over-temperature protection throttles the CPU clock.
> PCB temperature: 80°C
> cpufreq-info: just 73%@1.3 GHz
>
> C3c) high-load: hangs after 15 sec at 65°C
>
> Comment:
> This means it does not run 100% reliably at this OPP and the effect
> seems to be temperature dependent. But when the OMAP does run, it
> hits a temperature limit which triggers some overtemp
> protection built into the kernel.
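
The "73%@1.3 GHz" figure could be cross-checked against the raw cpufreq
statistics. A sketch, assuming the kernel has cpufreq stats enabled and
exposes /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state with one
"<freq_khz> <ticks>" pair per line; the sample numbers below are made up, not
measured on the Pyra.

```python
def residency_percent(text):
    """Parse cpufreq 'time_in_state' text into {freq_khz: percent of time}."""
    ticks = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        ticks[int(parts[0])] = int(parts[1])
    total = sum(ticks.values())
    if total == 0:
        return {khz: 0.0 for khz in ticks}
    return {khz: 100.0 * t / total for khz, t in ticks.items()}


# Made-up sample data (assumption), in the format described above.
sample = "500000 100\n1000000 170\n1300000 730\n"
for khz, pct in residency_percent(sample).items():
    print(f"{khz / 1e6:.1f} GHz: {pct:.0f}%")
```

If the share at the top OPP drops while the load is constant, that would
support the idea that thermal throttling is pulling the clock down.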
>
>
> D) 1.0GHz at 1060000uV + 1.5GHz at 1150000uV
> kernel boot: hangs after 4.3-4.4 sec (3 times reproducible)
>
> E1) 1.5GHz at 1250000uV:
> kernel boot: hangs after 6.3-6.5 sec (3 times reproducible)
>
> E2) 1.5GHz at 1300000uV (close to upper limit according to "Data Manual
> Operating Condition Addendum Version 0.6"):
> kernel boot: hangs at 6.6 sec
>
> F) test E1 + OMAP5_ERRATA_801819 enabled
> kernel boot: hangs again at 6.4 sec (in "Synthesizing the initial hotplug
> events...")
>
> Comment:
> it was not possible to boot in dual core 1.5 GHz mode. Very strange and
> unexpected is that the kernel hangs repeatedly at 6.3-6.6 seconds as if
> there is something in the code which increases the risk of a hang
> (deadlock).
>
> So it is either a kernel software issue (something critical is running
> faster at 1.5 GHz, resulting in a deadlock), or the voltage is still too
> low. But I did not dare to increase it further since it may destroy the
> valuable CPU board...
>
>
> G) [Kernel] omap5 mpu bridge dividers
> Matthijs recently reported a potential issue here with the above subject
> line.
>
> A simple test would be to boot at 1.5GHz and then run
>
> omapconf write 0x4A004320 0x06000001
>
> But I can't even boot at 1.5GHz so I have no chance to test.
>
>
> Summary / Discussion:
> * it looks as if 1 GHz (or single core 1.5 GHz) works without problems
> * for 1.3 GHz we have to increase CPU voltage or the kernel hangs
> * at 1.5 GHz I wasn't able to boot even with increasing CPU voltage
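
To make the test matrix easier to compare, here it is restated as data. This
is only a sketch: the "survives load" column is my reading of the results
above (B is counted as surviving since no hang is reported, C3 as failing
since three of four high-load runs hung).

```python
# (label, top OPP in MHz, microvolts, kernel boots, survives high-load)
RESULTS = [
    ("A",  1000, 1060000, True,  True),
    ("B",  1100, 1060000, True,  True),
    ("C1", 1300, 1060000, False, False),
    ("C3", 1300, 1150000, True,  False),  # 3 of 4 high-load runs hung
    ("D",  1500, 1150000, False, False),
    ("E1", 1500, 1250000, False, False),
    ("E2", 1500, 1300000, False, False),
    ("F",  1500, 1250000, False, False),  # E1 + OMAP5_ERRATA_801819
]


def highest_mhz(results, need_load):
    """Highest top-OPP frequency that booted (and optionally survived load)."""
    ok = [mhz for _, mhz, _, boots, load in results
          if boots and (load or not need_load)]
    return max(ok) if ok else None


print("highest OPP that boots:", highest_mhz(RESULTS, need_load=False), "MHz")
print("highest OPP stable under load:", highest_mhz(RESULTS, need_load=True), "MHz")
```

Put this way, no voltage step between 1150000uV and 1300000uV changes the
1.5 GHz outcome, which fits the observation that the hang time barely moves.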
>
> The data sheets hint at using AVS and ABB.
>
> "4.3.1
> AVS and ABB Requirements
> Adaptive Voltage Scaling (AVS) and Adaptive Body Biasing (ABB) are required
> on most of the VDD_* domains as defined in Table 4-7"
>
> Table 4-7 indirectly marks AVS and ABB as required for all operating
> points >1.0 GHz.
>
> " • The AVS Voltages are device-dependent, voltage domain-dependent, and
> OPP-dependent. They must be read from the CONTROL_STD_FUSE_OPP_VDD Registers
> in the Control Module Section of the TRM."
>
> From this I read that every single OMAP chip is slightly different and TI
> measures these differences during production.
>
> This should be done by the AVS drivers.
>
> We did not have CONFIG_POWER_AVS_OMAP enabled but only CONFIG_POWER_AVS.
>
> But although I changed that and did some additional tests, it has no
> influence. And the AVS seems to be incomplete and non-operational anyways:
>
> [    4.977605] sr_init: No PMIC hook to init smartreflex
> [    4.982922] driver_register 'smartreflex'
> [    4.987747] sr_init: platform driver register failed for SR
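
For comparing several boot logs it may help to pull such messages out with
their timestamps and see how far they are from the hang. A small sketch that
assumes the usual "[ seconds.micros] message" printk prefix; find_messages is
a made-up helper name, and the log text is the three lines quoted above.

```python
import re

# Matches "[    4.977605] message" and captures timestamp and message.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def find_messages(log_text, keyword):
    """Return [(timestamp_seconds, message)] for lines containing keyword."""
    hits = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m and keyword in m.group(2):
            hits.append((float(m.group(1)), m.group(2)))
    return hits


LOG = """\
[    4.977605] sr_init: No PMIC hook to init smartreflex
[    4.982922] driver_register 'smartreflex'
[    4.987747] sr_init: platform driver register failed for SR
"""

hits = find_messages(LOG, "sr_init")
last_ts = hits[-1][0]
# 6.43 s is the hang time reported for this boot.
print(f"last sr_init message at {last_ts:.3f}s, "
      f"{6.43 - last_ts:.2f}s before the hang")
```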
>
> ... and the kernel hangs again at 6.43 sec. As if there is a watchdog timer
> in the OMAP that is only running in 1.5GHz mode...
>
> So I think a hardware issue is quite unlikely, especially as the 1.5 GHz
> setup always hangs at the same 6.3-6.6 seconds after Linux start.
>
> And before I waste more and more weeks on looking for really difficult to
> grab hardware issues, I would like to hear kernel specialists' opinions
> first.
>
> BR and thanks,
> Nikolaus
>
>

BR,

marek

-- 
as simple and primitive as possible
-------------------------------------------------
Marek Belisko - OPEN-NANDRA
Freelance Developer

Ruska Nova Ves 219 | Presov, 08005 Slovak Republic
Tel: +421 915 052 184
skype: marekwhite
twitter: #opennandra
web: http://open-nandra.com

