[Letux-kernel] 1.5GHz problems

H. Nikolaus Schaller hns at goldelico.com
Sat Jul 30 22:28:38 CEST 2016


Hi Marek,

> On 30.07.2016 at 22:14, Belisko Marek <marek.belisko at gmail.com> wrote:
> 
> Hi Nikolaus,
> 
> On Fri, Jul 29, 2016 at 4:00 PM, H. Nikolaus Schaller <hns at goldelico.com> wrote:
>> Hi all,
>> we know that the Pyra CPU boards (at least the 3 units we have running)
>> cause problems when we enable the OPP for 1.5 GHz. The kernel suddenly
>> hangs without obvious or repeatable error messages.
> What about posting this message to LKML and adding some TI people? Just an idea.

I think it is too early to address the broad audience of LKML. They neither know
the hardware nor have it available for testing. AFAIK, some TI people are also
reading the kernel at pyra-handheld.com list.

And I don't exactly know what to tell them besides that it does not work reliably
above 1 GHz. And we cannot 100% rule out a hardware issue, where LKML can't help
(except by telling us to rule that out first).

> I have experience only with single-core CPUs, and there a hang is usually
> due to the fact that the CPU frequency is high but the voltage isn't properly set.

Michael also pointed out that it works on OMAP5 EVM and IGEP5. And they use the
same voltage settings as we use. But that could already be the problem.

My reading of the data sheets and TRM is that the values for the voltage settings
should be adjusted according to some PROM value. And there is no driver for that...

So we simply might be unlucky that the unmodified values do not work for us :(
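
To make the idea concrete, here is a minimal Python sketch of what such a missing driver would have to decide per OPP: take the per-chip fuse word and, if it is programmed, use it instead of the nominal voltage. The fuse layout is purely an assumption for illustration (a 12-bit field holding millivolts, 0 meaning "not fused"); the real encoding has to be taken from the TRM. The nominal table only repeats voltages tried in the tests quoted below, and `opp_voltage_uv`, `NOMINAL_UV` and `FUSE_MV_MASK` are made-up names.

```python
# Nominal OPP voltages (kHz -> uV). These are simply the values tried in the
# tests in this thread, not verified defaults for any board.
NOMINAL_UV = {
    1_000_000: 1_060_000,
    1_500_000: 1_250_000,
}

# ASSUMPTION: the fuse word carries the per-chip voltage in millivolts in its
# low 12 bits, and reads as 0 when nothing was fused. Check the TRM.
FUSE_MV_MASK = 0xFFF


def opp_voltage_uv(freq_khz: int, fuse_word: int) -> int:
    """Return the voltage in uV to use for the OPP at freq_khz."""
    fuse_mv = fuse_word & FUSE_MV_MASK
    if fuse_mv == 0:
        # Unprogrammed fuse: fall back to the unmodified nominal value,
        # which is what we are effectively running with today.
        return NOMINAL_UV[freq_khz]
    return fuse_mv * 1000  # mV -> uV


if __name__ == "__main__":
    print(opp_voltage_uv(1_500_000, 0x000))  # unfused: nominal value
    print(opp_voltage_uv(1_500_000, 0x4B0))  # hypothetical fuse word
```

If the real fuse values turned out to differ from the nominal ones on our boards, that would at least be consistent with the unmodified values not working for us.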


>> 
>> At 1.0 GHz (or 1.5 GHz and disabling the second core) the OMAP5432 works.
>> 
>> To get some more insights I have done some tests.
>> 
>> * Board M4+C19 w/o display
>> * Kernel: letux-4.7.0
>> * 500MHz + 750MHz OPP runs according to default DT
>> * 1.0GHz OPP in DT modified to check what happens
>> * temperature driven by /root/high-load (prints 3 temperature hwmon values
>> every second)
>> 
>> A) 1.0GHz at 1060000uV
>> kernel boot: ok
>> high-load: runs unlimited
>> 
>> Comment:
>> This is what has been working since March 2016.
>> 
>> 
>> B) 1.1GHz at 1060000uV
>> kernel boot: ok
>> high-load: reaches 97°C after 25 min
>> cpufreq-info: 96%@1.1 GHz
>> 
>> Comment:
>> I remember that temperature was ~92°C at 1.0 GHz so this drives
>> the temperature up by 5K.
>> 
>> 
>> C1) 1.3GHz at 1060000uV
>> kernel boot: hangs during initial boot
>> 
>> Note: hang means the CPU isn't responding on serial interface and status
>> LEDs are no longer blinking
>> 
>> Comment:
>> obviously the voltage is too low for 1.3 GHz.
>> 
>> C3) 1.0GHz at 1060000uV + 1.3GHz at 1150000uV
>> kernel boot: ok
>> high-load: hangs after 15 seconds after reaching 63°C
>> 
>> repeated boot attempts:
>> C3a) high-load: hangs after some seconds at 64°C
>> 
>> C3b) high-load: runs >15 min
>> ramps up to 100-103°C, then suddenly drops to 82-95°C.
>> After ca. 10 sec it again reaches >100°C,
>> as if some over-temperature protection throttles the CPU clock.
>> PCB temperature: 80°C
>> cpufreq-info: just 73%@1.3 GHz
>> 
>> C3c) high-load: hangs after 15 sec at 65°C
>> 
>> Comment:
>> This means it does not run 100% reliably at this OPP and the effect
>> seems to be temperature dependent. But if the OMAP keeps running, it
>> hits a temperature limit which triggers some overtemp
>> protection built into the kernel.
>> 
>> 
>> D) 1.0GHz at 1060000uV + 1.5GHz at 1150000uV
>> kernel boot: hangs after 4.3-4.4 sec (reproduced 3 times)
>> 
>> E1) 1.5GHz at 1250000uV:
>> kernel boot: hangs after 6.3-6.5 sec (reproduced 3 times)
>> 
>> E2) 1.5GHz at 1300000uV (close to upper limit according to "Data Manual
>> Operating Condition Addendum Version 0.6"):
>> kernel boot: hangs at 6.6 sec
>> 
>> F) test E1 + OMAP5_ERRATA_801819 enabled
>> kernel boot: hangs again at 6.4 sec (in "Synthesizing the initial hotplug
>> events...")
>> 
>> Comment:
>> It was not possible to boot in dual-core 1.5 GHz mode. Very strange and
>> unexpected is that the kernel repeatedly hangs at 6.3-6.6 seconds, as if
>> there is something in the code which increases the risk of a hang (deadlock).
>> 
>> So it is either a kernel software issue (something critical is running
>> faster at 1.5 GHz, resulting in a deadlock), or the voltage is still too low.
>> But I did not dare to increase it further since it may destroy the valuable
>> CPU board...
>> 
>> 
>> G) [Kernel] omap5 mpu bridge dividers
>> Matthijs recently reported a potential issue here with the above subject line.
>> 
>> A simple test would be to boot at 1.5 GHz and then run
>> 
>> omapconf write 0x4A004320 0x06000001
>> 
>> But I can't even boot at 1.5 GHz so I have no chance to test.
>> 
>> 
>> Summary / Discussion:
>> * it looks as if 1 GHz (or single core 1.5 GHz) works without problems
>> * for 1.3 GHz we have to increase CPU voltage or the kernel hangs
>> * at 1.5 GHz I wasn't able to boot even with increased CPU voltage
>> 
>> The data sheets hint at using AVS and ABB.
>> 
>> "4.3.1 AVS and ABB Requirements
>> Adaptive Voltage Scaling (AVS) and Adaptive Body Biasing (ABB) are required
>> on most of the VDD_* domains as defined in Table 4-7"
>> 
>> Table 4-7 indirectly marks AVS and ABB as required for all operating points >1.0 GHz.
>> 
>> " • The AVS Voltages are device-dependent, voltage domain-dependent, and
>> OPP-dependent. They must be read from the CONTROL_STD_FUSE_OPP_VDD Registers
>> in the Control Module Section of the TRM."
>> 
>> From this I read that every single OMAP chip is slightly different and TI
>> measures these differences during production.
>> 
>> This should be done by the AVS drivers.
>> 
>> We did not have CONFIG_POWER_AVS_OMAP enabled but only CONFIG_POWER_AVS.
>> 
>> But although I changed that and did some additional tests, it has no
>> influence. And the AVS seems to be incomplete and non-operational anyway:
>> 
>> [    4.977605] sr_init: No PMIC hook to init smartreflex
>> [    4.982922] driver_register 'smartreflex'
>> [    4.987747] sr_init: platform driver register failed for SR
>> 
>> ... and the kernel hangs again at 6.43 sec. As if there is a watchdog timer
>> in the OMAP that is only running in 1.5GHz mode...
>> 
>> So I think a hardware issue is quite unlikely, especially as the 1.5 GHz
>> setup always hangs at the same 6.3-6.6 seconds after Linux start.
>> 
>> And before I waste more and more weeks on looking for hardware issues that
>> are really difficult to pin down, I would like to hear kernel specialists'
>> opinions first.
>> 
>> BR and thanks,
>> Nikolaus
>> 
>> 
> 
> BR,
> 
> marek

BR,
Nikolaus


