[Letux-kernel] 1.5GHz problems

H. Nikolaus Schaller hns at goldelico.com
Sun Jul 31 23:37:04 CEST 2016

> On 31.07.2016 at 00:57, Michael Mrozek <EvilDragon at openpandora.org> wrote:
> On Sat, 30 Jul 2016 21:54:13 +0200, "H. Nikolaus Schaller"
> <hns at goldelico.com> wrote:
> Hi,
>>>> The CPU board I am running these tests on has additional wires which
>>>> allow measuring the voltages VDD_MPU, VDD_CORE, VDD_DDR3, VDD_MM,
>>>> 1V8, VSYS.
>>> This might be a good idea. Maybe the combination of Palmas and the
>>> charger / battery chip causes an issue here and the voltage isn't
>>> increased for some reason.  
>> Very unlikely that the charger / battery has such an influence. How
>> should it know that I increase the cpu-freq...
> It provides the power that the Palmas requests, does it not?
> Or does it always provide the same power and the Palmas changes it
> based on what you set it to?

The latter.

Power flow is

USB ------+
          +--> bq24297 ---> VSYS=3.7V ---> Palmas - LDO ----> OMAP5
Battery --+

And all this voltage stepping is done by the Palmas LDOs (or switching regulators).

>> There are other factors (e.g. Speakers on/off, Display on/off) which
>> also have a high influence on VSYS and currents flowing through
>> connectors but turning them on or off makes no such difference as
>> cpufreq.
> One thing just came to my mind:
> You mention that it runs fine with SINGLE core 1,5GHz.

Yes, back in March that was a workaround to get the boards booted.
> If that's the case, then the power has to be set correctly, otherwise,
> the single core would crash as well right away.

Yes. This is why I think we could be hit by some SMP synchronization
issue at high speed.

> So the question is: Why does it run with 200% CPU on dualcore 1GHz but
> crashes as soon as you switch it to 1,5GHz.
> 1,5GHz without doing anything should need less power than 200% at 1GHz,
> so it can't probably be the overall power consumption.

Yes. That is what makes me puzzled as well.

> Hm, well, there is one difference:
> I=U/R... if you have a higher voltage, you need more Ampere as well,
> depending on the resistance the traces have. Maybe it can't handle to
> keep the voltage high enough for that reason, or peaks.

There is a feedback line going from the end of the trace at the OMAP5
back to the Palmas so that it can compensate for voltage loss along the
high-current traces.

But this depends on how fast the regulator can react. If there are
fast current peaks, it might regulate too slowly.
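
To put a number on the drop the feedback line has to compensate, here is a
minimal sketch of the U = I * R arithmetic. The 20 mOhm trace resistance and
the current steps are hypothetical values chosen for illustration, not
measurements from the Pyra board, and trace_drop_mv is just a name made up
for this example.

```python
def trace_drop_mv(current_a, resistance_mohm):
    """Voltage lost along a trace in millivolts (U = I * R)."""
    # Amperes times milliohms gives millivolts directly.
    return current_a * resistance_mohm


# Hypothetical current steps through a hypothetical 20 mOhm trace.
for step_a in (0.5, 1.0, 2.0):
    drop = trace_drop_mv(step_a, 20.0)
    print(f"{step_a:.1f} A through 20 mOhm: {drop:.0f} mV drop")
```

Until the regulator has reacted, a drop of this kind would be missing at the
OMAP5 end of the trace.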

>> If the voltages turn out that they are not increased, there would be
>> a software bug.
> But then it wouldn't work single core 1,5GHz, so you can probably rule
> that out.


>>> * Are there differences in the board files that could cause
>>> our CPU to misbehave?  
>> Yes. The Pyra has a different DT file than EVM. And for the IGEP I
>> don't even know what it has. There is omap5-board-common but
>> we have a lot of extensions.
> I know they have different DT-Files, but the question is:
> Are the power / voltage / timing settings in them different?

Not that I am aware of. Well, there is one difference: we have added 500 MHz
and 750 MHz OPPs.
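
To compare what the boards actually offer, the frequencies cpufreq has
registered can be read back from sysfs. This is a minimal sketch, assuming the
cpufreq driver exports scaling_available_frequencies (values in kHz); the
function name and the base parameter are made up for this example.

```python
def available_frequencies_mhz(cpu=0, base="/sys/devices/system/cpu"):
    """Return the frequencies cpufreq offers for one CPU, in MHz, ascending."""
    path = f"{base}/cpu{cpu}/cpufreq/scaling_available_frequencies"
    with open(path) as f:
        # The file holds whitespace-separated values in kHz.
        return sorted(int(khz) / 1000 for khz in f.read().split())
```

Calling available_frequencies_mhz(0) on each board and diffing the results
would show whether the extra OPPs are the only difference in the table.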

>> And there could be some problem in a driver for a device that does
>> not exist on either EVM or IGEP.
> Well, in that case it should work when all drivers are disabled and the
> system is being run under a minimum condition.

That is difficult to do since it does not boot in all cases (for example
without the bq24297 driver).

>>> * Is there a difference in our hardware setup that could lead to
>>> that? We got different RAM, a different quartz, different power
>>> setup (we don't have a simple AC like the devboards, but a
>>> battery / charger circuit)  
>> RAM and power setup / battery / charger is something I would remove
>> from the list of potential reasons. Because they are exactly the same
>> at 500 MHz, 1 GHz or 1.5 GHz. There is nothing controlled differently.
> Well, except if they cap the maximum voltage for some reason, so the
> PALMAS can't change to the requested voltage.
> This would explain why 200% CPU at 1GHz would still work, as not the
> power is the issue, but the voltage.

I am not sure, but the Palmas should detect over-current situations and
report an interrupt. But it might be too late for the Linux kernel to print
a message.

>> The SoC die temperature when running at 500 MHz is ~55°C. And
>> when switching to 1.5 GHz it rises to 65°C in 2-3 seconds and then
>> the CPU hangs. This does not change the quartz temperature.
> 10°C more in 2-3 seconds?
> That sounds strange

Well, my statement wasn't precise: it is not the idle temperature but the running one.

> - unless you are running performance as governor,
> setting the CPU speed to 1,5GHz shouldn't really do much as the CPU
> would still be idling... so why does the temperature rise that fast?

I should write a tool to measure the temperature in idle mode.
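
Such a tool could be as small as the sketch below. It assumes the kernel
exposes the sensors through the generic thermal sysfs interface
(/sys/class/thermal/thermal_zone*/temp, in millidegrees Celsius); which zones
exist on the OMAP5 is not verified here, and the function names are made up
for this example.

```python
import glob
import os
import time


def read_zone_temps(base="/sys/class/thermal"):
    """Return {zone type: temperature in degrees C} for all readable zones."""
    temps = {}
    for zone in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "type")) as f:
                name = f.read().strip()
            with open(os.path.join(zone, "temp")) as f:
                millideg = int(f.read().strip())
        except (OSError, ValueError):
            continue  # zone not readable right now, skip it
        temps[name] = millideg / 1000.0
    return temps


def log_temps(samples=10, interval=0.5, base="/sys/class/thermal"):
    """Print a timestamped reading every `interval` seconds."""
    for _ in range(samples):
        print(time.strftime("%H:%M:%S"), read_zone_temps(base))
        time.sleep(interval)
```

Running log_temps() before and right after cpufreq-set would show how fast the
die heats up while nothing else is loading the CPU. The polling itself wakes
the CPU, so a longer interval keeps the system closer to idle.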

>>>> Anyways, please all kernel developers think about potential kernel
>>>> issues (scheduling, SMP, locking, interrupts, I&D-caches) that
>>>> might lead to such a behaviour. And potential tests (I can add
>>>> printk etc. where needed).   
>> What puzzles me most is that the system hangs at low system load.
>> As soon as cpufreq-set goes above ca. 1 GHz.
>> In that situation the OMAP should be in idle 98% of the time and is
>> just blinking some I2C LEDs and waiting for me to type the next command
>> over UART console. Well, DSS is also running in the background. But
>> not much more.
> Yes, but that sudden increase of temperature does NOT sound like idling.

Sorry for the misunderstanding created by my slightly imprecise description.

> -- 
> Best regards,
> Michael Mrozek
> -----------------------
> OpenPandora GmbH
> Managing Director: Michael Mrozek
> Schäffbräustr. 11
> 85049 Ingolstadt
> Germany
> Tel.: 0841 / 990 5548
> http://www.openpandora.de/
> HRB 4879, Ingolstadt District Court
> -----------------------
> Email: mrozek at openpandora.org
