[Letux-kernel] 1.5GHz problems
H. Nikolaus Schaller
hns at goldelico.com
Sun Jul 31 23:37:04 CEST 2016
> On 31.07.2016 at 00:57, Michael Mrozek <EvilDragon at openpandora.org> wrote:
>
> On Sat, 30 Jul 2016 21:54:13 +0200, "H. Nikolaus Schaller"
> <hns at goldelico.com> wrote:
>
> Hi,
>
>>>> The CPU board I am running these tests on has additional wires which
>>>> allow measuring the voltages VDD_MPU, VDD_CORE, VDD_DDR3, VDD_MM,
>>>> 1V8, VSYS.
>>> This might be a good idea. Maybe the combination of Palmas and the
>>> charger / battery chip causes an issue here and the voltage isn't
>>> increased for some reason.
>> Very unlikely that the charger / battery has such an influence. How
>> should it know that I increase the cpu-freq...
>
> It provides the power that the Palmas requests, does it not?
> Or does it always provide the same power and the Palmas changes it
> based on what you set it to?
The latter.
Power flow is

USB -------+
           +--> bq24297 ---> VSYS=3.7V ---> Palmas - LDO ----> OMAP5
Battery ---+

And all this voltage stepping is done by the Palmas LDOs (or switching regulators).
>
>> There are other factors (e.g. Speakers on/off, Display on/off) which
>> also have a high influence on VSYS and currents flowing through
>> connectors but turning them on or off makes no such difference as
>> cpufreq.
>
> One thing just came to my mind:
> You mention that it runs fine with SINGLE core 1,5GHz.
Yes, back in March that was a workaround to get the boards booted.
> If that's the case, then the power has to be set correctly, otherwise,
> the single core would crash as well right away.
Yes. This is why I think we could be hit by some SMP synchronization
issue at high speed.
>
> So the question is: Why does it run with 200% CPU on dualcore 1GHz but
> crashes as soon as you switch it to 1,5GHz.
> 1,5GHz without doing anything should need less power than 200% at 1GHz,
> so it probably can't be the overall power consumption.
Yes. That is what makes me puzzled as well.
>
> Hm, well, there is one difference:
>
> I=U/R... if you have a higher voltage, you draw more current as well,
> depending on the resistance of the traces. Maybe it can't keep the
> voltage high enough for that reason, or during peaks.
There is a feedback line going from the end of the trace at the OMAP5
back to the Palmas so that it can compensate for voltage loss along the
high-current traces.
But: this depends on how fast the regulator can react. So if there are
fast current peaks it might regulate too slowly.
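
To put rough numbers on the trace-loss argument, here is a minimal sketch of the
static case only. The current, trace resistance and setpoint used in the usage
note are made-up illustration values, not measurements from the Pyra board, and
the function names are hypothetical. It does not model how fast the regulator
reacts to a current peak, which is the open question here.

```python
def trace_drop_mv(current_a, trace_resistance_mohm):
    """Ohm's law U = I * R; amperes times milliohms gives millivolts."""
    return current_a * trace_resistance_mohm


def voltage_at_load_mv(setpoint_mv, current_a, trace_resistance_mohm,
                       remote_sense=True):
    """Steady-state voltage at the far end of the trace.

    With a feedback (remote sense) line the regulator raises its output
    until the sensed voltage equals the setpoint, so the static drop is
    compensated. Without it, the load sees the setpoint minus the drop.
    """
    if remote_sense:
        return float(setpoint_mv)
    return setpoint_mv - trace_drop_mv(current_a, trace_resistance_mohm)
```

For example, with an assumed 2 A through an assumed 10 mOhm trace,
trace_drop_mv(2.0, 10.0) gives a 20 mV drop. Until the regulator has followed
a sudden current step, the load can briefly see something closer to the
uncompensated value.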
>
>> If it turns out that the voltages are not increased, there would be
>> a software bug.
>
> But then it wouldn't work single core 1,5GHz, so you can probably rule
> that out.
Yes.
>
>>> * Are there differences in the board files that could cause
>>> our CPU to misbehave?
>> Yes. The Pyra has a different DT file than EVM. And for the IGEP I
>> don't even know what it has. There is omap5-board-common but
>> we have a lot of extensions.
>
> I know they have different DT-Files, but the question is:
> Are the power / voltage / timing settings in them different?
Not that I am aware of. Well, there is one difference: we have added 500 MHz
and 750 MHz OPPs.
>
>> And there could be some problem in a driver for a device that does
>> not exist on either EVM or IGEP.
>
> Well, in that case it should work when all drivers are disabled and the
> system is being run under a minimum condition.
That is difficult to do since it does not boot in all cases (for example the
bq24297 driver).
>
>>> * Is there a difference in our hardware setup that could lead to
>>> that? We have different RAM, a different quartz, different power
>>> setup (we don't have a simple AC like the devboards, but a
>>> battery / charger circuit)
>> RAM and power setup / battery / charger is something I would remove
>> from the list of potential reasons. Because they are exactly the same
>> at 500 MHz, 1 GHz or 1.5 GHz. There is nothing controlled differently.
>
> Well, except if they cap the maximum voltage for some reason, so the
> PALMAS can't change to the requested voltage.
> This would explain why 200% CPU at 1GHz would still work, as the
> issue is not the power but the voltage.
I am not sure, but the Palmas should detect over-current situations and
raise an interrupt. But it might be too late for the Linux kernel to print
a message.
>
>> The SoC die temperature when running at 500 MHz is ~55°C. And
>> when switching to 1.5 GHz it rises to 65°C in 2-3 seconds and then
>> the CPU hangs. This does not change the quartz temperature.
>
> 10°C more in 2-3 seconds?
> That sounds strange
Well, my statement wasn't precise - it is not the idle temperature but the
temperature under high load.
> - unless you are running performance as governor,
> setting the CPU speed to 1,5GHz shouldn't really do much as the CPU
> would still be idling... so why does the temperature rise that fast?
I should write a tool to measure the temperature in idle mode.
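
A minimal sketch of what such a tool could look like, assuming the kernel
exposes the standard sysfs files /sys/class/thermal/thermal_zone*/temp
(millidegrees Celsius) and cpufreq's scaling_cur_freq (kHz). Which thermal
zone corresponds to the MPU on the OMAP5 is not checked here, and the function
names are hypothetical.

```python
import glob
import os
import time

THERMAL_GLOB = "/sys/class/thermal/thermal_zone*/temp"
FREQ_PATH = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq"


def read_scaled(path, divisor=1000.0):
    """Read one integer from a sysfs file and scale it (m°C -> °C, kHz -> MHz)."""
    with open(path) as f:
        return int(f.read().strip()) / divisor


def sample(thermal_glob=THERMAL_GLOB, freq_path=FREQ_PATH):
    """Return (cpu0 frequency in MHz or None, {zone name: temperature in °C})."""
    temps = {}
    for path in sorted(glob.glob(thermal_glob)):
        zone = os.path.basename(os.path.dirname(path))
        try:
            temps[zone] = read_scaled(path)
        except (OSError, ValueError):
            continue  # zone disabled or unreadable, skip it
    try:
        freq_mhz = read_scaled(freq_path)
    except (OSError, ValueError):
        freq_mhz = None
    return freq_mhz, temps


def log_samples(count=50, interval_s=0.2):
    """Print one line per sample; flush so output survives a sudden hang."""
    for _ in range(count):
        freq_mhz, temps = sample()
        print(time.strftime("%H:%M:%S"), freq_mhz, temps, flush=True)
        time.sleep(interval_s)
```

Running log_samples() on an otherwise idle system in one console, and then
calling cpufreq-set from another, would show whether the temperature also
jumps without load. The short interval is a guess, chosen because the hang
occurs within 2-3 seconds.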
>
>>>> Anyways, please all kernel developers think about potential kernel
>>>> issues (scheduling, SMP, locking, interrupts, I&D-caches) that
>>>> might lead to such a behaviour. And potential tests (I can add
>>>> printk etc. where needed).
>> What puzzles me most is that the system hangs at low system load.
>> As soon as cpufreq-set goes to >ca. 1 GHz.
>> In that situation the OMAP should be idle 98% of the time and is
>> just blinking some I2C LEDs and waiting for me to type the next command
>> over the UART console. Well, DSS is also running in the background. But
>> not much more.
>
> Yes, but that sudden increase of temperature does NOT sound like idling.
Sorry for the misunderstanding created by my slightly imprecise description.
>
>
> --
> Kind regards,
>
> Michael Mrozek
>
> -----------------------
> OpenPandora GmbH
> Managing Director: Michael Mrozek
>
> Schäffbräustr. 11
> 85049 Ingolstadt
> Germany
> Tel.: 0841 / 990 5548
> http://www.openpandora.de/
> HRB 4879, Amtsgericht Ingolstadt
> -----------------------
> eMail: mrozek at openpandora.org