[Letux-kernel] [PATCH] MIPS: Fix a longstanding error in div64.h
YunQiang Su
wzssyqa at gmail.com
Fri Apr 9 09:17:04 CEST 2021
Huacai Chen <chenhuacai at kernel.org> wrote on Thu, 8 Apr 2021 at 12:56 PM:
>
> Hi, Maciej,
>
> On Wed, Apr 7, 2021 at 9:38 PM Maciej W. Rozycki <macro at orcam.me.uk> wrote:
> >
> > On Wed, 7 Apr 2021, Huacai Chen wrote:
> >
> > > > This code is rather broken in an obvious way, starting from:
> > > >
> > > > unsigned long long __n; \
> > > > \
> > > > __high = *__n >> 32; \
> > > > __low = __n; \
> > > >
> > > > where `__n' is used uninitialised. Since this is my code originally I'll
> > > > look into it; we may want to reinstate `do_div' too, which didn't have to
> > > > be removed in the first place.
> > > I think we can reuse the generic do_div().
> >
> > We can, but it's not clear to me if this is optimal. We have a DIVMOD
> > instruction which original code took advantage of (although I can see
> > potential in reusing bits from include/asm-generic/div64.h). The two
> > implementations would have to be benchmarked against each other across a
> > couple of different CPUs.
> The original MIPS do_div() has "h" constraint, and this is also the
> reason why Ralf rewrote this file. How can we reintroduce do_div()
> without "h" constraint?
>
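For reference, the high half of a 32x32-bit product does not need an asm constraint that names HI at all: multiplying in a 64-bit type and shifting lets the compiler pick the instruction, and on 32-bit MIPS GCC typically emits `multu` followed by `mfhi`/`mflo` for this pattern. This is a minimal sketch; the helper name `mul32x32` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: full 64-bit product of two 32-bit values,
 * split into high and low halves.  No inline asm, hence no "h"
 * constraint; the compiler chooses the multiply instruction. */
static inline void mul32x32(uint32_t a, uint32_t b, uint32_t *hi, uint32_t *lo)
{
	uint64_t p = (uint64_t)a * b;

	*hi = (uint32_t)(p >> 32);
	*lo = (uint32_t)p;
}
```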
I tried to work out a new version:
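#include <stdint.h>

/* Divide *x by b in place and return the remainder (b must be nonzero).
 * Pre-R6 MIPS only: uses multu/mfhi/mflo for the 32x32->64 product. */
uint32_t __attribute__ ((noinline)) div64_32n(uint64_t *x, uint32_t b)
{
	uint64_t a = *x;
	uint64_t t1 = ((a >> 32) / b) << 32;	/* high half of the quotient */
	uint32_t t2 = (a >> 32) % b;		/* pending high word, always < b */
	uint32_t res = (uint32_t)a;		/* pending low word */
	uint32_t t1lo = 0;			/* low half of the quotient */
	uint32_t t3 = 0xffffffffu / b;
	uint32_t t4 = t3 * b;			/* largest multiple of b below 2^32 */
	uint32_t hi, lo;

	/* Invariant: t2 * 2^32 + res is what is still left to divide. */
	while (t2 > 0) {
		/* hi:lo = t4 * t2 */
		__asm__ (
			"multu %2, %3\n"
			"mfhi %0\n"
			"mflo %1\n"
			: "=r" (hi), "=r" (lo)
			: "r" (t4), "r" (t2)
			: "hi", "lo");
		/* t2 * t4 == (t2 * t3) * b; t2 < b, so t2 * t3 cannot overflow */
		t1lo += t3 * t2;
		/* t2:res -= hi:lo */
		t2 -= hi;
		if (lo > 0) {
			t2--;	/* t2 >= 1 here: t2 * t4 < t2 * 2^32 forces hi < t2 */
			lo = 0xffffffff - lo + 1;
			uint32_t tmp = lo + res;
			if (tmp < lo)	/* carry out of the low word */
				t2++;
			res = tmp;
		}
	}
	if (res >= b) {
		t1lo += res / b;
		res %= b;
	}
	t1 += t1lo;
	*x = t1;
	return res;
}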
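To check the arithmetic off-MIPS, here is a sketch of the same reduction in portable C99, assuming nothing beyond `<stdint.h>`. The inline asm is replaced by a widening multiply and the two-word borrow by a single 64-bit subtraction; the name `div64_32_portable` is hypothetical. Every remaining division is 32-bit by 32-bit, so on a 32-bit target it should not need a 64-bit division helper.

```c
#include <stdint.h>

/* Hypothetical portable variant of div64_32n(): divides *x by b in
 * place (b != 0) and returns the remainder. */
static uint32_t div64_32_portable(uint64_t *x, uint32_t b)
{
	uint64_t a = *x;
	uint32_t q_hi = (uint32_t)(a >> 32) / b;
	/* What is left to divide; it starts below b * 2^32. */
	uint64_t rem = ((uint64_t)((uint32_t)(a >> 32) % b) << 32) | (uint32_t)a;
	uint32_t q_lo = 0;
	uint32_t t3 = 0xffffffffu / b;
	uint32_t t4 = t3 * b;	/* largest multiple of b that fits in 32 bits */

	while (rem >> 32) {
		uint32_t t2 = (uint32_t)(rem >> 32);	/* never exceeds b - 1 */

		/* t2 * t4 == (t2 * t3) * b and t2 * t4 < t2 * 2^32 <= rem,
		 * so t2 * t3 copies of b can be taken out at once. */
		rem -= (uint64_t)t2 * t4;
		q_lo += t2 * t3;
	}
	/* rem now fits in 32 bits: finish with a 32-bit divide. */
	q_lo += (uint32_t)rem / b;
	*x = ((uint64_t)q_hi << 32) | q_lo;
	return (uint32_t)rem % b;
}
```

Each pass shrinks the pending high word by a factor of at least about two (much more for small divisors), so the loop runs at most a few dozen times.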
Some performance tests, computing ((uint64_t)(-1))/3 0xfffff times (columns: quotient, remainder, run time):
GCC: 5555555555555555, 0, seconds: 5
SYQ: 5555555555555555, 0, seconds: 4
KER: 5555555555555555, 0, seconds: 8
RAL: ffffffff, 2, seconds: 4
1. The current MIPS asm version (RAL) takes 4 s, but gives a wrong result.
2. The simplest C code, a / b and a % b (GCC), takes 5 s.
3. The asm-generic version (KER) takes 8 s.
4. My version (SYQ) takes 4 s.
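The test program itself was not posted, so the following is only a minimal sketch of a harness for this kind of measurement, assuming a hosted C99 environment; `div_fn`, `div64_32_plain` and `bench` are hypothetical names. `clock()` measures CPU time, and the volatile variables force every iteration to really perform the division.

```c
#include <stdint.h>
#include <time.h>

/* Signature shared by the candidates, modelled on div64_32n():
 * divide *x in place, return the remainder. */
typedef uint32_t (*div_fn)(uint64_t *x, uint32_t b);

/* The "GCC" candidate: plain C division, left to the compiler. */
static uint32_t div64_32_plain(uint64_t *x, uint32_t b)
{
	uint32_t r = (uint32_t)(*x % b);

	*x /= b;
	return r;
}

/* Run fn on ((uint64_t)(-1)) / 3 `iters` times; report the last
 * quotient and remainder and return the CPU seconds spent. */
static double bench(div_fn fn, unsigned long iters, uint64_t *q, uint32_t *r)
{
	volatile uint64_t n = UINT64_MAX;	/* volatile: reloaded every pass */
	volatile uint32_t b = 3;
	volatile uint64_t last_q = 0;		/* volatile: stored every pass */
	volatile uint32_t last_r = 0;
	clock_t start = clock();

	for (unsigned long i = 0; i < iters; i++) {
		uint64_t x = n;

		last_r = fn(&x, b);
		last_q = x;
	}
	double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

	*q = last_q;
	*r = last_r;
	return secs;
}
```

For example, `bench(div64_32_plain, 0xfffff, &q, &r)` times the plain C path; on a MIPS machine `div64_32n` can be passed instead. The indirect call adds the same overhead to every candidate.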
So the question is: why does the asm-generic version exist, when it
performs worse than the code GCC generates?
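One plausible answer, stated as an assumption rather than something settled in this thread: the GCC row is presumably userspace code, where a 64-bit `/` on a 32-bit target can call libgcc's `__udivdi3`, while the kernel generally avoids that helper, so `do_div()` needs a fallback built only from 32-bit operations. As far as I know, the generic `__div64_32()` is a shift-and-subtract loop; below is a minimal sketch of that technique (the name `div64_32_shift_sub` is hypothetical), which also shows why it costs more iterations.

```c
#include <stdint.h>

/* Binary long division of *n by base (base != 0): divides in place and
 * returns the remainder, using only one 32-bit divide plus shifts,
 * compares and subtractions. */
static uint32_t div64_32_shift_sub(uint64_t *n, uint32_t base)
{
	uint64_t rem = *n;
	uint64_t b = base;	/* divisor, scaled by powers of two */
	uint64_t d = 1;		/* quotient bit matching the scaled divisor */
	uint64_t res = 0;
	uint32_t high = (uint32_t)(rem >> 32);

	/* Take out the high word with one 32-bit division first. */
	if (high >= base) {
		high /= base;
		res = (uint64_t)high << 32;
		rem -= (uint64_t)(high * base) << 32;
	}

	/* Scale the divisor up until it reaches rem, stopping before
	 * its top bit would be shifted out. */
	while (!(b >> 63) && b < rem) {
		b += b;
		d += d;
	}

	/* Walk back down, subtracting wherever the divisor fits. */
	do {
		if (rem >= b) {
			rem -= b;
			res += d;
		}
		b >>= 1;
		d >>= 1;
	} while (d);

	*n = res;
	return (uint32_t)rem;
}
```

Each quotient bit costs one loop pass here, whereas the reduction in div64_32n() retires many quotient bits per multiply.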
> Huacai
> >
> > > > Huacai, thanks for your investigation! Please be more careful in
> > > > verifying your future submissions however.
> > > Sorry, I thought there is only one bug in div64.h, but in fact there
> > > are three...
> >
> > This just shows the verification you made was not good enough, hence my
> > observation.
> >
> > Maciej
--
YunQiang Su