From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | a_ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp> |
Cc: | Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: FunctionCallN improvement. |
Date: | 2005-02-01 21:23:56 |
Message-ID: | 19054.1107293036@sss.pgh.pa.us |
Lists: | pgsql-hackers |
a_ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp> writes:
> I made the test program to measure the effect of this macro.
Well, if we're going to be tense about this, let's actually be tense
about it. Your test program isn't a great model for what's going to
happen in fmgr.c, because you've designed it so that Nargs cannot be
known at compile time. In the fmgr routines, Nargs is certainly a
compile-time constant, and so implementations that can exploit that
will have an advantage.
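To make the compile-time-constant point concrete, here is a minimal sketch; it is not the fmgr.c code. The names `DemoCallInfo`, `DEMO_MAX_ARGS`, `clear_nulls_runtime` and `CLEAR_NULLS_CONST` are hypothetical, chosen only for illustration. The function receives its count at run time, so the compiler has to emit a general loop. The macro is meant to be invoked with a literal count, so the trip count is visible at each call site.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_ARGS 32            /* hypothetical limit, illustration only */

typedef struct DemoCallInfo
{
    short nargs;                    /* number of arguments actually passed */
    long  arg[DEMO_MAX_ARGS];       /* stand-in for the Datum array */
    bool  argnull[DEMO_MAX_ARGS];   /* true if the matching arg is NULL */
} DemoCallInfo;

/* Runtime count: n is unknown when this is compiled, so a real loop is needed. */
static void
clear_nulls_runtime(DemoCallInfo *fc, int n)
{
    int i;

    assert(n >= 0 && n <= DEMO_MAX_ARGS);
    for (i = 0; i < n; i++)
        fc->argnull[i] = false;
    fc->nargs = (short) n;
}

/*
 * Constant count: when N is a literal, the compiler sees the trip count
 * and is free to unroll or fold the loop (whether it does is up to the
 * compiler and optimization level).
 */
#define CLEAR_NULLS_CONST(fc, N) \
    do { \
        int i_; \
        for (i_ = 0; i_ < (N); i_++) \
            (fc)->argnull[i_] = false; \
        (fc)->nargs = (N); \
    } while (0)
```

A caller would write `CLEAR_NULLS_CONST(&fcinfo, 2);` with the literal in place, which is the situation the fmgr routines are in.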
Also, we can take advantage of some improvements in the MemSet macro
family that occurred since fmgr.c was last rewritten. I see no reason
not to use MemSetLoop directly, since the fcinfo struct will have the
correct size and correct alignment.
In addition to your original macro, I tried two other variants: one
that uses MemSetLoop with a loop length rounded to the next higher
multiple of 4, and one that expects the argnull settings to be written
out directly, in the same style as is currently done in FunctionCall1
and FunctionCall2. (This amounts to unrolling the loop in the original
macro; something that could be done by the compiler given a constant
Nargs, but it seems not to be done by the compilers I tested.)
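The "rounded to the next higher multiple of 4" variant can be sketched as follows. This is an illustration under stated assumptions, not the MemSetLoop macro itself: `DemoNullFlags`, `ROUND_UP_4` and `clear_flags_wordwise` are hypothetical names, the word size is fixed at 32 bits, and `sizeof(bool)` is assumed to be 1. A union is used so that the word-wide stores are correctly aligned and do not break aliasing rules.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ARGS 32            /* hypothetical limit, illustration only */

/* Round a byte count up to the next multiple of 4. */
#define ROUND_UP_4(n) (((size_t) (n) + 3) & ~(size_t) 3)

_Static_assert(sizeof(bool) == 1, "sketch assumes one-byte bool");

/* Same storage viewed as individual flags or as 32-bit words. */
typedef union DemoNullFlags
{
    bool     flag[DEMO_MAX_ARGS];
    uint32_t word[DEMO_MAX_ARGS / 4];
} DemoNullFlags;

/*
 * Clear the first n flags one word at a time.  Because the length is
 * rounded up, as many as 3 flags past n are cleared as well, which is
 * harmless when the array is large enough.
 */
static void
clear_flags_wordwise(DemoNullFlags *f, size_t n)
{
    size_t nwords = ROUND_UP_4(n * sizeof(bool)) / sizeof(uint32_t);
    size_t i;

    assert(nwords <= DEMO_MAX_ARGS / 4);
    for (i = 0; i < nwords; i++)
        f->word[i] = 0;
}
```

With n = 5 the rounded length is 8 bytes, so two words are stored and flags 0 through 7 end up false.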
I tested two cases: NARGS = 2, which is certainly the single most
critical case, and NARGS = 5, which is probably the largest number
of arguments that we really care too much about. (You have to hand-edit
the test program and recompile to adjust NARGS, since the point is to
treat it as a compile-time constant.)
Here are wall-clock timings on the architectures and compilers I have at
hand:
NARGS = 2

Platform, compiler | MemSetLoop | OrigMacro | SetMacro | Unrolled |
---|---|---|---|---|
i386, gcc -O2 | 37.655s | 6.411s | 7.060s | 6.362s |
i386, gcc -O6 | 35.420s | 1.129s | 1.814s | 0.567s |
PPC, gcc -O2 | 54.033s | 6.754s | 11.138s | 6.438s |
HPPA, gcc -O2 | 58.82s | 10.38s | 9.79s | 7.85s |
HPPA, cc +O2 | 60.39s | 13.43s | 8.40s | 7.31s |

NARGS = 5

Platform, compiler | MemSetLoop | OrigMacro | SetMacro | Unrolled |
---|---|---|---|---|
i386, gcc -O2 | 37.566s | 11.329s | 7.688s | 8.874s |
i386, gcc -O6 | 32.992s | 5.928s | 2.881s | 0.566s |
PPC, gcc -O2 | 86.300s | 19.048s | 14.626s | 8.751s |
HPPA, gcc -O2 | 58.28s | 15.09s | 13.42s | 14.37s |
HPPA, cc +O2 | 58.23s | 8.96s | 12.88s | 7.28s |

(I used different loop counts on the different machines to get similar
overall times for the memset case; so it's OK to compare numbers across
a row but not down a column.)
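For anyone wanting to reproduce this kind of measurement, a timing loop with an adjustable iteration count might look like the sketch below. It is not the attached test program. `time_init_loop` and `demo_sink` are hypothetical names, and it uses `clock()`, which reports processor time rather than the wall-clock time quoted above. NARGS is a preprocessor constant, so changing it means recompiling.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#ifndef NARGS
#define NARGS 2                     /* edit and recompile to try another value */
#endif

#define DEMO_MAX_ARGS 32            /* hypothetical limit, illustration only */

/* Volatile sink so the compiler cannot discard the stores being timed. */
static volatile bool demo_sink;

/*
 * Run the constant-count initialization `iterations` times and return
 * the processor time used, in seconds.  The caller picks an iteration
 * count suited to the machine.
 */
static double
time_init_loop(long iterations)
{
    bool    argnull[DEMO_MAX_ARGS];
    clock_t start;
    clock_t stop;
    long    i;
    int     j;

    assert(iterations > 0);
    start = clock();
    for (i = 0; i < iterations; i++)
    {
        for (j = 0; j < NARGS; j++)
            argnull[j] = false;
        demo_sink = argnull[NARGS - 1];
    }
    stop = clock();
    return (double) (stop - start) / CLOCKS_PER_SEC;
}
```

Since the iteration count is chosen per machine, results from such a loop are only comparable between variants run on the same machine with the same count.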
Based on this I think we ought to go with the "unrolled" approach, i.e.,
we'll create a macro to initialize the fixed fields of fcinfo but fill
in the arg and argnull arrays with code like what's already in
FunctionCall2:
    fcinfo.arg[0] = arg1;
    fcinfo.arg[1] = arg2;
    fcinfo.argnull[0] = false;
    fcinfo.argnull[1] = false;
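Put together, the proposal could look roughly like the sketch below. It is only an illustration: the struct is a simplified stand-in for the real fcinfo, and `DemoCallInfo`, `DemoDatum`, `DEMO_INIT_FCINFO`, `demo_call2` and `demo_add` are hypothetical names that do not come from fmgr.c. The macro sets just the fixed fields, and the caller writes out the per-argument stores by hand, so the struct is never zeroed wholesale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_ARGS 32            /* hypothetical limit, illustration only */

typedef long DemoDatum;             /* stand-in for Datum */

typedef struct DemoCallInfo
{
    void     *flinfo;               /* fixed fields ... */
    void     *context;
    void     *resultinfo;
    bool      isnull;
    short     nargs;
    DemoDatum arg[DEMO_MAX_ARGS];   /* ... followed by the per-argument arrays */
    bool      argnull[DEMO_MAX_ARGS];
} DemoCallInfo;

/* Initialize only the fixed fields; the arrays are left untouched. */
#define DEMO_INIT_FCINFO(fc, fl, n) \
    do { \
        (fc).flinfo = (fl); \
        (fc).context = NULL; \
        (fc).resultinfo = NULL; \
        (fc).isnull = false; \
        (fc).nargs = (n); \
    } while (0)

typedef DemoDatum (*DemoFunc) (DemoCallInfo *fcinfo);

/* Two-argument call in the unrolled style: one explicit store per slot. */
static DemoDatum
demo_call2(DemoFunc fn, DemoDatum arg1, DemoDatum arg2)
{
    DemoCallInfo fcinfo;
    DemoDatum    result;

    DEMO_INIT_FCINFO(fcinfo, NULL, 2);
    fcinfo.arg[0] = arg1;
    fcinfo.arg[1] = arg2;
    fcinfo.argnull[0] = false;
    fcinfo.argnull[1] = false;

    result = fn(&fcinfo);
    assert(!fcinfo.isnull);         /* this sketch does not handle NULL results */
    return result;
}

/* Example callee: adds its two arguments. */
static DemoDatum
demo_add(DemoCallInfo *fcinfo)
{
    return fcinfo->arg[0] + fcinfo->arg[1];
}
```

A call such as `demo_call2(demo_add, 2, 3)` touches only the five fixed fields and four array slots, however large the arrays are.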
If anyone would like to try the results on other platforms, my test
program is attached.
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
unknown_filename | text/plain | 4.2 KB |