From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | "Luke Lonergan" <llonergan(at)greenplum(dot)com> |
Cc: | "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>, pgsql-patches(at)postgresql(dot)org |
Subject: | Re: COPY FROM performance improvements |
Date: | 2005-08-07 04:08:16 |
Message-ID: | 29007.1123387696@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-patches pgsql-performance |
"Luke Lonergan" <llonergan(at)greenplum(dot)com> writes:
>> I had some difficulty in generating test cases that weren't largely
>> I/O-bound, but AFAICT the patch as applied is about the same speed
>> as what you submitted.
> You achieve the important objective of knocking the parsing stage down a
> lot, but your parsing code is actually about 20% slower than Alon's.
I would like to see the exact test case you are using to make this
claim; the tests I did suggested my code is the same speed or faster.
The particular test case I was using was the "tenk1" data from the
regression database, duplicated out to about 600K rows so as to run
long enough to measure with some degree of repeatability.
As best I can tell, my version of CopyReadAttributes is significantly
quicker than Alon's, approximately balancing out the fact that my
version of CopyReadLine is slower. I did the latter first, and would
now be tempted to rewrite it in the same style as CopyReadAttributes,
ie one pass of memory-to-memory copy using pointers rather than buffer
indexes.
BTW, late today I figured out a way to get fairly reproducible
non-I/O-bound numbers about COPY FROM: use a trigger that suppresses
the actual inserts, thus:
create table foo ...
create function noway() returns trigger as
'begin return null; end' language plpgsql;
create trigger noway before insert on foo
for each row execute procedure noway();
then repeat:
copy foo from '/tmp/foo.data';
If the source file is not too large to fit in kernel disk cache, then
after the first iteration there is no I/O at all. I got numbers
that were reproducible within less than 1%, as opposed to 5% or more
variation when the thing was partially I/O bound. Pretty useless in the
real world, of course, but great for timing COPY's data-pushing.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Luke Lonergan | 2005-08-07 05:21:03 | Re: COPY FROM performance improvements |
Previous Message | Luke Lonergan | 2005-08-07 03:25:24 | Re: COPY FROM performance improvements |
From | Date | Subject | |
---|---|---|---|
Next Message | Luke Lonergan | 2005-08-07 05:21:03 | Re: COPY FROM performance improvements |
Previous Message | Luke Lonergan | 2005-08-07 03:25:24 | Re: COPY FROM performance improvements |