From: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Making "COPY partitioned_table FROM" faster |
Date: | 2018-06-21 09:13:42 |
Message-ID: | CAKJS1f93DeHN+9RrD9jYn0iF_o89w2B+U8-Ao5V1kd8Cf7oSGQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
I was looking at the COPY FROM performance gap between bulk loads with
partitioned tables vs non-partitioned tables. There's quite a gap!
Almost twice as slow in my test.
It seems to be mostly down to lack of usage of heap_multi_insert() for
the partitioned table case, which I guess is because we can only do
that into a single heap. I didn't really see any reason not do when
the partition for this tuple is the same as the one for the last
tuple. Such cases may well be quite common, especially so in time
series data stored in RANGE partitioned tables.
I've implemented this in the attached. Performance is much better
when the rows are located in the same partition. I've also tested the
worst case; when the partition changes on each row. That's now
slightly slower. Although, if we're worried about that we could
probably make the insert-method adaptive, and only enable
multi-inserts if the partition remains the same for X consecutive
tuples then have it revert back to single inserts when the partition
changed X times after X tuples, where X is some number above 1, say
10? I've not done that. I'm not sure it's worthwhile.
This patch seems fairly simple, only touching copy.c. I think it's a
good candidate for July's 'fest.
src/backend/commands/copy.c | 225 +++++++++++++++++++++++++++++++++++---------
1 file changed, 182 insertions(+), 43 deletions(-)
Benchmarks below:
Setup:
-- non-partitioned (control)
CREATE TABLE partbench_ (date TIMESTAMP NOT NULL, i1 INT NOT NULL, i2
INT NOT NULL, i3 INT NOT NULL, i4 INT NOT NULL, i5 INT NOT NULL);
-- 10k parts
CREATE TABLE partbench (date TIMESTAMP NOT NULL, i1 INT NOT NULL, i2
INT NOT NULL, i3 INT NOT NULL, i4 INT NOT NULL, i5 INT NOT NULL)
PARTITION BY RANGE (date);
\o /dev/null
select 'CREATE TABLE partbench' || x::text || ' PARTITION OF partbench
FOR VALUES FROM (''' || '2017-03-06'::date + (x::text || '
hours')::interval || ''') TO (''' || '2017-03-06'::date + ((x+1)::text
|| ' hours')::interval || ''');'
from generate_Series(0,9999) x;
\gexec
\o
Test:
-- Time loading of 1GB of data.
\timing on
copy partbench_ from program $$perl ~/partbench.pl$$ delimiter '|';
truncate table partbench_;
copy partbench from program $$perl ~/partbench.pl$$ delimiter '|';
truncate table partbench;
copy partbench from program $$perl ~/partbench_alternate.pl$$ delimiter '|';
truncate table partbench;
Unpatched:
postgres=# copy partbench_ from program $$perl ~/partbench.pl$$ delimiter '|';
COPY 17825782
Time: 22669.017 ms (00:22.669)
postgres=# truncate table partbench_;
postgres=# copy partbench from program $$perl ~/partbench.pl$$ delimiter '|';
COPY 17825782
Time: 44095.884 ms (00:44.096)
postgres=# truncate table partbench;
postgres=# copy partbench from program $$perl
~/partbench_alternate.pl$$ delimiter '|';
COPY 17825782
Time: 45129.004 ms (00:45.129)
postgres=# truncate table partbench;
Patched:
postgres=# copy partbench_ from program $$perl ~/partbench.pl$$ delimiter '|';
COPY 17825782
Time: 22701.290 ms (00:22.701)
postgres=# truncate table partbench_;
postgres=# copy partbench from program $$perl ~/partbench.pl$$ delimiter '|';
COPY 17825782
Time: 27721.054 ms (00:27.721)
postgres=# truncate table partbench;
postgres=# copy partbench from program $$perl
~/partbench_alternate.pl$$ delimiter '|';
COPY 17825782
Time: 46151.844 ms (00:46.152)
postgres=# truncate table partbench;
partbench.pl:
for (my $i=0; $i < 8912891; $i++) {
print "2018-04-26 15:00:00|1|2|3|4|5\n";
print "2018-04-26 15:00:00|1|2|3|4|5\n";
}
partbench_alternate.pl:
for (my $i=0; $i < 8912891; $i++) {
print "2018-04-25 15:00:00|1|2|3|4|5\n";
print "2018-04-26 15:00:00|1|2|3|4|5\n";
}
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Allow-multi-inserts-during-COPY-into-a-partitione.patch | application/octet-stream | 14.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Langote | 2018-06-21 09:48:50 | Re: bug with expression index on partition |
Previous Message | Arseny Sher | 2018-06-21 08:31:17 | Re: Fix slot's xmin advancement and subxact's lost snapshots in decoding. |