From: | Pierre Frédéric Caillaud <lists(at)peufeu(dot)com> |
---|---|
To: | "Alvaro Herrera" <alvherre(at)commandprompt(dot)com> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: COPY speedup |
Date: | 2009-08-13 17:34:09 |
Message-ID: | op.uyl227c3cke6l8@soyouz |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
In the previous mails I made a mistake, writing "MTuples/s" instead of
"MDatums/s", sorry about that. It is the number of rows x columns. The
title was wrong, but the data was right.
I've been doing some tests on COPY FROM ... BINARY.
- inlines in various pg_get* etc
- a faster buffer handling for copy
- that's about it...
In the below tables, you have "p17" (ie test patch 17, the last one) and
straight postgres compared.
COPY annonces_2 FROM 'annonces.bin' BINARY :
Time | Speedup | Table | KRows | MDatums | Name
(s) | | MB/s | /s | /s |
-------|---------|--------|--------|---------|----------------------------------------------------
8.417 | 1.40 x | 38.70 | 49.13 | 1.92 | 8.4.0 / p17
11.821 | --- | 27.56 | 34.98 | 1.36 | 8.4.0 / compiled from source
COPY archive_data_2 FROM 'archive_data.bin' BINARY :
Time | Speedup | Table | KRows | MDatums | Name
(s) | | MB/s | /s | /s |
-------|---------|-------|--------|---------|----------------------------------------------------
15.314 | 1.93 x | 25.94 | 172.88 | 4.84 | 8.4.0 / p17 COPY FROM BINARY
all
29.520 | --- | 13.46 | 89.69 | 2.51 | 8.4.0 / compiled from source
COPY test_one_int_2 FROM 'test_one_int.bin' BINARY :
Time | Speedup | Table | KRows | MDatums | Name
(s) | | MB/s | /s | /s |
-------|---------|--------|---------|---------|----------------------------------------------------
10.003 | 1.39 x | 30.63 | 999.73 | 1.00 | 8.4.0 / p17 COPY FROM
BINARY all
13.879 | --- | 22.08 | 720.53 | 0.72 | 8.4.0 / compiled from
source
COPY test_many_ints_2 FROM 'test_many_ints.bin' BINARY :
Time | Speedup | Table | KRows | MDatums | Name
(s) | | MB/s | /s | /s |
-------|---------|-------|--------|---------|----------------------------------------------------
6.009 | 2.08 x | 21.31 | 166.42 | 4.33 | 8.4.0 / p17 COPY FROM BINARY
all
12.516 | --- | 10.23 | 79.90 | 2.08 | 8.4.0 / compiled from source
I thought it might be interesting to get split timings of the various
steps in COPY FROM, so I simply commented out bits of code and ran tests.
The "delta" columns are differences between two lines, that is the time
taken in the step mentioned on the right.
reading data only = reading all the data and parsing it into chunks, doing
everything until the RecvFunc is called.
RecvFuncs = same, + RecvFunc is called
heap_form_tuple = same + heap_form_tuple is called
triggers = same + triggers are applied
insert = actual tuple insertion
p17 = total time (post insert triggers, constraint check, etc)
Time | Delta | Row delta | Datum delta | Name
(s) | (s) | (us) | (us) |
-------|-------|-----------|-------------|----------------------
1.311 | --- | --- | --- | reading data only
4.516 | 3.205 | 7.750 | 0.199 | RecvFuncs
4.534 | 0.018 | 0.043 | 0.001 | heap_form_tuple
5.323 | 0.789 | 1.908 | 0.049 | triggers
8.182 | 2.858 | 6.912 | 0.177 | insert
8.417 | 0.236 | 0.570 | 0.015 | p17
COPY archive_data_2 FROM 'archive_data.bin' BINARY :
Time | Delta | Row delta | Datum delta | Name
(s) | (s) | (us) | (us) |
-------|--------|-----------|-------------|---------------------
4.729 | --- | --- | --- | reading data only
8.508 | 3.778 | 1.427 | 0.051 | RecvFuncs
8.567 | 0.059 | 0.022 | 0.001 | heap_form_tuple
10.804 | 2.237 | 0.845 | 0.030 | triggers
14.475 | 3.671 | 1.386 | 0.050 | insert
15.314 | 0.839 | 0.317 | 0.011 | p17
COPY test_one_int_2 FROM 'test_one_int.bin' BINARY :
Time | Delta | Row delta | Datum delta | Name
(s) | (s) | (us) | (us) |
-------|-------|-----------|-------------|----------------------
1.247 | --- | --- | --- | reading data only
1.745 | 0.498 | 0.050 | 0.050 | RecvFuncs
1.750 | 0.004 | 0.000 | 0.000 | heap_form_tuple
3.114 | 1.364 | 0.136 | 0.136 | triggers
9.984 | 6.870 | 0.687 | 0.687 | insert
10.003 | 0.019 | 0.002 | 0.002 | p17
COPY test_many_ints_2 FROM 'test_many_ints.bin' BINARY :
Time | Delta | Row delta | Datum delta | Name
(s) | (s) | (us) | (us) |
-------|-------|-----------|-------------|----------------------
1.701 | --- | --- | --- | reading data only
3.122 | 1.421 | 1.421 | 0.055 | RecvFuncs
3.129 | 0.008 | 0.008 | 0.000 | heap_form_tuple
3.754 | 0.624 | 0.624 | 0.024 | triggers
5.639 | 1.885 | 1.885 | 0.073 | insert
6.009 | 0.370 | 0.370 | 0.014 | p17
We can see that :
- reading and parsing the data is still slow (actually, everything is
copied something like 3-4 times)
- RecvFuncs take quite long, too
- triggers use some time, although the table has no triggers ....? This is
suspicious...
- the actual insertion (which is really what we are interested in when
loading a table) is actually very fast
Ideally in COPY the insertion time in the table should take most of the
time used in the operation...
From | Date | Subject | |
---|---|---|---|
Next Message | Josh Berkus | 2009-08-13 17:36:07 | Re: Hot standby and synchronous replication status |
Previous Message | Tom Lane | 2009-08-13 17:28:05 | Re: Hot standby and synchronous replication status |