COPY speedup

From: Pierre Frédéric Caillaud <lists(at)peufeu(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: COPY speedup
Date: 2009-08-11 12:50:27
Message-ID: op.uyh0mdyucke6l8@soyouz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Backups always take too long...
COPY TO is CPU bound...

A few days of coding later, I think I'm on to something.

First the results.
All tables are cached in RAM (not in shared_buffers though).
Timings are best of 4 tries.

- test_one_int is a table with 1 INT column and 10.000.000 rows (from
generate_series)

SELECT count(*) FROM test_one_int :
Time | Speedup | Table | KRows | MTuples | Name
(s) | | MB/s | /s | /s |
------|---------|--------|---------|---------|-----------------------------
2.040 | --- | 150.22 | 4903.03 | 4.90 | 8.4.0 / compiled from source

* count(*) gives a reference timing for scanning a table

COPY test_one_int TO '/dev/null' BINARY :
Time | Speedup | Table | KRows | MTuples | Name
(s) | | MB/s | /s | /s |
------|---------|--------|---------|---------|-----------------------------
2.003 | 3.15 x | 152.94 | 4991.87 | 4.99 | 8.4.0 / copy to patch 4
6.318 | --- | 48.49 | 1582.85 | 1.58 | 8.4.0 / compiled from source

* reduced per-row overhead
-> COPY BINARY faster than count(*) for a 1-column table
-> COPY BIANRY faster than "SELECT * WHERE x=-1" for a 1-column table
which doesn't contain the value "-1"

* reduced per-tuple overhead
-> COPY BINARY 3.15 times faster

COPY test_one_int TO '/dev/null' :
Time | Speedup | Table | KRows | MTuples | Name
(s) | | MB/s | /s | /s |
------|---------|-------|---------|---------|-----------------------------
4.879 | 1.48 x | 62.80 | 2049.78 | 2.05 | 8.4.0 / copy to patch 4
7.198 | --- | 42.56 | 1389.25 | 1.39 | 8.4.0 / compiled from source

* reduced per-row and per-tuple overheads
-> COPY 1.48x faster

* Patched Binary mode is 3.4x faster than un-patched text mode

*******************************************************************

- test_many_ints is a table with 26 INT column and 1.000.000 rows

SELECT count(*) FROM test_many_ints :
Time | Speedup | Table | KRows | MTuples | Name
(s) | | MB/s | /s | /s |
------|---------|--------|---------|---------|-----------------------------
0.275 | --- | 465.88 | 3637.45 | 94.57 | 8.4.0 / copy to patch 4

COPY test_many_ints TO '/dev/null' BINARY :
Time | Speedup | Table | KRows | MTuples | Name
(s) | | MB/s | /s | /s |
------|---------|-------|--------|---------|-----------------------------
1.706 | 5.19 x | 75.08 | 586.23 | 15.24 | 8.4.0 / copy to patch 4
8.861 | --- | 14.45 | 112.85 | 2.93 | 8.4.0 / compiled from source

COPY test_many_ints TO '/dev/null' :
Time | Speedup | Table | KRows | MTuples | Name
(s) | | MB/s | /s | /s |
-------|---------|-------|--------|---------|-----------------------------
8.941 | 1.36 x | 14.32 | 111.84 | 2.91 | 8.4.0 / copy to patch 4
12.149 | --- | 10.54 | 82.31 | 2.14 | 8.4.0 / compiled from source

* Patched Binary mode is 7.1x faster than un-patched text mode

*******************************************************************

- annonces is a 340MB table with a mix of ints, smallints, bools, date,
timestamp, etc, and a text field averaging 230 bytes

SELECT count(*) FROM annonces :
Time | Speedup | Table | KRows | MTuples | Name
(s) | | MB/s | /s | /s |
------|---------|--------|---------|---------|-----------------------------
0.349 | --- | 933.45 | 1184.91 | 46.21 | 8.4.0 / copy to patch 4

COPY annonces TO '/dev/null' BINARY :
Time | Speedup | Table | KRows | MTuples | Name
(s) | | MB/s | /s | /s |
------|---------|--------|--------|---------|-----------------------------
2.149 | 2.60 x | 151.57 | 192.40 | 7.50 | 8.4.0 / copy to patch 4
5.579 | --- | 58.39 | 74.12 | 2.89 | 8.4.0 / compiled from source

* Patched Binary mode is 4.7x faster than un-patched text mode

COPY annonces TO '/dev/null' :
Time | Speedup | Table | KRows | MTuples | Name
(s) | | MB/s | /s | /s |
-------|---------|-------|-------|---------|-----------------------------
9.600 | 1.06 x | 33.93 | 43.08 | 1.68 | 8.4.0 / copy to patch 4
10.147 | --- | 32.10 | 40.75 | 1.59 | 8.4.0 / compiled from source

* Here, COPY isn't much faster : most of the time is actually spent
converting the DATE and TIMESTAMP columns to strings.
* In binary mode, such conversions are not needed.

*******************************************************************

- archive is 416MB, the same as annonces, without the text field, and many
more rows

SELECT count(*) FROM archive_data :
Time | Speedup | Table | KRows | MTuples | Name
(s) | | MB/s | /s | /s |
------|---------|--------|---------|---------|-----------------------------
0.844 | --- | 470.60 | 3135.89 | 87.81 | 8.4.0 / copy to patch 4

COPY archive_data TO '/dev/null' BINARY :
Time | Speedup | Table | KRows | MTuples | Name
(s) | | MB/s | /s | /s |
-------|---------|-------|--------|---------|-----------------------------
5.372 | 3.75 x | 73.96 | 492.88 | 13.80 | 8.4.0 / copy to patch 4
20.165 | --- | 19.70 | 131.29 | 3.68 | 8.4.0 / compiled from source

* Patched Binary mode is 6.4x faster than un-patched text mode

COPY archive_data TO '/dev/null' :
Time | Speedup | Table | KRows | MTuples | Name
(s) | | MB/s | /s | /s |
-------|---------|-------|-------|---------|-----------------------------
28.471 | 1.21 x | 13.95 | 92.99 | 2.60 | 8.4.0 / copy to patch 4
34.344 | --- | 11.57 | 77.09 | 2.16 | 8.4.0 / compiled from source

* Most of the time is again spent converting the DATE and TIMESTAMP
columns to strings.

*******************************************************************

* Why ?

COPY in text mode should be "fast enough" but will never be really fast
because many types need complicated conversions.
COPY BINARY has drawbacks (not very portable...) so, to justify its
existence, it should compensate with a massive speed increase over text
mode, which is not the case in 8.4.

* How ?

- Created a new "WBuf" auto-flushing buffer type. It looks like a
StringInfo, but :
- it has a flush callback
- you add data to it in little pieces
- when it is full, it sends the buffer contents to the flush callback
- it never makes any palloc calls except on creation

- fmgr.c
- new way of calling SendFuncs and OutFuncs which uses the existing
"context" field
- copy.c passes WBuf through this context
- SendFuncs check if they are called with a context
- if yes, write directly to the buffer
- if no, previous behaviour remains, return a BYTEA

- copy.c
- creates a WBuf
- sets the flush callback to do the right thing (write file, send to
frontend, etc)
- writes data like headers and delimiters to it
- pass it to the SendFuncs
- if a SendFunc returns a BYTEA (because it has not been updated to write
directly to the buffer), use the BYTEA
- if not, do nothing, the data is already sent

- pqcomm.c
- removed memcpy on large blocks

- others
- removed some memcpy
- inlines added at strategic points (total size of executable is
actually smaller with the patch)

* Results

See performance numbers above.
It does generate the same output as old COPY (ie there doesn't seem to be
any bugs)
Should be 100% backward compatible, no syscatalogs change.

* Side effects

Uses less memory for big TEXT or BYTEA fields, since less copies are made.
This could be extended to make serialisation of query results sent to the
frontend faster.
Breaks COPY CSV, I need to fix it (it's simple, I just didn't have time).
I have ideas for COPY FROM too.

* Thoughts

Not for commitfest ;) it's too late.
Patch needs refactoring.
Maybe fuse StringInfo and WBuf together, with bits of pq_send*, maybe not,
have to think about this.
Some types have extremely slow outfuncs (for instance, box).

COPY BINARY should include in its header (in the variable-length field
specified for this), a sample of all types used in the table, that are
non-portable.
For instance, put a TIMESTAMP of a known value. On reading, check this :
if it's wrong, perhaps the dump was generated by a postgres with float
timestamps ?...
This would have another advantage : with the column types stored in the
header, you'd no longer ask yourself "hmmmm..... how many alter tables did
I do since I made this dump that doesn't seem to want to load ?..."
Also, currently you can load a binary dump of INTs in a DATE column and it
will work perfectly OK (except the dates will be rubbish of course).

With this patch COPY BINARY gets fast enough to sometimes saturate a
gigabit ethernet link...

Patch is for 8.4.0, if someone wants to try it.

Attachment Content-Type Size
pg_8.4.0_copyto_patch_4.txt text/plain 60.1 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Scrappy 2009-08-11 13:16:51 Re: HEAD docs
Previous Message Magnus Hagander 2009-08-11 12:43:58 Re: HEAD docs