From: | Marc Mamin <M(dot)Mamin(at)intershop(dot)de> |
---|---|
To: | Igor Stassiy <istassiy(at)gmail(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Creating table with data from a join |
Date: | 2015-07-14 18:02:22 |
Message-ID: | B6F6FD62F2624C4C9916AC0175D56D8828C03ECB@jenmbs01.ad.intershop.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
>Hello,
>
>I am benchmarking different ways of putting data into table on table creation:
>
>1. INSERT INTO c SELECT * FROM a JOIN b on a.id = b.id;
>2. CREATE TABLE c AS SELECT * FROM a JOIN b on a.id = b.id;
>3. psql -c "COPY (SELECT * FROM a JOIN b on a.id = b.id) TO STDOUT" |
>parallel --block 128M --jobs 4 --pipe psql -c "COPY c FROM STDIN";
>
>(the parallel command is available as part of parallel deb package in Ubuntu for example, it splits the stdin by newline character and feeds it to the corresponding command)
>
>Both tables a and b have ~16M records and one of the columns in a is geometry (ranging from several KB in size to several MB). Columns in b are mostly integers.
>
>The machine that I am running these commands on has the following parameters:
>
>default_statistics_target = 50 # pgtune wizard 2012-06-06 maintenance_work_mem = 1GB # pgtune wizard 2012-06-06 constraint_exclusion = on # pgtune wizard 2012-06-06 checkpoint_completion_target = 0.9 # pgtune wizard 2012-06-06 effective_cache_size = 48GB # pgtune wizard 2012-06-06
>work_mem = 80MB # pgtune wizard 2012-06-06
>wal_buffers = 8MB # pgtune wizard 2012-06-06
>checkpoint_segments = 16 # pgtune wizard 2012-06-06
>shared_buffers = 16GB # pgtune wizard 2012-06-06
>max_connections = 400 # pgtune wizard 2012-06-06
>
>One would expect the 3rd option to be faster than 1 and 2, however 2 outperforms both by a large margin (sometimes x2). This is especially surprising taking into account that COPY doesn't acquire a global lock on the table, only a RowExclusiveLock
which PG Version ?
I find interesting, that 2 outperforms 1.
The only explanation I can imagine is that "CREATE TABLE AS" freezes the data on the fly, as possible with "COPY FROM"
(http://www.postgresql.org/docs/9.4/interactive/sql-copy.html)
You may try parallel insert without using STDIN using modulo. Just start these 4 queries simultaneously:
INSERT INTO c SELECT * FROM a JOIN b on a.id = b.id
WHERE a.id%4 = [0,1,2,3}
I usually avoid parallel INSERTS to avoid I/O contention and random distribution within the target tables.
Are you monitoring the I/O activity in your tests ?
Have you tried to use only 2 parallel processes?
regards,
Marc Mamin
>(according to http://www.postgresql.org/message-id/10611.1014867684@sss.pgh.pa.us)
>
>So is option 2 a winner by design? Could you please suggest other alternatives to try (if there are any)? And what might be the reason that 3 is not outperforming the other 2?
>
>Thank you,
>Igor
>
>
>
From | Date | Subject | |
---|---|---|---|
Next Message | Marc Mamin | 2015-07-14 18:19:51 | Re: Index Only Scan vs Cache |
Previous Message | Igor Stassiy | 2015-07-14 16:50:52 | Re: Creating table with data from a join |