Creating table with data from a join

From: Igor Stassiy <istassiy(at)gmail(dot)com>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Creating table with data from a join
Date: 2015-07-14 09:12:24
Message-ID: CAKVOjezLRzeSPAZFho21aXQiYCfDBfj5wXRKKkeqRi95BPQu9g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello,

I am benchmarking different ways of putting data into table on table
creation:

1. INSERT INTO c SELECT * FROM a JOIN b on a.id = b.id;
2. CREATE TABLE c AS SELECT * FROM a JOIN b on a.id = b.id;
3. psql -c "COPY (SELECT * FROM a JOIN b on a.id = b.id) TO STDOUT" |
parallel --block 128M --jobs 4 --pipe psql -c "COPY c FROM STDIN";

(the parallel command is available as part of parallel deb package in
Ubuntu for example, it splits the stdin by newline character and feeds it
to the corresponding command)

Both tables a and b have ~16M records and one of the columns in a is
geometry (ranging from several KB in size to several MB). Columns in b are
mostly integers.

The machine that I am running these commands on has the following
parameters:

default_statistics_target = 50 # pgtune wizard 2012-06-06
maintenance_work_mem = 1GB # pgtune wizard 2012-06-06 constraint_exclusion
= on # pgtune wizard 2012-06-06 checkpoint_completion_target = 0.9 # pgtune
wizard 2012-06-06 effective_cache_size = 48GB # pgtune wizard 2012-06-06
work_mem = 80MB # pgtune wizard 2012-06-06
wal_buffers = 8MB # pgtune wizard 2012-06-06
checkpoint_segments = 16 # pgtune wizard 2012-06-06
shared_buffers = 16GB # pgtune wizard 2012-06-06
max_connections = 400 # pgtune wizard 2012-06-06

One would expect the 3rd option to be faster than 1 and 2, however 2
outperforms both by a large margin (sometimes x2). This is especially
surprising taking into account that COPY doesn't acquire a global lock on
the table, only a RowExclusiveLock
(according to
http://www.postgresql.org/message-id/10611.1014867684@sss.pgh.pa.us)

So is option 2 a winner by design? Could you please suggest other
alternatives to try (if there are any)? And what might be the reason that 3
is not outperforming the other 2?

Thank you,
Igor

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Shujie Shang 2015-07-14 09:59:32 Re: Index Only Scan vs Cache
Previous Message Scott Ribe 2015-07-14 01:15:12 Re: could not create shared memory segment: Invalid argument