Re: Creating table with data from a join

From: Igor Stassiy <istassiy(at)gmail(dot)com>
To: Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Creating table with data from a join
Date: 2015-07-14 16:21:31
Message-ID: CAKVOjezuNg42PNDhwhio_Eo3OasuRGn+57d45i9KifmTq-3eOQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Julien, I have the following setting for WAL level: #wal_level = minimal
(which defaults to minimal anyway)

On Tue, Jul 14, 2015 at 6:19 PM Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>
wrote:

> On 14/07/2015 11:12, Igor Stassiy wrote:
> > Hello,
> >
> > I am benchmarking different ways of putting data into table on table
> > creation:
> >
> > 1. INSERT INTO c SELECT * FROM a JOIN b on a.id <http://a.id> = b.id
> > <http://b.id>;
> > 2. CREATE TABLE c AS SELECT * FROM a JOIN b on a.id <http://a.id> = b.id
> > <http://b.id>;
> > 3. psql -c "COPY (SELECT * FROM a JOIN b on a.id <http://a.id> = b.id
> > <http://b.id>) TO STDOUT" |
> > parallel --block 128M --jobs 4 --pipe psql -c "COPY c FROM STDIN";
> >
> > (the parallel command is available as part of parallel deb package in
> > Ubuntu for example, it splits the stdin by newline character and feeds
> > it to the corresponding command)
> >
> > Both tables a and b have ~16M records and one of the columns in a is
> > geometry (ranging from several KB in size to several MB). Columns in b
> > are mostly integers.
> >
> > The machine that I am running these commands on has the following
> > parameters:
> >
> > default_statistics_target = 50 # pgtune wizard 2012-06-06
> > maintenance_work_mem = 1GB # pgtune wizard 2012-06-06
> > constraint_exclusion = on # pgtune wizard 2012-06-06
> > checkpoint_completion_target = 0.9 # pgtune wizard 2012-06-06
> > effective_cache_size = 48GB # pgtune wizard 2012-06-06
> > work_mem = 80MB # pgtune wizard 2012-06-06
> > wal_buffers = 8MB # pgtune wizard 2012-06-06
> > checkpoint_segments = 16 # pgtune wizard 2012-06-06
> > shared_buffers = 16GB # pgtune wizard 2012-06-06
> > max_connections = 400 # pgtune wizard 2012-06-06
> >
> > One would expect the 3rd option to be faster than 1 and 2, however 2
> > outperforms both by a large margin (sometimes x2). This is especially
> > surprising taking into account that COPY doesn't acquire a global lock
> > on the table, only a RowExclusiveLock
> > (according
> > to http://www.postgresql.org/message-id/10611.1014867684@sss.pgh.pa.us)
> >
>
> What is wal_level value? I think this is because of an optimisation
> happening with wal_level = minimal:
>
> "In minimal level, WAL-logging of some bulk operations can be safely
> skipped, which can make those operations much faster"
>
> see http://www.postgresql.org/docs/current/static/runtime-config-wal.html
>
> > So is option 2 a winner by design? Could you please suggest other
> > alternatives to try (if there are any)? And what might be the reason
> > that 3 is not outperforming the other 2?
> >
> > Thank you,
> > Igor
> >
> >
>
>
> --
> Julien Rouhaud
> http://dalibo.com - http://dalibo.org
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Julien Rouhaud 2015-07-14 16:31:13 Re: Creating table with data from a join
Previous Message Julien Rouhaud 2015-07-14 16:18:58 Re: Creating table with data from a join