Quick Links

Re: Creating table with data from a join

From:	Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>
To:	Igor Stassiy <istassiy(at)gmail(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Creating table with data from a join
Date:	2015-07-14 16:18:58
Message-ID:	55A53672.2050004@dalibo.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On 14/07/2015 11:12, Igor Stassiy wrote:
> Hello,
>
> I am benchmarking different ways of putting data into table on table
> creation:
>
> 1. INSERT INTO c SELECT * FROM a JOIN b on a.id <http://a.id> = b.id
> <http://b.id>;
> 2. CREATE TABLE c AS SELECT * FROM a JOIN b on a.id <http://a.id> = b.id
> <http://b.id>;
> 3. psql -c "COPY (SELECT * FROM a JOIN b on a.id <http://a.id> = b.id
> <http://b.id>) TO STDOUT" |
> parallel --block 128M --jobs 4 --pipe psql -c "COPY c FROM STDIN";
>
> (the parallel command is available as part of parallel deb package in
> Ubuntu for example, it splits the stdin by newline character and feeds
> it to the corresponding command)
>
> Both tables a and b have ~16M records and one of the columns in a is
> geometry (ranging from several KB in size to several MB). Columns in b
> are mostly integers.
>
> The machine that I am running these commands on has the following
> parameters:
>
> default_statistics_target = 50 # pgtune wizard 2012-06-06
> maintenance_work_mem = 1GB # pgtune wizard 2012-06-06
> constraint_exclusion = on # pgtune wizard 2012-06-06
> checkpoint_completion_target = 0.9 # pgtune wizard 2012-06-06
> effective_cache_size = 48GB # pgtune wizard 2012-06-06
> work_mem = 80MB # pgtune wizard 2012-06-06
> wal_buffers = 8MB # pgtune wizard 2012-06-06
> checkpoint_segments = 16 # pgtune wizard 2012-06-06
> shared_buffers = 16GB # pgtune wizard 2012-06-06
> max_connections = 400 # pgtune wizard 2012-06-06
>
> One would expect the 3rd option to be faster than 1 and 2, however 2
> outperforms both by a large margin (sometimes x2). This is especially
> surprising taking into account that COPY doesn't acquire a global lock
> on the table, only a RowExclusiveLock
> (according
> to http://www.postgresql.org/message-id/10611.1014867684@sss.pgh.pa.us)
>

What is wal_level value? I think this is because of an optimisation
happening with wal_level = minimal:

"In minimal level, WAL-logging of some bulk operations can be safely
skipped, which can make those operations much faster"

see http://www.postgresql.org/docs/current/static/runtime-config-wal.html

> So is option 2 a winner by design? Could you please suggest other
> alternatives to try (if there are any)? And what might be the reason
> that 3 is not outperforming the other 2?
>
> Thank you,
> Igor
>
>

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

In response to

Creating table with data from a join at 2015-07-14 09:12:24 from Igor Stassiy

Responses

Re: Creating table with data from a join at 2015-07-14 16:21:31 from Igor Stassiy

Browse pgsql-general by date

	From	Date	Subject
Next Message	Igor Stassiy	2015-07-14 16:21:31	Re: Creating table with data from a join
Previous Message	Ramesh T	2015-07-14 14:13:19	Re: timestamp check