Re: Creating table with data from a join

From: Igor Stassiy <istassiy(at)gmail(dot)com>
To: Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Creating table with data from a join
Date: 2015-07-14 16:50:52
Message-ID: CAKVOjezNxzCPZWgaxT_0MND5B7BK=wMh+jBmu3-eAkhBD7xJWQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Julien, I would gladly provide more information, I am just not sure what to
add.

I would be willing to leave the server compromised for things like corrupts
or data losses during the time of this import, but the server has to be up
and running before and after the import, if it is successful (so I can't
take it down then change some parameters and start it up with again).

On Tue, Jul 14, 2015 at 6:37 PM Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>
wrote:

> On 14/07/2015 18:21, Igor Stassiy wrote:
> > Julien, I have the following setting for WAL level: #wal_level = minimal
> > (which defaults to minimal anyway)
> >
>
> Sorry, I sent my mail too early :/
>
> So, option #2 is winner by design. You didn't say anything about your
> needs, so it's hard to help you much more.
>
> If you don't care about losing data on this table if your server
> crashes, you can try option #3 with an unlogged table.
>
>
> > On Tue, Jul 14, 2015 at 6:19 PM Julien Rouhaud
> > <julien(dot)rouhaud(at)dalibo(dot)com <mailto:julien(dot)rouhaud(at)dalibo(dot)com>> wrote:
> >
> > On 14/07/2015 11:12, Igor Stassiy wrote:
> > > Hello,
> > >
> > > I am benchmarking different ways of putting data into table on
> table
> > > creation:
> > >
> > > 1. INSERT INTO c SELECT * FROM a JOIN b on a.id <http://a.id>
> > <http://a.id> = b.id <http://b.id>
> > > <http://b.id>;
> > > 2. CREATE TABLE c AS SELECT * FROM a JOIN b on a.id <http://a.id>
> > <http://a.id> = b.id <http://b.id>
> > > <http://b.id>;
> > > 3. psql -c "COPY (SELECT * FROM a JOIN b on a.id <http://a.id>
> > <http://a.id> = b.id <http://b.id>
> > > <http://b.id>) TO STDOUT" |
> > > parallel --block 128M --jobs 4 --pipe psql -c "COPY c FROM STDIN";
> > >
> > > (the parallel command is available as part of parallel deb package
> in
> > > Ubuntu for example, it splits the stdin by newline character and
> feeds
> > > it to the corresponding command)
> > >
> > > Both tables a and b have ~16M records and one of the columns in a
> is
> > > geometry (ranging from several KB in size to several MB). Columns
> in b
> > > are mostly integers.
> > >
> > > The machine that I am running these commands on has the following
> > > parameters:
> > >
> > > default_statistics_target = 50 # pgtune wizard 2012-06-06
> > > maintenance_work_mem = 1GB # pgtune wizard 2012-06-06
> > > constraint_exclusion = on # pgtune wizard 2012-06-06
> > > checkpoint_completion_target = 0.9 # pgtune wizard 2012-06-06
> > > effective_cache_size = 48GB # pgtune wizard 2012-06-06
> > > work_mem = 80MB # pgtune wizard 2012-06-06
> > > wal_buffers = 8MB # pgtune wizard 2012-06-06
> > > checkpoint_segments = 16 # pgtune wizard 2012-06-06
> > > shared_buffers = 16GB # pgtune wizard 2012-06-06
> > > max_connections = 400 # pgtune wizard 2012-06-06
> > >
> > > One would expect the 3rd option to be faster than 1 and 2, however
> 2
> > > outperforms both by a large margin (sometimes x2). This is
> especially
> > > surprising taking into account that COPY doesn't acquire a global
> lock
> > > on the table, only a RowExclusiveLock
> > > (according
> > > to
> > http://www.postgresql.org/message-id/10611.1014867684@sss.pgh.pa.us)
> > >
> >
> > What is wal_level value? I think this is because of an optimisation
> > happening with wal_level = minimal:
> >
> > "In minimal level, WAL-logging of some bulk operations can be safely
> > skipped, which can make those operations much faster"
> >
> > see
> >
> http://www.postgresql.org/docs/current/static/runtime-config-wal.html
> >
> > > So is option 2 a winner by design? Could you please suggest other
> > > alternatives to try (if there are any)? And what might be the
> reason
> > > that 3 is not outperforming the other 2?
> > >
> > > Thank you,
> > > Igor
> > >
> > >
> >
> >
> > --
> > Julien Rouhaud
> > http://dalibo.com - http://dalibo.org
> >
>
>
> --
> Julien Rouhaud
> http://dalibo.com - http://dalibo.org
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Marc Mamin 2015-07-14 18:02:22 Re: Creating table with data from a join
Previous Message Julien Rouhaud 2015-07-14 16:36:59 Re: Creating table with data from a join