Re: general questions postgresql performance config

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Dino Vliet <dino_vliet(at)yahoo(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: general questions postgresql performance config
Date: 2010-01-26 02:12:26
Message-ID: 4B5E4F8A.2060008@postnewspapers.com.au
Lists: pgsql-general

On 26/01/2010 12:15 AM, Dino Vliet wrote:

> 5) Other considerations?

To get optimal bulk-loading performance you'll want to load data
concurrently over several connections - up to roughly as many as you
have disk spindles. Each individual connection will be slower, but the
overall throughput will be much greater.

Just how many connections you'll want to use depends on your I/O
subsystem and to a lesser extent your CPU capacity.

Inserting data via multiple connections isn't something every data
loading tool supports, so make sure to consider this carefully.
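As a rough sketch of what that looks like in JDBC (not tested; the
table "facts(a, b)", connection details, worker count and batch size
are all invented for the example):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Sketch: four workers, each with its own connection, each loading a
// disjoint slice of the rows and committing in large batches.
public class ParallelLoad {
    public static void main(String[] args) throws Exception {
        final String url = "jdbc:postgresql://localhost/mydb";
        Thread[] workers = new Thread[4];    // roughly one per spindle
        for (int i = 0; i < workers.length; i++) {
            final int slice = i;
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    try {
                        Connection conn =
                            DriverManager.getConnection(url, "user", "pass");
                        conn.setAutoCommit(false);
                        PreparedStatement ps = conn.prepareStatement(
                            "INSERT INTO facts (a, b) VALUES (?, ?)");
                        int pending = 0;
                        // Each worker takes every 4th row of the input.
                        for (int row = slice; row < 1000000; row += 4) {
                            ps.setInt(1, row);
                            ps.setString(2, "value " + row);
                            ps.addBatch();
                            if (++pending == 10000) {  // commit every 10k rows
                                ps.executeBatch();
                                conn.commit();
                                pending = 0;
                            }
                        }
                        ps.executeBatch();
                        conn.commit();
                        conn.close();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
    }
}

The important part is that each worker gets its own connection and
commits in big batches; how you partition the input between workers is
up to your loader.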

Another consideration is how you insert the data. It's vital to insert
your data in large transaction-batched chunks (or all in one
transaction); even with synchronous_commit = off you'll still see
rather poor performance if you do each INSERT in its own transaction.
Doing your inserts as prepared statements where each INSERT is
multi-row valued (several rows per VALUES list) will help too.
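For example (again just a sketch; the table "facts(a, b)", credentials
and the 1000-rows-per-statement size are made up), a multi-row INSERT
issued inside a single transaction via JDBC could look like:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Sketch: build "INSERT ... VALUES (?,?),(?,?),..." once, bind all the
// parameters, and insert 1000 rows in a single statement and commit.
public class BatchedInsert {
    private static final int ROWS_PER_STMT = 1000;

    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost/mydb", "user", "pass");
        conn.setAutoCommit(false);       // one transaction for the chunk

        StringBuilder sql =
            new StringBuilder("INSERT INTO facts (a, b) VALUES ");
        for (int i = 0; i < ROWS_PER_STMT; i++) {
            sql.append(i == 0 ? "(?, ?)" : ", (?, ?)");
        }
        PreparedStatement ps = conn.prepareStatement(sql.toString());
        for (int i = 0; i < ROWS_PER_STMT; i++) {
            ps.setInt(2 * i + 1, i);
            ps.setString(2 * i + 2, "value " + i);
        }
        ps.executeUpdate();              // 1000 rows in one round trip
        conn.commit();
        conn.close();
    }
}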

Even better is to use COPY to load large chunks of data. libpq provides
access to the COPY interface if you feel like some C coding. The JDBC
driver (dev version only so far) now provides access to the COPY API
too, so you can also bulk-load via Java very efficiently.
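A minimal sketch of the driver's COPY API, via its CopyManager (the
table "facts" and the CSV file name are invented for the example):

import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

// Sketch: stream a CSV file straight into a table with COPY.
public class CopyLoad {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost/mydb", "user", "pass");
        CopyManager copy = ((PGConnection) conn).getCopyAPI();
        long rows = copy.copyIn("COPY facts (a, b) FROM STDIN WITH CSV",
                                new FileReader("facts.csv"));
        System.out.println("Loaded " + rows + " rows");
        conn.close();
    }
}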

If your data needs little/no transformation and is externally validated
you can look into pg_bulkload as an alternative to all this.

--
Craig Ringer
