Re: Question on Insert / Update

From: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To: Alex <alex(at)meerkatsoft(dot)com>
Cc: Postgres general mailing list <pgsql-general(at)postgresql(dot)org>
Subject: Re: Question on Insert / Update
Date: 2005-11-09 15:08:28
Message-ID: 1131548908.8979.11.camel@coppola.muc.ecircle.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I guess the best solution is one which allows you to do it in batches,
as inserting is more efficient if you don't commit after each insert.

On Wed, 2005-11-09 at 15:45, Alex wrote:
> Hi,
> have just a general question...
>
> I have a table of 10M records, unique key on 5 fields.
> I need to update/insert 200k records in one go.
>
> I could do a select to check for existence and then either insert or update.
> Or simply insert, check on the error code an update if required.
> The 2nd seems to be to logical choice, but will it actually be faster
> and moreover is that the right way to do it?

Are you aware that you can't do the 2nd in the same transaction ? The
error will invalidate the current transaction. So you either do each
insert in it's own transaction, basically with autocommit on, or you
place a savepoint before each insert and roll back to it on the error,
which is probably cheaper than commit but still not free. So I guess
this is not good for batching.

We here do all this kind of stuff by first looking up a batch of rows,
then insert the missing/update the existing also in batch mode.

In any case, I wonder too if it can't be done better, cause our imports
are significantly slower on postgres than on other DBs (intentionally no
names mentioned), using the same code and the machines having similar
hardware.

Cheers,
Csaba.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2005-11-09 15:20:28 Re: insert on duplicate update?
Previous Message Richard Huxton 2005-11-09 14:58:11 Re: newbie design question re impact of VACUUM