From: Shridhar Daithankar <shridhar_daithankar(at)persistent(dot)co(dot)in>
To: Richard Huxton <dev(at)archonet(dot)com>
Cc: Curtis Stanford <curtis(at)stanfordcomputing(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Good way to insert/update when you're not sure of duplicates?
Date: 2003-09-26 06:43:40
Message-ID: 3F73E01C.2060109@persistent.co.in
Lists: pgsql-general
Richard Huxton wrote:
> On Thursday 25 September 2003 02:23, Curtis Stanford wrote:
>
>>I'm in a situation where I need to load an ASCII file into a database. No
>>sweat so far right? The records are indexed by date and I have a unique
>>index on date. The ASCII can overlap, meaning it can contain duplicate
>>dates that have been loaded before.
>>
>>I started out with this ingenious idea. Just try to insert the record. If I
>>get an error about duplicate keys, do an update and switch to update mode.
>>If I'm updating and get zero rows updated, switch back to insert. Works
>>fine, except this whole thing is in a transaction and any errors abort the
>>transaction and I can't commit it. So, I decided I'd have to try to update
>>every single record and, if zero rows were updated, do the insert. This
>>takes a looooong time. Many hours for just 86000 records or so.
>>
>>Is there a standard way to do this? I can't imagine I'm the only guy who
>>needs to do this kind of thing.
>
>
> Try inserting a batch of 1024. If you get an error, drop down to 512 and try
> that. Repeat until the batch works or you've reached a size of 1.
> If the batch worked, try the next set of records and repeat. If you reached a
> batch size of 1 with no success, then switch to updating, and repeat the
> cycle, increasing your batch-size as you go.
>
> You might find it quickest to halve batch-size while having problems then
> doubling while it works. The balance is going to depend on how many insert vs
> update rows you have.
>
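[Richard's batch-halving scheme can be sketched as follows. This is a minimal illustration, not code from the thread: `insert_batch` and `update_one` are hypothetical callables, and a plain dict keyed on the unique-indexed date column stands in for the real table.]

```python
def load_with_batch_halving(records, insert_batch, update_one, start_size=1024):
    """Insert in batches; halve the batch size on failure, update a single
    duplicate record, and grow the batch again while inserts succeed."""
    i, size = 0, start_size
    while i < len(records):
        batch = records[i:i + size]
        if insert_batch(batch):
            i += len(batch)
            size = min(size * 2, start_size)  # double while things work
        elif size > 1:
            size //= 2                        # halve and retry the same span
        else:
            update_one(batch[0])              # lone duplicate: update instead
            i += 1
            size = min(size * 2, start_size)

# A dict stands in for the date-indexed table.
table = {}

def insert_batch(batch):
    # All-or-nothing, like one batched INSERT inside a transaction:
    # any duplicate date makes the whole batch fail.
    if any(rec["date"] in table for rec in batch):
        return False
    for rec in batch:
        table[rec["date"]] = rec
    return True

def update_one(rec):
    table[rec["date"]] = rec
```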
In addition to that, you can try inserting from multiple backends simultaneously
to speed up the whole process.

I don't like the mode-switching idea the OP described. I would rather follow an
insert-then-update-on-error approach in a separate transaction for each record,
and fork, say, 20-40 parallel backends to achieve good speed.
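[The per-record flow can be sketched like this. It is a toy illustration only: a dict plays the table and a raised exception plays the duplicate-key error. In real PostgreSQL each record's attempt needs its own short transaction, since, as the OP found, an error aborts the transaction it occurs in.]

```python
class DuplicateKeyError(Exception):
    """Stands in for PostgreSQL's unique-index violation error."""

table = {}  # date -> record, standing in for the unique-indexed table

def insert(rec):
    # Like INSERT: fails when the unique date already exists.
    if rec["date"] in table:
        raise DuplicateKeyError(rec["date"])
    table[rec["date"]] = rec

def upsert(rec):
    # Try the insert first; only duplicates pay for the second statement.
    try:
        insert(rec)
    except DuplicateKeyError:
        table[rec["date"]] = rec  # like UPDATE ... WHERE date = ...
```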
Shridhar