| From: | Curtis Stanford <curtis(at)stanfordcomputing(dot)com> |
|---|---|
| To: | Richard Huxton <dev(at)archonet(dot)com>, pgsql-general(at)postgresql(dot)org |
| Subject: | Re: Good way to insert/update when you're not sure of duplicates? |
| Date: | 2003-09-25 17:32:11 |
| Message-ID: | 200309251132.11906.curtis@stanfordcomputing.com |
| Lists: | pgsql-general |
On September 25, 2003 11:27 am, Richard Huxton wrote:
> On Thursday 25 September 2003 02:23, Curtis Stanford wrote:
> > I'm in a situation where I need to load an ASCII file into a database. No
> > sweat so far, right? The records are indexed by date, and I have a unique
> > index on date. The ASCII files can overlap, meaning a file can contain
> > dates that have already been loaded.
> >
> > I started out with this ingenious idea. Just try to insert the record. If
> > I get an error about duplicate keys, do an update and switch to update
> > mode. If I'm updating and get zero rows updated, switch back to insert.
> > Works fine, except the whole load runs in one transaction, and any error
> > aborts the transaction so I can't commit it. So, I decided I'd have to
> > try to update every single record and, if zero rows were updated, do the
> > insert. This takes a looooong time: many hours for just 86,000 records or
> > so.
> >
> > Is there a standard way to do this? I can't imagine I'm the only guy who
> > needs to do this kind of thing.
>
> Try inserting a batch of 1024. If you get an error, drop down to 512 and
> try that. Repeat until the batch works or you've reached a size of 1. If
> the batch worked, try the next set of records and repeat. If you reached a
> batch size of 1 with no success, then switch to updating and repeat the
> cycle, increasing your batch size as you go.
>
> You might find it quickest to halve the batch size while you're having
> problems, then double it while it works. The right balance will depend on
> how many insert vs. update rows you have.
Hey thanks! I actually got the time down to around 1000 rows/sec. I was
passing one of the int arguments in the update's WHERE clause as a double. I
don't know why, but changing it to an int, as it should be, drastically
reduced the time; presumably the type mismatch was keeping the planner from
using the unique index, forcing a sequential scan for every update. Your
solution is very interesting. I'll probably try it and see which is faster.
For the archives, rough sketches of both approaches follow below.
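
Here's a minimal sketch of the update-then-insert loop. The table and column
names (readings, rdate, value) are invented for illustration, and it's shown
in Python with psycopg2 rather than my actual loader, so adjust for your own
schema and driver:

```python
# Minimal sketch: try the UPDATE first; if no row matched, INSERT.
# Table "readings(rdate date UNIQUE, value integer)" is invented for
# illustration -- substitute the real table and columns.
import psycopg2

conn = psycopg2.connect("dbname=test")
cur = conn.cursor()

# In the real loader these come from parsing the ASCII file.
records = [("2003-09-01", 10), ("2003-09-02", 20)]

for rdate, value in records:
    # Pass parameters with the right types: a double where the column is
    # an int can keep the planner off the unique index.
    cur.execute("UPDATE readings SET value = %s WHERE rdate = %s",
                (value, rdate))
    if cur.rowcount == 0:  # no existing row for this date
        cur.execute("INSERT INTO readings (rdate, value) VALUES (%s, %s)",
                    (rdate, value))

conn.commit()
```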
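
And a sketch of Richard's batch-halving idea, with the same invented names.
Each batch has to run in its own transaction, since a duplicate-key error
aborts the whole transaction it happens in:

```python
# Sketch of the adaptive batch strategy: insert in big batches, halve the
# batch size on a duplicate-key failure, fall back to UPDATE at size 1,
# and grow the batch size again while things work.
import psycopg2

def insert_batch(conn, rows):
    """INSERT a batch in one transaction; roll back and report failure
    if any row collides with the unique index."""
    try:
        cur = conn.cursor()
        cur.executemany(
            "INSERT INTO readings (rdate, value) VALUES (%s, %s)", rows)
        conn.commit()
        return True
    except psycopg2.IntegrityError:
        conn.rollback()
        return False

def adaptive_load(conn, rows, max_size=1024):
    size, i = max_size, 0
    while i < len(rows):
        batch = rows[i:i + size]
        if insert_batch(conn, batch):
            i += len(batch)
            size = min(size * 2, max_size)  # doubling while it works
        elif size > 1:
            size //= 2                      # halving while having problems
        else:
            # A single row still fails: it must already exist, so update it.
            rdate, value = batch[0]
            cur = conn.cursor()
            cur.execute("UPDATE readings SET value = %s WHERE rdate = %s",
                        (value, rdate))
            conn.commit()
            i += 1
            size = 2                        # start growing again
```

As Richard says, the best starting batch size depends on the mix of insert
vs. update rows in the file.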