From: | "Colin 't Hart" <cthart(at)yahoo(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: What's the fastest way to do this? |
Date: | 2001-11-09 08:39:10 |
Message-ID: | 9sg4mn$1r5e$1@news.tht.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Einar Karttunen writes:
> > I have several really big tables that have columns uniquely identified
by
> > single or multiple rows. [ I have about 25 tables, 10k to 500k rows
> > per table ]
> >
> > Each day I get a flat file of updates. I have no way of knowing which
> > lines in the file are new records and which are updates for existing
> > records.
> >
> > I need a way to insert the new ones and update the old ones. I have
> > a couple of ideas but none of them seem fast enough ( I will soon
> > be getting updates faster than I can feed them into the database ).
>
> I was facing a similar problem some time ago. My solution was to create
> a temp table and COPY the new data to it. After that I deleted all records
> in the original table which existed in the temporary table. Then I just
> did a insert from a select * from the temp table. Of course with this
> approach you have to lock the tables.
>
> - Einar Karttunen
Even faster is to:
1. load the data into a temporary table (even faster is to load into a
permanent
table -- just truncate it first in each run). Let's call this table
ImportTable.
2. update the existing records
UPDATE rt
SET a = t.a, b = x.b, c = x.c
FROM RealTable AS rt, ImportTable AS x
WHERE rt.pk = x.pk
3. insert the new records
INSERT INTO RealTable(pk, a, b, c)
SELECT pk, a, b, c
FROM ImportTable AS x
WHERE NOT EXISTS (SELECT 1 FROM RealTable WHERE pk = x.pk)
'pk' is the primary key of the table, or, at worst, a UNIQUEly INDEXed
column
combination.
This avoids the expensive DELETE operation (DBMSs are generally better at
INSERTing and UPDATEing than DELETEing, and remember that the DELETEs have
to
go through the transaction log).
Don't use cursors if you can help it -- cursors can be up to several orders
of
magnitude slower, and usually at least 4 times slower.
Using an import table allows you to sanitize the data by insert a step to do
this between steps 1 and 2.
Cheers,
Colin
From | Date | Subject | |
---|---|---|---|
Next Message | darwin | 2001-11-09 09:22:10 | problem connecting client to a postgresSQL server |
Previous Message | Stuart Robinson | 2001-11-09 07:51:33 | inserts on views using rules |