From: Alban Hertroys <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl>
To: Scott Bailey <artacus(at)comcast(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: SQL Programming Question
Date: 2010-09-11 10:09:33
Message-ID: 0FB30C67-B41E-42F8-8211-8DCBD2A1D1DC@solfertje.student.utwente.nl
Lists: pgsql-general
On 11 Sep 2010, at 6:10, Scott Bailey wrote:
> On 09/10/2010 08:07 PM, tony(at)exquisiteimages(dot)com wrote:
>> I have a situation where I receive a file with transactions that have a
>> unique key from a vendor. These transactions should only be imported into
>> my system once, but the vendor system will occasionally resend a
>> transaction by mistake.
>>
>> The way I am currently handling this with Microsoft ADO and FoxPro files
>> is to open a table with an index on the vendor key and seek on the key. If
>> there is no match I add it, if there is a match I put it in an exception
>> file to be manually checked.
>>
>> Using PostgreSQL I can't open a table and do seeks against an index. I
>> could do a select against the database and see if 0 records are returned,
>> but that seems to take more time than doing a seek on an index. Is there a
>> more SQL friendly way of handling this task?
>
> Postgres isn't going to just use the index because it needs the visibility information in the table. But it will be better to load all of the data into a staging table using COPY and then insert the missing rows from there. It will be a ton faster than going a row at a time, looking for a match then doing an insert.
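A staging table matching the live table's layout can be created up front. This is a minimal sketch; the table and column names are only assumptions taken from the example that follows:

```sql
-- Temporary staging table with the same columns as the live table;
-- it is dropped automatically at the end of the session.
CREATE TEMP TABLE staging_table (LIKE live_table INCLUDING DEFAULTS);
```

Using a TEMP table also keeps concurrent imports from seeing each other's staging rows.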
This is probably the best way to go about this.
Basically you perform:
BEGIN;
-- read in data-file
COPY staging_table FROM STDIN;
Your data here
\.
-- delete duplicates
DELETE FROM staging_table USING live_table
WHERE live_table.key = staging_table.key;
-- insert remaining data (non-duplicates)
INSERT INTO live_table (key, data1, data2, etc)
SELECT key, data1, data2, etc
FROM staging_table;
COMMIT;
You could add a step before deleting the duplicates that preserves them for manual review (matching your current exception-file workflow):
INSERT INTO duplicate_table (key, data1, data2, etc)
SELECT key, data1, data2, etc
FROM staging_table
WHERE EXISTS (SELECT 1 FROM live_table WHERE key = staging_table.key);
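If the imported batches are large, an index on the staging table's key column may speed up the duplicate checks above. A sketch, again with the assumed names from the example:

```sql
-- Optional: index the staging key after COPY, then refresh
-- planner statistics so the join can use it.
CREATE INDEX staging_table_key_idx ON staging_table (key);
ANALYZE staging_table;
```

For small batches a sequential scan of the staging table is likely just as fast, so this is only worth it at volume.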
The following are a few would-be-nice-to-haves that AFAIK aren't possible yet. Often the reason we don't have these is the SQL standard, which is a pretty good reason. Still...
It would be great to be able to use a WITH statement to lock down a data set for multiple subsequent operations, something like:
WITH nonduplicates (key, data1, data2, etc) AS (
SELECT key, data1, data2, etc FROM staging_table
EXCEPT
SELECT key, data1, data2, etc FROM live_table
)
INSERT INTO live_table (key, data1, data2, etc)
SELECT key, data1, data2, etc FROM nonduplicates
RETURNING key, data1, data2, etc
UNION ALL
DELETE FROM staging_table USING nonduplicates
WHERE key = nonduplicates.key
RETURNING key, data1, data2, etc;
Or something like that. It's just an example of what I have in mind, after all ;)
But of course for this particular situation it would be really ideal to be able to just do:
MOVE * FROM staging_table TO live_table WHERE NOT EXISTS (
SELECT 1 FROM live_table WHERE key = staging_table.key
);
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.