From: | Merlin Moncure <mmoncure(at)gmail(dot)com> |
---|---|
To: | Steven Schlansker <steven(at)likeness(dot)com> |
Cc: | François Beausoleil <francois(at)teksol(dot)info>, Forums postgresql <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Deduplication and transaction isolation level |
Date: | 2013-09-25 17:52:48 |
Message-ID: | CAHyXU0yeiSxs2bGGjm9fo+CRw0AyYYBKwMc+KhGnpjr=KzJqkQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Wed, Sep 25, 2013 at 12:50 PM, Steven Schlansker <steven(at)likeness(dot)com> wrote:
>
> On Sep 25, 2013, at 6:04 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>
>> On Tue, Sep 24, 2013 at 10:19 PM, François Beausoleil
>> <francois(at)teksol(dot)info> wrote:
>>> Hi all!
>>>
>>> I import many, many rows of data into a table, from three or more computers, 4 times per hour. I have a primary key, and the query I use to import the data is supposed to dedup before inserting, but I still get primary key violations.
>>>
>>> The import process is:
>>>
>>> * Load CSV data into temp table
>>> * INSERT INTO dest SELECT DISTINCT (pkey) FROM temp WHERE NOT EXISTS(temp.pkey = dest.pkey)
>>>
>>> I assumed (erroneously) that this would guarantee no duplicate data could make it into the database. The primary key violations are proving me wrong.
>>
>> Right. Transaction A and B are interleaved: they both run the same
>> check against the same id at the same time. Both checks pass because
>> neither transaction is committed. This problem is not solvable by
>> adjusting the isolation level.
>
> Are you sure that this is the case? It is my understanding that since 9.1 with SSI (https://wiki.postgresql.org/wiki/SSI) if you set the transaction isolation level to SERIALIZABLE, this problem is solved, as the insert will take a "predicate lock" and the other insert cannot succeed.
>
> We use this to detect / resolve concurrent inserts that violate primary keys and it works great.
>
> However in this case it probably doesn't help the OP because the cost of restarting the entire import is likely too high.
ah, you're right!
merlin
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2013-09-25 21:47:27 | Re: postgres FDW doesn't support sequences? |
Previous Message | Steven Schlansker | 2013-09-25 17:50:35 | Re: Deduplication and transaction isolation level |