Re: How can I prevent duplicate inserts by multiple concurrent threads ?

From: Sean Davis <seandavi(at)gmail(dot)com>
To: Hursh Jain <hurshjain(at)beesell(dot)com>
Cc: "pgsql-novice(at)postgresql(dot)org" <pgsql-novice(at)postgresql(dot)org>
Subject: Re: How can I prevent duplicate inserts by multiple concurrent threads ?
Date: 2015-01-07 19:35:24
Message-ID: CANeAVBkiF72mayAHsibbWc0ir1wEYt_b_5W5zcL=Xm=TsFh-ng@mail.gmail.com
Lists: pgsql-novice

On Wed, Jan 7, 2015 at 2:29 PM, Hursh Jain <hurshjain(at)beesell(dot)com> wrote:

> I am running postgres 9.3.x and am feeling a little confused about
> transactions, isolations and locks, as they pertain to preventing
> duplicate concurrent *inserts* of the same data. No updates, only inserts.
>
> I am using JDBC and using the following code pattern, invoked from
> multiple front-end web server threads:
>
> getData()
> 1...get JDBC connection (using default transaction isolation).
> 2. START TRANSACTION
> 3....some prelim processing...
> 4. for a bunch of data chunks (say 100 different chunks),
> 4a...check if data chunk already exists in DB ?
> QUERY: select count(*) from table where data_chunk_id = 123
> 4b...if not exist (count is 0), create data and insert data into table
> 4c...else if already exists (count > 0), retrieve existing data from
> table
> 5. Add up all the chunks, and create a JSON string for the client
> 6. COMMIT TRANSACTION
> 7. write JSON to client
>
> We really only need to create/save these chunks the first time they are
> requested and then send only the already created ones thereafter. The
> issue is concurrent threads are running inside getData() the very first
> time the request is made (so the initial request, like subsequent
> requests, can be made by multiple clients at the same time).
>
> So, then:
>
> - Thread #1 is at step 4a, sees no data, and so inserts the data (step 4b).
> - Before Thread #1 commits (step 6), Thread #2 reaches step 4a as well.
> - Since Thread #1 has not committed yet, Thread #2 also sees no data
> in the table and inserts the data too.
> - Both threads insert the data and I get duplicate rows (sometimes
> N duplicates, for N threads).
>
> Any suggestions, tips ? Am I at least thinking along the right lines ?
> Should I be getting a table lock instead ? What is the recommended way
> to prevent duplicate inserts in a concurrent situation like this ?
>

Is there a reason that you cannot use a UNIQUE constraint?
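A minimal sketch of that approach, assuming your table is named `chunks` (the table and constraint names here are placeholders, not from your schema):

```sql
-- Let the database enforce one row per chunk id; two concurrent
-- inserts of the same data_chunk_id can then never both succeed.
ALTER TABLE chunks
    ADD CONSTRAINT chunks_data_chunk_id_key UNIQUE (data_chunk_id);
```

Once the constraint exists, the losing thread's INSERT fails with a unique_violation error (SQLSTATE 23505), which your JDBC code can catch and treat as "the chunk already exists, go fetch it".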

> (again, there are no updates, only one-time inserts if that data does
> not exist already).
>
> I'm thinking, I could do this:
>
> Option (A)
> getData()
> 1...get JDBC connection (using default transaction isolation).
> 2. START TRANSACTION
> 3. ==> SAVEPOINT A
> 4. for a bunch of data chunks (say 100 different chunks),
> 4a...check if this data chunk already exists in DB, get ROW LOCK
> Query: select data_chunk_id from table where data_chunk_id = 123
> FOR UPDATE
> 4b...if null row returned, create data and insert data into table
> 4c...else use already retrieved row from table
> 4d...RELEASE SAVEPOINT A
> 5. Add up all the chunks, and create a JSON string for the client
> 6. COMMIT TRANSACTION
> 7. write JSON to client
>
> Another option
>
> Option (B)
> getData()
> 1...get JDBC connection (using default transaction isolation).
> 2. start transaction SERIALIZABLE
> 3. ==> SAVEPOINT A
> 4. for a bunch of data chunks (say 100 different chunks),
> 4a. always create data and insert data into table
> 4b...if unique-violation error, ROLLBACK TO SAVEPOINT A; else RELEASE
> SAVEPOINT A
> 5. Add up all the chunks, and create a JSON string for the client
> 6. COMMIT TRANSACTION
> 7. write JSON to client
>
> This does imply I need to add a uniqueness constraint to the data (which
> I can, although right now there are no constraints).
>
> Any feedback appreciated...
>
> Best,
> --j
>
>
> --
> Sent via pgsql-novice mailing list (pgsql-novice(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-novice
>
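Since you are on 9.3 (INSERT ... ON CONFLICT only arrived in 9.5), the per-chunk loop in your Option (B) would look roughly like this at the SQL level. This is only a sketch; it assumes a `chunks` table with a UNIQUE constraint on data_chunk_id and a hypothetical `payload` column:

```sql
-- Inside the surrounding transaction, once per chunk:
SAVEPOINT chunk_sp;
INSERT INTO chunks (data_chunk_id, payload) VALUES (123, '...');
-- On success: RELEASE SAVEPOINT chunk_sp; and move to the next chunk.
-- On unique_violation (SQLSTATE 23505), another thread won the race:
ROLLBACK TO SAVEPOINT chunk_sp;
SELECT payload FROM chunks WHERE data_chunk_id = 123;
```

Under the default READ COMMITTED isolation, the SELECT after the rollback will see the other transaction's committed row, so you should not need SERIALIZABLE here; the savepoint just keeps one failed insert from aborting the rest of your transaction.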

Next Message Hursh Jain 2015-01-07 20:01:42 Re: How can I prevent duplicate inserts by multiple concurrent threads ?
Previous Message Hursh Jain 2015-01-07 19:29:33 How can I prevent duplicate inserts by multiple concurrent threads ?