Quick Links

How should I deal with disconnects during insert?

From:	Sergey Samokhin <prikrutil(at)gmail(dot)com>
To:	pgsql-general(at)postgresql(dot)org
Subject:	How should I deal with disconnects during insert?
Date:	2009-05-20 20:41:33
Message-ID:	e42595410905201341k4ab2a4d2nd19d445775b75dc4@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

Hello!

Problem I'm talking about in this letter is related to how we deal
with extremely big amounts of data to be inserted into DB. Recently I
switched to using buffers.

The best approach to make a buffer I know (and the only I can use with
my driver) is to create a long query (possibly made of some
statements) like:

"INSERT INTO test VALUES (1, 2), (3, 4) ... (N, M); INSERT INTO test
VALUES (7, 8), (9, 10) ... (N2, M2);"

...and then pass it to query() function your driver has (no matter
what driver you use, it should be quite fast).

But what will happen if the connection to PostgreSQL is lost during
the execution?

There are two different things which may happen with such a buffer:

1) I've found that once PostgreSQL has started writing, connection can
safely be lost without affecting the data passed. In this case all the
data will be inserted, because PostgreSQL has the whole query to
process.

2) Data can't be inserted in the case the connection is lost before
PostgreSQL has started such a writing (i.e. during transmission of the
query).

Unfortunatelly, my driver can't say me at which stage the connection
has been lost, so I don't know should I insert it again or not after
reconnect.

Way #1. One of the ways to solve this problem is to guarantee that
query I execute fails completely when connection is lost no matter at
which stage. In this case I can safely insert a buffer after reconnect
without any checks. Is there a way to guarantee that?

Way #2. Another way I see is to have additional table containing
values that show which buffers were inserted successfully. For
example, the last statement in my buffer could be:

INSERT INTO insert_history VALUES (<BUFFER_ID>, <SUCCESSFULLY_INSERTED_FLAG>)

After reconnect, I can check if buffer I'm dealing with has been
inserted or not.

Which way do you prefer and why?

Is there a way to do #1 (it's a bit cleaner and simpler to implement)?

Thanks.

--
Sergey Samokhin

Responses

Re: How should I deal with disconnects during insert? at 2009-05-20 20:45:54 from David Wilson
Re: How should I deal with disconnects during insert? at 2009-05-20 22:18:31 from Sam Mason
Re: How should I deal with disconnects during insert? at 2009-05-26 12:34:40 from Dimitri Fontaine

Browse pgsql-general by date

	From	Date	Subject
Next Message	David Wilson	2009-05-20 20:45:54	Re: How should I deal with disconnects during insert?
Previous Message	Merlin Moncure	2009-05-20 20:01:53	Re: HOT question - insert/delete