From: Robert Klemme <shortcutter(at)googlemail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres for a "data warehouse", 5-10 TB
Date: 2011-09-12 21:26:10
Message-ID: j4ltdo$r65$1@dough.gmane.org
Lists: pgsql-performance
On 12.09.2011 19:22, Andy Colson wrote:
> On 9/12/2011 12:15 PM, Robert Klemme wrote:
>> On 11.09.2011 19:02, Marti Raudsepp wrote:
>>> On Sun, Sep 11, 2011 at 17:23, Andy Colson <andy(at)squeakycode(dot)net> wrote:
>>>> On 09/11/2011 08:59 AM, Igor Chudov wrote:
>>>>> By the way, does that INSERT UPDATE functionality or something like
>>>>> this exist in Postgres?
>>>> You have two options:
>>>> 1) write a function like:
>>>> create function doinsert(_id integer, _value text) returns void as
>>>> 2) use two sql statements:
>>>
>>> Unfortunately both of these options have caveats. Depending on your
>>> I/O speed, you might need to use multiple loader threads to saturate
>>> the write bandwidth.
>>>
>>> However, neither option is safe from race conditions. If you need to
>>> load data from multiple threads at the same time, they won't see each
>>> other's inserts (until commit) and thus cause unique violations. If
>>> you could somehow partition their operation by some key, so threads
>>> are guaranteed not to conflict with each other, then that would be perfect.
>>> The 2nd option given by Andy is probably faster.
>>>
>>> You *could* code a race-condition-safe function, but that would be a
>>> no-go on a data warehouse, since each call needs a separate
>>> subtransaction which involves allocating a transaction ID.
>>
>> Wouldn't it be sufficient to reverse order for race condition safety?
>> Pseudo code:
>>
>> begin
>> insert ...
>> catch
>> update ...
>> if not found error
>> end
>>
>> Speed is another matter though...
> No, I don't think so. If you had two loaders, both would start a
> transaction, and then neither could see what the other was doing.
It depends. But the point is that at most one of the two INSERTs can
succeed. The one that fails will attempt the UPDATE and - depending on
isolation level and DB implementation - will be blocked or fail.
In the case of PG this particular example will work (TX1 and TX2 are two
concurrent transactions):

1. TX1 inserts the new PK row.
2. TX2 tries to insert the same PK row => blocks.
3. TX1 commits.
4. TX2 fails with a PK violation.
5. TX2 does the UPDATE (if the error is caught).
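A minimal PL/pgSQL sketch of this insert-first pattern (table and column
names here are made up for illustration; the pattern is essentially the
merge example from the PostgreSQL manual):

```sql
-- Assumes a table like:
--   CREATE TABLE data (id integer PRIMARY KEY, value text);
CREATE FUNCTION doinsert(_id integer, _value text) RETURNS void AS
$$
BEGIN
    -- Try the INSERT first. A concurrent inserter of the same key
    -- either blocks us until it commits/aborts, or we get the error.
    INSERT INTO data (id, value) VALUES (_id, _value);
EXCEPTION WHEN unique_violation THEN
    -- Another transaction won the race: fall back to UPDATE.
    UPDATE data SET value = _value WHERE id = _id;
    IF NOT FOUND THEN
        RAISE EXCEPTION 'row % vanished between INSERT and UPDATE', _id;
    END IF;
END;
$$ LANGUAGE plpgsql;
```

Note that the EXCEPTION block sets up a subtransaction on every call,
which is exactly the per-call cost Marti pointed out above.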
> There are
> transaction isolation levels, but they are like playing with fire. (in
> my opinion).
You make them sound like witchcraft. But they are clearly defined -
even standardized. Granted, different RDBMSs may implement them in
different ways - here's PG's view of TX isolation:
http://www.postgresql.org/docs/8.4/interactive/transaction-iso.html
In my opinion, anybody working with an RDBMS should become familiar
with this concept - at least know about it - because it is one of the
fundamental features of RDBMSs and certainly needs consideration in
applications with highly concurrent DB activity.
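For reference, the isolation level is chosen per transaction, so it is
not an all-or-nothing decision; a sketch:

```sql
-- PG's default is READ COMMITTED; raise it only where needed.
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- ... statements that must not see anomalies from concurrent TXs ...
COMMIT;
-- A conflicting concurrent transaction will fail with a serialization
-- error, which the application must catch and retry.
```

(Note that in 8.4, SERIALIZABLE actually provides snapshot isolation;
the stricter behavior depends on the PG version.)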
Kind regards
robert