From: | "Marc Mamin" <M(dot)Mamin(at)intershop(dot)de> |
---|---|
To: | "Robert Klemme" <shortcutter(at)googlemail(dot)com>, "Marti Raudsepp" <marti(at)juffo(dot)org> |
Cc: | <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: Postgres for a "data warehouse", 5-10 TB |
Date: | 2011-09-13 17:54:19 |
Message-ID: | C4DAC901169B624F933534A26ED7DF3103E91769@JENMAIL01.ad.intershop.net |
Lists: pgsql-performance
Hi,
> (see attachment)
under high concurrency you may expect that your data is often already in the table.
In that case it is better to do nothing at all:
begin
    -- "test" is a boolean variable declared in the enclosing function:
    -- null if no row exists yet, true if the stored value already matches
    select dat = a_dat into test from t where id = a_id;
    if test is null then
        begin
            insert into t (id, dat) values (a_id, a_dat);
        exception
            when unique_violation then
                -- a concurrent transaction inserted the row first
                update t set dat = a_dat where id = a_id and dat <> a_dat;
                return 0;
        end;
    elsif not test then
        update t set dat = a_dat where id = a_id;
        return 0;
    end if;
    return 1;
end;
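
For reference, here is the same logic as a minimal, self-contained PL/pgSQL function; the function name upsert_t and the table definition t(id integer primary key, dat integer) are only assumptions for illustration:

-- hypothetical wrapper; adjust table, column and parameter types to your schema
create or replace function upsert_t(a_id integer, a_dat integer) returns integer as $$
declare
    test boolean;
begin
    select dat = a_dat into test from t where id = a_id;
    if test is null then
        begin
            insert into t (id, dat) values (a_id, a_dat);
        exception
            when unique_violation then
                update t set dat = a_dat where id = a_id and dat <> a_dat;
                return 0;
        end;
    elsif not test then
        update t set dat = a_dat where id = a_id;
        return 0;
    end if;
    return 1;   -- 1 = new row inserted, 0 = row already existed
end;
$$ language plpgsql;

-- example call:
-- select upsert_t(42, 7);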
best regards,
Marc Mamin
-----Original Message-----
From: pgsql-performance-owner(at)postgresql(dot)org on behalf of Robert Klemme
Sent: Tue 9/13/2011 6:34
To: Marti Raudsepp
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] Postgres for a "data warehouse", 5-10 TB
On Tue, Sep 13, 2011 at 5:13 PM, Marti Raudsepp <marti(at)juffo(dot)org> wrote:
> On Tue, Sep 13, 2011 at 00:26, Robert Klemme <shortcutter(at)googlemail(dot)com> wrote:
>> In the case of PG this particular example will work:
>> 1. TX1 inserts new PK row
>> 2. TX2 tries to insert same PK row => blocks
>> 3. TX1 commits
>> 4. TX2 fails with PK violation
>> 5. TX2 does the update (if the error is caught)
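
Spelled out as two concurrent psql sessions, as a sketch only (the table t(id, dat) and the literal values are assumptions):

-- session A
begin;
insert into t (id, dat) values (1, 10);

-- session B (concurrently)
begin;
savepoint before_insert;
insert into t (id, dat) values (1, 20);    -- blocks, waiting for session A

-- session A
commit;

-- session B now receives:
--   ERROR:  duplicate key value violates unique constraint "t_pkey"
rollback to savepoint before_insert;       -- recover from the error
update t set dat = 20 where id = 1;        -- and do the update instead
commit;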
>
> That goes against the point I was making in my earlier comment. In
> order to implement this error-catching logic, you'll have to allocate
> a new subtransaction (transaction ID) for EVERY ROW you insert.
I don't think so. You only need to catch the error (see attachment).
Or does this create a subtransaction?
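
In PL/pgSQL it does: each entry into a begin ... exception block starts a subtransaction, so a per-row exception handler costs roughly one extra transaction ID per inserted row. A minimal sketch of that per-row pattern (the staging table and the load_rows name are hypothetical):

-- hypothetical per-row load loop
create or replace function load_rows() returns void as $$
declare
    r record;
begin
    for r in select id, dat from staging loop
        begin                                   -- starts a subtransaction for this row
            insert into t (id, dat) values (r.id, r.dat);
        exception
            when unique_violation then
                update t set dat = r.dat where id = r.id;
        end;                                    -- subtransaction committed or rolled back
    end loop;
end;
$$ language plpgsql;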
> If
> you're going to be loading billions of rows this way, you will invoke
> the wrath of the "vacuum freeze" process, which will seq-scan all
> older tables and re-write every row that it hasn't touched yet. You'll
> survive it if your database is a few GB in size, but in the terabyte
> land that's unacceptable. Transaction IDs are a scarce resource there.
Certainly. But it's not needed as far as I can see.
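
As a side note, transaction ID consumption can be watched with the age() function on the stock system catalogs, e.g.:

-- how far each database is from its last "freeze" horizon
select datname, age(datfrozenxid) as xid_age
from pg_database
order by xid_age desc;

-- the tables an anti-wraparound vacuum will have to scan first
select relname, age(relfrozenxid) as xid_age
from pg_class
where relkind = 'r'
order by xid_age desc
limit 10;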
> In addition, such blocking will limit the parallelism you will get
> from multiple inserters.
Yes, I mentioned the speed issue. But regardless of the solution for
MySQL's "INSERT .. ON DUPLICATE KEY UPDATE" which Igor mentioned, you
will have the locking problem anyhow if you plan to insert
concurrently into the same table and be robust.
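
(For comparison, PostgreSQL 9.5 and later provide a native equivalent of that MySQL clause, not yet available at the time of this thread; a one-statement sketch against the same assumed t(id, dat) table:

insert into t (id, dat) values (1, 10)
on conflict (id) do update
set dat = excluded.dat
where t.dat is distinct from excluded.dat;
)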
Kind regards
robert
--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/