From: | Hannu Krosing <hannu(at)2ndQuadrant(dot)com> |
---|---|
To: | Andres Freund <andres(at)2ndquadrant(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: logical changeset generation v6.4 |
Date: | 2013-10-21 15:26:47 |
Message-ID: | 526547B7.5030009@2ndQuadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 10/21/2013 05:06 PM, Andres Freund wrote:
> On 2013-10-21 16:40:43 +0200, Hannu Krosing wrote:
>> On 10/18/2013 08:50 PM, Andres Freund wrote:
>>> On 2013-10-18 08:11:29 -0400, Robert Haas wrote:
>> ...
>>>> 2. If that seems too complicated, how about just logging the whole old
>>>> tuple for version 1?
>>> I think that'd make the patch much less useful because it bloats WAL
>>> unnecessarily for the primary user (replication) of it. I'd rather go
>>> for primary keys only if that proves to be the contentious point.
>>>
>>> How about modifying the selection to go from:
>>> * all rows if ALTER TABLE ... REPLICA IDENTITY NOTHING|FULL;
>>> * index chosen by ALTER TABLE ... REPLICA IDENTITY USING indexname
>>> * [later, maybe] ALTER TABLE ... REPLICA IDENTITY (cola, colb)
>>> * primary key
>>> * candidate key with the smallest oid
>>>
>>> Including the candidate key will help people using changeset extration
>>> for auditing that do not have primary key. That really isn't an
>>> infrequent usecase.
>> As I understand it for a table with *no* unique index,
>> the "candidate key" is the full tuple, so if we get an UPDATE for
>> it then this should be replicated as
>> "UPDATE first row matching (NOT DISTINCT FROM) all columns"
>> which on replay side will be equivalent to
>> CREATE CURSOR ...; FETCH 1 ...; UPDATE ... WHERE CURRENT...'
> No, it's not a candidate key since it's not uniquely identifying a
> row. You can play tricks as you describe, but that still doesn't make
> the whole row a candidate key.
>
> But anyway, I suggest allowing for logging all columns above...
I the "all columns" option this ?
How about modifying the selection to go from:
* all rows if ALTER TABLE ... REPLICA IDENTITY NOTHING|FULL;
for some reason I thought it to be option to either log or not log PK column ...
>
>> I know that this will slow down replication, as you can not use direct
>> index updates internally - at least not easily - but need to let postgreSQL
>> actually plan this, but such single row update is no faster on origin
>> either.
> That's not actually true. Consider somebody doing something like:
> UPDATE big_table_without_indexes SET column = ...;
> On the source side that's essentialy O(n). If you replicate on a
> row-by-row basis it will be O(n^2) on the replay side.
Probably more like O(n^2 / 2) but yes, this is what I meant with the
sentence
after that ;)
Cheers
--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
From | Date | Subject | |
---|---|---|---|
Next Message | Hannu Krosing | 2013-10-21 15:46:05 | Re: Commitfest II CLosed |
Previous Message | Andres Freund | 2013-10-21 15:20:26 | Re: logical changeset generation v6.4 |