From: | Andres Freund <andres(at)2ndquadrant(dot)com> |
---|---|
To: | Hannu Krosing <hannu(at)2ndQuadrant(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: logical changeset generation v6.4 |
Date: | 2013-10-21 15:06:02 |
Message-ID: | 20131021150602.GD2968@awork2.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2013-10-21 16:40:43 +0200, Hannu Krosing wrote:
> On 10/18/2013 08:50 PM, Andres Freund wrote:
> > On 2013-10-18 08:11:29 -0400, Robert Haas wrote:
> ...
> >> 2. If that seems too complicated, how about just logging the whole old
> >> tuple for version 1?
> > I think that'd make the patch much less useful because it bloats WAL
> > unnecessarily for the primary user (replication) of it. I'd rather go
> > for primary keys only if that proves to be the contentious point.
> >
> > How about modifying the selection to go from:
> > * all rows if ALTER TABLE ... REPLICA IDENTITY NOTHING|FULL;
> > * index chosen by ALTER TABLE ... REPLICA IDENTITY USING indexname
> > * [later, maybe] ALTER TABLE ... REPLICA IDENTITY (cola, colb)
> > * primary key
> > * candidate key with the smallest oid
> >
> > Including the candidate key will help people using changeset extration
> > for auditing that do not have primary key. That really isn't an
> > infrequent usecase.
> As I understand it for a table with *no* unique index,
> the "candidate key" is the full tuple, so if we get an UPDATE for
> it then this should be replicated as
> "UPDATE first row matching (NOT DISTINCT FROM) all columns"
> which on replay side will be equivalent to
> CREATE CURSOR ...; FETCH 1 ...; UPDATE ... WHERE CURRENT...'
No, it's not a candidate key since it's not uniquely identifying a
row. You can play tricks as you describe, but that still doesn't make
the whole row a candidate key.
But anyway, I suggest allowing for logging all columns above...
> I know that this will slow down replication, as you can not use direct
> index updates internally - at least not easily - but need to let postgreSQL
> actually plan this, but such single row update is no faster on origin
> either.
That's not actually true. Consider somebody doing something like:
UPDATE big_table_without_indexes SET column = ...;
On the source side that's essentialy O(n). If you replicate on a
row-by-row basis it will be O(n^2) on the replay side.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2013-10-21 15:10:09 | Re: Commitfest II CLosed |
Previous Message | Hannu Krosing | 2013-10-21 14:45:01 | Re: Commitfest II CLosed |