Re: bogus: logical replication rows/cols combinations

From:	Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To:	Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: bogus: logical replication rows/cols combinations
Date:	2022-05-03 19:40:04
Message-ID:	a7a5e79f-053a-cff8-6b2c-f7ac4bab9920@enterprisedb.com
Lists:	pgsql-hackers

On 5/2/22 22:34, Peter Eisentraut wrote:
> On 01.05.22 23:42, Tomas Vondra wrote:
>> Imagine you have a table with customers from different regions, and you
>> want to replicate the data somewhere else, but for some reason you can
>> only replicate details for one particular region, and a subset of
>> columns for everyone else. So you'd do something like this:
>>
>> CREATE PUBLICATION p1 FOR TABLE customers (... all columns ...)
>> WHERE region = 'USA';
>>
>> CREATE PUBLICATION p2 FOR TABLE customers (... subset of columns ...)
>> WHERE region != 'USA';
>>
>> I think ignoring the row filters and just merging the column lists makes
>> no sense for this use case.
>
> I'm thinking now the underlying problem is that we shouldn't combine
> column lists at all. Examples like the above where you want to redact
> values somehow are better addressed with something like triggers or an
> actual "column filter" that works dynamically or some other mechanism.
>

So what's wrong with merging the column lists as implemented in the v2
patch, posted a couple days ago?

I don't think triggers are a suitable alternative, as they execute on the
subscriber node. So you have to first copy the data to the remote node,
where it gets filtered. With column filters the data gets redacted on
the publisher.

> The main purpose, in my mind, of column lists is if the tables
> statically have different shapes on publisher and subscriber. Perhaps
> for space reasons or regulatory reasons you don't want to replicate
> everything. But then it doesn't make sense to combine column lists. If
> you decide over here that the subscriber table has this shape and over
> there that the subscriber table has that other shape, then the
> combination of the two will be a table that has neither shape and so
> will not work for anything.
>

Yeah. If we intend to use column lists only to adapt to a different
schema on the subscriber node, then maybe it'd be fine to not merge
column lists. It'd probably be reasonable to allow at least cases with
multiple publications using the same column list, though. In that case
there's no ambiguity.
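
To make the unambiguous case concrete, here is a minimal sketch. The
column names (id, name, region), the publication name p2 and the
connection string are assumptions for illustration only. It also assumes
id is the replica identity and that region is covered by it if updates
and deletes are published.

```sql
-- Publisher: both publications use the same (hypothetical) column list,
-- so combining them cannot produce a table of mixed shape.
CREATE PUBLICATION p1 FOR TABLE customers (id, name, region)
    WHERE (region = 'USA');

CREATE PUBLICATION p2 FOR TABLE customers (id, name, region)
    WHERE (region != 'USA');

-- Subscriber: one subscription to both publications.
-- The connection string is a placeholder.
CREATE SUBSCRIPTION s1
    CONNECTION 'host=publisher.example dbname=appdb'
    PUBLICATION p1, p2;
```

Only the row filters differ here, so the question of how to merge
column lists never comes up.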

> I think in general we should be much more restrictive in how we combine
> publications. Unless we are really sure it makes sense, we should
> disallow it. Users can always make a new publication with different
> settings and subscribe to that directly.

I agree with that in principle - correct first, flexibility second. If
the behavior is not correct, it doesn't matter how flexible it is.

I still think the data redaction use case is valid/interesting, but if
we want to impose some restrictions I'm OK with that, as long as it's
done in a way that we can relax in the future to allow that use case
(that is, without introducing any incompatibilities).

However, what's the definition of "correctness" in this context? Without
that it's hard to say if the restrictions make the behavior any more
correct. It'd be unfortunate to impose restrictions, which will prevent
some use cases, only to discover we haven't actually made it correct.

For example, is it enough to restrict column lists, or does it need to
restrict e.g. row filters too? And does it need to consider other stuff,
like publications replicating different actions?

For example, if we allow different column lists (or row filters) for
different actions (one publication for insert, another one for update),
we still have the strange behavior described before.

And if we force users to use separate subscriptions, I'm not sure that
really improves the situation for users who actually need that. They'll
do that, and aside from all the problems they'll also face issues with
timing between the two concurrent subscriptions, having to decode stuff
multiple times, etc.
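
As a rough sketch of that workaround, assume two publications p1 and p2
already exist on the publisher. The subscription names and the
connection string below are hypothetical.

```sql
-- Subscriber: one subscription per publication instead of combining them.
-- Each subscription gets its own replication slot on the publisher, so the
-- same changes are decoded once per subscription, and the two apply
-- streams progress independently of each other.
CREATE SUBSCRIPTION s_usa
    CONNECTION 'host=publisher.example dbname=appdb'
    PUBLICATION p1;

CREATE SUBSCRIPTION s_other
    CONNECTION 'host=publisher.example dbname=appdb'
    PUBLICATION p2;
```

Nothing orders the two apply streams relative to each other, which is
where the timing issues come from.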

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Re: bogus: logical replication rows/cols combinations at 2022-05-02 20:34:48 from Peter Eisentraut

Responses

Re: bogus: logical replication rows/cols combinations at 2022-05-04 13:56:13 from Peter Eisentraut