Re: row filtering for logical replication

From: "Euler Taveira" <euler(at)eulerto(dot)com>
To: "Amit Kapila" <amit(dot)kapila16(at)gmail(dot)com>, "Greg Nancarrow" <gregn4422(at)gmail(dot)com>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "vignesh C" <vignesh21(at)gmail(dot)com>, "Peter Smith" <smithpb2250(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "Ajin Cherian" <itsajin(at)gmail(dot)com>, "Dilip Kumar" <dilipbalaut(at)gmail(dot)com>, "Rahila Syed" <rahilasyed90(at)gmail(dot)com>, "Peter Eisentraut" <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, "Michael Paquier" <michael(at)paquier(dot)xyz>, "David Steele" <david(at)pgmasters(dot)net>, "Craig Ringer" <craig(at)2ndquadrant(dot)com>, "Amit Langote" <amitlangote09(at)gmail(dot)com>, "PostgreSQL Hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-11-29 15:10:09
Message-ID: ca8d270d-f930-4d15-9f24-60f95b364173@www.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 29, 2021, at 7:11 AM, Amit Kapila wrote:
> I don't think it is a good idea to combine the row-filter from the
> publication that publishes just 'insert' with the row-filter that
> publishes 'updates'. We shouldn't apply the 'insert' filter for
> 'update' and similarly for publication operations. We can combine the
> filters when the published operations are the same. So, this means
> that we might need to cache multiple row-filters but I think that is
> better than having another restriction that publish operation 'insert'
> should also honor RI columns restriction.
That's exactly what I meant to say but apparently I didn't explain in details.
If a subscriber has multiple publications and a table is part of these
publications with different row filters, it should check the publication action
*before* including it in the row filter list. It means that an UPDATE operation
cannot apply a row filter that is part of a publication that has only INSERT as
an action. Having said that we cannot always combine multiple row filter
expressions into one. Instead, it should cache individual row filter expression
and apply the OR during the row filter execution (as I did in the initial
patches before this caching stuff). The other idea is to have multiple caches
for each action. The main disadvantage of this approach is to create 4x
entries.

I'm experimenting the first approach that stores multiple row filters and its
publication action right now. Unfortunately we cannot use the
relentry->pubactions because it aggregates this information if you have
multiple entries. It seems a separate array should store this information that
will be used later while evaluating the row filter -- around
pgoutput_row_filter_exec_expr() call.

--
Euler Taveira
EDB https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2021-11-29 15:10:59 Re: Enforce work_mem per worker
Previous Message Robert Haas 2021-11-29 15:05:28 Re: Catching missing Datum conversions