From: | "Euler Taveira" <euler(at)eulerto(dot)com> |
---|---|
To: | "Rahila Syed" <rahilasyed90(at)gmail(dot)com> |
Cc: | Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, "Michael Paquier" <michael(at)paquier(dot)xyz>, "David Steele" <david(at)pgmasters(dot)net>, "Craig Ringer" <craig(at)2ndquadrant(dot)com>, "Tomas Vondra" <tomas(dot)vondra(at)2ndquadrant(dot)com>, "Amit Langote" <amitlangote09(at)gmail(dot)com>, "PostgreSQL Hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: row filtering for logical replication |
Date: | 2021-03-22 02:15:10 |
Message-ID: | 7e4b5c48-a3d3-4b7e-b455-2d50c60a0ab7@www.fastmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Mar 18, 2021, at 7:51 AM, Rahila Syed wrote:
> 1.
> I think the docs are being incorrectly updated to add a column to pg_partitioned_table
> instead of pg_publication_rel.
Good catch.
> 2. +typedef struct PublicationRelationQual
> +{
> + Oid relid;
> + Relation relation;
> + Node *whereClause;
> +} PublicationRelationQual;
>
> Can this be given a more generic name like PublicationRelationInfo, so that the same struct
> can be used to store additional relation information in future, for ex. column names, if column filtering is introduced.
Good idea. I rename it and it'll be in this next patch set.
> 3. Also, in the above structure, it seems that we can do with storing just relid and derive relation information from it
> using table_open when needed. Am I missing something?
We need the Relation. See OpenTableList(). The way this code is organized, it
opens all publication tables and append each Relation to a list. This list is
used in PublicationAddTables() to update the catalog. I tried to minimize the
number of refactors while introducing this feature. We could probably revise
this code in the future (someone said in a previous discussion that it is weird
to open relations in one source code file -- publicationcmds.c -- and use it
into another one -- pg_publication.c).
> 4. Currently in logical replication, I noticed that an UPDATE is being applied on the subscriber even if the column values
> are unchanged. Can row-filtering feature be used to change it such that, when all the OLD.columns = NEW.columns, filter out
> the row from being sent to the subscriber. I understand this would need REPLICA IDENTITY FULL to work, but would be an
> improvement from the existing state.
This is how Postgres works.
postgres=# create table foo (a integer, b integer);
CREATE TABLE
postgres=# insert into foo values(1, 100);
INSERT 0 1
postgres=# select ctid, xmin, xmax, a, b from foo;
ctid | xmin | xmax | a | b
-------+--------+------+---+-----
(0,1) | 488920 | 0 | 1 | 100
(1 row)
postgres=# update foo set b = 101 where a = 1;
UPDATE 1
postgres=# select ctid, xmin, xmax, a, b from foo;
ctid | xmin | xmax | a | b
-------+--------+------+---+-----
(0,2) | 488921 | 0 | 1 | 101
(1 row)
postgres=# update foo set b = 101 where a = 1;
UPDATE 1
postgres=# select ctid, xmin, xmax, a, b from foo;
ctid | xmin | xmax | a | b
-------+--------+------+---+-----
(0,3) | 488922 | 0 | 1 | 101
(1 row)
You could probably abuse this feature and skip some UPDATEs when old tuple is
identical to new tuple. The question is: why would someone issue the same
command multiple times? A broken application? I would say: don't do it. Besides
that, this feature could impose an overhead into a code path that already
consume substantial CPU time. I've seen some tables with RIF and dozens of
columns that would certainly contribute to increase the replication lag.
> 5. Currently, any existing rows that were not replicated, when updated to match the publication quals
> using UPDATE tab SET pub_qual_column = 'not_filtered' where a = 1; won't be applied, as row
> does not exist on the subscriber. It would be good if ALTER SUBSCRIBER REFRESH PUBLICATION
> would help fetch such existing rows from publishers that match the qual now(either because the row changed
> or the qual changed)
I see. This should be addressed by a resynchronize feature. Such option is
useful when you have to change the row filter. It should certainly be implement
as an ALTER SUBSCRIPTION subcommand.
I attached a new patch set that addresses:
* fix documentation;
* rename PublicationRelationQual to PublicationRelationInfo;
* remove the memset that was leftover from a previous patch set;
* add new tests to improve coverage (INSERT/UPDATE/DELETE to exercise the row
filter code).
--
Euler Taveira
EDB https://www.enterprisedb.com/
Attachment | Content-Type | Size |
---|---|---|
v12-0001-Rename-a-WHERE-node.patch | text/x-patch | 1.7 KB |
v12-0002-Row-filter-for-logical-replication.patch | text/x-patch | 62.3 KB |
v12-0003-Print-publication-WHERE-condition-in-psql.patch | text/x-patch | 1.2 KB |
v12-0004-Publication-WHERE-condition-support-for-pg_dump.patch | text/x-patch | 2.9 KB |
v12-0005-Measure-row-filter-overhead.patch | text/x-patch | 1.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Fujii Masao | 2021-03-22 02:22:07 | Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump. |
Previous Message | Euler Taveira | 2021-03-22 01:58:14 | Re: row filtering for logical replication |