From: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, shiy(dot)fnst(at)fujitsu(dot)com |
Subject: | Re: Column Filtering in Logical Replication |
Date: | 2022-03-09 13:53:12 |
Message-ID: | 7ce6a43a-830a-931f-d311-a5bfd444b3ae@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 3/9/22 11:03, Amit Kapila wrote:
> ...
>> Hmm, yeah. That seems like a genuine problem - it should not depend on
>> the order of publications in the subscription, I guess.
>>
>> But is it an issue in the patch? Isn't that a pre-existing issue? AFAICS
>> the problem is that we initialize publish_as_relid=relid before the loop
>> over publications, and then just update it. So the first iteration
>> starts with relid, but the second iteration ends with whatever value is
>> set by the first iteration (e.g. the root).
>>
>> So with the example you posted, we start with
>>
>> publish_as_relid = relid = test_part1
>>
>> but then if the first publication is pubviaroot=true, we update it to
>> parent. And in the second iteration, we fail to find the column filter,
>> because "parent" (publish_as_relid) is not part of the pub2.
>>
>> If we do it in the other order, we leave the publish_as_relid value as
>> is (and find the filter), and then update it in the second iteration
>> (and find the column filter too).
>>
>> Now, this can be resolved by re-calculating the publish_as_relid from
>> scratch in each iteration (start with relid, then maybe update it). But
>> that's just half the story - the issue is there even without column
>> filters. Consider this example:
>>
>> create table t (a int, b int, c int) partition by range (a);
>>
>> create table t_1 partition of t for values from (1) to (10)
>> partition by range (a);
>>
>> create table t_2 partition of t_1 for values from (1) to (10);
>>
>> create publication pub1 for table t(a)
>> with (PUBLISH_VIA_PARTITION_ROOT);
>>
>> create publication pub2 for table t_1(a)
>> with (PUBLISH_VIA_PARTITION_ROOT);
>>
>>
>> Now, is you change subscribe to "pub1, pub2" and "pub2, pub1", we'll end
>> up with different publish_as_relid values (t or t_1). Which seems like
>> the same ambiguity issue.
>>
>
> I think we should fix this existing problem by always using the
> top-most table as publish_as_relid. Basically, we can check, if the
> existing publish_as_relid is an ancestor of a new rel that is going to
> replace it then we shouldn't replace it.
Right, using the topmost relid from all publications seems like the
correct solution.
> However, I think even if we
> fix the existing problem, we will still have the order problem for the
> column filter patch, and to avoid that instead of fetching column
> filters in the publication loop, we should use the final
> publish_as_relid. I think it will have another problem as well if we
> don't use final publish_as_relid which is that sometimes when we
> should not use any filter (say when pubviaroot is true and that
> publication has root partitioned table which has no filter) as per our
> rule of filters for a partitioned table, it can still use some filter
> from the non-root table.
>
Yeah, the current behavior is just a consequence of how we determine
publish_as_relid now. If we rework that, we should first determine the
relid and then fetch the filter only for that single rel.
>>
>>> *
>>> Fetching column filter info in tablesync.c is quite expensive. It
>>> seems to be using four round-trips to get the complete info whereas
>>> for row-filter we use just one round trip. I think we should try to
>>> get both row filter and column filter info in just one round trip.
>>>
>>
>> Maybe, but I really don't think this is an issue.
>>
>
> I am not sure but it might matter for small tables. Leaving aside the
> performance issue, I think the current way will get the wrong column
> list in many cases: (a) The ALL TABLES IN SCHEMA case handling won't
> work for partitioned tables when the partitioned table is part of one
> schema and partition table is part of another schema. (b) The handling
> of partition tables in other cases will fetch incorrect lists as it
> tries to fetch the column list of all the partitions in the hierarchy.
>
> One of my colleagues has even tested these cases both for column
> filters and row filters and we find the behavior of row filter is okay
> whereas for column filter it uses the wrong column list. We will share
> the tests and results with you in a later email. We are trying to
> unify the column filter queries with row filter to make their behavior
> the same and will share the findings once it is done. I hope if we are
> able to achieve this that we will reduce the chances of bugs in this
> area.
>
OK, I'll take a look at that email.
> Note: I think the first two patches for tests are not required after
> commit ceb57afd3c.
>
Right. Will remove.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Daniel Westermann (DWE) | 2022-03-09 14:15:53 | Re: Changing "Hot Standby" to "hot standby" |
Previous Message | Ashutosh Sharma | 2022-03-09 13:37:32 | Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs) |