Re: pg_get_publication_tables() output duplicate relid

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Amit Langote <amitlangote09(at)gmail(dot)com>
Subject: Re: pg_get_publication_tables() output duplicate relid
Date: 2021-11-23 03:20:57
Message-ID: CAA4eK1KqzL8h74e5G1o+_NVMfEYyFKy9s0CzCj-C4v_pKork6A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 23, 2021 at 7:28 AM houzj(dot)fnst(at)fujitsu(dot)com
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Mon, Nov 22, 2021 6:48 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > On Mon, Nov 22, 2021 at 1:45 PM Amit Langote <amitlangote09(at)gmail(dot)com>
> > wrote:
> > >
> > > On Sat, Nov 20, 2021 at 8:31 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > wrote:
> > > > On Fri, Nov 19, 2021 at 10:58 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > wrote:
> > > > > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote
> > <amitlangote09(at)gmail(dot)com> wrote:
> > > > > > The problematic case is attaching the partition *after* the
> > > > > > subscriber has already marked the root parent as synced and/or
> > > > > > ready for replication. Refreshing the subscription doesn't help
> > > > > > it discover the newly attached partition, because a
> > > > > > publish_via_partition_root only ever tells about the root
> > > > > > parent, which would be already synced, so the subscriber would think
> > > > > > there's nothing to copy.
> > > > >
> > > > > Okay, I see this could be a problem but I haven't tried to reproduce it.
> > > > >
> > > > > > > Anyway, if this is a problem
> > > > > > > we need to figure the solution for this separately.
> > > > > >
> > > > > > Sure, we might need to do that after all. Though it might be a
> > > > > > good idea to be sure that we won't need to reconsider the fix we
> > > > > > push for the issue(s) being discussed here and elsewhere,
> > > > > > because I suspect that the solution to the problem I mentioned
> > > > > > is likely to involve tweaking pg_publication_tables view output.
> > > >
> > > > I have thought about this problem and I see two possibilities for a
> > > > solution (a) We could provide a new option say 'truncate' (something
> > > > on lines proposed here [1]) which would truncate the table(s) and
> > > > change its status to 'i' in the pg_subscription_rel, this would
> > > > allow the newly added partition to be synced after refresh. This
> > > > could lead to a large copy in such a case.
> > >
> > > Maybe I am missing something about the proposal, though I'd think a
> > > more automatic solution would be better, something that doesn't need
> > > to rely on an unrelated feature.
> > >
> >
> > Agreed, this was more of a workaround for users if we didn't get any automatic
> > solution.
> >
> > > > (b) We could somehow get and store all the partition info from the
> > > > publisher-side on the subscriber-side while initial sync (say in new
> > > > system table pg_subscription_rel_members). Now, after the refresh,
> > > > if this list changes, we can allow to just get the data of that
> > > > particular partition but I guess it would mean that we need to store
> > > > oids of the publisher which might or might not be safe considering
> > > > oids can wraparound before the refresh.
> > > >
> > > > Do you have any other ideas?
> > >
> > > I thought that the idea I had earlier mentioned at [1] may be useful,
> > > which I can see is similar to your idea (b). I also suspect that it
> > > can be implemented without needing a separate catalog and storing
> > > publication-side relation OIDs in the subscription-side catalog,
> > > though maybe I haven't thought hard enough.
> > >
> >
> > The problem with storing this info in pg_subscription_rel as you were
> > describing in your proposal is that currently, we ensure that the same table
> > exists in subscriber and then store the subscriber side table id in that catalog. I
> > am not sure if we can store publisher-side oids in that catalog and if we store
> > then it would be confusing as now it will have info of both publisher-side oids
> > and subscriber-side oids. Isn't that a problem with this approach?
> >
> > One more problem with this overall approach is the risk of OID wraparound.
> > Say between Create Subscription and Alter Subscription ..
> > Refresh, we detach one partition, the oid wraparounds, and we create/reattach
> > another partition which gets the same oid as for the partition which we
> > detached earlier then we won't be able to identify the new partition even after
> > refresh.
>
> If we use this approach, I think maybe we can store the publisher-side's table
> name instead of oid. Because for non-partitioned table, it's possible that user
> drop and create a new table with the same name in which case the oid would be
> changed. And in this case, the existing behavior wouldn't sync the table again.
> So I think it might be better to make the partitioned table's behavior
> consistent with the non-partitioned table which only sync the table's data when
> detect new table name.
>

I think firstly you need to then also store schema info. In the case
of non-partitioned tables, it is derived differently because we remove
the table's entry from pg_publication_rel, so the data won't be sent
to subscriber till the new table is added to the publication. In the
case of partitioned table (especially for the case of
publish_via_partition_root) we won't sync initial data in such a case
(dropped and re-created the partition with the same name) both with
and without your solution but the future data will be streamed.

Another point related to the Attach problem is what if such a
partition before was also part of publication separately, then if we
copy its initial data then that data copy might be considered a
duplicate. Isn't it better to document this case and explain what
users can expect instead of trying to design a solution around this?
Even if we do so the streaming after attach partition problem as
discussed above should be fixed.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2021-11-23 04:00:34 Re: Sequence's value can be rollback after a crashed recovery.
Previous Message Amit Kapila 2021-11-23 02:42:18 Re: rename SnapBuild* macros in slot.c