Re: Collect statistics about conflicts in logical replication

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Collect statistics about conflicts in logical replication
Date: 2024-08-28 06:13:15
Message-ID: CAJpy0uD9WE7iScjxTqcGAza+P6mM9vZ5F9TeU+uG0Pb=8bvhtA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 27, 2024 at 3:21 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Tuesday, August 27, 2024 10:59 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > ~~~
> >
> > 3.
> > +# Enable track_commit_timestamp to detect origin-differ conflicts in
> > +logical # replication. Reduce wal_retrieve_retry_interval to 1ms to
> > +accelerate the # restart of the logical replication worker after encountering a
> > conflict.
> > +$node_subscriber->append_conf(
> > + 'postgresql.conf', q{
> > +track_commit_timestamp = on
> > +wal_retrieve_retry_interval = 1ms
> > +});
> >
> > Later, after CDR resolvers are implemented, it might be good to revisit these
> > conflict test cases and re-write them to use some conflict resolvers like 'skip'.
> > Then the subscriber won't give ERRORs and restart apply workers all the time
> > behind the scenes, so you won't need the above configuration for accelerating
> > the worker restarts. In other words, running these tests might be more efficient
> > if you can avoid restarting workers all the time.
> >
> > I suggest putting an XXX comment here as a reminder that these tests should
> > be revisited to make use of conflict resolvers in the future.
>
> I think it would be too early to mention the resolution implementation detail
> in the comments considering that the resolution is still not RFC. Also, I think
> reducing wal_retrieve_retry_interval is a reasonable way to speed up the test
> in this case because the test is not letting the worker to restart all the time, the
> error causes the restart will be resolved immediately after the stats check. So, I
> think adding XXX is not very appropriate.
>
> Other comments look good to me.
> I slightly adjusted few words and merged the changes. Thanks !
>
> Here is V3 patch.
>

Thanks for the patch. Just thinking out loud, since we have names like
'apply_error_count', 'sync_error_count' which tells that they are
actually error-count, will it be better to have something similar in
conflict-count cases, like 'insert_exists_conflict_count',
'delete_missing_conflict_count' and so on. Thoughts?

I noticed that now we do mention this (as I suggested earlier):
+ Note that any conflict resulting in an apply error will be counted
in both apply_error_count and the corresponding conflict count.

But we do not mention clearly which ones are conflict-counts. As an
example, we have this:

+ insert_exists_count bigint:
+ Number of times a row insertion violated a NOT DEFERRABLE unique
constraint during the application of changes

It does not mention that it is a conflict count. So we need to either
change names or mention clearly against each that it is a conflict
count.

thanks
sHveta

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Maxim Orlov 2024-08-28 06:14:35 Re: Test 041_checkpoint_at_promote.pl faild in installcheck due to missing injection_points
Previous Message Amit Kapila 2024-08-28 06:06:19 Re: Pgoutput not capturing the generated columns