RE: Failed transaction statistics to measure the logical replication progress

From: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
To: "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "vignesh21(at)gmail(dot)com" <vignesh21(at)gmail(dot)com>, "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>, "sawada(dot)mshk(at)gmail(dot)com" <sawada(dot)mshk(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, 'Greg Nancarrow' <gregn4422(at)gmail(dot)com>
Subject: RE: Failed transaction statistics to measure the logical replication progress
Date: 2022-02-22 01:34:25
Message-ID: TYCPR01MB83739F01E478BED5EDECD0DFED3B9@TYCPR01MB8373.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tuesday, February 22, 2022 10:15 AM Tang, Haiying/唐 海英 <tanghy(dot)fnst(at)fujitsu(dot)com> wrote:
> On Mon, Feb 21, 2022 11:46 AM osumi(dot)takamichi(at)fujitsu(dot)com
> <osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
> >
> > On Saturday, February 19, 2022 12:00 AM osumi(dot)takamichi(at)fujitsu(dot)com
> > <osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
> > > On Friday, February 18, 2022 3:34 PM Tang, Haiying/唐 海英
> > > <tanghy(dot)fnst(at)fujitsu(dot)com> wrote:
> > > > On Wed, Jan 12, 2022 8:35 PM osumi(dot)takamichi(at)fujitsu(dot)com
> > > > <osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
> > > > 4) I noticed that the abort_count doesn't include aborted
> > > > streaming transactions.
> > > > Should we take this case into consideration?
> > > Hmm, we can add this into this column, when there's no objection.
> > > I'm not sure but someone might say those should be separate columns.
> > I've addressed this point in a new v23 patch, since there was no
> > opinion on this so far.
> >
> > Kindly have a look at the attached one.
> >
>
> Thanks for updating the patch.
>
> I found a problem when using it. When a replication workers exits, the
> transaction stats should be sent to stats collector if they were not sent before
> because it didn't reach PGSTAT_STAT_INTERVAL. But I saw that the stats
> weren't updated as expected.
>
> I looked into it and found that the replication worker would send the transaction
> stats (if any) before it exits. But it got invalid subid in
> pgstat_send_subworker_xact_stats(), which led to the following result:
>
> postgres=# select pg_stat_get_subscription_worker(0, null);
> pg_stat_get_subscription_worker
> ---------------------------------
> (0,,2,0,0,,,,0,"",)
> (1 row)
>
> I think that's because subid has already been cleaned when trying to send the
> stats. I printed the value of before_shmem_exit_list, the functions in this list
> would be called in shmem_exit() when the worker exits.
> logicalrep_worker_onexit() would clean up the worker info (including subid),
> and
> pgstat_shutdown_hook() would send stats if any. logicalrep_worker_onexit()
> was called before calling pgstat_shutdown_hook().
>
> (gdb) p before_shmem_exit_list
> $1 = {{function = 0xa88f1e <pgstat_shutdown_hook>, arg = 0}, {function =
> 0xb619e7 <BeforeShmemExit_Files>, arg = 0}, {function = 0xb07b5c
> <ReplicationSlotShmemExit>, arg = 0}, {
> function = 0xabdd93 <logicalrep_worker_onexit>, arg = 0}, {function =
> 0xe30c89 <ShutdownPostgres>, arg = 0}, {function = 0x0, arg = 0} <repeats 15
> times>}
>
> Maybe we should make some modification to fix it.
Thank you for letting me know this issue.
I'll investigate this and will report the result.

Best Regards,
Takamichi Osumi

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-02-22 02:04:16 Re: shared_preload_libraries = 'pg_stat_statements' failing with installcheck (compute_query_id = auto)
Previous Message Nathan Bossart 2022-02-22 01:19:48 remove more archiving overhead