From: | "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com> |
---|---|
To: | "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com> |
Cc: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "vignesh21(at)gmail(dot)com" <vignesh21(at)gmail(dot)com>, "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>, "sawada(dot)mshk(at)gmail(dot)com" <sawada(dot)mshk(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, 'Greg Nancarrow' <gregn4422(at)gmail(dot)com> |
Subject: | RE: Failed transaction statistics to measure the logical replication progress |
Date: | 2022-02-22 01:15:24 |
Message-ID: | OS0PR01MB61138EC4E18C020BE71A1CA1FB3B9@OS0PR01MB6113.jpnprd01.prod.outlook.com |
Lists: | pgsql-hackers |
On Mon, Feb 21, 2022 11:46 AM osumi(dot)takamichi(at)fujitsu(dot)com <osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
>
> On Saturday, February 19, 2022 12:00 AM osumi(dot)takamichi(at)fujitsu(dot)com
> <osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
> > On Friday, February 18, 2022 3:34 PM Tang, Haiying/唐 海英
> > <tanghy(dot)fnst(at)fujitsu(dot)com> wrote:
> > > On Wed, Jan 12, 2022 8:35 PM osumi(dot)takamichi(at)fujitsu(dot)com
> > > <osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
> > > 4) I noticed that the abort_count doesn't include aborted streaming
> > > transactions.
> > > Should we take this case into consideration?
> > Hmm, we can add this into this column, when there's no objection.
> > I'm not sure but someone might say those should be separate columns.
> I've addressed this point in a new v23 patch,
> since there was no opinion on this so far.
>
> Kindly have a look at the attached one.
>
Thanks for updating the patch.
I found a problem when using it. When a replication worker exits, any
transaction stats that were not sent earlier because PGSTAT_STAT_INTERVAL had
not yet elapsed should be sent to the stats collector. But I saw that the stats
weren't updated as expected.
I looked into it and found that the replication worker does try to send the
transaction stats (if any) before it exits, but it gets an invalid subid in
pgstat_send_subworker_xact_stats(), which leads to the following result:
postgres=# select pg_stat_get_subscription_worker(0, null);
pg_stat_get_subscription_worker
---------------------------------
(0,,2,0,0,,,,0,"",)
(1 row)
I think that's because subid has already been cleared by the time the worker
tries to send the stats. I printed the value of before_shmem_exit_list; the
functions in this list are called by shmem_exit() when the worker exits, in
reverse order of registration (last entry first).
logicalrep_worker_onexit() cleans up the worker info (including subid), and
pgstat_shutdown_hook() sends the stats, if any. So logicalrep_worker_onexit()
was called before pgstat_shutdown_hook().
(gdb) p before_shmem_exit_list
$1 = {{function = 0xa88f1e <pgstat_shutdown_hook>, arg = 0}, {function = 0xb619e7 <BeforeShmemExit_Files>, arg = 0}, {function = 0xb07b5c <ReplicationSlotShmemExit>, arg = 0}, {
function = 0xabdd93 <logicalrep_worker_onexit>, arg = 0}, {function = 0xe30c89 <ShutdownPostgres>, arg = 0}, {function = 0x0, arg = 0} <repeats 15 times>}
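To illustrate the ordering, here is a minimal standalone sketch (not PostgreSQL code) of a callback list that is run last-registered-first, the way shmem_exit() walks before_shmem_exit_list. All sim_* names and the subid value 16394 are made up for illustration, and only two of the five callbacks above are modeled.

```c
#include <assert.h>
#include <stdio.h>

#define SIM_MAX_ON_EXITS 20

typedef void (*sim_callback) (int arg);

struct sim_on_exit
{
	sim_callback function;
	int			arg;
};

static struct sim_on_exit sim_list[SIM_MAX_ON_EXITS];
static int	sim_index = 0;

/* Stand-in for the worker's shared info; 0 plays the role of an invalid subid. */
static unsigned int sim_worker_subid = 0;

/* The subid that the stats hook saw when it finally ran. */
static unsigned int sim_reported_subid = 0;

/* Models pgstat_shutdown_hook(): reports stats using the current subid. */
static void
sim_pgstat_shutdown_hook(int arg)
{
	(void) arg;
	sim_reported_subid = sim_worker_subid;
	printf("sending xact stats for subid %u\n", sim_reported_subid);
}

/* Models logicalrep_worker_onexit(): clears the worker info, including subid. */
static void
sim_logicalrep_worker_onexit(int arg)
{
	(void) arg;
	sim_worker_subid = 0;
}

/* Append a callback to the list, in registration order. */
static void
sim_before_shmem_exit(sim_callback function, int arg)
{
	assert(sim_index < SIM_MAX_ON_EXITS);
	sim_list[sim_index].function = function;
	sim_list[sim_index].arg = arg;
	sim_index++;
}

/* Run the callbacks in reverse order of registration (last entry first). */
static void
sim_shmem_exit(void)
{
	while (--sim_index >= 0)
		sim_list[sim_index].function(sim_list[sim_index].arg);
	sim_index = 0;
}

/* Register in the same relative order as the gdb output, then exit. */
unsigned int
sim_run_worker_exit(void)
{
	sim_index = 0;
	sim_worker_subid = 16394;	/* hypothetical valid subscription OID */

	sim_before_shmem_exit(sim_pgstat_shutdown_hook, 0);		/* registered first */
	sim_before_shmem_exit(sim_logicalrep_worker_onexit, 0);	/* registered later */

	sim_shmem_exit();
	return sim_reported_subid;
}
```

Calling sim_run_worker_exit() returns 0 rather than 16394: the cleanup callback was registered later, so it runs first, and the stats hook then sees the already-cleared subid.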
Maybe we should make some changes to fix this.
Regards,
Tang