From: | Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Resetting spilled txn statistics in pg_stat_replication |
Date: | 2020-10-13 07:20:42 |
Message-ID: | CA+fd4k59yKGXVTahoez_G_aARPSNks=uQFK-XtDD_LS9tU22qA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, 13 Oct 2020 at 16:12, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Oct 13, 2020 at 12:17 PM Masahiko Sawada
> <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> >
> > On Tue, 13 Oct 2020 at 15:27, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Oct 13, 2020 at 11:49 AM Masahiko Sawada
> > > <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> > > >
> > > > On Tue, 13 Oct 2020 at 14:53, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > >
> > > > > The original theory I have given above [1] which is an interleaved
> > > > > autovacumm transaction. Let me try to explain in a bit more detail.
> > > > > Say when transaction T-1 is performing Insert ('INSERT INTO stats_test
> > > > > SELECT 'serialize-topbig--1:'||g.i FROM generate_series(1, 5000)
> > > > > g(i);') a parallel autovacuum transaction occurs. The problem as seen
> > > > > in buildfarm will happen when autovacuum transaction happens after 80%
> > > > > or more of the Insert is done.
> > > > >
> > > > > In such a situation we will start decoding 'Insert' first and need to
> > > > > spill multiple times due to the amount of changes (more than threshold
> > > > > logical_decoding_work_mem) and then before we encounter Commit of
> > > > > transaction that performed Insert (and probably some more changes from
> > > > > that transaction) we will encounter a small transaction (autovacuum
> > > > > transaction). The decode of that small transaction will send the
> > > > > stats collected till now which will lead to the problem shown in
> > > > > buildfarm.
> > > >
> > > > That seems a possible scenario.
> > > >
> > > > I think probably this also explains the reason why spill_count
> > > > slightly varied and spill_txns was still 1. The spill_count value
> > > > depends on how much the process spilled out transactions before
> > > > encountering the commit of an autovacuum transaction. Since we have
> > > > the spill statistics per reorder buffer, not per transactions, it's
> > > > possible.
> > > >
> > >
> > > Okay, here is an updated version (changed some comments) of the patch
> > > I posted some time back. What do you think? I have tested this on both
> > > Windows and Linux environments. I think it is a bit tricky to
> > > reproduce the exact scenario so if you are fine we can push this and
> > > check or let me know if you any better idea?
> >
> > I agree to check if the spill_counts and spill_txns are positive.
> >
>
> I am able to reproduce this problem via debugger. Basically, execute
> the Insert mentioned above from one the psql sessions and in
> ExecInsert() stop the execution once 'estate->es_processed > 4000' and
> then from another psql terminal execute some DDL which will be ignored
> but will any try to decode commit. Then perform 'continue' in the
> first session. This will lead to inconsistent stats value depending
> upon at what time DDL is performed.
Thanks!
I'm also able to reproduce this in a similar way and have confirmed
the patch fixes it.
> I'll push the patch as I am more
> confident now.
+1. Let's check how the tests are going to be.
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Langote | 2020-10-13 07:48:04 | Re: [HACKERS] Runtime Partition Pruning |
Previous Message | Amit Kapila | 2020-10-13 07:15:05 | Re: Resetting spilled txn statistics in pg_stat_replication |