Re: Resetting spilled txn statistics in pg_stat_replication

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Resetting spilled txn statistics in pg_stat_replication
Date: 2020-10-13 14:27:15
Message-ID: 3436520.1602599235@sss.pgh.pa.us
Lists: pgsql-hackers

Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
> I am able to reproduce this problem via debugger. Basically, execute
> the Insert mentioned above from one of the psql sessions and in
> ExecInsert() stop the execution once 'estate->es_processed > 4000' and
> then from another psql terminal execute some DDL which will be ignored
> but will anyway try to decode commit. Then perform 'continue' in the
> first session. This will lead to inconsistent stats value depending
> upon at what time DDL is performed. I'll push the patch as I am more
> confident now.

So ... doesn't this mean that if the concurrent transaction commits very
shortly after our query starts, decoding might stop without having ever
spilled at all? IOW, I'm afraid that the revised test can still fail,
just about one-twelfth as often as before.
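The timing argument can be illustrated with a toy model. This is a hypothetical sketch, not PostgreSQL code: it assumes stats are published only when some commit record is decoded, and that a big transaction "spills" each time its buffered changes exceed a fixed limit. The function `decode`, its parameters, and the numbers in the usage checks are all made up for illustration; the field names merely echo the spill_txns/spill_count counters.

```python
from dataclasses import dataclass


@dataclass
class SpillStats:
    # Hypothetical counters, named after the spilled-transaction stats.
    spill_txns: int = 0
    spill_count: int = 0


def decode(num_changes: int, spill_limit: int, concurrent_commit_at: int) -> SpillStats:
    """Toy model: return the stats snapshot published when the concurrent
    commit is decoded.

    num_changes: changes made by the big transaction (hypothetical).
    spill_limit: buffered changes allowed before a spill (stand-in for a memory limit).
    concurrent_commit_at: position of the other session's commit among those changes.
    """
    live = SpillStats()
    buffered = 0
    spilled_this_txn = False
    for i in range(num_changes):
        if i == concurrent_commit_at:
            # The concurrent commit publishes whatever has spilled *so far*.
            return SpillStats(live.spill_txns, live.spill_count)
        buffered += 1
        if buffered > spill_limit:
            # Buffer over the limit: count a spill and start buffering afresh.
            live.spill_count += 1
            if not spilled_this_txn:
                live.spill_txns += 1
                spilled_this_txn = True
            buffered = 0
    # No concurrent commit landed mid-transaction: report the final totals.
    return SpillStats(live.spill_txns, live.spill_count)
```

With these made-up numbers, a commit landing early (`decode(1000, 300, 50)`) reports zero spills, while a late one (`decode(1000, 300, 900)`) reports one spilled transaction with two spills. The snapshot depends on when the other commit happens, which is exactly the kind of nondeterminism that makes a test checking these counters flaky.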

I'm also somewhat suspicious of this explanation because it doesn't
seem to account for the clear experimental evidence that 32-bit machines
were more prone to failure than 64-bit.

regards, tom lane
