From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Resetting spilled txn statistics in pg_stat_replication |
Date: | 2020-10-13 05:54:07 |
Message-ID: | CAA4eK1LgDmuKzFanLDJtbhC18UffWgxZpaDS_ZHYUok91PjgXw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Oct 13, 2020 at 11:05 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
> >> It is possible that MAXALIGN stuff is playing a role here and or the
> >> background transaction stuff. I think if we go with the idea of
> >> testing spill_txns and spill_count being positive then the results
> >> will be stable. I'll write a patch for that.
>
> Here's our first failure on a MAXALIGN-8 machine:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grison&dt=2020-10-13%2005%3A00%3A08
>
> So this is just plain not stable. It is odd though. I can
> easily think of mechanisms that would cause the WAL volume
> to occasionally be *more* than the "typical" case. What
> would cause it to be *less*, if MAXALIGN is ruled out?
>
The original theory I have given above [1] which is an interleaved
autovacumm transaction. Let me try to explain in a bit more detail.
Say when transaction T-1 is performing Insert ('INSERT INTO stats_test
SELECT 'serialize-topbig--1:'||g.i FROM generate_series(1, 5000)
g(i);') a parallel autovacuum transaction occurs. The problem as seen
in buildfarm will happen when autovacuum transaction happens after 80%
or more of the Insert is done.
In such a situation we will start decoding 'Insert' first and need to
spill multiple times due to the amount of changes (more than threshold
logical_decoding_work_mem) and then before we encounter Commit of
transaction that performed Insert (and probably some more changes from
that transaction) we will encounter a small transaction (autovacuum
transaction). The decode of that small transaction will send the
stats collected till now which will lead to the problem shown in
buildfarm.
--
With Regards,
Amit Kapila.
From | Date | Subject | |
---|---|---|---|
Next Message | Masahiko Sawada | 2020-10-13 06:18:29 | Re: Resetting spilled txn statistics in pg_stat_replication |
Previous Message | Tom Lane | 2020-10-13 05:35:55 | Re: Resetting spilled txn statistics in pg_stat_replication |