From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: xid_wraparound tests intermittent failure.
Date: 2024-07-23 22:59:28
Message-ID: CAD21AoCGExmPt8XTfyuF2433cPYvGCLgR5wvQOA1Wzw6jBkEng@mail.gmail.com
Lists: pgsql-hackers
On Tue, Jul 23, 2024 at 3:49 AM Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
>
> On 2024-07-22 Mo 9:29 PM, Masahiko Sawada wrote:
>
> On Mon, Jul 22, 2024 at 12:53 PM Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
> On 2024-07-22 Mo 12:46 PM, Tom Lane wrote:
>
> Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> writes:
>
> Looking at dodo's failures, it seems that while it passes
> module-xid_wraparound-check, all failures happened only during
> testmodules-install-check-C. Can we check the server logs written
> during xid_wraparound test in testmodules-install-check-C?
>
> Oooh, that is indeed an interesting observation. There are enough
> examples now that it's hard to dismiss it as chance, but why would
> the two runs be different?
>
>
> It's not deterministic.
>
> I tested the theory that it was some other concurrent tests causing the issue, but that didn't wash. Here's what I did:
>
> for f in `seq 1 100`
> do echo iteration = $f
> meson test --suite xid_wraparound || break
> done
>
> It took until iteration 6 to get an error. I don't think my Ubuntu instance is especially slow. e.g. "meson compile" normally takes a handful of seconds. Maybe concurrent tests make it more likely, but they can't be the only cause.
>
> Could you provide server logs in both OK and NG tests? I want to see
> if there's a difference in the rate at which tables are vacuumed.
>
>
> See <https://bitbucket.org/adunstan/rotfang-fdw/downloads/xid-wraparound-result.tar.bz2>
>
>
> The failure logs are from a run where both tests 1 and 2 failed.
>
Thank you for sharing the logs.

The problem seems to match what Alexander Lakhin mentioned[1]. We could probably fix such a race condition somehow, but I'm not sure it's worth it, since setting autovacuum = off with autovacuum_max_workers = 1 (or another low number) is an extremely rare configuration. I think it would be better to stabilize these tests. One idea is to leave the autovacuum GUC parameter on while setting autovacuum_enabled = off for each table. That way, we can ensure that autovacuum workers are launched, and it also seems closer to real use cases.
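
To illustrate, here is a minimal sketch of what I mean (illustration only, not the actual xid_wraparound test code; the table name and naptime setting are made up). The launcher stays enabled because autovacuum defaults to on, while the test's own table opts out of routine autovacuum via the autovacuum_enabled storage parameter, which anti-wraparound vacuum still overrides:

# Sketch only -- hypothetical TAP snippet, not the existing test file.
use strict;
use warnings;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;

my $node = PostgreSQL::Test::Cluster->new('wraparound');
$node->init;

# Keep the autovacuum launcher running (autovacuum = on is the default),
# rather than setting autovacuum = off globally, so workers are reliably
# launched for anti-wraparound vacuuming.
$node->append_conf('postgresql.conf', "autovacuum_naptime = 1s");
$node->start;

# Opt only the test's table out of routine autovacuum; anti-wraparound
# vacuum still processes tables with autovacuum_enabled = off.
$node->safe_psql('postgres',
	'CREATE TABLE big (id int) WITH (autovacuum_enabled = off)');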
Regards,
[1] https://www.postgresql.org/message-id/02373ec3-50c6-df5a-0d65-5b9b1c0c86d6%40gmail.com
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com