Re: 011_crash_recovery.pl intermittently fails

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: 011_crash_recovery.pl intermittently fails
Date: 2021-03-05 03:32:21
Message-ID: 1436448.1614915141@sss.pgh.pa.us
Lists: pgsql-hackers

Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> writes:
> I noticed that 011_crash_recovery.pl fails intermittently (about one
> run in three in my environment) in the second test.

Hmmm ... what environment is that? This test script hasn't changed
meaningfully in several years, and we have not seen any real issues
with it up to now.

> If the server crashes before emitting WAL records for the transaction
> just started, the restarted server cannot know that the xid was ever
> started. I'm not sure that is the intention of the test, but we must
> make sure the WAL is emitted before crashing. CHECKPOINT ensures
> that.

The original commit for this test says

----
commit 857ee8e391ff6654ef9dcc5dd8b658d7709d0a3c
Author: Robert Haas <rhaas(at)postgresql(dot)org>
Date: Fri Mar 24 12:00:53 2017 -0400

Add a txid_status function.

If your connection to the database server is lost while a COMMIT is
in progress, it may be difficult to figure out whether the COMMIT was
successful or not. This function will tell you, provided that you
don't wait too long to ask. It may be useful in other situations,
too.

Craig Ringer, reviewed by Simon Riggs and by me

Discussion: http://postgr.es/m/CAMsr+YHQiWNEi0daCTboS40T+V5s_+dst3PYv_8v2wNVH+Xx4g@mail.gmail.com
----
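
For readers unfamiliar with the function, here is a rough sketch of the client-side recovery logic that commit message describes. It is not taken from the commit or from the test script: the helper name next_action and its return labels are hypothetical, and the sketch assumes the client recorded its xid (for example with SELECT txid_current()) before issuing COMMIT, then ran SELECT txid_status(xid) after reconnecting. The four inputs handled are the documented results of txid_status: 'committed', 'aborted', 'in progress', or NULL once the status has been discarded.

```python
from typing import Optional


def next_action(status: Optional[str]) -> str:
    """Map a txid_status() result to a client decision.

    `status` is the text returned by `SELECT txid_status(xid)`, or None
    when the query returned NULL. The return labels are illustrative
    only; a real application would pick its own.
    """
    if status == "committed":
        # The COMMIT reached durable storage before the connection was lost.
        return "done"
    if status == "aborted":
        # The transaction did not commit, so it is safe to run it again.
        return "retry"
    if status == "in progress":
        # Still running (or prepared); ask again later.
        return "wait"
    if status is None:
        # The client waited too long: the status is no longer retained.
        return "unknown"
    raise ValueError(f"unexpected txid_status result: {status!r}")


if __name__ == "__main__":
    for result in ("committed", "aborted", "in progress", None):
        print(result, "->", next_action(result))
```

The sketch only covers cases where txid_status returns a value. It says nothing about a server that restarts without any record of the xid, which is the situation under discussion in this thread.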

If the function needs a CHECKPOINT to give a reliable answer,
is it actually good for the claimed purpose?

Independently of that, I doubt that adding a checkpoint call
after the pg_current_xact_id() call is going to help. The
Perl script is able to move on as soon as it's read the
function result. If we need this hack, it has to be put
before that SELECT, AFAICS.

regards, tom lane
