Quick Links

Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	pgsql-hackers(at)lists(dot)postgresql(dot)org, Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Date:	2022-02-20 00:12:31
Message-ID:	87697D0B-3A41-4A5B-B9CF-85D989361E27@anarazel.de
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

(On phone, so crappy formatting and no source access)

On February 19, 2022 3:08:41 PM PST, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>On Fri, Feb 18, 2022 at 5:00 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>> Another testing strategy occurs to me: we could stress-test the
>> implementation by simulating an environment where the no-cleanup-lock
>> path is hit an unusually large number of times, possibly a fixed
>> percentage of the time (like 1%, 5%), say by making vacuumlazy.c's
>> ConditionalLockBufferForCleanup() call return false randomly. Now that
>> we have lazy_scan_noprune for the no-cleanup-lock path (which is as
>> similar to the regular lazy_scan_prune path as possible), I wouldn't
>> expect this ConditionalLockBufferForCleanup() testing gizmo to be too
>> disruptive.
>
>I tried this out, using the attached patch. It was quite interesting,
>even when run against HEAD. I think that I might have found a bug on
>HEAD, though I'm not really sure.
>
>If you modify the patch to simulate conditions under which
>ConditionalLockBufferForCleanup() fails about 2% of the time, you get
>much better coverage of lazy_scan_noprune/heap_tuple_needs_freeze,
>without it being so aggressive as to make "make check-world" fail --
>which is exactly what I expected. If you are much more aggressive
>about it, and make it 50% instead (which you can get just by using the
>patch as written), then some tests will fail, mostly for reasons that
>aren't surprising or interesting (e.g. plan changes). This is also
>what I'd have guessed would happen.
>
>However, it gets more interesting. One thing that I did not expect to
>happen at all also happened (with the current 50% rate of simulated
>ConditionalLockBufferForCleanup() failure from the patch): if I run
>"make check" from the pg_surgery directory, then the Postgres backend
>gets stuck in an infinite loop inside lazy_scan_prune, which has been
>a symptom of several tricky bugs in the past year (not every time, but
>usually). Specifically, the VACUUM statement launched by the SQL
>command "vacuum freeze htab2;" from the file
>contrib/pg_surgery/sql/heap_surgery.sql, at line 54 leads to this
>misbehavior.

>This is a temp table, which is a choice made by the tests specifically
>because they need to "use a temp table so that vacuum behavior doesn't
>depend on global xmin". This is convenient way of avoiding spurious
>regression tests failures (e.g. from autoanalyze), and relies on the
>GlobalVisTempRels behavior established by Andres' 2020 bugfix commit
>94bc27b5.

We don't have a blocking path for cleanup locks of temporary buffers IIRC (normally not reachable). So I wouldn't be surprised if a cleanup lock failing would cause some odd behavior.

>It's quite possible that this is nothing more than a bug in my
>adversarial gizmo patch -- since I don't think that
>ConditionalLockBufferForCleanup() can ever fail with a temp buffer
>(though even that's not completely clear right now). Even if the
>behavior that I saw does not indicate a bug on HEAD, it still seems
>informative. At the very least, it wouldn't hurt to Assert() that the
>target table isn't a temp table inside lazy_scan_noprune, documenting
>our assumptions around temp tables and
>ConditionalLockBufferForCleanup().

Definitely worth looking into more.

This reminds me of a recent thing I noticed in the aio patch. Spgist can end up busy looping when buffers are locked, instead of blocking. Not actually related, of course.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

In response to

Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations at 2022-02-19 23:08:41 from Peter Geoghegan

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Joseph Koshakow	2022-02-20 00:16:54	Re: Fix overflow in DecodeInterval
Previous Message	Justin Pryzby	2022-02-19 23:53:09	Re: set TESTDIR from perl rather than Makefile