Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
Date: 2024-07-23 01:26:11
Message-ID: CAAKRu_b3OEmU2Epx_ER+MjPO9GOwp6pxqUqYytqRhtBG8NRHdA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jul 22, 2024 at 6:36 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Melanie Plageman <melanieplageman(at)gmail(dot)com> writes:
> > We've only run tests with this commit on some of the back branches for
> > some of these animals. Of those, I don't see any failures so far. So,
> > it seems the test instability is just related to trying to get
> > multiple passes of index vacuuming reliably with TIDStore.
>
> > AFAICT, all the 32bit machine failures are timeouts waiting for the
> > standby to catch up (mamba, gull, merswine). Unfortunately, the
> > failures on copperhead (a 64 bit machine) are because we don't
> > actually succeed in triggering a second vacuum pass. This would not be
> > fixed by a longer timeout.
>
> Ouch. This seems to me to raise the importance of getting a better
> way to test multiple-index-vacuum-passes. Peter argued upthread
> that we don't need a better way, but I don't see how that argument
> holds water if copperhead was not reaching it despite being 64-bit.
> (Did you figure out exactly why it doesn't reach the code?)

I wasn't able to reproduce the failure (failing to do more than one
index vacuum pass) on my local machine (which is 64-bit) without
decreasing the number of tuples inserted. The copperhead failure
confuses me because the speed of the machine should *not* affect how
much space the dead item TIDStore takes up. I would have bet money
that the same number and offsets of dead tuples per page in a relation
would take up the same amount of space in a TIDStore on any 64-bit
system -- regardless of how slowly it runs vacuum.

Here is some background on how I came up with the DDL and tuple count
for the test: TIDStore's BITS_PER_BITMAPWORD is 32 on 32-bit systems
and 64 on 64-bit systems. So, if there is only one bitmapword's worth
of dead items per page, it is easy to figure out that a 32-bit system
needs double the number of pages with dead items to take up the same
amount of TIDStore space as a 64-bit system.
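
To make that arithmetic concrete, here's a standalone back-of-the-envelope
sketch (the byte budget and the one-bitmapword-per-page assumption are
purely illustrative; the real TIDStore accounting also includes radix tree
node overhead):

#include <stdio.h>

int
main(void)
{
    /* BITS_PER_BITMAPWORD is 32 on 32-bit builds and 64 on 64-bit builds */
    int     bits_per_word[] = {32, 64};
    long    budget = 1024 * 1024;   /* hypothetical byte budget for values */

    for (int i = 0; i < 2; i++)
    {
        int     word_bytes = bits_per_word[i] / 8;

        /*
         * With exactly one bitmapword's worth of dead items per page, each
         * page contributes one word of value payload, so the 32-bit build
         * needs twice as many pages with dead items to fill the same budget.
         */
        printf("%d-bit: ~%ld pages with dead items to fill %ld bytes\n",
               bits_per_word[i], budget / word_bytes, budget);
    }
    return 0;
}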

I wanted to figure out how to take up double the amount of TIDStore
space *without* doubling the number of tuples. This is not
straightforward. You can't just delete twice as many dead tuples per
page. For starters, many dead tuples can be represented compactly in a
single bitmapword. Beyond that, the amount of space the adaptive radix
tree takes up seems to be affected by whether the dead items are at
the same offsets on every page. I thought this might have to do with
being able to use the same chunk (in ART terms)? I spent some time
trying to figure it out, but I gave up once I got confused enough to
try to read the adaptive radix tree paper.
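
The one piece of the value side I think I can reason about is that the
per-page bitmap is, as I understand it, sized by the highest dead offset
on the page rather than by the number of dead items. A toy sketch of that
mental model (which may well be wrong, and which says nothing about the
same-offsets-on-every-page effect above):

#include <stdio.h>

#define BITS_PER_WORD 64        /* 64-bit build */

/* bitmapwords needed to cover dead item offsets 1..max_offset */
static int
words_for_highest_offset(int max_offset)
{
    return (max_offset + BITS_PER_WORD - 1) / BITS_PER_WORD;
}

int
main(void)
{
    /* same number of dead items, very different placement on the page */
    printf("10 dead items, highest offset  20 -> %d word(s) per page\n",
           words_for_highest_offset(20));
    printf("10 dead items, highest offset 200 -> %d word(s) per page\n",
           words_for_highest_offset(200));
    return 0;
}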

I found myself wishing there were some way to visualize the TIDStore.
I don't have good ideas for how to represent it, but if we came up
with one, we could add a function to the test_tidstore module.

I also think it would be useful to have peak TIDStore usage in bytes
in the vacuum verbose output. It has been on my list to propose
something like this since I hacked together a version myself while
trying to debug the test locally.
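
To give a sense of the shape of what I mean, here's a minimal sketch with
invented names (the real version would hang off the existing dead-items
accounting and be printed along with the rest of the verbose output):

#include <stddef.h>

/*
 * Hypothetical high-water-mark tracking (names invented) so that VACUUM
 * VERBOSE could report peak dead-item memory rather than only the usage
 * at the time the dead items are flushed.
 */
typedef struct DeadItemsPeak
{
    size_t      current_bytes;  /* latest reported TIDStore memory usage */
    size_t      peak_bytes;     /* running maximum over the whole vacuum */
} DeadItemsPeak;

/* call wherever dead items are recorded and memory usage is rechecked */
static inline void
dead_items_update_peak(DeadItemsPeak *p, size_t current_bytes)
{
    p->current_bytes = current_bytes;
    if (current_bytes > p->peak_bytes)
        p->peak_bytes = current_bytes;
}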

- Melanie
