Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
Date: 2024-07-23 02:53:24
Message-ID: CAD21AoDpN=9W5GVYbkDqNBX0LrDPCWotVFs4wZvXxzJLV7SVxg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jul 22, 2024 at 6:26 PM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
>
> On Mon, Jul 22, 2024 at 6:36 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > Melanie Plageman <melanieplageman(at)gmail(dot)com> writes:
> > > We've only run tests with this commit on some of the back branches for
> > > some of these animals. Of those, I don't see any failures so far. So,
> > > it seems the test instability is just related to trying to get
> > > multiple passes of index vacuuming reliably with TIDStore.
> >
> > > AFAICT, all the 32bit machine failures are timeouts waiting for the
> > > standby to catch up (mamba, gull, merswine). Unfortunately, the
> > > failures on copperhead (a 64 bit machine) are because we don't
> > > actually succeed in triggering a second vacuum pass. This would not be
> > > fixed by a longer timeout.
> >
> > Ouch. This seems to me to raise the importance of getting a better
> > way to test multiple-index-vacuum-passes. Peter argued upthread
> > that we don't need a better way, but I don't see how that argument
> > holds water if copperhead was not reaching it despite being 64-bit.
> > (Did you figure out exactly why it doesn't reach the code?)
>
> I wasn't able to reproduce the failure (failing to do > 1 index vacuum
> pass) on my local machine (which is 64 bit) without decreasing the
> number of tuples inserted. The copperhead failure confuses me because
> the speed of the machine should *not* affect how much space the dead
> item TIDStore takes up. I would have bet money that the same number
> and offsets of dead tuples per page in a relation would take up the
> same amount of space in a TIDStore on any 64-bit system -- regardless
> of how slowly it runs vacuum.

Looking at copperhead's failure logs, I could not find where the output
of "VACUUM (VERBOSE, FREEZE) vac_horizon_floor_table;" reported the
number of index scans. Is there some other clue that made you think the
test failed to do multiple index vacuum passes?
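
For anyone following along, the scenario the test tries to set up can be
sketched roughly like this (a hypothetical illustration only, not the
actual regression test from the thread; the exact row count needed to
overflow the dead-item store varies by platform and release):

```sql
-- Rough sketch (not the actual buildfarm test): force vacuum to
-- overflow its dead-item storage so it needs more than one index
-- vacuum pass.
SET maintenance_work_mem = '1MB';  -- smallest allowed setting
CREATE TABLE vac_horizon_floor_table (col1 int)
  WITH (autovacuum_enabled = false);
CREATE INDEX ON vac_horizon_floor_table (col1);
INSERT INTO vac_horizon_floor_table
  SELECT generate_series(1, 500000);
DELETE FROM vac_horizon_floor_table;
-- The VERBOSE output includes an "index scans: N" line; N > 1 means
-- the dead-item storage filled up at least once during the heap scan.
VACUUM (VERBOSE, FREEZE) vac_horizon_floor_table;
```

With the TIDStore-based dead-item storage, the number of dead tuples
needed to exceed a given memory limit can differ from earlier releases,
which is part of why tuning such a test to pass reliably is hard.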

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
