From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Robert Treat <rob(at)xzilla(dot)net>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject: Re: Eagerly scan all-visible pages to amortize aggressive vacuum
Date: 2025-02-11 16:06:27
Message-ID: kapu6h6imfbqi3lrokbzynrm5mt4siih5lkzv6az4iiznlworu@brthxdrvxyvw
Lists: pgsql-hackers
Hi,
On 2025-02-10 14:30:15 -0500, Robert Haas wrote:
> On Wed, Feb 5, 2025 at 12:23 PM Melanie Plageman
> <melanieplageman(at)gmail(dot)com> wrote:
> > I started getting worried thinking about this. If you have a cursor
> > for select * from a table and fetch forward far enough, couldn't
> > vacuum fail to get cleanup locks on a whole range of blocks?
>
> I don't think so. A given scan holds at most 1 heap pin at a time, so
> I don't see how a cursor doing a FETCH FORWARD would make anything
> happen that wouldn't otherwise. I do think it's theoretically possible
> that a running query and a running VACUUM could be going at exactly
> the same speed so that the VACUUM always tries to pin every page at
> the exact same moment the SELECT has it pinned, but in practice I
> think it is very unlikely that the timing will work out like that.
> Like win-the-lottery kind of unlikely.
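As a concrete illustration (a sketch against a hypothetical table t), a suspended
cursor looks like this:

    BEGIN;
    DECLARE c CURSOR FOR SELECT * FROM t;
    FETCH FORWARD 1000 FROM c;
    -- While the cursor sits here, the scan keeps only the heap page it is
    -- currently returning tuples from pinned. VACUUM can typically still get
    -- cleanup locks on every other page and just skips (or waits for) this one.
    COMMIT;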
I think the most common way to get a range of pages pinned is to have a bunch
of backends doing index nested loop joins to a fairly narrow value range in a
table. You can sometimes see (via pg_buffercache) that there are some pages
that are rather heavily pinned, but just about no other page in the table is.
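Something along these lines shows that distribution (assuming the pg_buffercache
extension is installed; 'my_table' is a placeholder):

    SELECT relblocknumber, pinning_backends, usagecount
      FROM pg_buffercache
     WHERE relfilenode = pg_relation_filenode('my_table')
       AND reldatabase = (SELECT oid FROM pg_database
                          WHERE datname = current_database())
       AND pinning_backends > 0
     ORDER BY pinning_backends DESC;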
> > Yea, I guess that means the freeze limit caps wasted read I/O -- but
> > you could end up doing no read I/O and still hitting the freeze fail
> > limit because it is trying to detect when data is too new to be worth
> > eager scanning. But that's probably the behavior we want.
>
> The too-new case is very scary, I think. It's already a problem that
> when there are long-running transactions in play, autovacuum commands
> a VACUUM which finds nothing it can usefully clean up. It would be
> very valuable for someone to figure out a way to prevent that, but the
> relevance to this patch is that we need to try hard to avoid making
> that problem substantially worse than it is already. Even if the whole
> database is memory-resident, the too-new case allows for wasting a
> bunch of effort that we would rather not waste; the cap here tries to
> make sure that isn't any worse than it needs to be.
IME the most common way to hit this issue is anti-wraparound vacuums that can't
actually clean up anything due to some old transaction. We just fire those off
over and over again, without taking into account that we still can't make
progress.
That scenario doesn't apply the same way to this patch, because by the time
you get to an anti-wrap vacuum, this patch doesn't do anything.
Of course it's possible that the dead tuple trigger causes repeated vacuums
that each can't clean up anything, but that's much less common IME.
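FWIW, whether an old transaction is what's holding back cleanup is usually visible
from the backend xmin horizon, e.g. something like this (a rough sketch; prepared
transactions and replication slots can hold the horizon back as well):

    SELECT pid, state, xact_start,
           age(backend_xmin) AS xmin_age,
           left(query, 60) AS query
      FROM pg_stat_activity
     WHERE backend_xmin IS NOT NULL
     ORDER BY age(backend_xmin) DESC
     LIMIT 10;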
> > > > Interestingly, we call heap_tuple_should_freeze() in
> > > > lazy_scan_noprune(), so we actually could tell whether or not there
> > > > are some tuples on the page old enough to trigger freezing if we had
> > > > gotten the cleanup lock. One option would be to add a new output
> > > > parameter to lazy_scan_noprune() that indicates whether or not there
> > > > were tuples with xids older than the FreezeLimit, and only if there
> > > > were not, count it as a failed eager scan.
> > >
> > > Yeah, that could be done, and it doesn't sound like a bad idea.
> > > However, I'm also unsure whether we need to add additional logic for
> > > this case. My intuition is that it just won't happen very often. I'm
> > > not quite confident that I'm correct about that, but if I had to
> > > guess, my guess would be "rare scenario".
> >
> > I've not done this in the attached v16. I have added a comment about it.
> > I think not doing it is a judgment call and not a bug, right?
>
> As of this writing, I do not believe that this is an essential part of
> this patch, but it is always possible that with the benefit of
> hindsight it will seem otherwise. That's just life, though. If I knew
> in advance which of my decisions would later seem like mistakes, I
> would make very few mistakes.
I agree that we can disregard this for now.
Greetings,
Andres Freund