Re: Set visibility map bit after HOT prune

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Set visibility map bit after HOT prune
Date: 2012-12-19 15:56:57
Message-ID: CABOikdPf-NQ=hi2iKt-WL7HaP7T6G6vcT2xpL=6gbbDe3zn7Pg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 19, 2012 at 8:32 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sat, Dec 15, 2012 at 4:48 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> On the other hand, the HOT prune operation itself is worthless when
>> we're running a SELECT. The only reason we do it that way is that we
>> have to prune before the query starts to use the page, else pruning
>> might invalidate pointers-to-tuples that are being held within the
>> query plan tree.
>>
>> Maybe it's time to look at what it'd take for the low-level scan
>> operations to know whether they're scanning the target relation of
>> an UPDATE query, so that we could skip pruning altogether except
>> when a HOT update could conceivably ensue. I think this was discussed
>> back when HOT went in, but nobody wanted to make the patch more invasive
>> than it had to be.
>
> I think it's wrong to assume that HOT pruning has no value except in
> this case. Truncating dead tuples to line pointers and collapsing HOT
> chains speeds up future page scans, and if we were able to set the
> all-visible bit, that would help even more.

Good point.

> The problem is that this
> is all somewhat prospective: HOT pruning the page doesn't help the
> *current* scan - in fact, it can sometimes slow it down considerably -
> but it can be a great help to the next scan that comes through. We
> say, oh, don't worry, VACUUM will take care of it, but there are
> plenty of cases where a page can be scanned a very large number of
> times before VACUUM comes along; and you do can lose a lot of
> performance in those cases.
>

Also, since we discount for number of tuples pruned by HOT pruning
while tracking number of dead tuples in a table, in a perfectly stable
system, autovacuum may not ever pick the table for vacuuming, slowly
stopping index-only scans from working. Soon we will have a situation
when all VM bits are clear, but autovacuum would fail to pick the
table. Tom had a good suggestion to periodically count vm bits to
choose tables for vacuuming even if there are no dead tuples or dead
line pointers to remove. I'm not sure though if the extra vacuum will
be better than setting the bit after HOT prune. Also, deciding when to
count the bits can be tricky. Do it every vacuum cycle ? Or after
every 5/10 cycles ? I don't have the answer.

> That having been said, I agree with the concerns expressed elsewhere
> in this thread that setting the visibility map bit too aggressively
> will be a waste. If the page is about to get dirtied again we surely
> don't want to go there.

Yeah, I agree. If we could figure out that we are soon going to UPDATE
a tuple in the page again, it will be worthless to set the bit. But
predicting that also could turn out to be tricky. Even if we could
somehow tell that the scan is happening on the result relation of an
UPDATE operation, not every page may receive updates because of where
quals etc. So we may get lots of false positives.

> Aside from the obvious problem of doing work
> that may not be necessary, it figures to create buffer-lock contention
> on the visibility map page. One of the strengths of the current
> design is that we avoid that pretty effectively.
>

Its a valid concern, though my limited pgbench tests did not show any
drop in the number. But thats hardly any proof. We can possibly
mitigate this by conditional update to the VM bit. Do it only if you
get a conditional exclusive lock on the buffer page.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2012-12-19 16:00:19 ThisTimeLineID in checkpointer and bgwriter processes
Previous Message Andres Freund 2012-12-19 15:56:46 Re: [ADMIN] Problems with enums after pg_upgrade