Re: Eliminating PD_ALL_VISIBLE, take 2

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Robins <robins(at)pobox(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Eliminating PD_ALL_VISIBLE, take 2
Date: 2013-07-15 17:41:26
Message-ID: 1373910086.14172.17.camel@jdavis
Lists: pgsql-hackers

On Sun, 2013-07-14 at 23:06 -0400, Robert Haas wrote:
> > Of course, there's a reason that PD_ALL_VISIBLE is not like a normal
> > hint: we need to make sure that inserts/updates/deletes clear the VM
> > bit. But my patch already addresses that by keeping the VM page pinned.
>
> I'm of the opinion that we ought to extract the parts of the patch
> that hold the VM pin for longer, review those separately, and if
> they're good and desirable, apply them.

I'm confused. My patch keeps a VM page pinned in exactly the cases where
PD_ALL_VISIBLE is currently used: scans and insert/update/delete. If we
keep PD_ALL_VISIBLE, there's no point in caching the pinned VM page, right?
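
To make the "keep the VM page pinned" idea concrete, here is a minimal sketch of a one-entry pin cache. It holds a single VM page and switches only when a heap block maps to a different VM page. It assumes the 9.3-era layout (one visibility bit per heap block, 8 KB blocks, 24-byte page header). `VmPinCache`, `lookup` and `vm_block_for` are hypothetical names, not PostgreSQL functions.

```python
# Assumed layout (9.3-era visibility map): one bit per heap block,
# 8 KB pages, 24-byte page header. These constants are assumptions.
BLCKSZ = 8192
PAGE_HEADER_SIZE = 24
HEAP_BLOCKS_PER_VM_PAGE = (BLCKSZ - PAGE_HEADER_SIZE) * 8  # 65344


def vm_block_for(heap_blk):
    """Return which VM page holds the visibility bit for heap_blk."""
    return heap_blk // HEAP_BLOCKS_PER_VM_PAGE


class VmPinCache:
    """Hypothetical one-entry cache: keep one VM page pinned and count
    how often we have to drop it and pin a different one."""

    def __init__(self):
        self.pinned = None   # VM block currently pinned, or None
        self.switches = 0    # number of times a new VM page was pinned

    def lookup(self, heap_blk):
        vm_blk = vm_block_for(heap_blk)
        if vm_blk != self.pinned:
            # In the real system this would be a buffer release plus a
            # new buffer pin; here we only count it.
            self.pinned = vm_blk
            self.switches += 1
        return vm_blk
```

A sequential pass over heap blocks 0..199999 pins only 4 distinct VM pages in turn. An index scan that alternates between heap blocks 0 and 100000, however, switches on every lookup. That random access pattern is where switching pins could cost something.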

> I am not convinced. I thought about the problem of repeatedly
> switching pinned VM pages during the index-only scans work, and
> decided that we could live with it because, if the table was large
> enough that we were pinning VM pages frequently, we were also avoiding
> I/O. Of course, this is a logical fallacy, since the table could
> easily be large enough to have quite a few VM pages and yet small
> enough to fit in RAM. And, indeed, at least in the early days, an
> index scan could beat out an index-only scan by a significant margin
> on a memory-resident table, precisely because of the added cost of the
> VM lookups. I haven't benchmarked lately so I don't know for sure
> whether that's still the case, but I bet it is.

To check visibility from an index scan, you need to pin either a heap
page or a VM page. Why would checking the heap page be cheaper? Is it
just that other code in the VM-testing path is slower? Or was there
concurrency in your measurements, which might point to contention over
VM pages?

> I think this idea is worth exploring, although I fear the overhead is
> likely to be rather large. We could find out, though. Suppose we
> simply change XLOG_HEAP2_VISIBLE to emit FPIs for the heap pages; how
> much does that slow down vacuuming a large table into which many pages
> have been bulk loaded? Sadly, I bet it's rather a lot, but I'd like
> to be wrong.
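
Here is a rough estimate of what that experiment would measure. If every XLOG_HEAP2_VISIBLE record also carried a full-page image of its heap page, VACUUM of a freshly bulk-loaded table would write roughly one heap page of WAL per heap page. `base_record_bytes` is a hypothetical per-record size, not the real record size. The model also ignores the free-space "hole" that is omitted from FPIs, which is small for densely bulk-loaded pages.

```python
BLCKSZ = 8192  # assumed default block size


def vacuum_wal_bytes(heap_pages, base_record_bytes, heap_fpi):
    """Rough WAL volume for marking heap_pages all-visible.

    base_record_bytes is a hypothetical per-record size; when heap_fpi is
    True each record is assumed to also carry a full 8 KB heap page image.
    VM page images are ignored (one VM page covers tens of thousands of
    heap pages, so there are few of them)."""
    per_page = base_record_bytes + (BLCKSZ if heap_fpi else 0)
    return heap_pages * per_page
```

Take a hypothetical 64-byte record and a 1 GiB table (131072 pages). WAL goes from 8 MiB to a little over 1 GiB, roughly the size of the table itself.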

My point was that if freezing needs to emit an FPI anyway, and we
combine freezing with setting PD_ALL_VISIBLE, then WAL-logging
PD_ALL_VISIBLE properly wouldn't cost us anything extra. Whether that
makes sense depends, of course, on which combination of the other
proposals makes it in. I agree that we don't want to start adding FPIs
unnecessarily.
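
The "wouldn't cost us anything extra" argument depends on how FPIs are charged. With full_page_writes on, the first WAL-logged change to a page after a checkpoint carries a full-page image, and later changes to the same page in that checkpoint cycle do not. The toy model below shows only that rule and is not PostgreSQL code. The action names ("freeze", "all_visible", and so on) are hypothetical labels, and only page identity matters.

```python
def count_fpis(events):
    """Count full-page images for a stream of WAL-logged page changes.

    Models the rule that, with full_page_writes on, the first WAL-logged
    change to a page after a checkpoint includes a full-page image.
    events is a list of either the string "checkpoint" or a tuple
    (action, page_no); action names are illustrative only."""
    imaged_since_checkpoint = set()
    fpis = 0
    for ev in events:
        if ev == "checkpoint":
            imaged_since_checkpoint.clear()
            continue
        _action, page = ev
        if page not in imaged_since_checkpoint:
            imaged_since_checkpoint.add(page)
            fpis += 1
    return fpis
```

If freezing and setting all-visible land in different checkpoint cycles, the page is imaged twice. If they are combined, or happen in the same cycle, the all-visible change rides on the FPI that freezing already paid for.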

Anyway, thanks for the feedback. I've moved the patch out of this CommitFest.

Regards,
Jeff Davis
