Re: LP_DEAD hinting and not holding on to a buffer pin on leaf page (Was: [WIP] Zipfian distribution in pgbench)

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <kgrittn(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Alik Khilazhev <a(dot)khilazhev(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: LP_DEAD hinting and not holding on to a buffer pin on leaf page (Was: [WIP] Zipfian distribution in pgbench)
Date: 2017-07-31 17:54:02
Message-ID: 20170731175402.GB19829@marmot
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>On Thu, Jul 27, 2017 at 10:05 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>> I really don't know if that would be worthwhile. It would certainly fix
>> the regression shown in my test case, but that might not go far enough.
>> I strongly suspect that there are more complicated workloads where
>> LP_DEAD cleanup from SELECT statements matters, which is prevented by
>> the LSN check thing, just because there are always other sessions that
>> modify the page concurrently. This might be true of Alik's Zipfian test
>> case, for example.
>
>I haven't studied the test case, but I think as a general principle it
>makes sense to be happy when someone comes up with an algorithm that
>holds a lock for a shorter period of time (and buffer pins are a type
>of lock). There are a number of places (fast-path locking, for
>example, or vacuum skipping pinned heap pages) where we have
>fast-paths that get taken most of the time and slow paths that get
>used when concurrent activity happens; empirically, such things often
>work out to a win. I think it's disturbing that this code seems to be
>taking the slow-path (which, in context, means skipping LP_DEAD
>cleanup) even there is no concurrent activity. That's hard to
>justify.

That is hard to justify. I don't think that failing to set LP_DEAD hints
is the cost that must be paid to realize a benefit elsewhere, though. I
don't see much problem with having both benefits consistently. It's
actually very unlikely that VACUUM will run, and a TID will be recycled
at exactly the wrong time. We could probably come up with a more
discriminating way of detecting that that may have happened, at least
for Postgres 11. We'd continue to use LSN; the slow path would be taken
when the LSN changed, but we do not give up on setting LP_DEAD bits. I
think we can justify going to the heap again in this slow path, if
that's what it takes.

>But the fact that it is taking the slow-path when there *is*
>concurrent activity is harder to complain about. That might win or it
>might lose; the non-concurrent case only loses.

Let's wait to see what difference it makes if Alik's zipfian
distribution pgbench test case uses unlogged tables. That may gives us a
good sense of the problem for cases with contention/concurrency.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-07-31 18:43:10 Re: Transactions involving multiple postgres foreign servers
Previous Message Tom Lane 2017-07-31 17:53:13 Re: PL_stashcache, or, what's our minimum Perl version?