Re: [Patch] Optimize dropping of relation buffers using dlist

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "k(dot)jamison(at)fujitsu(dot)com" <k(dot)jamison(at)fujitsu(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Patch] Optimize dropping of relation buffers using dlist
Date: 2020-08-07 14:39:21
Message-ID: CA+TgmoaHiGhgG++vV_7citMRi_hyGJJoVGytyq85-WE8B4LfUQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Aug 7, 2020 at 12:03 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Yeah, there is no room for "good enough" here. If a dirty buffer remains
> in the system, the checkpointer will eventually try to flush it, and fail
> (because there's no file to write it to), and then checkpointing will be
> stuck. So we cannot afford to risk missing any buffers.

This comment suggests another possible approach to the problem, which
is to just make a note someplace in shared memory when we drop a
relation. If we later find any of its buffers, we drop them without
writing them out. This is not altogether simple, because (1) we don't
have infinite room in shared memory to accumulate such notes and (2)
it's not impossible for the OID counter to wrap around and permit the
creation of a new relation with the same OID, which would be a problem
if the previous note is still around.

But this might be solvable. Suppose we create a shared hash table
keyed by <dboid, reload> with room for 1 entry per 1000 shared
buffers. When you drop a relation, you insert into the hash table.
Periodically you "clean" the hash table by marking all the entries,
scanning shared buffers to remove any matches, and then deleting all
the marked entries. This should be done periodically in the
background, but if you try to drop a relation and find the hash table
full, or you try to create a relation and find the OID of your new
relation in the hash table, then you have to clean synchronously.

Right now, the cost of dropping K relation when N shared buffers is
O(KN). But with this approach, you only have to incur the O(N)
overhead of scanning shared_buffers when the hash table fills up, and
the hash table size is proportional to N, so the amortized complexity
is O(K); that is, dropping relations takes time proportional to the
number of relations being dropped, but NOT proportional to the size of
shared_buffers, because as shared_buffers grows the hash table gets
proportionally bigger, so that scans don't need to be done as
frequently.

Andres's approach (retail hash table lookups just for blocks less than
the relation size, rather than a full scan) is going to help most with
small relations, whereas this approach helps with relations of any
size, but if you're trying to drop a lot of relations, they're
probably small, and if they are large, scanning shared buffers may not
be the dominant cost, since unlinking the files also takes time. Also,
this approach might turn out to slow down buffer eviction too much.
That could maybe be mitigated by having some kind of cheap fast-path
that gets used when the hash table is empty (like an atomic flag that
indicates whether a hash table probe is needed), and then trying hard
to keep it empty most of the time (e.g. by aggressive background
cleaning, or by ruling that after some number of hash table lookups
the next process to evict a buffer is forced to perform a cleanup).
But you'd probably have to try it to see how well you can do.

It's also possible to combine the two approaches. Small relations
could use Andres's approach while larger ones could use this approach;
or you could insert both large and small relations into this hash
table but use different strategies for cleaning out shared_buffers
depending on the relation size (which could also be preserved in the
hash table).

Just brainstorming here...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2020-08-07 14:44:35 Re: Handing off SLRU fsyncs to the checkpointer
Previous Message Robert Haas 2020-08-07 13:52:59 Re: Parallel worker hangs while handling errors.