From: Hannu Krosing <hannuk(at)google(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Jan Wieck <jan(at)wi3ck(dot)info>, Gregory Smith <gregsmithpgsql(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: The Free Space Map: Problems and Opportunities
Date: 2021-09-07 12:24:56
Message-ID: CAMT0RQRjOqFgPakoH3pCV_=NvXfKFniOo8FoB5_S9XnmzNfR0Q@mail.gmail.com
Lists: pgsql-hackers
On Tue, Sep 7, 2021 at 2:29 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Mon, Sep 6, 2021 at 4:33 PM Hannu Krosing <hannuk(at)google(dot)com> wrote:
> > When I have been thinking of this type of problem it seems that the
> > latest -- and correct :) -- place which should do all kinds of
> > cleanup like removing aborted tuples, freezing committed tuples and
> > setting any needed hint bits would be background writer or CHECKPOINT.
> >
> > This would be more PostgreSQL-like, as it moves any work not
> > immediately needed from the critical path, as an extension of how MVCC
> > for PostgreSQL works in general.
>
> I think it depends. There is no need to do work in the background
> here, with TPC-C. With my patch series each backend can know that it
> just had an aborted transaction that affected a page that it more or
> less still owns. And has very close at hand, for further inserts. It's
> very easy to piggy-back the work once you have that sense of ownership
> of newly allocated heap pages by individual backends/transactions.
Are you speaking of just heap pages here, or also index pages?
It seems indeed easy for heap, but index pages can get mixed up by
other parallel work, especially things like Serial Primary Keys.
Or are you expecting these to be kept in good-enough shape by your
earlier index manager work?
> > This would be more PostgreSQL-like, as it moves any work not
> > immediately needed from the critical path, as an extension of how MVCC
> > for PostgreSQL works in general.
>
> I think that it also makes sense to have what I've called "eager
> physical rollback" that runs in the background, as you suggest.
>
> I'm thinking of a specialized form of VACUUM that targets a specific
> aborted transaction's known-dirtied pages. That's my long term goal,
> actually. Originally I wanted to do this as a way of getting rid of
> SLRUs and tuple freezing, by representing that all heap pages must
> only have committed tuples implicitly. That seemed like a good enough
> reason to split VACUUM into specialized "eager physical rollback
> following abort" and "garbage collection" variants.
>
> The insight that making abort-related cleanup special will help free
> space management is totally new to me -- it emerged from working
> directly on this benchmark. But it nicely complements some of my
> existing ideas about improving VACUUM.
A minimal useful patch emerging from that understanding could be
something which just adds hysteresis to FSM management. (TBH, I
actually kind of expected some hysteresis to be there already, as it
is in my mental model of "how things should be done" for managing
almost any resource :) )
Adding hysteresis to FSM management can hopefully be done independently
of all the other stuff, and it also seems unobtrusive and
non-controversial enough to fit in the current release and possibly
even be back-ported.
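To illustrate what I mean by hysteresis, here is a minimal standalone
sketch. It is not the actual FSM code; the thresholds, type and
function names are all made up for illustration. The idea is simply
that a page is only (re)advertised once its free space rises above a
high watermark, and only withdrawn once it falls below a low one, so
small back-and-forth changes do not cause FSM churn:

    /*
     * Hypothetical sketch, not PostgreSQL's FSM code.  HIGH_WATERMARK and
     * LOW_WATERMARK are illustrative values only and would need tuning.
     */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define BLCKSZ          8192
    #define HIGH_WATERMARK  (BLCKSZ / 4)   /* advertise only when >= 2 KB free */
    #define LOW_WATERMARK   (BLCKSZ / 16)  /* withdraw only when < 512 bytes free */

    typedef struct PageFsmState
    {
        uint16_t    free_bytes;     /* current free space on the page */
        bool        advertised;     /* currently visible in the FSM? */
    } PageFsmState;

    /* Returns true when the caller should actually touch the FSM. */
    static bool
    fsm_update_with_hysteresis(PageFsmState *st, uint16_t new_free_bytes)
    {
        st->free_bytes = new_free_bytes;

        if (!st->advertised && new_free_bytes >= HIGH_WATERMARK)
        {
            st->advertised = true;
            return true;            /* re-advertise the page */
        }
        if (st->advertised && new_free_bytes < LOW_WATERMARK)
        {
            st->advertised = false;
            return true;            /* withdraw the page from the FSM */
        }
        return false;               /* inside the hysteresis band: do nothing */
    }

    int
    main(void)
    {
        PageFsmState st = {0, false};

        printf("%d\n", fsm_update_with_hysteresis(&st, 3000)); /* 1: advertise */
        printf("%d\n", fsm_update_with_hysteresis(&st, 1000)); /* 0: in the band */
        printf("%d\n", fsm_update_with_hysteresis(&st, 100));  /* 1: withdraw */
        return 0;
    }

The point of the two thresholds is that a page which oscillates around
a single cutoff would otherwise be added to and removed from the FSM on
every small change.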
> > But doing it as part of checkpoint probably ends up with less WAL
> > writes in the end.
>
> I don't think that checkpoints are special in any way. They're very
> important in determining the total number of FPIs we'll generate, and
> so have huge importance today. But that seems accidental to me.
I did not mean CHECKPOINT as a command, but more the concept of
writing back / un-dirtying the page. In this sense it *is* special,
because it is the last point in time where you are guaranteed to have
the page available in the buffer cache, and thus cheap to access for
modifications, and you avoid a second full-page writeback caused by
the cleanup. Also, you do not want to postpone the cleanup to actual
page eviction, as that is usually on the critical path for some user
query or command.
Of course this matters most for workloads where the active working set
is larger than what fits in memory, which is not a typical case for
OLTP any more. But at least freezing the page before write-out could
have a very big impact on the need to freeze "old" pages available
only on disk, and thus would be a cheap way to improve the problems
around running out of transaction ids.
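To make the "one write instead of two" argument concrete, here is a
toy standalone model. It is not PostgreSQL code and all names are
invented; it only shows the ordering I am arguing for -- do the
freezing/hint-bit work while the dirty page is still in the buffer
cache and about to be written anyway, instead of dirtying it again
later:

    #include <stdbool.h>
    #include <stdio.h>

    /* Toy model of a page that still needs freezing / hint-bit work. */
    typedef struct Page
    {
        bool    dirty;
        bool    needs_freeze;
        int     writes;         /* how many times it had to be written out */
    } Page;

    static void
    write_out(Page *p)
    {
        p->dirty = false;
        p->writes++;
    }

    /* Piggy-back the cleanup on a write we must do anyway: one write total. */
    static void
    flush_with_cleanup(Page *p)
    {
        if (p->needs_freeze)
            p->needs_freeze = false;    /* freeze / set hint bits while in cache */
        write_out(p);
    }

    /* Defer the cleanup: the later freeze re-dirties the page, write #2. */
    static void
    flush_then_freeze_later(Page *p)
    {
        write_out(p);
        if (p->needs_freeze)
        {
            p->needs_freeze = false;
            p->dirty = true;
            write_out(p);
        }
    }

    int
    main(void)
    {
        Page a = {true, true, 0};
        Page b = {true, true, 0};

        flush_with_cleanup(&a);
        flush_then_freeze_later(&b);
        printf("piggy-backed: %d write(s), deferred: %d write(s)\n",
               a.writes, b.writes);
        return 0;
    }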
> > There could be a possibility to do a small amount of cleanup -- enough
> > for TPC-C-like workloads, but not larger ones -- while waiting for the
> > next command to arrive from the client over the network. This of
> > course assumes that we will not improve our feeder mechanism to have
> > back-to-back incoming commands, which can already be done today, but
> > which I have seen seldom used.
>
> That's what I meant, really. Doing the work of cleaning up a heap page
> that a transaction inserts into (say pruning away aborted tuples or
> setting hint bits) should ideally happen right after commit or abort
> -- at least for OLTP like workloads, which are the common case for
> Postgres. This cleanup doesn't have to be done by exactly the same
> transactions (and can't be in most interesting cases). It should be
> quite possible for the work to be done by approximately the same
> transaction, though -- the current transaction cleans up inserts made
> by the previous (now committed/aborted) transaction in the same
> backend (for the same table).
Again, do I assume correctly that you are here mainly targeting the
heap and not indexes?
> The work of setting hint bits and pruning-away aborted heap tuples has
> to be treated as a logical part of the cost of inserting heap tuples
> -- backends pay this cost directly. At least with workloads where
> transactions naturally only insert a handful of rows each in almost
> all cases -- very much the common case.
I know that in general we are very reluctant to add threads to
PostgreSQL, but this looks like a very valid case for having a
"microthread" running on the same core as the DML, as it would share
all the CPU caches and could thus be really cheap without actually
being on the critical path.
Cheers,
-----
Hannu Krosing
Google Cloud - We have a long list of planned contributions and we are hiring.
Contact me if interested.