From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2023-04-07 09:55:41
Message-ID: CAFBsxsGh9dvoJ-Q8Hciku=Rbr_kfLGDSmci6kkoic3FMua8QPw@mail.gmail.com
Lists: pgsql-hackers
On Thu, Feb 16, 2023 at 11:44 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> We really ought to replace the tid bitmap used for bitmap heap scans. The
> hashtable we use is a pretty awful data structure for it. And that's not
> filled in-order, for example.
I spent some time studying tidbitmap.c, and not only does it make sense to
use a radix tree there, but since it has more complex behavior and stricter
runtime requirements, it should really be the thing driving the design and
tradeoffs, not vacuum:
- With lazy expansion and single-value leaves, the root of a radix tree can
point to a single leaf. That might get rid of the need to track TBMStatus,
since setting a single-leaf tree should be cheap.
- Fixed-size PagetableEntry structs are pretty large, but the tid compression
scheme used in this thread (in addition to being complex) is not a great
fit for tidbitmap because it makes it more difficult to track per-block
metadata (see also next point). With the "combined pointer-value slots"
technique, if a page's max tid offset is 63 or less, the offsets can be
stored directly in the pointer for the exact case. The lowest bit can serve
as a tag indicating a pointer to a single-value leaf (see the first sketch
after this list). That would complicate operations like union/intersection
and tracking "needs recheck", but it would reduce memory use and
node traversal in common cases.
- Managing lossy storage. With pure blocknumber keys, replacing exact
storage for a range of 256 pages amounts to replacing a last-level node
with a single leaf containing one lossy PagetableEntry. The leader could
iterate over the tree, rank the last-level nodes by how much storage they
(together with any leaf children) are using, and come up with an optimal
lossy-conversion plan (see the second sketch after this list).
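To make the second point a bit more concrete, here is a minimal sketch of
how a combined pointer-value slot could look, assuming 64-bit pointers.
None of these names exist in tidbitmap.c or in the patches so far; they are
purely illustrative. Since heap TID offsets start at 1, offsets up to 63
fit in bits 1..63 of the slot, leaving bit 0 free as the pointer tag:

/*
 * Illustrative only: a combined pointer-value slot.  When the maximum
 * offset on a page is 63 or less, offsets 1..63 live in bits 1..63 of
 * the slot and bit 0 stays zero.  Pointers are at least 2-byte aligned,
 * so setting bit 0 tags the slot as a pointer to a single-value leaf
 * (e.g. a full PagetableEntry) instead.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t tbm_slot;        /* hypothetical slot type */

#define TBM_SLOT_POINTER_TAG   ((uintptr_t) 1)
#define TBM_MAX_EMBEDDED_OFF   63

/* Set one offset (1..63) in an embedded slot. */
static inline tbm_slot
tbm_slot_add_offset(tbm_slot slot, unsigned off)
{
    assert(!(slot & TBM_SLOT_POINTER_TAG));
    assert(off >= 1 && off <= TBM_MAX_EMBEDDED_OFF);
    return slot | ((tbm_slot) 1 << off);
}

/* Tag a pointer to a full single-value leaf. */
static inline tbm_slot
tbm_slot_from_leaf(void *leaf)
{
    return ((tbm_slot) leaf) | TBM_SLOT_POINTER_TAG;
}

static inline bool
tbm_slot_is_leaf_pointer(tbm_slot slot)
{
    return (slot & TBM_SLOT_POINTER_TAG) != 0;
}

static inline void *
tbm_slot_get_leaf(tbm_slot slot)
{
    return (void *) (slot & ~TBM_SLOT_POINTER_TAG);
}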
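And for the third point, a rough sketch of how the leader might rank
256-page ranges by memory use and convert the heaviest ones to lossy
storage until enough memory is freed. Again, the types and helpers below
are placeholders I made up for illustration, not the actual radix tree or
tidbitmap API:

#include <stddef.h>
#include <stdlib.h>

typedef struct LossyCandidate
{
    void   *last_level_node;    /* radix tree node covering 256 pages */
    size_t  bytes_used;         /* node plus its leaf children */
} LossyCandidate;

/* qsort comparator: largest memory user first */
static int
candidate_cmp(const void *a, const void *b)
{
    const LossyCandidate *ca = (const LossyCandidate *) a;
    const LossyCandidate *cb = (const LossyCandidate *) b;

    if (ca->bytes_used > cb->bytes_used)
        return -1;
    if (ca->bytes_used < cb->bytes_used)
        return 1;
    return 0;
}

/* Stand-in for replacing a last-level node with one lossy PagetableEntry. */
static void
convert_range_to_lossy(void *last_level_node)
{
    (void) last_level_node;     /* real code would rewrite the tree here */
}

/* Convert the most memory-hungry ranges until enough memory is freed. */
static void
plan_lossy_conversion(LossyCandidate *cands, int ncands, size_t bytes_to_free)
{
    size_t      freed = 0;

    qsort(cands, ncands, sizeof(LossyCandidate), candidate_cmp);

    for (int i = 0; i < ncands && freed < bytes_to_free; i++)
    {
        convert_range_to_lossy(cands[i].last_level_node);
        freed += cands[i].bytes_used;
    }
}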
The above would address the points (not including better iteration and
parallel bitmap index scans) raised in
Ironically, targeting a more difficult use case makes the job easier, since
there is less freedom. There are many ways to beat a binary search, but fewer
good ways to improve bitmap heap scan. I'd like to put vacuum aside for
some time and try killing two birds with one stone, building upon our work
thus far.
Note: I've moved the CF entry to the next CF and set it to Waiting on
Author for now. Since no action is currently required from Masahiko, I've
added myself as author as well. If tackling bitmap heap scan shows promise,
we could RWF and resurrect at a later time.
--
John Naylor
EDB: http://www.enterprisedb.com