From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index. |
Date: | 2019-09-19 02:25:23 |
Message-ID: | CAH2-Wzmn97x3JRbmF=2uQrc5ruusuGrpB_eOUSuJfhYOdikS7Q@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Sep 18, 2019 at 10:43 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I'm currently working on merging my refactored version of
> _bt_dedup_one_page() with your v15 WAL-logging. This is a bit tricky.
> (I have finished merging the other WAL-logging stuff, though -- that
> was easy.)
I attach version 16. This revision merges your recent work on WAL
logging with my recent work on simplifying _bt_dedup_one_page(). See
my e-mail from earlier today for details.
Hopefully this will be a bit easier to work with when you go to make
_bt_dedup_one_page() do raw PageIndexMultiDelete() + PageAddItem()
calls against the page contained in a buffer directly (rather than
using a temp version of the page in local memory in the style of
_bt_split()). I find the loop within _bt_dedup_one_page() much easier
to follow now.
While I'm looking forward to seeing the
PageIndexMultiDelete()/PageAddItem() approach that you come up with,
the basic design of _bt_dedup_one_page() seems to be in much better
shape today than it was a few weeks ago. I am going to spend the next
few days teaching _bt_dedup_one_page() about space utilization. I'll
probably make it respect a fillfactor-style target. I've noticed that
it is often too aggressive about filling a page, though less often it
actually shows the opposite problem: it fails to use more than about
2/3 of the page for the same value, again and again (must be something
to do with the exact width of the tuples). In general,
_bt_dedup_one_page() should know a few things about what nbtsplitloc.c
will do when the page is very likely to be split soon.
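As a rough illustration only (this is not code from the patch), a fillfactor-style target could be checked along the following lines. The names demo_fill_limit() and demo_tuple_fits_target() are hypothetical, and the sketch assumes the caller already knows how many bytes of the page are usable and how many are in use:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of bytes that may be occupied before a
 * fillfactor-style target (a percentage from 1 to 100) is reached.
 */
static size_t
demo_fill_limit(size_t usable_space, int fillfactor)
{
    return usable_space * (size_t) fillfactor / 100;
}

/*
 * Hypothetical helper: would adding a tuple of tuple_size bytes keep the
 * page at or below the target?  "used" is the space already occupied.
 */
static bool
demo_tuple_fits_target(size_t used, size_t tuple_size,
                       size_t usable_space, int fillfactor)
{
    return used + tuple_size <= demo_fill_limit(usable_space, fillfactor);
}
```

A deduplication pass written this way would stop growing a posting list once demo_tuple_fits_target() returns false, leaving the remaining space free for later inserts.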
I'll also spend some more time working on the opclass infrastructure
that we need to disable deduplication with datatypes where it is
unsafe [1].
Other changes:
* qsort() is no longer used by BTreeFormPostingTuple() in v16 -- we
can easily make sorting the array of heap TIDs the caller's responsibility.
Since the heap TID column is sorted in ascending order among
duplicates on a page, and since TIDs within individual posting lists
are also sorted in ascending order, there is no need to re-sort. I
added a new assertion to BTreeFormPostingTuple() that verifies that
its caller actually gets it right.
* The new nbtpage.c/VACUUM code has been tweaked to minimize the
changes required against master. Nothing significant, though.
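The sortedness check described in the first item above might look roughly like the sketch below. It is a standalone illustration, not the assertion from the patch: DemoTid is a hypothetical stand-in for PostgreSQL's heap TID type, and the check assumes strictly ascending order (no TID appears twice), which is an assumption made here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a heap TID: (block number, offset number). */
typedef struct DemoTid
{
    uint32_t block;   /* heap block number */
    uint16_t offset;  /* item offset within the block */
} DemoTid;

/* Three-way comparison: block number first, then offset. */
static int
demo_tid_compare(const DemoTid *a, const DemoTid *b)
{
    if (a->block != b->block)
        return (a->block < b->block) ? -1 : 1;
    if (a->offset != b->offset)
        return (a->offset < b->offset) ? -1 : 1;
    return 0;
}

/*
 * Returns true when the array is in strictly ascending TID order.
 * A caller that is responsible for sorting could wrap this in assert()
 * before handing the array to a posting-tuple builder.
 */
static bool
demo_tids_are_sorted(const DemoTid *tids, size_t ntids)
{
    for (size_t i = 1; i < ntids; i++)
    {
        if (demo_tid_compare(&tids[i - 1], &tids[i]) >= 0)
            return false;
    }
    return true;
}
```

Checking adjacent pairs is a single linear pass, so an assertion of this kind costs far less than the sort it replaces and only runs in assert-enabled builds.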
It was easier to refactor the _bt_dedup_one_page() stuff by
temporarily making nbtsort.c not use it. I didn't want to delay
getting v16 to you, so I didn't take the time to fix up nbtsort.c to
use the new stuff. It's actually using its own old copy of stuff that
it should get from nbtinsert.c in v16 -- it calls
_bt_dedup_item_tid_sort(), not the new _bt_dedup_save_htid() function.
I'll update it soon, though.
[1] https://www.postgresql.org/message-id/flat/CAH2-Wzn3Ee49Gmxb7V1VJ3-AC8fWn-Fr8pfWQebHe8rYRxt5OQ(at)mail(dot)gmail(dot)com
--
Peter Geoghegan
Attachment | Content-Type | Size |
---|---|---|
v16-0001-Add-deduplication-to-nbtree.patch | application/octet-stream | 138.1 KB |