From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index. |
Date: | 2019-09-13 01:04:25 |
Message-ID: | CAH2-WzkjuaM7_aFrfrbPrmow4jakeQmQ=mrntKw_aA9OVvcsRg@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Sep 11, 2019 at 2:04 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I think that the new WAL record has to be created once per posting
> list that is generated, not once per page that is deduplicated --
> that's the only way that I can see that avoids a huge increase in
> total WAL volume. Even if we assume that I am wrong about there being
> value in making deduplication incremental, it is still necessary to
> make the WAL-logging behave incrementally.
Attached is v13 of the patch, which shows what I mean. You could say
that v13 makes _bt_dedup_one_page() do a few extra things that are
kind of similar to the things that nbtsplitloc.c does for _bt_split().
More specifically, the v13-0001-* patch includes code that makes
_bt_dedup_one_page() "goal orientated" -- it calculates how much space
will be freed when _bt_dedup_one_page() goes on to deduplicate those
items on the page that it has already "decided to deduplicate". The
v13-0002-* patch makes _bt_dedup_one_page() actually use this ability
-- it makes _bt_dedup_one_page() give up on deduplication when it is
clear that the items that are already "pending deduplication" will
free enough space for its caller to at least avoid a page split. This
revision of the patch doesn't truly make deduplication incremental. It
is only a proof of concept that shows how _bt_dedup_one_page() can
*decide* that it will free "enough" space, whatever that may mean, so
that it can finish early. The task of making _bt_dedup_one_page()
actually avoid lots of work when it finishes early remains.
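
The "decide to finish early" idea described above can be sketched roughly as follows. This is only an illustration, not code from the v13 patches: the struct, the function names, and the space model (one copy of the key plus one TID per duplicate, ignoring line pointers and alignment) are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical per-page bookkeeping (not the patch's actual state struct).
 */
typedef struct DedupSketchState
{
    size_t space_needed;   /* bytes the caller needs to avoid a page split */
    size_t space_pending;  /* bytes freed by items "pending deduplication" */
} DedupSketchState;

/*
 * Estimate the bytes saved by merging ndups duplicate tuples into one
 * posting list tuple.  Assumed model: each original tuple occupies
 * tuple_size bytes; the posting list stores the key once (key_size bytes)
 * plus tid_size bytes per duplicate.
 */
static size_t
sketch_space_saved(size_t ndups, size_t tuple_size,
                   size_t key_size, size_t tid_size)
{
    size_t before = ndups * tuple_size;
    size_t after = key_size + ndups * tid_size;

    return before > after ? before - after : 0;
}

/*
 * Record one more group of duplicates as pending deduplication.  Returns
 * true once the pending items alone would free enough space, which is the
 * point where the caller could stop considering further items.
 */
static bool
sketch_dedup_add_group(DedupSketchState *state, size_t saved)
{
    state->space_pending += saved;
    return state->space_pending >= state->space_needed;
}
```

With these assumed sizes, ten 16-byte duplicates sharing an 8-byte key and using 6-byte TIDs save 92 bytes, so a caller that needs 100 bytes would stop after the second such group. Note that this only models the decision; it says nothing about avoiding the rewriting work itself.
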
As I said yesterday, I'm not asking you to accept that v13-0002-* is
an improvement. At least not yet. In fact, "finishing early" due to the
v13-0002-* logic clearly makes everything a lot slower, since
_bt_dedup_one_page() will "thrash" even more than earlier versions of
the patch. This is especially problematic with WAL-logged relations --
the test case that I shared yesterday goes from about 6GB to 10GB with
v13-0002-* applied. But we need to fundamentally rethink the approach
to the rewriting + WAL-logging by _bt_dedup_one_page() anyway. (Note
that total index space utilization is barely affected by the
v13-0002-* patch, so clearly that much works well.)
Other changes:
* Small tweaks to amcheck (nothing interesting, really).
* Small tweaks to the _bt_killitems() stuff.
* Moved all of the deduplication helper functions to nbtinsert.c. This
is where deduplication gets complicated, so I think that it should all
live there. (i.e. nbtsort.c will call nbtinsert.c code, never the
other way around.)
Note that I haven't merged any of the changes from v12 of the patch
from yesterday. I didn't merge the posting list WAL logging changes
because of the bug I reported, but I would have were it not for that.
The WAL logging for _bt_dedup_one_page() added to v12 didn't appear to
be more efficient than your original approach (i.e. calling
log_newpage_buffer()), so I have stuck with your original approach.
It would be good to hear your thoughts on this _bt_dedup_one_page()
WAL volume/"write amplification" issue.
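
To make the write amplification concern concrete, here is a back-of-the-envelope model comparing the two logging strategies discussed in this thread. It is only a sketch under stated assumptions: the function names are hypothetical, the per-record overhead and posting list size are parameters rather than measured values, and the full-page case ignores any hole removal or compression that the real WAL machinery may apply.

```c
#include <stddef.h>

#define SKETCH_BLCKSZ 8192      /* PostgreSQL's default block size */

/*
 * WAL bytes if every deduplication pass over a page logs a whole page
 * image, as a log_newpage_buffer()-style approach would (assumed model).
 */
static size_t
sketch_wal_full_page(size_t npasses)
{
    return npasses * SKETCH_BLCKSZ;
}

/*
 * WAL bytes if each newly created posting list gets its own record.
 * record_overhead and posting_list_size are hypothetical inputs.
 */
static size_t
sketch_wal_per_posting_list(size_t nlists, size_t record_overhead,
                            size_t posting_list_size)
{
    return nlists * (record_overhead + posting_list_size);
}
```

Under this model, three passes that each create one 100-byte posting list cost 24576 bytes when logged as page images, versus 396 bytes with a 32-byte overhead per record. The real numbers depend on details that this model leaves out.
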
--
Peter Geoghegan
Attachment | Content-Type | Size |
---|---|---|
v13-0002-Stop-deduplicating-when-a-page-split-is-avoided.patch | application/octet-stream | 1.8 KB |
v13-0003-DEBUG-Add-pageinspect-instrumentation.patch | application/octet-stream | 8.6 KB |
v13-0001-Add-deduplication-to-nbtree.patch | application/octet-stream | 119.5 KB |