From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru> |
Cc: | Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index. |
Date: | 2019-08-20 00:52:05 |
Message-ID: | CAH2-WzkqzaPoELm3seY0fucfF_owAX87yJNwmjkUY3pKJi8N3g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Aug 13, 2019 at 8:45 AM Anastasia Lubennikova
<a(dot)lubennikova(at)postgrespro(dot)ru> wrote:
> > I still need to think about the exact details of alignment within
> > _bt_insertonpg_in_posting(). I'm worried about boundary cases there. I
> > could be wrong.
> Could you explain more about these cases?
> Now I don't understand the problem.
Maybe there is no problem.
> Thank you for the patch.
> Still, I'd suggest to leave it as a possible future improvement, so that
> it doesn't
> distract us from the original feature.
I don't even think that it's useful work for the future. It's just
nice to be sure that we could support unique index deduplication if it
made sense. Which it doesn't. If I didn't write the patch that
implements deduplication for unique indexes, I might still not realize
that we need the index_compute_xid_horizon_for_tuples() stuff in
certain other places. I'm not serious about it at all, except as a
learning exercise/experiment.
> I added to v6 another related fix for _bt_compress_one_page().
> Previous code was implicitly deleted DEAD items without
> calling index_compute_xid_horizon_for_tuples().
> New code has a check whether DEAD items on the page exist and remove
> them if any.
> Another possible solution is to copy dead items as is from old page to
> the new one,
> but I think it's good to remove dead tuples as fast as possible.
I think that what you've done in v7 is probably the best way to do it.
It's certainly simple, which is appropriate given that we're not
really expecting to see LP_DEAD items within _bt_compress_one_page()
(we just need to be prepared for them).
> > v5 makes _bt_insertonpg_in_posting() prepared to overwrite an
> > existing item if it's an LP_DEAD item that falls in the same TID range
> > (that's _bt_compare()-wise "equal" to an existing tuple, which may or
> > may not be a posting list tuple already). I haven't made this code do
> > something like call index_compute_xid_horizon_for_tuples(), even
> > though that's needed for correctness (i.e. this new code is currently
> > broken in the same way that I mentioned unique index support is
> > broken).
> Is it possible that DEAD tuple to delete was smaller than itup?
I'm not sure what you mean by this. I suppose that it doesn't matter,
since we both prefer the alternative that you came up with anyway.
> > How do you feel about officially calling this deduplication, not
> > compression? I think that it's a more accurate name for the technique.
> I agree.
> Should I rename all related names of functions and variables in the patch?
Please rename them when convenient.
--
Peter Geoghegan
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2019-08-20 01:04:18 | Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index. |
Previous Message | Alexandra Wang | 2019-08-19 23:15:30 | Re: Zedstore - compressed in-core columnar storage |