From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>, Юрий Соколов <funny(dot)falcon(at)gmail(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [WIP] [B-Tree] Retail IndexTuple deletion |
Date: | 2018-07-19 11:29:51 |
Message-ID: | CAD21AoApVGFf3q7WV5FuFzHDRtew9+fHHdCyOkk1uG+XG_6OKw@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Jul 13, 2018 at 4:00 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Tue, Jul 3, 2018 at 5:17 AM, Andrey V. Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>> Done.
>> Attachment contains an update for use v.2 of the 'Ensure nbtree leaf tuple
>> keys are always unique' patch.
>
> My v3 is still pending, but is now a lot better than v2. There were
> bugs in v2 that were fixed.
>
> One area that might be worth investigating is retail index tuple
> deletion performed within the executor in the event of non-HOT
> updates. Maybe LP_REDIRECT could be repurposed to mean "ghost record",
> at least in unique index tuples with no NULL values. The idea is that
> MVCC index scans can skip over those if they've already found a
> visible tuple with the same value.
I think that's a good idea. The overhead of marking a tuple as a ghost
seems small, and it would speed up index scans. If an MVCC index scan
has already found a visible tuple with the same value, could it not only
skip the ghost tuples but also kill them? If so, we could kill index
tuples without checking the heap.
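To make the skipping idea concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not nbtree code: `IndexTuple`, its `ghost` flag, and the `heap_visible` callback are hypothetical names standing in for a leaf item, the proposed hint bit, and the heap visibility check. It models skipping only, and does not attempt the killing step asked about above.

```python
from dataclasses import dataclass


@dataclass
class IndexTuple:
    key: int
    heap_tid: int
    ghost: bool = False  # hypothetical hint bit (the repurposed LP_REDIRECT idea)
    dead: bool = False   # stands in for an LP_DEAD item


def unique_index_scan(leaf_items, key, heap_visible):
    """Return (visible heap TID or None, number of heap fetches).

    heap_visible(heap_tid) -> bool stands in for the heap visibility check.
    Assumes a unique index with no NULL values, so at most one version of
    a given key can be visible to the scan.
    """
    found = None
    heap_fetches = 0
    for item in leaf_items:
        if item.key != key or item.dead:
            continue
        if found is not None and item.ghost:
            # A visible tuple with this value was already found, so a
            # ghost-marked item can be skipped without visiting the heap.
            continue
        # Ghosts seen before any visible match are still checked in the
        # heap, because the ghost bit is only a hint and may be wrong.
        heap_fetches += 1
        if found is None and heap_visible(item.heap_tid):
            found = item.heap_tid
    return found, heap_fetches
```

In this model the saving depends on where the visible version sits among the duplicates: ghosts that come before it still cost a heap fetch, and only those after it are skipped.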
> Also, when there was about to be a
> page split, they could be treated a little bit like LP_DEAD items. Of
> course, the ghost bit would have to be treated as a hint that could be
> "wrong" (e.g. because the transaction hasn't committed yet), so you'd
> have to go to the heap in the context of a page split, to double
> check. Also, you'd need heuristics that let you give up on this
> strategy when it didn't help.
>
> I think that this could work well enough for OLTP workloads, and might
> be more future-proof than doing it in VACUUM. Though, of course, it's
> still very complicated.
Agreed.
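For the page split case, my understanding of the flow is roughly the following toy model, again in Python and again only a sketch. The dict fields, `heap_is_dead`, and `min_reclaimed` are hypothetical names; the "give up" heuristic is reduced here to reporting whether enough items were reclaimed to avoid the split.

```python
def try_avoid_split(leaf_items, heap_is_dead, min_reclaimed=1):
    """Return (remaining_items, split_needed) for a full leaf page.

    leaf_items: list of dicts with 'heap_tid', 'ghost' and 'dead' fields.
    heap_is_dead(heap_tid) -> bool stands in for the heap double check.
    """
    kept = []
    reclaimed = 0
    for item in leaf_items:
        if item["dead"]:
            # LP_DEAD-style items are already known to be removable.
            reclaimed += 1
        elif item["ghost"] and heap_is_dead(item["heap_tid"]):
            # The ghost bit is only a hint; remove the item only once
            # the heap confirms the version is really dead.
            reclaimed += 1
        else:
            # Live item, or a ghost hint that turned out to be wrong
            # (for example, the transaction has not committed yet).
            kept.append(item)
    # If too little space was reclaimed, the page split goes ahead.
    return kept, reclaimed < min_reclaimed
```

A caller could count how often `split_needed` comes back true despite the heap visits, and stop trying on that index once the strategy is not paying off.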
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center