Re: Deleting older versions in unique indexes to avoid page splits

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Victor Yegorov <vyegorov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Deleting older versions in unique indexes to avoid page splits
Date: 2021-01-20 03:54:35
Message-ID: CAA4eK1JfqYKJOj--M+KSpRxWjsohyHDQRP4wGSFU_MGVuxDFCg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 19, 2021 at 3:03 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Mon, Jan 18, 2021 at 1:10 PM Victor Yegorov <vyegorov(at)gmail(dot)com> wrote:
> > I must admit that it's a bit difficult to understand you here (at least for me).
> >
> > I assume that by "bet" you mean a flagged tuple, that is, one we marked as such
> > (should we implement the suggested case).
> > As heapam will give up early in case there are no deletable tuples, why do you
> > say that the "bet" is no longer low? In the end, we still decide between a page
> > split (and index bloat) vs. a beneficial space cleanup.
>
> Well, as I said, there are various ways in which our inferences (say
> the ones in nbtdedup.c) are likely to be wrong. You understand this
> already. For example, obviously if there are two duplicate index
> tuples pointing to the same heap page then it's unlikely that both
> will be deletable, and there is even a fair chance that neither will
> be (for many reasons). I think that it's important to justify why we
> use stuff like that to drive our decisions -- the reasoning really
> matters. It's very much not like the usual optimization problem thing.
> It's a tricky thing to discuss.
>
> I don't assume that I understand all workloads, or how I might
> introduce regressions. It follows that I should be extremely
> conservative about imposing new costs here. It's good that we
> currently know of no workloads that the patch is likely to regress,
> but absence of evidence isn't evidence of absence.
>

The worst cases could be (a) when there is just one such duplicate
(index value logically unchanged) on the page, it happens to be the
last item, and the others are new insertions, and (b) the same as (a)
but, in addition, let's say there is an open transaction due to which
we can't remove even that one duplicate version. Have we checked the
overhead or results by simulating such workloads?
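
For what it's worth, here is a rough, purely illustrative sketch of how such a
workload could be simulated from Python with psycopg2 (the database, table, and
column names, the row width, and the loop counts are all made-up assumptions on
my part, not anything from the patch):

import psycopg2

# Scratch setup: a table with a unique index; rows are wide so that a heap
# page fills quickly and later updates of the same row cannot stay HOT.
setup = psycopg2.connect(dbname="test")   # assumed local scratch database
setup.autocommit = True
with setup.cursor() as cur:
    cur.execute("DROP TABLE IF EXISTS bu_test")
    cur.execute("CREATE TABLE bu_test (id int, filler text)")
    cur.execute("CREATE UNIQUE INDEX bu_test_id_idx ON bu_test (id)")

# Case (b): a long-standing open transaction that holds back the xmin
# horizon, so even the single duplicate version cannot be removed.
blocker = psycopg2.connect(dbname="test")
bcur = blocker.cursor()
bcur.execute("SELECT txid_current()")     # assigns an xid, leaves the xact open

# Case (a): index pages fill mostly with fresh insertions plus the odd
# duplicate for id = 0, whose index value is logically unchanged because
# the update only touches the unindexed filler column.
work = psycopg2.connect(dbname="test")
work.autocommit = True
with work.cursor() as cur:
    cur.execute("INSERT INTO bu_test VALUES (0, repeat('x', 400))")
    for i in range(1, 200000):
        cur.execute("INSERT INTO bu_test VALUES (%s, repeat('x', 400))", (i,))
        if i % 100 == 0:
            cur.execute("UPDATE bu_test SET filler = repeat('y', 400)"
                        " WHERE id = 0")

blocker.rollback()                        # release the old snapshot when done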

I feel that, unlike the LP_DEAD optimization, this new bottom-up scheme can
cost us extra CPU and I/O, because not much consideration seems to have been
given to the fact that we might not be able to delete any items (or only very
few) due to long-standing open transactions, other than that we limit
ourselves when we are not able to remove even one tuple from a particular
heap page. Now, say that due to open transactions we are able to remove only
very few tuples from a heap page (for the sake of argument, say there is only
one such tuple); then we will keep accessing heap pages without much benefit.
I feel that extending the deletion mechanism based on the number of LP_DEAD
items sounds more favorable than giving preference to duplicate items. Sure,
it will give equally good or better results if there are no long-standing
open transactions.
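
To put rough, purely illustrative numbers on that concern (my own arithmetic,
not measurements; the page count is arbitrary):

# Heap page accesses paid per index tuple actually removed in one
# bottom-up pass. With a long-standing open transaction the number of
# removable tuples approaches zero, so the cost per reclaimed tuple grows
# without bound while the page split is only postponed, not avoided.
def cost_per_reclaimed_tuple(heap_pages_visited, tuples_removed):
    if tuples_removed == 0:
        return float("inf")   # all the I/O was spent with nothing reclaimed
    return heap_pages_visited / tuples_removed

for removed in (20, 5, 1, 0):
    print(removed, cost_per_reclaimed_tuple(heap_pages_visited=10,
                                            tuples_removed=removed))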

> I personally will
> never vote for a theoretical risk with only a theoretical benefit. And
> right now that's what the idea of doing bottom-up deletions in more
> marginal cases (the page flag thing) looks like.
>

I don't think we can say that it is purely theoretical because I have
already shown some basic computation where it can lead to fewer splits.

--
With Regards,
Amit Kapila.
