Re: Deleting older versions in unique indexes to avoid page splits

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Victor Yegorov <vyegorov(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Deleting older versions in unique indexes to avoid page splits
Date: 2020-10-29 23:48:37
Message-ID: CAH2-WzkrDmfJiinW=T5Ve7=vfjvP0sgc7K1M5PaTFCOPH60CBg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 29, 2020 at 3:05 PM Victor Yegorov <vyegorov(at)gmail(dot)com> wrote:
> And some more comments after another round of reading the patch.
>
> 1. Looks like UNIQUE_CHECK_NO_WITH_UNCHANGED is used for HOT updates,
> should we use UNIQUE_CHECK_NO_HOT here? It is better understood like this.

This would probably get me arrested by the tableam police, though.

FWIW, the way that works is still kind of a hack. I think that I
actually need a new boolean flag, rather than overloading the enum
like this.
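
To make that concrete, here is a sketch of what a separate boolean hint
could look like. The IndexUniqueCheck values below are PostgreSQL's
existing ones; the struct and the indexUnchanged field are illustrative
assumptions of mine, not the patch's actual code:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * The existing IndexUniqueCheck enum (as in src/include/access/genam.h).
 * Rather than adding a fifth value such as UNIQUE_CHECK_NO_WITH_UNCHANGED,
 * the "key columns unchanged" hint could travel as a separate boolean.
 */
typedef enum IndexUniqueCheck
{
    UNIQUE_CHECK_NO,        /* Don't check uniqueness */
    UNIQUE_CHECK_YES,       /* Enforce uniqueness at insertion */
    UNIQUE_CHECK_PARTIAL,   /* Test uniqueness, but no error */
    UNIQUE_CHECK_EXISTING   /* Check if existing tuple is unique */
} IndexUniqueCheck;

/* Hypothetical carrier for both pieces of state, kept orthogonal */
typedef struct IndexInsertHint
{
    IndexUniqueCheck checkUnique;    /* enum stays un-overloaded */
    bool             indexUnchanged; /* hypothetical boolean hint */
} IndexInsertHint;
```

The point of the sketch is only that the hint and the uniqueness-check
mode then vary independently, instead of being conflated in one enum.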

> 2. You're modifying the table_tuple_update() function at line 1311 of include/access/tableam.h,
> adding modified_attrs_hint. There's a large comment right before it describing the parameters;
> I think there should be a note about the modified_attrs_hint parameter in that comment, since
> it is referenced from other places in tableam.h and also from backend/access/heap/heapam.c

Okay, makes sense.

> 3. Can you elaborate on the scoring model you're using?
> Why do we expect a score of 25, what's the rationale behind this number?
> And should it be #define-d ?

See my remarks on this from the earlier e-mail.

> 4. heap_compute_xid_horizon_for_tuples contains duplicate logic. Is it possible to avoid this?

Maybe? I think that duplicating code is sometimes the lesser evil.
Like in tuplesort.c, for example. I'm not sure if that's true here,
but it certainly can be true. This is the kind of thing that I usually
only make my mind up about at the last minute. It's a matter of taste.

> 5. In this comment
>
> + * heap_index_batch_check() helper function. Sorts deltids array in the
> + * order needed for useful processing.
>
> perhaps it is better to replace "useful" with more details? Or point to the place
> where "useful processing" is described.

Okay.

> 6. In this code:
>
> + else if (_bt_keep_natts_fast(rel, state->base, itup) > nkeyatts &&
> + _bt_dedup_save_htid(state, itup))
> + {
> +
> + }
>
> I would rather add a comment, explaining that the empty body of the clause is actually expected.

Okay. Makes sense.
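
For instance, the clause could carry something like this (a sketch of the
quoted excerpt with an explanatory comment added; the comment wording is
mine, not the patch's):

```c
else if (_bt_keep_natts_fast(rel, state->base, itup) > nkeyatts &&
         _bt_dedup_save_htid(state, itup))
{
    /*
     * Tuple is a duplicate of the base tuple, and its heap TID was
     * saved by _bt_dedup_save_htid().  Nothing more to do here; the
     * empty body is deliberate.
     */
}
```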

> 7. In _bt_dedup_delete_finish_pending() you're setting ispromising to false for both
> posting and non-posting tuples. This contradicts the comments before the function.

The idea is that we can have plain tuples (non-posting list tuples)
that are non-promising when they're duplicates. Because why not?
Somebody might have deleted them (rather than updating them). It is
fine to have an open mind about this possibility, even though its
likelihood is close to zero in the general case. Including these TIDs
doesn't increase the amount of work we do in heapam. Even when we
don't succeed in finding any of the non-dup TIDs dead (which is very
much the common case), telling heapam about their existence can still
help indirectly (which is somewhat more common). This factor alone
can influence which heap pages heapam visits when there is no
concentration of promising tuples on heap pages, since the total
number of TIDs on each block is the tie-breaker condition when
comparing heap blocks with an equal number of promising tuples during
the block group sort in heapam.c. I believe that this approach tends
to result in heapam going after older TIDs when it wouldn't otherwise,
at least in some cases.
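
To illustrate that tie-breaker, here is a minimal comparator sketch.
This is not the patch's actual heapam.c code; the BlockGroup struct,
field names, and function name are invented for the example:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Sketch of the block group sort described above: heap blocks with more
 * promising (duplicate) tuples sort first, and the total number of TIDs
 * on the block breaks ties between blocks with equal promising counts.
 */
typedef struct BlockGroup
{
    int     npromising;     /* promising TIDs pointing at this heap block */
    int     ntids;          /* total TIDs pointing at this heap block */
} BlockGroup;

static int
blockgroup_cmp(const void *a, const void *b)
{
    const BlockGroup *ba = (const BlockGroup *) a;
    const BlockGroup *bb = (const BlockGroup *) b;

    /* Primary key: more promising tuples sorts earlier */
    if (ba->npromising != bb->npromising)
        return (ba->npromising < bb->npromising) ? 1 : -1;
    /* Tie-breaker: more TIDs in total sorts earlier */
    if (ba->ntids != bb->ntids)
        return (ba->ntids < bb->ntids) ? 1 : -1;
    return 0;
}
```

With this ordering, the non-promising duplicate TIDs still matter: they
raise a block's total TID count, which can move it ahead of an otherwise
equal block in the sort.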

You're right, though -- this is still unclear. Actually, I think that
I should move the handling of promising/duplicate tuples into
_bt_dedup_delete_finish_pending(), too (move it from
_bt_dedup_delete_pass()). That would allow me to talk about all of the
TIDs that get added to the deltids array (promising and non-promising)
in one central function. I'll do it that way soon.

--
Peter Geoghegan
