Re: Improve eviction algorithm in ReorderBuffer

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: Improve eviction algorithm in ReorderBuffer
Date: 2024-02-02 06:42:28
Message-ID: CAD21AoB9+EMZbZx+Jx1RaJcDeyL+rArOsnja2k=DEuzzyfG32Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 31, 2024 at 2:18 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Sawada-san,
>
> I have started to read your patches. Here are my initial comments.
> At least, all subscription tests were passed on my env.

Thank you for the review comments!

>
> A comment for 0001:
>
> 01.
> ```
> +static void
> +bh_enlarge_node_array(binaryheap *heap)
> +{
> + if (heap->bh_size < heap->bh_space)
> + return;
> +
> + heap->bh_space *= 2;
> + heap->bh_nodes = repalloc(heap->bh_nodes,
> + sizeof(bh_node_type) * heap->bh_space);
> +}
> ```
>
> I'm not sure it is OK to use repalloc() for enlarging bh_nodes. This data
> structure public one and arbitrary codes and extensions can directly refer
> bh_nodes. But if the array is repalloc()'d, the pointer would be updated so that
> their referring would be a dangling pointer.

Hmm I'm not sure this is the case that we need to really worry about,
and cannot come up with a good use case where extensions refer to
bh_nodes directly rather than binaryheap. In PostgreSQL codes, many
Nodes already have pointers and are exposed.

> I think the internal of the structure should be a private one in this case.
>
> Comments for 0002:
>
> 02.
> ```
> +#include "utils/palloc.h"
> ```
>
> Is it really needed? I'm not sure who referrs it.

Seems not, will remove.

>
> 03.
> ```
> typedef struct bh_nodeidx_entry
> {
> bh_node_type key;
> char status;
> int idx;
> } bh_nodeidx_entry;
> ```
>
> Sorry if it is a stupid question. Can you tell me how "status" is used?
> None of binaryheap and reorderbuffer components refer it.

It's required by simplehash.h

>
> 04.
> ```
> extern binaryheap *binaryheap_allocate(int capacity,
> binaryheap_comparator compare,
> - void *arg);
> + bool indexed, void *arg);
> ```
>
> I felt pre-existing API should not be changed. How about adding
> binaryheap_allocate_extended() or something which can specify the `bool indexed`?
> binaryheap_allocate() sets heap->bh_indexed to false.

I'm really not sure it's worth inventing a
binaryheap_allocate_extended() function just for preserving API
compatibility. I think it's generally a good idea to have
xxx_extended() function to increase readability and usability, for
example, for the case where the same (kind of default) arguments are
passed in most cases and the function is called from many places.
However, we have a handful binaryheap_allocate() callers, and I
believe that it would not hurt the existing callers.

>
> 05.
> ```
> +extern void binaryheap_update_up(binaryheap *heap, bh_node_type d);
> +extern void binaryheap_update_down(binaryheap *heap, bh_node_type d);
> ```
>
> IIUC, callers must consider whether the node should be shift up/down and use
> appropriate function, right? I felt it may not be user-friendly.

Right, I couldn't come up with a better interface.

Another idea I've considered was that the caller provides a callback
function where it can compare the old and new keys. For example, in
reorderbuffer case, we call like:

binaryheap_update(rb->txn_heap, PointerGetDatum(txn),
ReorderBufferTXNUpdateCompare, (void *) &old_size);

Then in ReorderBufferTXNUpdateCompare(),
ReorderBufferTXN *txn = (ReorderBufferTXN *) a;Size old_size = *(Size *) b;
(compare txn->size to "b" ...)

However it seems complicated...

>
> Comments for 0003:
>
> 06.
> ```
> This commit changes the eviction algorithm in ReorderBuffer to use
> max-heap with transaction size,a nd use two strategies depending on
> the number of transactions being decoded.
> ```
>
> s/a nd/ and/
>
> 07.
> ```
> It could be too expensive to pudate max-heap while preserving the heap
> property each time the transaction's memory counter is updated, as it
> could happen very frquently. So when the number of transactions being
> decoded is small, we add the transactions to max-heap but don't
> preserve the heap property, which is O(1). We heapify the max-heap
> just before picking the largest transaction, which is O(n). This
> strategy minimizes the overheads of updating the transaction's memory
> counter.
> ```
>
> s/pudate/update/

Will fix them.

>
> 08.
> IIUC, if more than 1024 transactions are running but they have small amount of
> changes, the performance may be degraded, right? Do you have a result in sucha
> a case?

I've run a benchmark test that I shared before[1]. Here are results of
decoding a transaction that has 1M subtransaction each of which has 1
INSERT:

HEAD:
1810.192 ms

HEAD w/ patch:
2001.094 ms

I set a large enough value to logical_decoding_work_mem not to evict
any transactions. I can see about about 10% performance regression in
this case.

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoAfKTgrBrLq96GcTv9d6k97zaQcDM-rxfKEt4GSe0qnaQ%40mail.gmail.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-02-02 06:48:02 Re: Avoid unncessary always true test (src/backend/storage/buffer/bufmgr.c)
Previous Message Michael Paquier 2024-02-02 06:41:08 Re: Commitfest 2024-01 is now closed