Re: Adding doubly linked list type which stores the number of items in the list

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Adding doubly linked list type which stores the number of items in the list
Date: 2022-10-31 07:14:16
Message-ID: CAApHDvpPvAjM6_hsmuGL1vJk++Q7tsiOLRZUcvcPGF_TM3HcUA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 31 Oct 2022 at 19:05, Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> So, when an overflow occurs, the head->count wraps around after
> PG_UINT32_MAX, meaning, becomes 0 and we will catch it in an assert
> build. This looks reasonable to me. However, the responsibility lies
> with the developers to deal with such overflows.

I'm not sure what the alternatives are here. One of the advantages of
dlist over List is that there are no memory allocations to add a node
which is already allocated onto a list. lappend() might need to
enlarge the list, which means you can't do that in a critical section.
It's currently OK to add an item to a dlist in a critical section,
however, if we add an elog(ERROR) then it won't be. The best I think
we can do is to just let the calling code ensure that it only uses
dlist when it's certain that there can't be more than 2^32 items to
store at once.

Additionally, everywhere I've replaced dlist with dclist in the patch
used either an int or uint32 for the counter. There was no code which
checked if the existing counter had wrapped.

> Thanks. The v3 patch looks good to me.

Great. Thanks for having a look.

> BTW, do we need sclist_* as well? AFAICS, no such use-case exists
> needing slist and element count, maybe we don't need.

I don't see anywhere that requires it.

> I'm wondering if adding count to dlist_head and maintaining it as part
> of the existing dlist_* data structure and functions is any better
> than introducing dclist_*? In that case, we need only one function,
> dlist_count, no? Or do we choose to go dclist_* because we want to
> avoid the extra cost of maintaining count within dlist_*? If yes, is
> maintaining count in dlist_* really costly?

I have a few reasons for not wanting to do that:

1) I think dlist operations are very fast at the moment. The fact
that the functions are static inline tells me the function call
overhead matters. Therefore, it's likely maintaining a count also
matters.
2) Code bloat. The functions are static inline. That means all
compiled code that adds or removes an item from a dlist would end up
larger. That results in more instruction cache misses.
3) I've no reason to believe that all call sites that do
dlist_delete() have the ability to know which list the node is on.
Just look at ReorderBufferAssignChild(). I decided to not convert the
subtxns dlist into a dclist as the subtransaction sometimes seems to
go onto the top transaction list before it's moved to the sub-txn's
list.
4) There's very little or no scope for bugs in dclist relating to the
dlist implementation as all that stuff is done by just calling the
dlist_* functions. The only scope is really that it could call the
wrong dlist_* function. It does not seem terribly hard to ensure we
don't write any bugs like that.

David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Japin Li 2022-10-31 07:14:54 Re: Lock on ShmemVariableCache fields?
Previous Message Peter Smith 2022-10-31 06:21:48 Re: GUC values - recommended way to declare the C variables?