Re: Parallel CREATE INDEX for BRIN indexes

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel CREATE INDEX for BRIN indexes
Date: 2023-11-28 15:39:33
Message-ID: CAEze2WhCSKQDhP3nyARWH5Zf+B-5rtDteA7d4QSDa-BuSSkSPA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 23 Nov 2023 at 14:35, Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> On 11/23/23 13:33, Matthias van de Meent wrote:
>> The union operator may leak (lots of) memory, so I think it makes
>> sense to keep a context around that can be reset after we've extracted
>> the merge result.
>>
>
> But does the current code actually achieve that? It does create a "brin
> union" context, but then it only does this:
>
> /* Use our own memory context to avoid retail pfree */
> cxt = AllocSetContextCreate(CurrentMemoryContext,
> "brin union",
> ALLOCSET_DEFAULT_SIZES);
> oldcxt = MemoryContextSwitchTo(cxt);
> db = brin_deform_tuple(bdesc, b, NULL);
> MemoryContextSwitchTo(oldcxt);
>
> Surely that does not limit the amount of memory used by the actual union
> functions in any way?

Oh, yes, of course. For some reason I thought that covered the calls
to the union operator function too, but it indeed only covers
deserialization. I do think it is still worthwhile to not do the
create/delete cycle, but won't hold the patch back for that.

>>> However, I don't think the number of union_tuples calls is likely to be
>>> very high, especially for large tables. Because we split the table into
>>> 2048 chunks, and then cap the chunk size by 8192. For large tables
>>> (where this matters) we're likely close to 8192.
>>
>> I agree that the merging part of the index creation is the last part,
>> and usually has no high impact on the total performance of the reindex
>> operation, but in memory-constrained environments releasing and then
>> requesting the same chunk of memory over and over again just isn't
>> great.
>
> OK, I'll take a look at the scratch context you suggested.
>
> My point however was we won't actually do that very often, because on
> large tables the BRIN ranges are likely smaller than the parallel scan
> chunk size, so few overlaps. OTOH if the table is small, or if the BRIN
> ranges are large, there'll be few of them.

That's true, so maybe I'm concerned about something that amounts to
only marginal gains.

I noticed that the v4 patch doesn't yet update the documentation in
indexam.sgml with am->amcanbuildparallel.
Once that is included and reviewed I think this will be ready, unless
you want to address any of my comments upthread (that I marked with
'not in this patch') in this patch.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-11-28 15:42:27 Re: SSL tests fail on OpenSSL v3.2.0
Previous Message Tristan Partin 2023-11-28 15:33:04 Re: SSL tests fail on OpenSSL v3.2.0