Re: Parallel CREATE INDEX for GIN indexes

From: Andy Fan <zhihuifan1213(at)163(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel CREATE INDEX for GIN indexes
Date: 2024-08-29 04:30:44
Message-ID: 87ikvkgdcb.fsf@163.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tomas Vondra <tomas(at)vondra(dot)me> writes:

Hi Tomas,

> Yeah. I think we have agreement on 0001-0007.

Yes, the design of 0001-0007 looks good to me and because of the
existing compexitity, I want to foucs on this part for now. I am doing
code review from yesterday, and now my work is done. Just some small
questions:

1. In GinBufferStoreTuple,

/*
* Check if the last TID in the current list is frozen. This is the case
* when merging non-overlapping lists, e.g. in each parallel worker.
*/
if ((buffer->nitems > 0) &&
(ItemPointerCompare(&buffer->items[buffer->nitems - 1], &tup->first) == 0))
buffer->nfrozen = buffer->nitems;

should we do (ItemPointerCompare(&buffer->items[buffer->nitems - 1],
&tup->first) "<=" 0), rather than "=="?

2. Given the "non-overlap" case should be the major case
GinBufferStoreTuple , does it deserve a fastpath for it before calling
ginMergeItemPointers since ginMergeItemPointers have a unconditionally
memory allocation directly, and later we pfree it?

new = ginMergeItemPointers(&buffer->items[buffer->nfrozen], /* first unfronzen */
(buffer->nitems - buffer->nfrozen), /* num of unfrozen */
items, tup->nitems, &nnew);

3. The following comment in index_build is out-of-date now :)

/*
* Determine worker process details for parallel CREATE INDEX. Currently,
* only btree has support for parallel builds.
*

4. Comments - Buffer is not empty and it's storing "a different key"
looks wrong to me. the key may be same and we just need to flush them
because of memory usage. There is the same issue in both
_gin_process_worker_data and _gin_parallel_merge.

if (GinBufferShouldTrim(buffer, tup))
{
Assert(buffer->nfrozen > 0);

state->buildStats.nTrims++;

/*
* Buffer is not empty and it's storing a different key - flush
* the data into the insert, and start a new entry for current
* GinTuple.
*/
AssertCheckItemPointers(buffer, true);

I also run valgrind testing with some testcase, no memory issue is
found.

> I'm a bit torn about 0008, I have not expected changing tuplesort like
> this when I started working
> on the patch, but I can't deny it's a massive speedup for some cases
> (where the patch doesn't help otherwise). But then in other cases it
> doesn't help at all, and 0010 helps.

Yes, I'd like to see these improvements both 0008 and 0010 as a
dedicated improvement.

--
Best Regards
Andy Fan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andy Fan 2024-08-29 04:38:57 Re: New function normal_rand_array function to contrib/tablefunc.
Previous Message Tender Wang 2024-08-29 03:38:11 Re: Eager aggregation, take 3