Quick Links

Re: Parallel CREATE INDEX for GIN indexes

From:	Andy Fan <zhihuifan1213(at)163(dot)com>
To:	Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Parallel CREATE INDEX for GIN indexes
Date:	2024-05-28 09:29:48
Message-ID:	87jzjes2ia.fsf@163.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi Tomas,

I have completed my first round of review, generally it looks good to
me, more testing need to be done in the next days. Here are some tiny
comments from my side, just FYI.

1. Comments about GinBuildState.bs_leader looks not good for me.

/*
* bs_leader is only present when a parallel index build is performed, and
* only in the leader process. (Actually, only the leader process has a
* GinBuildState.)
*/
GinLeader *bs_leader;

In the worker function _gin_parallel_build_main:
initGinState(&buildstate.ginstate, indexRel); is called, and the
following members in workers at least: buildstate.funcCtx,
buildstate.accum and so on. So is the comment "only the leader process
has a GinBuildState" correct?

2. progress argument is not used?
_gin_parallel_scan_and_build(GinBuildState *state,
GinShared *ginshared, Sharedsort *sharedsort,
Relation heap, Relation index,
int sortmem, bool progress)

3. In function tuplesort_begin_index_gin, comments about nKeys takes me
some time to think about why 1 is correct(rather than
IndexRelationGetNumberOfKeyAttributes) and what does the "only the index
key" means.

base->nKeys = 1; /* Only the index key */

finally I think it is because gin index stores each attribute value into
an individual index entry for a multi-column index, so each index entry
has only 1 key. So we can comment it as the following?

"Gin Index stores the value of each attribute into different index entry
for mutli-column index, so each index entry has only 1 key all the
time." This probably makes it easier to understand.

4. GinBuffer: The comment "Similar purpose to BuildAccumulator, but much
simpler." makes me think why do we need a simpler but
similar structure, After some thoughts, they are similar at accumulating
TIDs only. GinBuffer is designed for "same key value" (hence
GinBufferCanAddKey). so IMO, the first comment is good enough and the 2
comments introduce confuses for green hand and is potential to remove
it.

/*
* State used to combine accumulate TIDs from multiple GinTuples for the same
* key value.
*
* XXX Similar purpose to BuildAccumulator, but much simpler.
*/
typedef struct GinBuffer

5. GinBuffer: ginMergeItemPointers always allocate new memory for the
new items and hence we have to pfree old memory each time. However it is
not necessary in some places, for example the new items can be appended
to Buffer->items (and this should be a common case). So could we
pre-allocate some spaces for items and reduce the number of pfree/palloc
and save some TID items copy in the desired case?

6. GinTuple.ItemPointerData first; /* first TID in the array */

is ItemPointerData.ip_blkid good enough for its purpose? If so, we can
save the memory for OffsetNumber for each GinTuple.

Item 5) and 6) needs some coding and testing. If it is OK to do, I'd
like to take it as an exercise in this area. (also including the item
1~4.)

--
Best Regards
Andy Fan

In response to

Re: Parallel CREATE INDEX for GIN indexes at 2024-05-13 09:19:08 from Tomas Vondra

Responses

Re: Parallel CREATE INDEX for GIN indexes at 2024-06-19 11:55:35 from Tomas Vondra

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andy Fan	2024-05-28 09:47:17	Re: why memoize is not used for correlated subquery
Previous Message	Nisha Moond	2024-05-28 09:17:16	Re: Conflict Detection and Resolution