| From: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
|---|---|
| To: | Zhihong Yu <zyu(at)yugabyte(dot)com> |
| Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Parallel Full Hash Join |
| Date: | 2021-04-06 18:59:23 |
| Message-ID: | CAAKRu_ZraYTHdfNA=sGqt9J+hsoKSas5wr4PBrtmVe_tc2+qbw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Fri, Apr 2, 2021 at 3:06 PM Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:
>
> Hi,
> For v6-0003-Parallel-Hash-Full-Right-Outer-Join.patch
>
> + * current_chunk_idx: index in current HashMemoryChunk
>
> The above comment seems to be better fit for ExecScanHashTableForUnmatched(), instead of ExecParallelPrepHashTableForUnmatched.
> I wonder where current_chunk_idx should belong (considering the above comment and what the code does).
>
> + while (hashtable->current_chunk_idx < hashtable->current_chunk->used)
> ...
> + next = hashtable->current_chunk->next.unshared;
> + hashtable->current_chunk = next;
> + hashtable->current_chunk_idx = 0;
>
> Each time we advance to the next chunk, current_chunk_idx is reset. It seems current_chunk_idx can be placed inside chunk.
> Maybe the consideration is that, with the current formation we save space by putting current_chunk_idx field at a higher level.
> If that is the case, a comment should be added.
>
Thank you for the review. I think that moving the current_chunk_idx into
the HashMemoryChunk would probably take up too much space.
Other places that we loop through the tuples in the chunk, we are able
to just keep a local idx, like here in
ExecParallelHashIncreaseNumBuckets():
case PHJ_GROW_BUCKETS_REINSERTING:
...
while ((chunk = ExecParallelHashPopChunkQueue(hashtable, &chunk_s)))
{
size_t idx = 0;
while (idx < chunk->used)
but, since we cannot do that while also emitting tuples, I thought,
let's just stash the index in the hashtable for use in serial hash join
and the batch accessor for parallel hash join. A comment to this effect
sounds good to me.
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Hellmuth Vargas | 2021-04-06 19:03:03 | PostgreSQL log query's result size |
| Previous Message | Bruce Momjian | 2021-04-06 18:42:17 | Re: Key management with tests |