Re: Support tid range scan in parallel?

From: Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>
To: Junwang Zhao <zhjwpku(at)gmail(dot)com>
Cc: Cary Huang <cary(dot)huang(at)highgo(dot)ca>, David Rowley <dgrowleyml(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support tid range scan in parallel?
Date: 2024-08-22 11:29:07
Message-ID: CA+FpmFeLj0V4p-GXWVGOd4B+BRtTZFSLeDbDK31Hs9DOyYBPWg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

This is a good idea to extend parallelism in postgres.
I went through this patch, and here are a few review comments,

+ Size pscan_len; /* size of parallel tid range scan descriptor */

The other name for this var could be tidrs_PscanLen, following the pattern
in indexScanState and IndexOnlyScanState.
Also add it and its description in the comment above the struct.

/* ----------------------------------------------------------------
* ExecTidRangeScanInitializeDSM
*
* Set up a parallel heap scan descriptor.
* ----------------------------------------------------------------
*/
This comment doesn't seem right, please correct it to say for Tid range
scan descriptor.

+
+ sscan = relation->rd_tableam->scan_begin(relation, snapshot, 0, NULL,
+ pscan, flags);
+
+ return sscan;

I do not see any requirement of using this sscan var.

- if (nallocated >= pbscan->phs_nblocks)
+ if (nallocated >= pbscan->phs_nblocks || (pbscan->phs_numblock != 0 &&
+ nallocated >= pbscan->phs_numblock))
page = InvalidBlockNumber; /* all blocks have been allocated */

Please add a comment for the reason for this change. As far as I
understood, this is only for the case of TIDRangeScan so it requires an
explanation for the case.

On Sun, 11 Aug 2024 at 09:03, Junwang Zhao <zhjwpku(at)gmail(dot)com> wrote:

> Hi Cary,
>
> On Wed, May 15, 2024 at 5:33 AM Cary Huang <cary(dot)huang(at)highgo(dot)ca> wrote:
> >
> > > You should add a test that creates a table with a very low fillfactor,
> > > low enough so only 1 tuple can fit on each page and insert a few dozen
> > > tuples. The test would do SELECT COUNT(*) to ensure you find the
> > > correct subset of tuples. You'd maybe want MIN(ctid) and MAX(ctid) in
> > > there too for extra coverage to ensure that the correct tuples are
> > > being counted. Just make sure and EXPLAIN (COSTS OFF) the query first
> > > in the test to ensure that it's always testing the plan you're
> > > expecting to test.
> >
> > thank you for the test suggestion. I moved the regress tests from
> select_parallel
> > to tidrangescan instead. I follow the existing test table creation in
> tidrangescan
> > with the lowest fillfactor of 10, I am able to get consistent 5 tuples
> per page
> > instead of 1. It should be okay as long as it is consistently 5 tuples
> per page so
> > the tuple count results from parallel tests would be in multiples of 5.
> >
> > The attached v4 patch includes the improved regression tests.
> >
> > Thank you very much!
> >
> > Cary Huang
> > -------------
> > HighGo Software Inc. (Canada)
> > cary(dot)huang(at)highgo(dot)ca
> > www.highgo.ca
> >
>
> +++ b/src/backend/access/heap/heapam.c
> @@ -307,6 +307,7 @@ initscan(HeapScanDesc scan, ScanKey key, bool
> keep_startblock)
> * results for a non-MVCC snapshot, the caller must hold some higher-level
> * lock that ensures the interesting tuple(s) won't change.)
> */
> +
>
> I see no reason why you add a blank line here, is it a typo?
>
> +/* ----------------------------------------------------------------
> + * ExecSeqScanInitializeWorker
> + *
> + * Copy relevant information from TOC into planstate.
> + * ----------------------------------------------------------------
> + */
> +void
> +ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
> + ParallelWorkerContext *pwcxt)
> +{
> + ParallelTableScanDesc pscan;
>
> Function name in the comment is not consistent.
>
> @@ -81,6 +81,7 @@ typedef struct ParallelBlockTableScanDescData
> BlockNumber phs_startblock; /* starting block number */
> pg_atomic_uint64 phs_nallocated; /* number of blocks allocated to
> * workers so far. */
> + BlockNumber phs_numblock; /* max number of blocks to scan */
> } ParallelBlockTableScanDescData;
>
> Can this be reorganized by putting phs_numblock after phs_startblock?
>
>
> >
> >
> >
> >
> >
>
>
> --
> Regards
> Junwang Zhao
>
>
>

--
Regards,
Rafia Sabih

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2024-08-22 11:32:43 Re: Redundant Result node
Previous Message Melih Mutlu 2024-08-22 11:26:11 Re: ANALYZE ONLY