From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Cary Huang <cary(dot)huang(at)highgo(dot)ca>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support tid range scan in parallel?
Date: 2024-05-04 01:14:16
Message-ID: CAApHDvqeySxbaz35yCe_kWNDWT03QoHxA9MtmqhUHf-P1AMZ7A@mail.gmail.com
Lists: pgsql-hackers
On Sat, 4 May 2024 at 06:55, Cary Huang <cary(dot)huang(at)highgo(dot)ca> wrote:
> With syncscan enabled, the "table_block_parallelscan_nextpage()" would
> return the next block since the end of the first tid rangescan instead of the
> correct start block that should be scanned. I see that single tid rangescan
> does not have SO_ALLOW_SYNC set, so I figure syncscan should also be
> disabled in parallel case. With this change, then it would be okay to call
> heap_setscanlimits() in parallel case, so I added this call back to
> heap_set_tidrange() in both serial and parallel cases.
This now calls heap_setscanlimits() for the parallel version; it's
just that table_block_parallelscan_nextpage() does nothing to obey
those limits.
The only reason the code isn't reading the entire table is the
optimisation in heap_getnextslot_tidrange() which returns false when
the ctid goes out of range, i.e. this code:
    /*
     * When scanning forward, the TIDs will be in ascending order.
     * Future tuples in this direction will be higher still, so we can
     * just return false to indicate there will be no more tuples.
     */
    if (ScanDirectionIsForward(direction))
        return false;
If I comment out that line, I see all pages are accessed:
postgres=# explain (analyze, buffers) select count(*) from a where ctid >= '(0,1)' and ctid < '(11,0)';
                                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=18.80..18.81 rows=1 width=8) (actual time=33.530..36.118 rows=1 loops=1)
   Buffers: shared read=4425
   ->  Gather  (cost=18.78..18.79 rows=2 width=8) (actual time=33.456..36.102 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared read=4425
         ->  Partial Aggregate  (cost=18.78..18.79 rows=1 width=8) (actual time=20.389..20.390 rows=1 loops=3)
               Buffers: shared read=4425
               ->  Parallel Tid Range Scan on a  (cost=0.01..16.19 rows=1035 width=0) (actual time=9.375..20.349 rows=829 loops=3)
                     TID Cond: ((ctid >= '(0,1)'::tid) AND (ctid < '(11,0)'::tid))
                     Buffers: shared read=4425   <---- this is all pages in the table instead of 11 pages.
With that code still commented out, the non-parallel version still
won't read all pages due to the setscanlimits being obeyed.
postgres=# set max_parallel_workers_per_gather=0;
SET
postgres=# explain (analyze, buffers) select count(*) from a where ctid >= '(0,1)' and ctid < '(11,0)';
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=45.07..45.08 rows=1 width=8) (actual time=0.302..0.302 rows=1 loops=1)
   Buffers: shared hit=11
   ->  Tid Range Scan on a  (cost=0.01..38.86 rows=2485 width=0) (actual time=0.019..0.188 rows=2486 loops=1)
         TID Cond: ((ctid >= '(0,1)'::tid) AND (ctid < '(11,0)'::tid))
         Buffers: shared hit=11
If I put that code back in, how many pages are read depends on the
number of parallel workers as workers will keep running with higher
page numbers and heap_getnextslot_tidrange() will just (inefficiently)
filter those out.
max_parallel_workers_per_gather=2;
   ->  Parallel Tid Range Scan on a  (cost=0.01..16.19 rows=1035 width=0) (actual time=0.191..0.310 rows=829 loops=3)
         TID Cond: ((ctid >= '(0,1)'::tid) AND (ctid < '(11,0)'::tid))
         Buffers: shared read=17

max_parallel_workers_per_gather=3;
   ->  Parallel Tid Range Scan on a  (cost=0.01..12.54 rows=802 width=0) (actual time=0.012..0.114 rows=622 loops=4)
         TID Cond: ((ctid >= '(0,1)'::tid) AND (ctid < '(11,0)'::tid))
         Buffers: shared hit=19

max_parallel_workers_per_gather=4;
   ->  Parallel Tid Range Scan on a  (cost=0.01..9.72 rows=621 width=0) (actual time=0.014..0.135 rows=497 loops=5)
         TID Cond: ((ctid >= '(0,1)'::tid) AND (ctid < '(11,0)'::tid))
         Buffers: shared hit=21
To fix this you need to make table_block_parallelscan_nextpage obey
the limits imposed by heap_setscanlimits().
The equivalent code in the non-parallel version is in
heapgettup_advance_block().
    /* check if the limit imposed by heap_setscanlimits() is met */
    if (scan->rs_numblocks != InvalidBlockNumber)
    {
        if (--scan->rs_numblocks == 0)
            return InvalidBlockNumber;
    }
I've not studied exactly how you'd get the rs_numblocks information
down to the parallel scan descriptor. But when you figure that out,
just remember that you can't do the --scan->rs_numblocks from
table_block_parallelscan_nextpage() as that's not parallel safe. You
might be able to add an or condition to: "if (nallocated >=
pbscan->phs_nblocks)" to make it "if (nallocated >=
pbscan->phs_nblocks || nallocated >= pbscan->phs_numblocks)",
although the field names don't seem very intuitive there. It would be
nicer if the HeapScanDesc field was called rs_blocklimit rather than
rs_numblocks. It's not for this patch to go messing with that,
however.
David