Re: impact join syntax ?? and gist index ??

From: Marc Millas <marc(dot)millas(at)mokadb(dot)com>
To: Erik Wienhold <ewie(at)ewie(dot)name>
Cc: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: impact join syntax ?? and gist index ??
Date: 2023-01-07 20:57:43
Message-ID: CADX_1abdY44bNmLVW07u0GqWqJWMhH6q+MHfByP=macs7wur+Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

here they are: (I replace the column and table names) also I post 2 more
remarks, one on left join, and one on the test I did on postgres 15 postgis
3.3...

2023=# explain (analyze, buffers) select A.x, count(B.x) from A left join B
on st_within(B.geom, A.geom) group by A.x;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=212638398.98..212701792.16 rows=20 width=16) (actual
time=86717.857..86757.820 rows=11 loops=1)
Group Key: A.x
Buffers: shared hit=4243867
-> Sort (cost=212638398.98..212659529.97 rows=8452398 width=16)
(actual time=86717.851..86727.334 rows=421307 loops=1)
Sort Key: A.x
Sort Method: quicksort Memory: 37963kB
Buffers: shared hit=4243867
-> Nested Loop Left Join (cost=0.00..211521459.31 rows=8452398
width=16) (actual time=17.473..86642.332 rows=421307 loops=1)
Join Filter: st_within(B.geom, A.geom)
Rows Removed by Join Filter: 4229377
Buffers: shared hit=4243867
-> Seq Scan on A (cost=0.00..9.20 rows=20 width=17752)
(actual time=0.009..0.043 rows=11 loops=1)
Buffers: shared hit=9
-> Materialize (cost=0.00..22309.83 rows=422789 width=40)
(actual time=0.001..23.392 rows=422789 loops=11)
Buffers: shared hit=15968
-> Seq Scan on B (cost=0.00..20195.89 rows=422789
width=40) (actual time=0.006..57.651 rows=422789 loops=1)
Buffers: shared hit=15968
Planning Time: 0.693 ms
Execution Time: 86763.087 ms
(19 lignes)

2023=# explain (analyze, buffers) select A.x, count(B.x) from A, B where
st_within(B.geom, A.geom) group by A.x;

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=6301606.00..6301608.60 rows=20 width=16)
(actual time=11857.363..11863.212 rows=6 loops=1)
Group Key: A.x
Buffers: shared hit=2128836
-> Gather Merge (cost=6301606.00..6301608.30 rows=20 width=16) (actual
time=11857.359..11863.207 rows=12 loops=1)
Workers Planned: 1
Workers Launched: 1
Buffers: shared hit=2128836
-> Sort (cost=6300605.99..6300606.04 rows=20 width=16) (actual
time=11840.355..11840.356 rows=6 loops=2)
Sort Key: A.x
Sort Method: quicksort Memory: 25kB
Worker 0: Sort Method: quicksort Memory: 25kB
Buffers: shared hit=2128836
-> Partial HashAggregate (cost=6300605.36..6300605.56
rows=20 width=16) (actual time=11840.331..11840.332 rows=6 loops=2)
Group Key: A.x
Buffers: shared hit=2128825
-> Nested Loop (cost=0.13..6275745.36 rows=4971999
width=16) (actual time=0.505..11781.817 rows=210651 loops=2)
Buffers: shared hit=2128825
-> Parallel Seq Scan on B (cost=0.00..18454.99
rows=248699 width=40) (actual time=0.005..22.859 rows=211395 loops=2)
Buffers: shared hit=15968
-> Index Scan using A_geom_idx on A
(cost=0.13..25.15 rows=1 width=17752) (actual time=0.054..0.055 rows=1
loops=422789)
Index Cond: (geom ~ B.geom)
Filter: st_within(B.geom, geom)
Rows Removed by Filter: 0
Buffers: shared hit=2112857
Planning Time: 0.252 ms
Execution Time: 11863.357 ms
(26 lignes)

Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com

On Sat, Jan 7, 2023 at 9:40 PM Erik Wienhold <ewie(at)ewie(dot)name> wrote:

> > On 07/01/2023 20:46 CET Marc Millas <marc(dot)millas(at)mokadb(dot)com> wrote:
> >
> > Hi,
> >
> > postgres 12, postgis 3.0
> >
> > I have a small table A, 11 rows with a varchar column x and a geometry
> column y.
> > gist index on the geometry column.
> > the geometry do contains multipolygons (regions on a map)
> > I have a second table B , same structure, around 420 000 rows.
> > no index,
> > the geometry do contains points.
> > all geometries are on 4326 srid.
> >
> > If i ask to count points in each multipolygons:
> >
> > select A.x, count(B.x) from A, B where st_within(B.y, A.y) group by A.x;
> > it takes 11 seconds (everything in shared buffers).
> > If I do the very same thing as:
> > select A.x, count(B.x) from A left join B on st_within(B.y, A.y) group
> by A.x;
> > same result, but 85 seconds (every thing in shared buffers, again)
> > if I redo asking with explain analyze, buffers, the plan is very
> different.
> >
> >
> > if I do create a gist index on geometry column of the big table,
> both syntax takes 21 seconds.
> >
> > I get the feeling I am missing something.. (at least 2 things...)
> > can someone shed some light ??
>
> Please provide the executions plans for both queries with and without the
> index on B.y.
>
> --
> Erik
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2023-01-07 21:32:57 Re: impact join syntax ?? and gist index ??
Previous Message Erik Wienhold 2023-01-07 20:40:09 Re: impact join syntax ?? and gist index ??