Re: Poor performance due to parallel seq scan on indexed date field

From: Wells Oliver <wells(dot)oliver(at)gmail(dot)com>
To: Holger Jakobs <holger(at)jakobs(dot)com>
Cc: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: Poor performance due to parallel seq scan on indexed date field
Date: 2023-06-21 18:39:39
Message-ID: CAOC+FBW0qPfgkcQ1D=LT6Kxp3gPjrpQyXcBr2bKqb0QvDLQiDg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

It's just this.

CREATE OR REPLACE VIEW vw_pitches AS
SELECT
p.year,
p.game_id,
p.game_date,
p.game_level,
...
from synergy.pitches as p
join alias.identity as idpitcher
on p.pitcher_identity_id = idpitcher.identity_id
left join alias.identity as idcatcher
on p.catcher_identity_id = idcatcher.identity_id
left join alias.identity as idbatter
on p.batter_identity_id = idbatter.identity_id;

The alias.identity.identity_id column is indexed.

The main issue is SELECT COUNT(*) over wide date ranges, which is something
we'd like to do frequently.

On Wed, Jun 21, 2023 at 11:37 AM Holger Jakobs <holger(at)jakobs(dot)com> wrote:

>
> Am 21.06.23 um 20:31 schrieb Wells Oliver:
> > Dead simple date scan across a big-ish table (est. 23,153,666 rows)
> >
> > explain analyze select count(*) from vw_pitches where game_date >=
> > '2022-06-21' and game_date <= '2023-06-21';
> >
> > The view does do some joins but those don't seem to be the issue to me.
> >
> > Planner does:
> >
> > Finalize Aggregate (cost=3596993.88..3596993.89 rows=1 width=8)
> > (actual time=69980.491..69982.076 rows=1 loops=1)
> > -> Gather (cost=3596993.46..3596993.87 rows=4 width=8) (actual
> > time=69979.137..69982.071 rows=5 loops=1)
> > Workers Planned: 4
> > Workers Launched: 4
> > -> Partial Aggregate (cost=3595993.46..3595993.47 rows=1
> > width=8) (actual time=69975.136..69975.137 rows=1 loops=5)
> > -> Nested Loop (cost=0.44..3591408.37 rows=1834034
> > width=0) (actual time=0.882..69875.934 rows=1458419 loops=5)
> > -> Parallel Seq Scan on pitches p
> > (cost=0.00..3537431.89 rows=1834217 width=12) (actual
> > time=0.852..68914.256 rows=1458419 loops=5)
> > Filter: ((game_date >= '2022-06-21'::date)
> > AND (game_date <= '2023-06-21'::date))
> > Rows Removed by Filter: 3212310
> > -> Memoize (cost=0.44..0.47 rows=1 width=4)
> > (actual time=0.000..0.000 rows=1 loops=7292095)
> > Cache Key: p.pitcher_identity_id
> > Cache Mode: logical
> > Hits: 1438004 Misses: 21042 Evictions: 0
> > Overflows: 0 Memory Usage: 2138kB
> > Worker 0: Hits: 1429638 Misses: 21010
> > Evictions: 0 Overflows: 0 Memory Usage: 2134kB
> > Worker 1: Hits: 1456755 Misses: 21435
> > Evictions: 0 Overflows: 0 Memory Usage: 2177kB
> > Worker 2: Hits: 1433557 Misses: 21201
> > Evictions: 0 Overflows: 0 Memory Usage: 2154kB
> > Worker 3: Hits: 1428727 Misses: 20726
> > Evictions: 0 Overflows: 0 Memory Usage: 2105kB
> > -> Index Only Scan using identity_pkey on
> > identity idpitcher (cost=0.43..0.46 rows=1 width=4) (actual
> > time=0.007..0.007 rows=1 loops=105414)
> > Index Cond: (identity_id =
> > p.pitcher_identity_id)
> > Heap Fetches: 83
> > Planning Time: 1.407 ms
> > Execution Time: 69982.927 ms
> >
> > Is there something to be done here? Kind of a frequent style of query
> > and quite slow.
> >
> Could you provide the definition of the view(s) down to the base tables?
>
> --
> Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012
>
>

--
Wells Oliver
wells(dot)oliver(at)gmail(dot)com <wellsoliver(at)gmail(dot)com>

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Ron 2023-06-21 19:14:46 Re: Poor performance due to parallel seq scan on indexed date field
Previous Message Holger Jakobs 2023-06-21 18:37:29 Re: Poor performance due to parallel seq scan on indexed date field