From: | Wells Oliver <wells(dot)oliver(at)gmail(dot)com> |
---|---|
To: | Holger Jakobs <holger(at)jakobs(dot)com> |
Cc: | pgsql-admin(at)lists(dot)postgresql(dot)org |
Subject: | Re: Poor performance due to parallel seq scan on indexed date field |
Date: | 2023-06-21 18:39:39 |
Message-ID: | CAOC+FBW0qPfgkcQ1D=LT6Kxp3gPjrpQyXcBr2bKqb0QvDLQiDg@mail.gmail.com |
Lists: | pgsql-admin |
The view is just this:
CREATE OR REPLACE VIEW vw_pitches AS
SELECT
    p.year,
    p.game_id,
    p.game_date,
    p.game_level,
    ...
FROM synergy.pitches AS p
JOIN alias.identity AS idpitcher
    ON p.pitcher_identity_id = idpitcher.identity_id
LEFT JOIN alias.identity AS idcatcher
    ON p.catcher_identity_id = idcatcher.identity_id
LEFT JOIN alias.identity AS idbatter
    ON p.batter_identity_id = idbatter.identity_id;
The alias.identity.identity_id column is indexed.
The main issue is that SELECT COUNT(*) over wide date ranges is slow, and
that is something we'd like to run frequently.
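For reference, the plan quoted below shows the scan on pitches keeping about 7.3 million rows (1,458,419 x 5 loops) out of an estimated 23 million, so the date filter is not very selective. Two things that might be worth trying are sketched here. Both rest on assumptions not confirmed in this thread: that every pitcher_identity_id matches exactly one alias.identity row, and that a new index is acceptable. The index name is made up for illustration.

```sql
-- Assumption: every synergy.pitches.pitcher_identity_id matches exactly one
-- alias.identity row (for example, a NOT NULL foreign key). If that holds,
-- the inner join in vw_pitches cannot change the row count, so the count
-- can be taken from the base table directly.
SELECT count(*)
FROM synergy.pitches
WHERE game_date >= DATE '2022-06-21'
  AND game_date <= DATE '2023-06-21';

-- Hypothetical index (name and column choice are illustrative only).
-- Covering both the filter column and the join key may allow an
-- index-only scan for counts through the view, provided the table has
-- been vacuumed recently enough for the visibility map to be current.
CREATE INDEX CONCURRENTLY pitches_game_date_pitcher_idx
    ON synergy.pitches (game_date, pitcher_identity_id);
```

Whether the planner actually picks the index depends on its cost estimates, so it would need checking with EXPLAIN ANALYZE against the real data.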
On Wed, Jun 21, 2023 at 11:37 AM Holger Jakobs <holger(at)jakobs(dot)com> wrote:
>
> On 21.06.23 at 20:31, Wells Oliver wrote:
> > Dead simple date scan across a big-ish table (est. 23,153,666 rows)
> >
> > explain analyze select count(*) from vw_pitches where game_date >=
> > '2022-06-21' and game_date <= '2023-06-21';
> >
> > The view does do some joins but those don't seem to be the issue to me.
> >
> > Planner does:
> >
> > Finalize Aggregate (cost=3596993.88..3596993.89 rows=1 width=8)
> > (actual time=69980.491..69982.076 rows=1 loops=1)
> > -> Gather (cost=3596993.46..3596993.87 rows=4 width=8) (actual
> > time=69979.137..69982.071 rows=5 loops=1)
> > Workers Planned: 4
> > Workers Launched: 4
> > -> Partial Aggregate (cost=3595993.46..3595993.47 rows=1
> > width=8) (actual time=69975.136..69975.137 rows=1 loops=5)
> > -> Nested Loop (cost=0.44..3591408.37 rows=1834034
> > width=0) (actual time=0.882..69875.934 rows=1458419 loops=5)
> > -> Parallel Seq Scan on pitches p
> > (cost=0.00..3537431.89 rows=1834217 width=12) (actual
> > time=0.852..68914.256 rows=1458419 loops=5)
> > Filter: ((game_date >= '2022-06-21'::date)
> > AND (game_date <= '2023-06-21'::date))
> > Rows Removed by Filter: 3212310
> > -> Memoize (cost=0.44..0.47 rows=1 width=4)
> > (actual time=0.000..0.000 rows=1 loops=7292095)
> > Cache Key: p.pitcher_identity_id
> > Cache Mode: logical
> > Hits: 1438004 Misses: 21042 Evictions: 0
> > Overflows: 0 Memory Usage: 2138kB
> > Worker 0: Hits: 1429638 Misses: 21010
> > Evictions: 0 Overflows: 0 Memory Usage: 2134kB
> > Worker 1: Hits: 1456755 Misses: 21435
> > Evictions: 0 Overflows: 0 Memory Usage: 2177kB
> > Worker 2: Hits: 1433557 Misses: 21201
> > Evictions: 0 Overflows: 0 Memory Usage: 2154kB
> > Worker 3: Hits: 1428727 Misses: 20726
> > Evictions: 0 Overflows: 0 Memory Usage: 2105kB
> > -> Index Only Scan using identity_pkey on
> > identity idpitcher (cost=0.43..0.46 rows=1 width=4) (actual
> > time=0.007..0.007 rows=1 loops=105414)
> > Index Cond: (identity_id =
> > p.pitcher_identity_id)
> > Heap Fetches: 83
> > Planning Time: 1.407 ms
> > Execution Time: 69982.927 ms
> >
> > Is there something to be done here? Kind of a frequent style of query
> > and quite slow.
> >
> Could you provide the definition of the view(s) down to the base tables?
>
> --
> Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012
>
>
--
Wells Oliver
wells(dot)oliver(at)gmail(dot)com <wellsoliver(at)gmail(dot)com>