Re: Poor performance due to parallel seq scan on indexed date field

From: Ron <ronljohnsonjr(at)gmail(dot)com>
To: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: Poor performance due to parallel seq scan on indexed date field
Date: 2023-06-21 19:14:46
Message-ID: 208a4de5-44a3-2ea6-5472-02887ec9a118@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Is there an index on synergy.pitches.game_date?

It's no guarantee* that will speed up the query, but "index on WHERE clause
fields" is the first thing to look for.

* The problem with "over wide date ranges" is that the query optimizer might
decide that the date range is so wide that it's just as cheap -- or cheaper
-- to scan the whole table.

On 6/21/23 13:39, Wells Oliver wrote:
> It's just this.
>
> CREATE OR REPLACE VIEW vw_pitches AS
> SELECT
>     p.year,
>     p.game_id,
>     p.game_date,
>     p.game_level,
>     ...
> from synergy.pitches as p
> join alias.identity as idpitcher
>     on p.pitcher_identity_id = idpitcher.identity_id
> left join alias.identity as idcatcher
>     on p.catcher_identity_id = idcatcher.identity_id
> left join alias.identity as idbatter
>     on p.batter_identity_id = idbatter.identity_id;
>
> The alias.identity.identity_id column is indexed.
>
> The main issue is SELECT COUNT(*) over wide date ranges, which is
> something we'd like to do frequently.
>
>
> On Wed, Jun 21, 2023 at 11:37 AM Holger Jakobs <holger(at)jakobs(dot)com> wrote:
>
>
> Am 21.06.23 um 20:31 schrieb Wells Oliver:
> > Dead simple date scan across a big-ish table (est. 23,153,666 rows)
> >
> > explain analyze select count(*) from vw_pitches where game_date >=
> > '2022-06-21' and game_date <= '2023-06-21';
> >
> > The view does do some joins but those don't seem to be the issue to me.
> >
> > Planner does:
> >
> > Finalize Aggregate  (cost=3596993.88..3596993.89 rows=1 width=8)
> > (actual time=69980.491..69982.076 rows=1 loops=1)
> >   ->  Gather  (cost=3596993.46..3596993.87 rows=4 width=8) (actual
> > time=69979.137..69982.071 rows=5 loops=1)
> >         Workers Planned: 4
> >         Workers Launched: 4
> >         ->  Partial Aggregate  (cost=3595993.46..3595993.47 rows=1
> > width=8) (actual time=69975.136..69975.137 rows=1 loops=5)
> >               ->  Nested Loop  (cost=0.44..3591408.37 rows=1834034
> > width=0) (actual time=0.882..69875.934 rows=1458419 loops=5)
> >                     ->  Parallel Seq Scan on pitches p
> >  (cost=0.00..3537431.89 rows=1834217 width=12) (actual
> > time=0.852..68914.256 rows=1458419 loops=5)
> >                           Filter: ((game_date >= '2022-06-21'::date)
> > AND (game_date <= '2023-06-21'::date))
> >                           Rows Removed by Filter: 3212310
> >                     ->  Memoize  (cost=0.44..0.47 rows=1 width=4)
> > (actual time=0.000..0.000 rows=1 loops=7292095)
> >                           Cache Key: p.pitcher_identity_id
> >                           Cache Mode: logical
> >                           Hits: 1438004  Misses: 21042  Evictions: 0
> >  Overflows: 0  Memory Usage: 2138kB
> >                           Worker 0:  Hits: 1429638  Misses: 21010
> >  Evictions: 0  Overflows: 0  Memory Usage: 2134kB
> >                           Worker 1:  Hits: 1456755  Misses: 21435
> >  Evictions: 0  Overflows: 0  Memory Usage: 2177kB
> >                           Worker 2:  Hits: 1433557  Misses: 21201
> >  Evictions: 0  Overflows: 0  Memory Usage: 2154kB
> >                           Worker 3:  Hits: 1428727  Misses: 20726
> >  Evictions: 0  Overflows: 0  Memory Usage: 2105kB
> >                           ->  Index Only Scan using identity_pkey on
> > identity idpitcher  (cost=0.43..0.46 rows=1 width=4) (actual
> > time=0.007..0.007 rows=1 loops=105414)
> >                                 Index Cond: (identity_id =
> > p.pitcher_identity_id)
> >                                 Heap Fetches: 83
> > Planning Time: 1.407 ms
> > Execution Time: 69982.927 ms
> >
> > Is there something to be done here? Kind of a frequent style of query
> > and quite slow.
> >
> Could you provide the definition of the view(s) down to the base tables?
>
> --
> Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012
>
>
>
> --
> Wells Oliver
> wells(dot)oliver(at)gmail(dot)com <mailto:wellsoliver(at)gmail(dot)com>

--
Born in Arizona, moved to Babylonia.

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message B M 2023-06-21 19:54:43 large scale reliable software system using PostgreSQL
Previous Message Wells Oliver 2023-06-21 18:39:39 Re: Poor performance due to parallel seq scan on indexed date field