From: | Ron <ronljohnsonjr(at)gmail(dot)com> |
---|---|
To: | pgsql-admin(at)lists(dot)postgresql(dot)org |
Subject: | Re: Poor performance due to parallel seq scan on indexed date field |
Date: | 2023-06-21 19:14:46 |
Message-ID: | 208a4de5-44a3-2ea6-5472-02887ec9a118@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-admin |
Is there an index on synergy.pitches.game_date?
It's no guarantee* that will speed up the query, but "index on WHERE clause
fields" is the first thing to look for.
* The problem with "over wide date ranges" is that the query optimizer might
decide that the date range is so wide that it's just as cheap -- or cheaper
-- to scan the whole table.
On 6/21/23 13:39, Wells Oliver wrote:
> It's just this.
>
> CREATE OR REPLACE VIEW vw_pitches AS
> SELECT
> p.year,
> p.game_id,
> p.game_date,
> p.game_level,
> ...
> from synergy.pitches as p
> join alias.identity as idpitcher
> on p.pitcher_identity_id = idpitcher.identity_id
> left join alias.identity as idcatcher
> on p.catcher_identity_id = idcatcher.identity_id
> left join alias.identity as idbatter
> on p.batter_identity_id = idbatter.identity_id;
>
> The alias.identity.identity_id column is indexed.
>
> The main issue is SELECT COUNT(*) over wide date ranges, which is
> something we'd like to do frequently.
>
>
> On Wed, Jun 21, 2023 at 11:37 AM Holger Jakobs <holger(at)jakobs(dot)com> wrote:
>
>
> Am 21.06.23 um 20:31 schrieb Wells Oliver:
> > Dead simple date scan across a big-ish table (est. 23,153,666 rows)
> >
> > explain analyze select count(*) from vw_pitches where game_date >=
> > '2022-06-21' and game_date <= '2023-06-21';
> >
> > The view does do some joins but those don't seem to be the issue to me.
> >
> > Planner does:
> >
> > Finalize Aggregate (cost=3596993.88..3596993.89 rows=1 width=8)
> > (actual time=69980.491..69982.076 rows=1 loops=1)
> > -> Gather (cost=3596993.46..3596993.87 rows=4 width=8) (actual
> > time=69979.137..69982.071 rows=5 loops=1)
> > Workers Planned: 4
> > Workers Launched: 4
> > -> Partial Aggregate (cost=3595993.46..3595993.47 rows=1
> > width=8) (actual time=69975.136..69975.137 rows=1 loops=5)
> > -> Nested Loop (cost=0.44..3591408.37 rows=1834034
> > width=0) (actual time=0.882..69875.934 rows=1458419 loops=5)
> > -> Parallel Seq Scan on pitches p
> > (cost=0.00..3537431.89 rows=1834217 width=12) (actual
> > time=0.852..68914.256 rows=1458419 loops=5)
> > Filter: ((game_date >= '2022-06-21'::date)
> > AND (game_date <= '2023-06-21'::date))
> > Rows Removed by Filter: 3212310
> > -> Memoize (cost=0.44..0.47 rows=1 width=4)
> > (actual time=0.000..0.000 rows=1 loops=7292095)
> > Cache Key: p.pitcher_identity_id
> > Cache Mode: logical
> > Hits: 1438004 Misses: 21042 Evictions: 0
> > Overflows: 0 Memory Usage: 2138kB
> > Worker 0: Hits: 1429638 Misses: 21010
> > Evictions: 0 Overflows: 0 Memory Usage: 2134kB
> > Worker 1: Hits: 1456755 Misses: 21435
> > Evictions: 0 Overflows: 0 Memory Usage: 2177kB
> > Worker 2: Hits: 1433557 Misses: 21201
> > Evictions: 0 Overflows: 0 Memory Usage: 2154kB
> > Worker 3: Hits: 1428727 Misses: 20726
> > Evictions: 0 Overflows: 0 Memory Usage: 2105kB
> > -> Index Only Scan using identity_pkey on
> > identity idpitcher (cost=0.43..0.46 rows=1 width=4) (actual
> > time=0.007..0.007 rows=1 loops=105414)
> > Index Cond: (identity_id =
> > p.pitcher_identity_id)
> > Heap Fetches: 83
> > Planning Time: 1.407 ms
> > Execution Time: 69982.927 ms
> >
> > Is there something to be done here? Kind of a frequent style of query
> > and quite slow.
> >
> Could you provide the definition of the view(s) down to the base tables?
>
> --
> Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012
>
>
>
> --
> Wells Oliver
> wells(dot)oliver(at)gmail(dot)com <mailto:wellsoliver(at)gmail(dot)com>
--
Born in Arizona, moved to Babylonia.
From | Date | Subject | |
---|---|---|---|
Next Message | B M | 2023-06-21 19:54:43 | large scale reliable software system using PostgreSQL |
Previous Message | Wells Oliver | 2023-06-21 18:39:39 | Re: Poor performance due to parallel seq scan on indexed date field |