Re: Analyze results in more expensive query plan

From: Jeremy Altavilla <jeremyaltavilla(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Analyze results in more expensive query plan
Date: 2019-05-20 20:19:48
Message-ID: CAG3Nut8Xaak-qy3cLT0WGDRBtbNPgGxVEM_GN9vCmNQiAUNu+g@mail.gmail.com
Lists: pgsql-performance

Thanks for the suggestion. I created extended statistics objects for the two
tables in question. Unfortunately, the resulting plan was the same (and had
the same estimates). It looks like the extended stats discovered a potentially
useful correlation on bag: "2, 3 => 1" (owner_id, bag_type_id => id). I'm
guessing this wasn't usable because the docs state that functional
dependencies "are not used to improve estimates for equality conditions
comparing two columns".

I created functional dependency extended stats (ndistinct stats didn't seem
relevant, since none of our queries use group by) and ran analyze. The
resulting objects are below. For dependencies of 1.000000, the results seemed
logically correct (I'm not sure how to interpret the 0.966567 values). The
limitations section says that functional dependency stats are only applied to
simple equality conditions, so I modified the query to use plain equality
instead of any. That still resulted in the same plan and estimates. Just to be
thorough, I tried all permutations of zero, one, or both stats objects. In
every case the resulting plan and estimates didn't change from the slow hash
join.
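
To show what I mean by switching from any to plain equality, the change was
roughly this shape (the literals are placeholders, not our real values):

-- original form, matching a set of bag types with any(...)
select * from bag where owner_id = 42 and bag_type_id = any(array[7, 8, 9]);

-- rewritten with a single equality so the dependency stats could apply
select * from bag where owner_id = 42 and bag_type_id = 7;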

create statistics bag_type_stats (dependencies) on id, name, game from bag_type;
analyze bag_type;
create statistics bag_stats (dependencies) on id, owner_id, bag_type_id from bag;
analyze bag;

select * from pg_statistic_ext;
-[ RECORD 1 ]---+------------------------------------------------------------------
stxrelid        | 16411
stxname         | bag_stats
stxnamespace    | 2200
stxowner        | 10
stxkeys         | 1 2 3
stxkind         | {f}
stxndistinct    |
stxdependencies | {"1 => 2": 1.000000, "1 => 3": 1.000000, "2 => 1": 0.966567, "2 => 3": 0.966567, "1, 2 => 3": 1.000000, "1, 3 => 2": 1.000000, "2, 3 => 1": 1.000000}
-[ RECORD 2 ]---+------------------------------------------------------------------
stxrelid        | 16398
stxname         | bag_type_stats
stxnamespace    | 2200
stxowner        | 10
stxkeys         | 1 2 4
stxkind         | {f}
stxndistinct    |
stxdependencies | {"1 => 2": 1.000000, "1 => 4": 1.000000, "2 => 1": 1.000000, "2 => 4": 1.000000, "1, 2 => 4": 1.000000, "1, 4 => 2": 1.000000, "2, 4 => 1": 1.000000}

For bag, keys 1, 2, and 3 are id, owner_id, and bag_type_id. For bag_type,
keys 1, 2, and 4 are id, name, and game.
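
My rough understanding is that the degree is approximately the fraction of
sampled rows where knowing the determining column pins down a single value of
the dependent column. Something along these lines should give a comparable
number for "2 => 1" (owner_id => id) on the full table (this is only an
approximation, not analyze's exact computation):

select sum(cnt) filter (where n_ids = 1)::numeric / sum(cnt) as approx_degree
from (
    select owner_id, count(*) as cnt, count(distinct id) as n_ids
    from bag
    group by owner_id
) s;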

--Thanks
--Jeremy

On Fri, May 17, 2019 at 9:35 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Jeremy Altavilla <jeremyaltavilla(at)gmail(dot)com> writes:
> > We have several select statements whose performance is greatly improved
> > by deleting some stats from pg_statistic.
>
> You might have better results by setting up some "extended stats" for
> the combination of bag_type columns that this query depends on. Per your
> description, there's a fair amount of cross-column correlation, which
> the planner will not expect without some extended stats to tell it so.
>
>
> https://www.postgresql.org/docs/10/planner-stats.html#PLANNER-STATS-EXTENDED
>
> regards, tom lane
