From: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
---|---|
To: | David Geier <geidav(dot)pg(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Optimize join selectivity estimation by not reading MCV stats for unique join attributes |
Date: | 2022-11-18 00:27:45 |
Message-ID: | 2f30ba9c-b707-55ae-0afd-68eb6a8a7708@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 11/14/22 10:19, David Geier wrote:
> Hi Tom,
>> There won't *be* any MCV stats for a column that ANALYZE perceives to
>> be unique, so I'm not quite sure where the claimed savings comes from.
>
> We save if one join attribute is unique while the other isn't. In that
> case stored MCV stats are read for the non-unique attribute but then
> never used. This is because MCV stats in join selectivity estimation are
> only used if they're present on both columns
>
Right - if we only have MCV on one side of the join, we currently end up
loading the MCV we have only to not use it anyway. The uniqueness is a
simple way to detect some of those cases. I'd bet the savings can be
quite significant for small joins and/or cases with large MCV.
I wonder if we might be yet a bit smarter, though.
For example, assume the first attribute is not defined as "unique" but
we still don't have a MCV (it may be unique - or close to unique - in
practice, or maybe it's just uniform distribution). We end up with
have_mcvs1 = false
Can't we just skip trying to load the second MCV? So we could do
if (have_mcvs1 && HeapTupleIsValid(vardata2.statsTuple))
{ ... try loading mcv2 ... }
Or perhaps what if we have a function that quickly determines if the
attribute has MCV, without loading it? I'd bet the expensive part of
get_attstatslot() is the deconstruct_array().
We could have a function that only does the first small loop over slots,
and returns true/false if we have a slot of the requested stakind. It
might even check the isunique flag first, to make it more convenient.
And only if both sides return "true" we'd load the MCV, deconstruct the
array and all that.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Masahiko Sawada | 2022-11-18 00:35:44 | Re: Perform streaming logical transactions by background workers and parallel apply |
Previous Message | Jeff Davis | 2022-11-18 00:24:24 | Re: allowing for control over SET ROLE |