| From: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
|---|---|
| To: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
| Cc: | Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: PoC/WIP: Extended statistics on expressions |
| Date: | 2020-12-11 20:17:40 |
| Message-ID: | 958870c8-65e0-31b1-4591-b0b10e807dd9@enterprisedb.com |
| Lists: | pgsql-hackers |
On 12/11/20 1:58 PM, Dean Rasheed wrote:
> On Tue, 8 Dec 2020 at 12:44, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>
>> Possibly. But I don't think it's worth the extra complexity. I don't
>> expect people to have a lot of overlapping stats, so the amount of
>> wasted space and CPU time is expected to be fairly limited.
>>
>> So I don't think it's worth spending too much time on this now. Let's
>> just do what you proposed, and revisit this later if needed.
>>
>
> Yes, I think that's a reasonable approach to take. As long as the
> documentation makes it clear that building MCV stats also causes
> standard expression stats to be built on any expressions included in
> the list, then the user will know and can avoid duplication most of
> the time. I don't think there's any need for code to try to prevent
> that -- just as we don't bother with code to prevent a user building
> multiple indexes on the same column.
>
> The only case where duplication won't be avoidable is where there are
> multiple MCV stats sharing the same expression, but that's probably
> quite unlikely in practice, and it seems acceptable to leave improving
> that as a possible future optimisation.
>
OK. Attached is an updated version, reworking it this way.

I tried tweaking the grammar to differentiate these two syntax variants,
but that led to shift/reduce conflicts with the existing ones. I tried
fixing those, but in the end I handled the distinction in
CreateStatistics() instead.

The other thing is that we probably can't tie this to just MCV, because
functional dependencies need the per-expression stats too. So I simply
build expression stats whenever there's at least one expression.

I also decided to keep the "expressions" statistics kind. It can't be
specified in CREATE STATISTICS, but it's useful internally because it
lets us decide in a single place whether to build the expression stats.
Otherwise we'd need to make that decision every time we build the
statistics, etc.
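
The two points above (expression stats are built whenever the statistics object contains at least one expression, and the "expressions" kind is internal only) can be illustrated with a small standalone sketch. This is not code from the attached patch: the flag names, `parse_user_kind()` and `resolve_stat_kinds()` are hypothetical, and the handling of an omitted kind list is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flags for illustration; not the identifiers used in the patch. */
enum
{
    KIND_NDISTINCT    = 1 << 0,
    KIND_DEPENDENCIES = 1 << 1,
    KIND_MCV          = 1 << 2,
    KIND_EXPRESSIONS  = 1 << 3    /* internal only, never user-specified */
};

/* Map a user-supplied kind name to a flag; returns 0 if it is not accepted. */
static int
parse_user_kind(const char *name)
{
    if (strcmp(name, "ndistinct") == 0)
        return KIND_NDISTINCT;
    if (strcmp(name, "dependencies") == 0)
        return KIND_DEPENDENCIES;
    if (strcmp(name, "mcv") == 0)
        return KIND_MCV;
    return 0;                     /* "expressions" is deliberately rejected */
}

/*
 * Resolve the set of kinds to build. Returns a bitmask of KIND_* flags,
 * or -1 if any user-supplied name is not accepted.
 */
static int
resolve_stat_kinds(const char **names, size_t nnames, size_t nexprs)
{
    int kinds = 0;

    for (size_t i = 0; i < nnames; i++)
    {
        int k = parse_user_kind(names[i]);

        if (k == 0)
            return -1;
        kinds |= k;
    }

    /* The single place where the decision is made: any expression at all. */
    if (nexprs > 0)
        kinds |= KIND_EXPRESSIONS;

    return kinds;
}
```

With this shape, code that builds the statistics only has to test the `KIND_EXPRESSIONS` bit rather than re-inspect the expression list each time.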

I added a brief explanation to the sgml docs, but I'm not sure it's good
enough. It may need more detail.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
| Attachment | Content-Type | Size |
|---|---|---|
| 0001-bootstrap-convert-Typ-to-a-List-20201211.patch | text/x-patch | 3.7 KB |
| 0002-Allow-composite-types-in-bootstrap-20201211.patch | text/x-patch | 1.4 KB |
| 0003-Extended-statistics-on-expressions-20201211.patch | text/x-patch | 234.3 KB |