From: | Tomas Vondra <tomas(at)vondra(dot)me> |
---|---|
To: | Andrei Lepikhov <lepihov(at)gmail(dot)com>, Tatsuro Yamada <yamatattsu(at)gmail(dot)com>, Masahiro(dot)Ikeda(at)nttdata(dot)com |
Cc: | tomas(dot)vondra(at)enterprisedb(dot)com, tatsuro(dot)yamada(at)ntt(dot)com, pgsql-hackers(at)postgresql(dot)org, Masao(dot)Fujii(at)nttdata(dot)com |
Subject: | Re: Showing applied extended statistics in explain Part 2 |
Date: | 2025-02-08 13:50:57 |
Message-ID: | 8dd61fdb-3bbd-4859-93c7-7ba9ea1e1995@vondra.me |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 1/24/25 11:17, Andrei Lepikhov wrote:
> On 11/1/24 12:22, Tatsuro Yamada wrote:
>> Hi All,
>>
>> I apologize for not being able to continue development due to various
>> circumstances.
>> The attached file is the rebased patch.
>> I will now catch up on the discussion and try to revise the patch.
> I wonder why it’s so important to know exactly where and who has used
> extended statistics.
>
> I often use SQL Server to compare execution plans generated by
> PostgreSQL, and I appreciate how they display the usage of extended
> statistics. They clearly identify which statistics were utilized during
> query planning and include them in the summary section of query plan.
>
I admit not knowing what exactly SQL Server shows in the explain, but it
seems helpful to know which part of the plan used which statistics, no?
Imagine we only knew an index was used, but not which node used it and
for what keys. That would be a bit ... useless.
Or what info does the SQL server include in the plan, exactly? Can you
share an example?
> I find this method much easier to implement, as it allows us to see any
> usage points - remember that `estimate_num_groups` may be applied in
> multiple places and may not always correspond to a node clause.
>
I may be wrong, but I don't quite see why would that be significantly
easier to implement. You still need to track the information somewhere,
and it probably needs for individual Path nodes. Because while building
the plan you don't know which paths will get selected.
Although, maybe the selectivity estimation is sufficiently independent
from the exact paths? In which case we might track it at the plan level.
Still, I don't think that really makes this much easier to implement.
The code would only move a little bit to a different place, but other
than that it would remain the same.
> It’s worth noting that some clauses may be transformed during the
> planning process, and the scan filter may not align with the estimated
> clause. It’s possible that certain clauses might not appear in the final
> estimated plan based on the extended statistics.
>
> If necessary, we could add an `extstat_ID` to the summary to reference
> it in the plan nodes.
>
Not sure, but I'd prefer not to add "indirection" the people would have
to follow in the explain plan. I'd much rather duplicate the same info
(which should be rare anyway, we usually don't have the same statistics
used in multiple places in one query).
The point about estimate_num_groups is good - I think there will be more
cases where linking the extended statistics to a clause will be hard.
But maybe let's not block the whole patch because of that?
regards
--
Tomas Vondra
From | Date | Subject | |
---|---|---|---|
Next Message | Jelte Fennema-Nio | 2025-02-08 13:54:14 | Re: RFC: Allow EXPLAIN to Output Page Fault Information |
Previous Message | Michail Nikolaev | 2025-02-08 13:46:00 | Re: Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM? |