From: | Andrei Lepikhov <lepihov(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tomas(at)vondra(dot)me>, Tatsuro Yamada <yamatattsu(at)gmail(dot)com>, Masahiro(dot)Ikeda(at)nttdata(dot)com |
Cc: | tomas(dot)vondra(at)enterprisedb(dot)com, tatsuro(dot)yamada(at)ntt(dot)com, pgsql-hackers(at)postgresql(dot)org, Masao(dot)Fujii(at)nttdata(dot)com |
Subject: | Re: Showing applied extended statistics in explain Part 2 |
Date: | 2025-04-24 14:31:14 |
Message-ID: | 80e1a20a-19e1-427a-87d7-a39d012ff843@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2/10/25 16:56, Tomas Vondra wrote:
> On 2/10/25 10:09, Andrei Lepikhov wrote:
>> On 8/2/2025 20:50, Tomas Vondra wrote:
>> <StatisticsInfo LastUpdate="2024-09-09T21:55:04.43"
>> ModificationCount="0" SamplingPercent="17.9892"
>> Statistics="[_WA_Sys_00000001_0A55DF1D]" Table="[_InfoRgX]"
>> Schema="[dbo]" Database="[DB]"></StatisticsInfo>
>>
>> The main profit here - you see all the stats involved in estimations
>> (and their state), even if final plan doesn't contain estimated stuff at
>> all.
>
> OK, that seems very underwhelming. I still think we should show which
> clauses were estimated using which statistics object.
To understand how it may work, I employed the EXPLAIN extensibility
introduced in PG 18 to show the use of plain statistics [1]. It looks
like the following:
EXPLAIN (COSTS OFF, STAT ON)
SELECT * FROM sc_a WHERE x=1 AND y LIKE 'a';
Seq Scan on sc_a
Filter: ((y ~~ 'a'::text) AND (x = 1))
Statistics:
"sc_a.y: 1 times, stats: { MCV: 10 values, Correlation,
ndistinct: 10.0000, nullfrac: 0.0000, width: 5 }
"sc_a.x: 1 times, stats: { Histogram: 0 values, Correlation,
ndistinct: -1.0000, nullfrac: 0.0000, width: 4 }
As you can see, stat usage is summarised at the end of the EXPLAIN. It
contains information about the column, how many times it was used and
the parameters of statistic slots.
Of course, being an extension it is constrained a lot, but even there is
the profit:
1. You may see types of statistics exist on the column
2. Precision of the histogram or MCV (statistic_target) on a specific
table - some users forget to increase it on large (or partitioned) tables
3. You have basic stat like nullfrac, ndistinct without the necessity to
teach personnel how to gather it on a production instance safely.
Also, using it in real cases, I realised that it would be highly
profitable to specify which statistic type was used to estimate this
specific clause.
Of course, extended statistics have their own specifics, which may
require another output format. Just consider this example too.
[1] https://github.com/danolivo/pg_index_stats
--
regards, Andrei Lepikhov
From | Date | Subject | |
---|---|---|---|
Next Message | Vitaly Davydov | 2025-04-24 14:32:39 | Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly |
Previous Message | Japin Li | 2025-04-24 14:20:55 | Re: Disallow redundant indexes |