Re: Showing applied extended statistics in explain Part 2

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>, Tatsuro Yamada <yamatattsu(at)gmail(dot)com>, Masahiro(dot)Ikeda(at)nttdata(dot)com
Cc: tomas(dot)vondra(at)enterprisedb(dot)com, tatsuro(dot)yamada(at)ntt(dot)com, pgsql-hackers(at)postgresql(dot)org, Masao(dot)Fujii(at)nttdata(dot)com
Subject: Re: Showing applied extended statistics in explain Part 2
Date: 2025-04-24 14:31:14
Message-ID: 80e1a20a-19e1-427a-87d7-a39d012ff843@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2/10/25 16:56, Tomas Vondra wrote:
> On 2/10/25 10:09, Andrei Lepikhov wrote:
>> On 8/2/2025 20:50, Tomas Vondra wrote:
>> <StatisticsInfo LastUpdate="2024-09-09T21:55:04.43"
>> ModificationCount="0" SamplingPercent="17.9892"
>> Statistics="[_WA_Sys_00000001_0A55DF1D]" Table="[_InfoRgX]"
>> Schema="[dbo]" Database="[DB]"></StatisticsInfo>
>>
>> The main profit here - you see all the stats involved in estimations
>> (and their state), even if final plan doesn't contain estimated stuff at
>> all.
>
> OK, that seems very underwhelming. I still think we should show which
> clauses were estimated using which statistics object.
To understand how it may work, I employed the EXPLAIN extensibility
introduced in PG 18 to show the use of plain statistics [1]. It looks
like the following:

EXPLAIN (COSTS OFF, STAT ON)
SELECT * FROM sc_a WHERE x=1 AND y LIKE 'a';

Seq Scan on sc_a
Filter: ((y ~~ 'a'::text) AND (x = 1))
Statistics:
"sc_a.y: 1 times, stats: { MCV: 10 values, Correlation,
ndistinct: 10.0000, nullfrac: 0.0000, width: 5 }
"sc_a.x: 1 times, stats: { Histogram: 0 values, Correlation,
ndistinct: -1.0000, nullfrac: 0.0000, width: 4 }

As you can see, stat usage is summarised at the end of the EXPLAIN. It
contains information about the column, how many times it was used and
the parameters of statistic slots.
Of course, being an extension it is constrained a lot, but even there is
the profit:
1. You may see types of statistics exist on the column
2. Precision of the histogram or MCV (statistic_target) on a specific
table - some users forget to increase it on large (or partitioned) tables
3. You have basic stat like nullfrac, ndistinct without the necessity to
teach personnel how to gather it on a production instance safely.

Also, using it in real cases, I realised that it would be highly
profitable to specify which statistic type was used to estimate this
specific clause.

Of course, extended statistics have their own specifics, which may
require another output format. Just consider this example too.

[1] https://github.com/danolivo/pg_index_stats

--
regards, Andrei Lepikhov

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Vitaly Davydov 2025-04-24 14:32:39 Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly
Previous Message Japin Li 2025-04-24 14:20:55 Re: Disallow redundant indexes