Re: Showing applied extended statistics in explain Part 2

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Tatsuro Yamada <yamatattsu(at)gmail(dot)com>
Cc: Tatsuro Yamada <tatsuro(dot)yamada(at)ntt(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Showing applied extended statistics in explain Part 2
Date: 2024-07-15 14:03:34
Message-ID: f74b4577-f80d-48a0-aff0-92c76c9bbe34@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 6/26/24 11:06, Tatsuro Yamada wrote:
> Hi Tomas!
>
> Thanks for the comments!
>
> 1) The patch is not added to the CF app, which I think is a mistake. Can
>> you please add it to the 2024-07 commitfest? Otherwise people may not be
>> aware of it, won't do reviews etc. It'll require posting a rebased
>> patch, but should not be a big deal.
>>
>
> I added the patch to the 2024-07 commitfest today.
>
>
> 2) Not having the patch in a CF also means cfbot is not running tests on
>> it. Which is unfortunate, because the patch actually has an a bug cfbot
>> would find - I've noticed it after running the tests through the github
>> CI, see [2].
>> 3) The bug has this symptom:
>> ERROR: unrecognized node type: 268
>> CONTEXT: PL/pgSQL function check_estimated_rows(text) line 7 ...
>> STATEMENT: SELECT * FROM check_estimated_rows('SELECT a, b FROM ...
>> 4) I can think of two basic ways to fix this issue - either allow
>> copying of the StatisticExtInto node, or represent the information in a
>> different way (e.g. add a new node for that purpose, or use existing
>> nodes to do that).
>>
>
> Thanks for the info. I'll investigate using cfbot.
> To fix the problem, I understand we need to create a new struct like
> (statistics OID, list of clauses, flag is_or).
>

Yes, something like that, in the plannodes.h.

>
> 5) In [3] Tom raised two possible issues with doing this - cost of
>> copying the information, and locking. For the performance concerns, I
>> think the first thing we should do is measuring how expensive it is. I
>> suggest measuring the overhead for about three basic cases:
>>
>
> Okay, I'll measure it once the patch is completed and check the overhead.
> I read [3][4] and in my opinion I agree with Robert.
> As with indexes, there should be a mechanism for determining whether
> extended statistics are used or not. If it were available, users would be
> able to
> tune using extended statistics and get better execution plans.
>

I do agree with that, but I also understand Tom's concerns about the
costs. His concern is that to make this work, we have to keep/copy the
information for all queries, even if that user never does explain.

Yes, we do the same thing (copy of some pieces) for indexes, and from
this point of view it's equally reasonable. But there's the difference
that for indexes it's always been done this way, hence it's considered
"the baseline", while for extended stats we've not copied the data until
this patch, so it'd be seen as a regression.

I think there are two ways to deal with this - ideally, we'd show that
the overhead is negligible (~noise). And if it's measurable, we'd need
to argue that it's worth it - but that's much harder, IMHO.

So I'd suggest you try to measure the overhead on a couple cases (simple
query with 0 or more statistics applied).

>
>
>> 6) I'm not sure we want to have this under EXPLAIN (VERBOSE). It's what
>> I did in the initial PoC patch, but maybe we should invent a new flag
>> for this purpose, otherwise VERBOSE will cover too much stuff? I'm
>> thinking about "STATS" for example.
>>
>> This would probably mean the patch should also add a new auto_explain
>> "log_" flag to enable/disable this.
>>
>
> I thought it might be better to do this, so I'll fix it.
>

OK

>
>
>> 7) The patch really needs some docs - I'd mention this in the EXPLAIN
>> docs, probably. There's also a chapter about estimates, maybe that
>> should mention this too? Try searching for places in the SGML docs
>> mentioning extended stats and/or explain, I guess.
>>
>
> I plan to create documentation after the specifications are finalized.
>

I'm, not sure that's a good approach. Maybe it doesn't need to be
mentioned in the section explaining how estimates work, but it'd be good
to have it at least in the EXPLAIN command docs. The thing is - docs are
a nice way for reviewers to learn about how the feature is expected to
work / be used. Yes, it may need to be adjusted if the patch changes,
but it's likely much easier than changing the code.

>
>
>> For tests, I guess stats_ext is the right place to test this. I'm not
>> sure what's the best way to do this, though. If it's covered by VERBOSE,
>> that seems it might be unstable - and that would be an issue. But maybe
>> we might add a function similar to check_estimated_rows(), but verifying
>> the query used the expected statistics to estimate expected clauses.
>>
>
> As for testing, I think it's more convenient for reviewers to include it in
> the patch,
> so I'm thinking of including it in the next patch.
>

I'm not sure I understand what you mean - what is more convenient to
include in the patch & you plan to include in the next patch version?

My opinion is that there clearly need to be some regression tests, be it
in stats_ext.sql or in some other script. But to make it easier, we
might have a function similar to check_estimated_rows() which would
extract just the interesting part of the plan.

>
> So there's stuff to do to make this committable, but hopefully this
>> review gives you some guidance regarding what/how ;-)
>>
>
> Thank you! It helps me a lot!
>
> The attached patch does not correspond to the above comment.
> But it does solve some of the issues mentioned in previous threads.
>
> The next patch is planned to include:
> 6) Add stats option to explain command
> 8) Add regression test (stats_ext.sql)
> 4) Add new node (resolve errors in cfbot and prepared statement)
>

Sounds good.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2024-07-15 14:15:36 Re: Showing applied extended statistics in explain Part 2
Previous Message Matthias van de Meent 2024-07-15 12:12:54 Re: Make tuple deformation faster