Re: Showing applied extended statistics in explain Part 2

From: Tatsuro Yamada <yamatattsu(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Tatsuro Yamada <tatsuro(dot)yamada(at)ntt(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Showing applied extended statistics in explain Part 2
Date: 2024-06-26 09:06:43
Message-ID: CAOKkKFtaevmhnNxKenXrv+7KRYUOu=oVSjreeuP5Smg5FzqGOw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Tomas!

Thanks for the comments!

1) The patch is not added to the CF app, which I think is a mistake. Can
> you please add it to the 2024-07 commitfest? Otherwise people may not be
> aware of it, won't do reviews etc. It'll require posting a rebased
> patch, but should not be a big deal.
>

I added the patch to the 2024-07 commitfest today.

2) Not having the patch in a CF also means cfbot is not running tests on
> it. Which is unfortunate, because the patch actually has an a bug cfbot
> would find - I've noticed it after running the tests through the github
> CI, see [2].
> 3) The bug has this symptom:
> ERROR: unrecognized node type: 268
> CONTEXT: PL/pgSQL function check_estimated_rows(text) line 7 ...
> STATEMENT: SELECT * FROM check_estimated_rows('SELECT a, b FROM ...
> 4) I can think of two basic ways to fix this issue - either allow
> copying of the StatisticExtInto node, or represent the information in a
> different way (e.g. add a new node for that purpose, or use existing
> nodes to do that).
>

Thanks for the info. I'll investigate using cfbot.
To fix the problem, I understand we need to create a new struct like
(statistics OID, list of clauses, flag is_or).

5) In [3] Tom raised two possible issues with doing this - cost of
> copying the information, and locking. For the performance concerns, I
> think the first thing we should do is measuring how expensive it is. I
> suggest measuring the overhead for about three basic cases:
>

Okay, I'll measure it once the patch is completed and check the overhead.
I read [3][4] and in my opinion I agree with Robert.
As with indexes, there should be a mechanism for determining whether
extended statistics are used or not. If it were available, users would be
able to
tune using extended statistics and get better execution plans.

> 6) I'm not sure we want to have this under EXPLAIN (VERBOSE). It's what
> I did in the initial PoC patch, but maybe we should invent a new flag
> for this purpose, otherwise VERBOSE will cover too much stuff? I'm
> thinking about "STATS" for example.
>
> This would probably mean the patch should also add a new auto_explain
> "log_" flag to enable/disable this.
>

I thought it might be better to do this, so I'll fix it.

> 7) The patch really needs some docs - I'd mention this in the EXPLAIN
> docs, probably. There's also a chapter about estimates, maybe that
> should mention this too? Try searching for places in the SGML docs
> mentioning extended stats and/or explain, I guess.
>

I plan to create documentation after the specifications are finalized.

> For tests, I guess stats_ext is the right place to test this. I'm not
> sure what's the best way to do this, though. If it's covered by VERBOSE,
> that seems it might be unstable - and that would be an issue. But maybe
> we might add a function similar to check_estimated_rows(), but verifying
> the query used the expected statistics to estimate expected clauses.
>

As for testing, I think it's more convenient for reviewers to include it in
the patch,
so I'm thinking of including it in the next patch.

So there's stuff to do to make this committable, but hopefully this
> review gives you some guidance regarding what/how ;-)
>

Thank you! It helps me a lot!

The attached patch does not correspond to the above comment.
But it does solve some of the issues mentioned in previous threads.

The next patch is planned to include:
6) Add stats option to explain command
8) Add regression test (stats_ext.sql)
4) Add new node (resolve errors in cfbot and prepared statement)

Regards,
Tatsuro Yamada

> [1]
>
> https://www.postgresql.org/message-id/TYYPR01MB82310B308BA8770838F681619E5E2%40TYYPR01MB8231.jpnprd01.prod.outlook.com
>
> [2] https://cirrus-ci.com/build/6436352672137216
>
> [3]
> https://www.postgresql.org/message-id/459863.1627419001%40sss.pgh.pa.us
>
> [4]
>
> https://www.postgresql.org/message-id/CA%2BTgmoZU34zo4%3DhyqgLH16iGpHQ6%2BQAesp7k5a1cfZB%3D%2B9xtsw%40mail.gmail.com
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
>

Attachment Content-Type Size
0001-show-stats-in-explain-rebased-on-15c9ac36.patch application/octet-stream 15.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2024-06-26 09:10:39 Re: Logical Replication of sequences
Previous Message Amit Kapila 2024-06-26 09:02:48 Re: Conflict Detection and Resolution