Re: Add support for (Var op Var) clause in extended MCV statistics

From: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Add support for (Var op Var) clause in extended MCV statistics
Date: 2024-08-19 16:04:40
Message-ID: 2c9a4643-d32e-4139-980b-c3bf3859dcc8@tantorlabs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12.8.24 19:25, Tomas Vondra wrote:

> Is TPC-B really interesting/useful for this patch? The queries are super
> simple, with only a single clause (so it may not even get to the code
> handling extended statistics). Did you create any extended stats?

No, it's not the case. I simply wanted to verify that other queries are
not slowed down after applying my patch.
>
> I think you'll need to construct a custom test, with queries that have
> multiple (var op var) clauses, extended stats created, etc. And
> benchmark that.

I used the test generator from a previous thread [1] and ran it with
|default_statistics_target = 1000| to achieve more accurate estimates
for 3000 rows. It would also be beneficial to run tests with 10,000 and
100,000 rows for a broader perspective. I've attached the python test.
Here’s a breakdown of the issues:

1. (A op A) Clause: Before applying my patch, there were poor estimates
for expressions like |(A op A)|. Currently, we only have correct
estimates for the |(A = A)| clause, which transforms into |A IS NOT
NULL|. Should I address this in this thread? I believe we should
extend the same correction to clauses like |(A != A)|, |(A < A)|,
and similar conditions. However, this issue is not for current thread.
2. AND Clauses: The estimates for AND clauses were inaccurate before my
patch. I noticed code segments where I could add something specific
for the |(Var op Var)| clause, but I'm unsure if I'm missing
anything crucial. If my understanding is incorrect, I'd appreciate
any guidance or corrections.

> FWIW I don't think it makes sense to benchmark the query execution - if
> the estimate improves, it's possible to get arbitrary speedup, but
> that's expected and mostly mostly irrelevant I think.
>
> What I'd focus on is benchmarking just the query planning - we need the
> overhead to be negligible (or at least small) so that it does not hurt
> people with already good plans.
>
> BTW can you elaborate why you are interested in this patch? Do you just
> think it's interesting/useful, or do you have a workload where it would
> actually help? I'm asking because me being uncertain how beneficial this
> is in practice (not just being nice in theory) was one of the reasons
> why I didn't do more work on this in 2021.
>
>
> regards
>

I have two reasons for pursuing this. Firstly, I've encountered some of
these queries in practice, although they are quite rare. While it might
be easy to dismiss these cases due to their infrequency, I believe that
we shouldn't overlook the opportunity to develop better handling for
them, regardless of how seldom they occur.

Secondly, I see that you're working on improving estimates for JOIN
clauses in thread [2]. I believe that enhancing estimates for these rare
cases could also benefit future work on JOIN queries, particularly those
with multiple |ON (T1.column = T2.column)| conditions, which are
essentially |(Var op Var)| clauses. My idea is to start with non-JOIN
queries, and then apply the same approach to improve JOIN estimates. Of
course, I might be wrong, but I think this approach has potential.

P.S. If I sent this mail twice I'm sorry. I wanted to sent results of
the test, and it was not sent to hackers because of big size of attached
file. Now I sent only test.

[1]:
https://www.postgresql.org/message-id/ecc0b08a-518d-7ad6-17ed-a5e962fc4f5f%40enterprisedb.com

[2]:
https://www.postgresql.org/message-id/flat/c8c0ff31-3a8a-7562-bbd3-78b2ec65f16c%40enterprisedb.com

--
Regards,
Ilia Evdokimov,
Tantor Labs LCC.

Attachment Content-Type Size
test_generator.py text/x-python 4.2 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-08-19 16:10:17 Re: Test 041_checkpoint_at_promote.pl faild in installcheck due to missing injection_points
Previous Message Tom Lane 2024-08-19 16:02:33 Re: Report search_path value back to the client.