Re: Add support for (Var op Var) clause in extended MCV statistics

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Add support for (Var op Var) clause in extended MCV statistics
Date: 2024-08-12 16:25:34
Message-ID: a8e84732-9cb8-43a5-8386-97f637e6cb18@vondra.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 8/12/24 17:57, Ilia Evdokimov wrote:
> On 12.8.24 14:53, Tomas Vondra wrote:
>
>> I agree, and I'm grateful someone picked up the original patch. I'll try
>> to help to keep it moving forward. If the thread gets stuck, feel free
>> to ping me to take a look.
> Good. Thank you!
>>> I started reviewing it and want to suggest some changes to better code:
>>> I think we should consider the case where the expression is not neither
>>> an OpExpr and VarOpVar expression.
>>>
>> Do you have some specific type of clauses in mind? Most of the extended
>> statistics only really handles this type of clauses, so I'm not sure
>> it's feasible to extend that - at least not in this patch.
>
> I agree with Alena that we need to consider the following clauses: (Expr
> op Var), (Var op Expr) and (Expr op Expr). And we need to return false
> in these cases because we did it before my patch in
>
>         /* Check if the expression has the right shape */
>         if (!examine_opclause_args(expr->args, &clause_expr, NULL, NULL))
>             return false;
>
> In is_opclause_var_op_var() function it is really useless local Node
> *expr_left, *expr_right variables. However, we can't assign them NULL at
> the begin because if I passed not-null pointers I have to return the
> values. Otherwise remain them NULL.
>
> Nevertheless, thank you for review, Alena.
>

Ah, right. I agree we should handle clauses with expressions.

I don't recall why I wrote is_opclause_var_op_var() like this, but I
believe this was before we allowed extended statistics on expressions
(which was added in 2021, the patch is from 2020). I don't see why it
could not return expressions, but I haven't tried.

>>> Have you tested this code with any benchmarks?
>>>
>> FWIW I think we need to test two things - that it (a) improves the
>> estimates and (b) does not have significant overhead.
> Yes, but only TPC-B. And the performance did not drop. In general, it'd
> be better to do more tests and those listed by Tomas with new attached
> patch.

Is TPC-B really interesting/useful for this patch? The queries are super
simple, with only a single clause (so it may not even get to the code
handling extended statistics). Did you create any extended stats?

I think you'll need to construct a custom test, with queries that have
multiple (var op var) clauses, extended stats created, etc. And
benchmark that.

FWIW I don't think it makes sense to benchmark the query execution - if
the estimate improves, it's possible to get arbitrary speedup, but
that's expected and mostly mostly irrelevant I think.

What I'd focus on is benchmarking just the query planning - we need the
overhead to be negligible (or at least small) so that it does not hurt
people with already good plans.

BTW can you elaborate why you are interested in this patch? Do you just
think it's interesting/useful, or do you have a workload where it would
actually help? I'm asking because me being uncertain how beneficial this
is in practice (not just being nice in theory) was one of the reasons
why I didn't do more work on this in 2021.

regards

--
Tomas Vondra

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2024-08-12 16:28:29 Re: [patch] Imporve pqmq
Previous Message Ilia Evdokimov 2024-08-12 15:57:19 Re: Add support for (Var op Var) clause in extended MCV statistics