Re: pgsql: Extended statistics on expressions

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: Re: pgsql: Extended statistics on expressions
Date: 2021-03-31 22:07:43
Message-ID: c61cfe32-251f-e9cb-1be1-0e15218cda17@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers

On 3/31/21 7:54 PM, Tomas Vondra wrote:
> On 3/31/21 7:08 PM, Alvaro Herrera wrote:
>> On 2021-Mar-31, Tomas Vondra wrote:
>>
>>> Thanks for the report, I'll take a look. You're right this seems like an
>>> out-of-bounds access, but mcv_match_expression is only expected to be
>>> run on expressions we know are in the statistics (because we pick the
>>> statistics like that). Clearly, that does not happen here, not sure why.
>>>
>>> It's quite weird that we end up running textlike(), when the statistics
>>> is on (double precision, boolean) columns ...
>>
>> Uninitialized values somewhere? Maybe valgrind would help.
>>
>
> Unlikely, I've ran it through valgrind repeatedly, including right
> before commit (both on x86_64 and arm).
>
> FWIW I'm unable to reproduce it, so not sure what's going on. David,
> what configure option are you using? Anything special?
>
>
> It's a bit strange, because statext_mcv_clauselist_selectivity should
> only estimate "matching" clauses on the statistics. So how come this
> estimates such a complex expression using textlike(), when neither of
> those columns is text?
>
> It'd be interesting to know what's happening in the code after
>
> stat = choose_best_statistics(...);
>
> i.e. what clauses it considers "compatible" with the statistics and why.
> In fact, I wouldn't have expected the statistics to be used at all.
>

OK, I managed to reproduce/trigger the issue. The simplest query that
triggers the issue for me is this:

SELECT t1.c0 FROM ONLY t1 WHERE
(
upper('x') LIKE ('x'||('[0,1]'::int4range))
AND
(t1.c0 IN (0, 1) OR t1.c1)
)

I think the code matching clauses to the statistics gets a bit confused
when processing the AND clause. It extracts 2 attnums for the OR part,
but the first part should be "incompatible" with the statistics. But
after picking the statistics to apply, it gets confused and includes the
first expression (the whole LIKE clause) as compatible too.

The attached patch fixes this for me. David, can you check if this
resolves the issue for you?

I don't feel like I want to push a fix at midnight, and I'd like to
think about maybe making this part of the code a bit clearer tomorrow.
It's not very comprehensible, I'm afraid.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
ext-stats-rowley-fix.patch text/x-patch 1.2 KB

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Alvaro Herrera 2021-03-31 22:22:07 pgsql: Initialize conn->Pfdebug to NULL when creating a connection
Previous Message Alvaro Herrera 2021-03-31 21:46:36 pgsql: Disable force_parallel_mode in libpq_pipeline