Re: Problems with estimating OR conditions, IS NULL on LEFT JOINs

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Problems with estimating OR conditions, IS NULL on LEFT JOINs
Date: 2023-06-28 21:53:32
Message-ID: 63baadec-933b-53f8-5653-99f490dce09e@enterprisedb.com
Lists: pgsql-hackers

On 6/26/23 20:15, Alena Rybakina wrote:
> Hi, all!
>
> On 24.06.2023 14:23, Tomas Vondra wrote:
>> On 6/24/23 02:08, Tom Lane wrote:
>>> Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> writes:
>>>> The problem is that the selectivity for "IS NULL" is estimated using the
>>>> table-level statistics. But the LEFT JOIN entirely breaks the idea that
>>>> the null_frac has anything to do with NULLs in the join result.
>>> Right.
>>>
>>>> I wonder how to improve this, say by adjusting the IS NULL selectivity
>>>> when we know to operate on the outer side of the join. We're able to
>>>> do this for antijoins, so maybe we could do that here, somehow?
>>> This mess is part of the long-term plan around the work I've been doing
>>> on outer-join-aware Vars. We now have infrastructure that can let
>>> the estimator routines see "oh, this Var isn't directly from a scan
>>> of its table, it's been passed through a potentially-nulling outer
>>> join --- and I can see which one". I don't have more than vague ideas
>>> about what happens next, but that is clearly an essential step on the
>>> road to doing better.
>>>
>> I was wondering if that work on outer-join-aware Vars could help with
>> this, but I wasn't following it very closely. I agree the ability to
>> check if the Var could be NULL due to an outer join seems useful, as it
>> says whether applying raw attribute statistics makes sense or not.
>>
>> I was thinking about what to do for the case when that's not possible,
>> i.e. when the Var refers to nullable side of the join. Knowing that this
>> is happening is clearly not enough - we need to know how many new NULLs
>> are "injected" into the join result, and "communicate" that to the
>> estimation routines.
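
For illustration, a rough way to measure that "injected NULLs" fraction directly
on the large/small example (a sketch only; it assumes the schema visible in the
plans below):

-- share of join output rows that are NULL-extended, i.e. have no match in small
SELECT (count(*) FILTER (WHERE small.id IS NULL))::float8 / count(*)
         AS unmatched_frac
  FROM large LEFT JOIN small ON (large.id = small.id);
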
>>
>> Attached is a very ugly experimental patch doing that, and with it the
>> estimate changes to this:
>>
>> QUERY PLAN
>> ----------------------------------------------------------------------
>> Hash Left Join (cost=3.25..18179.88 rows=999900 width=16)
>> (actual time=0.528..596.151 rows=999900 loops=1)
>> Hash Cond: (large.id = small.id)
>> Filter: ((small.id IS NULL) OR
>> (large.a = ANY ('{1000,2000,3000,4000,5000}'::integer[])))
>> Rows Removed by Filter: 100
>> -> Seq Scan on large (cost=0.00..14425.00 rows=1000000 width=8)
>> (actual time=0.069..176.138 rows=1000000 loops=1)
>> -> Hash (cost=2.00..2.00 rows=100 width=8)
>> (actual time=0.371..0.373 rows=100 loops=1)
>> Buckets: 1024 Batches: 1 Memory Usage: 12kB
>> -> Seq Scan on small (cost=0.00..2.00 rows=100 width=8)
>> (actual time=0.032..0.146 rows=100 loops=1)
>> Planning Time: 3.845 ms
>> Execution Time: 712.405 ms
>> (10 rows)
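
(The exact table definitions aren't repeated in this message; a rough, guessed
reconstruction of the test setup behind the plan above might look like this -
a sketch only, not the original DDL:)

CREATE TABLE large AS
  SELECT i AS id, i AS a FROM generate_series(1, 1000000) i;
CREATE TABLE small AS
  SELECT i AS id FROM generate_series(1, 100) i;
ANALYZE large, small;

EXPLAIN ANALYZE
SELECT * FROM large LEFT JOIN small ON (large.id = small.id)
 WHERE small.id IS NULL
    OR large.a IN (1000, 2000, 3000, 4000, 5000);
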
>>
>> Seems nice, but: the patch is pretty ugly, and I don't claim it works for
>> other queries or that this is exactly what we should do. It calculates
>> "unmatched frequency" next to eqjoinsel_inner, stashes that info into
>> sjinfo and the estimator (nulltestsel) then uses that to adjust the
>> nullfrac it gets from the statistics.
>>
>> The good thing is this helps even for IS NULL checks on non-join-key
>> columns (where we don't switch to an antijoin), but there are a couple of
>> things that I dislike ...
>>
>> 1) It's not restricted to outer joins or anything like that (this is
>> mostly just my laziness / interest in one particular query, but also
>> something the outer-join-aware patch might help with).
>>
>> 2) We probably don't want to pass this kind of information through
>> sjinfo. That was the simplest thing for an experimental patch, but I
>> suspect it's not the only piece of information we may need to pass to
>> the lower levels of estimation code.
>>
>> 3) I kinda doubt we actually want to move this responsibility (to
>> consider fraction of unmatched rows) to the low-level estimation
>> routines (e.g. nulltestsel and various others). AFAICS this just
>> "introduces NULLs" into the relation, so maybe we could "adjust" the
>> attribute statistics (in examine_variable?) by inflating null_frac and
>> modifying the other frequencies in MCV/histogram.
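
A possible shape for that adjustment, sketched with the numbers from the example
above (roughly 999900 of the 1000000 outer rows have no match; the exact formula
here is an assumption for illustration, not something taken from the patch):

  unmatched_frac     = 999900 / 1000000 = 0.9999
  null_frac_adjusted = unmatched_frac + (1 - unmatched_frac) * null_frac
                     = 0.9999 + 0.0001 * 0.0 = 0.9999

with the MCV/histogram frequencies scaled by (1 - unmatched_frac), so the
adjusted distribution still sums to 1.0.
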
>>
>> 4) But I'm not sure we actually want to do that in these low-level
>> selectivity functions. The outer join essentially produces output with
>> two subsets - one with matches on the outer side, one without them. But
>> the side without matches has NULLs in all columns. In a way, we know
>> exactly how these columns are correlated - if we do the usual estimation
>> (even with the null_frac adjusted), we just throw this information away.
>> And when there's a lot of rows without a match, that seems bad.
>>
>> So maybe we should split the join estimate into two parts, one for each
>> subset of the join result. One for the rows with a match (and then we
>> can just do what we do now, with the attribute stats we already have).
>> And one for the "unmatched part" where we know the values on the outer
>> side are NULL (and then we can easily "fake" stats with null_frac=1.0).
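
Conceptually, that split corresponds to viewing the left join as the union of
its matched and unmatched parts, e.g. (a sketch, assuming small exposes just
the id column used in the plans):

-- "matched" subset: ordinary inner-join estimation applies here
SELECT l.id, l.a, s.id
  FROM large l JOIN small s ON (l.id = s.id)
UNION ALL
-- "unmatched" subset: every column coming from small is NULL here
SELECT l.id, l.a, NULL
  FROM large l
 WHERE NOT EXISTS (SELECT 1 FROM small s WHERE s.id = l.id);
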
>>
>>
>> I really hope what I just wrote makes at least a little bit of sense.
>>
>>
>> regards
>>
> I am also interested in this problem.
>
> I did some refactoring of the source code in the patch and moved the
> calculation of unmatched_fraction into eqjoinsel_inner.
> I added myself as a co-author of this commit, if you don't mind, and I'm
> going to continue working on it.
>

Sure, if you want to take over the patch and continue working on it,
that's perfectly fine with me. I'm happy to cooperate on it, doing
reviews etc.

>
> On 26.06.2023 12:22, Andrey Lepikhov wrote:
>> On 24/6/2023 17:23, Tomas Vondra wrote:
>>> I really hope what I just wrote makes at least a little bit of sense.
>> Throw in one more example:
>>
>> SELECT i AS id INTO l FROM generate_series(1,100000) i;
>> CREATE TABLE r (id int8, v text);
>> INSERT INTO r (id, v) VALUES (1, 't'), (-1, 'f');
>> ANALYZE l,r;
>> EXPLAIN ANALYZE
>> SELECT * FROM l LEFT OUTER JOIN r ON (r.id = l.id) WHERE r.v IS NULL;
>>
>> Here you can see the same kind of underestimation:
>> Hash Left Join  (... rows=500 width=14) (... rows=99999 ...)
>>
>> So the eqjoinsel_unmatch_left() function should be modified for the
>> case where nd1<nd2.
>>
> Unfortunately, this patch does not fix the cardinality estimate for
> this query; I'll try to look into it and figure out what is missing here.
>
> postgres=# SELECT i AS id INTO l FROM generate_series(1,100000) i;
> CREATE TABLE r (id int8, v text);
> INSERT INTO r (id, v) VALUES (1, 't'), (-1, 'f');
> ANALYZE l,r;
> EXPLAIN ANALYZE
> SELECT * FROM l LEFT OUTER JOIN r ON (r.id = l.id) WHERE r.v IS NULL;
> SELECT 100000
> CREATE TABLE
> INSERT 0 2
> ANALYZE
>                                               QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------
>  Hash Left Join  (cost=1.04..1819.07 rows=1 width=14) (actual
> time=0.143..114.792 rows=99999 loops=1)
>    Hash Cond: (l.id = r.id)
>    Filter: (r.v IS NULL)
>    Rows Removed by Filter: 1
>    ->  Seq Scan on l  (cost=0.00..1443.00 rows=100000 width=4) (actual
> time=0.027..35.278 rows=100000 loops=1)
>    ->  Hash  (cost=1.02..1.02 rows=2 width=10) (actual time=0.014..0.017
> rows=2 loops=1)
>          Buckets: 1024  Batches: 1  Memory Usage: 9kB
>          ->  Seq Scan on r  (cost=0.00..1.02 rows=2 width=10) (actual
> time=0.005..0.007 rows=2 loops=1)
>  Planning Time: 0.900 ms
>  Execution Time: 126.180 ms
> (10 rows)
>
>
> As in the previous query, even with the patch applied, the cardinality
> is estimated poorly here; I would even say that it has become worse:
>
> EXPLAIN ANALYZE
>   SELECT * FROM large FULL JOIN small ON (large.id = small.id)
> WHERE (large.a IS NULL);
>
> MASTER:
>
>                                                          QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------
>  Merge Full Join  (cost=127921.69..299941.59 rows=56503 width=16)
> (actual time=795.092..795.094 rows=0 loops=1)
>    Merge Cond: (small.id = large.id)
>    Filter: (large.a IS NULL)
>    Rows Removed by Filter: 1000000
>    ->  Sort  (cost=158.51..164.16 rows=2260 width=8) (actual
> time=0.038..0.046 rows=100 loops=1)
>          Sort Key: small.id
>          Sort Method: quicksort  Memory: 29kB
>          ->  Seq Scan on small  (cost=0.00..32.60 rows=2260 width=8)
> (actual time=0.013..0.022 rows=100 loops=1)
>    ->  Materialize  (cost=127763.19..132763.44 rows=1000050 width=8)
> (actual time=363.016..649.103 rows=1000000 loops=1)
>          ->  Sort  (cost=127763.19..130263.31 rows=1000050 width=8)
> (actual time=363.012..481.480 rows=1000000 loops=1)
>                Sort Key: large.id
>                Sort Method: external merge  Disk: 17664kB
>                ->  Seq Scan on large  (cost=0.00..14425.50 rows=1000050
> width=8) (actual time=0.009..111.166 rows=1000000 loops=1)
>  Planning Time: 0.124 ms
>  Execution Time: 797.139 ms
> (15 rows)
>
> With patch:
>
>                                                    QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------
>  Hash Full Join  (cost=3.25..18179.25 rows=999900 width=16) (actual
> time=261.480..261.482 rows=0 loops=1)
>    Hash Cond: (large.id = small.id)
>    Filter: (large.a IS NULL)
>    Rows Removed by Filter: 1000000
>    ->  Seq Scan on large  (cost=0.00..14425.00 rows=1000000 width=8)
> (actual time=0.006..92.827 rows=1000000 loops=1)
>    ->  Hash  (cost=2.00..2.00 rows=100 width=8) (actual
> time=0.032..0.034 rows=100 loops=1)
>          Buckets: 1024  Batches: 1  Memory Usage: 12kB
>          ->  Seq Scan on small  (cost=0.00..2.00 rows=100 width=8)
> (actual time=0.008..0.015 rows=100 loops=1)
>  Planning Time: 0.151 ms
>  Execution Time: 261.529 ms
> (10 rows)
>
> In addition, I found a few more queries where the cardinality estimation
> has become better with the patch:
>
>

Yes, this does not surprise me at all - the patch was very early /
experimental code, and I only really aimed it at that single example
query. So don't hesitate to rethink what/how the patch works.

I do think collecting / constructing a wider set of queries with joins
of different types is going to be an important step in working on this
patch and making sure it doesn't break anything.
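
For example (sketches only, riffing on the queries already posted in this
thread), such a set might include:

-- anti-join shape on the join key
SELECT * FROM large LEFT JOIN small ON (large.id = small.id)
 WHERE small.id IS NULL;

-- the same thing written as a RIGHT JOIN
SELECT * FROM small RIGHT JOIN large ON (large.id = small.id)
 WHERE small.id IS NULL;

-- FULL JOIN, where both sides can be NULL-extended
SELECT * FROM large FULL JOIN small ON (large.id = small.id)
 WHERE large.a IS NULL OR small.id IS NULL;
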

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
