Re: Wrapping a where clause to preserve rows with nulls

From: Adrian Garcia Badaracco <adrian(at)adriangb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Wrapping a where clause to preserve rows with nulls
Date: 2024-12-19 04:41:58
Message-ID: CAE8z92FTVnCfbS54F01st0QxeLMsgt1mcafnQAW94h-=6-sZ4g@mail.gmail.com
Lists: pgsql-general

Thank you for the great idea, Tom. While, yes, I can't modify the original
WHERE clause, I do think I'll be able to find out which columns it
references, either by introspecting it or by getting the system that
generates it to tell me, and then append OR x IS NULL OR y IS NULL ...
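A minimal sketch of that rewrite, assuming the list of referenced columns has already been obtained somehow. The helper name preserve_null_rows is hypothetical, and it only builds the SQL text; it does not parse or validate the original clause.

```python
def preserve_null_rows(clause: str, columns: list[str]) -> str:
    """Hypothetical helper: wrap an opaque WHERE clause so that rows where any
    referenced column is NULL are also returned.

    `columns` is assumed to come from introspecting the clause or from the
    system that generated it. Names are interpolated verbatim, so they must
    already be trusted, properly quoted identifiers.
    """
    if not columns:
        # Nothing referenced, so there is nothing that could be NULL.
        return clause
    null_checks = " OR ".join(f"{col} IS NULL" for col in columns)
    # Parenthesize defensively so the original clause is evaluated as one unit.
    return f"({clause}) OR {null_checks}"


print(preserve_null_rows("x = 5000 AND y > 5", ["x", "y"]))
# (x = 5000 AND y > 5) OR x IS NULL OR y IS NULL
```

This is the "any referenced column may be NULL" reading; it keeps the original predicate intact and only adds disjuncts, which is what Tom's EXPLAIN output suggests the planner can still serve with index scans.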

For context, in case it's interesting: I store Parquet statistics in a
Postgres table and run the pruning predicates generated by this code against them:
https://github.com/apache/datafusion/blob/f92442ea8e8944c78f8e40d6648d049ff8e335ec/datafusion/physical-optimizer/src/pruning.rs#L146-L456
That's why I can't really control the WHERE clause (at least not without
re-implementing a lot of finicky, error-prone code).

On Wed, Dec 18, 2024 at 10:38 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> "David G. Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> writes:
> > On Wednesday, December 18, 2024, Adrian Garcia Badaracco <
> > adrian(at)adriangb(dot)com> wrote:
> >> Is there any way to include the rows where the predicate evaluates to
> >> null while still using an index?
>
> > ... A btree index, which handles =, can’t be told to behave
> > differently and so cannot fulfill your desire to produce rows where the
> > stored value is null; it can only produce those equal to 5000.
>
> Not in a single scan, no. But multiple scans are possible:
>
> regression=# create table t (id int unique);
> CREATE TABLE
> regression=# explain select * from t where id = 5000 or id is null;
>                                   QUERY PLAN
> ------------------------------------------------------------------------------
>  Bitmap Heap Scan on t  (cost=8.42..18.98 rows=14 width=4)
>    Recheck Cond: ((id IS NULL) OR (id = 5000))
>    ->  BitmapOr  (cost=8.42..8.42 rows=14 width=0)
>          ->  Bitmap Index Scan on t_id_key  (cost=0.00..4.25 rows=13 width=0)
>                Index Cond: (id IS NULL)
>          ->  Bitmap Index Scan on t_id_key  (cost=0.00..4.16 rows=1 width=0)
>                Index Cond: (id = 5000)
> (7 rows)
>
> The OP was quite unclear about what semantics he wants for
> multiple-variable WHERE clauses, but maybe something like this
> would work:
>
> WHERE (original-clause) OR x IS NULL OR y IS NULL OR ...
>
> where each variable mentioned in original-clause is allowed
> to also be NULL. Or perhaps what is wanted is
>
> WHERE (original-clause) OR (x IS NULL AND y IS NULL AND ...)
>
> ??
>
> regards, tom lane
>
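To make the difference between Tom's two candidate rewrites concrete, here is a small sketch that evaluates one clause under both forms. It uses Python's sqlite3 module purely so the example is self-contained; the table t, its columns and its rows are made up. SQLite applies the same three-valued NULL logic as PostgreSQL to these comparisons, but its query plans (and index usage) are unrelated to the EXPLAIN output above.

```python
import sqlite3

# Illustrative data only: x and y stand in for whatever columns the
# generated clause references.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, x INTEGER, y INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 5000, 10), (2, None, 10), (3, None, None), (4, 1, None)],
)

original = "x = 5000 AND y > 5"
# Form 1: keep a row if the clause is true OR any referenced column is NULL.
any_null = f"({original}) OR x IS NULL OR y IS NULL"
# Form 2: keep a row if the clause is true OR all referenced columns are NULL.
all_null = f"({original}) OR (x IS NULL AND y IS NULL)"


def matching_ids(where: str) -> list[int]:
    """Return the ids of rows satisfying the given WHERE clause."""
    sql = f"SELECT id FROM t WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql)]


print(matching_ids(original))  # [1]
print(matching_ids(any_null))  # [1, 2, 3, 4]
print(matching_ids(all_null))  # [1, 3]
```

Row 2 shows where the forms diverge: x is NULL but y is known, so only the first form keeps it. Row 4 is kept by the first form even though x = 1 already makes the original clause false, because y is NULL. Which behaviour is wanted depends on the intended semantics, as Tom points out.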
