Re: POC, WIP: OR-clause support for indexes

From: "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, pgsql-hackers(at)postgresql(dot)org, teodor(at)sigaev(dot)ru, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2023-08-29 03:37:51
Message-ID: d3db5422-bef3-8009-0f4c-5fdaea849184@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you for your interest in this problem and help, and I'm sorry that
I didn't respond to this email for a long time. To be honest, I wanted
to investigate the problems in more detail and already answer more
clearly, but unfortunately I have not found anything more significant yet.

On 21.08.2023 01:26, Peter Geoghegan wrote:
> There was actually support for OR lists in index AMs prior to
> ScalarArrayOpExpr. Even though ScalarArrayOpExpr don't really seem all
> that related to bitmap scans these days (since at least nbtree knows
> how to execute them "natively"), that wasn't always the case.
> ScalarArrayOpExpr were invented the same year that bitmap index scans
> were first added (2005), and seem more or less related to that work.
> See commits bc843d39, 5b051852, 1e9a6ba5, and 290166f9 (all from
> 2005). Particularly the last one, which has a commit message that
> heavily suggests that my interpretation is correct.
>
> Back in 2003, commit 9888192f removed (or at least simplified) what
> were then called "CNF/DNF CONVERSION ROUTINES". Prior to that point
> the optimizer README had something about leaving clause lists
> un-normalized leading to selectivity estimation problems. Bear in mind
> that this is a couple of years before ScalarArrayOpExpr was first
> invented. Apparently even back then "The OR-of-ANDs format is useful
> for indexscan implementation". It's possible that that old work will
> offer some hints on what to do now.
> In a way it's not surprising that work in this area would have some
> impact on selectivies. The surprising part is the extent of the
> problem, I suppose.
>
> I see that a lot of the things in this area are just used by BitmapOr
> clauses, such as build_paths_for_OR() -- but you're not necessarily
> able to use any of that stuff. Also, choose_bitmap_and() has some
> stuff about how it compensates to avoid "too-small selectivity that
> makes a redundant AND step look like it reduces the total cost". It
> also mentions some problems with match_join_clauses_to_index() +
> extract_restriction_or_clauses(). Again, this might be a good place to
> look for more clues.
I agree with your assumption about looking at the source of the error
related to selectivity in these places. But honestly, no matter how many
times I looked, until enough sensible thoughts appeared, which could
cause a problem. I keep looking, maybe I'll find something.
> EXPLAIN (COSTS OFF) SELECT * FROM tenk1 WHERE thousand = 42 AND
> (tenthous = 1 OR tenthous = 3 OR tenthous = 42); - QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------
> - Bitmap Heap Scan on tenk1 - Recheck Cond: (((thousand = 42) AND
> (tenthous = 1)) OR ((thousand = 42) AND (tenthous = 3)) OR ((thousand
> = 42) AND (tenthous = 42))) - -> BitmapOr - -> Bitmap Index Scan on
> tenk1_thous_tenthous - Index Cond: ((thousand = 42) AND (tenthous =
> 1)) - -> Bitmap Index Scan on tenk1_thous_tenthous - Index Cond:
> ((thousand = 42) AND (tenthous = 3)) - -> Bitmap Index Scan on
> tenk1_thous_tenthous - Index Cond: ((thousand = 42) AND (tenthous =
> 42)) -(9 rows) + QUERY PLAN
> +------------------------------------------------------------------------
> + Index Scan using tenk1_thous_tenthous on tenk1 + Index Cond:
> ((thousand = 42) AND (tenthous = ANY (ARRAY[1, 3, 42]))) +(2 rows)
>
> I think that we currently over-rely on BitmapOr for OR clauses. It's
> useful that they're so general, of course, but ISTM that we shouldn't
> even try to use a BitmapOr in simple cases. Things like the "WHERE
> thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42)"
> tenk1 query that you brought up probably shouldn't even have a
> BitmapOr path (which I guess they don't with you patch). Note that I
> recently discussed the same query at length with Tomas Vondra on the
> ongoing thread for his index filter patch (you probably knew that
> already).
I think so too, but it's still quite difficult to find a stable enough
optimization to implement this, in my opinion. But I will finish the
current optimization with OR->ANY, given that something interesting has
appeared.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2023-08-29 04:18:11 Re: subscription/015_stream sometimes breaks
Previous Message John Naylor 2023-08-29 03:26:27 Re: Debian 12 gcc warning