Re: bug or lacking doc hint

From: Marc Millas <marc(dot)millas(at)mokadb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: bug or lacking doc hint
Date: 2023-06-25 22:26:55
Message-ID: CADX_1abrHNz1pN1JvRLCMSopMsRWyKfuWVDneSKxGH0A5KOfEw@mail.gmail.com
Lists: pgsql-general

On Sun, Jun 25, 2023 at 11:48 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> David Rowley <dgrowleyml(at)gmail(dot)com> writes:
> > The problem is that out of the 3 methods PostgreSQL uses to join
> > tables, only 1 of them supports join conditions with an OR clause.
> > Merge Join cannot do this because results can only be ordered one way
> > at a time. Hash Join technically could do this, but it would require
> > that it built multiple hash tables. Currently, it only builds one
> > table. That leaves Nested Loop as the join method to implement joins
> > with OR clauses. Unfortunately, nested loops are quadratic and the
> > join condition must be evaluated once per each cartesian product row.
>
> We can do better than that if the OR'd conditions are each amenable
> to an index scan on one of the tables: then it can be a nestloop with
> a bitmap-OR'd inner index scan. I thought the upthread advice to
> convert the substr() condition into something that could be indexed
> was on-point.
>
OK, but one of the tables in the join(s) has 10 billion rows, split
across 120 partitions. Creating something like 20 more indexes to
satisfy that condition has its own problems.

>
> > Tom Lane did start some work [1] to allow the planner to convert some
> > queries to use UNION instead of evaluating OR clauses, but, if I
> > remember correctly, it didn't handle ORs in join conditions, though
> > perhaps having it do that would be a natural phase 2. I don't recall
> > why the work stopped.
>
> As I recall, I was having difficulty convincing myself that
> de-duplication of results (for cases where the same row satisfies
> more than one of the OR'd conditions) would work correctly.
> You can't just blindly make it a UNION because that might remove
> identical rows that *should* appear more than once in the result.
>

I did rewrite the query using a CTE and UNION(s). For that query,
de-duplication is not an issue.
But my problem is that this DB will be used by a bunch of people writing
raw SQL queries, and I cannot let them write queries that run for ages and
eventually crash on temp_file_limit after hours, every now and then.
So, my understanding of the above is that I must tell the users NOT to
use OR clauses in joins, which may be a problem in itself.
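
To make the de-duplication concern quoted above concrete, here is a minimal
sketch. It is an illustration only: it runs the SQL through Python's sqlite3
module as a stand-in engine, not PostgreSQL, and the tables a and b, their
columns, and the data are all made up. It compares an OR join with a plain
UNION rewrite, and with a UNION that also carries the key columns of both
tables so that only the same joined row pair is collapsed.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for illustration;
# the tables "a" and "b" and their contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, x INTEGER, y INTEGER, label TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, x INTEGER, y INTEGER);
    -- Two rows in "a" that are identical in every column we will select.
    INSERT INTO a VALUES (1, 10, 20, 'dup'), (2, 10, 20, 'dup');
    -- One row in "b" that matches both rows of "a" on x and also on y.
    INSERT INTO b VALUES (1, 10, 20);
""")

# Original form: join condition with an OR clause.
or_join = conn.execute("""
    SELECT a.label, b.x
    FROM a JOIN b ON a.x = b.x OR a.y = b.y
""").fetchall()

# Naive rewrite: one join per OR arm, combined with UNION.
# UNION de-duplicates on the selected columns only, so the two
# legitimate but identical output rows collapse into one.
naive_union = conn.execute("""
    SELECT a.label, b.x FROM a JOIN b ON a.x = b.x
    UNION
    SELECT a.label, b.x FROM a JOIN b ON a.y = b.y
""").fetchall()

# Keyed rewrite: carry both primary keys through the UNION so that
# de-duplication happens per (a row, b row) pair, then drop the keys.
keyed_union = conn.execute("""
    SELECT label, x FROM (
        SELECT a.id AS a_id, b.id AS b_id, a.label AS label, b.x AS x
        FROM a JOIN b ON a.x = b.x
        UNION
        SELECT a.id, b.id, a.label, b.x
        FROM a JOIN b ON a.y = b.y
    )
""").fetchall()

print(len(or_join), len(naive_union), len(keyed_union))
```

In this toy data the OR join and the keyed rewrite return the same two rows,
while the plain UNION returns only one. The keyed form relies on both tables
having a unique key, which is an assumption of the sketch.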
regards
Marc

> regards, tom lane
>

Marc MILLAS
