Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)

From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Jim Nasby <jim(dot)nasby(at)openscg(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)
Date: 2019-03-18 00:09:42
Message-ID: 20190318000942.GA2804728@rfd.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 02, 2018 at 10:53:40AM -0400, Tom Lane wrote:
> Noah Misch <noah(at)leadboat(dot)com> writes:
> > On Mon, Oct 01, 2018 at 09:32:10PM -0400, Tom Lane wrote:
> >> FWIW, my problem with this patch is that I remain unconvinced of the basic
> >> correctness of the transform (specifically the unique-ification approach).
> >> Noah's points would be important to address if we were moving the patch
> >> towards commit, but I don't see much reason to put effort into it until
> >> we can think of a way to prove whether that works.
>
> > Not even effort to fix the assertion failures I reported?
>
> If it seemed relevant to the proof-of-correctness problem, I would look
> into it, but ...

I put some hours into theoretical study of the proof, and I didn't find any
holes. When the planner removes "l0 LEFT JOIN r1", it does that because
there's one output row per l0.ctid regardless of the rows of r1. Hence,
deduplicating on (l0.ctid) is equivalent to deduplicating on (l0.ctid,
r1.ctid). In the bad FULL JOIN case, (l0.ctid, r1.ctid) would be sufficient
as a key, but we're out of luck because removing the join makes us have only
l0.ctid for some tuples and only r1.ctid for other tuples.

If PostgreSQL ever gets inner join removal, it would be harder to preserve
this optimization in this form. At that point, perhaps we'd cost the path
that retains the join for the benefit of $SUBJECT. Given the performance test
results, $SUBJECT may already need a cost-based decision on whether to use it.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-03-18 00:14:56 Re: Make pg_checksums complain if compiled BLCKSZ and data folder's block size differ
Previous Message Tom Lane 2019-03-17 23:43:44 Re: Rare SSL failures on eelpout