From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | ma lz <ma100(at)hotmail(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Why not do distinct before SetOp |
Date: | 2024-11-05 22:11:51 |
Message-ID: | CAApHDvod=mg8xKbBTFTu7HPWzLk+UhHVdXXwQEj9eNh5buVaiQ@mail.gmail.com |
Lists: | pgsql-general |
On Tue, 5 Nov 2024 at 04:18, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> A different idea that occurred to me while looking at this is:
> why have we got all this machinery to add and check a flag
> column, rather than arranging things so that the two input
> relations are "outer" and "inner" children of the SetOp?
I've no idea why it's not like that. The current design is quite
strange and feels dated. It might be worth making that change, as even
if we gave joins better support for IS NOT DISTINCT FROM and made
INTERSECT use an inner join and EXCEPT use an anti join instead, we'd
still need nodeSetOp.c for INTERSECT ALL and EXCEPT ALL.
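To make the two designs concrete, here is a toy Python model. It is not PostgreSQL's nodeSetOp.c, and the function names are hypothetical. `setop_flagged` mimics the current shape, where both inputs arrive through one stream (like an Append over two flag-adding projections) and each row carries a 0/1 flag. `setop_two_inputs` mimics the proposed shape, where each child is read directly and no flag column exists. Both apply the documented SQL rule: a row that occurs m times on the left and n times on the right is emitted min(m, n) times by INTERSECT ALL and max(m - n, 0) times by EXCEPT ALL. Those per-row counts are why the ALL variants need a counting node rather than a plain join.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Tuple

# A row is a tuple of column values; None stands in for SQL NULL.
Row = Tuple[Hashable, ...]


def _copies(op: str, n_left: int, n_right: int) -> int:
    """How many copies of a row each set operation emits."""
    if op == "INTERSECT":
        return 1 if n_left and n_right else 0
    if op == "INTERSECT ALL":
        return min(n_left, n_right)
    if op == "EXCEPT":
        return 1 if n_left and not n_right else 0
    if op == "EXCEPT ALL":
        return max(n_left - n_right, 0)
    raise ValueError(f"unknown set operation: {op}")


def setop_flagged(tagged: Iterable[Tuple[Row, int]], op: str) -> List[Row]:
    """Flag-column design: one combined input, each row tagged
    0 (left input) or 1 (right input)."""
    counts: Dict[Row, List[int]] = {}
    for row, flag in tagged:
        counts.setdefault(row, [0, 0])[flag] += 1
    return [row for row, (n_l, n_r) in counts.items()
            for _ in range(_copies(op, n_l, n_r))]


def setop_two_inputs(left: Iterable[Row], right: Iterable[Row], op: str) -> List[Row]:
    """Outer/inner design: count each child separately; no flag column."""
    n_left, n_right = Counter(left), Counter(right)
    # Counter returns 0 for rows missing from the right input.
    return [row for row, n_l in n_left.items()
            for _ in range(_copies(op, n_l, n_right[row]))]
```

Feeding both functions the same data (the tagged stream being the left rows flagged 0 followed by the right rows flagged 1) gives identical results, which is the point: the flag column and the extra projection carry no information that separate outer/inner inputs would not already provide. Because Python treats `None == None` as true when grouping tuples, NULLs are treated as not distinct here, as SQL set operations require.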
> It's possible some of the performance difference reported here
> is due to having to pass more tuples through the SubqueryScan
> node (with its projection to add the flag) and Append node,
> but we could remove those steps entirely.
Seems plausible.
> > If we did want to improve this area, I think the first thing we'd want
> > to do is use standard join types rather than HashSetOp Intersect to
> > implement INTERSECT (without ALL). To do that efficiently, we'd need
> > to do a bit more work on the standard join types to have them
> > efficiently support IS NOT DISTINCT FROM clauses as the join keys.
>
> Maybe. It'd be a big project, but we do get complaints every so
> often about IS NOT DISTINCT FROM predicates not being efficient,
> so the benefits would be wider than just INTERSECT.
Yeah, I agree. I think that's step 1 towards making INTERSECT (without
ALL) and EXCEPT (without ALL) better and it would probably make a few
other people happy who use IS NOT DISTINCT FROM in their join
conditions.
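A second toy model, again not planner or executor code and with hypothetical names, shows the join-based plan for the non-ALL variants: INTERSECT as DISTINCT plus a semi-join (equivalent to an inner join once both sides are distinct) and EXCEPT as DISTINCT plus an anti-join. It illustrates two points. First, plain `=` join keys give the wrong answer, because a comparison involving NULL is never true, so rows containing NULL would drop out of the intersection. Second, if IS NOT DISTINCT FROM can only be evaluated row by row as a general qual, the fallback is a nested loop, whereas a hash table keyed on the whole row gives the same null-safe result with a single build and probe.

```python
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

# A row is a tuple of column values; None stands in for SQL NULL.
Row = Tuple[Optional[Hashable], ...]


def sql_equals(a: Row, b: Row) -> bool:
    """Column-wise `=` used as a join qual: any comparison involving
    NULL (None) is not true, so such rows never match."""
    return all(x is not None and y is not None and x == y for x, y in zip(a, b))


def not_distinct(a: Row, b: Row) -> bool:
    """Column-wise IS NOT DISTINCT FROM: NULL matches NULL."""
    return all(x == y for x, y in zip(a, b))  # in Python, None == None


def _distinct(rows: Sequence[Row]) -> List[Row]:
    """Remove duplicates while keeping first-seen order."""
    return list(dict.fromkeys(rows))


def nested_loop_semi(left: Sequence[Row], right: Sequence[Row],
                     qual: Callable[[Row, Row], bool]) -> List[Row]:
    """INTERSECT as DISTINCT + semi-join with an arbitrary qual:
    O(len(left) * len(right)) comparisons."""
    return [row for row in _distinct(left) if any(qual(row, r) for r in right)]


def hash_semi(left: Sequence[Row], right: Sequence[Row]) -> List[Row]:
    """Same result via a hash table on the whole row; tuple hashing groups
    None with None, i.e. IS NOT DISTINCT FROM semantics."""
    inner = set(right)
    return [row for row in _distinct(left) if row in inner]


def hash_anti(left: Sequence[Row], right: Sequence[Row]) -> List[Row]:
    """EXCEPT (without ALL) as DISTINCT + anti-join."""
    inner = set(right)
    return [row for row in _distinct(left) if row not in inner]
```

With `left = [(1, "a"), (2, None), (2, None), (3, "c")]` and `right = [(1, "a"), (2, None), (4, "d")]`, the nested loop using `sql_equals` keeps only `(1, "a")`, losing `(2, None)`, while the null-safe versions keep both rows, matching what INTERSECT returns.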
David