Quick Links

Re: Why not do distinct before SetOp

From:	David Rowley <dgrowleyml(at)gmail(dot)com>
To:	ma lz <ma100(at)hotmail(dot)com>
Cc:	"pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Why not do distinct before SetOp
Date:	2024-11-04 11:09:35
Message-ID:	CAApHDvqdDwEXxhZLTwsHkWnvpvVSYT2OXSzfxRrs2p5xudr9fw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On Mon, 4 Nov 2024 at 22:52, ma lz <ma100(at)hotmail(dot)com> wrote:
>
> some sql like ' select a from t1 intersect select a from t1 '
>
> if t1 has large number rows but has few distinct rows
>
> select distinct a from t1 intersect select distinct a from t1; — this is faster than origin sql
>
> can postgres do this optimize during plan-queries?

No, the planner does not attempt that optimisation. INTERSECT really
isn't very well optimised.

If we did want to improve this area, I think the first thing we'd want
to do is use standard join types rather than HashSetOp Intersect to
implement INTERSECT (without ALL). To do that efficiently, we'd need
to do a bit more work on the standard join types to have them
efficiently support IS NOT DISTINCT FROM clauses as the join keys.
There's a fair bit of work to do and it's likely not been done as
INTERSECT isn't used that commonly.

There was a bit of work done in PG17 to teach the query planner some
new tricks around UNION. I think UNION is a much more commonly used
setop than INTERSECT, so you might have to wait a while. For now, it's
best to adjust your query.

David

In response to

Why not do distinct before SetOp at 2024-11-04 09:52:12 from ma lz

Responses

Re: Why not do distinct before SetOp at 2024-11-04 15:18:04 from Tom Lane

Browse pgsql-general by date

	From	Date	Subject
Next Message	Paul Brindusa	2024-11-04 11:53:41	Re: pg_wal folder high disk usage
Previous Message	ma lz	2024-11-04 09:52:12	Why not do distinct before SetOp