Re: Use LIMIT instead of Unique for DISTINCT when all distinct pathkeys are redundant

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Use LIMIT instead of Unique for DISTINCT when all distinct pathkeys are redundant
Date: 2022-10-12 20:41:11
Message-ID: CAApHDvpZRWiWE8N8BW+cYndYuWTZ15wrg941mRYNnSkMBWBu7Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 13 Oct 2022 at 00:30, Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> I also have concerns about the 2 Limit nodes pointed by the comment
> inside the patch. Maybe we can check with limit_needed() and manually
> add the limit node only if there is no LIMIT clause in the origin query?

I wasn't hugely concerned about this. I think we're a little limited
to what we can actually do about it too.

It seems easy enough to skip adding the LimitPath in
create_final_distinct_paths() if the existing query already has
exactly LIMIT 1. However, if they've written LIMIT 10 or LIMIT
random()*1234 then we must add the LimitPath to ensure we only get 1
row rather than 10 or some random number.

As for getting rid of the LIMIT 10 / LIMIT random()*1234, we store the
LIMIT clause information in the parse and currently that's what the
planner uses when creating the LimitPath for the LIMIT clause. I'd
quite like to avoid making any adjustments to the parse fields here.
(There's a general project desire to move away from the planner
modifying the parse. If we didn't do that we could do things like
re-plan queries with stronger optimization levels when they come out
too costly.)

We could do something like set some bool flag in PlannerInfo to tell
the planner not to bother adding the final LimitPath as we've already
added another which does the job, but is it really worth adding that
complexity for this patch? You already mentioned that "this patch is
very straightforward". I don't think it would be if we added code to
avoid the LimitPath duplication.

David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2022-10-12 20:50:30 Re: Use LIMIT instead of Unique for DISTINCT when all distinct pathkeys are redundant
Previous Message Matthias van de Meent 2022-10-12 20:05:30 Re: problems with making relfilenodes 56-bits