Re: Query planner skipping index depending on DISTINCT parameter order (2)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dilyan Palauzov <dilyan(dot)palauzov(at)aegee(dot)org>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: Query planner skipping index depending on DISTINCT parameter order (2)
Date: 2017-09-19 22:01:32
Message-ID: 580b8874-4655-1454-2459-c1cd3432c61b@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 09/19/2017 05:40 PM, Tom Lane wrote:
> Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
>> On 09/17/2017 07:15 PM, Tom Lane wrote:
>>> For the DISTINCT ON case, the user-visible semantics are actually pretty
>>> tightly tied to ORDER BY, so that it would not be very reasonable to
>>> consider any other orderings than the given column order anyway.
>
>> Tied in what sense? In the docs we explicitly say this:
>
>> https://www.postgresql.org/docs/10/static/sql-select.html#sql-distinct
>
>> SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of
>> each set of rows where the given expressions evaluate to equal. The
>> DISTINCT ON expressions are interpreted using the same rules as for
>> ORDER BY (see above). Note that the “first row” of each set is
>> unpredictable unless ORDER BY is used to ensure that the desired row
>> appears first.
>
> Right, so the behavior is undefined unless you have an ORDER BY
> clause that includes the DISTINCT ON columns plus some more columns.
> That's pretty tightly tied in my book.
>

Ah, OK. I thought you're suggesting we're required to produce the data
sorted by the DISTINCT ON columns. But you meant that ORDER BY clause is
required for well-defined result, which limits our options when picking
an index. Right?

>> So if an ORDER BY is not specified, why couldn't we pick an
>> arbitrary ordering matching based on available indexes?
>
> The case is not of any real-world use, and so I'm unwilling to
> expend the large amount of coding effort that would be needed to make
> the planner behave this way.
>

I don't think the "no real-world use" is so clear. Imagine for example a
denormalized table where the user knows that for a given ID, the other
columns match too. Then

SELECT DISTINCT ON (id), x, y, z FROM t;
SELECT DISTINCT ON (id), x, y, z FROM t ORDER BY id, x, y, z;
SELECT DISTINCT id, x, y, z, FROM t;

will all produce the same result, except that the first case only needs
compare values in "id". If sorts on "x", "y" and "z" are expensive (say,
because those are text columns with non-C collations), that may be quite
a difference.

That being said, I'm not sure how much code would this be, and how much
overhead would it mean.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message matioli.matheus 2017-09-20 09:26:09 BUG #14820: Standby crash with "could not access status of transaction" (for track_commit_timestamp)
Previous Message Takhir Fakhrutdinov 2017-09-19 20:13:33 Re: BUG #14819: postgres_fwd could not load library