Re: NULLS LAST performance

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Jim Nasby <jim(at)nasby(dot)net>, Mathieu De Zutter <mathieu(at)dezutter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: NULLS LAST performance
Date: 2011-03-11 04:36:27
Message-ID: AANLkTi=kCEbDLkF-iC6fDsD2v9k8jB52p97q39AUr8ev@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, Mar 10, 2011 at 11:32 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> On Thu, Mar 10, 2011 at 9:55 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Wed, Mar 9, 2011 at 6:01 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
>>> Unfortunately, I don't think the planner actually has that level of knowledge.
>>
>> Actually, I don't think it would be that hard to teach the planner
>> about that special case...
>>
>>> A more reasonable fix might be to teach the executor that it can do 2 scans of the index: one to get non-null data and a second to get null data. I don't know if the use case is prevalent enough to warrant the extra code though.
>>
>> That would probably be harder, but useful.  I thought about working on
>> it before but got sidetracked onto other things.
>
> ISTM this isn't all that different from the case of composite indexes
> where you are missing the left most term, or you have an index on
> a,b,c (which the server already handles) but user asks for a,b desc,
> c. If cardinality on b is low it might pay to loop and break up the
> scan.

Yeah, there are a couple of refinements possible here. One
possibility is that you might ask for ORDER BY a, b and the only
relevant index is on a. In that case, it'd be a good idea to consider
scanning the index and sorting each equal group on b. I've seen quite
a few queries that would benefit from this. A second possibility is
that you might ask for ORDER BY a, b and the only relevant index is on
a, b DESC. In that case, you could do three things:

- Scan the index and sort each group that's equal on a by b desc, just
as if the index were only on a.
- Scan the index and reverse each group.
- Scan the index in a funny order - for each value of a, find the
highest value of b and scan backwards until the a value changes; then
repeat for the next a-value.

And similarly with the case where you have ORDER BY a NULLS FIRST and
an index on a NULLS LAST, you could either:

- Detect when the column is NOT NULL and ignore the NULLS FIRST/LAST
property for purposes of matching the index in such cases, or
- Scan the index in a funny order - traverse the index to find the
first non-NULL entry at whichever end of the index has the nulls, go
from there to the end, and then "wrap around" to pick up the null
entries

The tricky part, at least IMO, is that you've got to not only teach
the planner to recognize these conditions when they occur, but also
find some way of passing it down to the index AM, which you also have
to modify to know how to do all this stuff. The worst part about
making modifications of this type is that it's really hard to unit
test them - the planner, executor, and index AM all have to cooperate
before you can get off the ground.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message hans wulf 2011-03-11 15:32:05 ANTI-JOIN needs table, index scan not possible?
Previous Message Dan Ancona 2011-03-11 00:43:09 Re: big joins not converging