Re: Is PostgreSQL 9.3 using indexes for pipelined top-N window function queries?

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Markus Winand <markus(dot)winand(at)winand(dot)at>
Cc: Behrang Saeedzadeh <behrangsa(at)gmail(dot)com>, PostgreSQL General <pgsql-general(at)postgresql(dot)org>, Thomas Kellerer <spam_eater(at)gmx(dot)net>, Merlin Moncure <mmoncure(at)gmail(dot)com>
Subject: Re: Is PostgreSQL 9.3 using indexes for pipelined top-N window function queries?
Date: 2014-03-14 13:18:27
Message-ID: CAFj8pRBdV70fCJG6mT1x-Awbc_KJKJQphCQMY1_OqRUBWLVFPg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

2014-03-14 13:02 GMT+01:00 Markus Winand <markus(dot)winand(at)winand(dot)at>:

> Hi!
>
> I'd like to clarify this as the original author of the page in question.
>
> Fist of all, I also recommend the row-value syntax as you can see on the
> "previous" page:
> http://use-the-index-luke.com/sql/partial-results/fetch-next-page
>
> I've also explained this procedure at conferences. Here are the slides:
>
> http://use-the-index-luke.com/blog/2013-07/pagination-done-the-postgresql-way
>
>
> And now about this quote:
> > However, only SQL Server and the Oracle database can use them for a
> pipelined top-N query. PostgreSQL does not use indexes for those queries
> and therefore executes them very inefficiently.
>
> The very important thing here is "pipelined top-N query". This term is
> introduced two pages earlier:
> http://use-the-index-luke.com/sql/partial-results/top-n-queries
>
> The pipelined top-n query has two very important properties:
> (1) it utilizes the index order to avoid the sort operation required to
> satisfy ORDER BY clause
> (2) it realizes that it can stop processing as soon as it has delivered
> enough rows.
>
> The execution plan from Thomas Kellerer sees to fulfill requirement (1)
> but definitively not (2).
>
> Even with 9.3.2, I were not able to reproduce the result of Thomas (not
> showing any sort operation in the execution plan) with the test data I also
> published at my website:
>
> http://use-the-index-luke.com/sql/example-schema/postgresql/partial-results
>
> Then I started fiddling around with the planner's cost settings and
> finally managed to get a plan similar to Thomas' when setting
> random_page_cost to 0.1 (setting it to 1, equal to seq_page_cost was not
> enough). However, that proves the point that PostgreSQL can use an index to
> avoid the sort operation caused by order by (even for window functions).
> I'd be curious what settings caused Thomas to get this result.
>
> The second requirement for pipelined top-n queries is not satisfied in
> Thomas' execution plan: it does read the full index (actual rows=1000000),
> and applies the window function over all rows. Only in the end it throws
> away all non-confirming rows (Rows Removed by Filter: 999899). A pipelined
> top-n execution would not cause more than 300 rows read from the index, and
> only 200 rows removed by filter. That's what the Oracle DB and SQL Server
> manage to do (Thomas: ping me to track down why Oracle didn't for you).
> Considering that PG can use the index order to avoid the sort, it still
> doesn't make very much sense if it cannot abort the index scan after
> fetching enough rows. So, not using the index might even be right choice in
> unfiltered cases like this.
>
> Interestingly, I did not even get an Index Only Scan when just selecting
> column from the index with random_page_cost=1 and seq_page_cost=1. Again I
> had to reduce random_page_cost further down (0.1) to get an Index Only
> Scan. That does not make sense to me. When both _page_costs are same, the
> Index Only Scan should get lower costs because the index is smaller (338mb
> vs. 31 mb in my case). On top of that, the Index Only Scan avoids the sort.
> However, with 9.3.2 I get the same cost for Index Only Scan as for Index
> Scan (had to enable_seqscan=off and enable_bitmapscan=off to get that).
>
> So, I have to change my page (&book) to say something like this:
>
> > PostgreSQL does not abort the index scan after fetching enough rows for
> those queries and therefore executes them very inefficiently.
>

I am thinking so LIMIT is not propagated to down - so window function
should be calculated in full range.

Regards

Pavel

>
>
> Thanks for the hint and always feel free to put my on CC regarding
> questions about stuff on Use The Index, Luke!
>
> -markus
>
> ps.: It's perfectly possible that PG could use indexes for
> window-functions before 9.3. I did definitively not fiddle around with cost
> settings at that time to force it into this plan.
> pps.: sorry for the delay, I'm not subscribed (just too much) but somebody
> was nice enough to ping me about this.
> ppps: then I wondered how to properly reply without having the original
> messages. So I downloaded the .mbox from the archive and pushed reply
> there. Hope it ends up in the right thread :)
>
> Markus Winand
> markus(dot)winand(at)winand(dot)at
> T +43 1 9444047
>
> > "A wonderful book…I highly recommend it." -Anders Janmyr
> > http://sql-performance-explained.com/
>
> Maderspergerstr. 1-3/9/11
> 1160 Wien
> AUSTRIA
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Thomas Kellerer 2014-03-14 13:26:53 Re: Is PostgreSQL 9.3 using indexes for pipelined top-N window function queries?
Previous Message Markus Winand 2014-03-14 12:02:47 Re: Is PostgreSQL 9.3 using indexes for pipelined top-N window function queries?