Re: Is PostgreSQL 9.3 using indexes for pipelined top-N window function queries?

From: Thomas Kellerer <spam_eater(at)gmx(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Is PostgreSQL 9.3 using indexes for pipelined top-N window function queries?
Date: 2014-03-14 13:26:53
Message-ID: lfv02f$kg0$1@ger.gmane.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Markus,

thanks for your reply.

> The pipelined top-n query has two very important properties:
> (1) it utilizes the index order to avoid the sort operation required to satisfy ORDER BY clause
> (2) it realizes that it can stop processing as soon as it has delivered enough rows.
>
> The execution plan from Thomas Kellerer sees to fulfill requirement
> (1) but definitively not (2).
>
> Even with 9.3.2, I were not able to reproduce the result of Thomas
> (not showing any sort operation in the execution plan) with the test
> data I also published at my website:
> http://use-the-index-luke.com/sql/example-schema/postgresql/partial-results
>
> Then I started fiddling around with the planner's cost settings and
> finally managed to get a plan similar to Thomas' when setting
> random_page_cost to 0.1 (setting it to 1, equal to seq_page_cost was
> not enough). However, that proves the point that PostgreSQL can use
> an index to avoid the sort operation caused by order by (even for
> window functions). I'd be curious what settings caused Thomas to get
> this result.

Good point, I should adopt the habit to mention config settings for this kind of things:

shared_buffers = 2048MB
temp_buffers = 16MB
work_mem = 32MB
seq_page_cost = 1.0
random_page_cost = 1.5
cpu_tuple_cost = 0.001
cpu_index_tuple_cost = 0.001
effective_cache_size = 2048MB

> The second requirement for pipelined top-n queries is not satisfied
> in Thomas' execution plan: it does read the full index (actual
> rows=1000000), and applies the window function over all rows. Only in
> the end it throws away all non-confirming rows (Rows Removed by
> Filter: 999899). A pipelined top-n execution would not cause more
> than 300 rows read from the index, and only 200 rows removed by
> filter. That's what the Oracle DB and SQL Server manage to do
> (Thomas: ping me to track down why Oracle didn't for you).
> Considering that PG can use the index order to avoid the sort, it
> still doesn't make very much sense if it cannot abort the index scan
> after fetching enough rows. So, not using the index might even be
> right choice in unfiltered cases like this.

I probably should have read the definition of "index usage" more carefully, thanks for the clarification.

I created the testdata from your webpage locally and then ran the window function statement from
http://use-the-index-luke.com/sql/example-schema/postgresql/partial-results

SELECT *
FROM ( SELECT sales.*
, ROW_NUMBER() OVER (ORDER BY sale_date DESC
, sale_id DESC) rn
FROM sales
) tmp
WHERE rn between 11 and 20
ORDER BY sale_date DESC, sale_id DESC;

This does an "Index Scan Backward" on an index defined as (sale_date, sale_id) but takes nearly 2.5 seconds on my computer.

The version using OFFSET .. LIMIT took about 0.05 seconds and increases when the offset increases which is to be expected - whereas the window function version is pretty much constant even with a startpoint of 100001 - but the offset version is still *much* faster then.

Regards
Thomas

P.S.: Btw: thanks for your book and your webpage, both are very insipring.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Jakub Can 2014-03-14 13:56:38 pg_dump fails pg_rewrite entry not found
Previous Message Pavel Stehule 2014-03-14 13:18:27 Re: Is PostgreSQL 9.3 using indexes for pipelined top-N window function queries?