Re: Using window functions to get the unpaginated count for paginated queries

From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Clemens Park *EXTERN*" <clemens(dot)park(at)gmail(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Using window functions to get the unpaginated count for paginated queries
Date: 2012-11-14 08:42:29
Message-ID: D960CB61B694CF459DCFB4B0128514C208AF0960@exadv11.host.magwien.gv.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Clemens Park wrote:
> Recently, during a performance improvement sweep for an application at
my company, one of the hotspots
> that was discovered was pagination.
>
> In order to display the correct pagination links on the page, the
pagination library we used (most
> pagination libraries for that matter) ran the query with OFFSET and
LIMIT to get the paginated
> results, and then re-ran the query without the OFFSET and LIMIT and
wrapped them in a SELECT COUNT(*)
> FROM main_query to get the total number of rows.
>
> In an attempt to optimize this, we used a window function as follows:
>
> Given a query that looked as follows:
>
> SELECT a,b,c
> FROM table
> WHERE clauses
> OFFSET x LIMIT y;
>
> add total_entries_count column as follows:
>
> SELECT COUNT(table.id) OVER () AS total_entries_count, a,b,c
> FROM table
> WHERE clauses
> OFFSET x LIMIT y;
>
> This calculates the total number of unpaginated rows correctly,
without affecting the runtime of the
> query. At least as far as I can tell.

It can affect the runtime considerably.

I created a 10^6 row test table and tried:

test=> EXPLAIN ANALYZE SELECT id, val FROM large OFFSET 100 LIMIT 10;
QUERY PLAN
------------------------------------------------------------------------
------------------------------------------
Limit (cost=1.49..1.64 rows=10 width=12) (actual time=0.177..0.195
rows=10 loops=1)
-> Seq Scan on large (cost=0.00..14902.00 rows=1000000 width=12)
(actual time=0.028..0.114 rows=110 loops=1)
Total runtime: 0.251 ms
(3 rows)

test=> EXPLAIN ANALYZE SELECT id, val, COUNT(*) OVER () AS
total_entries_count FROM large OFFSET 100 LIMIT 10;
QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------
Limit (cost=2.74..3.01 rows=10 width=12) (actual
time=1893.606..1893.625 rows=10 loops=1)
-> WindowAgg (cost=0.00..27402.00 rows=1000000 width=12) (actual
time=1893.435..1893.559 rows=110 loops=1)
-> Seq Scan on large (cost=0.00..14902.00 rows=1000000
width=12) (actual time=0.025..647.182 rows=1000000 loops=1)
Total runtime: 1915.255 ms
(4 rows)

That is because the second query will have to scan all rows, while the
first one can stop scanning after 110 rows.

> The questions I have are:
>
> 1) Are there any adverse effects that the above window function can
have?

I can only think of the performance degradation mentioned above.

> 2) Are there any cases where the count would return incorrectly?

No.

> 3) In general, is this an appropriate use-case for using window
functions?

I think it is.

Maybe you can do better if you don't retrieve the total count
of rows for every set of rows you select.

Yours,
Laurenz Albe

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Albe Laurenz 2012-11-14 08:51:57 Re: Understanding streaming replication
Previous Message Tianyin Xu 2012-11-14 08:12:42 How do query optimizers affect window functions