From: | "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at> |
---|---|
To: | "Clemens Park *EXTERN*" <clemens(dot)park(at)gmail(dot)com>, <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Using window functions to get the unpaginated count for paginated queries |
Date: | 2012-11-14 08:42:29 |
Message-ID: | D960CB61B694CF459DCFB4B0128514C208AF0960@exadv11.host.magwien.gv.at |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Clemens Park wrote:
> Recently, during a performance improvement sweep for an application at
my company, one of the hotspots
> that was discovered was pagination.
>
> In order to display the correct pagination links on the page, the
pagination library we used (most
> pagination libraries for that matter) ran the query with OFFSET and
LIMIT to get the paginated
> results, and then re-ran the query without the OFFSET and LIMIT and
wrapped them in a SELECT COUNT(*)
> FROM main_query to get the total number of rows.
>
> In an attempt to optimize this, we used a window function as follows:
>
> Given a query that looked as follows:
>
> SELECT a,b,c
> FROM table
> WHERE clauses
> OFFSET x LIMIT y;
>
> add total_entries_count column as follows:
>
> SELECT COUNT(table.id) OVER () AS total_entries_count, a,b,c
> FROM table
> WHERE clauses
> OFFSET x LIMIT y;
>
> This calculates the total number of unpaginated rows correctly,
without affecting the runtime of the
> query. At least as far as I can tell.
It can affect the runtime considerably.
I created a 10^6 row test table and tried:
test=> EXPLAIN ANALYZE SELECT id, val FROM large OFFSET 100 LIMIT 10;
QUERY PLAN
------------------------------------------------------------------------
------------------------------------------
Limit (cost=1.49..1.64 rows=10 width=12) (actual time=0.177..0.195
rows=10 loops=1)
-> Seq Scan on large (cost=0.00..14902.00 rows=1000000 width=12)
(actual time=0.028..0.114 rows=110 loops=1)
Total runtime: 0.251 ms
(3 rows)
test=> EXPLAIN ANALYZE SELECT id, val, COUNT(*) OVER () AS
total_entries_count FROM large OFFSET 100 LIMIT 10;
QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------
Limit (cost=2.74..3.01 rows=10 width=12) (actual
time=1893.606..1893.625 rows=10 loops=1)
-> WindowAgg (cost=0.00..27402.00 rows=1000000 width=12) (actual
time=1893.435..1893.559 rows=110 loops=1)
-> Seq Scan on large (cost=0.00..14902.00 rows=1000000
width=12) (actual time=0.025..647.182 rows=1000000 loops=1)
Total runtime: 1915.255 ms
(4 rows)
That is because the second query will have to scan all rows, while the
first one can stop scanning after 110 rows.
> The questions I have are:
>
> 1) Are there any adverse effects that the above window function can
have?
I can only think of the performance degradation mentioned above.
> 2) Are there any cases where the count would return incorrectly?
No.
> 3) In general, is this an appropriate use-case for using window
functions?
I think it is.
Maybe you can do better if you don't retrieve the total count
of rows for every set of rows you select.
Yours,
Laurenz Albe
From | Date | Subject | |
---|---|---|---|
Next Message | Albe Laurenz | 2012-11-14 08:51:57 | Re: Understanding streaming replication |
Previous Message | Tianyin Xu | 2012-11-14 08:12:42 | How do query optimizers affect window functions |