From: | Jie Li <jay23jack(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Why percent_rank is so slower than rank? |
Date: | 2010-12-09 07:26:09 |
Message-ID: | AANLkTi=Jxs_mG4kJuxEWLdKvwREHYBeSA40AYnXqYhtq@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi all,
I'm new to window functions. Recently I run some simple queries but
surprised to find percent_rank is so slower than rank, could anybody tell me
why?
The table schema:
test=# \d inventory1
Table "public.inventory1"
Column | Type | Modifiers
----------------------+---------+-----------
inv_date_sk | integer | not null
inv_item_sk | integer | not null
inv_warehouse_sk | integer | not null
inv_quantity_on_hand | integer |
test=# \dt+ inventory1
List of relations
Schema | Name | Type | Owner | Size | Description
--------+------------+-------+----------+---------+-------------
public | inventory1 | table | workshop | 8880 kB |
The rank query result:
test=# explain analyze select inv_date_sk,inv_item_sk, rank()over(partition
by inv_date_sk order by inv_item_sk) from inventory1;
QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------
WindowAgg (cost=19563.99..23343.99 rows=189000 width=8) (actual
time=631.947..1361.158 rows=189000 loops=1)
-> Sort (cost=19563.99..20036.49 rows=189000 width=8) (actual
time=631.924..771.990 rows=189000 loops=1)
Sort Key: inv_date_sk, inv_item_sk
Sort Method: quicksort Memory: 12218kB
-> Seq Scan on inventory1 (cost=0.00..3000.00 rows=189000
width=8) (actual time=0.055..198.948 rows=189000 loops=1)
Total runtime: 1500.193 ms
(6 rows)
The percent_rank result:
test=# explain analyze select inv_date_sk,inv_item_sk,
percent_rank()over(partition by inv_date_sk order by inv_item_sk) from
inventory1;
QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------
WindowAgg (cost=19563.99..23343.99 rows=189000 width=8) (actual
time=766.432..32924.804 rows=189000 loops=1)
-> Sort (cost=19563.99..20036.49 rows=189000 width=8) (actual
time=756.320..905.407 rows=189000 loops=1)
Sort Key: inv_date_sk, inv_item_sk
Sort Method: quicksort Memory: 12218kB
-> Seq Scan on inventory1 (cost=0.00..3000.00 rows=189000
width=8) (actual time=0.102..224.607 rows=189000 loops=1)
Total runtime: 33152.188 ms
(6 rows)
One special thing is that all the values of the partition key(inv_date_sk)
are the same, that is, there is only one window partition. I find that
percent_rank needs to buffer all the tuples to get the total number of rows.
But why is it so expensive?
I use 8.4.4. And I only increase the work_mem to 100M and leave other
parameters untouched.
Thanks,
Li Jie
From | Date | Subject | |
---|---|---|---|
Next Message | Simon Riggs | 2010-12-09 07:52:11 | Re: Hot Standby btree delete records and vacuum_defer_cleanup_age |
Previous Message | Jeff Janes | 2010-12-09 05:44:13 | Re: BufFreelistLock |