From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Martijn van Oosterhout <kleptog(at)svana(dot)org> |
Subject: | Re: PoC: Partial sort |
Date: | 2013-12-22 15:38:05 |
Message-ID: | CAPpHfdtqfqm+WR7cQWK-f0+qD_F4VNuvPuyvjmR7xCxQODAXrw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi!
Next revision. It expected to do better work with optimizer. It introduces
presorted_keys argument of cost_sort function which represent number of
keys already sorted in Path. Then this function uses estimate_num_groups to
estimate number of groups with different values of presorted keys and
assumes that dataset is uniformly divided by
groups. get_cheapest_fractional_path_for_pathkeys tries to select the path
matching most part of path keys.
You can see it's working pretty good on single table queries.
create table test as (select id, (random()*5)::int as v1,
(random()*1000)::int as v2 from generate_series(1,1000000) id);
create index test_v1_idx on test (v1);
create index test_v1_v2_idx on test (v1, v2);
create index test_v2_idx on test (v2);
vacuum analyze;
postgres=# explain analyze select * from test order by v1, id;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Sort (cost=149244.84..151744.84 rows=1000000 width=12) (actual
time=2111.476..2586.493 rows=1000000 loops=1)
Sort Key: v1, id
Sort Method: external merge Disk: 21512kB
-> Seq Scan on test (cost=0.00..15406.00 rows=1000000 width=12)
(actual time=0.012..113.815 rows=1000000 loops=1)
Total runtime: 2683.011 ms
(5 rows)
postgres=# explain analyze select * from test order by v1, id limit 10;
QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=11441.77..11442.18 rows=10 width=12) (actual
time=79.980..79.982 rows=10 loops=1)
-> Partial sort (cost=11441.77..53140.44 rows=1000000 width=12)
(actual time=79.978..79.978 rows=10 loops=1)
Sort Key: v1, id
Presorted Key: v1
Sort Method: top-N heapsort Memory: 25kB
-> Index Scan using test_v1_idx on test (cost=0.42..47038.83
rows=1000000 width=12) (actual time=0.031..38.275 rows=100213 loops=1)
Total runtime: 81.786 ms
(7 rows)
postgres=# explain analyze select * from test order by v1, v2 limit 10;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..0.90 rows=10 width=12) (actual time=0.031..0.047
rows=10 loops=1)
-> Index Scan using test_v1_v2_idx on test (cost=0.42..47286.28
rows=1000000 width=12) (actual time=0.029..0.043 rows=10 loops=1)
Total runtime: 0.083 ms
(3 rows)
postgres=# explain analyze select * from test order by v2, id;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Partial sort (cost=97.75..99925.50 rows=1000000 width=12) (actual
time=1.069..1299.481 rows=1000000 loops=1)
Sort Key: v2, id
Presorted Key: v2
Sort Method: quicksort Memory: 52kB
-> Index Scan using test_v2_idx on test (cost=0.42..47603.79
rows=1000000 width=12) (actual time=0.030..812.083 rows=1000000 loops=1)
Total runtime: 1393.850 ms
(6 rows)
However, work with joins needs more improvements.
------
With best regards,
Alexander Korotkov.
Attachment | Content-Type | Size |
---|---|---|
partial-sort-2.patch | application/octet-stream | 50.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Martijn van Oosterhout | 2013-12-22 16:12:56 | Re: PoC: Partial sort |
Previous Message | Amit Kapila | 2013-12-22 15:02:58 | Re: ALTER SYSTEM SET command to change postgresql.conf parameters |