From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Andreas Karlsson <andreas(at)proxel(dot)se> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org> |
Subject: | Re: PoC: Partial sort |
Date: | 2013-12-28 08:28:04 |
Message-ID: | CAPpHfdvHBXX5CVsbt8a8ao3RomHe+WZ3Ph06_gHSr93pQo1aXw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Dec 24, 2013 at 6:02 AM, Andreas Karlsson <andreas(at)proxel(dot)se> wrote:
> On 12/22/2013 04:38 PM, Alexander Korotkov wrote:
>
>> postgres=# explain analyze select * from test order by v1, id limit 10;
>> QUERY
>> PLAN
>> ------------------------------------------------------------
>> ------------------------------------------------------------
>> -----------------------
>> Limit (cost=11441.77..11442.18 rows=10 width=12) (actual
>> time=79.980..79.982 rows=10 loops=1)
>> -> Partial sort (cost=11441.77..53140.44 rows=1000000 width=12)
>> (actual time=79.978..79.978 rows=10 loops=1)
>> Sort Key: v1, id
>> Presorted Key: v1
>> Sort Method: top-N heapsort Memory: 25kB
>> -> Index Scan using test_v1_idx on test (cost=0.42..47038.83
>> rows=1000000 width=12) (actual time=0.031..38.275 rows=100213 loops=1)
>> Total runtime: 81.786 ms
>> (7 rows)
>>
>
> Have you thought about how do you plan to print which sort method and how
> much memory was used? Several different sort methods may have been use in
> the query. Should the largest amount of memory/disk be printed?
Apparently, now amount of memory for sorted last group is printed. Your
proposal makes sense: largest amount of memory/disk should be printed.
> However, work with joins needs more improvements.
>>
>
> That would be really nice to have, but the patch seems useful even without
> the improvements to joins.
Attached revision of patch implements partial sort usage in merge joins.
create table test1 as (
select id,
(random()*100)::int as v1,
(random()*10000)::int as v2
from generate_series(1,1000000) id);
create table test2 as (
select id,
(random()*100)::int as v1,
(random()*10000)::int as v2
from generate_series(1,1000000) id);
create index test1_v1_idx on test1 (v1);
create index test2_v1_idx on test2 (v1);
create index test1_v1_idx on test1 (v1);
create index test2_v1_idx on test2 (v1);
# explain select * from test1 t1 join test2 t2 on t1.v1 = t2.v1 and t1.v2 =
t2.v2;
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Merge Join (cost=2257.67..255273.39 rows=983360 width=24)
Merge Cond: ((t1.v1 = t2.v1) AND (t1.v2 = t2.v2))
-> Partial sort (cost=1128.84..116470.79 rows=1000000 width=12)
Sort Key: t1.v1, t1.v2
Presorted Key: t1.v1
-> Index Scan using test1_v1_idx on test1 t1
(cost=0.42..47604.01 rows=1000000 width=12)
-> Materialize (cost=1128.83..118969.00 rows=1000000 width=12)
-> Partial sort (cost=1128.83..116469.00 rows=1000000 width=12)
Sort Key: t2.v1, t2.v2
Presorted Key: t2.v1
-> Index Scan using test2_v1_idx on test2 t2
(cost=0.42..47602.22 rows=1000000 width=12)
I believe now patch covers desired functionality. I'm going to focus on
nailing down details, refactoring and documenting.
------
With best regards,
Alexander Korotkov.
Attachment | Content-Type | Size |
---|---|---|
partial-sort-3.patch | application/octet-stream | 65.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | David Rowley | 2013-12-28 09:04:52 | Re: PoC: Partial sort |
Previous Message | MauMau | 2013-12-28 07:37:44 | Re: [bug fix] connection service file doesn't take effect with ECPG apps |