| From: | Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> | 
|---|---|
| To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> | 
| Cc: | Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: [PATCH] Incremental sort | 
| Date: | 2017-03-29 14:14:51 | 
| Message-ID: | CAPpHfdsnMTL4V57eutAt68ZeA7d4N-eWfthcXY=2PK0P5iU+kw@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
On Mon, Mar 20, 2017 at 5:19 PM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> On 03/20/2017 11:33 AM, Alexander Korotkov wrote:
>
>> Please, find rebased patch in the attachment.
>>
>
> I had a quick look at this.
>
> * I'd love to have an explanation of what an Incremental Sort is, in the
> file header comment for nodeIncrementalSort.c.
>
Done.
* I didn't understand the maxMem stuff in tuplesort.c. The comments there
> use the phrase "on-disk memory", which seems like an oxymoron. Also,
> "maximum status" seems weird, as it assumes that there's a natural order to
> the states.
>
Variables were renamed.
* In the below example, the incremental sort is significantly slower than
> the Seq Scan + Sort you get otherwise:
>
> create table foo (a int4, b int4, c int4);
> insert into sorttest select g, g, g from generate_series(1, 1000000) g;
> vacuum foo;
> create index i_sorttest on sorttest (a, b, c);
> set work_mem='100MB';
>
>
> postgres=# explain select count(*) from (select * from sorttest order by
> a, c) as t;
>                                               QUERY PLAN
> ------------------------------------------------------------
> -------------------------------------------
>  Aggregate  (cost=138655.68..138655.69 rows=1 width=8)
>    ->  Incremental Sort  (cost=610.99..124870.38 rows=1102824 width=12)
>          Sort Key: sorttest.a, sorttest.c
>          Presorted Key: sorttest.a
>          ->  Index Only Scan using i_sorttest on sorttest
> (cost=0.43..53578.79 rows=1102824 width=12)
> (5 rows)
>
> Time: 0.409 ms
> postgres=# select count(*) from (select * from sorttest order by a, c) as
> t;
>   count
> ---------
>  1000000
> (1 row)
>
> Time: 387.091 ms
>
>
> postgres=# explain select count(*) from (select * from sorttest order by
> a, c) as t;
>                                   QUERY PLAN
> ------------------------------------------------------------
> -------------------
>  Aggregate  (cost=130063.84..130063.85 rows=1 width=8)
>    ->  Sort  (cost=115063.84..117563.84 rows=1000000 width=12)
>          Sort Key: sorttest.a, sorttest.c
>          ->  Seq Scan on sorttest  (cost=0.00..15406.00 rows=1000000
> width=12)
> (4 rows)
>
> Time: 0.345 ms
> postgres=# select count(*) from (select * from sorttest order by a, c) as
> t;
>   count
> ---------
>  1000000
> (1 row)
>
> Time: 231.668 ms
>
> According to 'perf', 85% of the CPU time is spent in ExecCopySlot(). To
> alleviate that, it might be worthwhile to add a special case for when the
> group contains exactly one group, and not put the tuple to the tuplesort in
> that case.
I'm not sure we should do such optimization for one tuple per group, since
it's similar situation with 2 or 3 tuples per group.
> Or if we cannot ensure that the Incremental Sort is actually faster, the
> cost model should probably be smarter, to avoid picking an incremental sort
> when it's not a win.
I added to cost_sort() extra costing for incremental sort: cost of extra
tuple copying and comparing as well as cost of tuplesort reset.
The only problem is that I made following estimate for tuplesort reset:
run_cost += 10.0 * cpu_tuple_cost * num_groups;
It makes ordinal sort to be selected in your example, but it contains
constant 10 which is quite arbitrary.  It would be nice to evade such hard
coded constants, but I don't know how could we calculate such cost
realistically.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
| Attachment | Content-Type | Size | 
|---|---|---|
| incremental-sort-4.patch | application/octet-stream | 112.5 KB | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Masahiko Sawada | 2017-03-29 14:14:53 | Re: Transactions involving multiple postgres foreign servers | 
| Previous Message | Peter Eisentraut | 2017-03-29 13:38:12 | Re: [PATCH] Reduce src/test/recovery verbosity |