Re: [HACKERS] [PATCH] Incremental sort

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] [PATCH] Incremental sort
Date: 2018-03-16 02:12:59
Message-ID: c590d275-a847-167c-015e-3f9c5f37314f@2ndquadrant.com
Lists: pgsql-hackers

On 03/10/2018 06:05 PM, Alexander Korotkov wrote:
> On Sat, Mar 10, 2018 at 6:42 PM, Alexander Korotkov
> <a(dot)korotkov(at)postgrespro(dot)ru <mailto:a(dot)korotkov(at)postgrespro(dot)ru>> wrote:
>
> ...
>
> After some investigation of benchmark results, I found 2 sources of
> regressions of incremental sort.
>
> *Case 1: Underlying node scan loss is bigger than incremental sort win*
>
> ===== 33 [Wed Mar  7 10:14:14 CET 2018] scale:10000000 groups:10
> work_mem:64MB incremental:on max_workers:0 =====
> SELECT * FROM s_1 ORDER BY a, b
>                                                                    QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=1588080.84..1588080.84 rows=1 width=20) (actual time=5874.527..5874.527 rows=0 loops=1)
>    ->  Incremental Sort  (cost=119371.51..1488081.45 rows=9999939 width=20) (actual time=202.842..5653.224 rows=10000000 loops=1)
>          Sort Key: s_1.a, s_1.b
>          Presorted Key: s_1.a
>          Sort Method: external merge  Disk: 29408kB
>          Sort Groups: 11
>          ->  Index Scan using s_1_a_idx on s_1  (cost=0.43..323385.52 rows=9999939 width=20) (actual time=0.051..1494.105 rows=10000000 loops=1)
>  Planning time: 0.269 ms
>  Execution time: 5877.367 ms
> (9 rows)
>
> ===== 37 [Wed Mar  7 10:15:51 CET 2018] scale:10000000 groups:10
> work_mem:64MB incremental:off max_workers:0 =====
> SELECT * FROM s_1 ORDER BY a, b
>                                                           QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=1656439.93..1656439.93 rows=1 width=20) (actual time=4741.716..4741.716 rows=0 loops=1)
>    ->  Sort  (cost=1531440.69..1556440.54 rows=9999939 width=20) (actual time=3522.156..4519.278 rows=10000000 loops=1)
>          Sort Key: s_1.a, s_1.b
>          Sort Method: external merge  Disk: 293648kB
>          ->  Seq Scan on s_1  (cost=0.00..163694.39 rows=9999939 width=20) (actual time=0.021..650.322 rows=10000000 loops=1)
>  Planning time: 0.249 ms
>  Execution time: 4777.088 ms
> (7 rows)
>
> In this case the optimizer decided that "Index Scan + Incremental
> Sort" would be cheaper than "Seq Scan + Sort". But it appears that
> the amount of time we lose by selecting Index Scan over Seq Scan is
> bigger than the amount of time we win by selecting Incremental Sort
> over Sort. I would note that the regular Sort consumes about 10X
> more disk space. I bet all of that space fit into the OS cache of
> the test machine, while the optimizer expected actual I/O to take
> place. That is why the actual times don't match the costing.
>

Yes, you're right the temporary file(s) likely fit into RAM in this test
(and even if they did not, the storage system is pretty good).
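For readers following along, the ~10x smaller spill of the Incremental Sort comes from it buffering (and possibly spilling) only one prefix group at a time, rather than the whole input. A minimal Python sketch of the idea (not the actual tuplesort-based implementation; the function name and shape are mine):

```python
# Sketch: when input is presorted on a prefix of the sort key, sort the
# remaining columns one prefix-group at a time. Only the current group is
# ever held in memory, unlike a full Sort which buffers all rows.
from itertools import groupby
from operator import itemgetter

def incremental_sort(rows, presorted_prefix_len, full_key_len):
    """rows must already be sorted on the first presorted_prefix_len columns."""
    prefix = itemgetter(*range(presorted_prefix_len))
    suffix = itemgetter(*range(presorted_prefix_len, full_key_len))
    for _, group in groupby(rows, key=prefix):
        # Only this group is buffered and sorted on the remaining columns.
        yield from sorted(group, key=suffix)

rows = [(1, 3), (1, 1), (2, 2), (2, 0)]    # presorted on column 0 ("a")
print(list(incremental_sort(rows, 1, 2)))  # [(1, 1), (1, 3), (2, 0), (2, 2)]
```

With 11 groups over 10M rows, each group is roughly a tenth of the input, which matches the spill sizes in the two plans above.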

> *Case 2: Underlying node is not parallelized*
>
> ===== 178 [Wed Mar  7 11:18:53 CET 2018] scale:10000000 groups:100
> work_mem:8MB incremental:on max_workers:2 =====
> SELECT * FROM s_2 ORDER BY a, b, c
>                                                                     QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=1179047.88..1179047.88 rows=1 width=20) (actual time=4819.999..4819.999 rows=0 loops=1)
>    ->  Incremental Sort  (cost=89.04..1079047.34 rows=10000054 width=20) (actual time=0.203..4603.197 rows=10000000 loops=1)
>          Sort Key: s_2.a, s_2.b, s_2.c
>          Presorted Key: s_2.a, s_2.b
>          Sort Method: quicksort  Memory: 135kB
>          Sort Groups: 10201
>          ->  Index Scan using s_2_a_b_idx on s_2  (cost=0.43..406985.62 rows=10000054 width=20) (actual time=0.052..1461.177 rows=10000000 loops=1)
>  Planning time: 0.313 ms
>  Execution time: 4820.037 ms
> (9 rows)
>
> ===== 182 [Wed Mar  7 11:20:11 CET 2018] scale:10000000 groups:100
> work_mem:8MB incremental:off max_workers:2 =====
> SELECT * FROM s_2 ORDER BY a, b, c
>                                                                  QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=1705580.76..1705580.76 rows=1 width=20) (actual time=3985.818..3985.818 rows=0 loops=1)
>    ->  Gather Merge  (cost=649951.66..1622246.98 rows=8333378 width=20) (actual time=1782.354..3750.868 rows=10000000 loops=1)
>          Workers Planned: 2
>          Workers Launched: 2
>          ->  Sort  (cost=648951.64..659368.36 rows=4166689 width=20) (actual time=1778.362..2091.253 rows=3333333 loops=3)
>                Sort Key: s_2.a, s_2.b, s_2.c
>                Sort Method: external merge  Disk: 99136kB
>                Worker 0:  Sort Method: external merge  Disk: 96984kB
>                Worker 1:  Sort Method: external merge  Disk: 97496kB
>                ->  Parallel Seq Scan on s_2  (cost=0.00..105361.89 rows=4166689 width=20) (actual time=0.022..233.640 rows=3333333 loops=3)
>  Planning time: 0.265 ms
>  Execution time: 4007.591 ms
> (12 rows)
>
> The situation is similar to case #1, except that in the "Seq Scan +
> Sort" pair the Sort also gets parallelized. As in the previous case,
> disk writes/reads during the external sort are overestimated,
> because they actually hit the OS cache. I would also say that this
> is not necessarily a wrong decision by the optimizer, because doing
> the work in a single backend may consume fewer resources despite
> being slower overall.
>

Yes, that seems like a likely explanation too.
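The shape of the winning parallel plan is worth spelling out: each worker sorts its slice of the table independently, and Gather Merge only has to merge the already-sorted streams. A simplified Python sketch of that structure (the function name and the sequential stand-in for parallel workers are my assumptions, not the executor code):

```python
# Sketch of "Parallel Seq Scan -> Sort (per worker) -> Gather Merge":
# each slice is sorted independently (in PostgreSQL, by a worker process),
# then the leader streams a k-way merge of the sorted runs.
import heapq

def gather_merge(slices, key):
    # In the real plan these sorts run concurrently in worker backends.
    sorted_runs = [sorted(s, key=key) for s in slices]
    return list(heapq.merge(*sorted_runs, key=key))

slices = [[(2, 1), (1, 2)], [(1, 1), (3, 0)], [(2, 0)]]
merged = gather_merge(slices, key=lambda r: (r[0], r[1]))
print(merged)  # [(1, 1), (1, 2), (2, 0), (2, 1), (3, 0)]
```

The merge step is cheap relative to the sorts, which is why splitting the sort across workers pays off even though the total work (and total spill) is similar.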

I agree those don't seem like issues in the Incremental Sort patch
itself, but rather a more generic costing problem.
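To make the costing gap concrete: the model charges external-sort I/O as if every temp-file page hit disk, but when the files fit in the OS cache the effective per-page cost is far lower. A toy back-of-the-envelope (all cost constants here are hypothetical; only the spill sizes are taken from the Case 1 plans):

```python
# Toy illustration (made-up constants) of why a cached external sort can
# invert the planner's choice: the estimated I/O penalty dominates, but
# the actual (cached) I/O penalty is small.
def sort_cost(pages_spilled, io_cost_per_page, cpu_cost):
    return pages_spilled * io_cost_per_page + cpu_cost

PLANNER_IO = 1.0   # hypothetical cost the model assumes per temp page
CACHED_IO  = 0.05  # hypothetical effective cost when pages stay in OS cache

# Regular Sort spilled 293648kB, Incremental Sort 29408kB (8kB pages).
# With the planner's I/O assumption, the 10x spill makes regular Sort lose:
est_sort = sort_cost(36706, PLANNER_IO, 10000)  # cpu_cost values are made up
est_incr = sort_cost(3676,  PLANNER_IO, 12000)
# With a warm cache the I/O gap nearly vanishes, and the cheaper Seq Scan
# underneath the regular Sort wins in wall-clock time:
act_sort = sort_cost(36706, CACHED_IO, 10000)
act_incr = sort_cost(3676,  CACHED_IO, 12000)
assert est_sort > est_incr and act_sort < act_incr
```

This is exactly the kind of mismatch that would affect any plan comparison involving external sorts, not just this patch.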

Thanks for looking into the benchmark results.

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
