From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] [PATCH] Incremental sort
Date: 2018-03-16 02:12:59
Message-ID: c590d275-a847-167c-015e-3f9c5f37314f@2ndquadrant.com
Lists: pgsql-hackers
On 03/10/2018 06:05 PM, Alexander Korotkov wrote:
> On Sat, Mar 10, 2018 at 6:42 PM, Alexander Korotkov
> <a(dot)korotkov(at)postgrespro(dot)ru> wrote:
>
> ...
>
> After some investigation of benchmark results, I found 2 sources of
> regressions of incremental sort.
>
> *Case 1: The underlying node's scan loss is bigger than the incremental sort win*
>
> ===== 33 [Wed Mar 7 10:14:14 CET 2018] scale:10000000 groups:10
> work_mem:64MB incremental:on max_workers:0 =====
> SELECT * FROM s_1 ORDER BY a, b
>                                                                    QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=1588080.84..1588080.84 rows=1 width=20) (actual time=5874.527..5874.527 rows=0 loops=1)
>    ->  Incremental Sort  (cost=119371.51..1488081.45 rows=9999939 width=20) (actual time=202.842..5653.224 rows=10000000 loops=1)
>          Sort Key: s_1.a, s_1.b
>          Presorted Key: s_1.a
>          Sort Method: external merge  Disk: 29408kB
>          Sort Groups: 11
>          ->  Index Scan using s_1_a_idx on s_1  (cost=0.43..323385.52 rows=9999939 width=20) (actual time=0.051..1494.105 rows=10000000 loops=1)
>  Planning time: 0.269 ms
>  Execution time: 5877.367 ms
> (9 rows)
>
> ===== 37 [Wed Mar 7 10:15:51 CET 2018] scale:10000000 groups:10
> work_mem:64MB incremental:off max_workers:0 =====
> SELECT * FROM s_1 ORDER BY a, b
>                                                            QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=1656439.93..1656439.93 rows=1 width=20) (actual time=4741.716..4741.716 rows=0 loops=1)
>    ->  Sort  (cost=1531440.69..1556440.54 rows=9999939 width=20) (actual time=3522.156..4519.278 rows=10000000 loops=1)
>          Sort Key: s_1.a, s_1.b
>          Sort Method: external merge  Disk: 293648kB
>          ->  Seq Scan on s_1  (cost=0.00..163694.39 rows=9999939 width=20) (actual time=0.021..650.322 rows=10000000 loops=1)
>  Planning time: 0.249 ms
>  Execution time: 4777.088 ms
> (7 rows)
>
> In this case the optimizer decided that "Index Scan + Incremental
> Sort" would be cheaper than "Seq Scan + Sort". But it appears that
> the amount of time we lose by selecting Index Scan over Seq Scan is
> bigger than the amount of time we win by selecting Incremental Sort
> over Sort. I would note that the regular Sort consumes about 10X
> more disk space. I bet all of that space fit into the OS cache of
> the test machine, while the optimizer expected actual IO to take
> place. That made the actual time diverge from the costing.
>
Yes, you're right that the temporary file(s) likely fit into RAM in this
test (and even if they did not, the storage system is pretty good).
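The 10X difference in temp-file size falls out of how incremental sort works: the input already arrives ordered by the presorted key(s), so only one group at a time has to be sorted (and possibly spilled), rather than the whole input. A minimal Python sketch of the idea (function and variable names are mine, not the patch's):

```python
from itertools import groupby
from operator import itemgetter

def incremental_sort(rows, presorted_keys, full_keys):
    """Sort rows already ordered by presorted_keys into full_keys order.

    Rows arrive grouped by the presorted prefix, so each group can be
    sorted independently -- peak memory is one group, not the input.
    """
    prefix = itemgetter(*presorted_keys)
    # Sort each group only by the keys not covered by the prefix.
    rest = itemgetter(*[k for k in full_keys if k not in presorted_keys])
    for _, group in groupby(rows, key=prefix):
        yield from sorted(group, key=rest)

# Input is presorted on column 0 ("a"); we want ORDER BY a, b.
rows = [(1, 3), (1, 1), (1, 2), (2, 9), (2, 0)]
print(list(incremental_sort(rows, [0], [0, 1])))
# [(1, 1), (1, 2), (1, 3), (2, 0), (2, 9)]
```

With 11 groups, each spill covers roughly 1/11th of the data, which matches the 29408kB vs 293648kB figures above.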
> *Case 2: The underlying node is not parallelized*
>
> ===== 178 [Wed Mar 7 11:18:53 CET 2018] scale:10000000 groups:100
> work_mem:8MB incremental:on max_workers:2 =====
> SELECT * FROM s_2 ORDER BY a, b, c
>                                                                      QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=1179047.88..1179047.88 rows=1 width=20) (actual time=4819.999..4819.999 rows=0 loops=1)
>    ->  Incremental Sort  (cost=89.04..1079047.34 rows=10000054 width=20) (actual time=0.203..4603.197 rows=10000000 loops=1)
>          Sort Key: s_2.a, s_2.b, s_2.c
>          Presorted Key: s_2.a, s_2.b
>          Sort Method: quicksort  Memory: 135kB
>          Sort Groups: 10201
>          ->  Index Scan using s_2_a_b_idx on s_2  (cost=0.43..406985.62 rows=10000054 width=20) (actual time=0.052..1461.177 rows=10000000 loops=1)
>  Planning time: 0.313 ms
>  Execution time: 4820.037 ms
> (9 rows)
>
> ===== 182 [Wed Mar 7 11:20:11 CET 2018] scale:10000000 groups:100
> work_mem:8MB incremental:off max_workers:2 =====
> SELECT * FROM s_2 ORDER BY a, b, c
>                                                                  QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=1705580.76..1705580.76 rows=1 width=20) (actual time=3985.818..3985.818 rows=0 loops=1)
>    ->  Gather Merge  (cost=649951.66..1622246.98 rows=8333378 width=20) (actual time=1782.354..3750.868 rows=10000000 loops=1)
>          Workers Planned: 2
>          Workers Launched: 2
>          ->  Sort  (cost=648951.64..659368.36 rows=4166689 width=20) (actual time=1778.362..2091.253 rows=3333333 loops=3)
>                Sort Key: s_2.a, s_2.b, s_2.c
>                Sort Method: external merge  Disk: 99136kB
>                Worker 0:  Sort Method: external merge  Disk: 96984kB
>                Worker 1:  Sort Method: external merge  Disk: 97496kB
>                ->  Parallel Seq Scan on s_2  (cost=0.00..105361.89 rows=4166689 width=20) (actual time=0.022..233.640 rows=3333333 loops=3)
>  Planning time: 0.265 ms
>  Execution time: 4007.591 ms
> (12 rows)
>
> The situation is similar to case #1, except that in the "Seq Scan +
> Sort" pair the Sort also gets parallelized. As in the previous
> case, disk writes/reads during the external sort are overestimated,
> because they actually hit the OS cache. I would also say that the
> optimizer's decision is not necessarily wrong, because doing this
> work in a single backend may consume fewer resources despite being
> slower overall.
>
Yes, that seems like a likely explanation too.
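For reference, the Gather Merge plan above is essentially a two-phase sort-merge: each worker sorts its own partition, and the leader merges the presorted streams. A rough Python analogue (the fixed chunking and thread pool are illustrative only; PostgreSQL hands out heap pages to workers dynamically):

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def gather_merge_sort(rows, workers=2):
    """Sort each partition in a worker, then merge the sorted streams.

    Mimics "Parallel Seq Scan -> Sort -> Gather Merge" in shape; this
    is a sketch of the pattern, not of PostgreSQL's implementation.
    """
    chunk = max(1, len(rows) // workers)
    parts = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_parts = list(pool.map(sorted, parts))
    # heapq.merge plays the role of Gather Merge: it consumes the
    # presorted streams and emits rows in global order.
    return list(heapq.merge(*sorted_parts))

print(gather_merge_sort([5, 1, 4, 2, 3], workers=2))
# [1, 2, 3, 4, 5]
```

The sort phase splits across workers, but the final merge stays serial in the leader, which is why Gather Merge still pays a per-row merge cost.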
I agree those don't seem like issues in the Incremental Sort patch,
but rather like more generic costing problems.
Thanks for looking into the benchmark results.
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services