| From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> | 
|---|---|
| To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> | 
| Cc: | Hackety Man <hacketyman(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pgsql Performance <pgsql-performance(at)lists(dot)postgresql(dot)org> | 
| Subject: | Re: Unexplainable execution time difference between two test functions...one using IF (SELECT COUNT(*) FROM...) and the other using IF EXISTS (SELECT 1 FROM...) | 
| Date: | 2018-04-17 16:13:48 | 
| Message-ID: | CAFj8pRAOCactafq3yTeJEfKOdxBd1yjYZK+T6iEGBaDbk1hMdQ@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-performance | 
2018-04-17 12:52 GMT+02:00 Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>:
>
>
> On 04/17/2018 07:17 AM, Pavel Stehule wrote:
>
>> Hi
>>
>> 2018-04-16 22:42 GMT+02:00 Hackety Man <hacketyman(at)gmail(dot)com <mailto:
>> hacketyman(at)gmail(dot)com>>:
>>
>> ...
>>
> >
>
>> A support of parallel query execution is not complete -  it doesn't work
>> in PostgreSQL 11 too. So although EXISTS variant can be faster (but can be
>> - the worst case of EXISTS is same like COUNT), then due disabled parallel
>> execution the COUNT(*) is faster now. It is unfortunate, because I believe
>> so this issue will be fixed in few years.
>>
>>
> None of the issues seems to be particularly related to parallel query.
> It's much more likely a general issue with planning EXISTS / LIMIT and
> non-uniform data distribution.
I was wrong EXISTS are not supported. It looks like new dimension of
performance issues related to parallelism. I understand so this example is
worst case.
postgres=# EXPLAIN (ANALYZE, BUFFERS) select exists(SELECT * FROM zz_noidx1
WHERE LOWER(text_distinct) = LOWER('Test5000001'));
                                                      QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------
 Result  (cost=4.08..4.09 rows=1 width=1) (actual time=423.600..423.600
rows=1 loops=1)
   Buffers: shared hit=3296 read=2110
   InitPlan 1 (returns $0)
     ->  Seq Scan on zz_noidx1  (cost=0.00..20406.00 rows=5000 width=0)
(actual time=423.595..423.595 rows=0 loops=1)
           Filter: (lower(text_distinct) = 'test5000001'::text)
           Rows Removed by Filter: 1000000
           Buffers: shared hit=3296 read=2110
 Planning Time: 0.133 ms
 Execution Time: 423.633 ms
postgres=# EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM zz_noidx1 WHERE
LOWER(text_distinct) = LOWER('Test5000001');
                                                               QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=12661.42..12661.43 rows=1 width=8) (actual
time=246.662..246.662 rows=1 loops=1)
   Buffers: shared hit=817 read=549
   ->  Gather  (cost=12661.21..12661.42 rows=2 width=8) (actual
time=246.642..246.656 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=817 read=549
         ->  Partial Aggregate  (cost=11661.21..11661.22 rows=1 width=8)
(actual time=242.168..242.169 rows=1 loops=3)
               Buffers: shared hit=3360 read=2046
               ->  Parallel Seq Scan on zz_noidx1  (cost=0.00..11656.00
rows=2083 width=0) (actual time=242.165..242.165 rows=0 loops=3)
                     Filter: (lower(text_distinct) = 'test5000001'::text)
                     Rows Removed by Filter: 333333
                     Buffers: shared hit=3360 read=2046
 Planning Time: 0.222 ms
 Execution Time: 247.927 ms
The cost of EXISTS is too low to use parallelism, and value is found too
late.
When I decrease startup cost to 0 of parallel exec I got similar plan,
similar time
postgres=# EXPLAIN (ANALYZE, BUFFERS) select exists(SELECT * FROM zz_noidx1
WHERE LOWER(text_distinct) = LOWER('Test5000001'));
                                                             QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=2.43..2.44 rows=1 width=1) (actual time=246.398..246.402
rows=1 loops=1)
   Buffers: shared hit=885 read=489
   InitPlan 1 (returns $1)
     ->  Gather  (cost=0.00..12156.00 rows=5000 width=0) (actual
time=246.393..246.393 rows=0 loops=1)
           Workers Planned: 2
           Workers Launched: 2
           Buffers: shared hit=885 read=489
           ->  Parallel Seq Scan on zz_noidx1  (cost=0.00..11656.00
rows=2083 width=0) (actual time=241.067..241.067 rows=0 loops=3)
                 Filter: (lower(text_distinct) = 'test5000001'::text)
                 Rows Removed by Filter: 333333
                 Buffers: shared hit=3552 read=1854
 Planning Time: 0.138 ms
 Execution Time: 247.623 ms
(13 rows)
From this perspective it looks so cost of EXISTS(subselect) is maybe too
low.
Regards
Pavel
>
>
> regards
>
> --
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Pavel Stehule | 2018-04-17 16:28:19 | Re: [HACKERS] proposal: schema variables | 
| Previous Message | Hackety Man | 2018-04-17 15:43:36 | Re: Unexplainable execution time difference between two test functions...one using IF (SELECT COUNT(*) FROM...) and the other using IF EXISTS (SELECT 1 FROM...) |