Re: Unexplainable execution time difference between two test functions...one using IF (SELECT COUNT(*) FROM...) and the other using IF EXISTS (SELECT 1 FROM...)

From: Hackety Man <hacketyman(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Unexplainable execution time difference between two test functions...one using IF (SELECT COUNT(*) FROM...) and the other using IF EXISTS (SELECT 1 FROM...)
Date: 2018-04-17 14:01:14
Message-ID: CA+1fJ+-ibpa4puzKpQm6EYNiXOkD5rx0wqkO7ktOG9jVdY_zBw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Tue, Apr 17, 2018 at 6:49 AM, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

>
>
> On 04/16/2018 10:42 PM, Hackety Man wrote:
>
>> ...
>> The first thing I did was to run some baseline tests using the basic
>> queries inside of the IF() checks found in each of the functions to
>> see how the query planner handled them. I ran the following two
>> queries.
>>
>> EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM zz_noidx1 WHERE
>> LOWER(text_distinct) = LOWER('Test5000001');
>> EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE
>> LOWER(text_distinct) = LOWER('Test5000001');
>>
>>
> Those are not the interesting plans, though. The EXISTS only cares about
> the first row, so you should be looking at
>
> EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE
> LOWER(text_distinct) = LOWER('Test5000001') LIMIT 1;

Okay. I tested this query and it did return an execution time on par with
my tests of the "zz_spx_ifexists_noidx" function.

>
>
> I moved on to test the other function with the following query...
>>
>> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM
>> zz_spx_ifcount_noidx('Test5000001');
>>
>> and I got the following "auto_explain" results...
>>
>> 2018-04-16 14:58:34.134 EDT [12616] LOG: duration: 426.279 ms
>> plan:
>> Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
>> LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))
>> Result (cost=4.08..4.09 rows=1 width=1) (actual
>> time=426.274..426.274 rows=1 loops=1)
>> Buffers: shared read=5406
>> InitPlan 1 (returns $0)
>> -> Seq Scan on zz_noidx1 (cost=0.00..20406.00 rows=5000
>> width=0) (actual time=426.273..426.273 rows=0 loops=1)
>> Filter: (lower(text_distinct) = 'test5000001'::text)
>> Rows Removed by Filter: 1000000
>> Buffers: shared read=5406
>> 2018-04-16 14:58:34.134 EDT [12616] CONTEXT: SQL statement
>> "SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
>> LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))"
>> PL/pgSQL function zz_spx_ifexists_noidx(text) line 4 at IF
>> 2018-04-16 14:58:34.134 EDT [12616] LOG: duration: 428.077 ms
>> plan:
>> Query Text: explain (analyze, buffers) select * from
>> zz_spx_ifexists_noidx('Test5000001')
>> Function Scan on zz_spx_ifexists_noidx (cost=0.25..0.26
>> rows=1 width=32) (actual time=428.076..428.076 rows=1 loops=1)
>> Buffers: shared hit=30 read=5438
>>
>> Definitely not the execution time, or query plan, results I was
>> expecting. As we can see, no workers were employed here and my
>> guess was that this was the reason or the large execution time
>> difference between these 2 tests? 199 milliseconds versus 428
>> milliseconds, which is a big difference. Why are workers not being
>> employed here like they were when I tested the query found inside of
>> the IF() check in a standalone manner? But then I ran another test
>> and the results made even less sense to me.
>>
>>
> The plan difference is due to not realizing the EXISTS essentially implies
> LIMIT 1. Secondly, it expects about 5000 rows matching the condition,
> uniformly spread through the table. But it apparently takes much longer to
> find the first one, hence the increased duration.
>

Ah. I did not know that. So EXISTS inherently applies a LIMIT 1, even
though it doesn't show in the query plan, correct? Is it not possible for
parallel scans to be implemented while applying an implicit, or explicit,
LIMIT 1?

>
> How did you generate the data?

I used generate_series() to create 1 million records in sequence at the
same time that I created the table using the following script...

CREATE TABLE zz_noidx1 AS SELECT generate_series(0, 999999) AS
int_distinct, 'Test'::text || generate_series(0, 999999)::text AS
text_distinct;

>
> When I ran the above query the first 5 times after starting my
>> Postgres service, I got the same results each time (around 428
>> milliseconds), but when running the query 6 or more times, the
>> execution time jumps up to almost double that. Here are the
>> "auto_explain" results running this query a 6th time...
>>
>>
> This is likely due to generating a generic plan after the fifth execution.
> There seems to be only small difference in costs, though.
>
>
> --"auto_explain" results after running the same query 6 or more
>> times.
>> 2018-04-16 15:01:51.635 EDT [12616] LOG: duration: 761.847 ms
>> plan:
>> Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
>> LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))
>> Result (cost=4.58..4.59 rows=1 width=1) (actual
>> time=761.843..761.843 rows=1 loops=1)
>> Buffers: shared hit=160 read=5246
>> InitPlan 1 (returns $0)
>> -> Seq Scan on zz_noidx1 (cost=0.00..22906.00 rows=5000
>> width=0) (actual time=761.841..761.841 rows=0 loops=1)
>> Filter: (lower(text_distinct) = lower($1))
>> Rows Removed by Filter: 1000000
>> Buffers: shared hit=160 read=5246
>> 2018-04-16 15:01:51.635 EDT [12616] CONTEXT: SQL statement
>> "SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
>> LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))"
>> PL/pgSQL function zz_spx_ifexists_noidx(text) line 4 at IF
>> 2018-04-16 15:01:51.635 EDT [12616] LOG: duration: 762.156 ms
>> plan:
>> Query Text: explain (analyze, buffers) select * from
>> zz_spx_ifexists_noidx('Test5000001')
>> Function Scan on zz_spx_ifexists_noidx (cost=0.25..0.26
>> rows=1 width=32) (actual time=762.154..762.155 rows=1 loops=1)
>> Buffers: shared hit=160 read=5246
>>
>> As you can see, the execution time jumps up to about 762
>> milliseonds. I can see in the sequence scan node that the LOWER()
>> function shows up on the right side of the equal operator, whereas
>> in the first 5 runs of this test query the plan did not show this.
>> Why is this?
>>
>>
> It doesn't really matter on which side it shows, it's more about a generic
> plan built without knowledge of the parameter value.
>

Right. I was more wondering why it switched over to a generic plan, as
you've stated, like clockwork starting with the 6th execution run.

>
>
> regards
>
> --
> Tomas Vondra http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Hackety Man 2018-04-17 14:05:45 Re: Unexplainable execution time difference between two test functions...one using IF (SELECT COUNT(*) FROM...) and the other using IF EXISTS (SELECT 1 FROM...)
Previous Message Akshay Ballarpure 2018-04-17 12:24:12 RE: Data migration from postgres 8.4 to 9.4