Re: Function runtime increases after 5 calls in the same session.

From: Ilya Ashchepkov <koctep(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Function runtime increases after 5 calls in the same session.
Date: 2014-10-22 16:38:31
Message-ID: m28mi9$6gr$1@ger.gmane.org
Lists: pgsql-general

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in message:
> Marti Raudsepp <marti(at)juffo(dot)org> writes:
>> On Tue, Oct 21, 2014 at 12:53 PM, Ilya I. Ashchepkov <koctep(at)gmail(dot)com> wrote:
>>> I wrote a function, and while testing it I came across some strange
>>> behaviour.
>>> Function runtime is about 200 ms for the first 5 calls; the 6th and
>>> further calls take ~22000 ms.
>>> I simplified my schema, you can see it in attached file.
>>> I've tested on 9.4beta3 and 9.3.5.
>
>> I didn't look at the test case. But most likely the problem is that
>> after 5 executions, plancache decides that a generic plan is no more
>> expensive than a custom plan, and reverts to executing the generic
>> one. But in reality there is a big difference in execution time.
>
> Yeah. Ilya didn't show the exact case he was testing, but I suppose
> it was one where the timestamptz range covered the whole table (since
> the test script created a table with a mighty tiny range of date values,
> it would have taken some effort to do otherwise). Anyway I tried it with
> this case:
> select test.fun (1, 'yesterday', 'tomorrow');
> and indeed reproduced a big slowdown in later executions.
>
> You can examine what the planner is doing with prepared statements,
> such as ones in plpgsql functions, via PREPARE/EXECUTE:
>
> regression=# prepare foo(int, timestamptz, timestamptz) as
> select sum(V1.data + V2.data)
>   from test.view V1
>   inner join test.view V2 using(id)
>   where V1.object_id = $1
>     and V1.time between $2 and $3
>     and V2.time between $2 and $3;
> PREPARE
>
> The first time you try it you get:
>
> regression=# explain execute foo (1, 'yesterday', 'tomorrow');
>                                                                 QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=7037.25..7037.26 rows=1 width=8)
>    ->  Merge Join  (cost=5909.71..6759.53 rows=55544 width=8)
>          Merge Cond: (v1.id = v2.id)
>          ->  Sort  (cost=2967.35..2975.69 rows=3333 width=8)
>                Sort Key: v1.id
>                ->  Subquery Scan on v1  (cost=0.00..2772.33 rows=3333 width=8)
>                      ->  Seq Scan on data  (cost=0.00..2739.00 rows=3333 width=8)
>                            Filter: (("time" >= '2014-10-20 00:00:00-04'::timestamp with time zone) AND ("time" <= '2014-10-22 00:00:00-04'::timestamp with time zone) AND (object_id = 1) AND test.can_read(object_id))
>          ->  Sort  (cost=2942.35..2950.69 rows=3333 width=8)
>                Sort Key: v2.id
>                ->  Subquery Scan on v2  (cost=0.00..2747.33 rows=3333 width=8)
>                      ->  Seq Scan on data data_1  (cost=0.00..2714.00 rows=3333 width=8)
>                            Filter: (("time" >= '2014-10-20 00:00:00-04'::timestamp with time zone) AND ("time" <= '2014-10-22 00:00:00-04'::timestamp with time zone) AND test.can_read(object_id))
> (13 rows)
>
> After repeating that a few times it switches to:
>
> regression=# explain execute foo (1, 'yesterday', 'tomorrow');
>                                                       QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=428.63..428.64 rows=1 width=8)
>    ->  Nested Loop  (cost=5.20..428.54 rows=17 width=8)
>          Join Filter: (data.id = v2.id)
>          ->  Bitmap Heap Scan on data  (cost=4.91..81.29 rows=17 width=8)
>                Recheck Cond: ((object_id = $1) AND ("time" >= $2) AND ("time" <= $3))
>                Filter: test.can_read(object_id)
>                ->  Bitmap Index Scan on data_object_id_time_idx  (cost=0.00..4.91 rows=50 width=0)
>                      Index Cond: ((object_id = $1) AND ("time" >= $2) AND ("time" <= $3))
>          ->  Materialize  (cost=0.29..342.79 rows=17 width=8)
>                ->  Subquery Scan on v2  (cost=0.29..342.71 rows=17 width=8)
>                      ->  Index Scan using data_object_id_time_idx on data data_1  (cost=0.29..342.54 rows=17 width=8)
>                            Index Cond: (("time" >= $2) AND ("time" <= $3))
>                            Filter: test.can_read(object_id)
> (13 rows)
>
> Notice the plan now contains parameter markers $n instead of literal
> constants; this is because this is a generic plan. The problem is
> that the planner has estimated this case a lot cheaper than it did
> when it saw the actual parameter values (and could determine that the
> query was going to have to scan the entire table). Without the actual
> parameter values, it has to fall back on rules of thumb to estimate
> how much of the table will be scanned. Although it doesn't know the
> comparison values, it does recognize the "between" constructs as being
> range restrictions, and its rule of thumb is that those are fairly
> selective --- about 1% IIRC. So it thinks the query will not need to
> read very much of the table and chooses a plan that'd be appropriate
> if that were true.
>
> The custom-vs-generic-plan heuristic is designed for cases where the
> generic plan is based on unduly pessimistic assumptions; which is where
> most of the problems have been historically. Here we've got one where
> the generic plan is based on unduly optimistic assumptions. Not sure
> how we might fix that without breaking the more commonly complained-of
> case.
>
> regards, tom lane

Thank you for the detailed explanation.
You guessed my test case correctly; I forgot to attach it to the first message.
I'll add more data and try a more realistic test.
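
For the archives, a workaround sketch: rewriting the function to build the
query with EXECUTE ... USING makes plpgsql plan it afresh on every call, so
the planner always sees the actual parameter values and never switches to
the generic plan. The signature, return type, and body below are guesses
reconstructed from the prepared query in Tom's message, not my real
test.fun:

    -- Sketch only: assumes an integer "data" column (hence RETURNS bigint)
    -- and the column names from the prepared query above.
    CREATE OR REPLACE FUNCTION test.fun(
        p_object_id int,
        p_from      timestamptz,
        p_to        timestamptz)
    RETURNS bigint
    LANGUAGE plpgsql AS $fun$
    DECLARE
        result bigint;
    BEGIN
        -- EXECUTE ... USING re-plans the statement on each call; $1..$3
        -- here are the USING parameters, not prepared-statement markers.
        EXECUTE $q$
            select sum(V1.data + V2.data)
            from test.view V1
            inner join test.view V2 using(id)
            where V1.object_id = $1
              and V1.time between $2 and $3
              and V2.time between $2 and $3
        $q$
        INTO result
        USING p_object_id, p_from, p_to;
        RETURN result;
    END;
    $fun$;

The price is one planning cycle per call, which should be negligible next
to a 22-second generic-plan execution.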
--

Ilya
