Re: Getting an optimal plan on the first execution of a pl/pgsql function

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Pedro França <pedro(dot)franca(at)golsat(dot)com(dot)br>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Getting an optimal plan on the first execution of a pl/pgsql function
Date: 2015-12-14 19:21:54
Message-ID: CAKFQuwbavm1XnbX1dS=8R6vrpUbnAo6HraqCg_UD2+v3nEjaRA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Dec 14, 2015 at 11:53 AM, Pedro França <pedro(dot)franca(at)golsat(dot)com(dot)br>
wrote:

> I have a really busy function that I need to optimize the best way I can.
> This function is just a nested select statement that is requested several
> times a sec by a legacy application. I'm running a PostgreSQL 9.4 on a
> CentOS 6;
>
> The indexes are in place but I've noticed that it is only used after the
> first execution of the function.
>

​How do you know this?​

I think that the problem is that Postgres isn't getting the best execution
> plan at first because of a parameter that it is highly exclusive in the
> majority of the cases, but it can be not as good sometimes. We can't change
> the way we call the function to a plain sql statement or a view because we
> can't change the application code itself.
>
> When I test with EXPLAIN ANALYZE after the first execution, the query runs
> really fast but the aplication sessions call the function only once and
> then are terminated. I need that the first execution use the actual
> optimized plan.
>
> We tried messing around with the connector driver that manage the
> connection pooling to issue a DISCARD TEMP instead of DISCARD ALL, so it
> could keep the cached plan of the sessions and the performance improved a
> lot, but I don't want to do that in a production environment.
>

Given the constraints you've listed this seems like it might be your only
avenue of improvement.​ Your problem that the performance improvement is
seen due to caching effects. If you throw away the cache you loose the
improvement.

> I've tried to change the language to a sql function but it didn't help as
> the execution time didn't drop after the first execution.
>

​Yes, this likely would make thing worse...depending upon how it is called.

I've tried to add the "SET LOCAL join_collapse_limit = 1" too but it
> appears it doesn't work inside a function;
>

​I wouldn't expect that parameter to have any effect in this scenario.

Here is the function code:
>
> CREATE OR REPLACE FUNCTION public.ap_keepalive_geteqpid_veiid(
> IN tcbserie bigint,
> IN protocolo integer)
> RETURNS TABLE(eqpid integer, veiid integer, tcbid integer, veiplaca
> character varying, veiproprietariocliid integer, tcbtppid integer,
> tcbversao character, veirpmparametro double precision, tcbconfiguracao
> bigint, tcbevtconfig integer, veibitsalertas integer, sluid integer, harid
> integer) AS
> $BODY$
> BEGIN
>
> RETURN QUERY
> SELECT teqp.eqpID,
> teqp.eqpveiID AS veiID,
> tcb.tcbID,
> tvei.veiPlaca,
> tvei.veiProprietariocliID,
> tcb.tcbtppID,
> tcb.tcbVersao,
> tvei.veiRPMParametro,
> COALESCE(COALESCE(NULLIF(tcb.tcbConfiguracao, 0),
> tcc.clcConfiguracaoBitsVeic), 0) AS tcbConfiguracao,
> COALESCE(tcb.tcbevtConfig, 0) AS tcbevtConfig,
> COALESCE(tvei.veiBitsAlertas, 0) AS veiBitsAlertas,
> COALESCE(tvei.veisluID, 0) AS sluID,
> COALESCE(tcb.tcbharID, 0) AS harID
> FROM TabComputadorBordo tcb
> INNER JOIN TabEquipamento teqp ON teqp.eqptcbID = tcb.tcbID
> INNER JOIN TabPacoteProduto tpp ON teqp.eqptppID = tpp.tppID
> INNER JOIN TabVeiculos tvei ON teqp.eqpveiID = tvei.veiID
> LEFT JOIN TabCliente tcli ON tcli.cliid = tvei.veiProprietariocliID
> LEFT JOIN TabClienteConfig tcc ON tcc.clcCliID = tcli.cliID
> WHERE tcb.tcbserie = $1
> AND teqp.eqpAtivo = 1
> AND tpp.tppIDProtocolo = $2
> AND tvei.veiBloqueioSinal = 0;
>
> END
> $BODY$
> LANGUAGE plpgsql VOLATILE
> COST 10000
> ROWS 1;
>
> Execution plan in the first execution:
>

​You likely could make this STABLE instead of VOLATILE; though that doesn't
solve your problem.​

> "Function Scan on ap_keepalive_geteqpid_veiid (cost=0.25..0.26 rows=1
> width=116) (actual time=3.268..3.268 rows=1 loops=1)"
> "Planning time: 0.032 ms"
> "Execution time: 3.288 ms"
>
> Second execution:
>
> "Function Scan on ap_keepalive_geteqpid_veiid (cost=0.25..0.26 rows=1
> width=116) (actual time=0.401..0.402 rows=1 loops=1)"
> "Planning time: 0.058 ms"
> "Execution time: 0.423 ms"
>
>
​I'm doubting the query inside of the function is the problem here...it is
the function usage itself. Calling a function has overhead in that the
body of function needs to be processed. This only has to happen once per
session. The first call of the function incurs this overhead while
subsequent calls do not.

Pending others correcting me...I fairly certain regarding my conclusions
though somewhat inexperienced in doing this kind of diagnostics.

David J.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2015-12-14 20:41:57 Re: Getting an optimal plan on the first execution of a pl/pgsql function
Previous Message Pedro França 2015-12-14 18:53:45 Getting an optimal plan on the first execution of a pl/pgsql function