| From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> | 
|---|---|
| To: | Sokolov Yura <funny(dot)falcon(at)postgrespro(dot)ru> | 
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: why subplan is 10x faster then function? | 
| Date: | 2017-10-04 19:20:15 | 
| Message-ID: | CAFj8pRB45AndbS=zFskOtCaojzPva45S3Dc0mYfJFw0h6v4cxg@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
2017-10-01 12:45 GMT+02:00 Sokolov Yura <funny(dot)falcon(at)postgrespro(dot)ru>:
> 1 октября 2017 г. 12:42:14 GMT+03:00, Pavel Stehule <
> pavel(dot)stehule(at)gmail(dot)com> пишет:
> >2017-09-30 23:23 GMT+02:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
> >
> >> Hi
> >>
> >> I have some strange slow queries based on usage "view" functions
> >>
> >> one function looks like this:
> >>
> >> CREATE OR REPLACE FUNCTION
> >ides_funcs.najdatsplt_cislo_exekuce(mid_najdatsplt
> >> bigint)
> >>  RETURNS character varying
> >>  LANGUAGE sql
> >>  STABLE
> >> AS $function$
> >> select CISLOEXEKUCE
> >>       from najzalobpr MT, najvzallok A1,
> >>                     NAJZALOBST A2, NAJZALOBCE A3 where
> >>                     MT.ID_NAJVZALLOK= A1.ID_NAJVZALLOK AND
> >>                     A1.ID_NAJZALOBST=A2.ID_NAJZALOBST AND
> >>                     A2.ID_NAJZALOBCE= A3.ID_NAJZALOBCE AND
> >>                     MT.ID_NAJDATSPLT = mID_NAJDATSPLT  LIMIT 1;
> >> $function$ cost 20
> >> ;
> >>
> >> I know so using this kind of functions is not good idea - it is
> >customer
> >> old code generated from Oracle. I had idea about possible planner
> >issues.
> >> But this is a executor issue.
> >>
> >> when this function is evaluated as function, then execution needs
> >about 46
> >> sec
> >>
> >>     ->  Nested Loop Left Join  (cost=0.71..780360.31 rows=589657
> >> width=2700) (actual time=47796.588..47796.588 rows=0 loops=1)
> >>           ->  Nested Loop  (cost=0.29..492947.20 rows=589657
> >width=2559)
> >> (actual time=47796.587..47796.587 rows=0 loops=1)
> >>                 ->  Seq Scan on najdatsplt mt  (cost=0.00..124359.24
> >> rows=1106096 width=1013) (actual time=47796.587..47796.587 rows=0
> >loops=1)
> >>                       Filter:
> >(najdatsplt_cislo_exekuce(id_najdatsplt) IS
> >> NOT NULL)
> >>                       Rows Removed by Filter: 1111654
> >>
> >> When I use correlated subquery, then
> >>
> >>  ->  Nested Loop  (cost=0.29..19876820.11 rows=589657 width=2559)
> >(actual
> >> time=3404.154..3404.154 rows=0 loops=1)
> >>   ->  Seq Scan on najdatsplt mt  (cost=0.00..19508232.15 rows=1106096
> >> width=1013) (actual time=3404.153..3404.153 rows=0 loops=1)
> >>       Filter: ((SubPlan 11) IS NOT NULL)
> >>       Rows Removed by Filter: 1111654
> >>       SubPlan 11
> >>         ->  Limit  (cost=1.10..17.49 rows=1 width=144) (actual
> >> time=0.002..0.002 rows=0 loops=1111654)
> >>               ->  Nested Loop  (cost=1.10..17.49 rows=1 width=144)
> >(actual
> >> time=0.002..0.002 rows=0 loops=1111654)
> >>                     ->  Nested Loop  (cost=0.83..17.02 rows=1
> >width=8)
> >> (actual time=0.002..0.002 rows=0 loops=1111654)
> >>                           ->  Nested Loop  (cost=0.56..16.61 rows=1
> >> width=8) (actual time=0.002..0.002 rows=0 loops=1111654)
> >>
> >> The execution plan is +/- same - the bottleneck is in function
> >execution
> >>
> >> Tested with same result on 9.6, 10.
> >>
> >> Is known overhead of function execution?
> >>
> >>
> >profile of slow execution looks like
> >
> >+   24,71%    24,40%         48235  postmaster  [.] SearchCatCache
> >+   14,25%     0,00%             0  postmaster  [unknown]           [.]
> >0000000000000000
> >+    9,76%     9,65%         19071  postmaster  [.]
> >TupleDescInitEntry
> >+    3,91%     3,86%          7625  postmaster  [.]
> >ExecAssignScanProjectionInfoWithVarno
> >+    3,56%     3,52%          6955  postmaster  [.] AllocSetAlloc
> >+    2,66%     2,63%          5193  postmaster  [.]
> >FunctionCall2Coll
> >+    2,65%     2,62%          5183  postmaster  [.]
> >ResourceArrayRemove
> >+    2,42%     2,39%          4719  postmaster  [.]
> >ExecTypeFromTLInternal
> >+    2,21%     2,19%          4321  postmaster  [.]
> >DirectFunctionCall1Coll
> >+    2,02%     2,00%          3961  postmaster  [.]
> >heap_getsysattr
> >+    1,85%     1,82%          3604  postmaster  [.]
> >exprTypmod
> >+    1,81%     1,79%          3540  postmaster  [.]
> >ResourceArrayAdd
> >+    1,68%     1,66%          3282  postmaster  [.]
> >hash_uint32
> >+    1,65%     1,63%          3214  postmaster  [.]
> >hash_search_with_hash_value
> >+    1,64%     1,62%          3208  postmaster  [.]
> >CatalogCacheComputeHashValue
> >+    1,28%     1,26%          2498  postmaster  [.]
> >MemoryContextAllocZeroAligned
> >+    1,25%     1,24%          2446  postmaster  [.] palloc0
> >
> >Any ides why SearchCatCache is called too often?
> >
> >
> >
> >> Regards
> >>
> >> Pavel
> >>
>
> Looks like you've already collected profile with call-graph. So you can
> tell us where it were called from.
>
I have more info no. Probably ExecInitIndexScan/ExecAssignResultTypeFromTL
is very expensive, when it is called often (in this case for every row of
table)
                  - 20,88% ExecInitIndexScan
                     - 16,31% ExecAssignResultTypeFromTL
                        - 16,22% ExecTypeFromTL
                           - 15,87% ExecTypeFromTLInternal
                              - 13,39% TupleDescInitEntry
                                 - 9,40% SearchSysCache
                                    - 8,65% SearchCatCache
                                       - 1,87%
CatalogCacheComputeHashValue
                                          + 1,27% DirectFunctionCall1Coll
or
   - 79,65%
exec_stmt_execsql
▒
      - 77,50%
SPI_execute_plan_with_paramlist
▒
         - 77,12%
_SPI_execute_plan
▒
            - 74,06%
_SPI_pquery
▒
               - 64,21%
ExecutorStart
▒
                  - 64,11%
standard_ExecutorStart
▒
                     - 63,60%
InitPlan
▒
                        - 61,62%
ExecInitNode
▒
                           - 61,46%
ExecInitLimit
▒
                              - 60,00%
ExecInitNode
▒
                                 - 59,89%
ExecInitNestLoop
▒
                                    - 58,71%
ExecInitNode                                                              ▒
                                       - 37,69%
ExecInitNestLoop                                                       ▒
                                          - 36,80%
ExecInitNode                                                        ▒
                                             - 21,36%
ExecInitNestLoop                                                 ▒
                                                - 20,38%
ExecInitNode                                                  ▒
                                                   - 20,25%
ExecInitIndexScan                                          ▒
                                                      + 11,58%
ExecAssignResultTypeFromTL                              ▒
                                                      + 2,46%
ExecInitQual                                             ▒
                                                      + 1,95%
ExecIndexBuildScanKeys                                   ▒
                                                      + 1,10%
ExecOpenScanRelation                                     ▒
                                                      + 0,92%
ExecAssignExprContext                                    ▒
                                                      + 0,65%
ExecAssignScanProjectionInfo                             ▒
                                                      + 0,63%
index_open                                               ▒
                                             - 15,37%
ExecInitIndexScan                                                ▒
                                                - 11,39%
ExecAssignResultTypeFromTL                                    ▒
                                                   - 11,31%
ExecTypeFromTL                                             ▒
                                                      - 11,06%
ExecTypeFromTLInternal                                  ▒
                                                         - 9,36%
TupleDescInitEntry                                    ▒
                                                            + 6,59%
SearchSysCache                                     ▒
                                                            + 0,95%
ReleaseSysCache                                    ▒
                                                           0,50%
exprTypmod                                            ▒
                                                + 1,05%
ExecInitQual                                                   ▒
                                                  0,81%
ExecIndexBuildScanKeys                                         ▒
                                                + 0,63%
ExecAssignScanProjectionInfo                                   ▒
                                       + 20,88%
ExecInitIndexScan                                                      ▒
                                      0,54% ExecAssignResultTypeFromTL
ExecInit time is significantly higher than run time .. 65% / 6%
Regards
Pavel
>
> With regards,
> --
> Sokolov Yura aka funny_falcon
>
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Robert Haas | 2017-10-04 19:21:04 | Re: [sqlsmith] crash in RestoreLibraryState during low-memory testing | 
| Previous Message | Robert Haas | 2017-10-04 19:12:32 | Re: 64-bit queryId? |