Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNER Query )

From: Amarendra Konda <amar(dot)vijaya(at)gmail(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org, pgsql-general(at)postgresql(dot)org
Subject: Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNER Query )
Date: 2020-05-07 17:36:02
Message-ID: CAJNAD0=NxGe+JfT6d_Wa4LXKhTcUckPyvDibSBMH31FGYRn8-w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance

Hi Adrian,

Thanks for the reply. And i have kept latest execution plans, for various
SQL statements ( inner join, sub queries and placing values instead of sub
query) .
As suggested, tried with INNER JOIN, however result was similar to
subquery.

Is there any way we can tell the optimiser to process less number of rows
based on the LIMIT value ? ( i.e. may be SQL re-write) ?

*INNER SQL*

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pi.process_instance_id AS
pi_process_instance_id FROM process_instance pi WHERE pi.user_id =
'137074931866340' AND pi.app_id = '126502930200650';
QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using fki_conv_konotor_user_user_id on public.process_instance
pi (cost=0.43..2.66 rows=1 width=8) *(actual time=0.018..0.019 rows=2
loops=1)*
Output: process_instance_id
Index Cond: (pi.user_id = '137074931866340'::bigint)
Filter: (pi.app_id = '126502930200650'::bigint)
Buffers: shared hit=5
Planning time: 0.119 ms
Execution time: 0.041 ms

*Full query - Sub query*

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id
AS pa_process_activity_id FROM process_activity pa WHERE pa.app_id =
'126502930200650' AND pa.created > '1970-01-01 00:00:00' AND
pa.process_instance_id in (SELECT pi.process_instance_id AS
pi_process_instance_id FROM process_instance pi WHERE pi.user_id =
'137074931866340' AND pi.app_id = '126502930200650') ORDER BY
pa.process_instance_id, pa.created limit 50;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------
Limit (cost=1072.91..1072.99 rows=31 width=24) (actual
time=744.386..744.415 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=3760 read=39316
-> Sort (cost=1072.91..1072.99 rows=31 width=24) (actual
time=744.384..744.396 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Sort Key: pa.process_instance_id, pa.created
Sort Method: top-N heapsort Memory: 28kB
Buffers: shared hit=3760 read=39316
-> Nested Loop (cost=1.14..1072.14 rows=31 width=24) (actual
time=0.044..727.297 rows=47011 loops=1)
Output: pa.process_activity_id, pa.process_instance_id,
pa.created
Buffers: shared hit=3754 read=39316
-> Index Scan using fki_conv_konotor_user_user_id on
public.process_instance pi (cost=0.43..2.66 rows=1 width=8) *(actual
time=0.009..0.015 rows=2 loops=1)*
Output: pi.process_instance_id
Index Cond: (pi.user_id = '137074931866340'::bigint)
Filter: (pi.app_id = '126502930200650'::bigint)
Buffers: shared hit=5
-> Index Scan using
process_activity_process_instance_id_app_id_created_idx on
public.process_activity pa (cost=0.70..1055.22 rows=1427 width=24) *(actual
time=0.029..349.000 rows=23506 loops=2)*
Output: pa.process_activity_id,
pa.process_activity_type, pa.voice_url, pa.process_activity_user_id,
pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source,
pa.label_category_id, pa.label_id, pa.csat_respons
e_id, pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id,
pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments,
pa.status_fragment, pa.internal_meta, pa.interaction_id,
pa.do_not_translate, pa.should_tr
anslate, pa.in_reply_to
Index Cond: ((pa.process_instance_id =
pi.process_instance_id) AND (pa.app_id = '126502930200650'::bigint) AND
(pa.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=3749 read=39316
Planning time: 2.547 ms
Execution time: 744.499 ms
(22 rows)

*Full query - INNER JOIN*

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id
AS pa_process_activity_id FROM process_activity pa INNER JOIN
process_instance pi ON pi.process_instance_id = pa.process_instance_id AND
pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND
pi.user_id = '137074931866340' AND pi.app_id = '126502930200650' ORDER BY
pa.process_instance_id, pa.created limit 50;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------
Limit (cost=1072.91..1072.99 rows=31 width=24) (actual
time=87.803..87.834 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43070
-> Sort (cost=1072.91..1072.99 rows=31 width=24) (actual
time=87.803..87.815 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Sort Key: pa.process_instance_id, pa.created
Sort Method: top-N heapsort Memory: 28kB
Buffers: shared hit=43070
-> Nested Loop (cost=1.14..1072.14 rows=31 width=24) (actual
time=0.030..73.847 rows=47011 loops=1)
Output: pa.process_activity_id, pa.process_instance_id,
pa.created
Buffers: shared hit=43070
-> Index Scan using fki_conv_konotor_user_user_id on
public.process_instance pi (cost=0.43..2.66 rows=1 width=8) *(actual
time=0.015..0.018 rows=2 loops=1)*
Output: pi.process_instance_id
Index Cond: (pi.user_id = '137074931866340'::bigint)
Filter: (pi.app_id = '126502930200650'::bigint)
Buffers: shared hit=5
-> Index Scan using
process_activity_process_instance_id_app_id_created_idx on
public.process_activity pa (cost=0.70..1055.22 rows=1427 width=24) *(actual
time=0.011..21.447 rows=23506 loops=2)*
Output: pa.process_activity_id,
pa.process_activity_type, pa.voice_url, pa.process_activity_user_id,
pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source,
pa.label_category_id, pa.label_id, pa.csat_respons
e_id, pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id,
pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments,
pa.status_fragment, pa.internal_meta, pa.interaction_id,
pa.do_not_translate, pa.should_tr
anslate, pa.in_reply_to
Index Cond: ((pa.process_instance_id =
pi.process_instance_id) AND (pa.app_id = '126502930200650'::bigint) AND
(pa.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=43065
Planning time: 0.428 ms
Execution time: 87.905 ms

*FULL Query - INNER SQL replaced with result*

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS
m_process_activity_id FROM process_activity pa WHERE pa.app_id =
'126502930200650' AND pa.created > '1970-01-01 00:00:00' AND
pa.process_instance_id in (*137074941043913,164357609323111*) ORDER BY
pa.process_instance_id,pa.created limit 50;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------
Limit (cost=0.70..37.65 rows=50 width=24) (actual time=0.016..0.095
rows=50 loops=1)
Output: process_activity_id, process_instance_id, created
Buffers: shared hit=55
-> Index Scan using
process_activity_process_instance_id_app_id_created_idx on
public.process_activity pa (cost=0.70..2100.39 rows=2841 width=24) *(actual
time=0.015..0.077 rows=50 loops=1)*
Output: process_activity_id, process_instance_id, created
Index Cond: ((pa.process_instance_id = ANY
('{137074941043913,164357609323111}'::bigint[])) AND (pa.app_id =
'126502930200650'::bigint) AND (m.created > '1970-01-01
00:00:00'::timestamp without time
zone))
Buffers: shared hit=55
Planning time: 1.710 ms
Execution time: 0.147 ms

Regards, Amar

On Thu, May 7, 2020 at 8:10 PM Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
wrote:

> On 5/7/20 4:19 AM, Amarendra Konda wrote:
> > Hi,
> >
> > PostgreSQL version : PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled
> > by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
> >
> > We have noticed huge difference interms of execution plan ( response
> > time) , When we pass the direct values Vs inner query to IN clause.
> >
> > High level details of the use case are as follows
> >
> > * As part of the SQL there are 2 tables named Process_instance
> > (master) and Process_activity ( child)
> > * Wanted to fetch TOP 50 rows from Process_activity table for the
> > given values of the Process_instance.
> > * When we used Inner Join / Inner query ( query1) between parent
> > table and child table , LIMIT is not really taking in to account.
> > Instead it is fetching more rows and columns that required, and
> > finally limiting the result
>
> It is doing what you told it to do which is SELECT all
> process_instance_i's for user_id='317079413683604' and app_id =
> '427380312000560' and then filtering further. I am going to guess that
> if you run the inner query alone you will find it returns ~23496 rows.
> You might have better results if you an actual join between
> process_activity and process_instance. Something like below(obviously
> not tested):
>
> SELECT
> pa.process_activity_id
> FROM
> process_activity pa
> JOIN
> process_instance pi
> ON
> pa.process_instance_id = pi.process_instance_id
> WHERE
> pa.app_id = '427380312000560'
> AND
> pa.created > '1970-01-01 00:00:00'
> AND
> pi.user_id = '317079413683604'
> ORDER BY
> pa.process_instance_id,
> pa.created
> LIMIT 50;
>
> The second query is not equivalent as you are not filtering on user_id
> and you are filtering on only three process_instance_id's.
>
>
> > *
> >
> >
> > *Query1*
> >
> > web_1=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT
> > pa.process_activity_id FROM process_activity pa WHERE pa.app_id =
> > '427380312000560' AND pa.created > '1970-01-01 00:00:00' AND
> > pa.process_instance_id in *_(SELECT pi.process_instance_id FROM
> > process_instance pi WHERE pi.user_id = '317079413683604' AND pi.app_id =
> > '427380312000560')_* ORDER BY pa.process_instance_id,pa.created limit 50;
> >
> >
> > QUERY
> PLAN
> >
> >
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > Limit (cost=1071.47..1071.55 rows=31 width=24) (actual
> > time=85.958..85.991 rows=50 loops=1)
> > Output: pa.process_activity_id, pa.process_instance_id, pa.created
> > Buffers: shared hit=43065
> > -> Sort (cost=1071.47..1071.55 rows=31 width=24) (actual
> > time=85.956..85.971 rows=50 loops=1)
> > Output: pa.process_activity_id, pa.process_instance_id,
> pa.created
> > Sort Key: pa.process_instance_id, pa.created
> > Sort Method: top-N heapsort Memory: 28kB
> > Buffers: shared hit=43065
> > -> Nested Loop (cost=1.14..1070.70 rows=31 width=24) (actual
> > time=0.031..72.183 rows=46992 loops=1)
> > Output: pa.process_activity_id, pa.process_instance_id,
> > pa.created
> > Buffers: shared hit=43065
> > -> Index Scan using fki_conv_konotor_user_user_id on
> > public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual
> > time=0.010..0.013 rows=2 loops=1)
> > Output: pi.process_instance_id
> > Index Cond: (pi.user_id =
> '317079413683604'::bigint)
> > Filter: (pi.app_id = '427380312000560'::bigint)
> > Buffers: shared hit=5
> > -> Index Scan using
> > process_activity_process_instance_id_app_id_created_idx on
> > public.process_activity pa (cost=0.70..1053.80 rows=1425 width=24)
> > (actual time=0.015..20.702 rows=*23496* loops=2)
> > * Output: pa.process_activity_id, pa.process_activity_type,
> > pa.voice_url, pa.process_activity_user_id, pa.app_id,
> > pa.process_instance_id, pa.alias, pa.read_by_user, pa.source,
> > pa.label_category_id, pa.label_id, pa.csat_response_id,
> > pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id,
> pa.market
> > ing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment,
> > pa.internal_meta, pa.interaction_id, pa.do_not_translate,
> > pa.should_translate, pa.in_reply_to*
> > Index Cond: ((pa.process_instance_id =
> > pi.process_instance_id) AND (pa.app_id = '427380312000560'::bigint) AND
> > (pa.created > '1970-01-01 00:00:00'::timestamp without time zone))
> > Buffers: shared hit=43060
> > Planning time: 0.499 ms
> > Execution time: 86.040 ms
> > (22 rows)
> >
> > *_Query 2_*
> >
> > web_1=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT
> > pa.process_activity_id AS m_process_activity_id FROM process_activity m
> > WHERE pa.app_id = '427380312000560' AND pa.created > '1970-01-01
> > 00:00:00' AND pa.process_instance_id in
> > (*240117466018927,325820556706970,433008275197305*) ORDER BY
> > pa.process_instance_id,pa.created limit 50;
> >
> > QUERY PLAN
> >
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > Limit (cost=0.70..37.66 rows=50 width=24) (actual time=0.023..0.094
> > rows=50 loops=1)
> > Output: process_activity_id, process_instance_id, created
> > Buffers: shared hit=50
> > -> Index Scan using
> > process_activity_process_instance_id_app_id_created_idx on
> > public.process_activity pa (cost=0.70..3124.97 rows=4226 width=24)
> > (actual time=0.022..0.079 *rows=50* loops=1)
> > Output: process_activity_id, process_instance_id, created
> > Index Cond: ((pa.process_instance_id = ANY
> > ('{140117466018927,225820556706970,233008275197305}'::bigint[])) AND
> > (pa.app_id = '427380312000560'::bigint) AND (pa.created > '1970-01-01
> > 00:00:00'::timestamp without time zone))
> > Buffers: shared hit=50
> > Planning time: 0.167 ms
> > Execution time: 0.137 ms
> > (9 rows)
> >
> >
> > Can someone explain
> >
> > * Why It is fetching more columns and more rows, incase of inner query
> ?
> > * Is there any option to really limit values with INNER JOIN, INNER
> > query ? If yes, can you please share information on this ?
> >
> > Thanks in advance for your time and suggestions.
> >
> > Regards, Amar
>
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2020-05-07 17:36:38 Re: pg_dump negation regex
Previous Message Christian Ramseyer 2020-05-07 17:01:47 Re: Lock Postgres account after X number of failed logins?

Browse pgsql-performance by date

  From Date Subject
Next Message Amarendra Konda 2020-05-07 17:49:18 Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNER Query )
Previous Message samhitha g 2020-05-07 17:10:55 pg_attribute, pg_class, pg_depend grow huge in count and size with multiple tenants.