Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNER Query )

From: Amarendra Konda <amar(dot)vijaya(at)gmail(dot)com>
To: Virendra Kumar <viru_7683(at)yahoo(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNER Query )
Date: 2020-05-07 18:51:45
Message-ID: CAJNAD0m2W9JpOvm9fPJyW3s62Hhh0c1xP-5iu3yToio76yDQBA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance

Hi Virendra,

Thanks for your time.

Here is the table and index structure

* process_activity*
Table "public.process_activity"
Column | Type | Modifiers

--------------------+-----------------------------+----------------------------
process_activity_id | bigint | not null
default next_id()
process_activity_type | smallint | not null
voice_url | text |
process_activity_user_id | bigint | not null
app_id | bigint | not null
process_instance_id | bigint | not null
alias | text | not null
read_by_user | smallint | default 0
source | smallint | default 0
label_category_id | bigint |
label_id | bigint |
csat_response_id | bigint |
process_activity_fragments | jsonb |
created | timestamp without time zone | not null
updated | timestamp without time zone |
rule_id | bigint |
marketing_reply_id | bigint |
delivered_at | timestamp without time zone |
reply_fragments | jsonb |
status_fragment | jsonb |
internal_meta | jsonb |
interaction_id | text |
do_not_translate | boolean |
should_translate | integer |
in_reply_to | jsonb |
Indexes:
"process_activity_pkey" PRIMARY KEY, btree (process_activity_id)
"fki_process_activity_konotor_user_user_id" btree
(process_activity_user_id) WITH (fillfactor='70')
"*process_activity_process_instance_id_app_id_created_idx*" btree
(process_instance_id, app_id, created) WITH (fillfactor='70')
"process_activity_process_instance_id_app_id_read_by_user_created_idx"
btree (process_instance_id, app_id, read_by_user, created) WITH
(fillfactor='70')
"process_activity_process_instance_id_idx" btree (process_instance_id)
WITH (fillfactor='70')

*process_instance*
Table "public.process_instance"
Column | Type | Modifiers

-------------------------+-----------------------------+-----------------------------
process_instance_id | bigint | not null default
next_id()
process_instance_alias | text | not null
app_id | bigint | not null
user_id | bigint | not null

Indexes:
"process_instance_pkey" PRIMARY KEY, btree (process_instance_id)
"*fki_conv_konotor_user_user_id*" btree (user_id) WITH (fillfactor='70')

Regards, Amarendra

On Fri, May 8, 2020 at 12:01 AM Virendra Kumar <viru_7683(at)yahoo(dot)com> wrote:

> Sending table structure with indexes might help little further in
> understanding.
>
> Regards,
> Virendra
>
> On Thursday, May 7, 2020, 11:08:14 AM PDT, Amarendra Konda <
> amar(dot)vijaya(at)gmail(dot)com> wrote:
>
>
> Hi David,
>
> In earlier reply, Over looked another condition, hence please ignore that
> one
>
> Here is the correct one with all the needed conditions. According to the
> latest one, exists also not limiting rows from the process_activity table.
>
>
> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id
> AS pa_process_activity_id FROM process_activity pa WHERE pa.app_id =
> '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND EXISTS (
> SELECT 1 FROM process_instance pi where pi.app_id = pa.app_id AND *pi.process_instance_id
> = pa.process_instance_id * AND pi.user_id = '137074931866340') ORDER BY
> pa.process_instance_id, pa.created limit 50;
>
>
>
> QUERY PLAN
>
>
>
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> -----------------------
> Limit (cost=1079.44..1079.52 rows=32 width=24) (actual
> time=85.747..85.777 rows=50 loops=1)
> Output: pa.process_activity_id, pa.process_instance_id, pa.created
> Buffers: shared hit=43070
> -> Sort (cost=1079.44..1079.52 rows=32 width=24) (actual
> time=85.745..85.759 rows=50 loops=1)
> Output: pa.process_activity_id, pa.process_instance_id, pa.created
> Sort Key: pa.process_instance_id, pa.created
> Sort Method: top-N heapsort Memory: 28kB
> Buffers: shared hit=43070
> -> Nested Loop (cost=1.14..1078.64 rows=32 width=24) (actual
> time=0.025..72.115 rows=47011 loops=1)
> Output: pa.process_activity_id, pa.process_instance_id,
> pa.created
> Buffers: shared hit=43070
> -> Index Scan using fki_conv_konotor_user_user_id on
> public.process_instance pi (cost=0.43..2.66 rows=1 width=16) (actual
> time=0.010..0.015 rows=2 loops=1)
> Output: pi.app_id, pi.process_instance_id
> Index Cond: (c.user_id = '137074931866340'::bigint)
> Filter: (c.app_id = '126502930200650'::bigint)
> Buffers: shared hit=5
> -> Index Scan using
> process_activity_process_instance_id_app_id_created_idx on
> public.process_activity pa (cost=0.70..1061.62 rows=1436 width=32) *(actual
> time=0.011..20.320 rows=23506 loops=2)*
> Output: pa.process_activity_id,
> pa.process_activity_type, pa.voice_url, pa.process_activity_user_id,
> pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source,
> pa.label_category_id, pa.label_id, pa.csat_respons
> e_id, pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id,
> pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments,
> pa.status_fragment, pa.internal_meta, pa.interaction_id,
> pa.do_not_translate, pa.should_tr
> anslate, pa.in_reply_to
> Index Cond: ((m.process_instance_id =
> pi.process_instance_id) AND (m.app_id = '126502930200650'::bigint) AND
> (m.created > '1970-01-01 00:00:00'::timestamp without time zone))
> Buffers: shared hit=43065
> Planning time: 0.455 ms
> Execution time: 85.830 ms
>
> On Thu, May 7, 2020 at 11:19 PM Amarendra Konda <amar(dot)vijaya(at)gmail(dot)com>
> wrote:
>
> Hi David,
>
> Thanks for the reply.This has optimized number of rows.
>
> Can you please explain, why it is getting more columns in output, even
> though we have asked for only one column ?
>
>
> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id
> AS pa_process_activity_id FROM process_activity pa WHERE pa.app_id =
> '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND EXISTS (
> SELECT 1 FROM process_instance pi where pi.app_id = pa.app_id AND
> pi.user_id = '137074931866340') ORDER BY pa.process_instance_id,m.created
> limit 50;
>
>
>
> QUERY PLAN
>
>
>
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> -----------------
> Limit (cost=1.14..37.39 rows=50 width=24) (actual time=821.283..891.629
> rows=50 loops=1)
> Output: pa.process_activity_id, pa.process_instance_id, pa.created
> Buffers: shared hit=274950
> -> Nested Loop Semi Join (cost=1.14..266660108.78 rows=367790473
> width=24) (actual time=821.282..891.607 rows=50 loops=1)
> Output: pa.process_activity_id, pa.process_instance_id, pa.created
> Buffers: shared hit=274950
> -> Index Scan using
> process_activity_process_instance_id_app_id_created_idx on
> public.process_activity pa (cost=0.70..262062725.21 rows=367790473
> width=32) (actual time=821.253..891.517 rows=50 loops=1)
>
>
> * Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url,
> pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias,
> pa.read_by_user, pa.source, pa.label_category_id, pa.label_id,
> pa.csat_response_id, m.process_activity_fragments, pa.created, pa.updated,
> pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments,
> pa.status_fragment, pa.internal_meta, pa.interaction_id,
> pa.do_not_translate, pa.should_translate, pa.in_reply_to*
> Index Cond: ((m.app_id = '126502930200650'::bigint) AND
> (m.created > '1970-01-01 00:00:00'::timestamp without time zone))
> Buffers: shared hit=274946
> -> Materialize (cost=0.43..2.66 rows=1 width=8) (actual
> time=0.001..0.001 rows=1 loops=50)
> Output: pi.app_id
> Buffers: shared hit=4
> -> Index Scan using fki_conv_konotor_user_user_id on
> public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual
> time=0.020..0.020 rows=1 loops=1)
> Output: pi.app_id
> Index Cond: (pi.user_id = '137074931866340'::bigint)
> Filter: (pi.app_id = '126502930200650'::bigint)
> Buffers: shared hit=4
> Planning time: 0.297 ms
> Execution time: 891.686 ms
> (20 rows)
>
> On Thu, May 7, 2020 at 9:17 PM David G. Johnston <
> david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
>
> On Thu, May 7, 2020 at 7:40 AM Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
> wrote:
>
> On 5/7/20 4:19 AM, Amarendra Konda wrote:
> > Hi,
> >
> > PostgreSQL version : PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled
> > by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
> >
> > We have noticed huge difference interms of execution plan ( response
> > time) , When we pass the direct values Vs inner query to IN clause.
> >
> > High level details of the use case are as follows
> >
> > * As part of the SQL there are 2 tables named Process_instance
> > (master) and Process_activity ( child)
> > * Wanted to fetch TOP 50 rows from Process_activity table for the
> > given values of the Process_instance.
> > * When we used Inner Join / Inner query ( query1) between parent
> > table and child table , LIMIT is not really taking in to account.
> > Instead it is fetching more rows and columns that required, and
> > finally limiting the result
>
> It is doing what you told it to do which is SELECT all
> process_instance_i's for user_id='317079413683604' and app_id =
> '427380312000560' and then filtering further. I am going to guess that
> if you run the inner query alone you will find it returns ~23496 rows.
> You might have better results if you an actual join between
> process_activity and process_instance. Something like below(obviously
> not tested):
>
>
> What the OP seems to want is a semi-join:
>
> (not tested)
>
> SELECT pa.process_activity_id
> FROM process_activity pa WHERE pa.app_id = '427380312000560' AND
> pa.created > '1970-01-01 00:00:00'
> AND EXISTS (
> SELECT 1 FROM process_instance pi WHERE pi.app_id = pa.app_id AND
> pi.user_id = '317079413683604'
> )
> ORDER BY
> pa.process_instance_id,
> pa.created limit 50;
>
> I'm unsure exactly how this will impact the plan choice but it should be
> an improvement, and in any case more correctly defines what it is you are
> looking for.
>
> David J.
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tory M Blue 2020-05-07 18:55:01 Odd postgres12 upgrade is changing or munging a password?
Previous Message Michael Lewis 2020-05-07 18:32:32 Re: AutoVacuum and growing transaction XID's

Browse pgsql-performance by date

  From Date Subject
Next Message Adrian Klaver 2020-05-07 19:25:15 Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNER Query )
Previous Message Michael Lewis 2020-05-07 18:32:32 Re: AutoVacuum and growing transaction XID's