Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNER Query )

From: Amarendra Konda <amar(dot)vijaya(at)gmail(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNER Query )
Date: 2020-05-07 18:07:42
Message-ID: CAJNAD0=-3TO_=E+ABSwAcrZw5rDUw+wQ3R8SyVYkrYVkGZwLeg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance

Hi David,

In earlier reply, Over looked another condition, hence please ignore that
one

Here is the correct one with all the needed conditions. According to the
latest one, exists also not limiting rows from the process_activity table.

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id
AS pa_process_activity_id FROM process_activity pa WHERE pa.app_id =
'126502930200650' AND pa.created > '1970-01-01 00:00:00' AND EXISTS (
SELECT 1 FROM process_instance pi where pi.app_id = pa.app_id AND
*pi.process_instance_id
= pa.process_instance_id * AND pi.user_id = '137074931866340') ORDER BY
pa.process_instance_id, pa.created limit 50;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------
Limit (cost=1079.44..1079.52 rows=32 width=24) (actual
time=85.747..85.777 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43070
-> Sort (cost=1079.44..1079.52 rows=32 width=24) (actual
time=85.745..85.759 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Sort Key: pa.process_instance_id, pa.created
Sort Method: top-N heapsort Memory: 28kB
Buffers: shared hit=43070
-> Nested Loop (cost=1.14..1078.64 rows=32 width=24) (actual
time=0.025..72.115 rows=47011 loops=1)
Output: pa.process_activity_id, pa.process_instance_id,
pa.created
Buffers: shared hit=43070
-> Index Scan using fki_conv_konotor_user_user_id on
public.process_instance pi (cost=0.43..2.66 rows=1 width=16) (actual
time=0.010..0.015 rows=2 loops=1)
Output: pi.app_id, pi.process_instance_id
Index Cond: (c.user_id = '137074931866340'::bigint)
Filter: (c.app_id = '126502930200650'::bigint)
Buffers: shared hit=5
-> Index Scan using
process_activity_process_instance_id_app_id_created_idx on
public.process_activity pa (cost=0.70..1061.62 rows=1436 width=32) *(actual
time=0.011..20.320 rows=23506 loops=2)*
Output: pa.process_activity_id,
pa.process_activity_type, pa.voice_url, pa.process_activity_user_id,
pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source,
pa.label_category_id, pa.label_id, pa.csat_respons
e_id, pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id,
pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments,
pa.status_fragment, pa.internal_meta, pa.interaction_id,
pa.do_not_translate, pa.should_tr
anslate, pa.in_reply_to
Index Cond: ((m.process_instance_id =
pi.process_instance_id) AND (m.app_id = '126502930200650'::bigint) AND
(m.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=43065
Planning time: 0.455 ms
Execution time: 85.830 ms

On Thu, May 7, 2020 at 11:19 PM Amarendra Konda <amar(dot)vijaya(at)gmail(dot)com>
wrote:

> Hi David,
>
> Thanks for the reply.This has optimized number of rows.
>
> Can you please explain, why it is getting more columns in output, even
> though we have asked for only one column ?
>
>
> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id
> AS pa_process_activity_id FROM process_activity pa WHERE pa.app_id =
> '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND EXISTS (
> SELECT 1 FROM process_instance pi where pi.app_id = pa.app_id AND
> pi.user_id = '137074931866340') ORDER BY pa.process_instance_id,m.created
> limit 50;
>
>
>
> QUERY PLAN
>
>
>
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> -----------------
> Limit (cost=1.14..37.39 rows=50 width=24) (actual time=821.283..891.629
> rows=50 loops=1)
> Output: pa.process_activity_id, pa.process_instance_id, pa.created
> Buffers: shared hit=274950
> -> Nested Loop Semi Join (cost=1.14..266660108.78 rows=367790473
> width=24) (actual time=821.282..891.607 rows=50 loops=1)
> Output: pa.process_activity_id, pa.process_instance_id, pa.created
> Buffers: shared hit=274950
> -> Index Scan using
> process_activity_process_instance_id_app_id_created_idx on
> public.process_activity pa (cost=0.70..262062725.21 rows=367790473
> width=32) (actual time=821.253..891.517 rows=50 loops=1)
>
>
> * Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url,
> pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias,
> pa.read_by_user, pa.source, pa.label_category_id, pa.label_id,
> pa.csat_response_id, m.process_activity_fragments, pa.created, pa.updated,
> pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments,
> pa.status_fragment, pa.internal_meta, pa.interaction_id,
> pa.do_not_translate, pa.should_translate, pa.in_reply_to*
> Index Cond: ((m.app_id = '126502930200650'::bigint) AND
> (m.created > '1970-01-01 00:00:00'::timestamp without time zone))
> Buffers: shared hit=274946
> -> Materialize (cost=0.43..2.66 rows=1 width=8) (actual
> time=0.001..0.001 rows=1 loops=50)
> Output: pi.app_id
> Buffers: shared hit=4
> -> Index Scan using fki_conv_konotor_user_user_id on
> public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual
> time=0.020..0.020 rows=1 loops=1)
> Output: pi.app_id
> Index Cond: (pi.user_id = '137074931866340'::bigint)
> Filter: (pi.app_id = '126502930200650'::bigint)
> Buffers: shared hit=4
> Planning time: 0.297 ms
> Execution time: 891.686 ms
> (20 rows)
>
> On Thu, May 7, 2020 at 9:17 PM David G. Johnston <
> david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
>
>> On Thu, May 7, 2020 at 7:40 AM Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
>> wrote:
>>
>>> On 5/7/20 4:19 AM, Amarendra Konda wrote:
>>> > Hi,
>>> >
>>> > PostgreSQL version : PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled
>>> > by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
>>> >
>>> > We have noticed huge difference interms of execution plan ( response
>>> > time) , When we pass the direct values Vs inner query to IN clause.
>>> >
>>> > High level details of the use case are as follows
>>> >
>>> > * As part of the SQL there are 2 tables named Process_instance
>>> > (master) and Process_activity ( child)
>>> > * Wanted to fetch TOP 50 rows from Process_activity table for the
>>> > given values of the Process_instance.
>>> > * When we used Inner Join / Inner query ( query1) between parent
>>> > table and child table , LIMIT is not really taking in to account.
>>> > Instead it is fetching more rows and columns that required, and
>>> > finally limiting the result
>>>
>>> It is doing what you told it to do which is SELECT all
>>> process_instance_i's for user_id='317079413683604' and app_id =
>>> '427380312000560' and then filtering further. I am going to guess that
>>> if you run the inner query alone you will find it returns ~23496 rows.
>>> You might have better results if you an actual join between
>>> process_activity and process_instance. Something like below(obviously
>>> not tested):
>>>
>>
>> What the OP seems to want is a semi-join:
>>
>> (not tested)
>>
>> SELECT pa.process_activity_id
>> FROM process_activity pa WHERE pa.app_id = '427380312000560' AND
>> pa.created > '1970-01-01 00:00:00'
>> AND EXISTS (
>> SELECT 1 FROM process_instance pi WHERE pi.app_id = pa.app_id AND
>> pi.user_id = '317079413683604'
>> )
>> ORDER BY
>> pa.process_instance_id,
>> pa.created limit 50;
>>
>> I'm unsure exactly how this will impact the plan choice but it should be
>> an improvement, and in any case more correctly defines what it is you are
>> looking for.
>>
>> David J.
>>
>>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Julien Rouhaud 2020-05-07 18:22:00 Re: increase of xact_commit vs txid_current
Previous Message Amarendra Konda 2020-05-07 17:49:18 Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNER Query )

Browse pgsql-performance by date

  From Date Subject
Next Message github kran 2020-05-07 18:23:04 AutoVacuum and growing transaction XID's
Previous Message Amarendra Konda 2020-05-07 17:49:18 Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNER Query )