Re: Postgresql RDS DB Latency Chossing Hash join Plan

From: Michael Lewis <mlewis(at)entrata(dot)com>
To: github kran <githubkran(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Postgresql RDS DB Latency Chossing Hash join Plan
Date: 2019-02-13 17:37:53
Message-ID: CAHOFxGory5zhmF6bpM8dtHzM4DJkvoXV37xwpJMRPgNtszRQcA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-sql

I didn't see your email yesterday, sorry about that. Index scans instead of
sequential scans and nested loop instead of hash join means that you have
bad row count estimates on "Non prod Aurora RDS instance" as far as I can
figure. Have you run commands like-

analyze asset_info_2019_2_part4;
analyze asset_info_2019_2_part2;

etc? If data are very similar, indexes all exist, and
default_statistics_target are the same, then you should be getting the same
plans.

*Michael Lewis | Software Engineer*
*Entrata*
*c: **619.370.8697 <619-370-8697>*

On Wed, Feb 13, 2019 at 8:49 AM github kran <githubkran(at)gmail(dot)com> wrote:

>
>
> On Tue, Feb 12, 2019 at 12:55 PM github kran <githubkran(at)gmail(dot)com> wrote:
>
>>
>>
>> On Tue, Feb 12, 2019 at 12:33 PM Michael Lewis <mlewis(at)entrata(dot)com>
>> wrote:
>>
>>> Did you update the stats by running ANALYZE on the tables involved, or
>>> perhaps the entire database on the 'Non prod Aurora RDS instance'? Can you
>>> share the two execution plans?
>>>
>>> *Michael Lewis | Software Engineer*
>>> *Entrata*
>>> *c: **619.370.8697 <619-370-8697>*
>>>
>>
>> Here is the plan for both of the DB instances.
>>
>
> Michael - Did you get any chance to look at this issue. Also we see there
> is a sequential scan being done instead of index scan.
>
>>
>>>
>>> On Tue, Feb 12, 2019 at 11:27 AM github kran <githubkran(at)gmail(dot)com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Feb 12, 2019 at 7:07 AM github kran <githubkran(at)gmail(dot)com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Mon, Feb 11, 2019 at 6:00 PM github kran <githubkran(at)gmail(dot)com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Feb 11, 2019 at 3:29 PM Michael Lewis <mlewis(at)entrata(dot)com>
>>>>>> wrote:
>>>>>>
>>>>>>> Are default statistics target the same on both prod and AWS? Have
>>>>>>> you analyzed all tables being used in this query to ensure stats are up
>>>>>>> proper? If the optimizer is choosing a different plan, then the stats must
>>>>>>> be different IMO.
>>>>>>>
>>>>>>>
>>>>>>> *Michael Lewis | Software Engineer*
>>>>>>> *Entrata*
>>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks for your reply I have verified few of the tables and their
>>>>>> default statistics target and they seem to be same but is there anything in
>>>>>> particular you want me to look at it to differentiate Prod and Non prod
>>>>>> databases ?. ( Also the DB instance size is same but there is little
>>>>>> more data in the Non prod Aurora RDS instance compared to Prod
>>>>>> instance).
>>>>>>
>>>>>> Query used.
>>>>>>> select * from pg_stats where tablename = 'tableName'
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Feb 11, 2019 at 2:15 PM github kran <githubkran(at)gmail(dot)com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Postgres Team,
>>>>>>>> We are using AWS Aurora RDS Postgres DB 9.6.8 Engine. We recently
>>>>>>>> deleted few million rows from the database and ran into a issue in one of
>>>>>>>> our dev account where the
>>>>>>>> DB was not normal after this deletion. We did re index, vacuuming
>>>>>>>> entire database but we couldnt bring it to the same state as earlier. So
>>>>>>>> next steps we deleted the database and
>>>>>>>> recreated the database by copying the snapshot from a production
>>>>>>>> instance. Further did vacumming, re-index on the database.
>>>>>>>>
>>>>>>>> After this now the dev database seems to be in a better state than
>>>>>>>> earlier but we are seeing few of our DB calls are taking more than 1 minute
>>>>>>>> when we are fetching data and we observed
>>>>>>>> this is because the query plan was executing a hash join as part of
>>>>>>>> the query whereas a similar query on prod instance is not doing any hash
>>>>>>>> join and is returning faster.
>>>>>>>>
>>>>>>>> Also we did not want to experiment by modifing the DB settings by
>>>>>>>> doing enable_hash_join to off or random_page_count to 1 as we dont have
>>>>>>>> these settings in Prod instance.
>>>>>>>>
>>>>>>>> Note:
>>>>>>>> The partition table sizes we have here is between 40 GB to 75 GB
>>>>>>>> and this is our normal size range, we have a new partition table for every
>>>>>>>> 7 days.
>>>>>>>>
>>>>>>>> Appreciate your ideas on what we could be missing and what we can
>>>>>>>> correct here to reduce the query latency.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> githubKran
>>>>>>>>
>>>>>>>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Andreas Kretschmer 2019-02-13 17:38:15 Re: Out of memory: Kill process nnnn (postmaster) score nn or sacrifice child
Previous Message Adrian Klaver 2019-02-13 17:29:25 Re: Subquery to select max(date) value

Browse pgsql-sql by date

  From Date Subject
Next Message github kran 2019-02-14 15:47:47 Re: Postgresql RDS DB Latency Chossing Hash join Plan
Previous Message github kran 2019-02-13 15:49:38 Re: Postgresql RDS DB Latency Chossing Hash join Plan