Re: Postgresql RDS DB Latency Choosing Hash join Plan

From: github kran <githubkran(at)gmail(dot)com>
To: Michael Lewis <mlewis(at)entrata(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Postgresql RDS DB Latency Choosing Hash join Plan
Date: 2019-02-14 22:11:06
Message-ID: CACaZr5Sm3LT8QHza16=hPY3OkA6u3SzA82k=pX5bUV0Z-8hm2A@mail.gmail.com
Lists: pgsql-general pgsql-sql

On Thu, Feb 14, 2019 at 12:43 PM Michael Lewis <mlewis(at)entrata(dot)com> wrote:

> How many total rows in these tables? I am assuming these are partitions
> and those 100 million rows got distributed. If the data difference is
> significant, then you aren't guaranteed similar performance. You may want
> to follow more of the suggested steps at:
>
> https://wiki.postgresql.org/wiki/SlowQueryQuestions
>
>
> Michael Lewis | Software Engineer
> Entrata
> c: 619.370.8697
>
Michael - Yes, correct, the data of 100 million rows is distributed across
all the partitions.
Finally I feel we have come to a conclusion: after we changed
random_page_cost from 4 to 2 on the Non prod instance, we see improvements
in the query, which used to take 1 minute and now takes 1-2 seconds.
That's correct, we have around 490 million rows in a few of our partition
tables. A new partition table is created every 7 days.

We ran our API test, which hits the backend Aurora RDS PostgreSQL database,
and our query response times and requests/sec are better than before. Do
you recommend this setting on a Production instance? Right now we are
planning to implement this option of setting random_page_cost to 2. (We
know it also works if we turn off enable_hashjoin, but we don't want to
make that change and would rather set random_page_cost to 2.)
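
For context, here is roughly how the setting can be tried per session and
compared before changing it at the cluster parameter group level. This is
only a sketch; the table and database names below are illustrative, not our
real objects:

-- Try the lower cost for a single session and compare the plans first
SET random_page_cost = 2;
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM asset_info_2019_2_part4 LIMIT 100;  -- substitute the actual slow query

-- If it helps, persist it per database rather than for the whole instance
ALTER DATABASE mydb SET random_page_cost = 2;

-- Confirm the effective value
SHOW random_page_cost;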

Questions:
1) What is your recommendation on this? Can we make this change on the Prod
instance, which is performing better today, or should we keep this change
only on the Non prod instance? (Actually we want to implement this change
on the Non Prod instance.)
2) Are there any downsides to changing this value?
3) Another question: we have 64 TB of storage and are running on an
r4.16xlarge EC2 instance. A few of our partitions are bigger and a few of
them are smaller. We have data from the last 9 months and we are planning
to keep the data for close to 1 year, until May. Do you see any concerns
with this? Eventually we are thinking of archiving this data in the next
2 months by dropping the older partitions (sketched below).
4) What could be the problems with keeping the data longer if there is
regular maintenance like VACUUM and other maintenance activities?
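
For question 3, a sketch of the kind of size check and partition drop we
mean; the relation name pattern and the dropped partition below are
hypothetical examples, not our real object names:

-- Check which weekly partitions are the largest before deciding on retention
SELECT c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
WHERE c.relkind = 'r'
  AND c.relname LIKE 'location_data_%'   -- hypothetical name pattern
ORDER BY pg_total_relation_size(c.oid) DESC;

-- Archiving would then be dropping partitions that fall outside the
-- retention window, e.g. (hypothetical partition name):
-- DROP TABLE location_data_2018_05_part1;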

>
> On Thu, Feb 14, 2019 at 8:48 AM github kran <githubkran(at)gmail(dot)com> wrote:
>
>>
>>
>> On Wed, Feb 13, 2019 at 11:38 AM Michael Lewis <mlewis(at)entrata(dot)com>
>> wrote:
>>
>>> I didn't see your email yesterday, sorry about that. Index scans instead
>>> of sequential scans and nested loop instead of hash join means that you
>>> have bad row count estimates on "Non prod Aurora RDS instance" as far as I
>>> can figure. Have you run commands like-
>>>
>>> analyze asset_info_2019_2_part4;
>>> analyze asset_info_2019_2_part2;
>>>
>>> etc? If data are very similar, indexes all exist, and
>>> default_statistics_target are the same, then you should be getting the same
>>> plans.
>>>
>>>
>>> Michael Lewis | Software Engineer
>>> Entrata
>>> c: 619.370.8697
>>>
>> 1) Yes, we did VACUUM for all the tables like asset_info_2019_2_part1,
>> part2, part4 and also for location_info_xxx to remove any dead tuples, and
>> also rebuilt the indexes.
>> 2) REINDEX TABLE location_data_2018_12_part4; (likewise ran this for all
>> the tables, and also VACUUM for all the tables).
>> 3) The data in the Non prod instance is more. One thing to mention here:
>> when we built the Non prod instance we copied a SNAPSHOT from the Prod
>> instance and on top of that inserted about 100 million rows of data, then
>> did VACUUM and re-indexed the tables.
>>
>> I can't think of anything else we can do here, but let us know if you need
>> any more details on this problem. I am happy to share more details.
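
(A side note on the ANALYZE question above: one way to double-check that
statistics are current and comparable on both instances is something like
the sketch below; the table names are the ones from this thread and the
queries are illustrative only.)

-- When were these tables last (auto)vacuumed and (auto)analyzed?
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('asset_info_2019_2_part2', 'asset_info_2019_2_part4');

-- Compare the statistics target both instances are using
SHOW default_statistics_target;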
>>
>>
>>>
>>> On Wed, Feb 13, 2019 at 8:49 AM github kran <githubkran(at)gmail(dot)com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Feb 12, 2019 at 12:55 PM github kran <githubkran(at)gmail(dot)com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Feb 12, 2019 at 12:33 PM Michael Lewis <mlewis(at)entrata(dot)com>
>>>>> wrote:
>>>>>
>>>>>> Did you update the stats by running ANALYZE on the tables involved,
>>>>>> or perhaps the entire database on the 'Non prod Aurora RDS instance'? Can
>>>>>> you share the two execution plans?
>>>>>>
>>>>>> Michael Lewis | Software Engineer
>>>>>> Entrata
>>>>>> c: 619.370.8697
>>>>>>
>>>>>
>>>>> Here are the plans for both of the DB instances.
>>>>>
>>>>
>>>> Michael - Did you get a chance to look at this issue? Also, we see
>>>> that a sequential scan is being done instead of an index scan.
>>>>
>>>>>
>>>>>>
>>>>>> On Tue, Feb 12, 2019 at 11:27 AM github kran <githubkran(at)gmail(dot)com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Feb 12, 2019 at 7:07 AM github kran <githubkran(at)gmail(dot)com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Feb 11, 2019 at 6:00 PM github kran <githubkran(at)gmail(dot)com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Feb 11, 2019 at 3:29 PM Michael Lewis <mlewis(at)entrata(dot)com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Are default statistics targets the same on both prod and AWS? Have
>>>>>>>>>> you analyzed all tables being used in this query to ensure stats are
>>>>>>>>>> up to date? If the optimizer is choosing a different plan, then the stats must
>>>>>>>>>> be different IMO.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Michael Lewis | Software Engineer
>>>>>>>>>> Entrata
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks for your reply. I have verified a few of the tables and their
>>>>>>>>> default statistics targets and they seem to be the same, but is there
>>>>>>>>> anything in particular you want me to look at to differentiate the Prod
>>>>>>>>> and Non prod databases? (Also, the DB instance size is the same, but
>>>>>>>>> there is a little more data in the Non prod Aurora RDS instance compared
>>>>>>>>> to the Prod instance.)
>>>>>>>>>
>>>>>>>>> Query used.
>>>>>>>>>> select * from pg_stats where tablename = 'tableName'
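
(A more targeted comparison between the two instances could look at the
row-estimate inputs for the join columns, for example as in this sketch;
'asset_id' is a hypothetical column name:)

select tablename, attname, n_distinct, correlation, null_frac
from pg_stats
where tablename = 'asset_info_2019_2_part4'
  and attname = 'asset_id';   -- hypothetical join column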
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Feb 11, 2019 at 2:15 PM github kran <githubkran(at)gmail(dot)com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Postgres Team,
>>>>>>>>>>> We are using the AWS Aurora RDS PostgreSQL DB 9.6.8 engine. We
>>>>>>>>>>> recently deleted a few million rows from the database and ran into an
>>>>>>>>>>> issue in one of our dev accounts where the DB was not normal after this
>>>>>>>>>>> deletion. We did re-indexing and vacuuming of the entire database, but
>>>>>>>>>>> we couldn't bring it back to the same state as earlier. So, as next
>>>>>>>>>>> steps, we deleted the database and recreated it by copying the snapshot
>>>>>>>>>>> from a production instance, and further did vacuuming and re-indexing
>>>>>>>>>>> on the database.
>>>>>>>>>>>
>>>>>>>>>>> After this the dev database seems to be in a better state than earlier,
>>>>>>>>>>> but we are seeing a few of our DB calls take more than 1 minute when
>>>>>>>>>>> fetching data, and we observed this is because the query plan is
>>>>>>>>>>> executing a hash join as part of the query, whereas a similar query on
>>>>>>>>>>> the prod instance is not doing any hash join and returns faster.
>>>>>>>>>>>
>>>>>>>>>>> Also, we did not want to experiment by modifying the DB settings, such
>>>>>>>>>>> as setting enable_hashjoin to off or random_page_cost to 1, as we don't
>>>>>>>>>>> have these settings on the Prod instance.
>>>>>>>>>>>
>>>>>>>>>>> Note:
>>>>>>>>>>> The partition table sizes we have here are between 40 GB and 75 GB,
>>>>>>>>>>> which is our normal size range; we have a new partition table every
>>>>>>>>>>> 7 days.
>>>>>>>>>>>
>>>>>>>>>>> Appreciate your ideas on what we could be missing and what we can
>>>>>>>>>>> correct here to reduce the query latency.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> githubKran
>>>>>>>>>>>
>>>>>>>>>>
