RE: [EXTERNAL] Re: performance expectations for table(s) with 2B recs

From: "Godfrin, Philippe E" <Philippe(dot)Godfrin(at)nov(dot)com>
To: "Peter J(dot) Holzer" <hjp-pgsql(at)hjp(dot)at>, "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: RE: [EXTERNAL] Re: performance expectations for table(s) with 2B recs
Date: 2021-12-13 16:34:39
Message-ID: SA0PR15MB39334214C78D456BD112A39682749@SA0PR15MB3933.namprd15.prod.outlook.com
Lists: pgsql-general

>
>
>From: Peter J. Holzer <hjp-pgsql(at)hjp(dot)at>
>Sent: Friday, December 10, 2021 3:43 PM
>To: pgsql-general(at)lists(dot)postgresql(dot)org
>Subject: Re: [EXTERNAL] Re: performance expectations for table(s) with 2B recs
>
>On 2021-12-10 18:04:07 +0000, Godfrin, Philippe E wrote:
>> >But in my experience the biggest problem with large tables are unstable
>> >execution plans - for most of the parameters the optimizer will choose
>> >to use an index, but for some it will erroneously think that a full
>> >table scan is faster. That can lead to a situation where a query
>> >normally takes less than a second, but sometimes (seemingly at random)
>> >it takes several minutes
>[...]
>> For Peter I have a question. What exactly causes ‘unstable execution plans’ ??
>>
>> Besides not using bind variables, bad statistics, would you elaborate
>> in what would contribute to that instability?
>
>Not using bind variables and bad statistics are certainly big factors:
>
>On one hand not using bind variables gives a lot more information to the
>optimizer, so it can choose a better plan at run time. On the other hand
>that makes it hard to predict which plan it will choose.
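[Editor's note: a minimal sketch of the bind-variable trade-off described above, using a hypothetical orders table. The PREPARE/EXECUTE syntax and the plan_cache_mode setting (PostgreSQL 12+) are real; the table and values are made up.]

```sql
-- With a literal, the planner sees the actual value and can pick
-- the best plan for it. With a bind parameter it may not:
PREPARE get_orders(int) AS
    SELECT * FROM orders WHERE customer_id = $1;

-- The first few executions are planned with the supplied value
-- (custom plans); afterwards PostgreSQL may switch to a cached
-- generic plan that ignores the value entirely.
EXECUTE get_orders(42);

-- Since PostgreSQL 12 this behaviour can be pinned down:
SET plan_cache_mode = force_custom_plan;  -- re-plan for each value
```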
>
>Bad statistics come in many flavours: They might just be wrong, that's
>usually easy to fix. More problematic are statistics which just don't
>describe reality very well - they may not show a correlation, causing
>the optimizer to assume that two distributions are independent when they
>really aren't (since PostgreSQL 10 you can create statistics on multiple
>columns which helps in many but not all cases) or not show some other
>peculiarity of the data. Or they may sit so close to a tipping
>point that a small change causes the optimizer to choose a wildly
>different plan.
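[Editor's note: the multi-column statistics mentioned above (available since PostgreSQL 10) look like this; the addresses table and its correlated columns are hypothetical.]

```sql
-- city and zip_code are strongly correlated, but per-column
-- statistics make the planner treat them as independent, so it
-- underestimates rows for "WHERE city = ... AND zip_code = ...".
CREATE STATISTICS addr_stats (dependencies, ndistinct)
    ON city, zip_code FROM addresses;
ANALYZE addresses;

-- If the per-column histogram itself is too coarse, the sample
-- size can be raised for an individual column:
ALTER TABLE addresses ALTER COLUMN zip_code SET STATISTICS 1000;
ANALYZE addresses;
```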
>
>Another source is dynamically generated SQL. Your application may just
>put together SQL from fragments or it might use something like
>SQLalchemy or an ORM. In any of these cases what looks like one query
>from a user's perspective may really be a whole family of related
>queries - and PostgreSQL will try to find the optimal plan for each of
>them. Which is generally a good thing, but it adds opportunities to mess
>up.
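[Editor's note: an illustration of the "family of queries" point above. The orders table is hypothetical; auto_explain is a real contrib module that logs the plan actually chosen at run time, which helps catch the slow member of such a family.]

```sql
-- An ORM may emit either of these for what the user sees as one
-- "find recent orders" screen, depending on which filters were
-- supplied -- and each variant gets its own plan:
SELECT * FROM orders WHERE created_at > now() - interval '1 day';
SELECT * FROM orders WHERE created_at > now() - interval '1 day'
                       AND status = 'open';

-- Log the real plan of any statement slower than 500 ms:
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '500ms';
```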
>
>hp
>
>--
>   _  | Peter J. Holzer    | Story must make more sense than reality.
>|_|_) |                    |
>| |   | hjp(at)hjp(dot)at  |    -- Charles Stross, "Creative writing
>__/   | http://www.hjp.at/ |       challenge!"

Good answer, Peter. I agree wholeheartedly. I was curious whether there was something specific to PostgreSQL.
phil
