RE: [EXTERNAL] Re: performance expectations for table(s) with 2B recs

From: "Godfrin, Philippe E" <Philippe(dot)Godfrin(at)nov(dot)com>
To: "Peter J(dot) Holzer" <hjp-pgsql(at)hjp(dot)at>, "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: RE: [EXTERNAL] Re: performance expectations for table(s) with 2B recs
Date: 2021-12-10 18:04:07
Message-ID: SA0PR15MB393324C6A5522A80913DC42582719@SA0PR15MB3933.namprd15.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


On 2021-12-08 14:44:47 -0500, David Gauthier wrote:
> So far, the tables I have in my DB have relatively low numbers of records (most
> are < 10K, all are < 10M). Things have been running great in terms of
> performance. But a project is being brainstormed which may require some tables
> to contain a couple billion records.
[...]
> What else should I be worried about ?
>
> I suspect that part of why things are running really well so far is that the
> relatively small amounts of data in these tables ends up in the DB cache and
> disk I/O is kept at a minimum. Will that no longer be the case once queries
> start running on these big tables ?

>
>Depends a lot on how good the locality of your queries is. If most read
>only the same parts of the same indexes, those will still be in the
>cache. If they are all over the place or if you have queries which need
>to read large parts of your tables, cache misses will make your
>performance a lot less predictable, yes. That stuff is also hard to
>test, because when you are testing a query twice in a row, the second
>time it will likely hit the cache and be quite fast.
>
>But in my experience the biggest problem with large tables are unstable
>execution plans - for most of the parameters the optimizer will choose
>to use an index, but for some it will erroneously think that a full
>table scan is faster. That can lead to a situation where a query
>normally takes less than a second, but sometimes (seemingly at random)
>it takes several minutes - users will understandably be upset about such
>behaviour. It is in any case a good idea to monitor execution times to
>find such problems (ideally before users complain), but each needs to be
>treated on an individual basis, and sometimes there seems to be no good
>solution.

To the OP, that's is a tall order to answer - basically that's wjhy DBA's still have
Jobs...

For Peter I have a question. What exactly causes 'unstable execution plans' ??

Besides not using bind variables, bad statistics, would you elaborate in what would
contribute to that instability?

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Michael Lewis 2021-12-10 20:06:20 Re: Postgresql + containerization possible use case
Previous Message Adrian Klaver 2021-12-10 16:01:20 Re: Postgresql + containerization possible use case