From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Arya F <arya6000(at)gmail(dot)com>
Cc: Michael Lewis <mlewis(at)entrata(dot)com>, Pgsql Performance <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: 600 million rows of data. Bad hardware or need partitioning?
Date: 2020-05-06 01:37:41
Message-ID: 20200506013741.GH28974@telsasoft.com
Lists: pgsql-performance
On Tue, May 05, 2020 at 08:31:29PM -0400, Arya F wrote:
> On Mon, May 4, 2020 at 5:21 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
>
> > I mentioned in February and March that you should plan to set shared_buffers
> > to fit the indexes currently being updated.
>
> The following command gives me:
>
> select pg_size_pretty(pg_indexes_size('test_table'));
>  pg_size_pretty
> ----------------
>  5216 MB
>
> So right now, the indexes on that table are taking about 5.2 GB, if a
> machine has 512 GB of RAM and SSDs, is it safe to assume I can achieve
> the same update that takes 1.5 minutes in less than 5 seconds while
> having 600 million rows of data without partitioning?
I am not prepared to guarantee server performance.
But, to my knowledge, you haven't configured shared_buffers at all, which I
think may be the single most important setting for loading speed (with
indexes).
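For reference, a sketch of how to check and raise the setting; the 2 GB value here is purely illustrative, not a recommendation for your machine:

```sql
-- Check the current setting:
SHOW shared_buffers;

-- Raise it; takes effect only after a server restart:
ALTER SYSTEM SET shared_buffers = '2GB';
```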
A couple of months ago, you said your server had 4 GB RAM, which isn't much;
but if shared_buffers is still ~100 MB, I think that deserves attention.
If you get good performance with a million rows and 32 MB buffers, then you
could reasonably hope to get good performance (at least initially) with
100 million rows and 320 MB buffers. Scale that up to whatever you expect your
index size to be. Be conservative, since you may need to add indexes later, and
you can expect they'll become bloated, so you may want to run a reindex job.
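Such a reindex job could be as simple as the following; the index name is hypothetical, and REINDEX ... CONCURRENTLY requires v12 or later (on older versions you'd need to take a lock or rebuild via CREATE INDEX CONCURRENTLY plus a swap):

```sql
-- Rebuild one bloated index without blocking writes:
REINDEX INDEX CONCURRENTLY test_table_some_idx;

-- Or rebuild all indexes on the table:
REINDEX TABLE CONCURRENTLY test_table;
```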
shared_buffers is frequently set to ~25% of RAM, and if you need to efficiently
use indexes larger than what that supports, then you should add RAM, or
implement partitioning.
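As a quick sanity check, you can compare the two numbers side by side, something like:

```sql
-- Compare total index size against the configured shared_buffers:
SELECT pg_size_pretty(pg_indexes_size('test_table')) AS index_size,
       current_setting('shared_buffers') AS shared_buffers;
```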
--
Justin