Re: perf tuning for 28 cores and 252GB RAM

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Michael Curry <curry(at)cs(dot)umd(dot)edu>
Cc: pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: perf tuning for 28 cores and 252GB RAM
Date: 2019-06-17 23:45:41
Message-ID: CAMkU=1wK1GJi7_pFV_jLxrhUyNzaUvVB3vvv=sSNdKD7O5fbMg@mail.gmail.com
Lists: pgsql-general

On Mon, Jun 17, 2019 at 4:51 PM Michael Curry <curry(at)cs(dot)umd(dot)edu> wrote:

> I am using a Postgres instance in an HPC cluster, where they have
> generously given me an entire node. This means I have 28 cores and 252GB
> RAM. I have to assume that the very conservative default settings for
> things like buffers and max working memory are too small here.
>
> We have about 20 billion rows in a single large table.
>

What is that in bytes? Do you only have that one table?
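
If you aren't sure, something along these lines will tell you (your_big_table
here is just a placeholder for the real table name):

    -- table size including its indexes and TOAST data
    SELECT pg_size_pretty(pg_total_relation_size('your_big_table'));
    -- size of the whole database
    SELECT pg_size_pretty(pg_database_size(current_database()));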

> The database is not intended to run an application but rather to allow a
> few individuals to do data analysis, so we can guarantee the number of
> concurrent queries will be small, and that nothing else will need to use
> the server. Creating multiple different indices on a few subsets of the
> columns will be needed to support the kinds of queries we want.
>
> What settings should be changed to maximize performance?
>

With 28 cores for only a few users, parallelization will probably be
important. That feature is fairly new to PostgreSQL and rapidly improving
from version to version, so you will want to use the latest version you can
(v11). And then increase the values for max_worker_processes,
max_parallel_maintenance_workers, max_parallel_workers_per_gather, and
max_parallel_workers. With the potential for so many parallel workers
running at once, you wouldn't want to go overboard on work_mem; maybe 2GB.
If you don't think all allowed users will be running large queries at the
same time (because they are mostly thinking about what query to run, or
thinking about the results of the last one they ran, rather than actually
running queries), then you could go higher than that.
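
As a rough sketch, the relevant postgresql.conf lines might look something
like this; the numbers are only starting points to adjust against your
actual workload, not recommendations:

    max_worker_processes = 28             # upper bound on background workers
    max_parallel_workers = 28             # pool available to parallel queries
    max_parallel_workers_per_gather = 8   # workers a single query node can use
    max_parallel_maintenance_workers = 8  # e.g. parallel CREATE INDEX
    work_mem = 2GB                        # per sort/hash, per worker -- hence the caution above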

If your entire database can comfortably fit in RAM, I would make
shared_buffers large enough to hold the entire database. If not, I would
set the value small (say, 8GB) and let the OS do the heavy lifting of
deciding what to keep in cache. If you go with the first option, you
probably want to use pg_prewarm after each restart to get the data into
cache as fast as you can, rather than letting it get loaded in naturally as
you run queries. Also, you would probably want to set random_page_cost and
seq_page_cost quite low, like maybe 0.1 and 0.05.
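
For the fits-in-RAM option, a sketch of both pieces (the 200GB figure and
the object names are placeholders; pg_prewarm ships as a contrib extension):

    # postgresql.conf
    shared_buffers = 200GB      # sized to hold the whole database
    random_page_cost = 0.1      # tell the planner reads are effectively cached
    seq_page_cost = 0.05

    -- run after each restart, once per table or index you want warmed
    CREATE EXTENSION IF NOT EXISTS pg_prewarm;
    SELECT pg_prewarm('your_big_table');
    SELECT pg_prewarm('your_big_table_some_idx');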

You haven't described what kind of IO capacity and setup you have; knowing
that could suggest other changes to make. Also, seeing the results of
`explain (analyze, buffers)`, especially with track_io_timing turned on,
for some actual queries could provide good insight into what else might need
changing.
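
For example, if you have superuser rights (otherwise set track_io_timing in
postgresql.conf and reload):

    SET track_io_timing = on;
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT ...;   -- one of your actual analysis queries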

Cheers,

Jeff
