Re: Some tuning suggestions on a Red Hat 6.7 - PG 9.5.3 production environment

From: domenico febbo <mimmopasticcio(at)gmail(dot)com>
To: Pietro Pugni <pietro(dot)pugni(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org list" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Some tuning suggestions on a Red Hat 6.7 - PG 9.5.3 production environment
Date: 2016-11-14 17:36:51
Message-ID: CALqvQ1PJuYdDrduEb6PkC1kXBRUiHVrKH3zAnF5DxPAXP_ncmQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

dear Pietro,
are you sure about

effective_io_concurrency = 30

could you please explain the type of disk storage?

Il 14/Nov/2016 12:46, "Pietro Pugni" <pietro(dot)pugni(at)gmail(dot)com> ha scritto:

> Dear list,
> I’m looking for some guidelines on how to optimize the configuration of a
> production database dedicated to a DWH application.
> I run the application on different machines and have solved several issues
> since now but am struggling on a production environment running Red Hat 6.7
> and PostgreSQL 9.5.3.
> My application does a lot of reads and many writes (plain “SELECT … INTO”
> and “INSERT”, no “UPDATE”), but on a order of magnitude lower than the
> reads.
> The work flow consists of two big blocks: an ETL phase and the workloads
> on the data imported during the ETL phase.
>
> The biggest schema has about 1.2 billions of rows distributed over a ten
> of tables; many of those tables are partitioned and have indexes. At the
> moment the database stores two schemas but I plan to add other three
> schemas of similar size.
>
> The machine is virtualized and has 8 CPUs at about 3GHz, 64GB of RAM and
> 5TB of storage. It runs on Red Hat 6.7, kernel 2.6.x
>
> The configuration changes I made so far are:
> max_connections = 30
> shared_buffers = 32GB
> work_mem = 256MB
> maintenance_work_mem = 4GB
> effective_io_concurrency = 30
> checkpoint_completion_target = 0.9
> random_page_cost = 2.0
> effective_cache_size = 48GB
> default_statistics_target = 1000
>
> autovacuum is on and the collation is ‘C’.
>
>
> The first issue I faced was about maintenance_work_mem because I set it to
> 16GB and the server silently crashed during a VACUUM because I didn’t
> consider that it could take up to autovacuum_max_workers *
> maintenance_work_mem (roughly 48GB). So I lowered maintenance_work_mem to
> 4GB and it did work. *Should I set maintenance_work_mem to a smaller
> value (1GB) after the ETL terminates or can I leave it at 4GB without
> degrading the overall performance?*
>
> The second issue emerged during a intensive parallel query. I implemented
> a splitter that parallelize certain kind of queries. There were 8 similar
> queries running that was working on 8 overall disjoined subsets of the same
> table; this table has roughly 4.5 millions of rows. These queries uses
> SELECT DISTINCT, ORDER BY, OVER (PARTITION BY … ORDER BY) and COALESCE().
> At a certain point the server crashed and I found the following error in
> the logs:
>
> *postgres server process was terminated by signal 9 killed*
>
> After some research, I found that probably it was the OOM killer. Running
> “dmesg” tells that effectively it was. Reading the documentation and this
> answer on SO ( http://stackoverflow.com/questions/16418173/psql-seems-
> to-timeout-with-long-queries ), I realized that probably the issue is due
> to a misconfiguration. The value I set for this pg instance don’t seem to
> be so wrong, except maybe from maintenance_work_mem. I will certainly
> disable OOM as suggested by the official docs (
> https://www.postgresql.org/docs/current/static/kernel-
> resources.html#LINUX-MEMORY-OVERCOMMIT ) but *was wondering if I could
> tune the configuration a little better. Can someone give me some more
> advices?*
>
> I run the same application with different data (and workload) on other
> machines, but they have different configurations (Ubuntu 16.0.4). On one of
> them I previously disabled the virtual memory overcommit and never
> experienced that issue, but the machine has 128GB of RAM.
>
> I hope to have been clear enough.
> Thank you everyone
> Pietro
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Stephen Cresswell 2016-11-14 17:53:43 Why is the optimiser choosing a sub-optimal plan?
Previous Message Jeff Janes 2016-11-14 16:50:13 Re: Query planner chooses index scan backward instead of better index option