From: | pfote <pfote(at)ypsilon(dot)net> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | strange hot_standby behaviour |
Date: | 2012-10-01 14:28:26 |
Message-ID: | 5069A88A.3090008@ypsilon.net |
Lists: | pgsql-general |
Hi,
I saw a very strange effect over the weekend that smells like a bug, so
I'd like to share it.
Setup:
machine A: 16 CPU cores (modern), 128GB RAM, nice 6-drive SAS RAID-10
machines B, C: 8 cores (substantially older than A), 48GB RAM, some SCSI
RAID, substantially slower than A
The workload is about 80% - 90% SELECTs with heavy sorting and grouping,
the remainder are INSERTs/UPDATEs/DELETEs.
In the original setup A is the master, B and C are hot standbys
that process some of the SELECTs, but by far the most processing is done
on the master (A). The PostgreSQL version is 9.0.6. CPU utilization is
about 80% on the master and between 90-100% on the standbys, so it was
decided to upgrade to the latest 9.2 to profit from the latest
performance enhancements.
So B gets upgraded to 9.2.1-1.pgdg60+1 (from pgapt.debian.org) and
becomes master, then A becomes a hot_standby slave that takes all the
SELECTs (and C becomes another hot_standby). In the beginning everything
works as expected, CPU utilization drops from 80% to about 50-60%,
selects run faster, everything looks smoother (some queries drop from
>5s to <1s due to 9.2's index-only-scan feature). It's Friday, everyone
is happy.
About 16 hours later, Saturday morning around 6:00, A suddenly goes wild
and has a CPU utilization of 100% without a change in the workload, out
of the blue. Queries that used to take <1s suddenly take 5-10s, though the
"explain analyze" plans of these queries haven't changed a bit. Switching
the workload off causes the server to become idle. (While I'm writing
this I realize we haven't tried to restart A.) Instead, $boss decides to
switch back to the original setup, so B gets dropped, A becomes master
and gets 100% of the workload (all SELECTs/INSERTs/UPDATEs/DELETEs), and
everything becomes just like Friday, CPU usage drops to 50-60%,
everything runs smoothly.
I'm not sure yet whether this is replication related or a 9.2.1 problem.
Any ideas?
regards
Andreas Pfotenhauer
Ypsilon.NET AG