Postgres on Kubernetes/VMware

From: George Sexton <georges(at)mhsoftware(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Postgres on Kubernetes/VMware
Date: 2020-11-17 22:18:45
Message-ID: 82f63aa3-6d11-1361-1457-12829c05cf2a@mhsoftware.com
Lists: pgsql-general

Everyone,

 I’ve run into an issue that’s got me stumped and I would be really
grateful for any ideas. We’re deploying Postgres 11.8 on Kubernetes
1.17.9. The nodes are running RedHat EL 7.9 with the latest kernel for
that distribution.

 We create a Kubernetes pod that runs Postgres. It’s set up with /pgdata
going to a PersistentVolumeClaim that’s thin provisioned with a size set
at 500GB. Initially, the allocated size of the PersistentVolumeClaim’s
corresponding disk file (VMware .VMDK) is around 12GB. However, the size
of the .VMDK keeps increasing at roughly 150-300MB/Hour even though the
size of the /pgdata folder as shown by df is stable around 230MB. To try
to troubleshoot this, I set up Postgres to put the pg_wal folder on a
different partition. That didn’t make any difference.

 Replication is turned on for this Postgres instance, and it was
observed that the replica’s .VMDK file grew at a rate of 25MB/Hour.

 I’ve checked to make sure that the partition is mounted with the
DISCARD option. I created a test program that would allocate disk space,
and then free it. I confirmed that when DISCARD is used for mount,
VMware will reclaim the space as expected. For example, if the .VMDK
grows to 20GB during a test, it shrinks back down to some size like 1GB
when the test completes.
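
 For illustration, a test of this kind can be sketched roughly as
follows. This is a minimal sketch, not the actual program used: the
function name allocate_and_free, the scratch-file prefix, and the
default sizes are all assumptions made for the example. It writes
random (non-zero) data so the blocks are really allocated, forces it to
disk, then deletes the file so a filesystem mounted with discard can
hand the blocks back.

```python
import os
import tempfile


def allocate_and_free(directory, size_mb=16, chunk_mb=1):
    """Write about size_mb of data to a scratch file in directory, then delete it.

    Returns the number of bytes written. All names and defaults here are
    hypothetical; this only illustrates the allocate-then-free pattern.
    """
    # Random data, so that nothing along the path can treat the blocks as zeros.
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    target = size_mb * 1024 * 1024
    written = 0

    fd, path = tempfile.mkstemp(dir=directory, prefix="discard_test_")
    try:
        with os.fdopen(fd, "wb") as f:
            while written < target:
                f.write(chunk)
                written += len(chunk)
            f.flush()
            os.fsync(f.fileno())  # make sure the data actually reaches the disk
    finally:
        # Deleting the file frees the blocks; on a filesystem mounted with
        # discard, this is the point where they can be released to the
        # underlying storage.
        os.remove(path)
    return written
```

 Calling allocate_and_free("/pgdata", size_mb=20000) would correspond
to the 20GB case above; the .VMDK size has to be observed separately on
the VMware side, before and after the call.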

 To try to sort this out, I’ve done the following:

1. Written the test program and confirmed that on the ESXi host/vCenter
space reclamation works.
2. Moved the pg_wal directory to another partition.
3. Changed Auto-Vacuum frequency to run less often (12 hours).
4. Executed “vacuum full”, followed by fstrim <mount point> from the
Kubernetes node. This freed perhaps 12MB from the /pgdata folder,
and around 19MB from the .VMDK size.
5. Checked the /proc/<pid>/fd descriptors for the running connections and
confirmed no unusual temp files are open.
6. Confirmed the replication is working as expected.
7. Set archive_mode off to eliminate that as a source of noise.
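
 The open-file check in step 5 can be approximated with a short script
like the one below. It is a sketch under stated assumptions rather than
the exact procedure followed: the function names are hypothetical, and
"unusual" is interpreted here as either a file that has been deleted
but is still held open (Linux shows these with a " (deleted)" suffix on
the link target) or a file under one of Postgres's pgsql_tmp
directories. The proc_root parameter exists only so the logic can be
exercised against a fake directory tree.

```python
import os


def open_files(pid, proc_root="/proc"):
    """Return (fd, target) pairs for every descriptor a process holds open."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    result = []
    for name in sorted(os.listdir(fd_dir), key=int):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed between listdir() and readlink()
        result.append((int(name), target))
    return result


def suspicious_files(pid, proc_root="/proc"):
    """Return descriptors that point at deleted files or Postgres temp files.

    The two criteria are assumptions chosen for this illustration.
    """
    return [
        (fd, target)
        for fd, target in open_files(pid, proc_root)
        if target.endswith(" (deleted)") or "pgsql_tmp" in target
    ]
```

 Running suspicious_files(pid) for each Postgres backend pid, from
inside the pod or from the node with sufficient privileges, and getting
an empty list back would match the result reported in step 5.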

 Does anyone have any ideas about what’s causing this? Is there
anything unusual about how Postgres allocates temporary data files or
frees them? I’m really just grasping. Thanks for looking!

George
