Re: Need to know more about pg_test_fsync utility

From: PGSQL DBA <pgsqldba(dot)1987(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: "pgsql-generallists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Need to know more about pg_test_fsync utility
Date: 2021-12-13 06:52:06
Message-ID: CAKaKWS9yhu8+Xs7bvh5GAtUwowEsq41iQOg7GPaaYnXcSd94=g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Thomas,

Apologies for uploading the wrong screenshot in the attachment. I have
uploaded the correct output of pg_test_fsync for your reference .

Please find our environment details as below:
Infra: Azure IaaS
OS: Red Hat Enterprise Linux Server release 7.9
File System: XFS
Disk: Premium SSD
IOPS: 5000 per Disk
Throughput: 200 MB/Sec per Disk
Disk Cache: Read+Write
Workload: OLTP

Actual issue is we are getting slow execution of Commit & Prepared Commit
statements only. Execution of these statements is taking 2 to 14 seconds.
That's why we are analysing the pg_test_fsync output & FIO for further
troubleshooting but unable to find any right directions.

On Mon, 13 Dec 2021 at 10:45, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:

> On Mon, Dec 13, 2021 at 3:04 PM PGSQL DBA <pgsqldba(dot)1987(at)gmail(dot)com> wrote:
> > As you mentioned in question-8, "I'd investigate whether data is being
> cached unexpectedly, perhaps indicating that committed transactions be lost
> in a system crash event." So, I would like to know that if we configure the
> disk for the WALs with read+write disk cache then will it create any
> performance issue and show the attached output?
>
> Which OS and file system are you running and what kind of storage
> system is it? If you can turn write caching on and off in your
> storage layer without affecting the ops/sec that's a pretty good clue
> that it's respecting the cache control commands that the OS sends.
> The numbers you showed are reassuringly low. Hmm, I wonder why you
> have such a low number for non-sync'd writes (the last line).
>
> I have a concrete example of using this tool to learn something about
> an unfamiliar-to-me operating system, from this morning: I saw a
> system that reported ~6k IOPS for open_datasync and only ~600 IOPS for
> fdatasync. With some limited knowledge of file systems I expect the
> latter to involve sending a SYNCHRONIZE CACHE command to flush device
> caches, and the former to do that, or use a more efficient FUA flag to
> flush just particular writes and not the whole write cache. I didn't
> expect it to go 10x faster, so something had to be wrong. With some
> more investigation and reading I learned that the storage drivers I
> was using do indeed ignore FUA cache control flags, so that
> wal_sync_method is not crash safe; if you lose power and come back up,
> you might lose a bunch of committed transactions from the end of the
> WAL. I turned off write caches in the storage device, and then
> open_datasync became a lot slower, matching my expectations.
>
> These are primitive tools, but can help to check some assumptions...
>
> Obligatory remark: PostgreSQL 9.5 is out of support, please see
> https://www.postgresql.org/support/versioning/.
>
> > I also would like to know is there any best Practice from PostgreSQL
> which mentions what is the disk latency required for the WAL & DATA disk?
>
> No particular latency is required by PostgreSQL, and I don't have a
> general answer to this. Is it fast enough for you? What sort of
> workload is it running, OLTP, reporting, ...?
>

Attachment Content-Type Size
image/png 61.5 KB

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Achilleas Mantzios 2021-12-13 08:06:59 Re: Postgresql + containerization possible use case
Previous Message Thomas Munro 2021-12-13 02:45:08 Re: Need to know more about pg_test_fsync utility