From: | Scott Ribe <scott_ribe(at)elevated-dev(dot)com> |
---|---|
To: | pgsql-admin <pgsql-admin(at)lists(dot)postgresql(dot)org> |
Subject: | regarding PG on ZFS performance |
Date: | 2022-04-12 16:18:09 |
Message-ID: | 46F40448-CB5C-4D86-806D-27CF791A364F@elevated-dev.com |
Lists: | pgsql-admin |
Just re-ran some older tests with the current version of Ubuntu & ZFS (but an older kernel, thanks to a multi-way incompatibility with other things). Results are that with proper tuning, ZFS RAIDZ1 on 4 NVMe drives gives higher TPS on pgbench at scale 10,000 than XFS on one of the same NVMe drives--but the initial population of the db takes 25% longer.
Proper tuning: PG full_page_writes off for the ZFS runs (on for XFS on bare NVMe); ZFS lz4 compression, 64K recordsize, relatime
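The tuning above can be sketched roughly as follows. This is my restatement, not the author's exact commands; the pool/dataset name tank/pgdata is hypothetical:

```shell
# Hypothetical ZFS dataset tuning -- substitute your own pool/dataset name.
zfs set compression=lz4 tank/pgdata   # cheap CPU cost, often a net I/O win
zfs set recordsize=64K tank/pgdata    # the 64K recordsize used in these tests
zfs set relatime=on tank/pgdata       # avoid an atime write on every read

# ZFS is copy-on-write and never overwrites a block in place, so torn
# pages cannot occur and PG's full-page-write protection is redundant:
psql -d test -c "ALTER SYSTEM SET full_page_writes = off;"
psql -d test -c "SELECT pg_reload_conf();"
```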
db created by: pgbench -i -s 10000 --foreign-keys test
benchmarked as: pgbench -c 100 -j 4 -t 1000 test
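For readers less familiar with pgbench, the flags above decode as follows (the row count and size estimate are my own back-of-the-envelope figures, not from the post):

```shell
# Initialize: scale 10000 => 1 billion pgbench_accounts rows,
# on the order of 150 GB, with foreign-key constraints added.
pgbench -i -s 10000 --foreign-keys test
# Run: 100 client connections, 4 pgbench worker threads,
# 1000 transactions per client => 100,000 transactions total.
pgbench -c 100 -j 4 -t 1000 test
```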
NVMe: 31,804 TPS
RAIDZ1: 50,228 TPS
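A quick sanity check on the headline numbers (my arithmetic, not part of the original post): RAIDZ1 comes out roughly 58% ahead at this concurrency.

```shell
# Ratio of the two reported TPS figures.
awk 'BEGIN { printf "RAIDZ1/NVMe TPS ratio: %.2f\n", 50228 / 31804 }'
```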
Some other notes:
- the situation reverses at lower concurrency: the single NVMe is faster when using 10 connections instead of 100
- these tests are all from within containers running on Kubernetes--pg server and client in same container, connected over domain sockets
- 256GB and 48 CPU pod limits--running where there's still the cgroup double-counting bug, so CPU is theoretically throttled to ~24, leaving ~20 to the PG server
- the container is actually getting very slightly throttled at barely over 20 CPU--so not sure if it's CPU-bound or IO-bound
- PG settings are set up for a larger database: shared_buffers, work_mem, parallel workers, autovacuum, etc.
- I'd read that, because of the way ZFS handles RAIDZ1 writes compared to traditional RAID5, performance probably wouldn't suffer relative to RAID10, and this is the case--tests with ZFS RAID10 on the same drives were a tiny bit slower (2-3%) than RAIDZ1 for TPS, but a bit faster on initial population (6-8%)
- as an aside, WekaFS (https://www.aspsys.com/solutions/storage-solutions/weka-io/) is about 10% faster than RAIDZ1 (both TPS and initial fill)
I hope that experience from someone who actually bothered to read up on how to configure ZFS for PG can put to rest some "ZFS is too slow" misinformation. I am certain that ZFS is not the fastest for all configurations (for instance, I am unable to configure the 4 NVMe drives into a hardware RAID10 to test, and it seems that ZFS may not scale well to larger numbers of disks), but "too slow to ever be considered for serious work" is flat-out wrong.
--
Scott Ribe
scott_ribe(at)elevated-dev(dot)com
https://www.linkedin.com/in/scottribe/