From: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> |
---|---|
To: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
Cc: | Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: pglz compression performance, take two |
Date: | 2021-03-19 19:35:46 |
Message-ID: | EB24F56E-901B-46D6-BFFC-71D51603ABDE@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> On Jan 21, 2021, at 6:48 PM, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
>
> @cfbot: rebased
> <0001-Reorganize-pglz-compression-code.patch>
Review comments.
First, I installed a build from master without this patch, created a test installation with lots of compressed text and array columns, upgraded the binaries to a build with this patch included, and tried to find problems with the data left over from the pre-patch binaries. Everything checks out. This is on little-endian mac osx intel core i9, not on any ARM platform that you are targeting with portions of the patch.
+/**************************************
+ * CPU Feature Detection *
+ **************************************/
+/* PGLZ_FORCE_MEMORY_ACCESS
+ * By default, access to unaligned memory is controlled by `memcpy()`, which is safe and portable.
+ * Unfortunately, on some target/compiler combinations, the generated assembly is sub-optimal.
+ * The below switch allow to select different access method for improved performance.
+ * Method 0 (default) : use `memcpy()`. Safe and portable.
+ * Method 1 : direct access. This method is portable but violate C standard.
+ * It can generate buggy code on targets which assembly generation depends on alignment.
+ * But in some circumstances, it's the only known way to get the most performance (ie GCC + ARMv6)
+ * See https://fastcompression.blogspot.fr/2015/08/accessing-unaligned-memory.html for details.
+ * Prefer these methods in priority order (0 > 1)
+ */
The link to blogspot.fr has a lot more information than your summary in the code comments. It might be hard to understand this comment if the blogspot article were ever to disappear. Perhaps you could include a bit more of the relevant details?
+#ifndef PGLZ_FORCE_MEMORY_ACCESS /* can be defined externally */
+#if defined(__GNUC__) && \
+ ( defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) || defined(__ARM_ARCH_6K__) \
+ || defined(__ARM_ARCH_6Z__) || defined(__ARM_ARCH_6ZK__) || defined(__ARM_ARCH_6T2__) )
+#define PGLZ_FORCE_MEMORY_ACCESS 1
+#endif
+#endif
I can understand wanting to set this on gcc + ARMv6, but doesn't this belong in a configure script rather than directly in the compression code?
The blogspot article indicates that the author lied about alignment to the compiler when using gcc on ARMv6, thereby generating a fast load instruction which happens to work on ARMv6. You appear to be using that same approach. Your #if defined(__GNUC__), seems to assume that all future versions of gcc will generate the instructions that you want, and not start generating some other set of instructions. Wouldn't you at least need a configure test to verify that the version of gcc being used generates the desired assembly? Even then, you'd be banking on gcc doing the same thing for the test code and for the pglz code, which I guess might not be true. Have you considered using inline assembly instead?
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2021-03-19 19:37:04 | Re: Since '2001-09-09 01:46:40'::timestamp microseconds are lost when extracting epoch |
Previous Message | Tom Lane | 2021-03-19 19:25:45 | Re: [PATCH] ProcessInterrupts_hook |