[PATCH] Hex-coding optimizations using SVE on ARM.

From: "Devanga(dot)Susmitha(at)fujitsu(dot)com" <Devanga(dot)Susmitha(at)fujitsu(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: "Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com" <Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com>, "Ragesh(dot)Hajela(at)fujitsu(dot)com" <Ragesh(dot)Hajela(at)fujitsu(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>
Subject: [PATCH] Hex-coding optimizations using SVE on ARM.
Date: 2025-01-09 11:22:05
Message-ID: OSZPR01MB8499D4884C4541159FA00ECC8B132@OSZPR01MB8499.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Hello,

This email proposes contributing optimized hex_encode and hex_decode functions for ARM (aarch64) machines. These functions are widely used to encode and decode binary data of the bytea data type.

The current hex_encode and hex_decode use a scalar implementation that processes data one byte at a time, with no SIMD optimization. Our SVE implementation uses CPU intrinsics to process larger blocks of data in parallel, which significantly reduces encoding and decoding latency.
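For readers unfamiliar with the scalar approach, here is a minimal sketch of byte-by-byte hex encoding and decoding in C. It is an illustration only, not the PostgreSQL source: the names scalar_hex_encode, scalar_hex_decode and hex_value are hypothetical, and the error handling is simplified (any non-hex character or an odd input length returns -1).

```c
#include <stddef.h>
#include <stdint.h>

/* Lookup table mapping a 4-bit value to its lowercase hex digit. */
static const char hex_digits[] = "0123456789abcdef";

/*
 * Illustrative scalar encoder (hypothetical name): writes two hex characters
 * per input byte into dst and returns the number of characters written.
 * dst must have room for 2 * len bytes; no terminator is added.
 */
static size_t
scalar_hex_encode(const uint8_t *src, size_t len, char *dst)
{
    for (size_t i = 0; i < len; i++)
    {
        dst[2 * i] = hex_digits[src[i] >> 4];       /* high nibble */
        dst[2 * i + 1] = hex_digits[src[i] & 0x0F]; /* low nibble */
    }
    return 2 * len;
}

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int
hex_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/*
 * Illustrative scalar decoder (hypothetical name): converts len hex
 * characters into len / 2 bytes. Returns the number of bytes written,
 * or -1 on an odd length or an invalid digit.
 */
static long
scalar_hex_decode(const char *src, size_t len, uint8_t *dst)
{
    if (len % 2 != 0)
        return -1;
    for (size_t i = 0; i < len; i += 2)
    {
        int hi = hex_value(src[i]);
        int lo = hex_value(src[i + 1]);

        if (hi < 0 || lo < 0)
            return -1;
        dst[i / 2] = (uint8_t) ((hi << 4) | lo);
    }
    return (long) (len / 2);
}
```

Each loop iteration handles a single byte (or a single pair of hex characters), which is the per-element cost a vectorized version tries to amortize over a whole vector register.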

We have designed this feature to ensure compatibility and robustness. It includes compile-time and runtime checks for SVE compatibility with both the compiler and hardware. If either check fails, the code falls back to the existing scalar implementation, ensuring fail-safe operation.

The architecture-specific functions are compiled with pg_attribute_target("arch=armv8-a+sve"), which enables SVE for those functions only and avoids adding extra global CFLAGS.
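The combination of a compile-time guard, a per-function target attribute, a runtime hardware check and a scalar fallback can be sketched as follows. This is a sketch under assumptions, not the code in the attached patch: it assumes Linux on aarch64 with a GCC-compatible compiler, detects SVE with getauxval(AT_HWCAP) and HWCAP_SVE, and spells the attribute out directly instead of using PostgreSQL's pg_attribute_target macro. The names SKETCH_HAVE_SVE_BUILD, hex_encode_sve_standin, sve_usable and choose_hex_encode are hypothetical, and the "SVE" routine is only a plain loop standing in for real intrinsics code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Compile-time check (assumption: Linux/aarch64 with a GCC-compatible compiler). */
#if defined(__aarch64__) && defined(__linux__) && defined(__GNUC__)
#include <sys/auxv.h>
#include <asm/hwcap.h>
#ifdef HWCAP_SVE
#define SKETCH_HAVE_SVE_BUILD 1
#endif
#endif

typedef size_t (*hex_encode_fn) (const uint8_t *src, size_t len, char *dst);

static const char hex_digits[] = "0123456789abcdef";

/* Portable fallback: one byte per iteration. */
static size_t
hex_encode_scalar(const uint8_t *src, size_t len, char *dst)
{
    for (size_t i = 0; i < len; i++)
    {
        dst[2 * i] = hex_digits[src[i] >> 4];
        dst[2 * i + 1] = hex_digits[src[i] & 0x0F];
    }
    return 2 * len;
}

#ifdef SKETCH_HAVE_SVE_BUILD
/*
 * Stand-in for an SVE routine: the same plain loop, but SVE code generation
 * is enabled for this one function only. A real version would use the
 * intrinsics from <arm_sve.h>.
 */
__attribute__((target("arch=armv8-a+sve")))
static size_t
hex_encode_sve_standin(const uint8_t *src, size_t len, char *dst)
{
    for (size_t i = 0; i < len; i++)
    {
        dst[2 * i] = hex_digits[src[i] >> 4];
        dst[2 * i + 1] = hex_digits[src[i] & 0x0F];
    }
    return 2 * len;
}
#endif

/* Runtime check: does the CPU we are running on report SVE support? */
static bool
sve_usable(void)
{
#ifdef SKETCH_HAVE_SVE_BUILD
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    return false;
#endif
}

/* Pick the implementation once; fall back to scalar if either check fails. */
static hex_encode_fn
choose_hex_encode(void)
{
    if (sve_usable())
    {
#ifdef SKETCH_HAVE_SVE_BUILD
        return hex_encode_sve_standin;
#endif
    }
    return hex_encode_scalar;
}
```

On a build or machine without SVE, choose_hex_encode() returns hex_encode_scalar, so callers always get a working function pointer.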

System Configurations
Machine: AWS EC2 m7g.4xlarge
OS: Ubuntu 22.04
GCC: 11.4

Benchmark and Results
Setup:
We have developed a microbenchmark based on [0] to compare the performance of the SVE-enabled hex_encode and hex_decode functions against the default implementation across various input sizes. The microbenchmark patch is attached to this mail.

Query:
time psql -c "select hex_encode_test(1000000, input_size);"
time psql -c "select hex_decode_test(1000000, input_size);"
The queries were executed for input sizes ranging from 8 to 262144 bytes.

Results:
We observed significant speed-ups in query performance: up to 17 times for hex_encode and up to 4 times for hex_decode.

Additionally, we tested the optimization with the bytea data type on a table of size 1435 MB containing two columns: the first an auto-incrementing ID and the second a bytea column holding binary data. We then ran the query "SELECT data FROM bytea_table" using a script and recorded the time taken by hex_encode using perf. The results are presented below.

Default scalar implementation:
Query exec time: 2.858 sec
hex_encode function time: 1.228 sec

SVE-enabled implementation:
Query exec time: 1.654 sec (approximately 1.7 times improvement)
hex_encode_sve function time: 0.085 sec (approximately 14.44 times improvement)
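The improvement factors quoted above are simply the ratio of the old time to the new time. A small sketch of that arithmetic, with hypothetical helper names speedup and saved:

```c
/* Ratio of the old time to the new time; values above 1.0 mean faster. */
static double
speedup(double before_sec, double after_sec)
{
    return before_sec / after_sec;
}

/* Absolute time saved, in seconds. */
static double
saved(double before_sec, double after_sec)
{
    return before_sec - after_sec;
}
```

With the figures above, speedup(2.858, 1.654) is about 1.73 and speedup(1.228, 0.085) is about 14.4. The query saves 2.858 - 1.654 = 1.204 sec overall, while hex_encode alone saves 1.228 - 0.085 = 1.143 sec, so nearly all of the query-level gain comes from the faster hex_encode.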

Improvements using SVE are noticeable starting from an input size of 16 bytes for hex_encode and 32 bytes for hex_decode. Hence, SVE implementations are used only when the input size surpasses these thresholds.
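The size-based selection described above can be sketched as a small predicate. This is an illustration, not the patch code: the names hex_op, SVE_ENCODE_MIN_LEN, SVE_DECODE_MIN_LEN and should_use_sve are hypothetical, and treating the thresholds as inclusive (16 and 32 bytes themselves take the SVE path) is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

enum hex_op
{
    HEX_OP_ENCODE,
    HEX_OP_DECODE
};

/* Thresholds from the measurements above; inclusive bounds are an assumption. */
#define SVE_ENCODE_MIN_LEN 16
#define SVE_DECODE_MIN_LEN 32

/*
 * Returns true when the SVE path should be taken: the hardware supports SVE
 * and the input is long enough for the vector code to pay off.
 */
static bool
should_use_sve(enum hex_op op, size_t len, bool sve_available)
{
    size_t min_len = (op == HEX_OP_ENCODE) ? SVE_ENCODE_MIN_LEN
                                           : SVE_DECODE_MIN_LEN;

    return sve_available && len >= min_len;
}
```

Short inputs stay on the scalar path, where the fixed setup cost of the vector code would otherwise outweigh its benefit.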

We would like to contribute this work so that it is available to the community. We are following the procedure described in "Submitting a Patch" on the PostgreSQL wiki <https://wiki.postgresql.org/wiki/Submitting_a_Patch>. The patches and performance results are attached.

Please let us know if you have any queries or suggestions.

Thanks & Regards,
Susmitha Devanga.

[0] https://postgr.es/m/CAFBsxsE7otwnfA36Ly44zZO+b7AEWHRFANxR1h1kxveEV=ghLQ@mail.gmail.com

Attachment                                               Content-Type              Size
                                                         image/png                 71.3 KB
hex_encode_woFlags.PNG                                   image/png                 82.9 KB
v1-0001-test-module-for-hex-coding 1.patch               application/octet-stream  4.7 KB
v1-0001-SVE-support-for-hex-encode-and-hex-decode.patch  application/octet-stream  15.1 KB