Re: RFC: Lock-free XLog Reservation from WAL

From: Japin Li <japinli(at)hotmail(dot)com>
To: "Zhou, Zhiguo" <zhiguo(dot)zhou(at)intel(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: RFC: Lock-free XLog Reservation from WAL
Date: 2025-01-06 03:32:21
Message-ID: ME0P300MB044511A4CE4FDEB163EBD830B6102@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hi, Zhigou

Thanks for the patch.

On Thu, 02 Jan 2025 at 09:14, "Zhou, Zhiguo" <zhiguo(dot)zhou(at)intel(dot)com> wrote:
> Hi all,
>
> I am reaching out to solicit your insights and comments on a recent
> proposal regarding the "Lock-free XLog Reservation from WAL." We have
> identified some challenges with the current WAL insertions, which
> require space reservations in the WAL buffer which involve updating
> two shared-memory statuses in XLogCtlInsert: CurrBytePos (the start
> position of the current XLog) and PrevBytePos (the prev-link to the
> previous XLog). Currently, the use of XLogCtlInsert.insertpos_lck
> ensures consistency but introduces lock contention, hindering the
> parallelism of XLog insertions.
>
> To address this issue, we propose the following changes:
>
> 1. Removal of PrevBytePos: This will allow increments of the CurrBytePos (a single uint64 field) to be implemented with an atomic operation (fetch_add).
> 2. Updating Prev-Link of next XLog: Based on the fact that the
> prev-link of the next XLog always points to the head of the current
> Xlog,we will slightly exceed the reserved memory range of the current
> XLog to update the prev-link of the next XLog, regardless of which
> backend acquires the next memory space. The next XLog inserter will
> wait until its prev-link is updated for CRC calculation before
> starting its own XLog copy into the WAL.
> 3. Breaking Sequential Write Convention: Each backend will update the
> prev-link of its next XLog first, then return to the header position
> for the current log insertion. This change will reduce the dependency
> of XLog writes on previous ones (compared with the sequential writes).
> 4. Revised GetXLogBuffer: To support #3, we need update this function to separate the LSN it intends to access from the LSN it expects to update in the insertingAt field.
> 5. Increase NUM_XLOGINSERT_LOCKS: With the above changes, increasing NUM_XLOGINSERT_LOCKS, for example to 128, could effectively enhance the parallelism.
>
> The attached patch could pass the regression tests (make check, make
> check-world), and in the performance test of this POC on SPR (480
> vCPU) shows that this optimization could help the TPCC benchmark
> better scale with the core count and as a result the performance with
> full cores enabled could be improved by 2.04x.
>

I tried it and found it cannot pass all test-cases. Here is the output
comes from Rocky 9.5.

mkdir build && cd build
../configure \
--prefix=/tmp/pg \
--enable-tap-tests \
--enable-debug \
--enable-cassert \
--enable-depend \
--enable-dtrace \
--with-icu \
--with-openssl \
--with-python \
--with-libxml \
--with-libxslt \
--with-lz4 \
--with-pam \
CFLAGS='-Wmissing-prototypes -Wincompatible-pointer-types'
make -j $(nproc) -s && make install -s
(cd contrib/ && make -j $(nproc) -s && make install -s)
make check-world

The error as follows:

echo "# +++ tap check in src/test/recovery +++" && rm -rf '/home/japin/postgresql/build/src/test/recovery'/tmp_check && /usr/bin/mkdir -p '/home/japin/postgresql/build/src/test/recovery'/tmp_check && cd /home/japin/postgresql/build/../src/test/recovery && TESTLOGDIR='/home/japin/postgresql/build/src/test/recovery/tmp_check/log' TESTDATADIR='/home/japin/postgresql/build/src/test/recovery/tmp_check' PATH="/home/japin/postgresql/build/tmp_install/home/japin/postgresql/build/pg/bin:/home/japin/postgresql/build/src/test/recovery:$PATH" LD_LIBRARY_PATH="/home/japin/postgresql/build/tmp_install/home/japin/postgresql/build/pg/lib" INITDB_TEMPLATE='/home/japin/postgresql/build'/tmp_install/initdb-template PGPORT='65432' top_builddir='/home/japin/postgresql/build/src/test/recovery/../../..' PG_REGRESS='/home/japin/postgresql/build/src/test/recovery/../../../src/test/regress/pg_regress' /usr/bin/prove -I /home/japin/postgresql/build/../src/test/perl/ -I /home/japin/postgresql/build/../src/test/recovery t/*.pl
# +++ tap check in src/test/recovery +++
t/001_stream_rep.pl ................... ok
t/002_archiving.pl .................... 1/? Bailout called. Further testing stopped: command "pg_ctl -D /home/japin/postgresql/build/src/test/recovery/tmp_check/t_002_archiving_standby_data/pgdata -l /home/japin/postgresql/build/src/test/recovery/tmp_check/log/002_archiving_standby.log promote" exited with value 1
FAILED--Further testing stopped: command "pg_ctl -D /home/japin/postgresql/build/src/test/recovery/tmp_check/t_002_archiving_standby_data/pgdata -l /home/japin/postgresql/build/src/test/recovery/tmp_check/log/002_archiving_standby.log promote" exited with value 1
make[2]: *** [Makefile:28: check] Error 255
make[2]: Leaving directory '/home/japin/postgresql/build/src/test/recovery'
make[1]: *** [Makefile:42: check-recovery-recurse] Error 2
make[1]: Leaving directory '/home/japin/postgresql/build/src/test'
make: *** [GNUmakefile:71: check-world-src/test-recurse] Error 2

--
Regrads,
Japin Li

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amul Sul 2025-01-06 04:23:46 Re: Allow NOT VALID foreign key constraints on partitioned tables.
Previous Message Peter Smith 2025-01-06 03:17:14 Re: Documentation update of wal_retrieve_retry_interval to mention table sync worker