From: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Takashi Menjo <takashi(dot)menjo(at)gmail(dot)com> |
Cc: | Takashi Menjo <takashi(dot)menjou(dot)vg(at)hco(dot)ntt(dot)co(dot)jp>, "Deng, Gang" <gang(dot)deng(at)intel(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [PoC] Non-volatile WAL buffer |
Date: | 2020-11-27 00:02:58 |
Message-ID: | ef48ccde-ba57-f4d3-58ae-b249b847447c@enterprisedb.com |
Lists: | pgsql-hackers |
On 11/26/20 10:19 PM, Tomas Vondra wrote:
>
>
> On 11/26/20 9:59 PM, Heikki Linnakangas wrote:
>> On 26/11/2020 21:27, Tomas Vondra wrote:
>>> Hi,
>>>
>>> Here's the "simple patch" that I'm currently experimenting with. It
>>> essentially replaces open/close/write/fsync with pmem calls
>>> (map/unmap/memcpy/persist variants), and it's by no means committable.
>>> But it works well enough for experiments / measurements, etc.
>>>
>>> The numbers (5-minute pgbench runs on scale 500) look like this:
>>>
>>>  clients   master/btt   master/dax       ntt    simple
>>>  ------------------------------------------------------
>>>        1         5469         7402      7977      6746
>>>       16        48222        80869    107025     82343
>>>       32        73974       158189    214718    158348
>>>       64        85921       154540    225715    164248
>>>       96       150602       221159    237008    217253
>>>
>>> A chart illustrating these results is attached. The four columns are
>>> showing unpatched master with WAL on a pmem device, in BTT or DAX modes,
>>> "ntt" is the patch submitted to this thread, and "simple" is the patch
>>> I've hacked together.
>>>
>>> As expected, the BTT case performs poorly (compared to the rest).
>>>
>>> The "master/dax" and "simple" perform about the same. There are some
>>> differences, but those may be attributed to noise. The NTT patch does
>>> outperform both of them by ~20-40% in some cases.
>>>
>>> The question is why. I recall suggestions this is due to page faults
>>> when writing data into the WAL, but I did experiment with various
>>> settings that I think should prevent that (e.g. disabling WAL reuse
>>> and/or disabling zeroing the segments) but that made no measurable
>>> difference.
>>
>> The page faults are only a problem when mmap() is used *without* DAX.
>>
>> Takashi tried a patch earlier to mmap() WAL segments and insert WAL to
>> them directly. See 0002-Use-WAL-segments-as-WAL-buffers.patch at
>> https://www.postgresql.org/message-id/000001d5dff4%24995ed180%24cc1c7480%24%40hco.ntt.co.jp_1.
>> Could you test that patch too, please? Using your nomenclature, that
>> patch skips wal_buffers and does:
>>
>> clients -> wal segments (PMEM DAX)
>>
>> He got good results with that with DAX, but otherwise it performed
>> worse. And then we discussed why that might be, and the page fault
>> hypothesis was brought up.
>>
>
> D'oh, I hadn't noticed there's a patch doing that. This thread has so
> many different patches - which is good, but a bit confusing.
>
>> I think 0002-Use-WAL-segments-as-WAL-buffers.patch is the most promising
>> approach here. But because it's slower without DAX, we need to keep the
>> current code for non-DAX systems. Unfortunately it means that we need to
>> maintain both implementations, selectable with a GUC or some DAX
>> detection magic. The question then is whether the code complexity is
>> worth the performance gain on DAX-enabled systems.
>>
>
> Sure, I can give it a spin. The question is whether it applies to
> current master, or whether some sort of rebase is needed. I'll try.
>
Unfortunately, that patch seems to fail for me :-(

The patches seem to be for PG12, so I applied them on REL_12_STABLE (all
the parts 0001-0005) and then I did this:

  LIBS="-lpmem" ./configure --prefix=/home/tomas/pg-12-pmem --enable-debug
  make -s install
  initdb -X /opt/pmemdax/benchmarks/wal -D /opt/nvme/benchmarks/data
  pg_ctl -D /opt/nvme/benchmarks/data/ -l pg.log start
  createdb test
  pgbench -i -s 500 test

which however fails after just about 70k rows generated (PQputline
failed), and the pg.log says this:

  PANIC: could not open or mmap file "pg_wal/000000010000000000000006": No such file or directory
  CONTEXT: COPY pgbench_accounts, line 721000
  STATEMENT: copy pgbench_accounts from stdin

Takashi-san, can you check and provide a fixed version? Ideally, I'll
take a look too, but I'm not familiar with this patch so it may take
more time.
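
For anyone who hasn't followed the whole thread, here is a rough sketch of
the open/close/write/fsync versus map/unmap/memcpy/persist substitution
mentioned in the quoted message. It is not code from any of the patches: it
uses Python's standard mmap module as a stand-in for the pmem calls (flush()
plays the role of persist), and the function names and the small segment size
are made up for illustration.

```python
import mmap
import os
import tempfile

# Illustrative size only; real WAL segments are much larger.
SEGMENT_SIZE = 1024 * 1024


def create_segment(path, size=SEGMENT_SIZE):
    """Preallocate a zero-filled segment file (hypothetical helper)."""
    with open(path, "wb") as f:
        f.truncate(size)


def write_with_syscalls(path, offset, record):
    """Classic path: open / write / fsync / close."""
    fd = os.open(path, os.O_WRONLY)
    try:
        os.pwrite(fd, record, offset)
        os.fsync(fd)
    finally:
        os.close(fd)


def write_with_mapping(path, offset, record):
    """Mapped path: map / copy into the mapping / flush / unmap."""
    fd = os.open(path, os.O_RDWR)
    try:
        with mmap.mmap(fd, 0) as seg:  # length 0 maps the whole file
            seg[offset:offset + len(record)] = record  # the "memcpy" step
            seg.flush()                                # the "persist" step
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        segment = os.path.join(tmpdir, "segment")
        create_segment(segment)
        write_with_syscalls(segment, 0, b"record via write+fsync")
        write_with_mapping(segment, 8192, b"record via mmap+flush")
```

Both functions leave the same bytes in the file. The mapped variant simply
copies into the mapping instead of issuing a write() call, and per the
discussion quoted above that only pays off when the mapping is backed by DAX.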

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company