From: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes? |
Date: | 2022-08-09 07:30:35 |
Message-ID: | CALj2ACUccjR7KbKqWOsQmqH1ZGEDyJ7hH5Ef+DOhcv7+kOnjCQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Aug 8, 2022 at 6:10 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> I played with a simple insert use-case [1] that generates ~380 WAL
> files, with different block sizes. To my surprise, I have not seen any
> improvement with larger block sizes. I may be doing something wrong
> here, suggestions on to test and see the benefits are welcome.
>
> > > I think this should also handle the remainder after processing whole
> > > blocks, just for completeness. If I call the code as presented with size
> > > 8193, I think this code will only write 8192 bytes.
> >
> > Hm, I will fix it.
>
> Fixed.
>
> I'm attaching v5 patch-set. I've addressed review comments received so
> far and fixed a compiler warning that CF bot complained about.
>
> Please review it further.
I tried to vary the zero buffer size to see if there's any huge
benefit for the WAL-generating queries. Unfortunately, I didn't see
any benefit on my dev system (16 vcore, 512GB SSD, 32GB RAM) . The use
case I've tried is at [1] and the results are at [2].
Having said that, the use of pg_pwritev_with_retry() in walmethods.c
will definitely reduce number of system calls - on HEAD the
dir_open_for_write() makes pad_to_size/XLOG_BLCKSZ i.e. 16MB/8KB =
2,048 write() calls and with patch it makes only 64
pg_pwritev_with_retry() calls with XLOG_BLCKSZ zero buffer size. The
proposed patches will provide straight 32x reduction in system calls
(for pg_receivewal and pg_basebackup) apart from the safety against
partial writes.
[1]
/* built source code with release flags */
./configure --with-zlib --enable-depend --prefix=$PWD/inst/
--with-openssl --with-readline --with-perl --with-libxml CFLAGS='-O2'
> install.log && make -j 8 install > install.log 2>&1 &
\q
./pg_ctl -D data -l logfile stop
rm -rf data
/* ensured that nothing exists in OS page cache */
free -m
sudo su
sync; echo 3 > /proc/sys/vm/drop_caches
exit
free -m
./initdb -D data
./pg_ctl -D data -l logfile start
./psql -d postgres -c 'ALTER SYSTEM SET max_wal_size = "64GB";'
./psql -d postgres -c 'ALTER SYSTEM SET shared_buffers = "8GB";'
./psql -d postgres -c 'ALTER SYSTEM SET work_mem = "16MB";'
./psql -d postgres -c 'ALTER SYSTEM SET checkpoint_timeout = "1d";'
./pg_ctl -D data -l logfile restart
./psql -d postgres -c 'create table foo(bar int);'
./psql -d postgres
\timing
insert into foo select * from generate_series(1, 100000000); /* this
query generates about 385 WAL files, no checkpoint hence no recycle of
old WAL files, all new WAL files */
[2]
HEAD
Time: 84249.535 ms (01:24.250)
HEAD with wal_init_zero off
Time: 75086.300 ms (01:15.086)
#define PWRITEV_BLCKSZ XLOG_BLCKSZ
Time: 85254.302 ms (01:25.254)
#define PWRITEV_BLCKSZ (4 * XLOG_BLCKSZ)
Time: 83542.885 ms (01:23.543)
#define PWRITEV_BLCKSZ (16 * XLOG_BLCKSZ)
Time: 84035.770 ms (01:24.036)
#define PWRITEV_BLCKSZ (64 * XLOG_BLCKSZ)
Time: 84749.021 ms (01:24.749)
#define PWRITEV_BLCKSZ (256 * XLOG_BLCKSZ)
Time: 84273.466 ms (01:24.273)
#define PWRITEV_BLCKSZ (512 * XLOG_BLCKSZ)
Time: 84233.576 ms (01:24.234)
--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
From | Date | Subject | |
---|---|---|---|
Next Message | Richard Guo | 2022-08-09 08:12:08 | Re: Using each rel as both outer and inner for JOIN_ANTI |
Previous Message | Peter Smith | 2022-08-09 07:14:54 | Re: Data is copied twice when specifying both child and parent table in publication |