From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: pg_preadv() and pg_pwritev() |
Date: | 2020-12-22 11:06:50 |
Message-ID: | CA+hUKG+TCoareSYcAu05yaMh9eUck6Hd8FQZt5VEvotNs3YOMQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Dec 21, 2020 at 11:40 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2020-12-20 16:26:42 +1300, Thomas Munro wrote:
> > > 1. port.h cannot assume that <limits.h> has already been included;
> > > nor do I want to fix that by including <limits.h> there. Do we
> > > really need to define a fallback value of IOV_MAX? If so,
> > > maybe the answer is to put the replacement struct iovec and
> > > IOV_MAX in some new header.
> >
> > Ok, I moved all this stuff into port/pg_uio.h.
>
> Can we come up with a better name than 'uio'? I find that a not exactly
> meaningful name.
Ok, let's try port/pg_iovec.h.
> Or perhaps we could just leave the functions in port/port.h, but extract
> the value of the define at configure time? That way only pread.c /
> pwrite.c would need it.
That won't work for the struct definition, so client code would need
to remember to do:
#ifdef HAVE_SYS_UIO_H
#include <sys/uio.h>
#endif
... which is a little tedious, or port.h would need to do that and
incur an overhead in every translation unit, which Tom objected to.
That's why I liked the separate header idea.
> > > 3. The patch as given won't prove anything except that the code
> > > compiles. Is it worth fixing at least one code path to make
> > > use of pg_preadv and pg_pwritev, so we can make sure this code
> > > is tested before there's layers of other new code on top?
> >
> > OK, here's a patch to zero-fill fresh WAL segments with pwritev().
>
> I think that's a good idea. However, I see two small issues: 1) If we
> write larger amounts at once, we need to handle partial writes. That's a
> large enough amount of IO that it's much more likely to hit a memory
> shortage or such in the kernel - we had to do that a while a go for WAL
> writes (which can also be larger), if memory serves.
>
> Perhaps we should have pg_pwritev/readv unconditionally go through
> pwrite.c/pread.c and add support for "continuing" partial writes/reads
> in one central place?
Ok, here's a new version with the following changes:
1. Define PG_IOV_MAX, a reasonable size for local variables, not
larger than IOV_MAX.
2 Use 32 rather than 1024, based on off-list complaint about 1024
potentially swamping the IO system unfairly.
3. Add a wrapper pg_pwritev_retry() to retry automatically on short
writes. (I didn't write pg_preadv_retry(), because I don't currently
need it for anything so I didn't want to have to think about EOF vs
short-reads-for-implementation-reasons.)
4. I considered whether pg_pwrite() should have built-in retry
instead of a separate wrapper, but I thought of an argument against
hiding the "raw" version: the AIO patch set already understands short
reads/writes and knows how to retry at a higher level, as that's
needed for native AIO too, so I think it makes sense to be able to
keep things the same and not solve the same problem twice. A counter
argument would be that you could get the retry underway sooner with a
tight loop, but I'm not expecting this to be common.
> > I'm drawing a blank on trivial candidate uses for preadv(), without
> > infrastructure from later patches.
>
> Can't immediately think of something either.
One idea I had for the future is for xlogreader.c to read the WAL into
a large multi-page circular buffer, which could wrap around using a
pair of iovecs, but that requires lots more work .
Attachment | Content-Type | Size |
---|---|---|
v3-0001-Add-pg_preadv-and-pg_pwritev.patch | text/x-patch | 14.9 KB |
v3-0002-Use-vectored-I-O-to-zero-WAL-segments.patch | text/x-patch | 2.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Smith | 2020-12-22 11:13:12 | Re: Single transaction in the tablesync worker? |
Previous Message | Bharath Rupireddy | 2020-12-22 10:32:24 | Re: postgres_fdw - cached connection leaks if the associated user mapping/foreign server is dropped |