Re: Use pread and pwrite instead of lseek + write and read

From:	Oskari Saarenmaa <os(at)ohmu(dot)fi>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Use pread and pwrite instead of lseek + write and read
Date:	2016-09-15 06:55:47
Message-ID:	6e2a5f2c-8f16-6711-717a-9c7cff45546f@ohmu.fi
Lists:	pgsql-hackers

17.08.2016, 22:11, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> I don't understand why you think this would create non-trivial
>> portability issues.
>
> The patch as submitted breaks entirely on platforms without pread/pwrite.
> Yes, we can add a configure test and some shim functions to fix that,
> but the argument that it makes the code shorter will get a lot weaker
> once we do.

I posted an updated patch which just calls lseek + read/write on those
platforms; the code is still a lot shorter.
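
For illustration only, a fallback of this kind might look roughly like
the sketch below. The names shim_pread, shim_pwrite and
shim_roundtrip_demo are hypothetical and are not taken from the patch;
the sketch also assumes a POSIX system with lseek, read and write.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical fallback for platforms lacking pread(): seek, then read.
 * Unlike a real pread(), this moves the descriptor's file offset and is
 * not atomic with respect to other users of the same descriptor.
 */
ssize_t
shim_pread(int fd, void *buf, size_t nbytes, off_t offset)
{
    if (lseek(fd, offset, SEEK_SET) < 0)
        return -1;
    return read(fd, buf, nbytes);
}

/* Hypothetical fallback for pwrite(), with the same caveats as above. */
ssize_t
shim_pwrite(int fd, const void *buf, size_t nbytes, off_t offset)
{
    if (lseek(fd, offset, SEEK_SET) < 0)
        return -1;
    return write(fd, buf, nbytes);
}

/* Round-trip check on a temporary file; returns 0 on success, -1 otherwise. */
int
shim_roundtrip_demo(void)
{
    char  out[5] = {0};
    FILE *tmp = tmpfile();
    int   ok;

    if (tmp == NULL)
        return -1;
    ok = shim_pwrite(fileno(tmp), "page", 4, 8192) == 4 &&
         shim_pread(fileno(tmp), out, 4, 8192) == 4 &&
         memcmp(out, "page", 4) == 0;
    fclose(tmp);
    return ok ? 0 : -1;
}
```

Note that a shim like this issues an lseek on every call, which is the
cost being discussed for platforms without pread.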

> I agree that adding such functions is pretty trivial, but there are
> reasons to think there are other hazards that are less trivial:
>
> First, a self-contained shim function will necessarily do an lseek every
> time, which means performance will get *worse* not better on non-pread
> platforms. And yes, the existing logic to avoid lseeks fires often enough
> to be worthwhile, particularly in seqscans.

This will only regress on platforms without pread. The only relevant
such platform appears to be Windows, which has equivalent APIs.

FWIW, I ran the same pgbench benchmarks on my Linux system with a build
that always used lseek() + read/write instead of pread and pwrite. They
ran slightly faster than the previous code, which saved seek positions,
but I suppose a workload with lots of seqscans could be slower.

Unfortunately I didn't save the actual numbers anywhere, but I can rerun
the benchmarks if you're interested. The numbers were pretty stable
across multiple runs.
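
To make the "saved seek positions" idea concrete, here is a simplified
sketch of that kind of logic. It is not the actual PostgreSQL code; the
CachedFile struct, cached_read_at and sequential_scan_lseeks are
hypothetical names, and the lseek counter exists only for illustration.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper that remembers where the kernel file offset is. */
typedef struct CachedFile
{
    int   fd;
    off_t pos;          /* offset we believe the kernel holds; -1 = unknown */
    long  lseek_calls;  /* number of lseek() calls issued, for illustration */
} CachedFile;

/* Read at 'offset', calling lseek() only if the cached position differs. */
ssize_t
cached_read_at(CachedFile *cf, void *buf, size_t nbytes, off_t offset)
{
    ssize_t nread;

    if (cf->pos != offset)
    {
        if (lseek(cf->fd, offset, SEEK_SET) < 0)
        {
            cf->pos = -1;
            return -1;
        }
        cf->lseek_calls++;
        cf->pos = offset;
    }
    nread = read(cf->fd, buf, nbytes);
    /* After a failed read the offset is no longer trusted. */
    cf->pos = (nread >= 0) ? cf->pos + nread : -1;
    return nread;
}

/*
 * Read four consecutive 512-byte blocks from a temporary file and return
 * how many lseek() calls were needed, or -1 on error.
 */
long
sequential_scan_lseeks(void)
{
    char       block[512] = {0};
    FILE      *tmp = tmpfile();
    CachedFile cf;
    long       result;
    int        i;

    if (tmp == NULL)
        return -1;
    cf.fd = fileno(tmp);
    cf.pos = -1;                /* position unknown after the writes below */
    cf.lseek_calls = 0;
    result = 0;
    for (i = 0; i < 4 && result == 0; i++)
        if (write(cf.fd, block, sizeof(block)) != (ssize_t) sizeof(block))
            result = -1;
    for (i = 0; i < 4 && result == 0; i++)
        if (cached_read_at(&cf, block, sizeof(block), (off_t) i * 512) != 512)
            result = -1;
    if (result == 0)
        result = cf.lseek_calls;
    fclose(tmp);
    return result;
}
```

In this sketch a sequential scan of four blocks needs a single lseek
(the initial one), which is the saving that a naive lseek + read shim
gives up and that pread makes unnecessary.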

> Second, I wonder whether this will break any kernel's readahead detection.
> I wouldn't be too surprised if successive reads (not preads) without
> intervening lseeks are needed to trigger readahead on at least some
> platforms. So there's a potential, both on platforms with pread and those
> without, for this to completely destroy seqscan performance, with
> penalties very far exceeding what we might save by avoiding some kernel
> calls.

At least Linux and FreeBSD don't seem to care how and why you read
pages, they'll do readahead regardless of the way you read files and
extend the readahead once you access previously readahead pages. They
disable readahead only if fadvise(POSIX_FADV_RANDOM) has been used.
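
For reference, the hint mentioned above is set through posix_fadvise().
A minimal sketch, assuming a system that provides posix_fadvise, pread
and pwrite; hint_random_access and hinted_pread_demo are hypothetical
helper names used only here.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Tell the kernel that the whole file (offset 0, length 0 = to the end)
 * will be accessed in random order.  Returns 0 on success or an error
 * number on failure; posix_fadvise() does not set errno.
 */
int
hint_random_access(int fd)
{
    return posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);
}

/*
 * Write one 4 kB page at offset 4096, apply the hint, and read the page
 * back with pread().  Returns 0 if the data round-trips, -1 otherwise.
 */
int
hinted_pread_demo(void)
{
    char  page[4096];
    char  out[4096];
    FILE *tmp = tmpfile();
    int   fd;
    int   ok;

    if (tmp == NULL)
        return -1;
    fd = fileno(tmp);
    memset(page, 'x', sizeof(page));
    ok = pwrite(fd, page, sizeof(page), 4096) == (ssize_t) sizeof(page);
    /* The hint is advisory; a failure here does not prevent reading. */
    (void) hint_random_access(fd);
    ok = ok &&
         pread(fd, out, sizeof(out), 4096) == (ssize_t) sizeof(out) &&
         memcmp(page, out, sizeof(page)) == 0;
    fclose(tmp);
    return ok ? 0 : -1;
}
```

The hint only changes the kernel's readahead policy; the data returned
by pread is the same with or without it.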

I'd expect any kernel that implements mmap to also implement readahead
based on page usage rather than the seek position. Do you know of
a kernel that would actually use the seek position for readahead?

> I'd be more excited about this if the claimed improvement were more than
> 1.5%, but you know as well as I do that that's barely above the noise
> floor for most performance measurements. I'm left wondering why bother,
> and why take any risk of de-optimizing on some platforms.

I think it makes sense to try to optimize for the platforms that people
actually use for performance critical workloads, especially if it also
allows us to simplify the code and remove more lines than we add. It's
nice if the software still works on legacy platforms, but I don't think
we should be concerned about a hypothetical performance impact on
platforms no one uses in production anymore.

/ Oskari

In response to

Re: Use pread and pwrite instead of lseek + write and read at 2016-08-17 19:11:15 from Tom Lane

Responses

Re: Use pread and pwrite instead of lseek + write and read at 2016-09-15 18:11:11 from Tom Lane
