| From: | Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> |
|---|---|
| To: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
| Cc: | Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tobias Oberstein <tobias(dot)oberstein(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: [HACKERS] lseek/read/write overhead becomes visible at scale .. |
| Date: | 2018-04-16 06:13:30 |
| Message-ID: | 87604rg0n0.fsf@news-spur.riddles.org.uk |
| Lists: | pgsql-hackers |
>>>>> "Thomas" == Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
Thomas> * it's also been claimed that readahead heuristics are not
Thomas> defeated on Linux or FreeBSD, which isn't too surprising
Thomas> because you'd expect it to be about blocks being faulted in,
Thomas> not syscalls

I don't know about Linux, but on FreeBSD, readahead/writebehind is
tracked at the level of open files but implemented at the level of
read/write clustering. I have patched kernels in the past to improve the
performance in mixed read/write cases; on unpatched kernels, PG would
benefit from using separate file opens for backend reads and writes.
(The typical bad scenario is doing a create index, or other seqscan that
updates hint bits, on a freshly-restored table; the alternation of
reading block N and writing block N-x destroys the readahead/writebehind
since they use a common offset.)
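
To illustrate the scenario above, here is a toy model in C. It is a sketch
only and not FreeBSD's actual code: the struct, the function names and the
reset-to-zero rule are all hypothetical. It assumes the kernel keeps one
"next expected block" and one sequential-run counter per open file, and it
compares a scan whose reads and writes share one open file with a scan that
uses a separate open for each.

```c
#include <assert.h>

/* Hypothetical per-open-file state; a toy model, not kernel code. */
struct toy_file {
    long next_block;   /* block expected next if access is sequential */
    int  seq_count;    /* length of the current sequential run */
};

/* Record an access to `block`; return the updated run length. */
static int toy_access(struct toy_file *f, long block)
{
    if (block == f->next_block)
        f->seq_count++;
    else
        f->seq_count = 0;          /* offset jumped: the run is lost */
    f->next_block = block + 1;
    return f->seq_count;
}

/*
 * Scan nblocks blocks, writing back block n - lag after reading block n.
 * If shared is nonzero, reads and writes go through the same file state;
 * otherwise each stream has its own. Returns the read-side run length
 * observed at the last read.
 */
static int toy_scan(long nblocks, long lag, int shared)
{
    struct toy_file rd = {0, 0}, wr = {0, 0};
    struct toy_file *w = shared ? &rd : &wr;
    int run = 0;

    assert(nblocks >= 0 && lag > 0);
    for (long n = 0; n < nblocks; n++) {
        run = toy_access(&rd, n);          /* read block n */
        if (n >= lag)
            toy_access(w, n - lag);        /* write block n - lag */
    }
    return run;
}
```

In this model, `toy_scan(100, 8, 1)` returns 0 because every write moves the
shared expected offset away from the next read, while `toy_scan(100, 8, 0)`
returns 100 because the read stream is never disturbed.
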

The code that detects sequential behavior cannot distinguish between
pread() and lseek+read; it looks only at the actual offset of the
current request compared to the previous one for the same fp.
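
As a concrete illustration of the two call patterns, the following C sketch
reads the same block both ways. It is not PostgreSQL's code: the helper names
are hypothetical, and BLCKSZ is simply set to PostgreSQL's default block size
of 8192 bytes. Both helpers hand the kernel the same file and the same offset,
which is all the sequential-detection code described above looks at; the
difference is one syscall instead of two.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192    /* PostgreSQL's default block size */

/* Two syscalls: position the descriptor, then read. */
static ssize_t read_block_lseek(int fd, long blkno, char *buf)
{
    if (lseek(fd, (off_t) blkno * BLCKSZ, SEEK_SET) < 0)
        return -1;
    return read(fd, buf, BLCKSZ);
}

/* One syscall: the offset is an argument; the file position is untouched. */
static ssize_t read_block_pread(int fd, long blkno, char *buf)
{
    return pread(fd, buf, BLCKSZ, (off_t) blkno * BLCKSZ);
}

/* Write three blocks to a temp file, then fetch block 1 both ways.
 * Returns 1 if both reads succeed and yield identical data. */
static int demo_same_block(void)
{
    char path[] = "/tmp/preaddemoXXXXXX";
    char blk[BLCKSZ], a[BLCKSZ], b[BLCKSZ];
    int fd = mkstemp(path);
    int ok = 0;

    if (fd < 0)
        return -1;
    unlink(path);                  /* file disappears once fd is closed */
    for (int i = 0; i < 3; i++) {
        memset(blk, 'A' + i, BLCKSZ);
        if (write(fd, blk, BLCKSZ) != BLCKSZ)
            goto done;
    }
    if (read_block_lseek(fd, 1, a) == BLCKSZ &&
        read_block_pread(fd, 1, b) == BLCKSZ)
        ok = (memcmp(a, b, BLCKSZ) == 0 && a[0] == 'B');
done:
    close(fd);
    return ok;
}
```

Calling `demo_same_block()` should return 1 on a system with a writable
/tmp, since both helpers fetch the block filled with 'B'.
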
Thomas> +1 for adopting pread()/pwrite() in PG12.
ditto
--
Andrew (irc:RhodiumToad)