From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Takashi Menjo <takashi(dot)menjou(dot)vg(at)hco(dot)ntt(dot)co(dot)jp> |
Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [PoC] Non-volatile WAL buffer |
Date: | 2020-01-27 18:54:38 |
Message-ID: | CA+TgmoZWvm36GyYNDn3gksVAkuPrc86G9W4of8AgYR=SSU7Lmw@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, Jan 27, 2020 at 2:01 AM Takashi Menjo
<takashi(dot)menjou(dot)vg(at)hco(dot)ntt(dot)co(dot)jp> wrote:
> It sounds reasonable, but I'm sorry that I haven't tested such a program
> yet. I'll try it to compare with my non-volatile WAL buffer. For now, I'm
> a little worried about the overhead of mmap()/munmap() for each WAL segment
> file.
I guess the question here is how the cost of one mmap() and munmap()
pair per WAL segment (normally 16MB) compares to the cost of one
write() per block (normally 8kB). It could be that mmap() is a more
expensive call than write(), but by a small enough margin that the
vastly reduced number of system calls makes it a winner. But that's
just speculation, because I don't know how heavy mmap() actually is.
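To put rough numbers on that trade-off, here is a minimal back-of-the-envelope sketch. It assumes the default sizes above and exactly one write() per 8kB block (a worst case for write(), since several contiguous blocks can be written in a single call). It counts system calls only and ignores page-fault costs; the constant and function names are hypothetical.

```python
# Hypothetical constants, assuming the default sizes mentioned above.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # bytes per WAL segment (16MB)
BLOCK_SIZE = 8 * 1024                # bytes per WAL block (8kB)


def syscalls_per_segment():
    """Return (write_calls, mmap_calls) needed to fill one WAL segment."""
    write_calls = WAL_SEGMENT_SIZE // BLOCK_SIZE  # one write() per block
    mmap_calls = 2                                # one mmap() + one munmap()
    return write_calls, mmap_calls


writes, maps = syscalls_per_segment()
print(f"write(): {writes} calls, mmap()/munmap(): {maps} calls")
# On syscall count alone, each of the two mapping calls could cost this many
# write() calls before the mmap() approach stops being cheaper.
print(f"break-even ratio: {writes / maps:.0f}x")
```

So the speculation holds only if an mmap()/munmap() pair costs less than roughly 2048 block-sized write() calls, before any page-fault overhead is counted.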
I have a different concern. I think that, right now, when we reuse a
WAL segment, we write entire blocks at a time, so the old contents of
the WAL segment are overwritten without ever being read. But that
behavior might not be maintained when using mmap(). It might be that
as soon as we write the first byte to a mapped page, the old contents
have to be faulted into memory. Indeed, it's unclear how it could be
otherwise, since the VM page must be made read-write at that point and
the system cannot know that we will overwrite the whole page. But
reading in the old contents of a recycled WAL file just to overwrite
them seems like it would be disastrously expensive.
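The two access patterns can be contrasted with a small sketch, assuming a Unix-like system. It only shows the patterns and checks the resulting file contents; it does not measure the extra reads, and the helper names and the 128kB stand-in segment size are hypothetical.

```python
import mmap
import os
import tempfile

# Hypothetical sizes: an 8kB block and a small 128kB stand-in for a real
# 16MB WAL segment, so the example runs quickly.
BLOCK_SIZE = 8192
SEGMENT_SIZE = 16 * BLOCK_SIZE


def make_old_segment():
    """Create a file that plays the role of a recycled segment of old WAL."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\xaa" * SEGMENT_SIZE)
    return path


def recycle_with_write(path, fill):
    """Overwrite the segment with full-block write() calls.

    Each call replaces whole, aligned pages, so the kernel can skip reading
    the old contents from disk.
    """
    block = bytes([fill]) * BLOCK_SIZE
    fd = os.open(path, os.O_WRONLY)
    try:
        for _ in range(SEGMENT_SIZE // BLOCK_SIZE):
            view = memoryview(block)
            while view:  # write() may be short; loop until the block is done
                view = view[os.write(fd, view):]
    finally:
        os.close(fd)


def recycle_with_mmap(path, fill):
    """Overwrite the same segment through a shared mapping.

    The first store to each mapped page faults; if that page is not already
    cached, the kernel typically has to read the old contents from disk
    first, because it cannot know the page will be fully overwritten.
    """
    block = bytes([fill]) * BLOCK_SIZE
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), SEGMENT_SIZE) as mm:
        for off in range(0, SEGMENT_SIZE, BLOCK_SIZE):
            mm[off:off + BLOCK_SIZE] = block
```

Both functions end with identical file contents; the difference of concern is the hidden read I/O on the mmap() path when the recycled segment is no longer in the page cache.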
A related, but more minor, concern is whether there are any
differences in the write-back behavior when modifying a mapped
region vs. using write(). Either way, the same pages of the same file
will get dirtied, but the kernel might not have the same idea in
either case about when the changed pages should be written back down
to disk, and that could make a big difference to performance.
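As a minimal sketch (Unix-only, helper names hypothetical), both paths leave dirty pages in the page cache, and each has its own call to force them down. What this cannot show is the open question itself: whether the kernel's background write-back heuristics treat the two kinds of dirty pages differently when no explicit flush is issued.

```python
import mmap
import os
import tempfile

BLOCK_SIZE = 8192  # assumed WAL block size


def make_file(num_blocks):
    """Create a zero-filled scratch file; returns (fd, path)."""
    fd, path = tempfile.mkstemp()
    os.ftruncate(fd, num_blocks * BLOCK_SIZE)
    return fd, path


def flush_written_block(fd, offset, data):
    """write() path: dirty page-cache pages, then force write-back with fsync()."""
    view = memoryview(data)
    while view:  # pwrite() may be short; loop until all data is written
        n = os.pwrite(fd, view, offset)
        view, offset = view[n:], offset + n
    os.fsync(fd)


def flush_mapped_block(mm, offset, data):
    """mmap() path: stores dirty the mapped pages; flush() forces write-back.

    On Unix, mmap.flush() is implemented with msync(MS_SYNC).
    """
    mm[offset:offset + len(data)] = data
    mm.flush()
```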
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company