From: | Radosław Smogura <rsmogura(at)softperience(dot)eu> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net> |
Cc: | Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joshua Berkus <josh(at)agliodbs(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: MMAP Buffers |
Date: | 2011-04-16 22:43:31 |
Message-ID: | 201104170043.32020.rsmogura@softperience.eu |
Lists: | pgsql-hackers |
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> Saturday 16 April 2011 17:02:32
> Greg Stark <gsstark(at)mit(dot)edu> writes:
> > What he did, I gather, is treat the mmapped buffers as a read-only
> > copy of the data. To actually make any modifications he copies it into
> > shared buffers and treats them like normal. When the buffers get
> > flushed from memory they get written and then the pointers get
> > repointed back at the mmapped copy.
>
> That seems much too late --- won't other processes still be looking at
> the stale mmap'ed version of the page until a write-out happens?
No, no, no :) I wanted to do it that way, but I skipped it for exactly the
reason above. Instead I swap VM pages by remapping: where the shared buffer
was I put the mmaped page, and where the mmaped page was I put the shared
page. (In certain cases, which should be optimized, e.g. read for update, the
initial read of a page in a process points directly to the shared buffer.)
You can think of it as me changing the TLB. What I call "VM swap" is
remapping, so I don't change pointers; I only change what those pointers
refer to in physical memory, while the pointer in virtual memory stays the
same.
For example, if 0x1 is the start of buffer 1 (relation 1, block 1), then
before the swap I have:
0x1 - 0x1 + BLCKSZ -> mmaped area
0xfffff1000 - 0xfffff1000 + BLCKSZ -> Shmem
and after the swap:
0x1 - 0x1 + BLCKSZ -> Shmem
0xfffff1000 - 0xfffff1000 + BLCKSZ -> mmaped area
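The remapping idea can be sketched in C with mmap() and MAP_FIXED. This is only a minimal illustration, not the patch's code: it assumes both backing objects are reachable through file descriptors (two unlinked temp files stand in for the relation file and the shared buffer pool), and the names BlockMapping, remap_block, vm_swap, make_backing and vm_swap_demo are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192  /* PostgreSQL's default block size */

/* What currently backs one BLCKSZ-sized virtual range (hypothetical type). */
typedef struct {
    void *addr;  /* virtual address; a swap never changes it */
    int   fd;    /* backing object: relation file or buffer pool */
    off_t off;   /* offset of the block inside that object */
    int   prot;  /* PROT_READ for the file copy, read/write for shmem */
} BlockMapping;

/* Map m->fd/m->off at m->addr; MAP_FIXED replaces whatever was there. */
static int remap_block(const BlockMapping *m)
{
    assert((uintptr_t) m->addr % (uintptr_t) sysconf(_SC_PAGESIZE) == 0);
    void *p = mmap(m->addr, BLCKSZ, m->prot, MAP_SHARED | MAP_FIXED,
                   m->fd, m->off);
    return p == m->addr ? 0 : -1;
}

/* Exchange the backing of two ranges; both addresses stay the same.
 * The two remaps are separate calls, so this is not atomic as a pair. */
static int vm_swap(BlockMapping *a, BlockMapping *b)
{
    BlockMapping na = { a->addr, b->fd, b->off, b->prot };
    BlockMapping nb = { b->addr, a->fd, a->off, a->prot };
    if (remap_block(&na) != 0 || remap_block(&nb) != 0)
        return -1;
    *a = na;
    *b = nb;
    return 0;
}

/* Stand-in backing object: an unlinked temp file holding one block. */
static int make_backing(char fill)
{
    char path[] = "/tmp/vmswapXXXXXX";
    char buf[BLCKSZ];
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    memset(buf, fill, sizeof buf);
    if (write(fd, buf, sizeof buf) != (ssize_t) sizeof buf) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Returns 0 if the same two pointers see each other's data after the swap. */
static int vm_swap_demo(void)
{
    int file_fd = make_backing('F');  /* stands in for the relation file */
    int shm_fd = make_backing('S');   /* stands in for the shared buffers */
    if (file_fd < 0 || shm_fd < 0)
        return -1;
    void *pa = mmap(NULL, BLCKSZ, PROT_READ, MAP_SHARED, file_fd, 0);
    void *pb = mmap(NULL, BLCKSZ, PROT_READ | PROT_WRITE, MAP_SHARED, shm_fd, 0);
    if (pa == MAP_FAILED || pb == MAP_FAILED)
        return -1;
    BlockMapping a = { pa, file_fd, 0, PROT_READ };
    BlockMapping b = { pb, shm_fd, 0, PROT_READ | PROT_WRITE };
    volatile const char *page = pa;  /* pointers taken before the swap */
    volatile const char *shm = pb;
    if (page[0] != 'F' || shm[0] != 'S' || vm_swap(&a, &b) != 0)
        return -1;
    int ok = (page[0] == 'S' && shm[0] == 'F');
    munmap(pa, BLCKSZ);
    munmap(pb, BLCKSZ);
    close(file_fd);
    close(shm_fd);
    return ok ? 0 : -1;
}
```

Calling vm_swap_demo() should return 0 on a POSIX system: the pointers page and shm keep their values, but each one reads the other's backing data after the swap. The sketch needs BLCKSZ-sized blocks to start on page boundaries, and real code would need locking around the two remaps.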
That is why I put /proc/{pid}/maps in the crash reports. For example, the
maps after a swap look like this (from a crash report):
[...]
#Data mappings
7fe69b7e3000-7fe69b7ef000 r--s 00000000 08:03 3196408
/home/radek/src/postgresql-2nd-level-cache/db/base/12822/12516
7fe69b7ef000-7fe69b7f1000 rw-s 00148000 00:04 8880132
/SYSV0052ea91 (deleted)
7fe69b7f1000-7fe6db7e3000 r--s 0000e000 08:03 3196408
/home/radek/src/postgresql-2nd-level-cache/db/base/12822/12516
[...]
#SysV shmem mappings
7fec60788000-7fec6078c000 rw-s 00144000 00:04 8880132
/SYSV0052ea91 (deleted)
7fec6078c000-7fec6078e000 r--s 0000c000 08:03 3196408
/home/radek/src/postgresql-2nd-level-cache/db/base/12822/12516
7fec6078e000-7fec6079c000 rw-s 0014a000 00:04 8880132
/SYSV0052ea91 (deleted)
[...]
Without the swap, 12516 would be mapped as a single VM region of size
BLCKSZ*BLOCKS_PER_SEGMENT (which is about 1GB).
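The addresses in the "#Data mappings" excerpt can be checked by hand, and the small C sketch below does that arithmetic. The values are copied from the excerpt; the MapEntry type and the helper names are hypothetical, and BLCKSZ is assumed to be the default 8192.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLCKSZ 8192u  /* assumed default block size */

typedef struct {
    uint64_t start;   /* first virtual address of the mapping */
    uint64_t end;     /* one past the last virtual address */
    uint64_t offset;  /* offset into the backing object */
} MapEntry;

/* The three "#Data mappings" entries from the crash report excerpt. */
static const MapEntry data_maps[] = {
    { 0x7fe69b7e3000ULL, 0x7fe69b7ef000ULL, 0x00000000ULL },  /* file 12516 */
    { 0x7fe69b7ef000ULL, 0x7fe69b7f1000ULL, 0x00148000ULL },  /* SysV shmem */
    { 0x7fe69b7f1000ULL, 0x7fe6db7e3000ULL, 0x0000e000ULL },  /* file 12516 */
};

/* Length in bytes of one mapping. */
static uint64_t map_len(const MapEntry *m)
{
    return m->end - m->start;
}

/* Returns 1 if every entry starts exactly where the previous one ended. */
static int is_contiguous(const MapEntry *m, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (m[i].start != m[i - 1].end)
            return 0;
    return 1;
}

/* Total virtual span covered by a contiguous run of entries. */
static uint64_t total_span(const MapEntry *m, size_t n)
{
    return m[n - 1].end - m[0].start;
}
```

With these numbers the three entries are contiguous and span 0x40000000 bytes, which is exactly 1 GiB (8192 * 131072). The shmem entry in the middle is 0x2000 bytes, one BLCKSZ block, and it sits where file offset 0xc000 would otherwise be mapped, so the file mapping after it resumes at offset 0xe000.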
When a process reads a buffer (or after it takes a lock), the shared buffer
descriptor is checked to see whether the page was modified (i.e. whether it
is currently dirty). If it was, I do the swap if the page is currently in
use, or I use the SysV shared area directly if the page is just being pinned
by the process.
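One possible reading of that check, written as a tiny C decision function. This is an interpretation rather than the patch's logic: it assumes "in use" means the process already holds a pointer to the mmaped copy, and that a clean page is simply read from the mmaped copy. ReadAction and choose_read_action are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical outcomes of the check made on read or after locking. */
typedef enum {
    USE_MMAP_COPY,      /* page is clean: read the mmaped file copy */
    SWAP_TO_SHMEM,      /* dirty and already in use: remap in place */
    USE_SHMEM_DIRECTLY  /* dirty and just being pinned: point at shmem */
} ReadAction;

/* is_dirty: flag taken from the shared buffer descriptor.
 * in_use:   this process already has the page mapped and in use (assumed). */
static ReadAction choose_read_action(bool is_dirty, bool in_use)
{
    if (!is_dirty)
        return USE_MMAP_COPY;
    return in_use ? SWAP_TO_SHMEM : USE_SHMEM_DIRECTLY;
}
```

The swap is only needed when existing pointers must keep working; a process that is pinning the page for the first time can be handed the shared-memory address from the start.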
Regards,
Radek
> I'm pretty concerned about the memory efficiency of this too, since it
> seems like it's making it *guaranteed*, not just somewhat probable,
> that there are two copies in RAM of every database page that's been
> modified since the last checkpoint (or so).
>
> regards, tom lane