Re: MMAP Buffers

From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joshua Berkus <josh(at)agliodbs(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-16 22:43:31
Message-ID: 201104170043.32020.rsmogura@softperience.eu
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> Saturday 16 April 2011 17:02:32
> Greg Stark <gsstark(at)mit(dot)edu> writes:
> > What he did, I gather, is treat the mmapped buffers as a read-only
> > copy of the data. To actually make any modifications he copies it into
> > shared buffers and treats them like normal. When the buffers get
> > flushed from memory they get written and then the pointers get
> > repointed back at the mmapped copy.
>
> That seems much too late --- won't other processes still be looking at
> the stale mmap'ed version of the page until a write-out happens?
No, no, no :) I wanted to do that, but for the reason above I skipped it.
Instead I swap VM pages by remapping: where the shared buffer was I put the
mmap'ed page, and where the mmap'ed page was I put the shared page. (In
certain cases, which should be optimized, e.g. read for update, on the
initial read of a page in a process I point directly at the shared buffer.)
You can think of it as me affecting the TLB. What I call a "VM swap" is a
remapping, so I don't change pointers; I change only where those pointers
point in physical memory, preserving the same pointer in virtual memory.

If 0x1 is the start of buffer 1 (relation 1, block 1), then
I have:
0x1 - 0x1 + BLCKSZ -> mmaped area
0xfffff1000 - 0xfffff1000 + BLCKSZ -> Shmem

SWAP
0x1 - 0x1 + BLCKSZ -> Shmem
0xfffff1000 - 0xfffff1000 + BLCKSZ -> mmaped area
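
The swap above can be sketched in C roughly as follows. This is only an
illustration of remapping in place with mmap() and MAP_FIXED, not code from
the patch: the names map_page and vm_swap_demo are hypothetical, one system
page stands in for BLCKSZ, and a second temporary file stands in for the
SysV shared memory segment. It says nothing about how the patch itself
remaps SysV memory.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map one page of fd. With addr == NULL the kernel picks the address;
 * otherwise MAP_FIXED replaces whatever is currently mapped at addr. */
static char *map_page(void *addr, int fd, int prot, size_t len)
{
    int flags = MAP_SHARED | (addr != NULL ? MAP_FIXED : 0);
    void *p = mmap(addr, len, prot, flags, fd, 0);
    return p == MAP_FAILED ? NULL : (char *) p;
}

/* Returns 0 if both pointers kept their values but exchanged backing. */
int vm_swap_demo(void)
{
    static const char disk[] = "on-disk page";
    long pagesz = sysconf(_SC_PAGESIZE);
    assert(pagesz > 0);
    size_t len = (size_t) pagesz;

    FILE *rel = tmpfile();   /* stand-in for the relation segment file */
    FILE *shm = tmpfile();   /* stand-in for the shared buffer segment */
    if (rel == NULL || shm == NULL)
        return -1;
    int rel_fd = fileno(rel), shm_fd = fileno(shm);
    if (ftruncate(rel_fd, pagesz) != 0 || ftruncate(shm_fd, pagesz) != 0)
        return -1;
    if (pwrite(rel_fd, disk, sizeof disk, 0) != (ssize_t) sizeof disk)
        return -1;

    /* Before the swap: buf -> file (read-only), shm_slot -> "shmem". */
    char *buf = map_page(NULL, rel_fd, PROT_READ, len);
    char *shm_slot = map_page(NULL, shm_fd, PROT_READ | PROT_WRITE, len);
    if (buf == NULL || shm_slot == NULL)
        return -1;
    strcpy(shm_slot, "modified page");
    char *buf_before = buf, *slot_before = shm_slot;

    /* SWAP: same virtual addresses, exchanged backing objects. */
    buf = map_page(buf_before, shm_fd, PROT_READ | PROT_WRITE, len);
    shm_slot = map_page(slot_before, rel_fd, PROT_READ, len);
    if (buf == NULL || shm_slot == NULL)
        return -1;

    int ok = buf == buf_before && shm_slot == slot_before
        && strcmp(buf, "modified page") == 0
        && strcmp(shm_slot, disk) == 0;

    munmap(buf, len);
    munmap(shm_slot, len);
    fclose(rel);
    fclose(shm);
    return ok ? 0 : 1;
}
```

Any pointer a process already holds into buf stays valid across the swap;
only the page behind it changes.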

That is why I put /proc/{pid}/maps in the crash reports. For example, the maps
after a swap look like this (from a crash report):

[...]
#Data mappings
7fe69b7e3000-7fe69b7ef000 r--s 00000000 08:03 3196408
/home/radek/src/postgresql-2nd-level-cache/db/base/12822/12516
7fe69b7ef000-7fe69b7f1000 rw-s 00148000 00:04 8880132
/SYSV0052ea91 (deleted)
7fe69b7f1000-7fe6db7e3000 r--s 0000e000 08:03 3196408
/home/radek/src/postgresql-2nd-level-cache/db/base/12822/12516
[...]
#SysV shmem mappings

7fec60788000-7fec6078c000 rw-s 00144000 00:04 8880132
/SYSV0052ea91 (deleted)
7fec6078c000-7fec6078e000 r--s 0000c000 08:03 3196408
/home/radek/src/postgresql-2nd-level-cache/db/base/12822/12516
7fec6078e000-7fec6079c000 rw-s 0014a000 00:04 8880132
/SYSV0052ea91 (deleted)
[...]

Without the swap, 12516 would be mapped as one VM region of size
BLCKSZ*BLOCKS_PER_SEGMENT (which is about 1GB).
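
As a cross-check of the data mappings listed above, the three regions can be
added up. The sketch below is only arithmetic on the addresses from that
listing; the function names are hypothetical and DEMO_BLCKSZ assumes the
default 8 kB block size. Under that assumption the SysV region in the middle
is exactly one block, and the three regions together cover 1 GiB.

```c
#include <stdint.h>

#define DEMO_BLCKSZ 8192u   /* assumption: default PostgreSQL block size */

typedef struct { uint64_t start, end; } Region;

/* Addresses copied from the "#Data mappings" part of the listing. */
static const Region file_low  = { 0x7fe69b7e3000ULL, 0x7fe69b7ef000ULL };
static const Region shm_hole  = { 0x7fe69b7ef000ULL, 0x7fe69b7f1000ULL };
static const Region file_high = { 0x7fe69b7f1000ULL, 0x7fe6db7e3000ULL };

static uint64_t region_len(Region r)
{
    return r.end - r.start;
}

/* Length of the swapped-in SysV region, in bytes. */
uint64_t swapped_hole_len(void)
{
    return region_len(shm_hole);
}

/* Total length of the address range the segment file would otherwise cover. */
uint64_t total_span_len(void)
{
    return region_len(file_low) + region_len(shm_hole) + region_len(file_high);
}
```

The file offsets in the listing agree with this: the upper file region
starts at offset 0xe000, which is the 0xc000 bytes of the lower region plus
the one swapped block.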

When a process reads a buffer (or after it takes a lock), the shared buffer
descriptor is checked to see whether the page was modified (currently:
whether it is dirty). If it was, I do the swap if the page is currently in
use, or use the SysV shared area directly if the page is just being pinned
to the process.
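
One reading of that check, written out as a small C sketch. Everything here
is hypothetical and not taken from the patch: the DemoBufDesc struct, the
BufAction enum, and the interpretation that "in use" means the process
already holds the mmap'ed address of the page.

```c
#include <stdbool.h>

/* Hypothetical outcomes of the check described above. */
typedef enum {
    USE_MMAP_COPY,       /* page is clean: keep reading the mmap'ed file */
    USE_SHMEM_DIRECTLY,  /* page is dirty, first pin: point at shared buffer */
    DO_VM_SWAP           /* page is dirty, address already in use: remap */
} BufAction;

/* Hypothetical, heavily reduced stand-in for a shared buffer descriptor. */
typedef struct {
    bool dirty;          /* page was modified since it was read */
} DemoBufDesc;

/* Decide what a process does when it reads or locks a buffer. */
BufAction choose_action(const DemoBufDesc *desc, bool in_use_by_process)
{
    if (!desc->dirty)
        return USE_MMAP_COPY;
    return in_use_by_process ? DO_VM_SWAP : USE_SHMEM_DIRECTLY;
}
```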

Regards,
Radek

> I'm pretty concerned about the memory efficiency of this too, since it
> seems like it's making it *guaranteed*, not just somewhat probable,
> that there are two copies in RAM of every database page that's been
> modified since the last checkpoint (or so).
>
> regards, tom lane
