On 01/09/2014 11:30 PM, Claudio Freire wrote:
> On Thu, Jan 9, 2014 at 4:24 PM, knizhnik <knizhnik(at)garret(dot)ru> wrote:
>> On 01/09/2014 09:46 PM, Claudio Freire wrote:
>>> On Thu, Jan 9, 2014 at 2:22 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>> It would be nice to have better operating system support for this.
>>>> For example, IIUC, 64-bit Linux has 128TB of address space available
>>>> for user processes. When you clone(), it can either share the entire
>>>> address space (i.e. it's a thread) or none of it (i.e. it's a
>>>> process). There's no option to, say, share 64TB and not the other
>>>> 64TB, which would be ideal for us. We could then map dynamic shared
>>>> memory segments into the shared portion of the address space and do
>>>> backend-private allocations in the unshared part. Of course, even if
>>>> we had that, it wouldn't be portable, so who knows how much good it
>>>> would do. But it would be awfully nice to have the option.
>>> You can map a segment at fork time, and unmap it after forking. That
>>> doesn't really use RAM, since it's supposed to be lazily allocated (it
>>> can be forced to be so, I believe, with PROT_NONE and MAP_NORESERVE,
>>> but I don't think that's portable).
>>>
>>> That guarantees it's free.
>>>
>>> Next, you can map shared memory at explicit addresses (Linux's mmap
>>> has support for that, and I seem to recall Windows does too).
>>>
>>> All you have to do is some bookkeeping in shared memory (so all
>>> processes can coordinate new mappings).
>> As far as I understand, the main advantage of DSM is that a segment can be
>> allocated at any time, not only at fork time.
>> And it is not because of memory consumption: even without unmap, allocating
>> a memory region doesn't consume physical memory. And there is usually no
>> problem with exhausting virtual address space on 64-bit architectures.
>> Moreover, using some combination of flags (such as MAP_NORESERVE), it is
>> usually possible to completely eliminate the overhead of reserving an address
>> range in virtual space. But mapping a dynamically created segment (not at
>> fork time) to the same address really seems to be a big challenge.
> At fork time I only wrote about reserving the address space. After
> reserving it, all you have to do is implement an allocator that works
> in shared memory (protected by a lwlock of course).
>
> In essence, a hypothetical pg_dsm_alloc(region_name) would use regular
> shared memory to coordinate returning an already mapped region (same
> address which is guaranteed to work since we reserved that region), or
> allocate one (within the reserved address space).
Why do we need named segments? There is the ShmemAlloc function in the
PostgreSQL API.
If RequestAddinShmemSpace could be used without the requirement to place the
module in the preload list, wouldn't that be enough for most extensions?
And ShmemInitHash can be used to maintain named regions if needed...
So if we have some reserved address space, do we actually need a
special allocator to allocate new segments in it?
Why is the existing shared memory API not enough?
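
For reference, the "reserve at fork time, map later at the same address" idea discussed above can be sketched in plain C roughly as follows. This is only an illustration under assumptions, not PostgreSQL code: the function names, the /tmp file used as the segment's backing store, and the pipe used for synchronization are all hypothetical, and MAP_NORESERVE is Linux-specific. A real implementation would keep the bookkeeping in regular shared memory under a lock instead of using a pipe.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RESERVED_SIZE ((size_t) 1 << 30)  /* address space only, no RAM committed */
#define SEGMENT_SIZE  ((size_t) 1 << 16)

/* Map the file at 'path' shared at exactly 'addr', replacing the reservation. */
static void *
map_segment_at(const char *path, void *addr, int create)
{
    int   fd = open(path, create ? (O_RDWR | O_CREAT | O_EXCL) : O_RDWR, 0600);
    void *p;

    if (fd < 0)
        return MAP_FAILED;
    if (create && ftruncate(fd, (off_t) SEGMENT_SIZE) != 0)
    {
        close(fd);
        return MAP_FAILED;
    }
    p = mmap(addr, SEGMENT_SIZE, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_FIXED, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    return p;
}

/* Returns 0 if the child read the parent's data at the same address. */
int
reserve_then_map_demo(void)
{
    char  path[64];
    int   pipefd[2];
    int   status = -1;
    char  token = 'x';
    char *base;
    char *seg;
    pid_t pid;

    /* Step 1, before fork: reserve address space without backing memory. */
    base = mmap(NULL, RESERVED_SIZE, PROT_NONE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (base == MAP_FAILED || pipe(pipefd) != 0)
        return -1;
    snprintf(path, sizeof(path), "/tmp/dsm_sketch_%ld", (long) getpid());

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        /* Child: wait until the segment exists, then map it at 'base'. */
        close(pipefd[1]);
        if (read(pipefd[0], &token, 1) != 1)
            _exit(2);
        seg = map_segment_at(path, base, 0);
        _exit(seg != MAP_FAILED &&
              strcmp(seg, "hello from parent") == 0 ? 0 : 1);
    }

    /* Step 2, after fork: the parent creates a new segment inside the range. */
    close(pipefd[0]);
    seg = map_segment_at(path, base, 1);
    if (seg != MAP_FAILED)
        strcpy(seg, "hello from parent");
    if (write(pipefd[1], &token, 1) != 1)
        seg = MAP_FAILED;       /* treat a failed notification as an error */
    close(pipefd[1]);

    waitpid(pid, &status, 0);
    unlink(path);
    munmap(base, RESERVED_SIZE);
    if (seg == MAP_FAILED)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Calling reserve_then_map_demo() from a small main() should return 0 on a Linux system with a writable /tmp. The point of the sketch is only that both processes inherit the same reserved range, so a segment created after fork can be placed at an address that is known to be free in each of them. Whether that needs a separate allocator or can be layered on the existing shared memory API is exactly the open question above.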