Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

From:	knizhnik <knizhnik(at)garret(dot)ru>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-05 18:28:16
Message-ID:	52C9A440.7010605@garret.ru
Lists:	pgsql-announce pgsql-hackers

From my point of view, it is not a big problem that LWLocks cannot be
placed in DSM.
I can allocate LWLocks in the standard way, using RequestAddinLWLocks, and
use them for synchronization.

Concerning support for huge pages: I do not think it should involve
anything more than setting the MAP_HUGETLB flag.
Reserving the corresponding number of huge pages should be done by the
system administrator.
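
As a rough illustration of the MAP_HUGETLB idea, here is a minimal
standalone sketch, assuming Linux and an anonymous shared mapping. The
function name map_shared_region is hypothetical and this is not how
PostgreSQL or DSM actually creates segments. It asks for huge pages first
and falls back to normal pages if none have been reserved:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map an anonymous shared region of `size` bytes.
 * Huge pages are requested first; if that fails (for example because the
 * administrator has not reserved any, or size is not a multiple of the
 * huge page size), the call is retried with normal pages.
 * Returns NULL on failure. If used_huge is not NULL, it is set to 1 when
 * the huge-page mapping succeeded and to 0 otherwise.
 */
static void *
map_shared_region(size_t size, int *used_huge)
{
    int   flags = MAP_SHARED | MAP_ANONYMOUS;
    void *p = MAP_FAILED;

    assert(size > 0);

#ifdef MAP_HUGETLB
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags | MAP_HUGETLB, -1, 0);
#endif
    if (used_huge != NULL)
        *used_huge = (p != MAP_FAILED);

    if (p == MAP_FAILED)
        p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);

    return (p == MAP_FAILED) ? NULL : p;
}
```

The fallback matters because the number of reserved huge pages is outside
the program's control; on Linux it is typically set by the administrator
through vm.nr_hugepages.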

What I still do not completely understand is how DSM ensures that a
segment created by one PostgreSQL process will be mapped at the same
virtual memory address in all other PostgreSQL processes.
As far as I understand, right now (with standard PostgreSQL shared memory
segments) this is ensured by fork().
Shared memory segments are allocated in one process, and all other
processes are forked from that process, inheriting these memory segments.

But if a new DSM segment is allocated during execution of some query,
then we have to add it to the address space of all PostgreSQL processes.
Even if we somehow notify them all about the new segment, there is
absolutely no guarantee that all of them can map this segment at the
specified memory address (for some reason it may already be used by
some other shared object).
Or maybe DSM doesn't guarantee that a DSM segment is mapped at the same
address in all processes?
In that case it significantly complicates DSM usage: it will not be
possible to use direct pointers.
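
To make the direct-pointer concern concrete, here is a minimal standalone
sketch of the usual workaround: storing offsets from the segment start
instead of absolute addresses. It assumes POSIX mmap() and simulates two
processes by mapping the same temporary file twice in one process, so the
two views get different base addresses. All names (relptr, Node,
rel_to_abs, abs_to_rel, sum_list_via_second_mapping) are hypothetical and
not part of any PostgreSQL API:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Byte offset from the start of the segment; 0 plays the role of NULL. */
typedef size_t relptr;

typedef struct Node
{
    int     value;
    relptr  next;       /* offset of the next node, not an address */
} Node;

/* Convert an offset into an address valid in the mapping at `base`. */
static void *
rel_to_abs(void *base, relptr off)
{
    return off ? (char *) base + off : NULL;
}

/* Convert an address inside the mapping at `base` into an offset. */
static relptr
abs_to_rel(void *base, void *p)
{
    assert(p == NULL || (char *) p > (char *) base);
    return p ? (relptr) ((char *) p - (char *) base) : 0;
}

/*
 * Build a two-node list through mapping `a`, then walk it through mapping
 * `b`, which is the same memory at a different address.
 * Returns the sum of the node values, or -1 on any setup failure.
 */
static int
sum_list_via_second_mapping(void)
{
    const size_t size = 4096;
    FILE   *f = tmpfile();
    char   *a;
    char   *b;
    int     sum = -1;

    if (f == NULL)
        return -1;
    if (ftruncate(fileno(f), (off_t) size) != 0)
    {
        fclose(f);
        return -1;
    }

    a = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fileno(f), 0);
    b = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fileno(f), 0);

    if (a != MAP_FAILED && b != MAP_FAILED && a != b)
    {
        /* "Writer": nodes live at offsets 64 and 128 of the segment. */
        Node   *n1 = (Node *) (a + 64);
        Node   *n2 = (Node *) (a + 128);
        Node   *n;

        n1->value = 40;
        n1->next = abs_to_rel(a, n2);
        n2->value = 2;
        n2->next = 0;

        /* "Reader": resolves every offset against its own base address. */
        sum = 0;
        for (n = rel_to_abs(b, 64); n != NULL; n = rel_to_abs(b, n->next))
            sum += n->value;
    }

    if (a != MAP_FAILED)
        munmap(a, size);
    if (b != MAP_FAILED)
        munmap(b, size);
    fclose(f);
    return sum;
}
```

The cost is that every dereference goes through a base-plus-offset
conversion, which is the complication referred to above.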

Can you please clarify how dynamically allocated DSM segments will be
shared by all PostgreSQL processes?

On 01/05/2014 08:50 PM, Robert Haas wrote:
> On Sat, Jan 4, 2014 at 3:27 PM, knizhnik <knizhnik(at)garret(dot)ru> wrote:
>> 1. I want IMCS to work with PostgreSQL versions not supporting DSM (dynamic
>> shared memory), like 9.2, 9.3.1,...
> Yeah. If it's loaded at postmaster start time, then it can work with
> any version. On 9.4+, you could possibly make it work even if it's
> loaded on the fly by using the dynamic shared memory facilities.
> However, there are currently some limitations to those facilities that
> make some things you might want to do tricky. There are pending
> patches to lift some of these limitations.
>
>> 2. IMCS is using PostgreSQL hash table implementation (ShmemInitHash,
>> hash_search,...)
>> Maybe I missed something - I just noticed DSM and have had no chance to
>> investigate it, but it looks like a hash table cannot be allocated in DSM...
> It wouldn't be very difficult to write an analog of ShmemInitHash() on
> top of the dsm_toc patch that is currently pending. A problem,
> though, is that it's not currently possible to put LWLocks in dynamic
> shared memory, and even spinlocks will be problematic if
> --disable-spinlocks is used. I'm due to write a post about these
> problems; perhaps I should go do that.
>
>> 3. IMCS is allocating memory using ShmemAlloc. In case of using DSM I have
>> to provide own allocator (although creation of non-releasing memory
>> allocator should not be a big issue).
> The dsm_toc infrastructure would solve this problem.
>
>> 4. Current implementation of DSM still suffers from 256Gb problem. Certainly
>> I can create multiple segments and so provide workaround without using huge
>> pages, but it complicates allocator.
> So it sounds like DSM should also support huge pages somehow. I'm not
> sure what that should look like.
>
>> 5. I wonder: if I dynamically add a new DSM segment, will it be available to
>> other PostgreSQL processes? For example, I run a query which loads data into
>> IMCS and so needs more space and allocates a new DSM segment. Then another
>> query is executed by another PostgreSQL process which tries to access this
>> data. This process is not forked from the process that created this new DSM
>> segment, so I do not understand how this segment will be mapped into the
>> address space of this process, preserving the address... Certainly I can
>> prohibit dynamic extension of IMCS storage (hoping that in this case there
>> will be no such problem with DSM). But in this case we will lose the main
>> advantage of using DSM instead of the old scheme of a plugin's private
>> shared memory.
> You can definitely dynamically add a new DSM segment; that's the point
> of making it *dynamic* shared memory. What's a bit tricky as things
> stand today is making sure that it sticks around. The current model
> is that the DSM segment is destroyed when the last process unmaps it.
> It would be easy enough to lift that limitation on systems other than
> Windows; we could just add a dsm_keep_until_shutdown() API or
> something similar. But on Windows, segments are *automatically*
> destroyed *by the operating system* when the last process unmaps them,
> so it's not quite so clear to me how we can allow it there. The main
> shared memory segment is no problem because the postmaster always has
> it mapped, even if no one else does, but that doesn't help for dynamic
> shared memory segments.
>
>> 6. IMCS has some configuration parameters which have to be set through
>> postgresql.conf. So in any case the user has to edit the postgresql.conf file.
>> When using DSM it will not be necessary to add IMCS to the
>> shared_preload_libraries list. But I do not think that this is such a
>> restrictive and critical requirement, is it?
> I don't really see a problem here. One of the purposes of dynamic
> shared memory (and dynamic background workers) is precisely that you
> don't *necessarily* need to put extensions that use shared memory in
> shared_preload_libraries - or in other words, you can add the
> extension to a running server without restarting it. If you know in
> advance that you will want it, you probably still *want* to put it in
> shared_preload_libraries, but part of the idea is that we can get away
> from requiring that.
>
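
Point 3 of the quoted message mentions a non-releasing memory allocator.
A minimal standalone sketch of such an allocator over a fixed-size segment
might look like the following. The names SegHeader, seg_init and seg_alloc
are hypothetical; this is neither ShmemAlloc nor the dsm_toc
infrastructure. It returns offsets rather than pointers so that results
stay valid whatever address the segment is mapped at, and it assumes the
caller already holds whatever lock protects the segment:

```c
#include <assert.h>
#include <stddef.h>

#define SEG_ALIGN 8     /* every allocation is rounded up to 8 bytes */

/* Bookkeeping stored at the very start of the segment. */
typedef struct SegHeader
{
    size_t  total;      /* size of the whole segment in bytes */
    size_t  used;       /* bytes handed out so far, header included */
} SegHeader;

static size_t
align_up(size_t n)
{
    return (n + SEG_ALIGN - 1) & ~(size_t) (SEG_ALIGN - 1);
}

/* Initialize an empty segment of `total` bytes starting at `base`. */
static void
seg_init(void *base, size_t total)
{
    SegHeader *h = base;

    assert(total >= align_up(sizeof(SegHeader)));
    h->total = total;
    h->used = align_up(sizeof(SegHeader));
}

/*
 * Hand out `size` bytes and return their offset from the segment start,
 * or 0 if the segment is full. Memory is never released.
 * The caller is assumed to hold the lock that protects the segment.
 */
static size_t
seg_alloc(void *base, size_t size)
{
    SegHeader *h = base;
    size_t     need = align_up(size);
    size_t     off;

    /* need < size means align_up() overflowed */
    if (need < size || need > h->total - h->used)
        return 0;

    off = h->used;
    h->used += need;
    return off;
}
```

Offset 0 can safely mean "out of space" because the header always occupies
the first bytes of the segment.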

In response to

Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL at 2014-01-05 16:50:48 from Robert Haas

Responses

Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL at 2014-01-06 03:11:33 from Robert Haas
