Re: Postgres as In-Memory Database?

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Stefan Keller <sfkeller(at)gmail(dot)com>
Cc: Andreas Brandl <ml(at)3(dot)141592654(dot)de>, pgsql-general List <pgsql-general(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: Postgres as In-Memory Database?
Date: 2013-11-20 19:33:05
Message-ID: CAMkU=1zrbL+Zn0KTaHkCWu2anfaauqVqqgNghWvcVscMQFG-bg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, Nov 18, 2013 at 5:39 PM, Stefan Keller <sfkeller(at)gmail(dot)com> wrote:

> Hi Jeff and Martin
>
> On 18. November 2013 17:44 Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> > I rather doubt that. All the bottlenecks I know about for well cached
> read-only workloads are around
> > locking for in-memory concurrency protection, and have little or nothing
> to do with secondary storage.
>
> Interesting point. But I think this is only partially the case - as
> Stonebraker asserts [1]. While I don't see how to speed-up locking (and
> latching), AFAIK there is quite some room for enhancement in buffer pooling
> (see also [2]). Especially in GIS environments there are heavy calculations
> and random access operations - so buffer pool will play a role.
>

I don't know what you mean about enhancements in the buffer pool. For an
in-memory database, there shouldn't be a buffer pool in the first place, as
it is *all* in memory. But anyway, it seems to me like the imported
database size for the OpenStreetMap is going to be about 500GB
(extrapolating from Antarctica, the only file I could download within a
reasonable time), and none of the servers listed in the benchmark had
anywhere near that amount of memory, so that has little chance of working
as an in-memory database.

Improvements made for just in-memory situations suddenly become worthless
if your data grows 10% larger and no longer fits in memory, while
improvements that work for everyone work for everyone. There is a place
for both, but it is not surprising that a project to make general-purpose
tools spends more time on the latter than the former.

To Martin: Stonebraker explicitly supports my hypothesis that in-memory
> databases become prevalent in the future and that "elephants" will be
> challenged if they don't adapt to new architectures, like in-memory and
> column stores.
>

This would be more persuasive if there were impressive osm2vertica or
osm2oracle benchmarking results to show. Otherwise the claims just look
like commercial marketing material to me. I'm not saying there are no
improvements to be made, but the way to make them is to figure out where
the bottlenecks are, not read other people's advertisements and chase them.

The specific use case here is a PostGIS query of an OpenStreetMap data of
> the whole world (see [3]).
>
> On 2013/11/18 Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>
>> >> On Sun, Nov 17, 2013 at 4:02 PM, Stefan Keller <sfkeller(at)gmail(dot)com>
> wrote:
> >> BTW: Having said (to Martijn) that using Postgres is probably more
> efficient, than programming an in-memory
> > database in a decent language: OpenStreetMap has a very, very large Node
> table which is heavily
> > used by other tables (like ways) - and becomes rather slow in Postgres.
> >
> > Do you know why it is slow? I'd give high odds that it would be a
> specific implementation detail in
> > the code that is suboptimal, or maybe a design decision of PostGIS,
> rather than some high level
> > architectural decision of PostgreSQL.
>
> Referring to the application is something you can always say - but
> shouldn't prevent on enhancing Postgres.
>

Postgres has been enhanced. Now we need to change osm2pgsql to take
advantage of them. It defines indexes on the tables that are going to be
bulk loaded with COPY, which defeats some recent optimizations made to
COPY. The creation of the indexes should be delayed until after the bulk
load is done.

A further enhancement to Postgres would be would be to automatically defer
creation of the indexes when a table is truncated or created within a
transaction, so that users get the benefit of the improvement even without
taking special steps. But that would be quite a bit of work, and would
probably also change user-visible behavior, in that violations of unique
constraints would not be detected until later in the process.

Cheers,

Jeff

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Steven Dodd 2013-11-20 19:57:04 Does LC_CTYPE affect performance, index use?
Previous Message Rob Sargent 2013-11-20 17:31:44 Re: How to change content of xml datatype