Re: In-Memory Columnar Store

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: knizhnik <knizhnik(at)garret(dot)ru>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Oleg Bartunov <obartunov(at)gmail(dot)com>
Subject: Re: In-Memory Columnar Store
Date: 2013-12-11 18:33:08
Message-ID: CAHyXU0zpYpPpK5-keDrhBhXhBSA_qdK1DMhybwLRPzB_omsqdA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 11, 2013 at 10:08 AM, knizhnik <knizhnik(at)garret(dot)ru> wrote:
> 1. Calls in PL/pgSQL are very slow - about 1-2 micsroseconds at my computer.
> Just defining insertion per-row trigger with empty procedure increase time
> of insertion of 6 million records twice - from 7 till 15 seconds. If trigger
> procedure is not empty, then time is increased proportionally number of
> performed calls.
> In my case inserting data with propagation it in columnar store using
> trigger takes about 80 seconds. But if I first load data without triggers in
> PostgreSQL table and then
> insert it in columnar store using load function (implemented in C), then
> time will be 7+9=16 seconds.

Yeah. For this problem, we either unfortunately have to try to try to
use standard sql functions in such away that supports inlining (this
is a black art mostly, and fragile), or move logic out of the function
and into the query via things like window functions, or just deal with
the performance hit. postgres flavored SQL is pretty much the most
productive language on the planet AFAIC, but the challenge is always
performance, performance.

Down the line, I am optimistic per call function overhead can be
optimized, probably by expanding what can be inlined somehow. The
problem is that this requires cooperation from the language executors
this is not currently possible through the SPI interface, so I really
don't know.

> Certainly I realize that plpgsql is interpreted language. But for example
> also interpreted Python is able to do 100 times more calls per second.
> Unfortunately profiler doesn;t show some bottleneck - looks like long
> calltime is caused by large overhead of initializing and resetting memory
> context and copying arguments data.
>
> 2. Inefficient implementation of expanding composite type columns using
> (foo()).* clause. In this case function foo() will be invoked as much times
> as there are fields in the returned composite type. Even in case of placing
> call in FROM list (thanks to lateral joins in 9.3), PostgreSQL still
> sometimes performs redundant calls which can be avoided using hack with
> adding "OFFSET 1" clause.

Yeah, this is long standing headache. LATERAL mostly deals with this
but most cases (even with pre-9.3) can be worked around one way or
another.

> 3. 256Gb limit for used shared memory segment size at Linux.

I figure this will be solved fairly soon. It's a nice problem to have.

merlin

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2013-12-11 18:35:32 Re: autovacuum_work_mem
Previous Message Gavin Flower 2013-12-11 18:26:46 Re: ANALYZE sampling is too good