Re: JIT compiling with LLVM v9.0

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: JIT compiling with LLVM v9.0
Date: 2018-01-25 15:40:53
Message-ID: b84ae071-0931-e2eb-7207-f8d217246f98@postgrespro.ru
Lists: pgsql-hackers

On 24.01.2018 10:20, Andres Freund wrote:
> Hi,
>
> I've spent the last weeks working on my LLVM compilation patchset. In
> the course of that I *heavily* revised it. While still a good bit away
> from committable, it's IMO definitely not a prototype anymore.
>
> There are too many small changes, so I'm only going to list the major
> things. A good bit of that is new. The actual LLVM IR emission itself
> hasn't changed that drastically. Since I've not described it in
> detail before, I'll describe things from scratch in a few cases, even if
> they haven't fully changed.
>
>
> == JIT Interface ==
>
> To avoid emitting code in very small increments (which increases mmap/mremap
> rw vs. exec remapping and compile/optimization time), code generation
> doesn't happen for every single expression individually, but in batches.
>
> The basic object to emit code via is a JIT context, created with:
> extern LLVMJitContext *llvm_create_context(bool optimize);
> which, in the case of expressions, is created on demand and stored in the
> EState. For other use cases that might not be the right location.
>
> To emit LLVM IR (i.e. the portable code that LLVM then optimizes and
> generates native code for), one gets a module from it with:
> extern LLVMModuleRef llvm_mutable_module(LLVMJitContext *context);
>
> to which an "arbitrary" number of functions can be added. In the case of
> expression evaluation, we get the module once for every expression, and
> emit one function for the expression itself, and one for every
> applicable/referenced deform function.
>
> As explained above, we do not want to emit code immediately from within
> ExecInitExpr()/ExecReadyExpr(). To facilitate that, readying a JITed
> expression sets its evaluation function to a callback, which obtains the
> actual native function on the first actual call. That allows batching
> together the generation of all native functions that are defined before
> the first expression is evaluated; in a lot of queries that'll be all of them.
>
> Said callback then calls
> extern void *llvm_get_function(LLVMJitContext *context, const char *funcname);
> which'll emit code for the "in progress" mutable module if necessary,
> and then searches all generated functions for the name. The names are
> created via
> extern char *llvm_expand_funcname(LLVMJitContext *context, const char *basename);
> currently "evalexpr" and "deform", with a generation and counter suffix.
>
> Currently expressions that do not have access to an EState, basically
> all "parent"-less expressions, aren't JIT compiled. That could be
> changed, but so far I do not see a huge need.
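
The lazy, batched emission described in the quoted text can be illustrated with a small, self-contained C simulation. No LLVM is involved: the "native" function is an ordinary C function, and every name below (JitContext, ExprState, get_function, ready_expr, eval_compile_on_first_call, expand_funcname) is a hypothetical stand-in that only mimics the described behavior of llvm_mutable_module(), llvm_get_function() and the generation/counter naming scheme.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_FUNCS 16

/* Hypothetical, simplified stand-ins; not the patch's real data structures. */
typedef struct ExprState ExprState;
typedef long (*ExprEvalFunc) (ExprState *state, long arg);

typedef struct JitContext
{
    size_t       generation;        /* bumped after each emitted batch */
    int          counter;           /* per-context name suffix counter */
    int          nemitted;          /* functions that already have "native" code */
    int          npending;          /* functions added but not yet emitted */
    int          emit_calls;        /* number of batch emissions so far */
    char         names[MAX_FUNCS][64];
    ExprEvalFunc funcs[MAX_FUNCS];
} JitContext;

struct ExprState
{
    ExprEvalFunc evalfunc;          /* what the executor calls */
    JitContext  *context;
    char         funcname[64];      /* name assigned when readied */
    long         addend;            /* toy expression: arg + addend */
};

/* Name scheme: basename plus generation and counter suffixes. */
static void
expand_funcname(JitContext *ctx, const char *basename, char *buf, size_t len)
{
    snprintf(buf, len, "%s_%zu_%d", basename, ctx->generation, ctx->counter++);
}

/* Stands in for the machine code LLVM would generate. */
static long
native_add(ExprState *state, long arg)
{
    return arg + state->addend;
}

/* Mimics the described llvm_get_function(): emit pending work, then look up. */
static ExprEvalFunc
get_function(JitContext *ctx, const char *funcname)
{
    if (ctx->npending > 0)
    {
        /* one "compilation" covers every function readied so far */
        ctx->nemitted += ctx->npending;
        ctx->npending = 0;
        ctx->generation++;
        ctx->emit_calls++;
    }
    for (int i = 0; i < ctx->nemitted; i++)
        if (strcmp(ctx->names[i], funcname) == 0)
            return ctx->funcs[i];
    return NULL;
}

/* Installed at ready time; resolves the native function on the first call. */
static long
eval_compile_on_first_call(ExprState *state, long arg)
{
    ExprEvalFunc fn = get_function(state->context, state->funcname);

    assert(fn != NULL);
    state->evalfunc = fn;           /* later calls bypass this callback */
    return fn(state, arg);
}

/* "Readying" only records the function in the pending batch. */
static void
ready_expr(ExprState *state, JitContext *ctx, long addend)
{
    int slot = ctx->nemitted + ctx->npending;

    assert(slot < MAX_FUNCS);
    state->context = ctx;
    state->addend = addend;
    expand_funcname(ctx, "evalexpr", state->funcname, sizeof(state->funcname));
    snprintf(ctx->names[slot], sizeof(ctx->names[slot]), "%s", state->funcname);
    ctx->funcs[slot] = native_add;
    ctx->npending++;
    state->evalfunc = eval_compile_on_first_call;
}
```

Readying two expressions and then calling the first one triggers a single emission that covers both; the second expression's first call finds its function without another batch, and afterwards each evalfunc points directly at the "native" function.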

Hi,

As far as I understand, native code is now always generated for all
supported expressions, and individually by each backend.
I wonder whether it would be useful to put more effort into deciding when
compilation to native code should be done and when interpretation is better.
For example, many JIT implementations, such as LuaJIT for Lua, use traces:
code is first interpreted and a trace is recorded. If the same trace is
followed more than N times, native code is generated for it.
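
A rough C sketch of the counter-based promotion just described: each trace, identified here by a small integer (an assumption), is interpreted until it has been followed more than N times, after which it is marked as compiled. HOT_THRESHOLD, TraceCounter and trace_executed are hypothetical names; a real tracing JIT also records the operations along the trace, which this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HOT_THRESHOLD 3             /* hypothetical value of "N" */
#define MAX_TRACES    128

typedef enum { MODE_INTERPRET, MODE_NATIVE } ExecMode;

typedef struct TraceCounter
{
    unsigned hits[MAX_TRACES];      /* times each trace was followed */
    bool     compiled[MAX_TRACES];  /* native code already generated? */
    unsigned compilations;          /* number of traces compiled so far */
} TraceCounter;

/* Record one execution of trace_id and report how it runs this time. */
static ExecMode
trace_executed(TraceCounter *tc, size_t trace_id)
{
    assert(trace_id < MAX_TRACES);
    if (tc->compiled[trace_id])
        return MODE_NATIVE;

    /* Promote once the trace has been followed more than N times. */
    if (++tc->hits[trace_id] > HOT_THRESHOLD)
    {
        tc->compiled[trace_id] = true;  /* a real JIT would emit code here */
        tc->compilations++;
        return MODE_NATIVE;
    }
    return MODE_INTERPRET;
}
```

With N = 3, the first three executions of a trace are interpreted and the fourth triggers compilation; each trace is compiled at most once.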

In the context of a DBMS executor, it is clear that only frequently executed
or expensive queries should be compiled.
So we could use the estimated plan cost and the number of query executions
as simple criteria for JIT-ing a query.
Maybe compilation of simple (low-cost) queries should be done
only for prepared statements...
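
One way these criteria might be combined, as a hedged C sketch: a query is compiled right away if its estimated cost is high, while a cheap query is compiled only if it is a prepared statement that has already run several times. The QueryStats struct, should_jit() and both threshold values are hypothetical and chosen arbitrarily for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical thresholds, picked only for illustration. */
#define JIT_COST_THRESHOLD  100000.0    /* expensive enough to compile at once */
#define JIT_MIN_EXECUTIONS  5           /* repeated enough to amortize compiling */

typedef struct QueryStats
{
    double total_cost;      /* planner's estimated total cost */
    long   executions;      /* times this statement has been executed */
    bool   is_prepared;     /* executions are only tracked for prepared statements */
} QueryStats;

static bool
should_jit(const QueryStats *q)
{
    /* Expensive queries: compilation cost is likely repaid in a single run. */
    if (q->total_cost >= JIT_COST_THRESHOLD)
        return true;

    /* Cheap queries: only worth compiling when prepared and run often. */
    return q->is_prepared && q->executions >= JIT_MIN_EXECUTIONS;
}
```

The design choice here is that the cost test alone decides for expensive queries, while the execution count only matters for cheap prepared statements, matching the suggestion that low-cost queries be compiled only in the prepared case.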

Another question is whether it is sensible to redundantly do expensive
work (LLVM compilation) in every backend.
This question relates to a shared prepared statement cache. But even
without such a cache, it seems possible to use some signature of the
compiled expression as the library name and allow these libraries
to be shared between backends. So before starting code
generation, ExecReadyCompiledExpr could first build the signature and check
whether the corresponding library is already present.
It would also be easier to control the space used by compiled libraries in
this case.
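
A minimal C sketch of the signature check, assuming the expression is already available in some canonical textual form: the signature is a 64-bit FNV-1a hash of that text, the library file name is derived from it, and POSIX access() tests whether some backend has already produced the library. The cache directory layout, the expr_<hash>.so naming and all function names are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Standard 64-bit FNV-1a constants. */
#define FNV64_OFFSET_BASIS UINT64_C(0xcbf29ce484222325)
#define FNV64_PRIME        UINT64_C(0x100000001b3)

/* Signature of an expression, hashed from its (assumed) canonical text. */
static uint64_t
expr_signature(const char *expr_repr)
{
    uint64_t h = FNV64_OFFSET_BASIS;

    for (const unsigned char *p = (const unsigned char *) expr_repr; *p; p++)
    {
        h ^= *p;
        h *= FNV64_PRIME;
    }
    return h;
}

/* Hypothetical naming scheme: <cache_dir>/expr_<16 hex digits>.so */
static void
build_library_path(const char *cache_dir, uint64_t sig, char *path, size_t len)
{
    snprintf(path, len, "%s/expr_%016" PRIx64 ".so", cache_dir, sig);
}

/*
 * Would run before code generation: if a library for this signature already
 * exists, it could be loaded instead of compiling the expression again.
 */
static bool
compiled_library_available(const char *cache_dir, const char *expr_repr,
                           char *path, size_t len)
{
    build_library_path(cache_dir, expr_signature(expr_repr), path, len);
    return access(path, F_OK) == 0;
}
```

Because distinct expressions can collide on a 64-bit hash, a real shared cache would also need to store and compare the full signature before reusing a library; the sketch only shows the lookup step that would precede code generation.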

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
