
JIT compiling expressions/deform + inlining prototype v2.0

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	JIT compiling expressions/deform + inlining prototype v2.0
Date:	2017-09-01 06:41:31
Message-ID:	20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
Lists:	pgsql-hackers

Hi,

I previously posted an early prototype [1] of JITing expression evaluation
and tuple deforming. I've worked on this a lot since then.

Here's an initial, not really pretty but functional, submission. It
supports all types of expressions and tuples, and allows, albeit with
some drawbacks, inlining of builtin functions. Between the version at
[1] and this one I did some of the work in C++, because that allowed me
to experiment more with LLVM, but I've now translated everything back.
I had to re-implement some features due to limitations of the C API.

As a teaser:
tpch_5[9586][1]=# set jit_expressions=0;set jit_tuple_deforming=0;
tpch_5[9586][1]=# \i ~/tmp/tpch/pg-tpch/queries/q01.sql
┌──────────────┬──────────────┬───────────┬──────────────────┬──────────────────┬──────────────────┬──────────────────┬──────────────────┬────────────────────┬─────────────┐
│ l_returnflag │ l_linestatus │ sum_qty │ sum_base_price │ sum_disc_price │ sum_charge │ avg_qty │ avg_price │ avg_disc │ count_order │
├──────────────┼──────────────┼───────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼────────────────────┼─────────────┤
│ A │ F │ 188818373 │ 283107483036.109 │ 268952035589.054 │ 279714361804.23 │ 25.5025937044707 │ 38237.6725307617 │ 0.0499976863510723 │ 7403889 │
│ N │ F │ 4913382 │ 7364213967.94998 │ 6995782725.6633 │ 7275821143.98952 │ 25.5321530459003 │ 38267.7833908406 │ 0.0500308669240696 │ 192439 │
│ N │ O │ 375088356 │ 562442339707.852 │ 534321895537.884 │ 555701690243.972 │ 25.4978961033505 │ 38233.9150565265 │ 0.0499956453049625 │ 14710561 │
│ R │ F │ 188960009 │ 283310887148.206 │ 269147687267.211 │ 279912972474.866 │ 25.5132328961366 │ 38252.4148049933 │ 0.0499958481590264 │ 7406353 │
└──────────────┴──────────────┴───────────┴──────────────────┴──────────────────┴──────────────────┴──────────────────┴──────────────────┴────────────────────┴─────────────┘
(4 rows)

Time: 4367.486 ms (00:04.367)
tpch_5[9586][1]=# set jit_expressions=1;set jit_tuple_deforming=1;
tpch_5[9586][1]=# \i ~/tmp/tpch/pg-tpch/queries/q01.sql
<repeat>
(4 rows)

Time: 3158.575 ms (00:03.159)

tpch_5[9586][1]=# set jit_expressions=0;set jit_tuple_deforming=0;
tpch_5[9586][1]=# \i ~/tmp/tpch/pg-tpch/queries/q01.sql
<repeat>
(4 rows)
Time: 4383.562 ms (00:04.384)
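
As a rough check of the numbers above: averaging the two non-JIT runs
(4367.486 ms and 4383.562 ms) gives about 4375.5 ms, against 3158.575 ms
with JIT enabled. That is roughly 28% less runtime, or about a 1.39x
speedup. The small C sketch below does the same arithmetic; the function
names are made up for illustration and are not part of the patch series.

```c
/* Hypothetical helpers, only to make the timing arithmetic explicit. */

/* Fraction of the baseline runtime that was saved: (baseline - jit) / baseline. */
static double runtime_reduction(double baseline_ms, double jit_ms)
{
    return (baseline_ms - jit_ms) / baseline_ms;
}

/* Speedup factor: how many times faster the JITed run is. */
static double speedup_factor(double baseline_ms, double jit_ms)
{
    return baseline_ms / jit_ms;
}

/* Mean of the two non-JIT timings quoted in the session above. */
static double baseline_mean_ms(void)
{
    return (4367.486 + 4383.562) / 2.0;
}
```

Calling runtime_reduction(baseline_mean_ms(), 3158.575) yields a value
close to 0.28, and speedup_factor() with the same arguments a value close
to 1.39.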

The potential wins of the JITing itself are considerably larger than the
already significant gains demonstrated above - this version doesn't
exactly generate the nicest native code around. After these patches the
bottlenecks for TPC-H's Q01 are largely inside the float* functions and
the execGrouping.c code, which is not yet expression based. The latter
needs to be converted to expressions to benefit from JIT - that shouldn't
be very hard.

The code generation can be improved by moving more of the variable data
into LLVM-allocated stack data, which also has other benefits.

The patch series currently consists of the following:

0001-Rely-on-executor-utils-to-build-targetlist-for-DML-R.patch
- boring prep work

0002-WIP-Allow-tupleslots-to-have-a-fixed-tupledesc-use-i.patch
- for JITed deforming we need to know whether a slot's tupledesc will
change

0003-WIP-Add-configure-infrastructure-to-enable-LLVM.patch
- boring

0004-WIP-Beginning-of-a-LLVM-JIT-infrastructure.patch
- infrastructure for llvm, including memory lifetime management, and
bulk emission of functions.

0005-Perform-slot-validity-checks-in-a-separate-pass-over.patch
- boring, prep work for expression JITing

0006-WIP-deduplicate-int-float-overflow-handling-code.patch
- boring

0007-Pass-through-PlanState-parent-to-expression-instanti.patch
- boring

0008-WIP-JIT-compile-expression.patch
- that's the biggest patch, actually adding JITing
- code needs to be better documented, tested, and deduplicated

0009-Simplify-aggregate-code-a-bit.patch
0010-More-efficient-AggState-pertrans-iteration.patch
0011-Avoid-dereferencing-tts_values-nulls-repeatedly.patch
0012-Centralize-slot-deforming-logic-a-bit.patch
- boring, mostly to make the comparison between JITed and non-JITed a bit
fairer and to remove unnecessary other bottlenecks.

0013-WIP-Make-scan-desc-available-for-all-PlanStates.patch
- this isn't clean enough.

0014-WIP-JITed-tuple-deforming.patch

- does JITing of deforming, but only when called from within an expression,
since there we know which columns we want to be deformed etc.

- It's not clear what would be a good way to also JIT other deforming without
additional infrastructure. Doing a separate function emission for
every slot_deform_tuple() is unattractive both performance-wise and
memory-lifetime-wise; I did have that at first.
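
To illustrate why a fixed tupledesc and a known set of wanted columns
matter for deforming, here is a minimal plain-C sketch. It is not code
from the patch and does not use LLVM; ColDesc, deform_generic() and
deform_int4_int8_first2() are hypothetical names, and the layout is
simplified to fixed-width, non-null columns. The generic routine has to
consult the descriptor for every column at run time, whereas a routine
specialized for one known layout can use constant offsets and stop after
the last column it needs, which is roughly the kind of code a JIT can
emit once the descriptor is known not to change.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified column descriptor (fixed-width, NOT NULL only). */
typedef struct ColDesc
{
    size_t len;    /* width in bytes; this sketch handles 4 and 8 */
    size_t align;  /* required alignment, a power of two */
} ColDesc;

/* Generic path: offsets and widths are discovered per column at run time. */
static void deform_generic(const ColDesc *desc, int natts,
                           const unsigned char *data, int64_t *values)
{
    size_t off = 0;

    for (int i = 0; i < natts; i++)
    {
        int64_t v = 0;

        /* round the offset up to this column's alignment */
        off = (off + desc[i].align - 1) & ~(desc[i].align - 1);

        if (desc[i].len == 4)
        {
            int32_t tmp;
            memcpy(&tmp, data + off, sizeof(tmp));
            v = tmp;
        }
        else if (desc[i].len == 8)
            memcpy(&v, data + off, sizeof(v));
        /* other widths are left as 0 in this sketch */

        values[i] = v;
        off += desc[i].len;
    }
}

/*
 * Specialized path for the layout (int32, int64, int32) when only the
 * first two columns are needed: offsets 0 and 8 are compile-time constants
 * and the third column is never touched.
 */
static void deform_int4_int8_first2(const unsigned char *data, int64_t *values)
{
    int32_t a;
    int64_t b;

    memcpy(&a, data + 0, sizeof(a));
    memcpy(&b, data + 8, sizeof(b));
    values[0] = a;
    values[1] = b;
}
```

For the same input buffer both routines produce identical values for the
first two columns; the specialized one simply has no loop, no descriptor
lookups and no branches on column width.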

0015-WIP-Expression-based-agg-transition.patch
- allows JITing of aggregate transition invocation, but also speeds up
aggregates without JIT.

0016-Hacky-Preliminary-inlining-implementation.patch
- allows inlining of functions by using bitcode. That bitcode can be
loaded from a list of directories. As long as it is compatibly configured,
the bitcode doesn't have to be generated by the same compiler as the
postgres binary, i.e. gcc postgres + clang bitcode works.

I've whacked this around quite heavily today, so it likely has some new
bugs, sorry for that :(

I plan to spend some considerable time over the next weeks to clean this
up and address some of the areas where the performance isn't yet as good
as desirable.

Greetings,

Andres Freund

[1] http://archives.postgresql.org/message-id/20161206034955.bh33paeralxbtluv%40alap3.anarazel.de

Attachment	Content-Type	Size
0001-Rely-on-executor-utils-to-build-targetlist-for-DML-R.patch	text/x-diff	2.0 KB
0002-WIP-Allow-tupleslots-to-have-a-fixed-tupledesc-use-i.patch	text/x-diff	68.3 KB
0003-WIP-Add-configure-infrastructure-to-enable-LLVM.patch	text/x-diff	5.9 KB
0004-WIP-Beginning-of-a-LLVM-JIT-infrastructure.patch	text/x-diff	25.2 KB
0005-Perform-slot-validity-checks-in-a-separate-pass-over.patch	text/x-diff	13.5 KB
0006-WIP-deduplicate-int-float-overflow-handling-code.patch	text/x-diff	10.1 KB
0007-Pass-through-PlanState-parent-to-expression-instanti.patch	text/x-diff	3.2 KB
0008-WIP-JIT-compile-expression.patch	text/x-diff	77.3 KB
0009-Simplify-aggregate-code-a-bit.patch	text/x-diff	9.7 KB
0010-More-efficient-AggState-pertrans-iteration.patch	text/x-diff	4.1 KB
0011-Avoid-dereferencing-tts_values-nulls-repeatedly.patch	text/x-diff	2.3 KB
0012-Centralize-slot-deforming-logic-a-bit.patch	text/x-diff	8.2 KB
0013-WIP-Make-scan-desc-available-for-all-PlanStates.patch	text/x-diff	1.4 KB
0014-WIP-JITed-tuple-deforming.patch	text/x-diff	26.0 KB
0015-WIP-Expression-based-agg-transition.patch	text/x-diff	62.9 KB
0016-Hacky-Preliminary-inlining-implementation.patch	text/x-diff	16.3 KB
