From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | JIT compiling expressions/deform + inlining prototype v2.0 |
Date: | 2017-09-01 06:41:31 |
Message-ID: | 20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de |
Lists: | pgsql-hackers |
Hi,
I previously had an early prototype [1] of JITing expression evaluation
and tuple deforming. I've worked on it a lot since then.
Here's an initial submission, not really pretty but functional. It
supports all types of expressions and tuples, and allows, albeit with
some drawbacks, inlining of builtin functions. Between the version at
[1] and this one I did some of the work in C++, because that made it
easier to experiment with LLVM, but I've now translated everything back.
Some features I had to re-implement due to limitations of the C API.
As a teaser:
tpch_5[9586][1]=# set jit_expressions=0;set jit_tuple_deforming=0;
tpch_5[9586][1]=# \i ~/tmp/tpch/pg-tpch/queries/q01.sql
┌──────────────┬──────────────┬───────────┬──────────────────┬──────────────────┬──────────────────┬──────────────────┬──────────────────┬────────────────────┬─────────────┐
│ l_returnflag │ l_linestatus │ sum_qty │ sum_base_price │ sum_disc_price │ sum_charge │ avg_qty │ avg_price │ avg_disc │ count_order │
├──────────────┼──────────────┼───────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼────────────────────┼─────────────┤
│ A │ F │ 188818373 │ 283107483036.109 │ 268952035589.054 │ 279714361804.23 │ 25.5025937044707 │ 38237.6725307617 │ 0.0499976863510723 │ 7403889 │
│ N │ F │ 4913382 │ 7364213967.94998 │ 6995782725.6633 │ 7275821143.98952 │ 25.5321530459003 │ 38267.7833908406 │ 0.0500308669240696 │ 192439 │
│ N │ O │ 375088356 │ 562442339707.852 │ 534321895537.884 │ 555701690243.972 │ 25.4978961033505 │ 38233.9150565265 │ 0.0499956453049625 │ 14710561 │
│ R │ F │ 188960009 │ 283310887148.206 │ 269147687267.211 │ 279912972474.866 │ 25.5132328961366 │ 38252.4148049933 │ 0.0499958481590264 │ 7406353 │
└──────────────┴──────────────┴───────────┴──────────────────┴──────────────────┴──────────────────┴──────────────────┴──────────────────┴────────────────────┴─────────────┘
(4 rows)
Time: 4367.486 ms (00:04.367)
tpch_5[9586][1]=# set jit_expressions=1;set jit_tuple_deforming=1;
tpch_5[9586][1]=# \i ~/tmp/tpch/pg-tpch/queries/q01.sql
<repeat>
(4 rows)
Time: 3158.575 ms (00:03.159)
tpch_5[9586][1]=# set jit_expressions=0;set jit_tuple_deforming=0;
tpch_5[9586][1]=# \i ~/tmp/tpch/pg-tpch/queries/q01.sql
<repeat>
(4 rows)
Time: 4383.562 ms (00:04.384)
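To put the three timings above into relative terms, here is a minimal sketch of the arithmetic. The helper names are hypothetical and exist only for this illustration; nothing here is part of the patch series.

```c
#include <assert.h>

/* Hypothetical helpers, named for illustration only. */

/* How many times faster the second run is compared to the first. */
static double toy_speedup(double before_ms, double after_ms)
{
    assert(after_ms > 0.0);
    return before_ms / after_ms;
}

/* Run-time reduction, as a percentage of the first run. */
static double toy_reduction_pct(double before_ms, double after_ms)
{
    assert(before_ms > 0.0);
    return 100.0 * (before_ms - after_ms) / before_ms;
}
```

Fed with the first two timings, toy_speedup(4367.486, 3158.575) comes out at roughly 1.38 and toy_reduction_pct(4367.486, 3158.575) at roughly 27.7. The third run (4383.562 ms, JIT off again) is close to the first, which suggests the difference is not just cache warmup.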
The potential wins of the JITing itself are considerably larger than the
already significant gains demonstrated above - this version doesn't
exactly generate the nicest native code around. After these patches the
bottlenecks for TPC-H's Q01 are largely inside the float* functions and
the non-expressified execGrouping.c code. The latter needs to be
expressified to benefit from JIT - that shouldn't be very hard.
The code generation can be improved by moving more of the variable data
into LLVM-allocated stack data, which also has other benefits.
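One plausible reading of the stack-data point, shown as a plain C analogy rather than as LLVM IR: a value that lives in a local variable whose address never escapes can be kept in a register for a whole loop, while a value updated through a pointer generally has to be loaded and stored on every iteration, because the compiler cannot rule out aliasing. The struct and function names below are hypothetical and are not taken from the patches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for state that lives in heap memory. */
typedef struct ToyState
{
    double sum;
} ToyState;

/*
 * Accumulates through the pointer on every iteration. Since vals may
 * alias state->sum, the compiler generally has to keep the running sum
 * in memory.
 */
static void toy_sum_via_memory(ToyState *state, const double *vals, size_t n)
{
    assert(state != NULL);
    for (size_t i = 0; i < n; i++)
        state->sum += vals[i];
}

/*
 * Copies the value into a local first and writes it back once. The
 * local's address never escapes, so it can live in a register.
 */
static void toy_sum_via_local(ToyState *state, const double *vals, size_t n)
{
    assert(state != NULL);
    double sum = state->sum;

    for (size_t i = 0; i < n; i++)
        sum += vals[i];
    state->sum = sum;
}
```

Both functions compute the same result when vals does not overlap state; the difference is only in how much freedom the optimizer has.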
The patch series currently consists of the following:
0001-Rely-on-executor-utils-to-build-targetlist-for-DML-R.patch
- boring prep work
0002-WIP-Allow-tupleslots-to-have-a-fixed-tupledesc-use-i.patch
- for JITed deforming we need to know whether a slot's tupledesc will
change
0003-WIP-Add-configure-infrastructure-to-enable-LLVM.patch
- boring
0004-WIP-Beginning-of-a-LLVM-JIT-infrastructure.patch
- infrastructure for llvm, including memory lifetime management, and
bulk emission of functions.
0005-Perform-slot-validity-checks-in-a-separate-pass-over.patch
- boring, prep work for expression JITing
0006-WIP-deduplicate-int-float-overflow-handling-code.patch
- boring
0007-Pass-through-PlanState-parent-to-expression-instanti.patch
- boring
0008-WIP-JIT-compile-expression.patch
- that's the biggest patch, actually adding JITing
- code needs to be better documented, tested, and deduplicated
0009-Simplify-aggregate-code-a-bit.patch
0010-More-efficient-AggState-pertrans-iteration.patch
0011-Avoid-dereferencing-tts_values-nulls-repeatedly.patch
0012-Centralize-slot-deforming-logic-a-bit.patch
- boring, mostly to make the comparison between JITed and non-JITed a
bit fairer and to remove unnecessary other bottlenecks.
0013-WIP-Make-scan-desc-available-for-all-PlanStates.patch
- this isn't clean enough.
0014-WIP-JITed-tuple-deforming.patch
- JITs deforming, but only when called from within an expression,
because there we know which columns need to be deformed etc.
- Not clear what'd be a good way to also JIT other deforming without
additional infrastructure - doing a separate function emission for
every slot_deform_tuple() is unattractive both performance-wise and
memory-lifetime-wise; I did have that at first.
0015-WIP-Expression-based-agg-transition.patch
- allows JITing of aggregate transition invocation, but also speeds up
aggregates without JIT.
0016-Hacky-Preliminary-inlining-implementation.patch
- allows inlining of functions by using bitcode. That bitcode can be
loaded from a list of directories - as long as it is compatibly
configured, the bitcode doesn't have to be generated by the same
compiler as the postgres binary, i.e. gcc postgres + clang bitcode works.
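To illustrate why knowing the tuple layout and the needed columns in advance helps deforming (patches 0002 and 0014), here is a toy sketch in plain C. It is not PostgreSQL's slot_deform_tuple(): the ToyAttr descriptor, the function names, and the fixed (int4, int8, int4) layout are all assumptions made up for this example, and it ignores NULLs and variable-length columns entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified column descriptor; not PostgreSQL's TupleDesc. */
typedef struct ToyAttr
{
    int len;    /* fixed width in bytes, 4 or 8 in this toy */
    int align;  /* required alignment in bytes, a power of two */
} ToyAttr;

/* Round off up to the next multiple of align. */
static size_t toy_align(size_t off, size_t align)
{
    return (off + align - 1) & ~(align - 1);
}

/* Generic path: interprets the descriptor for every column at run time. */
static void toy_deform_generic(const unsigned char *data, const ToyAttr *attrs,
                               int natts, int64_t *values)
{
    size_t off = 0;

    assert(natts >= 0);
    for (int i = 0; i < natts; i++)
    {
        off = toy_align(off, (size_t) attrs[i].align);
        if (attrs[i].len == 4)
        {
            int32_t v;
            memcpy(&v, data + off, sizeof(v));
            values[i] = v;
        }
        else
        {
            int64_t v;
            memcpy(&v, data + off, sizeof(v));
            values[i] = v;
        }
        off += (size_t) attrs[i].len;
    }
}

/*
 * Specialized path, the kind of code a JIT could emit for the layout
 * (int4, int8, int4) when only the first two columns are needed: the
 * offsets 0 and 8 are constants, and there is no loop or branch.
 */
static void toy_deform_int4_int8(const unsigned char *data, int64_t *values)
{
    int32_t c0;
    int64_t c1;

    memcpy(&c0, data + 0, sizeof(c0));
    memcpy(&c1, data + 8, sizeof(c1));
    values[0] = c0;
    values[1] = c1;
}
```

The specialized variant is only valid as long as the layout cannot change underneath it, which is presumably why 0002 lets a slot declare its tupledesc as fixed.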
I've whacked this around quite heavily today, so this likely has some
new bugs, sorry for that :(
I plan to spend some considerable time over the next weeks to clean this
up and address some of the areas where the performance isn't yet as good
as desirable.
Greetings,
Andres Freund
[1] http://archives.postgresql.org/message-id/20161206034955.bh33paeralxbtluv%40alap3.anarazel.de