From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: JIT compiling with LLVM v10.1 |
Date: | 2018-02-15 18:11:59 |
Message-ID: | 20180215181159.rprewuofvbmovrej@alap3.anarazel.de |
Lists: | pgsql-hackers |
Hi,
On 2018-02-15 11:59:46 +0300, Konstantin Knizhnik wrote:
> It is a well-known fact that Postgres spends most of the time in sequential scan
> queries over warm data deforming tuples (17% in the case of TPC-H Q1).
I think that the majority of the time therein is not actually
bottlenecked by CPU, but by cache misses. It might be worthwhile to
repeat your analysis with the last patch of my series applied, and the
#define FASTORDER
uncommented.
> Postgres tries to optimize access to the tuple by caching fixed size
> offsets to the fields whenever possible and loading attributes on demand.
> It is also a well-known recommendation to put fixed size, non-null, frequently
> used attributes at the beginning of table's attribute list to make this
> optimization work more efficiently.
FWIW, I think this optimization causes vastly more trouble than it's
worth.
> The code of heap_deform_tuple shows that the first NULL value
> will switch it to "slow" mode:
Note that in most workloads the relevant codepath isn't
heap_deform_tuple but slot_deform_tuple.
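To make the "slow" mode being discussed concrete, here is a rough toy sketch. It is not the actual heap_deform_tuple or slot_deform_tuple code: the ToyAttr struct, the toy_deform function, the all-int32 column layout and the "bit set means value present" bitmap convention are all assumptions made for illustration. Cached offsets are used only until the first NULL is seen; after that the offset has to be tracked by walking the data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical column descriptor, NOT PostgreSQL's real attribute struct. */
typedef struct
{
    int cached_off;   /* precomputed offset, valid only if no earlier column is NULL */
} ToyAttr;

/*
 * Toy deform loop over int32-only columns. A set bit in nullbits means the
 * value is present (toy convention). NULL columns occupy no space in data.
 * Returns how many columns could use their cached offset.
 */
static int
toy_deform(const char *data, const uint8_t *nullbits,
           const ToyAttr *atts, int natts,
           int32_t *values, bool *isnull)
{
    size_t off = 0;
    bool   slow = false;   /* set once the first NULL is seen */
    int    fast_hits = 0;

    assert(natts > 0);

    for (int i = 0; i < natts; i++)
    {
        if (!(nullbits[i >> 3] & (1 << (i & 7))))
        {
            values[i] = 0;
            isnull[i] = true;
            slow = true;   /* cached offsets of later columns are now wrong */
            continue;
        }

        isnull[i] = false;
        if (!slow)
        {
            off = (size_t) atts[i].cached_off;   /* fast path */
            fast_hits++;
        }
        /* slow path: keep using the running offset computed so far */
        memcpy(&values[i], data + off, sizeof(int32_t));
        off += sizeof(int32_t);
    }
    return fast_hits;
}
```

With four columns where the second is NULL, only the first column is read through its cached offset; the remaining ones depend on the running offset.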
> 1. Modern platforms are mostly limited by memory access time; the number of
> executed instructions is less critical.
I don't think this is quite the right conclusion. Especially because a lot
of time is spent accessing memory, having code that the CPU can execute
out-of-order (by speculatively executing forward) is hugely
beneficial. Some of the benefit of JITing comes from being able to
start deforming the next field while memory fetches for the previous one
are still ongoing (iff dealing with fixed width cols).
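As a rough illustration of that point (a hand-written sketch, not what the JIT actually emits): in a generic loop each load address depends on the offset accumulated in the previous iteration, whereas code specialized for a known row shape uses constant offsets, so the loads are independent of each other. The three-column layout (int32, int64, int32, all NOT NULL), the function names, and the absence of alignment padding are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Generic deform: column widths are only known at run time, so the address
 * of column i depends on the running offset from columns 0..i-1.
 */
static void
deform_generic(const char *data, const int *attlen, int natts, int64_t *values)
{
    size_t off = 0;

    for (int i = 0; i < natts; i++)
    {
        assert(attlen[i] == 4 || attlen[i] == 8);
        if (attlen[i] == 4)
        {
            int32_t v;
            memcpy(&v, data + off, sizeof v);
            values[i] = v;
        }
        else
        {
            int64_t v;
            memcpy(&v, data + off, sizeof v);
            values[i] = v;
        }
        off += (size_t) attlen[i];   /* next address depends on this result */
    }
}

/*
 * Specialized deform for the assumed layout (int32, int64, int32, no
 * padding): every offset is a constant, so the three loads do not depend
 * on each other and the CPU is free to overlap them.
 */
static void
deform_specialized(const char *data, int64_t *values)
{
    int32_t a;
    int64_t b;
    int32_t c;

    memcpy(&a, data + 0, sizeof a);
    memcpy(&b, data + 4, sizeof b);
    memcpy(&c, data + 12, sizeof c);
    values[0] = a;
    values[1] = b;
    values[2] = c;
}
```

Both functions produce the same values for the same buffer; the difference is only in the dependency chain between the loads.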
> 2. For a large number of attributes, JIT-ing of tuple deforming can improve speed
> up to two times, which is quite a good result from my point of view.
+1
Note the last version has a small deficiency in decoding varlena datums
that I need to fix (varsize_any isn't inlined anymore).
Greetings,
Andres Freund