From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Why JIT speed improvement is so modest?
Date: 2019-11-25 15:09:29
Message-ID: 809c295d-9d0b-6a8f-c579-8b0ffe565cdc@postgrespro.ru
Lists: pgsql-hackers
Right now JIT provides about a 30% improvement on the TPC-H Q1 query:
https://www.citusdata.com/blog/2018/09/11/postgresql-11-just-in-time/
I wonder why, even on this query, which seems to be an ideal use case for
JIT, we get such a modest improvement.
I raised this question several years ago, but at that time JIT was
considered to be at an early development stage and performance aspects
were less critical than the required infrastructure changes. Now, however,
JIT seems to be stable enough and is switched on by default.
Vitesse DB reports an 8x speedup on Q1, and the ISP-RAS JIT version
provides a 3x speedup on Q1.
According to this presentation, Q1 spends 6% of its time in ExecQual and
75% in ExecAgg.
VOPS provides a 10x improvement on Q1.
I have a hypothesis that such a difference is caused by the way aggregates
are calculated.
Postgres uses the Youngs-Cramer algorithm, while both the ISP-RAS JIT
version and my VOPS just accumulate results in a variable of type double.
I rewrote VOPS to use the same algorithm as Postgres, but VOPS is still
about 10 times faster.
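To make the comparison concrete, here is a simplified sketch (not the
actual Postgres code: the fmgr calling convention, NULL handling and
check_float8_array are stripped) of the Youngs-Cramer transition, roughly
as float8_accum does it, next to naive accumulation into a double:

#include <stdio.h>

/* Simplified sketch only: Youngs-Cramer transition vs. naive summation.
 * The extra work per value is just a few floating-point operations. */
typedef struct
{
    double N;   /* number of values seen */
    double Sx;  /* sum of values */
    double Sxx; /* sum of squared deviations from the current mean */
} YCState;

static void
yc_accum(YCState *st, double newval)
{
    st->N += 1.0;
    st->Sx += newval;
    if (st->N > 1.0)
    {
        double tmp = newval * st->N - st->Sx;

        st->Sxx += tmp * tmp / (st->N * (st->N - 1.0));
    }
}

static void
naive_accum(double *sum, double newval)
{
    *sum += newval;
}

int
main(void)
{
    YCState st = {0, 0, 0};
    double  sum = 0.0;

    for (int i = 0; i < 1000000; i++)
    {
        yc_accum(&st, (double) i);
        naive_accum(&sum, (double) i);
    }
    printf("avg(yc)=%g avg(naive)=%g\n", st.Sx / st.N, sum / 1000000.0);
    return 0;
}

The few extra operations per value cannot explain a 10x difference, which
is consistent with the VOPS numbers below.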
Results of Q1 on scale factor=10 TPC-H data on my desktop with parallel
execution enabled:
no-JIT: 5640 msec
JIT: 4590 msec
VOPS: 452 msec
VOPS + Youngs-Cramer algorithm: 610 msec
Below are the tops of the profiles (functions with more than 1% of time):
JIT:
10.98% postgres postgres [.] float4_accum
8.40% postgres postgres [.] float8_accum
7.51% postgres postgres [.] HeapTupleSatisfiesVisibility
5.92% postgres postgres [.] ExecInterpExpr
5.63% postgres postgres [.] tts_minimal_getsomeattrs
4.35% postgres postgres [.] lookup_hash_entries
3.72% postgres postgres [.] TupleHashTableHash.isra.8
2.93% postgres postgres [.] tuplehash_insert
2.70% postgres postgres [.] heapgettup_pagemode
2.24% postgres postgres [.] check_float8_array
2.23% postgres postgres [.] hash_search_with_hash_value
2.10% postgres postgres [.] ExecScan
1.90% postgres postgres [.] hash_uint32
1.57% postgres postgres [.] tts_minimal_clear
1.53% postgres postgres [.] FunctionCall1Coll
1.47% postgres postgres [.] pg_detoast_datum
1.39% postgres postgres [.] heapgetpage
1.37% postgres postgres [.] TupleHashTableMatch.isra.9
1.35% postgres postgres [.] ExecStoreBufferHeapTuple
1.06% postgres postgres [.] LookupTupleHashEntry
1.06% postgres postgres [.] AggCheckCallContext
no-JIT:
26.82% postgres postgres [.] ExecInterpExpr
15.26% postgres postgres [.] tts_buffer_heap_getsomeattrs
8.27% postgres postgres [.] float4_accum
7.51% postgres postgres [.] float8_accum
5.26% postgres postgres [.] HeapTupleSatisfiesVisibility
2.78% postgres postgres [.] TupleHashTableHash.isra.8
2.63% postgres postgres [.] tts_minimal_getsomeattrs
2.54% postgres postgres [.] lookup_hash_entries
2.05% postgres postgres [.] tuplehash_insert
1.97% postgres postgres [.] heapgettup_pagemode
1.72% postgres postgres [.] hash_search_with_hash_value
1.57% postgres postgres [.] float48mul
1.55% postgres postgres [.] check_float8_array
1.48% postgres postgres [.] ExecScan
1.26% postgres postgres [.] hash_uint32
1.04% postgres postgres [.] tts_minimal_clear
1.00% postgres postgres [.] FunctionCall1Coll
VOPS:
44.25% postgres vops.so [.] vops_avg_state_accumulate
11.76% postgres vops.so [.] vops_float4_avg_accumulate
6.14% postgres postgres [.] ExecInterpExpr
5.89% postgres vops.so [.] vops_float4_sub_lconst
4.89% postgres vops.so [.] vops_float4_mul
4.30% postgres vops.so [.] vops_int4_le_rconst
2.57% postgres vops.so [.] vops_float4_add_lconst
2.31% postgres vops.so [.] vops_count_accumulate
2.24% postgres postgres [.] tts_buffer_heap_getsomeattrs
1.97% postgres postgres [.] heap_page_prune_opt
1.72% postgres postgres [.] HeapTupleSatisfiesVisibility
1.67% postgres postgres [.] AllocSetAlloc
1.47% postgres postgres [.] hash_search_with_hash_value
In theory, by eliminating interpretation overhead, JIT should provide
performance comparable to a vectorized executor.
In most programming languages, using a JIT compiler instead of a byte-code
interpreter gives about a 10x speed improvement.
Certainly a DBMS engine is very different from a traditional interpreter,
and a lot of time is spent in tuple packing/unpacking (although JIT is
also used here), in heap traversal, ... But it is still unclear to me why,
if the ISP-RAS measurements were correct and we actually spend 75% of Q1
time in aggregation, JIT was not able to increase Q1 speed significantly
(several times).
The experiment with VOPS shows that the aggregation algorithm itself is
not the bottleneck.
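Presumably the difference comes from the per-row machinery around the
transition function rather than from its arithmetic. A rough, purely
hypothetical illustration (TILE_SIZE, AvgState and the callback are made
up for the example; this is neither Postgres nor VOPS code) of
row-at-a-time versus tile-at-a-time accumulation:

#include <stdio.h>
#include <stddef.h>

/* Hypothetical illustration: the row-at-a-time path pays a state fetch,
 * a check and an indirect call for every value (standing in for slot
 * deforming, fmgr dispatch and check_float8_array in the real executor),
 * while the tile-at-a-time path pays them once per TILE_SIZE values. */
#define TILE_SIZE 64

typedef struct
{
    double sum;
    double count;
} AvgState;

typedef void (*transfn) (AvgState *st, double newval);

static void
avg_transition(AvgState *st, double newval)
{
    st->sum += newval;
    st->count += 1.0;
}

/* row-at-a-time: overhead per value */
static void
accum_per_row(AvgState **stateptr, transfn fn, double newval)
{
    AvgState *st = *stateptr;   /* per-value state fetch */

    if (st == NULL)             /* per-value sanity check */
        return;
    fn(st, newval);             /* per-value indirect call */
}

/* tile-at-a-time: overhead per tile, inner loop is plain arithmetic */
static void
accum_per_tile(AvgState **stateptr, const double *tile, size_t n)
{
    AvgState *st = *stateptr;

    if (st == NULL)
        return;
    for (size_t i = 0; i < n; i++)
        avg_transition(st, tile[i]);
}

int
main(void)
{
    AvgState    a = {0, 0}, b = {0, 0};
    AvgState   *ap = &a, *bp = &b;
    double      tile[TILE_SIZE];

    for (int i = 0; i < TILE_SIZE; i++)
        tile[i] = (double) i;

    for (int i = 0; i < TILE_SIZE; i++)
        accum_per_row(&ap, avg_transition, tile[i]);
    accum_per_tile(&bp, tile, TILE_SIZE);

    printf("row avg=%g, tile avg=%g\n", a.sum / a.count, b.sum / b.count);
    return 0;
}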
The profiles also give no answer to this question.
Any ideas?
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company