From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: WIP: Faster Expression Processing v4 |
Date: | 2017-03-25 22:22:02 |
Message-ID: | 20170325222202.uz6evsjqrtwjnql6@alap3.anarazel.de |
Lists: | pgsql-hackers |
On 2017-03-25 12:22:15 -0400, Tom Lane wrote:
> More random musing ... have you considered making the jump-target fields
> in expressions be relative rather than absolute indexes? That is,
> EEO_JUMP would look like
>
> op += (stepno); \
> EEO_DISPATCH(); \
>
> instead of
>
> op = &state->steps[stepno]; \
> EEO_DISPATCH(); \
>
> I have not carried out a full patch to make this work, but just making
> that one change and examining the generated assembly code looks promising.
> Instead of this
>
> movslq 40(%r14), %r8
> salq $6, %r8
> addq 24(%rbx), %r8
> movq %r8, %r14
> jmp *(%r8)
>
> we get this
>
> movslq 40(%r14), %rax
> salq $6, %rax
> addq %rax, %r14
> jmp *(%r14)
That seems like a good idea. I've not done this in the committed
version (and I don't think we necessarily need to do this before the
release), but for the future it seems like a good plan. It makes sense
that it's faster - there's no need to reference state->steps.
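
To make the difference concrete, here is a minimal sketch of the two jump encodings in a toy interpreter. It is not PostgreSQL's code: the `Step` struct, the `SOP_*` opcodes, and the `run_absolute`/`run_relative` functions are hypothetical names invented for illustration. The only point it tries to show is that an absolute target needs the base of the steps array, while a relative target needs only the current step pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified step layout; not PostgreSQL's ExprEvalStep. */
typedef enum { SOP_ADD, SOP_JUMP_IF_NEG, SOP_DONE } StepOp;

typedef struct Step
{
    StepOp opcode;
    int    arg;   /* SOP_ADD: addend; SOP_JUMP_IF_NEG: jump target */
} Step;

/* Absolute targets: arg is an index into steps[], so the base is required. */
static int
run_absolute(const Step *steps, int acc)
{
    const Step *op = steps;

    assert(steps != NULL);
    for (;;)
    {
        switch (op->opcode)
        {
            case SOP_ADD:
                acc += op->arg;
                op++;
                break;
            case SOP_JUMP_IF_NEG:
                op = (acc < 0) ? &steps[op->arg] : op + 1;
                break;
            case SOP_DONE:
                return acc;
        }
    }
}

/* Relative targets: arg is an offset from the current step; no base needed. */
static int
run_relative(const Step *op, int acc)
{
    assert(op != NULL);
    for (;;)
    {
        switch (op->opcode)
        {
            case SOP_ADD:
                acc += op->arg;
                op++;
                break;
            case SOP_JUMP_IF_NEG:
                op += (acc < 0) ? op->arg : 1;
                break;
            case SOP_DONE:
                return acc;
        }
    }
}

/* The same loop ("add 1 until non-negative") in both encodings. */
static const Step prog_abs[] = {
    {SOP_ADD, 1}, {SOP_JUMP_IF_NEG, 0}, {SOP_DONE, 0}
};
static const Step prog_rel[] = {
    {SOP_ADD, 1}, {SOP_JUMP_IF_NEG, -1}, {SOP_DONE, 0}
};

/* The relative group copied to a different position in a larger array. */
static const Step embedded[] = {
    {SOP_DONE, 0}, {SOP_DONE, 0},
    {SOP_ADD, 1}, {SOP_JUMP_IF_NEG, -1}, {SOP_DONE, 0}
};
```

In this sketch, `run_relative(embedded + 2, -3)` behaves the same as `run_relative(prog_rel, -3)`, because the group of steps carries no reference to its position in the array. The absolute encoding would need its target rewritten after such a move.
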
> which certainly looks like it ought to be faster. Also, the real reason
> I got interested in this at all is that with relative jumps, groups of
> steps would be position-independent within the steps array, which would
> enable some compile-time tricks that seem impractical with the current
> definition.
Indeed.
> BTW, now that I've spent a bit of time looking at the generated assembly
> code, I'm kind of disinclined to believe any arguments about how we have
> better control over branch prediction with the jump-threading
> implementation.
I measured the performance difference between using it and not using it,
and it came out a pretty clear plus on gcc 6.3, a gcc master snapshot,
and clang-3.9. It's not just that more jumps are duplicated, it's also
that the switch() always adds a bounds check.
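
For readers unfamiliar with the two dispatch styles being compared, the following is a rough sketch, assuming a compiler that supports the GNU "labels as values" extension (gcc and clang do). The `MiniOp` opcodes and the `run_switch`/`run_goto` functions are hypothetical and far simpler than the real interpreter. The sketch also looks up the label in a table on every dispatch, whereas the assembly quoted above jumps through an address stored in the step itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcodes, not PostgreSQL's ExprEvalOp. */
typedef enum { MOP_INC, MOP_DOUBLE, MOP_DONE } MiniOp;

/*
 * switch-based dispatch: one shared dispatch point, and the compiler
 * typically range-checks the opcode before indexing its jump table.
 */
static int
run_switch(const MiniOp *op, int acc)
{
    assert(op != NULL);
    for (;;)
    {
        switch (*op++)
        {
            case MOP_INC:
                acc += 1;
                break;
            case MOP_DOUBLE:
                acc *= 2;
                break;
            case MOP_DONE:
                return acc;
            default:
                return -1;      /* unknown opcode */
        }
    }
}

/*
 * Computed-goto dispatch (GNU C extension): every opcode body ends with
 * its own indirect jump, and no range check is emitted.  Opcodes must be
 * valid indexes into dispatch[].
 */
static int
run_goto(const MiniOp *op, int acc)
{
    static const void *const dispatch[] = {
        &&do_inc, &&do_double, &&do_done
    };

#define MINI_NEXT() goto *dispatch[*op++]

    assert(op != NULL);
    MINI_NEXT();

do_inc:
    acc += 1;
    MINI_NEXT();

do_double:
    acc *= 2;
    MINI_NEXT();

do_done:
    return acc;

#undef MINI_NEXT
}

/* ((3 + 1) * 2) + 1 when started with acc = 3 */
static const MiniOp mini_prog[] = {
    MOP_INC, MOP_DOUBLE, MOP_INC, MOP_DONE
};
```

Both functions compute the same result for `mini_prog`. Whether the compiler actually keeps the indirect jumps in `run_goto` separate is the question discussed in the rest of this message, so inspecting the generated assembly is the only way to be sure.
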
> At least with current gcc (6.3.1 on Fedora 25) at -O2,
> what I see is multiple places jumping to the same indirect jump
> instruction :-(. It's not a total disaster: as best I can tell, all the
> uses of EEO_JUMP remain distinct. But gcc has chosen to implement about
> 40 of the 71 uses of EEO_NEXT by jumping to the same couple of
> instructions that increment the "op" register and then do an indirect
> jump :-(.
Yea, I see some of that too - "usually" when there's more than just the
jump in common. I think there are some gcc params that influence this
(min-crossjump-insns (5), max-goto-duplication-insns (8)). Might be
worthwhile experimenting with setting them locally via a pragma or such.
I think Aants wanted to experiment with that, too.
Then there's also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71785
which causes some forms of computed goto (not ours I think) to be
deoptimized in gcc.
Greetings,
Andres Freund