From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: WIP: Faster Expression Processing v4 |
Date: | 2017-03-24 18:27:42 |
Message-ID: | 20170324182742.3r4qndpugtx6ga3t@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2017-03-24 11:26:27 -0400, Tom Lane wrote:
> Another modest proposal:
>
> I'm not really sold on the approach of using EEOP_FETCHSOME opcodes to
> trigger initial tupleslot de-forming. Certainly we want to have a single
> slot_getsomeattrs call per source slot, but as-is, we need a separate
> traversal over the expression tree just to precompute the max attribute
> number needed. That can't be good for expression compile speed, and
> it introduces nontrivial bug risks because the code that does that
> is completely decoupled from the code that emits the EEOP_VAR opcodes
> (which are what's really relying on the de-forming to have happened).
Hm. We had the separate traversal for projections for a long while, and
I don't think there've been a a lot of changes to the extraction of the
last attribute number. I'm very doubtful that the cost of traversing
the expression twice is meaningful in comparison to the other costs.
> What do you think about a design like this:
>
> * Drop the FETCHSOME opcodes.
>
> * Add fields to struct ExprState that will hold the maximum inner,
> outer, and scan attribute numbers needed.
>
> * ExecInitExpr initializes those fields to zero, and then during
> ExecInitExprRec, whenever we generate an EEOP_VAR opcode, we do e.g.
>
> state->last_inner_attno = Max(state->last_inner_attno,
> variable->varattno);
>
> * ExecInitExprSlots, get_last_attnums_walker, etc all go away.
>
> * In the startup segment of ExecInterpExpr, add
>
> if (state->last_inner_attno > 0)
> slot_getsomeattrs(innerslot, state->last_inner_attno);
> if (state->last_outer_attno > 0)
> slot_getsomeattrs(outerslot, state->last_outer_attno);
> if (state->last_scan_attno > 0)
> slot_getsomeattrs(scanslot, state->last_scan_attno);
>
> This would be a little different from the existing code as far as runtime
> branch-prediction behavior goes, but it's not apparent to me that it'd be
> any worse.
I'd be suprised if it weren't.
I'm not super strongly against this setup, but I fail to see the benefit
of whacking this around. I've benchmarked the previous/current setup
fairly extensively, and I'd rather not redo that. In contrast to the
other changes you've talked about, this definitely is in the hot-path...
> Also, for JIT purposes it'd still be entirely possible to compile the
> slot_getsomeattrs calls in-line; you'd just be looking to a different
> part of the ExprState struct to find out what to do.
Yea, we could do that.
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Banck | 2017-03-24 18:32:25 | Re: Create replication slot in pg_basebackup if requested and not yet present |
Previous Message | Thomas Munro | 2017-03-24 18:27:40 | Re: pg_serial early wraparound |