From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgresql(dot)org, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Subject: | Re: WIP: Faster Expression Processing v4 |
Date: | 2017-03-15 20:57:32 |
Message-ID: | 20170315205732.bwb2wh5o5ix2vv4b@alap3.anarazel.de |
Lists: | pgsql-hackers |
On 2017-03-15 16:07:14 -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > On 2017-03-15 15:41:22 -0400, Tom Lane wrote:
> >> Color me dubious. Which specific other places have you got in mind, and
> >> do they have expression trees at hand that would tell them which columns
> >> they really need to pull out?
>
> > I was thinking of execGrouping.c's execTuplesMatch(),
> > TupleHashTableHash() (and unequal, but doubt that matters
> > performancewise). There's also nodeHash.c's ExecHashGetValue(), but I
> > think that'd possibly better fixed differently.
>
> The execGrouping.c functions don't have access to an expression tree
> instructing them which columns to pull out of the tuple, so I fail to see
> how get_last_attnums() would be of any use to them.
I presume most of the callers do. We'd have to change the API somewhat,
unless we just have a small loop in execTuplesMatch() determining the
biggest column index (which might be worthwhile / acceptable).
TupleHashTableHash() should be able to have that pre-computed in
BuildTupleHashTable(). Might be more viable to go that way.
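To illustrate the "small loop" idea: a minimal sketch, not actual PostgreSQL code. The helper name max_match_attnum is hypothetical, and the AttrNumber typedef is a local stand-in for the real type. It assumes the caller already holds the array of column numbers to compare, as execTuplesMatch() does with matchColIdx.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for PostgreSQL's AttrNumber (16-bit, 1-based column number). */
typedef int16_t AttrNumber;

/*
 * Hypothetical helper: return the largest column number found in
 * colIdx[0 .. ncols-1], or 0 when there are no columns to compare.
 */
static AttrNumber
max_match_attnum(const AttrNumber *colIdx, int ncols)
{
    AttrNumber  maxatt = 0;
    int         i;

    assert(ncols >= 0);
    for (i = 0; i < ncols; i++)
    {
        if (colIdx[i] > maxatt)
            maxatt = colIdx[i];
    }
    return maxatt;
}
```

With the result in hand, the caller could presumably deform each slot once up to that column (e.g. via slot_getsomeattrs()) rather than going through slot_getattr() per column. For the hash table case the value would only need to be computed once, at build time.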
> As for ExecHashGetHashValue, it's most likely going to be working from
> virtual tuples passed up to the join, which won't benefit from
> predetermination of the last column to be accessed. The
> tuple-deconstruction would have happened while projecting in the scan
> node below.
I think the physical tuple stuff commonly thwarts that argument? For
TPC-H's Q5 on master you can e.g. see the following profile:
+   29.38%  postgres  postgres  [.] ExecScanHashBucket
+   16.72%  postgres  postgres  [.] slot_getattr
+    5.51%  postgres  postgres  [.] heap_getnext
-    5.50%  postgres  postgres  [.] slot_deform_tuple
   - 98.07% slot_deform_tuple
      - 85.98% slot_getattr
         - 96.59% ExecHashGetHashValue
            - ExecHashJoin
               - ExecProcNode
                  + 85.12% ExecHashJoin
                  + 14.88% MultiExecHash
         + 3.41% ExecMakeFunctionResultNoSets
      + 14.02% slot_getsomeattrs
   + 1.58% ExecEvalScalarVarFast
I.e. nearly all calls to slot_deform_tuple come from slot_getattr in
ExecHashGetHashValue(). And nearly all the time in slot_getattr is
spent on code only executed for actual tuples:
│ if (tuple == NULL) /* internal error */
0.18 │ test %rax,%rax
│ ↓ je 223
│ *
│ * (We have to check this separately because of various inheritance and
│ * table-alteration scenarios: the tuple could be either longer or shorter
│ * than the tupdesc.)
│ */
│ tup = tuple->t_data;
0.47 │ mov 0x10(%rax),%rsi
│ if (attnum > HeapTupleHeaderGetNatts(tup))
75.42 │ movzwl 0x12(%rsi),%eax
0.70 │ and $0x7ff,%eax
0.47 │ cmp %eax,%ebx
│ ↓ jg e8
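For readers mapping the annotated instructions back to C: a minimal standalone sketch of the check being executed, assuming the usual definition of HeapTupleHeaderGetNatts() as t_infomask2 masked with HEAP_NATTS_MASK (0x07FF, matching the "and $0x7ff" above). The function names here are hypothetical, and the real macro operates on a HeapTupleHeader rather than a bare integer.

```c
#include <stdint.h>

/* Low 11 bits of t_infomask2 hold the number of attributes in the tuple. */
#define HEAP_NATTS_MASK 0x07FF

/* Hypothetical stand-in for HeapTupleHeaderGetNatts(), taking the raw field. */
static int
natts_from_infomask2(uint16_t t_infomask2)
{
    return t_infomask2 & HEAP_NATTS_MASK;
}

/*
 * Hypothetical helper mirroring the "attnum > HeapTupleHeaderGetNatts(tup)"
 * test: returns 1 when the requested column lies beyond what the physical
 * tuple stores, 0 otherwise.
 */
static int
attnum_beyond_tuple(int attnum, uint16_t t_infomask2)
{
    return attnum > natts_from_infomask2(t_infomask2);
}
```

The arithmetic is trivial. The load of t_infomask2 is the first access to the tuple header in this excerpt, which may be why it dominates the samples, though the profile alone does not confirm that.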
- Andres