From: | Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Hybrid Hash/Nested Loop joins and caching results from subplans |
Date: | 2020-11-08 14:52:43 |
Message-ID: | CAKU4AWpoQ_c9Rj2sTVPMGxAb+CAAe4z5ejcrL-89QirCxk=k+A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Nov 2, 2020 at 3:44 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> On Tue, 20 Oct 2020 at 22:30, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> >
> > So far benchmarking shows there's still a regression from the v8
> > version of the patch. This is using count(*). An earlier test [1] did
> > show speedups when we needed to deform tuples returned by the nested
> > loop node. I've not yet repeated that test again. I was disappointed
> > to see v9 slower than v8 after having spent about 3 days rewriting the
> > patch
>
> I did some further tests this time with some tuple deforming. Again,
> it does seem that v9 is slower than v8.
>
I run your test case on v8 and v9, I can produce a stable difference
between them.
v8:
statement latencies in milliseconds:
1603.611 select count(*) from hundredk hk inner join lookup l on
hk.thousand = l.a;
v9:
statement latencies in milliseconds:
1772.287 select count(*) from hundredk hk inner join lookup l on
hk.thousand = l.a;
then I did a perf on the 2 version, Is it possible that you
called tts_minimal_clear twice in
the v9 version? Both ExecClearTuple and ExecStoreMinimalTuple
called tts_minimal_clear
on the same slot.
With the following changes:
diff --git a/src/backend/executor/execMRUTupleCache.c
b/src/backend/executor/execMRUTupleCache.c
index 3553dc26cb..b82d8e98b8 100644
--- a/src/backend/executor/execMRUTupleCache.c
+++ b/src/backend/executor/execMRUTupleCache.c
@@ -203,10 +203,9 @@ prepare_probe_slot(MRUTupleCache *mrucache,
MRUCacheKey *key)
TupleTableSlot *tslot = mrucache->tableslot;
int numKeys = mrucache->nkeys;
- ExecClearTuple(pslot);
-
if (key == NULL)
{
+ ExecClearTuple(pslot);
/* Set the probeslot's values based on the current
parameter values */
for (int i = 0; i < numKeys; i++)
pslot->tts_values[i] =
ExecEvalExpr(mrucache->param_exprs[i],
@@ -641,7 +640,7 @@ ExecMRUTupleCacheFetch(MRUTupleCache *mrucache)
{
mrucache->state =
MRUCACHE_FETCH_NEXT_TUPLE;
-
ExecClearTuple(mrucache->cachefoundslot);
+ //
ExecClearTuple(mrucache->cachefoundslot);
slot =
mrucache->cachefoundslot;
ExecStoreMinimalTuple(mrucache->last_tuple->mintuple, slot, false);
return slot;
@@ -740,7 +739,7 @@ ExecMRUTupleCacheFetch(MRUTupleCache *mrucache)
return NULL;
}
- ExecClearTuple(mrucache->cachefoundslot);
+ // ExecClearTuple(mrucache->cachefoundslot);
slot = mrucache->cachefoundslot;
ExecStoreMinimalTuple(mrucache->last_tuple->mintuple, slot, false);
return slot;
v9 has the following result:
1608.048 select count(*) from hundredk hk inner join lookup l on
hk.thousand = l.a;
> Graphs attached
>
> Looking at profiles, I don't really see any obvious reason as to why
> this is. I'm very much inclined to just pursue the v8 patch (separate
> Result Cache node) and just drop the v9 idea altogether.
>
> David
>
--
Best Regards
Andy Fan
From | Date | Subject | |
---|---|---|---|
Next Message | Marina Polyakova | 2020-11-08 14:59:19 | Re: pgbench stopped supporting large number of client connections on Windows |
Previous Message | Andy Fan | 2020-11-08 13:26:51 | Re: Hybrid Hash/Nested Loop joins and caching results from subplans |