From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-committers <pgsql-committers(at)postgresql(dot)org> |
Subject: | Re: pgsql: Add some regression tests that exercise hash join code. |
Date: | 2017-12-04 20:21:42 |
Message-ID: | 20171204202142.spwkwc545vvxmed7@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-committers |
On 2017-12-04 12:28:56 +1300, Thomas Munro wrote:
> From 43eeea0bc35204440d262224b56efca958b33428 Mon Sep 17 00:00:00 2001
> From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
> Date: Mon, 4 Dec 2017 11:52:11 +1300
> Subject: [PATCH] Fix EXPLAIN ANALYZE of hash join when the leader doesn't
> execute.
>
> If a hash join appears in a parallel query, there may be no hash table
> available for explain.c to inspect even though a hash table may have been
> built in other processes. This could happen either because
> parallel_leader_participation was set to off or because the leader happened to
> hit the end of the outer relation immediately (even though the complete
> relation is not empty) and decided not to build the hash table.
>
> Commit bf11e7ee introduced a way for workers to exchange instrumentation via
> the DSM segment for Sort nodes even though they are not parallel-aware. This
> commit does the same for Hash nodes, so that explain.c has a way to find
> instrumentation data from an arbitrary participant that actually built the
> hash table.
Makes sense. Seems ok to only fix this inv11, even if present earlier.
> diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
> index 447f69d044e..1ffe635d66c 100644
> --- a/src/backend/commands/explain.c
> +++ b/src/backend/commands/explain.c
> @@ -2379,34 +2379,62 @@ show_sort_info(SortState *sortstate, ExplainState *es)
> static void
> show_hash_info(HashState *hashstate, ExplainState *es)
> {
> - HashJoinTable hashtable;
> + HashInstrumentation *hinstrument = NULL;
>
> - hashtable = hashstate->hashtable;
> + /*
> + * In a parallel query, the leader process may or may not have run the
> + * hash join, and even if it did it may not have built a hash table due to
> + * timing (if it started late it might have seen no tuples in the outer
> + * relation and skipped building the hash table). Therefore we have to be
> + * prepared to get instrumentation data from a worker if there is no hash
> + * table.
> + */
> + if (hashstate->hashtable)
> + {
> + hinstrument = (HashInstrumentation *)
> + palloc(sizeof(HashInstrumentation));
> + ExecHashGetInstrumentation(hinstrument, hashstate->hashtable);
> + }
> + else if (hashstate->shared_info)
> + {
> + SharedHashInfo *shared_info = hashstate->shared_info;
> + int i;
> +
> + /* Find the first worker that built a hash table. */
> + for (i = 0; i < shared_info->num_workers; ++i)
> + {
> + if (shared_info->hinstrument[i].nbatch > 0)
> + {
> + hinstrument = &shared_info->hinstrument[i];
> + break;
> + }
> + }
> + }
I can't think of a way that, before the parallel HJ patch, results in a
difference between the size of the hashtable between workers. Short of
some participant not having a hashtable at all, that is. Wwe also don't
currently display lookup information etc. So just more or less randomly
choosing one worker's seems ok.
> + case T_HashState:
> + /* even when not parallel-aware */
"..., for stats" or such maybe?
> + ExecHashEstimate((HashState *) planstate, e->pcxt);
> + break;
> case T_SortState:
> /* even when not parallel-aware */
I'd just update that comment as well.
> #endif /* NODEHASH_H */
> diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
> index e05bc04f525..6c322e57c00 100644
> --- a/src/include/nodes/execnodes.h
> +++ b/src/include/nodes/execnodes.h
> @@ -1980,6 +1980,29 @@ typedef struct GatherMergeState
> struct binaryheap *gm_heap; /* binary heap of slot indices */
> } GatherMergeState;
>
> +/* ----------------
> + * Values displayed by EXPLAIN ANALYZE
> + * ----------------
> + */
> +typedef struct HashInstrumentation
> +{
> + int nbuckets;
> + int nbuckets_original;
> + int nbatch;
> + int nbatch_original;
> + size_t space_peak;
> +} HashInstrumentation;
Maybe I'm being pedantic here, but I think it'd not hurt to explain what
these mean. It's obviously pretty clear what nbuckets/batch mean, but
the difference between original and not is far less clea.r
Looks reasonable these nitpicks aside.
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2017-12-04 20:26:08 | pgsql: When VACUUM or ANALYZE skips a concurrently dropped table, log i |
Previous Message | Tom Lane | 2017-12-04 16:51:50 | pgsql: Support boolean columns in functional-dependency statistics. |