potential deadlock in parallel hashjoin grow-buckets-barrier and blocking nodes?

From: Luc Vlaming <luc(at)swarm64(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: potential deadlock in parallel hashjoin grow-buckets-barrier and blocking nodes?
Date: 2021-04-13 13:34:07
Message-ID: 3ddf4eab-460d-3cb7-9577-8a4e8f30954d@swarm64.com
Lists: pgsql-hackers

Hi,

Whilst trying to debug a deadlock in a TPC-DS query, I noticed
something in the hashjoin implementation that could cause problems,
and potentially deadlocks (if my analysis is right).

Whilst the inner hash table is being built (the PHJ_BUILD_HASHING_INNER
phase), the workers stay attached to the grow barriers the whole time.
Usually this is not a problem. However, if one of the workers blocks
somewhere further down in the plan whilst trying to fill the inner hash
table, and the others are trying to e.g. extend the number of buckets
using ExecParallelHashIncreaseNumBuckets, they all wait until the
blocked process comes back to the hashjoin node and joins the effort.
Wouldn't this create potential deadlock situations? And why is a worker
that is hashing the inner side required to come back and join the
effort of growing the hash buckets?
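
The waiting pattern described above can be reproduced in a toy model
outside PostgreSQL. The sketch below is only an illustration, not the
PostgreSQL barrier code: it uses Python's threading.Barrier as a
stand-in, the names simulate and N_WORKERS are made up for this example,
and the timeout exists only so that the example terminates (in the
scenario described here the waiters would have no such escape). The
worker that never arrives stands in for a process that is blocked
further down in the plan.

```python
import threading

N_WORKERS = 4  # assumed number of participants attached to the barrier


def simulate(stuck_worker_returns, timeout=1.0):
    """Toy model of workers meeting at a 'grow' barrier.

    Worker 0 plays the process that is busy below the hash node. If it
    never returns, the remaining workers cannot pass the barrier,
    because the barrier expects all N_WORKERS participants.
    """
    barrier = threading.Barrier(N_WORKERS)
    outcome = {}

    def worker(wid):
        if wid == 0 and not stuck_worker_returns:
            # Hypothetical stand-in for being blocked in the subplan:
            # this worker simply never arrives at the barrier.
            outcome[wid] = "blocked in subplan"
            return
        try:
            barrier.wait(timeout=timeout)
            outcome[wid] = "passed barrier"
        except threading.BrokenBarrierError:
            # Only reachable because of the artificial timeout.
            outcome[wid] = "stuck at barrier"

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(N_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome
```

Calling simulate(False) leaves workers 1 to 3 reporting "stuck at
barrier" whilst worker 0 reports "blocked in subplan"; calling
simulate(True) lets all four pass. Whether the real barrier can end up
in the first state is exactly the open question of this mail.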

With very skewed workloads (one node providing all data) I was at least
able to get e.g. 3 out of 4 workers waiting in
ExecParallelHashIncreaseNumBuckets, whilst one was in
ExecProcNode(outerNode). I tried to detach and reattach the barrier, but
this proved to be a bad idea :)

Regards,
Luc
