Re: Speed up Hash Join by teaching ExprState about hashing

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Speed up Hash Join by teaching ExprState about hashing
Date: 2024-07-11 04:47:09
Message-ID: CAApHDvpm1Kt8XyoQSASJ3b4wi_TnGNdOOPminNcioG5p7xFJyA@mail.gmail.com
Lists: pgsql-hackers

On Mon, 13 May 2024 at 21:23, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> In master, if you look at ExecHashGetHashValue() in nodeHash.c, you
> can see that it calls ExecEvalExpr() and then manually calls the hash
> function on the returned value. This process is repeated once for each
> hash key. This is inefficient for a few reasons:
>
> 1) ExecEvalExpr() will only deform tuples up to the max varattno
> that's mentioned in the hash key. That means we might have to deform
> attributes in multiple steps, once per hash key.
> 2) ExecHashGetHashValue() is very branchy and checks hashStrict[] and
> keep_nulls on every loop iteration. There's also a branch to check
> which hash function to use.
> 3) foreach isn't exactly the pinnacle of efficiency either.
>
> All of the above points can be improved by making ExprState handle
> hashing. This means we'll deform all attributes that are needed for
> hashing once, rather than incrementally once per key. This also allows
> JIT compilation of hashing ExprStates, which will make things even
> faster.
>
> The attached patch implements this. Here are some performance numbers.
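
To recap, the per-key loop in ExecHashGetHashValue() on master looks
roughly like this (simplified; the real code also switches memory
contexts and handles a few more details):

    foreach(hk, hashkeys)
    {
        ExprState  *keyexpr = (ExprState *) lfirst(hk);
        Datum       keyval;
        bool        isNull;

        /* combine successive hashkeys by rotating */
        hashkey = pg_rotate_left32(hashkey, 1);

        /* evaluate; this deforms only up to this key's max varattno */
        keyval = ExecEvalExpr(keyexpr, econtext, &isNull);

        if (isNull)
        {
            /* branchy NULL handling, checked once per key, per tuple */
            if (hashtable->hashStrict[i] && !keep_nulls)
                return false;   /* tuple cannot match */
            /* otherwise leave hashkey unchanged, i.e. hash NULL as 0 */
        }
        else
        {
            /* call the hash function for this key and mix in the result */
            hashkey ^= DatumGetUInt32(FunctionCall1Coll(&hashfunctions[i],
                                                        hashtable->collations[i],
                                                        keyval));
        }
        i++;
    }

The patch replaces all of the above with a single ExecEvalExpr() call
on a purpose-built ExprState that hashes every key in one pass.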

I've been doing a bit more work on this to start adding support for
faster hashing for needs other than Hash Join. In the attached, I've
added support for giving the hash value an initial value, which is
required for Hash Aggregate to work. If you look at what's being done
inside BuildTupleHashTableExt(), you'll see that "hash_iv" exists
there to allow an initial hash value. It's used to vary the hash
values calculated inside parallel workers, per hashtable->hash_iv =
murmurhash32(ParallelWorkerNumber). One of my aims for this patch is
to always produce the same hash value before and after the patch, so
I've implemented the equivalent functionality, which can be enabled
or disabled as required depending on the use case.
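
For reference, that initialization in BuildTupleHashTableExt() is
roughly:

    /* vary hash values between parallel workers */
    if (use_variable_hash_iv)
        hashtable->hash_iv = murmurhash32(ParallelWorkerNumber);
    else
        hashtable->hash_iv = 0;

and the per-tuple hash computation then starts from hash_iv instead
of zero. The attached does the equivalent by letting the caller pass
an initial value when building the hashing ExprState.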

I've not added support for Hash Aggregate quite yet. I did look at
doing that, but it seems to need quite a bit of refactoring to do
nicely. The problem is that BuildTupleHashTableExt() receives
keyColIdx with the attribute numbers to hash, whereas the new
ExecBuildHash32Expr() function requires a List of Exprs. The
keyColIdx array comes directly from the planner, which is many layers
up, so changing it would mean churning lots of function signatures.
While I could form Vars from the keyColIdx array to populate the
required List of Exprs, so far I can't decide where exactly that
should happen. I think the planner should probably form the Expr
List; it seems a bit strange to be doing makeVar() in the executor.
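
For illustration, forming the Exprs from keyColIdx might look
something like the following. This is just a sketch; the INNER_VAR
varno and the "desc" TupleDesc are assumptions, and where this code
should live is exactly the open question:

    List       *hash_exprs = NIL;

    for (int i = 0; i < numCols; i++)
    {
        AttrNumber  attno = keyColIdx[i];
        Form_pg_attribute attr = TupleDescAttr(desc, attno - 1);

        /* build a Var referencing the key column in the input slot */
        Var        *var = makeVar(INNER_VAR, attno, attr->atttypid,
                                  attr->atttypmod, attr->attcollation,
                                  0);

        hash_exprs = lappend(hash_exprs, var);
    }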

I currently think that it's fine to speed up Hash Join as phase one
for this patch. I can work more on improving hash value generation in
other locations later.

I'd be happy if someone else were to give this patch a review and
test. One part I struggled a bit with was finding a way to cast the
Size variable down to uint32 in LLVM. I tried to add a new supported
type for uint32 but just couldn't get it to work. Instead, I did:

    v_tmp1 = LLVMBuildAnd(b, v_tmp1,
                          l_sizet_const(0xffffffff), "");

which works and, I imagine, compiles to the same code as a cast
would. It just looks a bit strange.
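
For what it's worth, the LLVM C API does have LLVMBuildTrunc(), so
the cast could perhaps be written as below (untested sketch, using
the global-context i32 type for brevity):

    /* truncate the size_t-width value down to 32 bits */
    v_tmp1 = LLVMBuildTrunc(b, v_tmp1, LLVMInt32Type(), "");

Unlike the AND with 0xffffffff, this leaves v_tmp1 as an i32 rather
than keeping the size_t width, so it only helps if the surrounding
code is happy with the narrower type.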

David

Attachment Content-Type Size
v2-0001-Speed-up-Hash-Join-by-making-ExprStates-hash.patch application/octet-stream 36.7 KB
