From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Avoiding hash join batch explosions with extreme skew and weird stats
Date: 2019-12-30 03:34:02
Message-ID: CAAKRu_YsWm7gc_b2nBGWFPE6wuhdOLfc1LBZ786DUzaCPUDXCA@mail.gmail.com
Lists: pgsql-hackers
So, I finally have a prototype of the parallel hashloop fallback to share.
See the commit message for a full description of the functionality of the
patch.
This patch does contain refactoring of nodeHashjoin.
I have split the Parallel HashJoin and Serial HashJoin state machines
up, as they were diverging in my patch to a point that made for a
really cluttered ExecHashJoinImpl() (ExecHashJoinImpl() is now gone).
The reason I didn't do this refactoring in one patch and then put the
adaptive hashjoin code on top of it is that I might like to make
Parallel HashJoin and Serial HashJoin different nodes.
I think that has been discussed elsewhere, and I was looking to
understand the rationale for keeping them in the same node.
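For context, on master both executor entry points funnel into a single
always-inlined implementation that takes a constant "parallel" flag,
roughly as in the (heavily simplified) sketch below, so the compiler emits
specialized serial and parallel copies. Splitting the state machines means
each entry point gets its own loop instead:

    static pg_attribute_always_inline TupleTableSlot *
    ExecHashJoinImpl(PlanState *pstate, bool parallel)
    {
        HashJoinState *node = castNode(HashJoinState, pstate);

        /* ... the shared HJ_* state machine, branching on "parallel" ... */
        (void) node;
        return NULL;            /* placeholder for the real loop */
    }

    static TupleTableSlot *
    ExecHashJoin(PlanState *pstate)
    {
        return ExecHashJoinImpl(pstate, false); /* serial hash join */
    }

    static TupleTableSlot *
    ExecParallelHashJoin(PlanState *pstate)
    {
        return ExecHashJoinImpl(pstate, true);  /* parallel hash join */
    }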
The patch is a rough prototype. Below are some of the high-level
pieces of work that I plan to do next. (There are many TODOs in the
code as well.)
Some of the major outstanding work:
- correctness:
  - haven't tried it with anti-joins and don't think it works
  - number of batches is not deterministic from run-to-run
- performance:
  - join_hash.sql is *much* slower. While there are loads of
    performance fixes needed in the patch, the basic criterion for
    "falling back" is likely the culprit here.
  - There are many bottlenecks (there are several places where a
    barrier could be moved to somewhere less hot, an atomic used
    instead of a lock, or a method of coordination could be used to
    allow workers to do backend-local accounting and aggregate it)
  - need to make sure it does not create outer match status files when
    it shouldn't (inner joins, for example)
- testing:
  - many unexercised cases
  - add number of chunks to EXPLAIN (for users and for testing)
- refactoring:
  - The match status bitmap should have its own API or, at least,
    manipulation of it should be done in a centralized set of
    functions (see the sketch after this list)
  - Rename "chunk" (as in chunks of the inner side) to something that is
    not already used in the context of memory chunks and, more
    importantly, SharedTuplestoreChunk
  - Make references to "hashloop fallback" and "adaptive hashjoin"
    more consistent
  - Rename the adaptiveHashjoin.h/.c files and reconsider what belongs
    in them as opposed to nodeHashjoin.h/.c (depending on the outcome
    of the "new node" question)
  - The state machines are big and unwieldy now, so there is probably
    some larger restructuring that could be done
  - Should probably use the ParallelHashJoinBatchAccessor to access
    the ParallelHashJoinBatch everywhere (realized this recently)
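To make the bitmap point above concrete, here is a rough sketch of the
kind of centralized helpers I have in mind (the function names and layout
are only illustrative, not the patch's actual API); creating the bitmap
only when HJ_FILL_OUTER() is true would also take care of not producing
match status files for plain inner joins:

    /* Hypothetical accessors for the outer-tuple match status bitmap. */
    #define OUTER_MATCH_BYTE(tupno)  ((tupno) / 8)
    #define OUTER_MATCH_BIT(tupno)   ((tupno) % 8)

    static void
    hj_outer_match_set(uint8 *bitmap, uint64 tupno)
    {
        bitmap[OUTER_MATCH_BYTE(tupno)] |= (1 << OUTER_MATCH_BIT(tupno));
    }

    static bool
    hj_outer_match_test(const uint8 *bitmap, uint64 tupno)
    {
        return (bitmap[OUTER_MATCH_BYTE(tupno)] &
                (1 << OUTER_MATCH_BIT(tupno))) != 0;
    }

    /* Only outer/full joins need to remember unmatched outer tuples. */
    static uint8 *
    hj_outer_match_create(HashJoinState *hjstate, uint64 ntuples)
    {
        if (!HJ_FILL_OUTER(hjstate))
            return NULL;
        return (uint8 *) palloc0((ntuples + 7) / 8);
    }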
--
Melanie Plageman
Attachment: v3-0001-hashloop-fallback.patch (application/octet-stream, 147.9 KB)