Re: Avoiding hash join batch explosions with extreme skew and weird stats

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Avoiding hash join batch explosions with extreme skew and weird stats
Date: 2019-06-04 19:08:57
Message-ID: CAAKRu_ZNRFuHGUPkzr7Nmh_vdzWLwQb=sc80SFfkP-D4sbuoKw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jun 4, 2019 at 6:05 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Sun, May 19, 2019 at 7:07 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
> wrote:
> > Unfortunately that bits-in-order scheme doesn't work for parallel
> > hash, where the SharedTuplestore tuples seen by each worker are
> > non-deterministic. So perhaps in that case we could use the
> > HEAP_TUPLE_HAS_MATCH bit in the outer tuple header itself, and write
> > the whole outer batch back out each time through the loop. That'd
> > keep the tuples and match bits together, but it seems like a lot of
> > IO...
>
> So, I think the case you're worried about here is something like:
>
> Gather
> -> Parallel Hash Left Join
> -> Parallel Seq Scan on a
> -> Parallel Hash
> -> Parallel Seq Scan on b
>
> If I understand ExecParallelHashJoinPartitionOuter correctly, we're
> going to hash all of a and put it into a set of batch files before we
> even get started, so it's possible to identify precisely which tuple
> we're talking about by just giving the batch number and the position
> of the tuple within that batch. So while it's true that the
> individual workers can't use the number of tuples they've read to know
> where they are in the SharedTuplestore, maybe the SharedTuplestore
> could just tell them. Then they could maintain a paged bitmap of the
> tuples that they've matched to something, indexed by
> position-within-the-tuplestore, and those bitmaps could be OR'd
> together at the end.
>
> Crazy idea, or...?
>
>
That idea does sound like it could work. Basically, a worker is given a
tuple and a bit index into its own bitmap (process this tuple, and if
it matches, flip the bit at position 30), right?
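
To make sure I'm picturing it correctly, here's a tiny sketch in C of
what I mean. All of these names are made up for illustration; nothing
like this exists in the tree:

#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-worker match bitmap, indexed by a tuple's
 * position within the SharedTuplestore batch.  Invented names,
 * just to check my understanding of the proposal. */
typedef struct WorkerMatchBitmap
{
    uint64_t   *words;   /* one bit per outer tuple */
    size_t      nwords;  /* length of words[] */
} WorkerMatchBitmap;

/* Set the match bit for the tuple at the given position. */
static inline void
worker_bitmap_set(WorkerMatchBitmap *bm, size_t tuple_pos)
{
    bm->words[tuple_pos / 64] |= UINT64_C(1) << (tuple_pos % 64);
}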

I need to spend some time understanding how SharedTuplestore works and
how workers get tuples, so what I'm saying might not make sense.

One question I have is: how would the OR'd-together bitmap be
propagated to workers after the first chunk? That is, when there are
no tuples left in the outer batch for a given inner chunk, would you
load the bitmaps from each worker into memory, OR them together, and
then write the updated bitmap back out so that each worker starts with
the updated bitmap?
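
If so, the OR step itself seems simple enough. Here's a hypothetical
sketch (invented names again; in reality the per-worker bitmaps would
presumably live in shared memory or be spilled alongside the batch
files rather than being plain arrays):

#include <stddef.h>
#include <stdint.h>

/* Hypothetical merge step: OR every worker's bitmap, word by word,
 * into a combined bitmap before the next inner chunk is loaded. */
static void
merge_match_bitmaps(uint64_t *combined,
                    uint64_t *const *per_worker,
                    int nworkers, size_t nwords)
{
    for (int w = 0; w < nworkers; w++)
        for (size_t i = 0; i < nwords; i++)
            combined[i] |= per_worker[w][i];
}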

--
Melanie Plageman
