Re: Avoiding hash join batch explosions with extreme skew and weird stats

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Avoiding hash join batch explosions with extreme skew and weird stats
Date: 2019-06-06 23:31:46
Message-ID: CAAKRu_bYcBaL=+XTqLKebD8Hq+Wz=4v7jrsVEHqwbOstrEVHww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 4, 2019 at 12:08 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Tue, Jun 4, 2019 at 2:47 PM Melanie Plageman
> <melanieplageman(at)gmail(dot)com> wrote:
> > Oops! You are totally right.
> > I will amend the idea:
> > For each chunk on the inner side, loop through both the original batch
> > file and the unmatched outer tuples file created for the last chunk.
> > Emit any matches and write out any unmatched tuples to a new unmatched
> > outer tuples file.
> >
> > I think, in the worst case, if no tuples from the outer have a match,
> > you end up writing out all of the outer tuples for each chunk on the
> > inner side. However, using the match bit in the tuple header solution
> > would require this much writing.
> > Probably the bigger problem is that in this worst case you would also
> > need to read double the number of outer tuples for each inner chunk.
> >
> > However, in the best case it seems like it would be better than the
> > match bit/write everything from the outer side out solution.
>
> I guess so, but the downside of needing to read twice as many outer
> tuples for each inner chunk seems pretty large. It would be a lot
> nicer if we could find a way to store the matched-bits someplace other
> than where we are storing the tuples, what Thomas called a
> bits-in-order scheme, because then the amount of additional read and
> write I/O would be tiny -- one bit per tuple doesn't add up very fast.
>
> In the scheme you propose here, I think that after you read the
> original outer tuples for each chunk and the unmatched outer tuples
> for each chunk, you'll have to match up the unmatched tuples to the
> original tuples, probably by using memcmp() or something. Otherwise,
> when a new match occurs, you won't know which tuple should now not be
> emitted into the new unmatched outer tuples file that you're going to
> produce. So I think what's going to happen is that you'll read the
> original batch file, then read the unmatched tuples file and use that
> to set or not set a bit on each tuple in memory, then do the real work
> setting more bits, then write out a new unmatched-tuples file with the
> tuples that still don't have the bit set. So your unmatched tuple
> file is basically a list of tuple identifiers in the least compact
> form imaginable: the tuple is identified by the entire tuple contents.
> That doesn't seem very appealing, although I expect that it would
> still win for some queries.
>
>
I'm not sure I understand why you would need to compare the original
tuples to the unmatched tuples file.

This is the example I used to try and reason through it.

let's say you have a batch (you are joining two single column tables)
and your outer side is:
5,7,9,11,10,11
and your inner is:
7,10,7,12,5,9
and for the inner, let's say that only two values can fit in memory,
so it is split into 3 chunks:
7,10 | 7,12 | 5,9
The first time you iterate through the outer side (joining it to the
first chunk), you emit as matched
7,7
10,10
and write to unmatched tuples file
5
9
11
11
The second time you iterate through the outer side (joining it to the
second chunk) you emit as matched
7,7
Then, you iterate again through the outer side a third time to join it
to the unmatched tuples in the unmatched tuples file (from the first
chunk) and write the following to a new unmatched tuples file:
5
9
11
11
The fourth time you iterate through the outer side (joining it to the
third chunk), you emit as matched
5,5
9,9
Then you iterate a fifth time through the outer side to join it to the
unmatched tuples in the unmatched tuples file (from the second chunk)
and write the following to a new unmatched tuples file:
11
11
Now that all chunks from the inner side have been processed, you can
loop through the final unmatched tuples file, NULL-extend, and emit
them

Wouldn't that work?

--
Melanie Plageman

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Melanie Plageman 2019-06-06 23:33:31 Re: Avoiding hash join batch explosions with extreme skew and weird stats
Previous Message Alvaro Herrera 2019-06-06 23:02:42 Re: behaviour change - default_tablesapce + partition table