Re: Avoiding hash join batch explosions with extreme skew and weird stats

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Avoiding hash join batch explosions with extreme skew and weird stats
Date: 2019-06-04 19:08:24
Message-ID: CA+TgmobX_PpKSNisuuGefiSBcPr6wiLmbEWUdP3uQz1tsYYpdQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 4, 2019 at 2:47 PM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
> Oops! You are totally right.
> I will amend the idea:
> For each chunk on the inner side, loop through both the original batch
> file and the unmatched outer tuples file created for the last chunk.
> Emit any matches and write out any unmatched tuples to a new unmatched
> outer tuples file.
>
> I think, in the worst case, if no tuples from the outer have a match,
> you end up writing out all of the outer tuples for each chunk on the
> inner side. However, using the match bit in the tuple header solution
> would require this much writing.
> Probably the bigger problem is that in this worst case you would also
> need to read double the number of outer tuples for each inner chunk.
>
> However, in the best case it seems like it would be better than the
> match bit/write everything from the outer side out solution.

I guess so, but the downside of needing to read twice as many outer
tuples for each inner chunk seems pretty large. It would be a lot
nicer if we could find a way to store the matched-bits someplace other
than where we are storing the tuples, what Thomas called a
bits-in-order scheme, because then the amount of additional read and
write I/O would be tiny -- one bit per tuple doesn't add up very fast.

In the scheme you propose here, I think that after you read the
original outer tuples for each chunk and the unmatched outer tuples
for each chunk, you'll have to match up the unmatched tuples to the
original tuples, probably by using memcmp() or something. Otherwise,
when a new match occurs, you won't know which tuple should now not be
emitted into the new unmatched outer tuples file that you're going to
produce. So I think what's going to happen is that you'll read the
original batch file, then read the unmatched tuples file and use that
to set or not set a bit on each tuple in memory, then do the real work
setting more bits, then write out a new unmatched-tuples file with the
tuples that still don't have the bit set. So your unmatched tuple
file is basically a list of tuple identifiers in the least compact
form imaginable: the tuple is identified by the entire tuple contents.
That doesn't seem very appealing, although I expect that it would
still win for some queries.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Melanie Plageman 2019-06-04 19:08:57 Re: Avoiding hash join batch explosions with extreme skew and weird stats
Previous Message Melanie Plageman 2019-06-04 18:47:46 Re: Avoiding hash join batch explosions with extreme skew and weird stats