Re: Wrong results from Parallel Hash Full Join

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Wrong results from Parallel Hash Full Join
Date: 2023-04-20 00:43:15
Message-ID: CAAKRu_bdwDN_aHVctHcc9VoDP9av7LUMeuLbch1fHD2ESouw1g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 19, 2023 at 3:20 PM Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> On 2023-04-19 12:16:24 -0500, Justin Pryzby wrote:
> > On Wed, Apr 19, 2023 at 11:17:04AM -0400, Melanie Plageman wrote:
> > > Ultimately this is probably fine. If we wanted to modify one of the
> > > existing tests to cover the multi-batch case, changing the select
> > > count(*) to a select * would do the trick. I imagine we wouldn't want
> to
> > > do this because of the excessive output this would produce. I wondered
> > > if there was a pattern in the tests for getting around this.
> >
> > You could use explain (ANALYZE). But the output is machine-dependant in
> > various ways (which is why the tests use "explain analyze so rarely).
>
> I think with sufficient options it's not machine specific. We have a bunch
> of
> EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) ..
> in our tests.
>

Cool. Yea, so ultimately these options are almost enough but memory
usage changes from execution to execution. There are some tests which do
regexp_replace() on the memory usage part of the EXPLAIN ANALYZE output
to allow us to still compare the plans. However, I figured if I was
already going to go to the trouble of using regexp_replace(), I might as
well write a function that returns the "Actual Rows" field from the
EXPLAIN ANALYZE output.

The attached patch does that. I admittedly mostly copy-pasted the
plpgsql function from similar examples in other tests, and I suspect it
may be overkill and also poorly written.

The nice thing about this approach is that we can modify some of the
existing tests in join_hash.sql to use this function and cover the code
to reset the matchbit for serial hashjoin, single batch parallel
hashjoin, and all batches of parallel multi-batch hashjoin without any
additional queries. (I'll leave testing match bit resetting with the
skew hashtable and match bit resetting in case of a rescan for another
day.)

I was able to delete the tests added in 558c9d75fe, as they became
redundant.

I wonder if any other tests are in need of an EXPLAIN (ANALYZE,
MEMORY_USAGE OFF) option? Perhaps it is quite unusual to only require a
deterministic field like 'Actual Rows'. If we had that option we could
also remove the extra EXPLAIN invocations before the actual query
executions.

- Melanie

Attachment Content-Type Size
v1-0001-Test-multi-batch-PHJ-match-bit-initialization.patch text/x-patch 12.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Melanie Plageman 2023-04-20 00:47:07 Re: Wrong results from Parallel Hash Full Join
Previous Message Michael Paquier 2023-04-20 00:42:08 Re: Remove io prefix from pg_stat_io columns