Re: Wrong results from Parallel Hash Full Join

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Wrong results from Parallel Hash Full Join
Date: 2023-04-20 15:50:45
Message-ID: ZEFfVUeNzQaeHy7R@telsasoft.com
Lists: pgsql-hackers

On Wed, Apr 19, 2023 at 08:47:07PM -0400, Melanie Plageman wrote:
> On Wed, Apr 19, 2023 at 8:41 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> >
> > On Wed, Apr 19, 2023 at 12:20:51PM -0700, Andres Freund wrote:
> > > On 2023-04-19 12:16:24 -0500, Justin Pryzby wrote:
> > > > On Wed, Apr 19, 2023 at 11:17:04AM -0400, Melanie Plageman wrote:
> > > > > Ultimately this is probably fine. If we wanted to modify one of the
> > > > > existing tests to cover the multi-batch case, changing the select
> > > > > count(*) to a select * would do the trick. I imagine we wouldn't want to
> > > > > do this because of the excessive output this would produce. I wondered
> > > > > if there was a pattern in the tests for getting around this.
> > > >
> > > > You could use explain (ANALYZE). But the output is machine-dependent in
> > > > various ways (which is why the tests use "explain analyze" so rarely).
> > >
> > > I think with sufficient options it's not machine specific.
> >
> > It *can* be machine specific depending on the node type..
> >
> > In particular, for parallel workers, it shows "Workers Launched: ..",
> > which can vary even across executions on the same machine. And don't
> > forget about "loops=".
> >
> > Plus:
> > src/backend/commands/explain.c: "Buckets: %d Batches: %d Memory Usage: %ldkB\n",
> >
> > > We have a bunch of
> > > EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) ..
> > > in our tests.
> >
> > There are 81 uses of "timing off", out of a total of ~1600 explains. Most
> > of them are in partition_prune.sql. explain analyze is barely used.
> >
> > I sent a patch to elide the machine-specific parts, which would make it
> > easier to use. But there was no interest.
>
> While I don't know about other use cases, I would have used that here.
> Do you still have that patch lying around? I'd be interested to at
> least review it.

https://commitfest.postgresql.org/41/3409/
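
For illustration only, the idea of eliding the unstable parts could be approximated outside the server by post-processing the text output. This is a rough sketch and not what the patch above does: the function name mask_explain and the replacement token "N" are made up for this example, and the patterns cover only the fields named earlier in this thread ("Workers Launched:", "loops=", and the "Buckets: / Batches: / Memory Usage:" line from explain.c).

```python
import re

# Hypothetical masking rules; they cover only the fields mentioned in this
# thread, and the placeholder "N" is an arbitrary choice for this sketch.
_VOLATILE_FIELDS = [
    (re.compile(r"Workers Launched: \d+"), "Workers Launched: N"),
    (re.compile(r"loops=\d+"), "loops=N"),
    (re.compile(r"Buckets: \d+"), "Buckets: N"),
    (re.compile(r"Batches: \d+"), "Batches: N"),
    (re.compile(r"Memory Usage: \d+kB"), "Memory Usage: NkB"),
]


def mask_explain(text: str) -> str:
    """Replace run-dependent numbers in EXPLAIN ANALYZE text with 'N'."""
    for pattern, replacement in _VOLATILE_FIELDS:
        text = pattern.sub(replacement, text)
    return text
```

Something like this would only help for output compared outside the regression tests; other run-dependent fields that are not listed here would still need their own rules.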
