From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Andrei Lepikhov <lepihov(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Milhiser <craig(at)milhiser(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: Reference to - BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker |
Date: | 2024-10-16 09:19:14 |
Message-ID: | CA+hUKGKOoxiGBqv=RA-e8=LXNhdmRCC_rOYbpaqbL16=4wvJ3A@mail.gmail.com |
Lists: | pgsql-bugs |
On Mon, Oct 14, 2024 at 10:16 PM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> On 10/14/24 13:26, Tom Lane wrote:
> > Interesting point. If memory serves (I'm too tired to actually look)
> > the planner considers the statistical most-common-value when
> > estimating whether an unsplittable hash bucket is likely to be too
> > big. It does *not* think about null values ... but it ought to.

Right, there might be something to think about there. There might
also be an opportunity to treat NULL-key tuples specially during
execution since they can't possibly match.

> As I see it, it is just an oversight in the resizing logic: batch 0
> doesn't change the estimated_size value at all - I think because it
> doesn't matter for this batch - it can't be treated as exhausted by
> definition. Because of that, parallel HashJoin doesn't detect extreme
> skew, caused by duplicates in this batch. NULLS is just our luck - they
> correspond to hash value 0 and fall into this batch.
> See the attachment for a sketch of the solution.

Thanks Andrei, I mostly agree with your analysis, but I came up with a
slightly different patch. I think we should check for extreme skew only
if the parent partition ran out of memory, i.e. if
old_batch->space_exhausted is set. Your sketch always does the check for
batch 0, which works for these examples, but I don't think it's strictly
correct: if batch 0 didn't run out of memory, it might falsely report
extreme skew just because it had (say) 0 or 1 tuples.

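The condition described above can be illustrated with a small, self-contained C model. This is only a sketch under stated assumptions and not the attached patch: BatchModel, its fields, and extreme_skew_detected() are hypothetical names invented for illustration, and the real executor keeps this state in shared memory across workers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for per-batch bookkeeping. */
typedef struct BatchModel
{
    size_t ntuples;         /* tuples held after repartitioning (children) */
    size_t old_ntuples;     /* tuples held before repartitioning (parent) */
    bool   space_exhausted; /* did this batch exceed its memory budget? */
} BatchModel;

/*
 * Report extreme skew only when the parent batch actually ran out of
 * memory AND one child received every tuple the parent had, i.e. the
 * split did not help. Without the space_exhausted guard, a tiny parent
 * with 0 or 1 tuples would trivially satisfy the "one child got
 * everything" test and be misreported as skewed.
 */
static bool
extreme_skew_detected(const BatchModel *parent,
                      const BatchModel *children,
                      size_t nchildren)
{
    if (!parent->space_exhausted)
        return false;

    for (size_t i = 0; i < nchildren; i++)
    {
        if (children[i].ntuples == parent->old_ntuples)
            return true;
    }
    return false;
}
```

In this model, a parent that held a single tuple and never exhausted its budget is not flagged, while a parent that exhausted its budget and passed all of its tuples to one child is.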
Attachment | Content-Type | Size |
---|---|---|
0001-Fix-extreme-skew-detection-in-Parallel-Hash-Join.patch | text/x-patch | 3.5 KB |