| From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
|---|---|
| To: | Andrei Lepikhov <lepihov(at)gmail(dot)com> |
| Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Milhiser <craig(at)milhiser(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Reference to - BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker |
| Date: | 2024-10-16 09:19:14 |
| Message-ID: | CA+hUKGKOoxiGBqv=RA-e8=LXNhdmRCC_rOYbpaqbL16=4wvJ3A@mail.gmail.com |
| Lists: | pgsql-bugs |
On Mon, Oct 14, 2024 at 10:16 PM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> On 10/14/24 13:26, Tom Lane wrote:
> > Interesting point. If memory serves (I'm too tired to actually look)
> > the planner considers the statistical most-common-value when
> > estimating whether an unsplittable hash bucket is likely to be too
> > big. It does *not* think about null values ... but it ought to.
Right, there might be something to think about there. There might
also be an opportunity to treat NULL-key tuples specially during
execution since they can't possibly match.
> As I see it, it is just an oversight in the resizing logic: batch 0
> doesn't change the estimated_size value at all - I think because it
> doesn't matter for this batch - it can't be treated as exhausted by
> definition. Because of that, parallel HashJoin doesn't detect extreme
> skew, caused by duplicates in this batch. NULLS is just our luck - they
> correspond to hash value 0 and fall into this batch.
> See the attachment for a sketch of the solution.
Thanks Andrei, I mostly agree with your analysis, but I came up with a
slightly different patch. I think we should check for extreme skew if
old_batch->space_exhausted is set, where old_batch is the parent
partition. Your sketch always does it for batch 0, which works for these
examples, but I don't think it's strictly correct: if batch 0 didn't run
out of memory, it might falsely report extreme skew just because it had
(say) 0 or 1 tuples.
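
To make the difference between the two approaches concrete, here is a minimal standalone sketch of the check described above. It is not the attached patch and not the actual nodeHash.c code: the SketchBatch struct, its fields, and sketch_extreme_skew() are hypothetical stand-ins, and the assumption that child batch i descends from parent i % old_nbatch is a simplification made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a shared batch descriptor. */
typedef struct SketchBatch
{
    size_t ntuples;         /* tuples in this batch after repartitioning */
    size_t old_ntuples;     /* tuples it held before the split (parents only) */
    bool   space_exhausted; /* did it run out of memory before the split? */
} SketchBatch;

/*
 * Return true if some parent batch that ran out of memory handed all of
 * its tuples to a single child, i.e. splitting it did not help.
 *
 * Assumption for this sketch: batches[0 .. old_nbatch - 1] double as the
 * parents, and child i descends from parent i % old_nbatch.
 */
static bool
sketch_extreme_skew(const SketchBatch *batches, int old_nbatch, int new_nbatch)
{
    assert(old_nbatch > 0 && new_nbatch >= old_nbatch);

    for (int i = 0; i < new_nbatch; i++)
    {
        const SketchBatch *parent = &batches[i % old_nbatch];

        /* A parent that never overflowed says nothing about skew. */
        if (!parent->space_exhausted)
            continue;

        /* Every tuple of the overflowing parent landed in one child. */
        if (batches[i].ntuples == parent->old_ntuples)
            return true;
    }
    return false;
}
```

In this model, a batch 0 that held a single tuple and never ran out of memory is skipped, so it cannot trigger a false report, while an overflowing batch 0 whose tuples all stay in batch 0 after the split is still detected.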
| Attachment | Content-Type | Size |
|---|---|---|
| 0001-Fix-extreme-skew-detection-in-Parallel-Hash-Join.patch | text/x-patch | 3.5 KB |