From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
Date: 2019-11-10 17:29:25
Message-ID: 20191110172925.mlkglpwzwego3wtp@development
Lists: pgsql-bugs
On Sun, Nov 10, 2019 at 09:15:06AM -0500, James Coleman wrote:
>On Sat, Nov 9, 2019 at 10:44 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>>
>> On Sun, Nov 10, 2019 at 3:25 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
>> > So I should have run the earlier attached plan with VERBOSE, but
>> > here's the interesting thing: the parallel hash node's seq scan node
>> > outputs two columns: let's call them (from the redacted plan)
>> > items.system_id and items.account_id. The first (system_id) is both
>> > not null and unique; the second (account_id) definitely has massive
>> > skew. I'm not very up-to-speed on how the hash building works, but I
>> > would have (perhaps naïvely?) assumed that the first column being
>> > unique would make the hash keys very likely not to collide in any
>> > significantly skewed way. Am I missing something here?
>>
>> Hrm. So the compound key is unique then. I was assuming up until now
>> that it had duplicates. The hashes of the individual keys are
>> combined (see ExecHashGetHashValue()), so assuming there is nothing
>> funky about the way citext gets hashed (and it's not jumping out at
>> me), your unique keys should give you uniform hash values and thus
>> partition size, and repartitioning should be an effective way of
>> reducing hash table size. So now it sounds like you have a simple
>> case of underestimation, but now I'm confused about how you got a
>> 344MB hash table with work_mem = 150MB:
>
Why would the compound key matter? The join condition only references

  Hash Cond: (group_details.group_number = items.system_id)

so the only thing that really matters when it comes to skew is
system_id, i.e. the citext column. The other column is used by the
aggregation, but that's irrelevant here, no?
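
For completeness, the per-key combining Thomas mentions in
ExecHashGetHashValue() works roughly like the sketch below (a simplified
illustration of the idea, not the actual nodeHash.c code). With a single
key in the hash condition the combining step doesn't really come into
play, and the bucket/batch assignment is driven purely by the hash of
system_id.

#include <stdint.h>

/*
 * Simplified sketch of combining per-key hashes for a multi-column join
 * key (an approximation of the idea in ExecHashGetHashValue(), not the
 * exact PostgreSQL source).
 */
static uint32_t
combine_key_hashes(const uint32_t *key_hashes, int nkeys)
{
    uint32_t hashkey = 0;

    for (int i = 0; i < nkeys; i++)
    {
        /* rotate the accumulated hash left by one bit ... */
        hashkey = (hashkey << 1) | (hashkey >> 31);
        /* ... and mix in the hash of the next join key */
        hashkey ^= key_hashes[i];
    }

    return hashkey;
}
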
>Looking at the source, citext's hash is a pretty standard call to
>hash_any, so I don't see how that would lead to any oddities (barring
>an intentional hash collision etc., but this is real data).
>
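
Right, AFAICS citext simply case-folds the value and feeds the result to
hash_any(), so there should be nothing special about the hash values. An
illustrative sketch (with a stand-in hash function, not the
contrib/citext source):

#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* stand-in for hash_any(), only to keep the sketch self-contained */
static uint32_t
hash_bytes_sketch(const unsigned char *k, size_t len)
{
    uint32_t h = 2166136261u;        /* FNV-1a, illustration only */

    for (size_t i = 0; i < len; i++)
        h = (h ^ k[i]) * 16777619u;
    return h;
}

/*
 * Sketch of citext hashing: case-fold first, then hash the folded bytes,
 * so values differing only in case hash to the same value.
 */
static uint32_t
citext_hash_sketch(const char *value)
{
    char    folded[256];
    size_t  len = strlen(value);

    if (len >= sizeof(folded))
        len = sizeof(folded) - 1;
    for (size_t i = 0; i < len; i++)
        folded[i] = (char) tolower((unsigned char) value[i]);
    folded[len] = '\0';

    return hash_bytes_sketch((const unsigned char *) folded, len);
}
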
>Do you have any theories about where the underestimation is happening?
>It knows the number of rows reasonably well. The one thing I'm not
>sure about yet is the row width = 16. The account_id is a bigint, so
>we can assume it knows the size properly there. The system_id being
>citext...I think the 8 bytes it has for that is probably a bit low on
>average, but I'm not sure yet by how much (I'm going to run a query to
>find out). Hmm. Query came back, and average length is just shy of
>7.9...so the 16 byte row size is looking pretty good. So I'm at a loss
>of why/what it would be underestimating (does it know about
>uniqueness/non-uniqueness? could that be a factor?)?
>
I think the easiest thing we can do is to run a couple of queries
collecting useful stats, like

-- the most frequent system_id values
select system_id, count(*), sum(length(system_id)) from items
group by system_id
order by count(*) desc limit 100;

-- the largest groups by system_id
select system_id, count(*), sum(length(system_id)) from items
group by system_id
order by sum(length(system_id)) desc limit 100;

That should tell us if there's something off.
>I also don't know why it seems to regularly fail on the primary, but
>not on the sync, unless we adjust the work_mem up. I've double-checked
>all GUCs and the only differences are things related to replication
>that you'd expect to be different on primary/replica.
>
>I know there are some things that execute differently on replicas, so
>I assume it's something like that, but I don't know what specifically
>would cause this here.
>
Hmm, I hadn't realized it fails on the primary more often. That's
certainly strange; I don't have a very good idea why that could be
happening. Do the query plans look the same? What does the EXPLAIN
ANALYZE output look like?
>> Buckets: 4194304 (originally 4194304) Batches: 32768 (originally
>> 4096) Memory Usage: 344448kB
>>
>> And I'm confused about what was different when it wanted the crazy
>> number of batches.
>
>I'm not quite sure how to find out; if you have any ideas, I'd love to
>hear them. The one thing I can think of to try is to slowly increase
>work_mem (since setting it to 500MB reproduced the error on the
>replica) and see if the bucket info starts to trend up.
>
Chances are it detected skew and disabled further growth of the number of
batches, hence the hash table grew above work_mem. We should probably add
info about this (skew, disabling growth) to EXPLAIN ANALYZE verbose mode.
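
For reference, the heuristic in the non-parallel code works roughly like
the sketch below (a simplified illustration of the idea in
ExecHashIncreaseNumBatches(), not the exact nodeHash.c source; the
parallel hash path has a similar check).

#include <stdbool.h>

/*
 * After doubling the number of batches and repartitioning the current
 * batch, check whether that actually moved any tuples.  If it moved none
 * (or all of them), the hash values are so skewed that further doubling
 * won't help, so stop growing the batch count and let the in-memory hash
 * table exceed work_mem instead.
 */
static void
maybe_disable_batch_growth(long ntuples_before, long ntuples_moved,
                           bool *grow_enabled)
{
    if (ntuples_moved == 0 || ntuples_moved == ntuples_before)
        *grow_enabled = false;
}
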
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services