From: Andres Freund <andres(at)anarazel(dot)de>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: AllocSetEstimateChunkSpace()
Date: 2020-03-25 19:42:35
Message-ID: 20200325194235.3fciqzxvbyg7q2c3@alap3.anarazel.de
Lists: pgsql-hackers
Hi,
On 2020-03-24 18:12:03 -0700, Jeff Davis wrote:
> Attached is a small patch that introduces a simple function,
> AllocSetEstimateChunkSpace(), and uses it to improve the estimate for
> the size of a hash entry for hash aggregation.
>
> Getting reasonable estimates for the hash entry size (including the
> TupleHashEntryData, the group key tuple, the per-group state, and by-
> ref transition values) is important for multiple reasons:
>
> * It helps the planner know when hash aggregation is likely to spill,
> and how to cost it.
>
> * It helps hash aggregation regulate itself by setting a group limit
> (separate from the memory limit) to account for growing by-ref
> transition values.
>
> * It helps choose a good initial size for the hash table. Too large of
> a hash table will crowd out space that could be used for the group keys
> or the per-group state.
>
> Each group requires up to three palloc chunks: one for the group key
> tuple, one for the per-group states, and one for a by-ref transition
> value. Each chunk can have a lot of overhead: in addition to the chunk
> header (16 bytes overhead), it also rounds the value up to a power of
> two (~25% overhead). So, it's important to account for this chunk
> overhead.
As mentioned on IM/call: I think this is mainly an argument for
combining the group tuple & per-group state allocations. I'm kind of
fine with taking the allocator overhead into account after that.
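To make that concrete, the overhead being accounted for is roughly the
following (a rough sketch only, not the patch itself; assumes aset.c's
16-byte chunk header and power-of-two rounding, and Size is just
postgres' size_t):

    /*
     * Sketch: what an aset.c chunk of the requested size actually
     * costs, assuming a 16-byte chunk header and rounding the request
     * up to the next power of two.
     */
    static Size
    estimate_chunk_space(Size request)
    {
        Size    chunk_size = 8;     /* assumed smallest chunk */

        while (chunk_size < request)
            chunk_size <<= 1;       /* round up to power of two */

        return chunk_size + 16;     /* add chunk header */
    }

E.g. a 24-byte per-group state rounds up to 32 bytes and then gains the
16-byte header, costing 48 bytes in total, twice the request.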
I still don't buy that it's useful to estimate the by-ref transition
value overhead. We just don't have anything close enough to a
meaningful value to base this on. Even if we want to consider the
initial transition value or something, we'd be better off initially
over-estimating the size of the transition state by a lot more than 25%
(I am thinking more like 4x or so, with a minimum of 128 bytes or
so). Since this is about the initial size of the hashtable, we're better
off with a too-small table, imo. A "too large" table is more likely to
end up needing to spill when filled to only a small degree.
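I.e. something along these lines (hypothetical, purely to illustrate
the factor; aggtransspace here stands for the aggregate's declared
transition space estimate):

    /* deliberately generous guess for by-ref transition state */
    est_transition_space = Max(aggtransspace * 4, 128);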
I am kind of wondering if there's actually much point in trying to be
accurate here at all. Especially when doing this from the
planner. Since, for a large fraction of aggregates, we're going to very
roughly guess at transition space anyway, it seems like a bunch of
"overhead factors" could turn out to be better than trying to be
accurate on some parts, while still wildly guessing at transition space.
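By "overhead factors" I mean something like the following (made-up name
and number, purely to sketch the idea):

    /* illustrative only: one blended fudge factor instead of
     * per-chunk accounting */
    #define HASHAGG_OVERHEAD_FACTOR 1.5

    hash_mem_per_group = (tuple_width + pergroup_size + transition_space)
        * HASHAGG_OVERHEAD_FACTOR;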
But I'm not sure.
> I considered making it a method of a memory context, but the planner
> will call this function before the hash agg memory context is created.
> It seems to make more sense for this to just be an AllocSet-specific
> function for now.
-1 to this approach. I think it's architecturally the wrong direction to
add more direct calls to functions within specific contexts.
On 2020-03-25 11:46:31 -0700, Jeff Davis wrote:
> On Tue, 2020-03-24 at 18:12 -0700, Jeff Davis wrote:
> > I considered making it a method of a memory context, but the planner
> > will call this function before the hash agg memory context is
> > created.
>
> I implemented this approach also; attached.
>
> It works a little better by having access to the memory context, so it
> knows set->allocChunkLimit. It also allows it to work with the slab
> allocator, which needs the memory context to return a useful number.
> However, this introduces more code and awkwardly needs to use
> CurrentMemoryContext when called from the planner (because the actual
> memory context we're trying to estimate for doesn't exist yet).
Yea, the "context needs to exist" part sucks. I really don't want to add
calls directly into AllocSet from more places though. And just ignoring
the parameters to the context seems wrong too.
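For reference, the method-based shape Jeff describes would presumably
look something like this (an illustrative sketch only; the callback
name and details are approximate, not the actual patch):

    /* hypothetical extra callback in MemoryContextMethods */
    Size    (*estimate_chunk_space) (MemoryContext context, Size size);

    /*
     * aset.c implementation sketch: power-of-two rounding plus chunk
     * header, with requests above allocChunkLimit assumed to get their
     * own block instead.
     */
    static Size
    AllocSetEstimateChunkSpace(MemoryContext context, Size size)
    {
        AllocSet    set = (AllocSet) context;
        Size        chunk_size = 8;

        if (size > set->allocChunkLimit)
            return MAXALIGN(size) + ALLOC_CHUNKHDRSZ;

        while (chunk_size < size)
            chunk_size <<= 1;

        return chunk_size + ALLOC_CHUNKHDRSZ;
    }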
Greetings,
Andres Freund