From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Edmund Horner <ejrh00(at)gmail(dot)com> |
Cc: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Tid scan improvements |
Date: | 2018-11-24 02:46:17 |
Message-ID: | 8d68f680-1a1c-142d-428d-987eb6d16db0@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 11/24/18 1:56 AM, Edmund Horner wrote:
> On Fri, 23 Nov 2018 at 07:03, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> On 11/22/18 8:41 AM, David Rowley wrote:
>>> ...
>>> 3. I'd rather see EnsureTidRangeSpace() keep doubling the size of the
>>> allocation until it reaches the required size. See how
>>> MakeSharedInvalidMessagesArray() does it. Doing it this way ensures
>>> we always have a power of two sized array which is much nicer if we
>>> ever reach the palloc() limit as if the array is sized at the palloc()
>>> limit / 2 + 1, then if we try to double it'll fail. Of course, it's
>>> unlikely to be a problem here, but... the question would be how to
>>> decide on the initial size.
>>
>> I think it kinda tries to do that in some cases, by doing this:
>>
>> *numAllocRanges *= 2;
>> ...
>> tidRanges = (TidRange *)
>> repalloc(tidRanges,
>> *numAllocRanges * sizeof(TidRange));
>>
>> The problem here is that what matters is not numAllocRanges being 2^N,
>> but the number of bytes allocated being 2^N. Because that's what ends up
>> in AllocSet, which keeps lists of 2^N chunks.
>>
>> And as TidRange is 12B, so this is guaranteed to waste memory, because
>> no matter what the first factor is, the result will never be 2^N.
>
> For simplicity, I think making it a strict doubling of capacity each
> time is fine. That's what we see in numerous other places in the
> backend code.
>
Sure.
> What we don't really see is intentionally setting the initial capacity
> so that each subsequent capacity is close-to-but-not-exceeding a power
> of 2 bytes. You can't really do that optimally if working in terms of
> whole numbers of items that aren't each a power of 2 size. This step,
> there may be 2/3 of an item spare; next step, we'll have a whole item
> spare that we're not going to use.
Ah, I missed the detail with setting initial size.
> So we could keep track in terms of bytes allocated, and then figure
> out how many items we can fit at the current time.
>
> In my opinion, such complexity is overkill for Tid scans.
>
> Currently, we try to pick an initial size based on the number of
> expressions. We assume each expression will yield one range, and
> allow that a saop expression might require us to enlarge the array.
>
> Again, for simplicity, we should scrap that and pick something like
> floor(256/sizeof(TidRange)) = 21 items, with about 1.5% wastage.
>
Probably. I don't think it'd be a lot of code to do the exact sizing,
but you're right 1.5% is close enough. As long as there is a comment
explaining the initial sizing, I'm fine with that.
If I could suggest one more thing, I'd define a struct combining the
array of ranges, numRanges and numAllocRangeslike:
typedef struct TidRanges
{
int numRanges;
int numAllocRanges;
TidRange ranges[FLEXIBLE_ARRAY_MEMBER];
} TidRanges;
and use that instead of the plain array. I find it easier to follow
compared to passing the various fields directly (sometimes as a value,
sometimes pointer to the value, etc.).
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Paul A Jungwirth | 2018-11-24 02:55:54 | Re: SQL:2011 PERIODS vs Postgres Ranges? |
Previous Message | Anne Marie Harm | 2018-11-24 02:31:23 | could not connect to server, in order to operate pgAdmin/PostgreSQL |