From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
Cc: | Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: GIN improvements part2: fast scan |
Date: | 2014-03-12 17:42:32 |
Message-ID: | CAPpHfduJZiJAiR7qfOt2mH6kNbuBoQMbYJb9+7LgEb5pv53eHA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Mar 12, 2014 at 8:29 PM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com
> wrote:
> On 03/12/2014 12:09 AM, Tomas Vondra wrote:
>
>> Hi all,
>>
>> a quick question that just occured to me - do you plan to tweak the cost
>> estimation fot GIN indexes, in this patch?
>>
>> IMHO it would be appropriate, given the improvements and gains, but it
>> seems to me gincostestimate() was not touched by this patch.
>>
>
> Good point. We have done two major changes to GIN in this release cycle:
> changed the data page format and made it possible to skip items without
> fetching all the keys ("fast scan"). gincostestimate doesn't know about
> either change.
>
> Adjusting gincostestimate for the more compact data page format seems
> easy. When I hacked on that, I assumed all along that gincostestimate
> doesn't need to be changed as the index will just be smaller, which will be
> taken into account automatically. But now that I look at gincostestimate,
> it assumes that the size of one item on a posting tree page is a constant 6
> bytes (SizeOfIptrData), which is no longer true. I'll go fix that.
>
> Adjusting for the effects of skipping is harder. gincostestimate needs to
> do the same preparation steps as startScanKey: sort the query keys by
> frequency, and call consistent function to split the keys intao "required"
> and "additional" sets. And then model that the "additional" entries only
> need to be fetched when the other keys match. That's doable in principle,
> but requires a bunch of extra code.
>
> Alexander, any thoughts on that? It's getting awfully late to add new code
> for that, but it sure would be nice somehow take fast scan into account.
Preparation we do in startScanKey requires knowledge of estimate size of
posting lists/trees. We do this estimate by traversal to leaf pages. I
think gincostestimate is expected to be way more cheap. So, we probably
need so more rough estimate there, don't we?
------
With best regards,
Alexander Korotkov.
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2014-03-12 17:44:29 | Re: COPY table FROM STDIN doesn't show count tag |
Previous Message | Robert Haas | 2014-03-12 17:41:05 | Re: The case against multixact GUCs |