Re: GIN improvements part2: fast scan

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN improvements part2: fast scan
Date: 2013-11-19 23:06:34
Message-ID: CAPpHfdtZOhsmz_uf5BfdXcj5W3zk3K-PYxFc8-7eQw-nPXX1DA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Nov 15, 2013 at 11:19 AM, Alexander Korotkov
<aekorotkov(at)gmail(dot)com>wrote:

> On Fri, Nov 15, 2013 at 12:34 AM, Heikki Linnakangas <
> hlinnakangas(at)vmware(dot)com> wrote:
>
>> On 14.11.2013 19:26, Alexander Korotkov wrote:
>>
>>> On Sun, Jun 30, 2013 at 3:00 PM, Heikki Linnakangas <
>>> hlinnakangas(at)vmware(dot)com
>>>
>>>> wrote:
>>>>
>>>
>>> On 28.06.2013 22:31, Alexander Korotkov wrote:
>>>>
>>>> Now, I got the point of three state consistent: we can keep only one
>>>>> consistent in opclasses that support new interface. exact true and
>>>>> exact
>>>>> false values will be passed in the case of current patch consistent;
>>>>> exact
>>>>> false and unknown will be passed in the case of current patch
>>>>> preConsistent. That's reasonable.
>>>>>
>>>>>
>>>> I'm going to mark this as "returned with feedback". For the next
>>>> version,
>>>> I'd like to see the API changed per above. Also, I'd like us to do
>>>> something about the tidbitmap overhead, as a separate patch before
>>>> this, so
>>>> that we can assess the actual benefit of this patch. And a new test case
>>>> that demonstrates the I/O benefits.
>>>>
>>>
>>>
>>> Revised version of patch is attached.
>>> Changes are so:
>>> 1) Patch rebased against packed posting lists, not depends on additional
>>> information now.
>>> 2) New API with tri-state logic is introduced.
>>>
>>
>> Thanks! A couple of thoughts after a 5-minute glance:
>>
>> * documentation
>>
>
> Will provide documented version this week.
>
>
>> * How about defining the tri-state consistent function to also return a
>> tri-state? True would mean that the tuple definitely matches, false means
>> the tuple definitely does not match, and Unknown means it might match. Or
>> does return value true with recheck==true have the same effect? If I
>> understood the patch, right, returning Unknown or True wouldn't actually
>> make any difference, but it's conceivable that we might come up with more
>> optimizations in the future that could take advantage of that. For example,
>> for a query like "foo OR (bar AND baz)", you could immediately return any
>> tuples that match foo, and not bother scanning for bar and baz at all.
>
>
> The meaning of recheck flag when input contains unknown is undefined now.
> :)
> For instance, we could define it in following ways:
> 1) Like returning Unknown meaning that consistent with true of false
> instead of input Unknown could return either true or false.
> 2) Consistent with true of false instead of input Unknown could return
> recheck. This meaning is probably logical, but I don't see any usage of it.
>
> I'm not against idea of tri-state returning value for consisted, because
> it's logical continuation of its tri-state input. However, I don't see
> usage of distinguish True and Unknown in returning value for now :)
>
> In example you give we can return foo immediately, but we have to create
> full bitmap. So we anyway will have to scan (bar AND baz). We could skip
> part of trees for bar and baz. But it's possible only when foo contains
> large amount of sequential TIDS so we can be sure that we didn't miss any
> TIDs. This seems to be very narrow use-case for me.
>
> Another point is that one day we probably could immediately return tuples
> in gingettuple. And with LIMIT clause and no sorting we can don't search
> for other tuples. However, gingettuple was removed because of reasons of
> concurrency. And my patches for index-based ordering didn't return it in
> previous manner: it collects all the results and then returns them
> one-by-one.
>

I'm trying to make fastscan work with GinFuzzySearchLimit. Then I figure
out that I don't understand how GinFuzzySearchLimit works. Why with
GinFuzzySearchLimit startScan can return without doing startScanKey? Is it
a bug?

------
With best regards,
Alexander Korotkov.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Oskari Saarenmaa 2013-11-19 23:08:27 Re: [PATCH] configure: allow adding a custom string to PG_VERSION
Previous Message Paul Ramsey 2013-11-19 22:40:00 Traffic jams in fn_extra