From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)2ndquadrant(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: PoC: Partial sort |
Date: | 2013-12-18 11:53:06 |
Message-ID: | CAPpHfdvih8Q04Kd_ezcGoFOFa4mYEgw8Ci5eT3gS41+BVhHa-Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sat, Dec 14, 2013 at 7:04 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:
> Hi,
>
> > > > Limit (cost=69214.06..69214.08 rows=10 width=16) (actual
> > > > time=0.097..0.099 rows=10 loops=1)
> > > > -> Sort (cost=69214.06..71714.06 rows=1000000 width=16) (actual
> > > > time=0.096..0.097 rows=10 loops=1)
> > > > Sort Key: v1, v2
> > > > Sort Method: top-N heapsort Memory: 25kB
> > > > -> Index Scan using test_v1_idx on test
> (cost=0.42..47604.42
> > > > rows=1000000 width=16) (actual time=0.017..0.066 rows=56 loops=1)
> > > > Total runtime: 0.125 ms
> > > > (6 rows)
> > >
> > > Is that actually all that beneficial when sorting with a bog standard
> > > qsort() since that doesn't generally benefit from data being
> pre-sorted?
> > > I think we might need to switch to a different algorithm to really
> > > benefit from mostly pre-sorted input.
> > >
> >
> > In this patch I don't do full sort of dataset. For instance, index
> returns
> > data ordered by first column and we need to order them also by second
> > column.
>
> Ah, that makes sense.
>
> > But, I don't think we should expect pre-sorted values of second column
> > inside a group.
>
> Yes, if you do it that way, there doesn't seem to any need to assume
> that any more than we usually do.
>
> I think you should make the explain output reflect the fact that we're
> assuming v1 is presorted and just sorting v2. I'd be happy enough with:
> Sort Key: v1, v2
> Partial Sort: v2
> or even just
> "Partial Sort Key: [v1,] v2"
> but I am sure others disagree.
>
Sure, I just didn't change explain output yet. It should look like what you
propose.
> > > *partial-knn-1.patch*
>
> > > Rechecking from the heap means adding a sort node though, which I don't
> > > see here? Or am I misunderstanding something?
>
> > KNN-GiST contain RB-tree of scanned items. In this patch item is
> rechecked
> > inside GiST and reinserted into same RB-tree. It appears to be much
> easier
> > implementation for PoC and also looks very efficient. I'm not sure what
> is
> > actually right design for it. This is what I like to discuss.
>
> I don't have enough clue about gist to say wether it's the right design,
> but it doesn't look wrong to my eyes. It'd probably be useful to export
> the knowledge that we are rechecking and how often that happens to the
> outside.
> While I didn't really look into the patch, I noticed in passing that you
> pass a all_dead variable to heap_hot_search_buffer without using the
> result - just pass NULL instead, that performs a bit less work.
Useful notice, thanks.
------
With best regards,
Alexander Korotkov.
From | Date | Subject | |
---|---|---|---|
Next Message | Heikki Linnakangas | 2013-12-18 11:59:12 | Re: [PATCH] SQL assertions prototype |
Previous Message | Alexander Korotkov | 2013-12-18 11:51:08 | Re: PoC: Partial sort |