Quick Links

Re: [HACKERS] [PATCH] Incremental sort

From:	Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc:	Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [HACKERS] [PATCH] Incremental sort
Date:	2017-12-08 12:42:06
Message-ID:	CAPpHfds+CL_Kwkxmk9nwG=pzAqoxYEnJJrRZFRoRNMRC6Hn+QQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, Nov 22, 2017 at 12:01 AM, Thomas Munro <
thomas(dot)munro(at)enterprisedb(dot)com> wrote:

> >> I gather that you have
> >> determined empirically that it's better to be able to sort groups of
> >> at least MIN_GROUP_SIZE than to be able to skip the comparisons on the
> >> leading attributes, but why is that the case?
> >
> > Right. The issue that not only case of one tuple per group could cause
> > overhead, but few tuples (like 2 or 3) is also case of overhead. Also,
> > overhead is related not only to sorting. While investigate of regression
> > case provided by Heikki [1], I've seen extra time spent mostly in extra
> > copying of sample tuple and comparison with that. In order to cope this
> > overhead I've introduced MIN_GROUP_SIZE which allows to skip copying
> sample
> > tuples too frequently.
>
> I see. I wonder if there could ever be a function like
> ExecMoveTuple(dst, src). Given the polymorphism involved it'd be
> slightly complicated and you'd probably have a general case that just
> copies the tuple to dst and clears src, but there might be a bunch of
> cases where you can do something more efficient like moving a pointer
> and pin ownership. I haven't really thought that through and
> there may be fundamental problems with it...
>

ExecMoveTuple(dst, src) would be good. But, it would be hard to implement
"moving a pointer and pin ownership" principle in our current
infrastructure. It's because source and destination can have different
memory contexts. AFAICS, we can't just move memory area between memory
contexts: we have to allocate new area, then memcpy, and then deallocate
old area.

> If you're going to push the tuples into the sorter every time, then I
> guess there are some special cases that could allow future
> optimisations: (1) if you noticed that every prefix was different, you
> can skip the sort operation (that is, you can use the sorter as a dumb
> tuplestore and just get the tuples out in the same order you put them
> in; not sure if Tuplesort supports that but it presumably could),

In order to notice that every prefix is different, I have to compare every
prefix. But that may introduce an overhead. So, there reason why I
introduced MIN_GROUP_SIZE is exactly to not compare every prefix...

> (2)
> if you noticed that every prefix was the same (that is, you have only
> one prefix/group in the sorter) then you could sort only on the suffix
> (that is, you could somehow tell Tuplesort to ignore the leading
> columns),

Yes, I did so before. But again, after introducing MIN_GROUP_SIZE, I
missed knowledge whether all the prefixes were the same or different. This
is why, I've to sort by full column list for now...

(3) as a more complicated optimisation for intermediate
> group sizes 1 < n < MIN_GROUP_SIZE, you could somehow number the
> groups with an integer that increments whenever you see the prefix
> change, and somehow tell tuplesort.c to use that instead of the
> leading columns.

That is interesting idea. The reason we have an overhead in comparison
with plain sort is that we do extra comparison (and copying), but knowledge
of this comparison result is lost for sorting itself. Thus, sorting can
"reuse" prefix comparison, and overhead would be lower. But the problem is
that we have to reformat tuples before putting them into tuplesort. I
wonder if tuple reformatting could eat potential performance win...

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Re: [HACKERS] [PATCH] Incremental sort at 2017-11-21 21:01:33 from Thomas Munro

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Alexander Korotkov	2017-12-08 13:06:45	Re: [HACKERS] [PATCH] Incremental sort
Previous Message	Alexander Korotkov	2017-12-08 11:59:42	Re: [HACKERS] Proposal for CSN based snapshots