Re: [PATCH] Incremental sort (was: PoC: Partial sort)

From: James Coleman <jtc331(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Shaun Thomas <shaun(dot)thomas(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andreas Karlsson <andreas(at)proxel(dot)se>
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date: 2020-04-03 00:57:05
Message-ID: CAAaqYe8KUqeybHdAJJ=V0jS3JztS0wiNmNu9i1VY3m-uoMXkzw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 2, 2020 at 8:46 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
>
> On Thu, Apr 2, 2020 at 8:20 PM Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> >
> > Hi,
> >
> > Thanks, the v52 looks almost ready. I've been looking at the two or
> > three things I mentioned, and I have a couple of comments.
> >
> >
> > 1) /* XXX comparison_cost shouldn't be 0? */
> >
> > I'm not worried about this, because this is not really intriduced by
> > this patch - create_sort_path has the same comment/issue, so I think
> > it's acceptable to do the same thing for incremental sort.
>
> Sounds good.
>
> > 2) INITIAL_MEMTUPSIZE
> >
> > tuplesort.c does this:
> >
> > #define INITIAL_MEMTUPSIZE Max(1024, \
> > ALLOCSET_SEPARATE_THRESHOLD / sizeof(SortTuple) + 1)
> >
> > supposedly to keep the array size within ALLOCSET_SEPARATE_THRESHOLD.
> > But I think it fails to do that, for a couple of reasons.
> >
> > Firstly, ALLOCSET_SEPARATE_THRESHOLD is 8192, and SortTuple is 21B at
> > minimum (without padding), so with 1024 elements it's guaranteed to be
> > at least 21kB - so exceeding the threshold. The maximum value is
> > something like 256.
> >
> > Secondly, the second part of the formula is guaranteed to get us over
> > the threshold anyway, thanks to the +1. Let's say SortTuple is 30B. Then
> >
> > ALLOCSET_SEPARATE_THRESHOLD / 30 = 273
> >
> > but we end up using 274, resulting in 8220B array. :-(
> >
> > So I guess the formula should be more like
> >
> > Min(128, ALLOCSET_SEPARATE_THRESHOLD / sizeof(SortTuple))
> >
> > or something like that.
> >
> > FWIW I think the whole hypothesis that selecting the array size below
> > ALLOCSET_SEPARATE_THRESHOLD reduces overhead is dubious. AFAIC we
> > allocate this only once (or very few times), and if we need to grow the
> > array we'll hit the threshold anyway.
> >
> > I'd just pick a reasonable constant - 128 or 256 seems reasonable, 1024
> > may be OK too.
>
> That was a part of the patch I haven't touched since I inherited it,
> and I didn't feel like I knew enough about the Postgres memory
> management to make a determination on whether the reasoning made
> sense.
>
> So I' happy to use a constant as suggested.

I take that back. That code hasn't changed, it's just moved. Here's
the current code in tuplesort_begin_common on master:

/*
* Initial size of array must be more than ALLOCSET_SEPARATE_THRESHOLD;
* see comments in grow_memtuples().
*/
state->memtupsize = Max(1024,
ALLOCSET_SEPARATE_THRESHOLD / sizeof(SortTuple) + 1);

I'm not sure we ought to change that in this patch...

James

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-04-03 01:07:21 Re: WAL usage calculation patch
Previous Message James Coleman 2020-04-03 00:46:30 Re: [PATCH] Incremental sort (was: PoC: Partial sort)