From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimising compactify_tuples()
Date: 2020-09-11 05:44:24
Message-ID: CA+hUKG+SkE2vtS_owMNJyM0FbEHOdaUcOi4wHvkoq09HwGjBcg@mail.gmail.com
Lists: pgsql-hackers
On Fri, Sep 11, 2020 at 1:45 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> On Thu, 10 Sep 2020 at 10:40, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> > I wonder if we could also identify a range at the high end that is
> > already correctly sorted and maximally compacted so it doesn't even
> > need to be copied out.
>
> I've experimented quite a bit with this patch today. I think I've
> tested every idea you've mentioned here, so there's quite a lot of
> information to share.
>
> I did write code to skip the copy to the separate buffer for tuples
> that are already in the correct place, and with a version of the patch
> which keeps tuples in their traditional insert order (later line
> items' tuples located earlier in the page) I see a generally very
> large number of tuples being skipped with this method. See the
> attached v4b_skipped_tuples.png. The vertical axis is the number of
> compactify_tuples() calls during the benchmark where we were able to
> skip that number of tuples. The average number of skipped tuples over
> all calls during recovery was 81, so on this benchmark we get to skip
> about half the tuples on the page this way.
Excellent.
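
Just to check we're picturing the same thing, here's roughly the shape
of the check I had in mind. The struct and function names below are
invented and the types simplified, so this is a sketch, not your patch:

#include <stddef.h>
#include <stdint.h>

/* simplified stand-in for bufpage.c's item sorting struct */
typedef struct
{
    uint16_t    offsetindex;    /* index into the page's line pointer array */
    int16_t     itemoff;        /* current offset of the tuple's data */
    uint16_t    alignedlen;     /* MAXALIGN'd length of the tuple */
} itemIdSortData;

/*
 * With itemidbase[] sorted by itemoff descending, count how many leading
 * entries are already packed hard up against 'upper_limit' (pd_special in
 * the real page).  Those tuples are already where compaction would put
 * them, so they never need to be copied out at all.
 */
static int
count_already_compacted(const itemIdSortData *itemidbase, int nitems,
                        size_t upper_limit)
{
    size_t      expected = upper_limit;
    int         i;

    for (i = 0; i < nitems; i++)
    {
        expected -= itemidbase[i].alignedlen;
        if ((size_t) itemidbase[i].itemoff != expected)
            break;              /* first tuple that actually has to move */
    }
    return i;                   /* entries [0, i) can stay where they are */
}

Everything before the first mismatch is already packed against the
upper limit, so it never has to go through the scratch buffer.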
> > So one question is whether we want to do the order-reversing as part
> > of this patch, or wait for a more joined-up project to make lots of
> > code paths collude on making scan order match memory order
> > (correlation = 1). Most or all of the gain from your patch would
> > presumably still apply if we did the exact opposite and forced offset
> > order to match reverse item ID order (correlation = -1), which also
> > happens to be the initial state when you insert tuples today; you'd
> > still tend towards a state that allows nice big memmove/memcpy calls
> > during page compaction. IIUC currently we start with correlation -1
> > and then tend towards correlation = 0 after many random updates
> > because we can't change the order, so it gets scrambled over time.
> > I'm not sure what I think about that.
>
> So I did lots of benchmarking with both methods and my conclusion is
> that I think we should stick to the traditional INSERT order with this
> patch. But we should come back and revisit that more generally one
> day. The main reason that I'm put off flipping the tuple order is that
> it significantly reduces the number of times we hit the preordered
> case. We go to all the trouble of reversing the order only to have it
> broken again when we add one more tuple to the page. If we keep the
> traditional order, then it's much more likely that we'll maintain that
> preordering and hit the more optimal memmove code path.
Right, that makes sense. Thanks for looking into it!
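
For the archives, the property that makes the traditional order pay off
can be stated as a one-pass check. Again just a sketch using the same
simplified struct as above, not the committed code:

#include <stdbool.h>
#include <stdint.h>

/* same simplified stand-in struct as in the earlier sketch */
typedef struct
{
    uint16_t    offsetindex;
    int16_t     itemoff;
    uint16_t    alignedlen;
} itemIdSortData;

/*
 * True if the tuples, taken in line pointer order, already have strictly
 * decreasing offsets -- the state a page is in after plain INSERTs.  When
 * this holds, compaction doesn't need to reorder anything and each run of
 * adjacent tuples can be shifted with a single memmove().
 */
static bool
offsets_are_presorted(const itemIdSortData *itemidbase, int nitems)
{
    int         i;

    for (i = 1; i < nitems; i++)
    {
        if (itemidbase[i].itemoff >= itemidbase[i - 1].itemoff)
            return false;
    }
    return true;
}

A page filled by plain INSERTs satisfies this trivially, while a
reversed page stops satisfying it as soon as one more tuple goes in,
which matches the behaviour you're describing.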
> I've also attached another tiny patch that I think is pretty useful
> separate from this. It basically changes:
>
> LOG: redo done at 0/D518FFD0
>
> into:
>
> LOG: redo done at 0/D518FFD0 system usage: CPU: user: 58.93 s,
> system: 0.74 s, elapsed: 62.31 s
+1
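
That output looks like pg_rusage_show(), the helper VACUUM already uses
for its log lines, so I imagine the patch does something along these
lines (sketch only; the variable and function names here are invented):

#include "postgres.h"
#include "access/xlogdefs.h"
#include "utils/pg_rusage.h"

static PGRUsage redo_ru;        /* hypothetical; captured when redo starts */

static void
redo_usage_begin(void)
{
    pg_rusage_init(&redo_ru);   /* call at the point redo begins */
}

static void
report_redo_done(XLogRecPtr ReadRecPtr)
{
    ereport(LOG,
            (errmsg("redo done at %X/%X system usage: %s",
                    (uint32) (ReadRecPtr >> 32), (uint32) ReadRecPtr,
                    pg_rusage_show(&redo_ru))));
}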