From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimising compactify_tuples()
Date: 2020-09-09 22:39:56
Message-ID: CA+hUKGLhcvXFMvAG6O1UsE2sN2CUkMBNwdOFAHGEv4Szco_yXQ@mail.gmail.com
Lists: pgsql-hackers
On Thu, Sep 10, 2020 at 2:34 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> I think you were adequately caffeinated. You're right that this is
> fairly simple to do, but it looks even simpler than looping twice
> over the array. I think it's just a matter of looping over the
> itemidbase backwards and putting the higher itemid tuples at the end
> of the page. I've done it this way in the attached patch.
Yeah, I was wondering how to make as much of the algorithm work in a
memory-forwards direction as possible (even the item pointer access),
but it was just a hunch. Once you have the adjacent-tuple merging
thing so you're down to just a couple of big memcpy calls, it's
probably moot anyway.
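
For anyone skimming the thread, the backwards pass as I understand it
looks something like the sketch below. This is only a sketch with
made-up names (ItemIdCompactData, compact_backwards, "scratch")
loosely modelled on bufpage.c, not the actual patch, and it omits the
adjacent-tuple merging that collapses runs of contiguous source
tuples into a single memcpy:

#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-tuple bookkeeping, one entry per line pointer,
 * assumed to be sorted by item ID. */
typedef struct
{
    int16_t  itemoff;      /* tuple's old offset within the page */
    uint16_t alignedlen;   /* MAXALIGN'd tuple length */
} ItemIdCompactData;

/* "scratch" holds a copy of the old page contents. */
static void
compact_backwards(char *page, const char *scratch,
                  ItemIdCompactData *itemidbase, int nitems,
                  size_t pagesize)
{
    size_t upper = pagesize;    /* next free byte at the high end */

    /* Walk item IDs from highest to lowest, packing each tuple
     * against the end of the page, so that item ID order ends up
     * matching offset order. */
    for (int i = nitems - 1; i >= 0; i--)
    {
        upper -= itemidbase[i].alignedlen;
        memcpy(page + upper, scratch + itemidbase[i].itemoff,
               itemidbase[i].alignedlen);
        itemidbase[i].itemoff = (int16_t) upper;
    }
}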
> I also added a presorted path which falls back on doing memmoves
> without the temp buffer when the itemidbase array indicates that
> higher lineitems all have higher offsets. I'm doing the presort check
> in the calling function since that loops over the lineitems already.
> We can just memmove the tuples in reverse order without overwriting
> any yet-to-be-moved tuples when they are in order.
Great.
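
If I've read the description correctly, the presorted path amounts to
something like the sketch below (same caveats and invented names as
before; the real presort check lives in the caller). The key
observation is that when itemoff is ascending in item ID order, each
tuple only ever moves to a higher address, so a reverse-order pass
can memmove in place without clobbering anything still to be moved:

/* Reuses the headers and ItemIdCompactData struct from the earlier
 * sketch. */
static void
compact_presorted(char *page, ItemIdCompactData *itemidbase, int nitems,
                  size_t pagesize)
{
    size_t upper = pagesize;

    for (int i = nitems - 1; i >= 0; i--)
    {
        upper -= itemidbase[i].alignedlen;
        if ((size_t) itemidbase[i].itemoff != upper)
            memmove(page + upper, page + itemidbase[i].itemoff,
                    itemidbase[i].alignedlen);
        itemidbase[i].itemoff = (int16_t) upper;
    }
}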
I wonder if we could also identify a range at the high end that is
already correctly sorted and maximally compacted so it doesn't even
need to be copied out.
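
Concretely, I'm imagining a check along these lines (illustrative
only, same invented names): scan down from the highest item ID and
stop at the first tuple that isn't already flush against the one
above it; everything from that point up can be left untouched:

/* Returns how many low-numbered items still need to be moved;
 * items [result, nitems) are already sorted and maximally compacted. */
static int
items_needing_move(const ItemIdCompactData *itemidbase, int nitems,
                   size_t pagesize)
{
    size_t expected = pagesize;
    int i;

    for (i = nitems - 1; i >= 0; i--)
    {
        expected -= itemidbase[i].alignedlen;
        if ((size_t) itemidbase[i].itemoff != expected)
            break;      /* first tuple that's out of place */
    }
    return i + 1;
}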
+ * Do the tuple compactification. Collapse memmove calls for adjacent
+ * tuples.
s/memmove/memcpy/
> With the attached v3, performance is better. The test now runs
> recovery in 65.6 seconds, vs master's 148.5 seconds. So about 2.2x
> faster.
On my machine I'm seeing 57s, down from 86s on unpatched master, for
the simple pgbench workload from
https://github.com/macdice/redo-bench/. That's not quite what you're
reporting but it still blows the doors off the faster sorting patch,
which does it in 74s.
> We should probably consider what else can be done to try to write
> pages with tuples for earlier lineitems earlier in the page. VACUUM
> FULLs and friends will switch back to the opposite order when
> rewriting the heap.
Yeah, and also bulk inserts/COPY. Ultimately if we flipped our page
format on its head that'd come for free, but that'd be a bigger
project with more ramifications.
So one question is whether we want to do the order-reversing as part
of this patch, or wait for a more joined-up project to make lots of
code paths collude on making scan order match memory order
(correlation = 1). Most or all of the gain from your patch would
presumably still apply if we did the exact opposite and forced offset
order to match reverse-item ID order (correlation = -1), which also
happens to be the initial state when you insert tuples today; you'd
still tend towards a state that allows nice big memmove/memcpy calls
during page compaction. IIUC currently we start with correlation -1
and then tend towards correlation = 0 after many random updates
because we can't change the order, so it gets scrambled over time.
I'm not sure what I think about that.
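
To make the correlation = -1 case concrete (again just an
illustration, reusing the invented struct from my earlier sketches
plus <stdbool.h>): on a freshly filled page, line pointers grow from
the start while tuple data grows down from the end, so item ID 1
holds the highest offset, and a page straight out of bulk insertion
would pass a check like this:

/* True if offsets strictly descend in item ID order, i.e. the page
 * is in the reverse-item-ID layout that plain inserts produce. */
static bool
offsets_descending(const ItemIdCompactData *itemidbase, int nitems)
{
    for (int i = 1; i < nitems; i++)
        if (itemidbase[i].itemoff >= itemidbase[i - 1].itemoff)
            return false;   /* not reverse-item-ID order */
    return true;
}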
PS You might as well post future patches with .patch endings so that
the cfbot tests them; it seems pretty clear now that a patch to
optimise sorting (as useful as it may be for future work) can't beat a
patch to skip it completely. I took the liberty of switching the
author and reviewer names in the commitfest entry to reflect this.