From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Emre Hasegeli <emre(at)hasegeli(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Surafel Temesgen <surafel3000(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: PG 12 draft release notes |
Date: | 2019-06-12 20:16:11 |
Message-ID: | 20190612201611.hodu37grgiyoarcz@momjian.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, May 21, 2019 at 02:22:53PM -0700, Peter Geoghegan wrote:
> On Tue, May 21, 2019 at 1:51 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > > My concern here (which I believe Alexander shares) is that it doesn't
> > > make sense to group these two items together. They're two totally
> > > unrelated pieces of work. Alexander's work does more or less help with
> > > lock contention with writes, whereas the feature that that was merged
> > > with is about preventing index bloat, which is mostly helpful for
> > > reads (it helps writes to the extent that writes are also reads).
> > >
> > > The release notes go on to say that this item "gives better
> > > performance for UPDATEs and DELETEs on indexes with many duplicates",
> > > which is wrong. That is something that should have been listed below,
> > > under the "duplicate index entries in heap-storage order" item.
> >
> > OK, I understand how the lock stuff improves things, but I have
> > forgotten how indexes are made smaller. Is it because of better page
> > split logic?
>
> That is clearly the main reason, though suffix truncation (which
> represents that trailing/suffix columns in index tuples from branch
> pages have "negative infinity" sentinel values) also contributes to
> making indexes smaller.
>
> The page split stuff was mostly added by commit fab250243 ("Consider
> secondary factors during nbtree splits"), but commit f21668f32 ("Add
> "split after new tuple" nbtree optimization") added to that in a way
> that really helped the TPC-C indexes. The TPC-C indexes are about 40%
> smaller now.
First, my apologies in getting to this so late. Peter Geoghegan
supplied me with slides and a video to study, and I now understand how
complex the btree improvements are. Here is a video of Peter's
presentation at PGCon:
https://www.youtube.com/watch?v=p5RaATILoiE
What I would like to do is to type them out here, and if I got it right,
I can then add these details to the release notes.
The over-arching improvement to btree in PG 12 is the ordering of index
entries by tid so all entries are unique. As Peter has stated, many
optimizations come out of that:
1. Since all index tuples are ordered, you can move from one leaf page
to the next without keeping a lock on the internal page that references
it, increasing concurrency.
2. When inserting a duplicate value in the index, we used to try a few
pages to see if there was space, then "get tired" and just split a page
containing duplicates. Now that there are no duplicates (because
duplicate key fields are sorted by tid) the system knows exactly what
page the index tuple belongs on, and inserts or splits the page as
necessary.
3. Pivot tuples are used on internal pages and as min/max indicators on
leaf pages. These pivot tuples are now trimmed if their trailing key
fields are not significant. For example, if an index is
field1/field2/field3, and the page contains values for which field1==5
and none that field1==6, there is no need to include field2 and field3
in the pivot tuple --- it can just list the pivot as field1==5,
field2=infinity. This is called suffix truncation.
Page splits used to just split the page in half, which minimizes the
number of page splits, but sometimes causes lots of wasted space. The
new code tries to split to reduce the length of pivot tuples, which ends
up being more efficient in space savings because the pivot tuples are
shorter, and the leaf pages end up being more tightly packed. This is
particularly true for ever-increasing keys because we often end up
creating a new empty page, rather than splitting an existing page.
4. Vacuum's removal of index tuples in indexes with many duplicates is
faster since it can more quickly find desired tuples.
Did I cover everything?
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2019-06-12 20:18:36 | Re: pgbench rate limiting changes transaction latency computation |
Previous Message | Guillaume Lelarge | 2019-06-12 19:42:07 | Re: Quitting the thes |