From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PG 13 release notes, first draft
Date: 2020-05-12 00:14:01
Message-ID: 20200512001401.GG4666@momjian.us
Lists: pgsql-hackers
On Mon, May 11, 2020 at 05:05:29PM -0700, Peter Geoghegan wrote:
> On Mon, May 11, 2020 at 4:10 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > > think that you should point out that deduplication works by storing
> > > the duplicates in the obvious way: Only storing the key once per
> > > distinct value (or once per distinct combination of values in the case
> > > of multi-column indexes), followed by an array of TIDs (i.e. a posting
> > > list). Each TID points to a separate row in the table.
> >
> > These are not details that should be in the release notes since the
> > internal representation is not important for its use.
>
> I am not concerned about describing the specifics of the on-disk
> representation, and I don't feel too strongly about the storage
> parameter (leave it out). I only ask that the wording convey the fact
> that the deduplication feature is not just a quantitative improvement
> -- it's a qualitative behavioral change, that will help data
> warehousing in particular. This wasn't the case with the v12 work on
> B-Tree duplicates (as I said last year, I thought of the v12 stuff as
> fixing a problem more than an enhancement).
>
> With the deduplication feature added to Postgres v13, the B-Tree code
> can now gracefully deal with low cardinality data by compressing the
> duplicates as needed. This is comparable to bitmap indexes in
> proprietary database systems, but without most of their disadvantages
> (in particular around heavyweight locking, deadlocks that abort
> transactions, etc). It's easy to imagine this making a big difference
> with analytics workloads. The v12 work made indexes with lots of
> duplicates 15%-30% smaller (compared to v11), but the v13 work can
> make them 60%-80% smaller in many common cases (compared to v12). In
> extreme cases indexes might even be ~12x smaller (though that will be
> rare).
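
To make the mechanism Peter describes concrete, here is a toy sketch of
the posting-list idea: rather than storing one (key, TID) index tuple
per duplicate, the key is stored once followed by an array of TIDs.
This is illustrative only -- it is not PostgreSQL's actual on-disk
tuple layout, and the key values and TIDs are made up:

```python
# Toy illustration of B-tree deduplication via posting lists.
# Input: flat (key, tid) index tuples, one per table row.
# Output: each distinct key stored once with a list of its TIDs,
# which is where the space savings for duplicate-heavy indexes
# comes from.

def deduplicate(index_tuples):
    """Collapse (key, tid) pairs into (key, [tids]) posting lists."""
    posting = {}
    for key, tid in index_tuples:
        posting.setdefault(key, []).append(tid)
    # B-tree order: sorted by key
    return sorted(posting.items())

tuples = [
    ("shipped", (0, 1)), ("shipped", (0, 2)), ("pending", (0, 3)),
    ("shipped", (1, 1)), ("pending", (1, 4)),
]
for key, tids in deduplicate(tuples):
    print(key, tids)
```

Five index tuples collapse to two posting-list tuples here; with a
genuinely low-cardinality column the ratio is far larger, which is the
intuition behind the 60%-80% figures above.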
Agreed. How is this?

	This allows efficient btree indexing of low-cardinality columns.
	Users upgrading with pg_upgrade will need to use REINDEX to make
	use of this feature.
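
For a user following the release note, the upgrade step might look like
this (the table and index names are hypothetical; `deduplicate_items`
is the v13 btree storage parameter, on by default, that Peter suggested
leaving out of the note):

```sql
-- After pg_upgrade from v12, existing btree indexes keep the old
-- format until they are rebuilt:
REINDEX INDEX orders_status_idx;

-- Deduplication is enabled by default in v13; it can be disabled
-- per index via the storage parameter:
CREATE INDEX orders_status_idx ON orders (status)
    WITH (deduplicate_items = off);
```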
--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EnterpriseDB https://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +