From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PG 13 release notes, first draft
Date: 2020-05-12 00:14:01
Message-ID: 20200512001401.GG4666@momjian.us
Lists: pgsql-hackers
On Mon, May 11, 2020 at 05:05:29PM -0700, Peter Geoghegan wrote:
> On Mon, May 11, 2020 at 4:10 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > > think that you should point out that deduplication works by storing
> > > the duplicates in the obvious way: Only storing the key once per
> > > distinct value (or once per distinct combination of values in the case
> > > of multi-column indexes), followed by an array of TIDs (i.e. a posting
> > > list). Each TID points to a separate row in the table.
> >
> > These are not details that should be in the release notes since the
> > internal representation is not important for its use.
>
> I am not concerned about describing the specifics of the on-disk
> representation, and I don't feel too strongly about the storage
> parameter (leave it out). I only ask that the wording convey the fact
> that the deduplication feature is not just a quantitative improvement
> -- it's a qualitative behavioral change, that will help data
> warehousing in particular. This wasn't the case with the v12 work on
> B-Tree duplicates (as I said last year, I thought of the v12 stuff as
> fixing a problem more than an enhancement).
>
> With the deduplication feature added to Postgres v13, the B-Tree code
> can now gracefully deal with low cardinality data by compressing the
> duplicates as needed. This is comparable to bitmap indexes in
> proprietary database systems, but without most of their disadvantages
> (in particular around heavyweight locking, deadlocks that abort
> transactions, etc). It's easy to imagine this making a big difference
> with analytics workloads. The v12 work made indexes with lots of
> duplicates 15%-30% smaller (compared to v11), but the v13 work can
> make them 60%-80% smaller in many common cases (compared to v12). In
> extreme cases indexes might even be ~12x smaller (though that will be
> rare).
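
To make the mechanism Peter describes concrete, here is a toy sketch of
the posting-list idea: rather than storing one (key, TID) index tuple
per duplicate, the key is stored once followed by an array of TIDs.
This is illustrative only -- it is not PostgreSQL's actual on-disk
tuple layout, and the key values and TIDs are made up:

```python
# Toy illustration of B-tree deduplication via posting lists.
# Input: flat (key, tid) index tuples, one per table row.
# Output: each distinct key stored once with a list of its TIDs,
# which is where the space savings for duplicate-heavy indexes
# comes from.

def deduplicate(index_tuples):
    """Collapse (key, tid) pairs into (key, [tids]) posting lists."""
    posting = {}
    for key, tid in index_tuples:
        posting.setdefault(key, []).append(tid)
    # B-tree order: sorted by key
    return sorted(posting.items())

tuples = [
    ("shipped", (0, 1)), ("shipped", (0, 2)), ("pending", (0, 3)),
    ("shipped", (1, 1)), ("pending", (1, 4)),
]
for key, tids in deduplicate(tuples):
    print(key, tids)
```

Five index tuples collapse to two posting-list tuples here; with a
genuinely low-cardinality column the ratio is far larger, which is the
intuition behind the 60%-80% figures above.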
Agreed. How is this?

	This allows efficient btree indexing of low-cardinality columns.
	Users upgrading with pg_upgrade will need to use REINDEX to make
	use of this feature.
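
For a user following the release note, the upgrade step might look like
this (the table and index names are hypothetical; `deduplicate_items`
is the v13 btree storage parameter, on by default, that Peter suggested
leaving out of the note):

```sql
-- After pg_upgrade from v12, existing btree indexes keep the old
-- format until they are rebuilt:
REINDEX INDEX orders_status_idx;

-- Deduplication is enabled by default in v13; it can be disabled
-- per index via the storage parameter:
CREATE INDEX orders_status_idx ON orders (status)
    WITH (deduplicate_items = off);
```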
--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EnterpriseDB https://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +