Re: Questions about btree_gin vs btree_gist for low cardinality columns

From: Jeremy Finzel <finzelj(at)gmail(dot)com>
To: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Questions about btree_gin vs btree_gist for low cardinality columns
Date: 2019-05-29 17:59:51
Message-ID: CAMa1XUixPDxyYSy8BN4_spUc518H4WUO-URYYf6DV0=pLmV6Fg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Fri, May 24, 2019 at 10:25 AM Jeremy Finzel <finzelj(at)gmail(dot)com> wrote:

> I have been hoping for clearer direction from the community about
> specifically btree_gin indexes for low cardinality columns (as well as low
> cardinality multi-column indexes). In general there is very little
> discussion about this both online and in the docs. Rather, the emphasis
> for GIN indexes discussed is always on full text search of JSON indexing,
> not btree_gin indexes.
>
> However, I have never been happy with the options open to me for indexing
> low cardinality columns and was hoping this could be a big win. Often I
> use partial indexes as a solution, but I really want to know how many use
> cases btree_gin could solve better than either a normal btree or a partial
> index.
>
> Here are my main questions:
>
> 1.
>
> "The docs say regarding *index only scans*: The index type must support
> index-only scans. B-tree indexes always do. GiST and SP-GiST indexes
> support index-only scans for some operator classes but not others. Other
> index types have no support. The underlying requirement is that the index
> must physically store, or else be able to reconstruct, the original data
> value for each index entry. As a counterexample, GIN indexes cannot support
> index-only scans because each index entry typically holds only part of the
> original data value."
>
> This is confusing to say "B-tree indexes always do" and "GIN indexes
> cannot support index-only scans", when we have a btree_gin index type.
> Explanation please ???
>
> Is it true that for a btree_gin index on a regular column, "each index
> entry typically holds only part of the original data value"? Do these
> still not support index only scans? Could they? I can't see why they
> shouldn't be able to for a single indexed non-expression field?
>
> 2.
>
> Lack of index only scans is definitely a downside. However, I see
> basically identical performance, but way less memory and space usage, for
> gin indexes. In terms of read-only performance, if index only scans are
> not a factor, why not always recommend btree_gin indexes instead of regular
> btree for low cardinality fields, which will yield similar performance but
> use far, far less space and resources?
>
> 3.
>
> This relates to 2. I understand the write overhead can be much greater
> for GIN indexes, which is why the fastupdate feature exists. But again, in
> those discussions in the docs, it appears to me they are emphasizing that
> penalty more for full text or json GIN indexes. Does the same overhead
> apply to a btree_gin index on a regular column with no expressions?
>
> Those are my questions.
>
> FYI, I can see an earlier thread about this topic (
> https://www.postgresql.org/message-id/flat/E260AEE7-95B3-4142-9A4B-A4416F1701F0%40aol.com#5def5ce1864298a3c0ba2d4881a660c2)
> but a few questions were left unanswered and unclear there.
>
> I first started seriously considering using btree_gin indexes for low
> cardinality columns, for example some text field with 30 unique values
> across 100 million rows, after reading a summary of index types from
> Bruce's article: https://momjian.us/main/writings/pgsql/indexing.pdf
>
> This article was also helpful but yet again I wonder it's broader
> viability:
> http://hlinnaka.iki.fi/2014/03/28/gin-as-a-substitute-for-bitmap-indexes/
>
>
> Thank you!
> Jeremy
>

Could anyone shed any light on these questions? I appreciate it.

Thanks,
Jeremy

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2019-05-29 18:32:28 Re: POSTGRES_FSM_RELATIONS CRITICAL: DB control fsm relations used: 79569 of 80000 (99%)
Previous Message Julie Nishimura 2019-05-29 16:41:36 Re: POSTGRES_FSM_RELATIONS CRITICAL: DB control fsm relations used: 79569 of 80000 (99%)