Quick Links

Questions about btree_gin vs btree_gist for low cardinality columns

From:	Jeremy Finzel <finzelj(at)gmail(dot)com>
To:	Postgres General <pgsql-general(at)postgresql(dot)org>
Subject:	Questions about btree_gin vs btree_gist for low cardinality columns
Date:	2019-05-24 15:25:54
Message-ID:	CAMa1XUiRZsg7Y5OBvau=orfVv=w=Rq19RDHQRySnF6RS=-+y2g@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

I have been hoping for clearer direction from the community about
specifically btree_gin indexes for low cardinality columns (as well as low
cardinality multi-column indexes). In general there is very little
discussion about this both online and in the docs. Rather, the emphasis
for GIN indexes discussed is always on full text search of JSON indexing,
not btree_gin indexes.

However, I have never been happy with the options open to me for indexing
low cardinality columns and was hoping this could be a big win. Often I
use partial indexes as a solution, but I really want to know how many use
cases btree_gin could solve better than either a normal btree or a partial
index.

Here are my main questions:

"The docs say regarding *index only scans*: The index type must support
index-only scans. B-tree indexes always do. GiST and SP-GiST indexes
support index-only scans for some operator classes but not others. Other
index types have no support. The underlying requirement is that the index
must physically store, or else be able to reconstruct, the original data
value for each index entry. As a counterexample, GIN indexes cannot support
index-only scans because each index entry typically holds only part of the
original data value."

This is confusing to say "B-tree indexes always do" and "GIN indexes cannot
support index-only scans", when we have a btree_gin index type.
Explanation please ???

Is it true that for a btree_gin index on a regular column, "each index
entry typically holds only part of the original data value"? Do these
still not support index only scans? Could they? I can't see why they
shouldn't be able to for a single indexed non-expression field?

Lack of index only scans is definitely a downside. However, I see
basically identical performance, but way less memory and space usage, for
gin indexes. In terms of read-only performance, if index only scans are
not a factor, why not always recommend btree_gin indexes instead of regular
btree for low cardinality fields, which will yield similar performance but
use far, far less space and resources?

This relates to 2. I understand the write overhead can be much greater for
GIN indexes, which is why the fastupdate feature exists. But again, in
those discussions in the docs, it appears to me they are emphasizing that
penalty more for full text or json GIN indexes. Does the same overhead
apply to a btree_gin index on a regular column with no expressions?

Those are my questions.

FYI, I can see an earlier thread about this topic (
https://www.postgresql.org/message-id/flat/E260AEE7-95B3-4142-9A4B-A4416F1701F0%40aol.com#5def5ce1864298a3c0ba2d4881a660c2)
but a few questions were left unanswered and unclear there.

I first started seriously considering using btree_gin indexes for low
cardinality columns, for example some text field with 30 unique values
across 100 million rows, after reading a summary of index types from
Bruce's article: https://momjian.us/main/writings/pgsql/indexing.pdf

This article was also helpful but yet again I wonder it's broader
viability:
http://hlinnaka.iki.fi/2014/03/28/gin-as-a-substitute-for-bitmap-indexes/

Thank you!
Jeremy

Responses

Re: Questions about btree_gin vs btree_gist for low cardinality columns at 2019-05-29 17:59:51 from Jeremy Finzel
Re: Questions about btree_gin vs btree_gist for low cardinality columns at 2019-06-01 02:52:29 from Morris de Oryx
Re: Questions about btree_gin vs btree_gist for low cardinality columns at 2019-06-02 16:25:07 from Jeff Janes

Browse pgsql-general by date

	From	Date	Subject
Next Message	Julie Nishimura	2019-05-24 17:16:18	Re: POSTGRES_FSM_RELATIONS CRITICAL: DB control fsm relations used: 79569 of 80000 (99%)
Previous Message	Adrian Klaver	2019-05-24 14:19:53	Re: POSTGRES_FSM_RELATIONS CRITICAL: DB control fsm relations used: 79569 of 80000 (99%)