Re: Questions about btree_gin vs btree_gist for low cardinality columns

From: Morris de Oryx <morrisdeoryx(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Jeremy Finzel <finzelj(at)gmail(dot)com>, Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Questions about btree_gin vs btree_gist for low cardinality columns
Date: 2019-06-03 00:24:15
Message-ID: CAKqnccgCwxP2Yp8P5fibzrFnW22N=r7fGySzTztMRnfkyfzECQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thanks to Tom Lane and Jeff Janes for chiming in with the level of detail
they're able to provide.

As an outsider-who-now-loves-Postgres, I don't know the history or deep
details of all of the various index types. (Obviously.) As a long-time
database programmer, I can say that low-cardinality fields are *very*
common cases. So whatever Postgres can offer to make for optimal searches
and aggregates on such columns would be of immediate, ongoing, and
widespread value.

As an example, we're dealing with millions of rows where we often want to
find or summarize by a category value. So, maybe 6-10 categories that are
used in various queries. It's not realistic for us to anticipate every
field combination the category field is going to be involved in to lay down
multi-column indexes everywhere.

I've used a system that handled this situation with a B-tree for the
distinct values, and a subordinate data structure for the associated key
(TIDs in PG, I guess.) They either stored a packed list of addresses, or a
compressed bitmap on the whole table, depending on the number of associated
entries. Seemed to work pretty well for queries. That also sounds very
like a btree_gin index in Postgres. (Without the compressed, on-disk bitmap
option.)

Getting back to the day-to-day, what would you recommend using for a
single-column index on a low-cardinality column (really low)? And, yes,
we'll happily use blocking queries up front to reduce the number of rows
under inspection, but that's 1) not always possible and 2) definitely not
always predictable in advance. So I'm looking for the best case for stupid
searches ;-)

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom K 2019-06-03 04:48:04 Re: psql: FATAL: the database system is starting up
Previous Message Zahir Lalani 2019-06-02 23:52:15 Re: PG10 upgrade issue