Re: Questions about btree_gin vs btree_gist for low cardinality columns

From: Morris de Oryx <morrisdeoryx(at)gmail(dot)com>
To: Steven Winfield <Steven(dot)Winfield(at)cantabcapital(dot)com>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Questions about btree_gin vs btree_gist for low cardinality columns
Date: 2019-06-03 10:11:48
Message-ID: CAKqncci+mtt-_5fdcOiNaxvtJF1ij5_dOTfda1t41mN0yVA=fw@mail.gmail.com
Lists: pgsql-general

I didn't notice Bloom filters in the conversation so far, and have been
waiting for *years* for a good excuse to use a Bloom filter. I ran into
them years back in Splunk, which is a distributed log store. There's an
obvious benefit to a probabilistic tool like a Bloom filter there since
remote lookup (and/or retrieval from cold storage) is quite expensive,
relative to a local, hashed lookup. I haven't tried them in Postgres.
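
For anyone who hasn't met one, here is a minimal Python sketch of the general
idea. It is a toy, not what Splunk or Postgres actually runs; the class name,
the sizes, and the host names are made up for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Both sizes are arbitrary illustrative values.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # True means "maybe present"; False means "definitely absent".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


seen = BloomFilter()
for host in ("web-01", "web-02", "db-01"):  # hypothetical keys
    seen.add(host)

# A False answer is definitive, so the expensive remote lookup can be skipped.
# A True answer only means "maybe" and still needs the real lookup.
print(seen.might_contain("web-01"))
```

The payoff is the asymmetry: a cheap local "definitely not here" avoids the
remote or cold-storage trip, and only the "maybe" cases pay for it.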

In the case of a single column with a small set of distinct values over a
large set of rows, how would a Bloom filter be preferable to, say, a GIN
index on an integer value?

I have to say, this is actually a good reminder in my case. We've got a lot
of small-distinct-values-big-rows columns. For example, "server_id",
"company_id", "facility_id", and so on. Only a handful of parent keys with
many millions of related rows. Perhaps it would be conceivable to use a
Bloom index to do quick lookups on combinations of such values within the
same table. I haven't tried Bloom indexes in Postgres, so this might be worth
some experimenting.
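
As a rough mental model of what such a lookup on combinations of values could
look like, here is a Python sketch of a per-row signature scheme. It is a toy
under stated assumptions, not the Postgres implementation: the signature
length, the bits per column, the hashing, and the sample rows are all invented
for illustration.

```python
import hashlib

SIGNATURE_BITS = 80   # assumed signature length for this toy model
BITS_PER_COLUMN = 2   # assumed number of bits set per column value


def column_bits(column, value):
    """Return a bit mask with a few bits set for one (column, value) pair."""
    mask = 0
    for seed in range(BITS_PER_COLUMN):
        digest = hashlib.sha256(f"{column}:{seed}:{value}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:8], "big") % SIGNATURE_BITS)
    return mask


def signature(values):
    """OR together the masks of every column value in a row or a query."""
    sig = 0
    for column, value in values.items():
        sig |= column_bits(column, value)
    return sig


# Hypothetical rows using the kind of columns mentioned above.
rows = [
    {"server_id": 1, "company_id": 10, "facility_id": 100},
    {"server_id": 1, "company_id": 20, "facility_id": 200},
    {"server_id": 2, "company_id": 10, "facility_id": 100},
]
index = [(signature(row), row) for row in rows]


def lookup(conditions):
    """Equality lookup on any subset of columns."""
    wanted = signature(conditions)
    # Cheap filter: keep rows whose signature contains every wanted bit.
    candidates = [row for sig, row in index if (sig & wanted) == wanted]
    # Recheck against the real values to discard false positives.
    return [
        row for row in candidates
        if all(row[col] == val for col, val in conditions.items())
    ]


print(lookup({"server_id": 1, "facility_id": 200}))
```

The point of the sketch is that one signature per row can serve equality
tests on any subset of the columns, at the cost of a recheck step to weed out
false positives.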

Is there any thought in the Postgres world of adding something like
Oracle's bitmap indexes?
