Re: Slow "not in array" operation

From: Morris de Oryx <morrisdeoryx(at)gmail(dot)com>
To: Marco Colli <collimarco91(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Michael Lewis <mlewis(at)entrata(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Slow "not in array" operation
Date: 2019-11-13 10:46:10
Message-ID: CAKqnccgb4WC23znobRf5rdRBrP-tdijd0ZMLj311TMx7q_AYRw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Disclaimer: Out over my skis again.

From what you say here, and over on SO, it sounds like you've got two
problems:

* Matching on *huge *numbers of records because of common tags.

* A dynamic collection of tags as they're customer driven/configured.

An "ideal" solution might look like a bit-index for each tag+tuple, but
Postgres does not have such a structure. The closest I've seen are Bloom
filter based indexes. That's likely not going to work here as you don't
know the collection of tags at any one time. If, however, you create your
own frequency count estimates for tags, you may well find that there are a
small number of common tags, and a large number of rare tags. That would be
good to find out. If you do have some super common (non selective) tags,
then perhaps a Bloom index based on that collection could be effective. Or
expression indexes on the very common tags. In your SaaS setup, you might
need counts/indexes tied to some kind of customer/tenancy distinction ID,
understood. But, for simplicity, I'm just saying a single set of frequency
counts, etc.

Here's a recent article on Bloom filter based indexes in Postgres that
looks decent:
https://www.percona.com/blog/2019/06/14/bloom-indexes-in-postgresql/

>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Rick Otten 2019-11-13 11:18:16 Re: Slow "not in array" operation
Previous Message Marco Colli 2019-11-13 09:20:02 Re: Slow "not in array" operation