Re: Slow "not in array" operation

From: Rick Otten <rottenwindfish(at)gmail(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Slow "not in array" operation
Date: 2019-11-13 11:18:16
Message-ID: CAMAYy4Jsyo4_Gs1hRvWd1vYnpPftAGWzx=OdhLAZC_WiU5CuTQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, Nov 13, 2019 at 5:47 AM Morris de Oryx <morrisdeoryx(at)gmail(dot)com>
wrote:

> Disclaimer: Out over my skis again.
>
> From what you say here, and over on SO, it sounds like you've got two
> problems:
>
> * Matching on *huge *numbers of records because of common tags.
>
> * A dynamic collection of tags as they're customer driven/configured.
>
> An "ideal" solution might look like a bit-index for each tag+tuple, but
> Postgres does not have such a structure. The closest I've seen are Bloom
> filter based indexes. That's likely not going to work here as you don't
> know the collection of tags at any one time. If, however, you create your
> own frequency count estimates for tags, you may well find that there are a
> small number of common tags, and a large number of rare tags. That would be
> good to find out. If you do have some super common (non selective) tags,
> then perhaps a Bloom index based on that collection could be effective. Or
> expression indexes on the very common tags. In your SaaS setup, you might
> need counts/indexes tied to some kind of customer/tenancy distinction ID,
> understood. But, for simplicity, I'm just saying a single set of frequency
> counts, etc.
>
> Here's a recent article on Bloom filter based indexes in Postgres that
> looks decent:
> https://www.percona.com/blog/2019/06/14/bloom-indexes-in-postgresql/
>

One other question might be whether you are always querying for a specific
tag or small set of tags, or if your queries are for relatively random
tags. ie, if you are always looking for the same 2 or 3 tags, then maybe
you could use a functional index or trigger-populate a new column on
insert/update that indicates whether those tags are present.

It is possible that you want a Graph model for this data instead of a
Relational model. ie, if you are finding a bunch of users with common
features, you may find traversing a graph (such as Neo4j - or if you _have_
to stay with a PG backend, something like Cayley.io) to be much more
efficient and flexible.

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Jeff Janes 2019-11-13 11:30:10 Re: Slow "not in array" operation
Previous Message Morris de Oryx 2019-11-13 10:46:10 Re: Slow "not in array" operation