Quick Links

Re: Slow "not in array" operation

From:	Rick Otten <rottenwindfish(at)gmail(dot)com>
To:	pgsql-performance(at)lists(dot)postgresql(dot)org
Subject:	Re: Slow "not in array" operation
Date:	2019-11-13 11:18:16
Message-ID:	CAMAYy4Jsyo4_Gs1hRvWd1vYnpPftAGWzx=OdhLAZC_WiU5CuTQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

On Wed, Nov 13, 2019 at 5:47 AM Morris de Oryx <morrisdeoryx(at)gmail(dot)com>
wrote:

> Disclaimer: Out over my skis again.
>
> From what you say here, and over on SO, it sounds like you've got two
> problems:
>
> * Matching on *huge *numbers of records because of common tags.
>
> * A dynamic collection of tags as they're customer driven/configured.
>
> An "ideal" solution might look like a bit-index for each tag+tuple, but
> Postgres does not have such a structure. The closest I've seen are Bloom
> filter based indexes. That's likely not going to work here as you don't
> know the collection of tags at any one time. If, however, you create your
> own frequency count estimates for tags, you may well find that there are a
> small number of common tags, and a large number of rare tags. That would be
> good to find out. If you do have some super common (non selective) tags,
> then perhaps a Bloom index based on that collection could be effective. Or
> expression indexes on the very common tags. In your SaaS setup, you might
> need counts/indexes tied to some kind of customer/tenancy distinction ID,
> understood. But, for simplicity, I'm just saying a single set of frequency
> counts, etc.
>
> Here's a recent article on Bloom filter based indexes in Postgres that
> looks decent:
> https://www.percona.com/blog/2019/06/14/bloom-indexes-in-postgresql/
>

One other question might be whether you are always querying for a specific
tag or small set of tags, or if your queries are for relatively random
tags. ie, if you are always looking for the same 2 or 3 tags, then maybe
you could use a functional index or trigger-populate a new column on
insert/update that indicates whether those tags are present.

It is possible that you want a Graph model for this data instead of a
Relational model. ie, if you are finding a bunch of users with common
features, you may find traversing a graph (such as Neo4j - or if you _have_
to stay with a PG backend, something like Cayley.io) to be much more
efficient and flexible.

In response to

Re: Slow "not in array" operation at 2019-11-13 10:46:10 from Morris de Oryx

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Jeff Janes	2019-11-13 11:30:10	Re: Slow "not in array" operation
Previous Message	Morris de Oryx	2019-11-13 10:46:10	Re: Slow "not in array" operation