Re: Slow "not in array" operation

From: Marco Colli <collimarco91(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Slow "not in array" operation
Date: 2019-11-12 21:40:07
Message-ID: CAFvCgN5eATzgQcx2TtqM4yXzksxNVPVs_CtnDTpt1OvDXCnQJA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I am not a PostgreSQL expert, however I think that the following
algorithm should be possible and fast:

1. find the bitmap of all subscriptions in a project that are not trashed
(it can use the index and takes only ~500ms)
2. find the bitmap of all subscriptions that match the above condition and
HAVE the tag (~7s)
3. calculate [bitmap 1] - [bitmap 2] to find the subscriptions of the
project that DON'T HAVE the tag

On Tue, Nov 12, 2019 at 9:50 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Marco Colli <collimarco91(at)gmail(dot)com> writes:
> > 3) Here's the query plan that I get after disabling the seq scan:
>
> > Finalize Aggregate (cost=2183938.89..2183938.90 rows=1 width=8) (actual
> > time=94972.253..94972.254 rows=1 loops=1)
>
> So, this is slower than the seqscan, which means the planner made the
> right choice.
>
> You seem to be imagining that there's some way the index could be used
> with the NOT clause, but there isn't. Indexable clauses are of the form
> indexed_column indexable_operator constant
> and there's no provision for a NOT in that. If we had a "not contained
> in" array operator, the NOT could be folded to be of this syntactic form,
> but it's highly unlikely that any index operator class would consider such
> an operator to be a supported indexable operator. It doesn't lend itself
> to searching an index.
>
> So the planner is doing the best it can, which in this case is a
> full-table scan.
>
> A conceivable solution, if the tags array is a lot smaller than
> the table as a whole and the table is fairly static, is that you could
> make a btree index on the tags array and let the planner fall back
> to an index-only scan that is just using the index as a cheaper
> source of the array data. (This doesn't work for your existing GIST
> index because GIST can't reconstruct the original arrays on-demand.)
> I suspect though that this wouldn't win much, even if you disregard
> the maintenance costs for the extra index. The really fundamental
> problem here is that a large fraction of the table satisfies the
> NOT-in condition, and no index is going to beat a seqscan by all that
> much when that's true. Indexes are good at retrieving small portions
> of tables.
>
> regards, tom lane
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Jeff Janes 2019-11-12 23:33:28 Re: Slow "not in array" operation
Previous Message Tom Lane 2019-11-12 20:50:51 Re: Slow "not in array" operation