Quick Links

Re: Slow "not in array" operation

From:	Marco Colli <collimarco91(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject:	Re: Slow "not in array" operation
Date:	2019-11-12 21:40:07
Message-ID:	CAFvCgN5eATzgQcx2TtqM4yXzksxNVPVs_CtnDTpt1OvDXCnQJA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

I am not a PostgreSQL expert, however I think that the following
algorithm should be possible and fast:

1. find the bitmap of all subscriptions in a project that are not trashed
(it can use the index and takes only ~500ms)
2. find the bitmap of all subscriptions that match the above condition and
HAVE the tag (~7s)
3. calculate [bitmap 1] - [bitmap 2] to find the subscriptions of the
project that DON'T HAVE the tag

On Tue, Nov 12, 2019 at 9:50 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Marco Colli <collimarco91(at)gmail(dot)com> writes:
> > 3) Here's the query plan that I get after disabling the seq scan:
>
> > Finalize Aggregate (cost=2183938.89..2183938.90 rows=1 width=8) (actual
> > time=94972.253..94972.254 rows=1 loops=1)
>
> So, this is slower than the seqscan, which means the planner made the
> right choice.
>
> You seem to be imagining that there's some way the index could be used
> with the NOT clause, but there isn't. Indexable clauses are of the form
> indexed_column indexable_operator constant
> and there's no provision for a NOT in that. If we had a "not contained
> in" array operator, the NOT could be folded to be of this syntactic form,
> but it's highly unlikely that any index operator class would consider such
> an operator to be a supported indexable operator. It doesn't lend itself
> to searching an index.
>
> So the planner is doing the best it can, which in this case is a
> full-table scan.
>
> A conceivable solution, if the tags array is a lot smaller than
> the table as a whole and the table is fairly static, is that you could
> make a btree index on the tags array and let the planner fall back
> to an index-only scan that is just using the index as a cheaper
> source of the array data. (This doesn't work for your existing GIST
> index because GIST can't reconstruct the original arrays on-demand.)
> I suspect though that this wouldn't win much, even if you disregard
> the maintenance costs for the extra index. The really fundamental
> problem here is that a large fraction of the table satisfies the
> NOT-in condition, and no index is going to beat a seqscan by all that
> much when that's true. Indexes are good at retrieving small portions
> of tables.
>
> regards, tom lane
>

In response to

Re: Slow "not in array" operation at 2019-11-12 20:50:51 from Tom Lane

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Jeff Janes	2019-11-12 23:33:28	Re: Slow "not in array" operation
Previous Message	Tom Lane	2019-11-12 20:50:51	Re: Slow "not in array" operation