Re: Slow "not in array" operation

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marco Colli <collimarco91(at)gmail(dot)com>
Cc: Michael Lewis <mlewis(at)entrata(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Slow "not in array" operation
Date: 2019-11-12 20:50:51
Message-ID: 28041.1573591851@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Marco Colli <collimarco91(at)gmail(dot)com> writes:
> 3) Here's the query plan that I get after disabling the seq scan:

> Finalize Aggregate (cost=2183938.89..2183938.90 rows=1 width=8) (actual
> time=94972.253..94972.254 rows=1 loops=1)

So, this is slower than the seqscan, which means the planner made the
right choice.

You seem to be imagining that there's some way the index could be used
with the NOT clause, but there isn't. Indexable clauses are of the form
indexed_column indexable_operator constant
and there's no provision for a NOT in that. If we had a "not contained
in" array operator, the NOT could be folded to be of this syntactic form,
but it's highly unlikely that any index operator class would consider such
an operator to be a supported indexable operator. It doesn't lend itself
to searching an index.

So the planner is doing the best it can, which in this case is a
full-table scan.

A conceivable solution, if the tags array is a lot smaller than
the table as a whole and the table is fairly static, is that you could
make a btree index on the tags array and let the planner fall back
to an index-only scan that is just using the index as a cheaper
source of the array data. (This doesn't work for your existing GIST
index because GIST can't reconstruct the original arrays on-demand.)
I suspect though that this wouldn't win much, even if you disregard
the maintenance costs for the extra index. The really fundamental
problem here is that a large fraction of the table satisfies the
NOT-in condition, and no index is going to beat a seqscan by all that
much when that's true. Indexes are good at retrieving small portions
of tables.

regards, tom lane

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Marco Colli 2019-11-12 21:40:07 Re: Slow "not in array" operation
Previous Message Michael Lewis 2019-11-12 20:31:06 Re: Slow "not in array" operation