From: | Marco Colli <collimarco91(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)lists(dot)postgresql(dot)org |
Subject: | Re: Slow "not in array" operation |
Date: | 2019-11-12 21:40:07 |
Message-ID: | CAFvCgN5eATzgQcx2TtqM4yXzksxNVPVs_CtnDTpt1OvDXCnQJA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
I am not a PostgreSQL expert, however I think that the following
algorithm should be possible and fast:
1. find the bitmap of all subscriptions in a project that are not trashed
(it can use the index and takes only ~500ms)
2. find the bitmap of all subscriptions that match the above condition and
HAVE the tag (~7s)
3. calculate [bitmap 1] - [bitmap 2] to find the subscriptions of the
project that DON'T HAVE the tag
On Tue, Nov 12, 2019 at 9:50 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Marco Colli <collimarco91(at)gmail(dot)com> writes:
> > 3) Here's the query plan that I get after disabling the seq scan:
>
> > Finalize Aggregate (cost=2183938.89..2183938.90 rows=1 width=8) (actual
> > time=94972.253..94972.254 rows=1 loops=1)
>
> So, this is slower than the seqscan, which means the planner made the
> right choice.
>
> You seem to be imagining that there's some way the index could be used
> with the NOT clause, but there isn't. Indexable clauses are of the form
> indexed_column indexable_operator constant
> and there's no provision for a NOT in that. If we had a "not contained
> in" array operator, the NOT could be folded to be of this syntactic form,
> but it's highly unlikely that any index operator class would consider such
> an operator to be a supported indexable operator. It doesn't lend itself
> to searching an index.
>
> So the planner is doing the best it can, which in this case is a
> full-table scan.
>
> A conceivable solution, if the tags array is a lot smaller than
> the table as a whole and the table is fairly static, is that you could
> make a btree index on the tags array and let the planner fall back
> to an index-only scan that is just using the index as a cheaper
> source of the array data. (This doesn't work for your existing GIST
> index because GIST can't reconstruct the original arrays on-demand.)
> I suspect though that this wouldn't win much, even if you disregard
> the maintenance costs for the extra index. The really fundamental
> problem here is that a large fraction of the table satisfies the
> NOT-in condition, and no index is going to beat a seqscan by all that
> much when that's true. Indexes are good at retrieving small portions
> of tables.
>
> regards, tom lane
>
From | Date | Subject | |
---|---|---|---|
Next Message | Jeff Janes | 2019-11-12 23:33:28 | Re: Slow "not in array" operation |
Previous Message | Tom Lane | 2019-11-12 20:50:51 | Re: Slow "not in array" operation |