Re: Bad query plan when you add many OR conditions

From: Marco Colli <collimarco91(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Bad query plan when you add many OR conditions
Date: 2020-01-10 14:53:22
Message-ID: CAFvCgN7yQFLjwLSs7VPLp8VxUrGTy5uzmQ8_GfraXTUSQdyoQQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Before trying other solutions I would like to make PG use an index-only
scan (it should be fast enough for our purpose).

I have tried to disable the other indexes and forced PG to use this index
(which includes all the fields of the query):
index_subscriptions_on_project_id_and_created_at_and_tags

The problem is that the query plan is this:
https://gist.github.com/collimarco/03f3dde372f001485518b8deca2f3b24#file-index_scan_instead_of_index_only-txt

As you can see it is a *index scan* and not an *index only* scan... I don't
understand why. The index includes all the fields used by the query... so
an index only scan should be possible.

On Fri, Jan 10, 2020 at 2:34 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:

> On Fri, Jan 10, 2020 at 12:03:39PM +0100, Marco Colli wrote:
> > I have added this index which would allow an index only scan:
> > "index_subscriptions_on_project_id_and_created_at_and_tags" btree
> > (project_id, created_at DESC, tags) WHERE trashed_at IS NULL
>
> Are those the only columns in subscriptions ?
>
> > But Postgresql continues to use this index (which has less information
> and
> > then requires slow access to disk):
> > "index_subscriptions_on_project_id_and_created_at" btree (project_id,
> > created_at DESC)
>
> Did you vacuum the table ?
> Did you try to "explain" the query after dropping the 1st index (like:
> begin;
> DROP INDEX..; explain analyze..; rollback).
>
> Also, is the first (other) index btree_gin (you can \dx to show
> extensions) ?
>
> I think it needs to be a gin index to search tags ?
>
> On Fri, Jan 10, 2020 at 01:42:24PM +0100, Marco Colli wrote:
> > I would like to try your solution but I read that ALTER TABLE... SET
> > STATISTICS locks the table... Since it is just an experiment and we
> don't
> > know if it actually works it would be greate to avoid locking a large
> table
> > (50M) in production.
>
> I suggest to CREATE TABLE test_subscriptions (LIKE subscriptions INCLUDING
> ALL); INSERT INTO test_subscriptions SELECT * FROM subscriptions; ANALYZE
> test_subscriptions;
>
> Anyway, ALTER..SET STATS requires a strong lock but for only a brief moment
> (assuming it doesn't have to wait). Possibly you'd be ok doing SET
> statement_timeout='1s'; ALTER TABLE....
>
> > Does CREATE STATISTICS lock the table too?
>
> You can check by SET client_min_messages=debug; SET lock_timeout=333; SET
> log_lock_waits=on;
> Looks like it needs ShareUpdateExclusiveLock.
>
> > Does statistics work on an array field like tags? (I can't find any
> > information)
>
> It think it'd be data type agnostic. And seems to work with arrays.
>
> On Fri, Jan 10, 2020 at 02:30:27PM +0100, Marco Colli wrote:
> > @Justin Pryzby I have tried this as you suggested:
> >
> > CREATE STATISTICS statistics_on_subscriptions_project_id_and_tags ON
> > project_id, tags FROM subscriptions;
> > VACUUM ANALYZE subscriptions;
> >
> > Unfortunately nothing changes and Postgresql continues to use the wrong
> > plan (maybe stats don't work well on array fields like tags??).
>
> It'd help to see SELECT stxddependencies FROM pg_statistic_ext WHERE
> stxoid='subscriptions'::regclass
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2020-01-10 15:18:02 Re: Bad query plan when you add many OR conditions
Previous Message Justin Pryzby 2020-01-10 13:34:47 Re: Bad query plan when you add many OR conditions