Quick Links

Re: Slow "not in array" operation

From:	Justin Pryzby <pryzby(at)telsasoft(dot)com>
To:	Michael Lewis <mlewis(at)entrata(dot)com>
Cc:	Marco Colli <collimarco91(at)gmail(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject:	Re: Slow "not in array" operation
Date:	2019-11-12 19:53:34
Message-ID:	20191112195334.GP2923@telsasoft.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

On Tue, Nov 12, 2019 at 12:20:10PM -0700, Michael Lewis wrote:
> It is very interesting to me that the optimizer chose a parallel sequential
> scan rather than an index scan on either of your indexes that start
> with project_id that also reference trashed_at.

Maybe because of low correlation on any of those columns?
https://wiki.postgresql.org/wiki/Slow_Query_Questions#Statistics:_n_distinct.2C_MCV.2C_histogram
SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, attname, inherited, null_frac, n_distinct, array_length(most_common_vals,1) n_mcv, array_length(histogram_bounds,1) n_hist, correlation FROM pg_stats WHERE tablename='subscriptions' AND attname IN ('project_id','tags') ORDER BY 1 DESC;

Maybe clustering the table on project_id (and ANALYZEing) would help, but that
might need to be done consistently.

Michael previously suggested partitioning which, if done on project_id,
would then no longer need to be specially CLUSTERed.

Is the plan for the fast query the same as in August ?
https://www.postgresql.org/message-id/CAFvCgN4UijKTYiOF61Tyd%2BgHvF_oqnMabatS9%2BDcX%2B_PK2SHRw%40mail.gmail.com

Justin

In response to

Re: Slow "not in array" operation at 2019-11-12 19:20:10 from Michael Lewis

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Marco Colli	2019-11-12 20:06:34	Re: Slow "not in array" operation
Previous Message	Michael Lewis	2019-11-12 19:20:10	Re: Slow "not in array" operation