Re: How to deal with analyze gathering irrelevant stats

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Rémi Chatenay <remi(dot)chatenay(at)doctolib(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: How to deal with analyze gathering irrelevant stats
Date: 2021-01-11 16:29:04
Message-ID: 20210111162904.GS1849@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Jan 11, 2021 at 04:50:12PM +0100, Rémi Chatenay wrote:
> We are experiencing some slow queries due to the query planner using an
> incorrect index. It is using an unoptimized index because the stats are
> computed during the night when the data is not the same as during the day.
>
> CREATE INDEX index_conversations_on_user_id_and_status ON
> public.conversations USING btree (user_id, status);
>
> CREATE INDEX index_conversations_on_status ON public.conversations USING
> btree (status)
>
> The slow query is the following:
>
> SELECT id FROM conversations WHERE status = 'in_progress' AND user_id = 123
>
> There are hundreds of conversations with a status 'in_progress' at a given
> time during the day but virtually none during the night.
>
> So when the analyze is run during the night, PG then thinks that using the
> index_conversations_on_status will return almost no rows and so it uses
> this index instead of the combined one.
>
> When the analyze is run during the day, PG correctly uses the right index
> (index_conversations_on_user_id_and_status)

> We currently run a manual weekly vacuum analyze during the night. I'm
> wondering what are our possible solutions. One is to manually run the
> analyze during the day but is there a way to tell PG to run the auto
> analyze at a given time of the day for example ? I guess we are not the
> first ones to have data patterns that differ between when the analyze is
> run and the query is run.

I think you could run manual ANALYZE during the day just for this one column:
ANALYZE conversations (status);

If it takes too long or causes a performance issue, you could do:
SET default_statistics_target=10;
ANALYZE conversations (status);

You could also change to make autovacuum do this on its own, by setting:
ALTER TABLE conversations SET (autovacuum_analyze_scale_factor=0.005);

If that works but too slow, then maybe ALTER TABLE .. SET STATISTICS 10.

--
Justin

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Adam Brusselback 2021-01-11 16:37:33 Re: How to deal with analyze gathering irrelevant stats
Previous Message Rémi Chatenay 2021-01-11 15:50:12 How to deal with analyze gathering irrelevant stats