Re: Postgres using the wrong index index

From: Matt Dupree <matt(dot)dupree(at)heap(dot)io>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Postgres using the wrong index index
Date: 2021-08-24 00:53:15
Message-ID: CAMOk8kpnuDtA7jh9YueU37d39xaQE97rWZE+VvYfdK-susgU1Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I increased (and decreased) the stats target for the column and
re-analyzed. Didn't make a difference.

Is it possible that the row estimate is off because of a column other than
time? I looked at the # of events in that time period and 1.8 million is
actually a good estimate. What about the
((strpos(other_events_1004175222.hierarchy, '#close_onborading;'::text) <>
0) condition in the filter? It makes sense that Postgres wouldn't have a
way to estimate how selective this condition is.

On Tue, Aug 17, 2021 at 2:52 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:

> On Mon, Aug 16, 2021 at 11:22:44AM -0400, Matt Dupree wrote:
> > > Is either half of the AND estimated correctly? If you do a query
> > > with only ">=", and a query with only "<=", do either of them give an
> > > accurate rowcount estimate ?
> >
> > Dropping >= results in the correct index being used. Dropping <= doesn't
> > have this effect.
>
> This doesn't answer the question though: are the rowcount estimes accurate
> (say
> within 10%).
>
> It sounds like interpolating the histogram is giving a poor result, at
> least
> over that range of values. It'd be interesting to see the entire
> histogram.
>
> You might try increasing (or decreasing) the stats target for that column,
> and
> re-analyzing.
>
> Your histogram bounds are for ~38 months of data, and your query is for the
> previous month (July).
>
> $ date -d @1530186399
> Thu Jun 28 06:46:39 CDT 2018
> $ date -d @1629125609
> Mon Aug 16 09:53:29 CDT 2021
>
> $ date -d @1627369200
> Tue Jul 27 02:00:00 CDT 2021
> $ date -d @1624777200
> Sun Jun 27 02:00:00 CDT 2021
>
> The timestamp column has ndistinct near -1, similar to a continuous
> distribution, so I'm not sure why the estimate would be so bad.
>
> --
> Justin
>

--

K. Matt Dupree

Data Science Engineer
321.754.0526 | matt(dot)dupree(at)heap(dot)io

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Justin Pryzby 2021-08-24 01:38:19 Re: Postgres using the wrong index index
Previous Message Nagaraj Raj 2021-08-23 17:54:33 Re: pg_restore schema dump to schema with different name