From: | Matt Dupree <matt(dot)dupree(at)heap(dot)io> |
---|---|
To: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
Cc: | pgsql-performance(at)lists(dot)postgresql(dot)org |
Subject: | Re: Postgres using the wrong index index |
Date: | 2021-08-24 00:53:15 |
Message-ID: | CAMOk8kpnuDtA7jh9YueU37d39xaQE97rWZE+VvYfdK-susgU1Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
I increased (and decreased) the stats target for the column and
re-analyzed. Didn't make a difference.
Is it possible that the row estimate is off because of a column other than
time? I looked at the # of events in that time period and 1.8 million is
actually a good estimate. What about the
((strpos(other_events_1004175222.hierarchy, '#close_onborading;'::text) <>
0) condition in the filter? It makes sense that Postgres wouldn't have a
way to estimate how selective this condition is.
On Tue, Aug 17, 2021 at 2:52 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> On Mon, Aug 16, 2021 at 11:22:44AM -0400, Matt Dupree wrote:
> > > Is either half of the AND estimated correctly? If you do a query
> > > with only ">=", and a query with only "<=", do either of them give an
> > > accurate rowcount estimate ?
> >
> > Dropping >= results in the correct index being used. Dropping <= doesn't
> > have this effect.
>
> This doesn't answer the question though: are the rowcount estimes accurate
> (say
> within 10%).
>
> It sounds like interpolating the histogram is giving a poor result, at
> least
> over that range of values. It'd be interesting to see the entire
> histogram.
>
> You might try increasing (or decreasing) the stats target for that column,
> and
> re-analyzing.
>
> Your histogram bounds are for ~38 months of data, and your query is for the
> previous month (July).
>
> $ date -d @1530186399
> Thu Jun 28 06:46:39 CDT 2018
> $ date -d @1629125609
> Mon Aug 16 09:53:29 CDT 2021
>
> $ date -d @1627369200
> Tue Jul 27 02:00:00 CDT 2021
> $ date -d @1624777200
> Sun Jun 27 02:00:00 CDT 2021
>
> The timestamp column has ndistinct near -1, similar to a continuous
> distribution, so I'm not sure why the estimate would be so bad.
>
> --
> Justin
>
--
K. Matt Dupree
Data Science Engineer
321.754.0526 | matt(dot)dupree(at)heap(dot)io
From | Date | Subject | |
---|---|---|---|
Next Message | Justin Pryzby | 2021-08-24 01:38:19 | Re: Postgres using the wrong index index |
Previous Message | Nagaraj Raj | 2021-08-23 17:54:33 | Re: pg_restore schema dump to schema with different name |