Re: Odd Row Estimates in Query Plan (rows=75)

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Don Seiler <don(at)seiler(dot)us>, Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Odd Row Estimates in Query Plan (rows=75)
Date: 2018-08-15 19:39:45
Message-ID: f7751be2-57f6-e75e-fbd3-8f38e51c8fbc@aklaver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 08/15/2018 12:31 PM, Don Seiler wrote:
> PostgreSQL 9.6.6 on CentOS.
>
> We have a report query that has gone from maybe a few seconds to run to
> a few minutes to run since mid-July. Looking at the output of EXPLAIN
> ANALYZE, the row count estimates are way off, even though this table was
> just analyzed a day or so ago. What's more bizarre to me is that the row
> count esimate is *always* 75 for every node of the plan, where the
> actual rows is in the hundreds or thousands. This table is one of the
> busiest tables in our production database (many inserts and updates). It
> is autovacuumed and autoanalyzed a few times per week, although I'm
> looking to change it to a nightly manual schedule to avoid daytime
> autovacuums.
>
> Hash Join  (cost=1869142.34..1869146.15 rows=75 width=88) (actual
> time=179877.869..179878.011 rows=759 loops=1)
>    Hash Cond: (stores.pkey = lt.store_pkey)
>    Buffers: shared hit=1654593 read=331897 dirtied=249
>    ->  Seq Scan on stores  (cost=0.00..2.77 rows=77 width=22) (actual
> time=0.007..0.023 rows=78 loops=1)
>          Buffers: shared hit=2
>    ->  Hash  (cost=1869141.40..1869141.40 rows=75 width=50) (actual
> time=179877.847..179877.847 rows=759 loops=1)
>          Buckets: 1024  Batches: 1  Memory Usage: 73kB
>          Buffers: shared hit=1654591 read=331897 dirtied=249
>          ->  Subquery Scan on lt  (cost=1869138.59..1869141.40 rows=75
> width=50) (actual time=179875.976..179877.697 rows=759 loops=1)
>                Buffers: shared hit=1654591 read=331897 dirtied=249
>                ->  GroupAggregate  (cost=1869138.59..1869140.65 rows=75
> width=50) (actual time=179875.976..179877.606 rows=759 loops=1)
>                      Group Key: lts.store_pkey, lts.owner,
> (date_trunc('minute'::text, lts.date_gifted))
>                      Filter: (count(*) IS NOT NULL)
>                      Buffers: shared hit=1654591 read=331897 dirtied=249
>                      ->  Sort  (cost=1869138.59..1869138.78 rows=75
> width=42) (actual time=179875.961..179876.470 rows=6731 loops=1)
>                            Sort Key: lts.store_pkey,
> lts.entry_source_owner, (date_trunc('minute'::text, lts.date_gifted))
>                            Sort Method: quicksort  Memory: 757kB
>                            Buffers: shared hit=1654591 read=331897
> dirtied=249
>                            ->  Index Scan using gifts_date_added on
> gifts lts  (cost=0.56..1869136.25 rows=75 width=42) (actual
> time=190.657..179870.165 rows=6731 loops=1)
>                                  Index Cond: ((date_added > '2018-07-14
> 11:13:05'::timestamp without time zone) AND (date_added < '2018-08-13
> 14:14:21'::timestamp without time zone))
>                                  Filter: ((date_gifted >= '2018-08-13
> 11:13:05'::timestamp without time zone) AND (date_gifted < '2018-08-13
> 14:14:21'::timestamp without time zone))
>                                  Rows Removed by Filter: 938197
>                                  Buffers: shared hit=1654591
> read=331897 dirtied=249
>  Planning time: 0.426 ms
>  Execution time: 179893.894 ms
>
> I don't have a version of this query from prior to this summer, but
> getting explain plan for older data from older sandboxes show a similar
> plan.

I don't have an answer, just a question:

Can you provide the actual query and the table schema?

>
> Sidenote: I am suggesting that an index be added on the date_gifted
> field as that is far more selective and avoids throwing rows away.
> However I'm very interested in why every node dealing with the gifts
> table thinks rows=75 when the actual is much, much higher. And 75 seems
> like too round of a number to be random?
>
> --
> Don Seiler
> www.seiler.us <http://www.seiler.us>

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Don Seiler 2018-08-15 20:03:14 Re: Odd Row Estimates in Query Plan (rows=75)
Previous Message Tom Lane 2018-08-15 19:31:30 Re: Code of Conduct plan