Re: Odd Row Estimates in Query Plan (rows=75)

From: Don Seiler <don(at)seiler(dot)us>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Odd Row Estimates in Query Plan (rows=75)
Date: 2018-08-15 20:03:14
Message-ID: CAHJZqBBqL0xPtz0CEG4r1J8T4X1LML=eTZaHPVw92uKcmGzHdg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Here's the query, obfuscated manually by me:

SELECT
'Foo' as system_function,
stores.name as store,
lt.owner,
lt.minute_of_day,
lt.records
FROM
foo.stores
LEFT OUTER JOIN
(SELECT
lts.store_pkey,
lts.owner,
date_trunc('minute', lts.date_gifted) as minute_of_day,
count(*) as records
FROM foo.gifts lts
WHERE
lts.date_added > '2017-07-14 11:13:05'
AND lts.date_added < '2017-08-13 14:14:21'
AND lts.date_gifted >= '2017-08-13 11:13:05'
AND lts.date_gifted < '2017-08-13 14:14:21'
GROUP BY 1,2,3
ORDER BY 1
) lt ON lt.store_pkey = stores.pkey
WHERE lt.records IS NOT NULL;

The foo.gifts table is pretty much the core table of our database. It's big
and very active. There is an index on date_added but not yet on
date_gifted.

I'm working to re-write the query while the dev sees if we even need this
query anymore.

On Wed, Aug 15, 2018 at 2:39 PM, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
wrote:

> On 08/15/2018 12:31 PM, Don Seiler wrote:
>
>> PostgreSQL 9.6.6 on CentOS.
>>
>> We have a report query that has gone from maybe a few seconds to run to a
>> few minutes to run since mid-July. Looking at the output of EXPLAIN
>> ANALYZE, the row count estimates are way off, even though this table was
>> just analyzed a day or so ago. What's more bizarre to me is that the row
>> count esimate is *always* 75 for every node of the plan, where the actual
>> rows is in the hundreds or thousands. This table is one of the busiest
>> tables in our production database (many inserts and updates). It is
>> autovacuumed and autoanalyzed a few times per week, although I'm looking to
>> change it to a nightly manual schedule to avoid daytime autovacuums.
>>
>> Hash Join (cost=1869142.34..1869146.15 rows=75 width=88) (actual
>> time=179877.869..179878.011 rows=759 loops=1)
>> Hash Cond: (stores.pkey = lt.store_pkey)
>> Buffers: shared hit=1654593 read=331897 dirtied=249
>> -> Seq Scan on stores (cost=0.00..2.77 rows=77 width=22) (actual
>> time=0.007..0.023 rows=78 loops=1)
>> Buffers: shared hit=2
>> -> Hash (cost=1869141.40..1869141.40 rows=75 width=50) (actual
>> time=179877.847..179877.847 rows=759 loops=1)
>> Buckets: 1024 Batches: 1 Memory Usage: 73kB
>> Buffers: shared hit=1654591 read=331897 dirtied=249
>> -> Subquery Scan on lt (cost=1869138.59..1869141.40 rows=75
>> width=50) (actual time=179875.976..179877.697 rows=759 loops=1)
>> Buffers: shared hit=1654591 read=331897 dirtied=249
>> -> GroupAggregate (cost=1869138.59..1869140.65 rows=75
>> width=50) (actual time=179875.976..179877.606 rows=759 loops=1)
>> Group Key: lts.store_pkey, lts.owner,
>> (date_trunc('minute'::text, lts.date_gifted))
>> Filter: (count(*) IS NOT NULL)
>> Buffers: shared hit=1654591 read=331897 dirtied=249
>> -> Sort (cost=1869138.59..1869138.78 rows=75
>> width=42) (actual time=179875.961..179876.470 rows=6731 loops=1)
>> Sort Key: lts.store_pkey,
>> lts.entry_source_owner, (date_trunc('minute'::text, lts.date_gifted))
>> Sort Method: quicksort Memory: 757kB
>> Buffers: shared hit=1654591 read=331897
>> dirtied=249
>> -> Index Scan using gifts_date_added on
>> gifts lts (cost=0.56..1869136.25 rows=75 width=42) (actual
>> time=190.657..179870.165 rows=6731 loops=1)
>> Index Cond: ((date_added > '2018-07-14
>> 11:13:05'::timestamp without time zone) AND (date_added < '2018-08-13
>> 14:14:21'::timestamp without time zone))
>> Filter: ((date_gifted >= '2018-08-13
>> 11:13:05'::timestamp without time zone) AND (date_gifted < '2018-08-13
>> 14:14:21'::timestamp without time zone))
>> Rows Removed by Filter: 938197
>> Buffers: shared hit=1654591 read=331897
>> dirtied=249
>> Planning time: 0.426 ms
>> Execution time: 179893.894 ms
>>
>> I don't have a version of this query from prior to this summer, but
>> getting explain plan for older data from older sandboxes show a similar
>> plan.
>>
>
> I don't have an answer, just a question:
>
> Can you provide the actual query and the table schema?
>
>
>> Sidenote: I am suggesting that an index be added on the date_gifted field
>> as that is far more selective and avoids throwing rows away. However I'm
>> very interested in why every node dealing with the gifts table thinks
>> rows=75 when the actual is much, much higher. And 75 seems like too round
>> of a number to be random?
>>
>> --
>> Don Seiler
>> www.seiler.us <http://www.seiler.us>
>>
>
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com
>

--
Don Seiler
www.seiler.us

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Evan Rempel 2018-08-15 20:13:10 Re: How to revoke privileged from PostgreSQL's superuser
Previous Message Adrian Klaver 2018-08-15 19:39:45 Re: Odd Row Estimates in Query Plan (rows=75)