Optimizing count(), but Explain estimates wildly off

From: Chema <chema(at)interneta(dot)org>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Optimizing count(), but Explain estimates wildly off
Date: 2024-02-27 00:25:19
Message-ID: CALdEsqPScs2bEBxUZC9NO7-Dq2ZGaTQ58gpW=6eiEfq0BbXEsg@mail.gmail.com

Dear pgsqlers,

I'm trying to optimize simple queries on two tables (tenders & items)
with a couple million records. Besides the resulting records, the app
also displays the count of total results, and running count() takes as
long as the query itself (which can be 30+ seconds), so it's an obvious
target for optimization. I'm already caching count() results for the
most common conditions (country & year) in a materialized table, which
practically halves response time. The tables are updated sparingly, and
only with bulk COPYs. Now I'm looking for ways to optimize queries with
other conditions.
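
For reference, the cache is roughly this (a simplified sketch; the view
and column names here are illustrative):

CREATE MATERIALIZED VIEW tender_counts AS
SELECT country, date_part('year', "date") AS year, count(*) AS total
FROM tenders
GROUP BY 1, 2;

-- Refreshed after each bulk COPY:
REFRESH MATERIALIZED VIEW tender_counts;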

Reading around, seems many people are still using this 2005 snippet
<https://www.postgresql.org/message-id/20050810133157.GA46247@winnie.fuhr.org>
to obtain the row count estimate from Explain:

CREATE FUNCTION count_estimate(query text) RETURNS integer AS $$
DECLARE
    rec record;
    rows integer;
BEGIN
    FOR rec IN EXECUTE 'EXPLAIN ' || query LOOP
        rows := substring(rec."QUERY PLAN" FROM ' rows=([[:digit:]]+)');
        EXIT WHEN rows IS NOT NULL;
    END LOOP;
    RETURN rows;
END;
$$ LANGUAGE plpgsql VOLATILE STRICT;
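
I've also seen a variant of the same idea that parses EXPLAIN (FORMAT
JSON) instead of regexing the text output; a sketch (the function name
is mine, and it assumes PostgreSQL 9.2+ for FORMAT JSON):

CREATE FUNCTION count_estimate_json(query text) RETURNS bigint AS $$
DECLARE
    plan json;
BEGIN
    EXECUTE 'EXPLAIN (FORMAT JSON) ' || query INTO plan;
    -- The top plan node's "Plan Rows" field is the planner's row estimate.
    RETURN (plan->0->'Plan'->>'Plan Rows')::bigint;
END;
$$ LANGUAGE plpgsql VOLATILE STRICT;

-- e.g.: SELECT count_estimate_json('SELECT * FROM tenders WHERE country = ''Colombia''');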

Is this still the current best practice? Any tips to increase
precision? Currently it can over- or underestimate the actual row count
by nearly a million in either direction, as seen in the sample queries
below (1,955,297 estimated vs. 1,001,200 actual; 162,080 estimated vs.
1,292,010 actual).

Any other tips to improve the queries are welcome, of course. There's a
big disparity between the two sample query plans even though only the
filtered country changes.

I already raised default_statistics_target to 2000 (the planner wasn't
using indexes at all with lower values). Should I raise it even higher?
These are my custom settings:

shared_buffers = 256MB # min 128kB
work_mem = 128MB # min 64kB
maintenance_work_mem = 254MB # min 1MB
effective_cache_size = 2GB
default_statistics_target = 2000
random_page_cost = 1.0 # same scale as above
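
I could also try raising the target only on the hot columns and adding
extended statistics for the correlated country/date pair. A sketch of
what I have in mind (the dependencies and mcv statistics kinds assume
PostgreSQL 10+ and 12+ respectively):

ALTER TABLE tenders ALTER COLUMN country SET STATISTICS 5000;
ALTER TABLE tenders ALTER COLUMN "date" SET STATISTICS 5000;

-- Multivariate stats so the planner sees that country and date are correlated:
CREATE STATISTICS tenders_country_date (dependencies, mcv)
    ON country, "date" FROM tenders;

ANALYZE tenders;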

Sample queries:

Explain Analyze
Select * from tenders
inner join items on transaction_id = tender_transaction_id
where country = 'Colombia'
and "date" >= '2023-01-01' and "date" < '2024-01-01'
QUERY PLAN
Gather  (cost=253837.99..1506524.32 rows=1955297 width=823) (actual time=51433.592..63239.809 rows=1001200 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Hash Join  (cost=252837.99..1309994.62 rows=814707 width=823) (actual time=51361.920..61729.142 rows=333733 loops=3)
        Hash Cond: (items.tender_transaction_id = tenders.transaction_id)
        ->  Parallel Seq Scan on items  (cost=0.00..1048540.46 rows=3282346 width=522) (actual time=1.689..56887.108 rows=2621681 loops=3)
        ->  Parallel Hash  (cost=247919.56..247919.56 rows=393475 width=301) (actual time=2137.473..2137.476 rows=333733 loops=3)
              Buckets: 1048576  Batches: 1  Memory Usage: 219936kB
              ->  Parallel Bitmap Heap Scan on tenders  (cost=16925.75..247919.56 rows=393475 width=301) (actual time=385.315..908.865 rows=333733 loops=3)
                    Recheck Cond: ((country = 'Colombia'::text) AND (date >= '2023-01-01'::date) AND (date < '2024-01-01'::date))
                    Heap Blocks: exact=24350
                    ->  Bitmap Index Scan on tenders_country_and_date_index  (cost=0.00..16689.67 rows=944339 width=0) (actual time=423.213..423.214 rows=1001200 loops=1)
                          Index Cond: ((country = 'Colombia'::text) AND (date >= '2023-01-01'::date) AND (date < '2024-01-01'::date))
Planning Time: 12.784 ms
JIT:
  Functions: 33
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 14.675 ms, Inlining 383.349 ms, Optimization 1023.521 ms, Emission 651.442 ms, Total 2072.987 ms
Execution Time: 63378.033 ms

Explain Analyze
Select * from tenders
inner join items on transaction_id = tender_transaction_id
where country = 'Mexico'
and "date" >= '2023-01-01' and "date" < '2024-01-01'
QUERY PLAN
Gather  (cost=1000.99..414258.70 rows=162080 width=823) (actual time=52.538..7006.128 rows=1292010 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Nested Loop  (cost=0.99..397050.70 rows=67533 width=823) (actual time=40.211..4087.081 rows=430670 loops=3)
        ->  Parallel Index Scan using tenders_country_and_date_index on tenders  (cost=0.43..45299.83 rows=32616 width=301) (actual time=4.376..59.760 rows=1218 loops=3)
              Index Cond: ((country = 'Mexico'::text) AND (date >= '2023-01-01'::date) AND (date < '2024-01-01'::date))
        ->  Index Scan using items_tender_transaction_id_index on items  (cost=0.56..10.67 rows=11 width=522) (actual time=0.321..3.035 rows=353 loops=3655)
              Index Cond: (tender_transaction_id = tenders.transaction_id)
Planning Time: 7.808 ms
JIT:
  Functions: 27
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 17.785 ms, Inlining 0.000 ms, Optimization 5.080 ms, Emission 93.274 ms, Total 116.138 ms
Execution Time: 7239.427 ms

Thanks in advance!
