Quick Links

Re: Optimizing count(), but Explain estimates wildly off

From:	Chema <chema(at)interneta(dot)org>
To:	Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc:	pgsql-performance(at)lists(dot)postgresql(dot)org
Subject:	Re: Optimizing count(), but Explain estimates wildly off
Date:	2024-03-04 19:13:56
Message-ID:	CALdEsqPjCeZ_RFfb5G8BAed+9tooPF=rXuL2sLNhAySwMcckXA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

>
> > -> Parallel Seq Scan on pricescope_items (cost=0.00..1027794.01
> rows=3277101 width=522) (actual time=0.753..41654.507 rows=2621681 loops=3)
>
> Why does it take over 41 seconds to read a table with less than
> 3 million rows? Are the rows so large? Is the tabe bloated?
> What is the size of the table as measured with pg_relation_size()
> and pg_table_size()?

There's one JSON column in each table with a couple fields, and a column
with long texts in Items.

-- pg_table_size, pg_relation_size, pg_indexes_size, rows
nametable_sizerelation_sizeindex_sizerow_estimate
tenders 1,775,222,784
1,630,461,952 3,815,567
items 8,158,773,248
6,052,470,784 7,865,043
check_postgres gave a 1.4 bloat score to tenders, 1.9 to items. I had a
duplicate index on transaction_id (one hand made, other from the unique
constraint) and other text column indexes with 0.3-0.5 bloat scores. After
Vacuum Full Analyze; sizes are greatly reduced, specially Items:

-- pg_table_size, pg_relation_size, pg_indexes_size, rows
nametable_sizerelation_sizeindex_sizerow_estimate
tenders 1,203,445,760 1,203,421,184 500,482,048 3,815,567
items 4,436,189,184 4,430,790,656 2,326,118,400 7,865,043

There were a couple mass deletions which probably caused the bloating.
Autovacuum is on defaults, but I guess it doesn't take care of that.
Still, performance seems about the same.

The planner is now using an Index Scan for Colombia without the subselect
hack, but subselect takes ~200ms less in avg, so might as well keep doing
it.

Row estimate is still +1M so still can't use that, but at least now it
takes less than 10s to get the exact count with all countries.

In response to

Re: Optimizing count(), but Explain estimates wildly off at 2024-03-01 08:57:13 from Laurenz Albe

Responses

Re: Optimizing count(), but Explain estimates wildly off at 2024-03-05 01:50:00 from Greg Sabino Mullane

Browse pgsql-performance by date

	From	Date	Subject
Next Message	sud	2024-03-04 19:39:16	Is partition pruning impacted by data type
Previous Message	Laurenz Albe	2024-03-01 08:57:13	Re: Optimizing count(), but Explain estimates wildly off