Wrong estimate in query plan

From: Eelke Klein <eelke(at)bolt(dot)nl>
To: pgsql-general(at)postgresql(dot)org
Subject: Wrong estimate in query plan
Date: 2013-10-24 06:42:15
Message-ID: CALEkvvymV3dcGY09aXz_08fy7H=vDe5PUSX2K0VgTxvUsrntzg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

In a complex query I have query I noticed that the planner made a bad
estimate for a join between two tables in the query (I made sure statistics
were up to date).

The join was on a single column. In the first table are about 985 rows. All
rows except one have a NULL value in the join column the one value in that
column is 1. The other table has 237240 rows and all rows have a 1 in the
join column. This column cannot contain NULL values. There is a foreign key
constraint between the join columns.

In pgAdmin I had a look at the statistices for the two columns.

Column in first table
Null Fraction 0.998985
Average Width 4
Distinct Values -1
Most Common Values
Most Common Frequencies
Histogram Bounds
Correlation

What I noticed is that are no most common values mentioned ofcourse the
value 1 only occurs once in the column but as all other values are NULL you
could argue it is a common value.

Column in second table:
Null Fraction 0
Average Width 4
Distinct Values 1
Most Common Values {1}
Most Common Frequencies {1}
Histogram Bounds
Correlation 1

Looks fine :)

Relevant part of AXPLAIN ANALYZE output
' -> Hash Join (cost=40.16..6471.62
rows=241 width=58) (actual time=0.486..102.979 rows=237240 loops=1)'
' Hash Cond: (kb.filiaal_id =
fil.filiaal)'
' -> Seq Scan on kassabon kb
(cost=0.00..5539.40 rows=237240 width=42) (actual time=0.036..28.562
rows=237240 loops=1)'
' -> Hash (cost=27.85..27.85
rows=985 width=20) (actual time=0.434..0.434 rows=1 loops=1)'
' Buckets: 1024 Batches: 1
Memory Usage: 1kB'
' -> Seq Scan on relatie fil
(cost=0.00..27.85 rows=985 width=20) (actual time=0.003..0.382 rows=985
loops=1)'

Notice how it expects 240 rows but gets 237240.

I wondered if this should be reported as a bug? It goes wrong I think
because of the statistics of the column in the first table give no
information about the values present except the NULL values.
Another thought I had was that the planner could have known there was a
matching row in the first table for each row in the second table because
there is a foreign key constraint between the two.

Regards, Eelke

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Amit Langote 2013-10-24 09:40:46 GIN fast update technique and work_mem
Previous Message Raghu Ram 2013-10-24 05:38:24 Re: pg_dumpall from a script