From: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "'Viktor Leis *EXTERN*'" <leis(at)in(dot)tum(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Experimental evaluation of PostgreSQL's query optimizer
Date: 2015-12-21 08:22:45
Message-ID: A737B7A37273E048B164557ADEF4A58B5378A976@ntex2010i.host.magwien.gv.at
Lists: pgsql-hackers
Viktor Leis wrote:
> We have recently performed an experimental evaluation of PostgreSQL's
> query optimizer. For example, we measured the contributions of
> cardinality estimation and the cost model on the overall query
> performance. You can download the resulting paper here:
> http://www.vldb.org/pvldb/vol9/p204-leis.pdf
>
> Some findings:
> 1. Perhaps unsurprisingly, we found that cardinality
> estimation is the biggest problem in query optimization.
> 2. The quality of Postgres' cardinality estimates is not generally worse
> than that of the major commercial systems.
> 3. It seems to me that one obvious way to avoid many bad situations
> would be to disable nested loop joins when the inner relation is NOT
> an index scan.
>
> I hope this will be of interest to some of you.
I have read the paper with great interest, and I have some comments.
- The paper mentions that the "Join Order Benchmark" has high cross-table
correlation, and this correlation is responsible for bad cardinality
estimates that cause bad plans with all RDBMS.
Wouldn't it be interesting to repeat the experiment with different
real-world data sets to see if that is indeed typical and not an
idiosyncrasy of that specific benchmark?
- The paper suggests that sampling the base tables is preferable to
using statistics because it gives better estimates, but I think that
is only a win for long-running, complicated, data-warehouse-style queries.
For the typical OLTP query it would incur intolerable planning times.
Any ideas on that?
- From my experience in tuning SQL queries I can confirm your finding
that bad cardinality estimates are the prime source of bad plan choices.
Perhaps it would be valuable to start thinking about statistics for
inter-table correlation. What about something as "simple" as a factor
per (joinable) attribute pair that approximates the actual row count
of a join on these attributes divided by the planner's estimate?
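To make the idea concrete (the table and column names below are invented
purely for illustration, they are not from the paper or from any existing
feature), such a factor could be measured by comparing the actual join
cardinality with the planner's estimate, both of which are visible in
EXPLAIN ANALYZE output:

    -- Hypothetical example; "orders" and "customers" are made-up tables.
    EXPLAIN ANALYZE
    SELECT count(*)
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id;
    -- factor(orders.customer_id, customers.customer_id) =
    --     actual rows of the join node / estimated rows of the join node
    -- A factor far from 1 would indicate cross-table correlation that the
    -- independence assumption misses, and the planner could multiply its
    -- estimate by that stored factor.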
- I can also corroborate your finding that nested loop joins are often
harmful, particularly when the inner side is a sequential scan.
One of the first things I do when investigating bad performance of a query
whose plan has a nested loop join is to set enable_nestloop to "off"
and see if that makes a difference, and it often does.
Maybe it would be a win to bias the planner against nested loop joins.
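For illustration, the diagnostic step I mean is simply the following
(the query is a made-up placeholder, not from the benchmark):

    -- Discourage nested loop joins for the current session and compare plans.
    SET enable_nestloop = off;
    EXPLAIN ANALYZE
    SELECT *
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    WHERE c.country = 'AT';
    -- If the resulting hash or merge join plan is much faster, the nested
    -- loop choice was likely caused by a cardinality underestimate.
    RESET enable_nestloop;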
This is dreaming, but it might be nice to have some number indicating how
reliable a certain estimate is: high if the estimate is, say, derived from
a single filter on a base table, and sinking as more conditions are involved
or numbers are pulled out of thin air.
Yours,
Laurenz Albe