From: | Mark Lewis <mark(dot)lewis(at)mir3(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Brian Herlihy <btherl(at)yahoo(dot)com(dot)au>, Postgresql Performance <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: Simple join optimized badly? |
Date: | 2006-10-11 15:07:40 |
Message-ID: | 1160579260.8082.121.camel@archimedes |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Tom,
I'm interested in the problem of cross-column statistics from a
theoretical perspective. It would be interesting to sit down and try to
reason out a useful solution, or at very least to understand the problem
better so I can anticipate when it might come and eat me.
>From my understanding, the main problem is that if PG knows the
selectivity of n conditions C1,C2,...,Cn then it doesn't know whether
the combined selectivity will be C1*C2*...*Cn (conditions are
independent) or max(C1,C2,...,Cn) (conditions are strictly dependent),
or somewhere in the middle. Therefore, row estimates could be orders of
magnitude off.
I suppose a common example would be a table with a serial primary key
column and a timestamp value which is always inserted as
CURRENT_TIMESTAMP, so the two columns are strongly correlated. If the
planner guesses that 1% of the rows of the table will match pk>1000000,
and 1% of the rows of the table will match timestamp > X, then it would
be nice for it to know that if you specify both "pk>1000000 AND
timestamp>X" that the combined selectivity is still only 1% and not 1% *
1% = 0.01%.
As long as I'm sitting down and reasoning about the problem anyway, are
there any other types of cases you're aware of where some form of cross-
column statistics would be useful? In the unlikely event that I
actually come up with a brilliant and simple solution, I'd at least like
to make sure that I'm solving the right problem :)
Thanks,
Mark Lewis
On Tue, 2006-10-10 at 22:38 -0400, Tom Lane wrote:
> Brian Herlihy <btherl(at)yahoo(dot)com(dot)au> writes:
> > What would it take for hints to be added to postgres?
>
> A *whole lot* more thought and effort than has been expended on the
> subject to date.
>
> Personally I have no use for the idea of "force the planner to do
> exactly X given a query of exactly Y". You don't have exactly Y
> today, tomorrow, and the day after (if you do, you don't need a
> hint mechanism at all, you need a mysql-style query cache).
> IMHO most of the planner mistakes we see that could be fixed via
> hinting are really statistical estimation errors, and so the right
> level to be fixing them at is hints about how to estimate the number
> of rows produced for given conditions. Mind you that's still a plenty
> hard problem, but you could at least hope that a hint of that form
> would be useful for more than one query.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
From | Date | Subject | |
---|---|---|---|
Next Message | Brendan Curran | 2006-10-11 16:53:41 | Re: Scrub one large table against another |
Previous Message | Bruce Momjian | 2006-10-11 14:53:30 | Re: Simple join optimized badly? |