From: | Craig Ringer <craig(at)2ndquadrant(dot)com> |
---|---|
To: | Viktor Leis <leis(at)in(dot)tum(dot)de> |
Cc: | Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Experimental evaluation of PostgreSQL's query optimizer |
Date: | 2015-12-22 01:49:44 |
Message-ID: | CAMsr+YGpbK_OvSp+gE_Q4PuV2dWHc1ByJSv4MbM2vTHi4pNieg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 21 December 2015 at 20:53, Viktor Leis <leis(at)in(dot)tum(dot)de> wrote:
> I think your suggestion amounts to caching the cardinalities of all
> two-way joins. One major issue is that for a query like
>
> select * from r1, r2 where r1.x = r2.y and r1.a = ? and r2.b;
>
> it depends on the specific values of r1.a and r2.b whether there is
> any (anti-)correlation. And obviously one cannot store correction
> factors for each value of a and b.
>
>
I see a parallel with indexing and partial indexes here.
We obviously cannot afford to keep cross-table correlations for every
possible pairing of join conditions across every possible set of joined
tables. Much like we can't afford to keep indexes for every possible set of
columns, but even worse.
Much as we let users CREATE INDEX to tell us what cols to index, maybe we
should let them CREATE a cross-table join statistics collector for a
particular set of tables, optionally qualified with a filter condition just
like we do on partial indexes, and optionally transformed via an immutable
expression like we do for expression indexes, e.g.:
CREATE JOIN STATISTICS ON t1 JOIN t2 ON (t1.col1 = t2.col2);
CREATE JOIN STATISTICS ON t1 JOIN t2 ON (lower(t1.col1) = lower(t2.col2))
WHERE t1.othercol IS NOT NULL;
CREATE JOIN STATISTICS ON t1 JOIN t2 ON (t1.colx = t2.colx AND t1.coly =
t2.coly);
plus a simplified form like
CREATE JOIN STATISTICS ON t1 JOIN t2 USING (somecol);
That way we let an admin who's tuning queries direct effort at problem
areas. It's not automagical, but it's an area where tools could analyze
pg_stat_statements to direct effort, much like is currently done for index
creation. Like index creation I don't think it's practical to do this
entirely automatically and behind the scenes since collecting the stats for
all possibilities rapidly gets prohibitive.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Langote | 2015-12-22 01:51:56 | Re: Declarative partitioning |
Previous Message | Craig Ringer | 2015-12-22 01:40:38 | Re: Experimental evaluation of PostgreSQL's query optimizer |