Quick Links

Idea for reducing planning time

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgreSQL(dot)org
Subject:	Idea for reducing planning time
Date:	2000-12-13 23:14:56
Message-ID:	15124.976749296@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

I've been looking into Brian Hirt's complaint that 7.0.3 and 7.1 are
lots slower than 7.0.2 in planning big joins. The direct cause is that
since we now deduce implied equality clauses, the system has more
potential join paths than it used to --- for example, given "WHERE
a.x = b.y AND b.y = c.z", the old system would not consider joining a to
c and then adding b, because it didn't have a joinqual relating a and c.
Now it does. There's not a lot to be done about that, but I've been
looking to see if we can make any offsetting speedups.

While digging around, I've realized that the planner is still carrying
around a lot more paths than it really needs to. The rule in add_path()
is that we will keep a path if it is cheaper OR differently sorted/
better sorted than any other path for the same rel. But what is not
taken into account is whether the sort ordering of a path actually has
any potential usefulness. Before saving a path on the grounds that it's
got an otherwise unobtainable sort ordering, we should check to see if
that sort ordering is really going to be useful for a later mergejoin
or for the final output ordering. It turns out we already have code to
check that for basic indexscan paths --- see useful_for_mergejoin() and
useful_for_ordering() in indxpath.c. But we fail to make the same sort
of test on paths for join relations, with the result that we carry along
a lot more paths than we could possibly need, and that costs huge
amounts of time.

An example of what's going on is that given a query with
FROM a, b, ... other rels ...
WHERE a.w = 1 AND a.x = 2 AND b.y = 3 AND b.z = 4 ...
if w,x,y,z all have indexes we will consider indexscans on all four
of those indexes. Which is fine. But we will then consider nestloops
and mergejoins of a with b that use these four indexscans as the outer
path, and therefore yield results that are sorted by w,x,y,z
respectively. Those paths will be carried as possible paths for a+b
because they offer different sort orders, even if we have no further
use for those sort orderings. And then we have a combinatorial blowup
in the number of paths considered at higher join levels. We should
instead consider that these paths have no useful sort ordering, and
throw away all but the cheapest.

What I'm thinking of doing is truncating the recorded pathkeys of a path
at the first sortkey that's not useful for either a higher-level
mergejoin clause or the requested final output sort ordering. Then the
logic inside add_path() wouldn't change, but it would only be considering
useful pathkeys and not useless ones.

I'm trying to resist the temptation to make this change right now :-).
It's not quite a bug fix --- well, maybe you could call it a performance
bug fix --- so I'm kind of thinking it shouldn't be done during beta.
OTOH I seem to have lost the argument that Vadim shouldn't commit VACUUM
performance improvements during beta, so maybe this should go in too.
What do you think?

regards, tom lane

Responses

Re: Idea for reducing planning time at 2000-12-14 00:04:34 from Alfred Perlstein
Re: Idea for reducing planning time at 2000-12-15 18:32:05 from Bruce Momjian

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Karl DeBisschop	2000-12-13 23:21:24	Re: RPM changes for 7.1.
Previous Message	Martin A. Marques	2000-12-13 23:08:05	Re: Why vacuum?