From: | Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | tgl(at)sss(dot)pgh(dot)pa(dot)us (Tom Lane) |
Cc: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Re: [HACKERS] Optimizer speed and GEQO (was: nested loops in joins) |
Date: | 1999-02-02 02:55:41 |
Message-ID: | 199902020255.VAA08357@candle.pha.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> It took a certain amount of patience to run these cases with profiling
> turned on :-(, but here are the results:
Interesting numbers.
>
> 7T+12I 7T+7I 7T
>
> Runtime, sec 1800 457 59
> equal() calls 585 mil 157 mil 21.7 mil
> A fairly decent approximation is that the runtime varies as the
> square of the number of create_nestloop_path calls. That number
> in turn seems to vary as the factorial of the number of tables,
> with a weaker but still strong term involving the number of indexes.
> I understand the square and the factorial terms, but the form of
> the dependency on the index count isn't real clear.
This is exactly what I would expect. See optimizer/README. The
factorial makes a lot of sense. It evaluates all joins join, removes
the cheapest from the list, and goes to find the next best one.
8 compares
7 compares
6 compares
...
Here is a good illustration of what you are seeing:
> 6!
720
> 8!
40320
That's why I reduced GEQO from 8 to 6.
>
> It might be worthwhile to try a GEQO threshold based on the number of
> tables plus half the number of indexes on those tables. I have no idea
> where to find the number of indexes, so I'll just offer the idea for
> someone else to try.
Sounds like a good idea. I can easily get that information. The
optimizer does that lower in the code. Perhaps we can just move the
GEQO test to after the index stuff is created. I will be able to look
at it after the TEMP stuff.
> The main thing that jumps out from the profiles is that on these complex
> searches, the optimizer spends practically *all* of its time down inside
> better_path, scanning the list of already known access paths to see if a
> proposed new path is a duplicate of one already known. (Most of the time
> it is not, as shown by the small number of times path_is_cheaper gets
> called from better_path.) This scanning is O(N^2) in the number of paths
> considered, whereas it seems that all the other work is no worse than O(N)
> in the number of paths. The bottom of the duplicate check is equal(),
> which as you can see is getting called a horrendous number of times.
>
> It would be worth our while to try to eliminate this mostly-unsuccessful
> comparison operation.
>
> I wonder whether it would work to simply always add the new path to the
> list, and rely on the later "prune" step to get rid of duplicates?
> The prune code is evidently far more efficient at getting rid of useless
> entries than better_path is, because it's nowhere near the top of the
> profile even though it's processing nearly as many paths.
I think I optimized prune.c white a bit around 6.1. I broke it during
the optimization, and Vadim fixed it for me.
One of my TODO items has been to 100% understand the optimizer. Haven't
had time to do that yet. Been on my list for a while.
>
> Anybody here know much about how the optimizer works? I'm hesitant to
> hack on it myself.
Let me help you.
--
Bruce Momjian | http://www.op.net/~candle
maillist(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
From | Date | Subject | |
---|---|---|---|
Next Message | Vadim Mikheev | 1999-02-02 02:58:03 | Re: [HACKERS] Re: [GENERAL] big bad join problems |
Previous Message | Roland Roberts | 1999-02-02 02:17:07 | Re: [DOCS] Re: [HACKERS] Re: [COMMITTERS] [WEBMASTER] 'www/html main.html' |