Re: Query plan for very large number of joins

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: philb(at)vodafone(dot)ie
Cc: pgsql-performance(at)postgreSQL(dot)org
Subject: Re: Query plan for very large number of joins
Date: 2005-06-03 17:29:03
Message-ID: 6292.1117819743@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

<philb(at)vodafone(dot)ie> writes:
> I've attached the schema and query text, hopefully it will be of some use
> to you. Note that both are taken from the HyperUBL project
> (https://hyperubl.dev.java.net/) Sadly, at this stage I think it's
> time for me to try alternatives to either Hibernate or Postgresql.

Thanks. Profiling on 7.4 I get this for an EXPLAIN (after vacuum
analyzing the database):

% cumulative self self total
time seconds seconds calls Ks/call Ks/call name
61.66 618.81 618.81 2244505819 0.00 0.00 compare_path_costs
15.01 769.44 150.63 1204882 0.00 0.00 add_path
8.08 850.57 81.13 772077 0.00 0.00 nth
3.76 888.27 37.70 1113598 0.00 0.00 nconc
2.59 914.30 26.03 233051 0.00 0.00 find_joininfo_node
2.23 936.70 22.40 30659124 0.00 0.00 bms_equal
1.14 948.14 11.44 39823463 0.00 0.00 equal
0.77 955.84 7.70 83300 0.00 0.00 find_base_rel

This is with no special planner settings. Obviously the problem is that
it's considering way too many different paths. We did do something
about that in 8.0 (basically, throw away paths with "nearly the same"
cost) ... but the bottom line didn't improve a whole lot. CVS tip
profile for the same case is

% cumulative self self total
time seconds seconds calls s/call s/call name
38.37 176.41 176.41 53344348 0.00 0.00 list_nth_cell
35.26 338.52 162.11 196481 0.00 0.00 get_rte_attribute_is_dropped
5.42 363.44 24.92 233051 0.00 0.00 find_joininfo_node
4.72 385.14 21.70 30659416 0.00 0.00 bms_equal
4.09 403.95 18.81 53344348 0.00 0.00 list_nth
2.31 414.58 10.63 37347920 0.00 0.00 equal
1.40 421.03 6.45 83299 0.00 0.00 find_base_rel
1.08 426.01 4.98 617917 0.00 0.00 SearchCatCache
0.90 430.13 4.12 5771640 0.00 0.00 AllocSetAlloc

The get_rte_attribute_is_dropped calls (and list_nth/list_nth_cell,
which are mostly being called from there) arise from a rather hastily
added patch that prevents failure when a JOIN list in a stored view
refers to a since-dropped column of an underlying relation. I had not
realized that that check could have O(N^2) behavior in deeply nested
joins, but it does. Obviously we'll have to rethink that.

After that it looks like the next hotspot is find_joininfo_node
(and bms_equal which is mostly getting called from there). We could
maybe fix that by rethinking the joininfo data structure --- right now
it's a collection of simple Lists, which betrays the planner's Lispy
heritage ;-). Again, that's not something I've ever seen at the top
of a profile before --- there may be some O(N^2) behavior involved
here too, but I've not analyzed it in detail.

It does look like 8.0 would be about a factor of 2 faster for you
than 7.4, but the real fix will probably have to wait for 8.1.

Also: the 8.0 problem is definitely an O(N^2) type of deal, which means
if you could reduce the depth of nesting by a factor of 2 the cost would
go down 4x. You said this was an automatically generated query, so
there may not be much you can do about it, but if you could parenthesize
the FROM list a bit more intelligently the problem would virtually go
away. What you have is effectively

FROM ((((a left join b) left join c) left join d) ....

so the nesting goes all the way down. With something like

FROM ((a left join b) left join c ...)
left join
((d left join e) left join f ...)

the max nesting depth would be halved. I don't understand your schema
at all so I'm not sure what an appropriate nesting might look like, but
maybe there is a short-term workaround to be found there. (This will
*not* help on 7.4, as the bottleneck there is completely different.)

regards, tom lane

Browse pgsql-performance by date

  From Date Subject
Next Message Steve Poe 2005-06-03 18:45:29 Postgresql and Software RAID/LVM
Previous Message J. Andrew Rogers 2005-06-03 17:18:45 Re: Filesystem