From: Richard Huxton <dev(at)archonet(dot)com>
To: philb(at)vodafone(dot)ie
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query plan for very large number of joins
Date: 2005-06-02 16:02:28
Message-ID: 429F2D94.3030805@archonet.com
Lists: pgsql-performance

philb(at)vodafone(dot)ie wrote:
> Hi,
>
> I am using PostgreSQL (7.4) with a schema that was generated
> automatically (using hibernate). The schema consists of about 650
> relations. One particular query (also generated automatically)
> consists of left joining approximately 350 tables.
May I be the first to offer an "ouch"!
> At this stage, most tables are empty and those with values have less
> than 50 entries. The query takes about 90 seconds to execute (on a
> P4, 2.6GHz).
>
> All of the relations have a primary key which is indexed and all of
> the joins are on foreign keys which are explicitly declared. I've
> checked the obvious tunables (effective_cache_size, shared_memory and
> sort_buffer) but changing these has had no effect. The system has a
> total of 750MB RAM, I've varied the shared memory up to 256MB and the
> sort buffer up to 128MB without affecting the performance.
Of those, sort_mem is the only one I can see helping with a single query.
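For what it's worth, you can set it per-session rather than server-wide (a rough sketch - note that on 7.4 the value is in kilobytes):

    -- sort_mem is measured in KB in 7.4 (renamed work_mem in 8.0) and
    -- applies per sort operation, so a per-session SET is safer than
    -- raising it for every backend in postgresql.conf.
    SET sort_mem = 131072;  -- 128MB, the figure you already tried

    -- Then re-run the problem query and compare timings.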
> Running the query as a JDBC prepared statement indicates that the
> query optimiser is spending a negligible amount of time on the task
> (~ 36 ms) compared to the executor (~ 90 seconds). The output of
> EXPLAIN indicates (AFAICT) that all of the joins are of type "Nested
> Loop Left Join" and all of the scans are of type "Seq Scan". I have
> refrained from posting the query and the query plan since these are
> 80K and 100K apiece but if anyone wants to see them I can certainly
> forward them on.
Well, if most tables are small then a seq-scan makes sense. Does it look
like it's estimating the number of rows badly anywhere? I'm not sure the
list will accept attachments that large - is it possible to upload them
somewhere accessible?
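In the meantime, you can check the estimates yourself by comparing estimated against actual row counts (a rough sketch - the table names here are made up):

    -- Refresh the planner's statistics first, in case they're stale.
    ANALYZE;

    -- EXPLAIN ANALYZE prints estimated rows alongside actual rows;
    -- a large mismatch on any join is a clue the planner is off.
    EXPLAIN ANALYZE
    SELECT *
    FROM parent p
    LEFT JOIN child c ON c.parent_id = p.id;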
> My (uninformed) suspicion is that the optimiser has failed over to
> the default plan on the basis of the number of tables in the join. My
> question is, is there anyone out there using PostgreSQL with this
> size of schema? Is there anything that can be done to bring about the
> order of magnitude increase in speed that I need?
Well - the genetic planner must surely be kicking in here (see the
run-time configuration chapter of the manuals, query-planning,
geqo_threshold). However, I'm not sure how much leeway there is in
planning a largely left-joined query.
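Still, the relevant knobs are easy to poke at (untested sketch, 7.4 settings):

    -- See whether the genetic optimiser is enabled and when it triggers.
    SHOW geqo;            -- on by default
    SHOW geqo_threshold;  -- GEQO is used once a query joins this many items

    -- One experiment: disable GEQO so the standard planner handles the
    -- query. Caution: with hundreds of relations, exhaustive planning
    -- could take a very long time.
    SET geqo = off;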
It could be there's some overhead in the executor that's only noticeable
with hundreds of tables involved; at 90 seconds for ~350 joins you're
running at about 0.25 secs per join.
I take it you have no control over the schema or query, so there's not
much fiddling you can do. You've tried sort_mem, so there are only two
things I can think of:
1. Try the various enable_xxx config settings and see if disabling
seq-scans or the relevant join type does anything (I'm not sure it will)
- there's a sketch after this list.
2. Try against 8.0 - there may be some improvement there.
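That first experiment looks roughly like this (untested; the settings only last for the session, so reconnecting undoes them):

    -- These don't disable the methods outright; they just make the
    -- planner cost them very highly, so it picks alternatives where
    -- any exist.
    SET enable_seqscan = off;   -- push towards index scans
    SET enable_nestloop = off;  -- push towards merge/hash joins

    -- Then EXPLAIN ANALYZE the query again and compare plans and timings.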
Other people on this list have experience with larger systems than mine,
so they may be able to help more.
--
Richard Huxton
Archonet Ltd