Re: Query Join Performance

From: "Aaron Bono" <postgresql(at)aranya(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Richard Huxton" <dev(at)archonet(dot)com>, "PostgreSQL SQL List" <pgsql-sql(at)postgresql(dot)org>
Subject: Re: Query Join Performance
Date: 2007-04-26 00:57:04
Message-ID: bf05e51c0704251757o311dd849n2c6a52e1c68b933e@mail.gmail.com
Lists: pgsql-sql

On 4/25/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> "Aaron Bono" <postgresql(at)aranya(dot)com> writes:
> > The biggest problem I notice is when I add a join from a child table
> > (zip_city) to a parent table (zip). I have filtered the child table
> > down to about 650 records but when I add the join to the parent which
> > has over 800,000 records, performance tanks.
>
> It shouldn't surprise you that joining a 900k row table to an 800k row
> table isn't cheap.
>
> It would certainly be better if the thing delayed the join to nearby_zip
> until after it had done the restrictive join. Your problem is it
> doesn't realize that that join condition is restrictive:
>
> > Nested Loop (cost=45779.82..147990502.45 rows=451678770 width=40) (actual time=5404.943..20151.684 rows=653 loops=1)
>                                                  ^^^^^^^^^
> >   Join Filter: ((abs(("inner".longitude - "outer".longitude)) <= 0.833333333333333::double precision) AND (abs(("inner".latitude - "outer".latitude)) <= 0.833333333333333::double precision))
>
> which is hardly surprising since the condition is phrased in a way that
> isn't amenable to statistical analysis. You might want to look into
> using PostGIS for this sort of thing --- it provides operators that are
> better suited to the problem domain, and also allow some modicum of
> intelligence in the rowcount estimates.
>
> Another bad misestimation is here:
>
> >   -> Bitmap Heap Scan on zip (cost=32.91..8933.90 rows=4261 width=8) (actual time=0.272..0.275 rows=1 loops=1)
> >        Recheck Cond: ((zip_cd)::text = '66105'::text)
> >        -> Bitmap Index Scan on zip_zip_cd_key (cost=0.00..32.91 rows=4261 width=0) (actual time=0.250..0.250 rows=1 loops=1)
> >             Index Cond: ((zip_cd)::text = '66105'::text)
>
> The error of 4000x here contributes directly to the error in the
> top-level row estimate; but this one is a simple scalar condition and I'd
> expect our stats code to be able to deal with it. Are the stats on zip
> up-to-date? Maybe you need to increase the stats target for it.
>
> regards, tom lane
>
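For anyone finding this thread in the archives: a lighter-weight alternative to full PostGIS is the contrib earthdistance module, which can express the bounding-box test above in an indexable form. A sketch (the index name and the radius value are just illustrations, and the latitude/longitude columns are the ones from the plan above):

```sql
-- Requires the contrib modules cube and earthdistance.
-- A GiST index on the point lets the planner estimate the
-- bounding-box condition, unlike the abs(lat/long) filter above.
CREATE INDEX zip_earth_idx ON zip
    USING gist (ll_to_earth(latitude, longitude));

-- Find zips within roughly 50 miles (~80467 meters) of a point.
SELECT z.zip_cd
FROM   zip z
WHERE  earth_box(ll_to_earth(38.88, -94.81), 80467)
       @> ll_to_earth(z.latitude, z.longitude);
```

Because the earth_box/@> containment test can use the index, the planner gets a sane rowcount instead of the 451-million-row guess in the plan above.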

Looks like a vacuum analyze did the trick. Performance is beautiful now. I
should have tried that earlier.

I thought I had autovacuum turned on (PostgreSQL 8.1), but I guess it
doesn't run ANALYZE?

Anyway, I will schedule a nightly vacuum analyze - the database sees low
usage overnight, so it won't cause any problems.

Thanks!

--
==================================================================
Aaron Bono
Aranya Software Technologies, Inc.
http://www.aranya.com
http://codeelixir.com
==================================================================
