From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Aaron Bono" <postgresql(at)aranya(dot)com>
Cc: "Richard Huxton" <dev(at)archonet(dot)com>, "PostgreSQL SQL List" <pgsql-sql(at)postgresql(dot)org>
Subject: Re: Query Join Performance
Date: 2007-04-25 22:23:22
Message-ID: 6573.1177539802@sss.pgh.pa.us
Lists: pgsql-sql
"Aaron Bono" <postgresql(at)aranya(dot)com> writes:
> The biggest problem I notice is when I add a join from a child table
> (zip_city) to a parent table (zip). I have filtered the child table down to
> about 650 records but when I add the join to the parent which has over
> 800,000 records, performance tanks.
It shouldn't surprise you that joining a 900k row table to an 800k row table
isn't cheap.
It would certainly be better if the planner delayed the join to nearby_zip
until after it had done the restrictive join. The problem is that it
doesn't realize that join condition is restrictive:
> Nested Loop (cost=45779.82..147990502.45 rows=451678770 width=40)
> (actual time=5404.943..20151.684 rows=653 loops=1)

(note the estimated 451678770 rows against the 653 actually produced)
> Join Filter: ((abs(("inner".longitude - "outer".longitude)) <=
> 0.833333333333333::double precision) AND (abs(("inner".latitude -
> "outer".latitude)) <= 0.833333333333333::double precision))
which is hardly surprising since the condition is phrased in a way that
isn't amenable to statistical analysis. You might want to look into
using PostGIS for this sort of thing --- it provides operators that are
better suited to the problem domain, and also allow some modicum of
intelligence in the rowcount estimates.
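As a purely hypothetical sketch of that suggestion (the geom column, the index name, and the use of plain SRID-less points are all illustrative assumptions, not part of the original schema), the idea is to store each zip's position as a PostGIS point, index it with GiST, and express the proximity test with an indexable operator such as ST_DWithin:

```sql
-- Hypothetical sketch: column and index names are illustrative.
ALTER TABLE zip ADD COLUMN geom geometry;
UPDATE zip SET geom = ST_MakePoint(longitude, latitude);
CREATE INDEX zip_geom_idx ON zip USING gist (geom);

-- The "within ~0.83 degrees" test becomes an indexable, estimable condition:
SELECT nearby_zip.zip_cd
FROM zip
JOIN zip AS nearby_zip
  ON ST_DWithin(zip.geom, nearby_zip.geom, 0.833333333333333)
WHERE zip.zip_cd = '66105';
```

Note that ST_DWithin tests a circular radius rather than the original query's square bounding box, so the two are not exactly equivalent; the point is that the GiST index lets the planner both estimate and execute the condition sensibly.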
Another bad misestimation is here:
> -> Bitmap Heap Scan on zip (cost=32.91..8933.90 rows=4261 width=8)
>    (actual time=0.272..0.275 rows=1 loops=1)
>    Recheck Cond: ((zip_cd)::text = '66105'::text)
>    -> Bitmap Index Scan on zip_zip_cd_key (cost=0.00..32.91 rows=4261 width=0)
>       (actual time=0.250..0.250 rows=1 loops=1)
>       Index Cond: ((zip_cd)::text = '66105'::text)
The error of 4000x here contributes directly to the error in the
top-level row estimate; but this one is a simple scalar condition and I'd
expect our stats code to be able to deal with it. Are the stats on zip
up-to-date? Maybe you need to increase the stats target for it.
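Concretely, that would look something like the following (the value 1000 is just an example; the right target depends on the distribution of zip_cd):

```sql
-- Collect a larger statistics sample for the zip_cd column, then re-analyze:
ALTER TABLE zip ALTER COLUMN zip_cd SET STATISTICS 1000;
ANALYZE zip;
```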
regards, tom lane