From: | strk(at)refractions(dot)net |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org, postgis-devel(at)postgis(dot)refractions(dot)net |
Subject: | Re: join selectivity |
Date: | 2004-12-23 09:33:51 |
Message-ID: | 20041223093351.GC96913@freek.keybit.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Dec 16, 2004 at 01:56:29PM -0500, Tom Lane wrote:
> "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk> writes:
> > ... But in the case of <column> <op>
> > <unknown constant>, if we're estimating the number of rows to return then
> > that becomes harder
>
> I didn't say it was easy ;-). The existing selectivity functions can't
> do better than a rough guess in such cases, and I don't expect you can
> either.
Tom, correct me if I'm wrong.
Doing some tests I've found out that the returned value from the
JOINSEL is applied to REL1.rows X REL2.rows, but REL1 and REL2
are not 'base' table, rather relations with a number of
rows once again estimated by other selectivity functions.
For example, if JOINSEL always returns 1.0, you get a different
'estimated' number of rows for a Nested Loop depending on the
presence of a condition filtering one of the tables.
Example:
test1 has 34 rows
test2 has 32 rows
a full join makes the estimate=1088 rows ( 34*32 )
a join with a filter on test2 makes estimate=34 ( 34*1 ? )
strk=# explain analyze select * from test1, test2 where test1.geom && test2.geom;
NOTICE: LWGEOM_gist_joinsel called (returning 1.000000)
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Nested Loop (cost=3.37..32.17 rows=1088 width=36) (actual time=0.193..70.691 rows=983 loops=1)
Join Filter: ("inner".geom && "outer".geom)
-> Seq Scan on test2 (cost=0.00..4.32 rows=32 width=4) (actual time=0.074..0.267 rows=32 loops=1)
-> Materialize (cost=3.37..3.71 rows=34 width=32) (actual time=0.002..0.026 rows=34 loops=32)
-> Seq Scan on test1 (cost=0.00..3.34 rows=34 width=32) (actual time=0.042..0.159 rows=34 loops=1)
Total runtime: 71.426 ms
(6 rows)
trk=# explain analyze select * from test1, test2 where test1.geom && test2.geom and test2.id = 1;
NOTICE: LWGEOM_gist_joinsel called (returning 1.000000)
QUERY PLAN
--------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..8.17 rows=34 width=44) (actual time=0.179..2.704 rows=17 loops=1)
Join Filter: ("inner".geom && "outer".geom)
-> Seq Scan on test2 (cost=0.00..4.40 rows=1 width=8) (actual time=0.078..0.208 rows=1 loops=1)
Filter: (id = 1)
-> Seq Scan on test1 (cost=0.00..3.34 rows=34 width=36) (actual time=0.041..0.181 rows=34 loops=1)
Total runtime: 2.819 ms
(6 rows)
Now, is the number 1 what has been estimated by
the RESTRICT selectivity estimator for
SERIAL = <constant> ?
If it is, does our JOINSEL function have access to this
information ?
TIA
--strk;
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Meskes | 2004-12-23 09:46:26 | Connection without database name |
Previous Message | strk | 2004-12-23 09:17:15 | Re: join selectivity |