From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | WIP patch for parameterized inner paths |
Date: | 2012-01-17 05:06:54 |
Message-ID: | 4643.1326776814@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Well, since I see other committers sending in patches the day after the
nominal commitfest deadline, I don't feel too bad about being a bit late
as well. Attached is a WIP patch for parameterized paths, along the
lines we have discussed before:
http://archives.postgresql.org/pgsql-hackers/2009-10/msg00994.php
http://archives.postgresql.org/pgsql-hackers/2011-05/msg01136.php
http://archives.postgresql.org/pgsql-hackers/2011-11/msg00543.php
While this is far from complete, it's sufficient to fix the example
that Andres Freund gave here:
http://archives.postgresql.org/pgsql-hackers/2010-05/msg00889.php
For reference, 8.3 and earlier produced a plan like this for Andres'
example:
Index Scan using c_value_key on c (cost=0.00..24.83 rows=1 width=0)
Index Cond: (value = 1)
Filter: (subplan)
SubPlan
-> Nested Loop (cost=0.00..16.56 rows=1 width=12)
-> Index Scan using b__c_id on b (cost=0.00..8.27 rows=1 width=8)
Index Cond: (c_id = $0)
-> Index Scan using a__b_id on a (cost=0.00..8.27 rows=1 width=8)
Index Cond: (a.b_id = b.b_id)
8.4 through 9.1 produce something like this:
Nested Loop Semi Join (cost=1543.00..4486.27 rows=1 width=0)
Join Filter: (c.c_id = b.c_id)
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4)
Index Cond: (value = 1)
-> Hash Join (cost=1543.00..3853.00 rows=50000 width=4)
Hash Cond: (a.b_id = b.b_id)
-> Seq Scan on a (cost=0.00..722.00 rows=50000 width=4)
-> Hash (cost=722.00..722.00 rows=50000 width=8)
-> Seq Scan on b (cost=0.00..722.00 rows=50000 width=8)
which is just as bad as the cost estimate makes it look. With this
patch, HEAD produces:
Nested Loop Semi Join (cost=0.00..24.84 rows=1 width=0)
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4)
Index Cond: (value = 1)
-> Nested Loop (cost=0.00..16.56 rows=1 width=4)
-> Index Scan using b__c_id on b (cost=0.00..8.27 rows=1 width=8)
Index Cond: (c_id = c.c_id)
-> Index Only Scan using a__b_id on a (cost=0.00..8.27 rows=1 width=4
)
Index Cond: (b_id = b.b_id)
which is effectively the same thing as 8.3's plan, although now the idea
that it's a semijoin is explicit.
Although this patch gets the infrastructure in place, there is a lot
left to do, notably:
* Initial generation of parameterized indexpaths in indxpath.c is quite
stupid. To minimize the patch footprint, I left alone as much as I could
of the existing structure in that file whereby indexable join clauses are
selected if they match a specific target outer relation. I think that
that logic needs to be torn apart and instead drive the process by
considering all clauses matching a specific index; which is going to mean
considerable refactoring, but it'll be localized within indxpath.c.
* joinpath.c doesn't yet consider parameterized input paths for
hash and merge joins. This means that in the above example, the
lower-level join can *only* be done as a nestloop; which is okay for the
numbers of rows in this example, but there's a range of rowcounts where
we'd like to have something smarter at that join step. The trick here is
mainly to not consider an unreasonably large number of possible plans.
* The whole thing needs performance testing to see if or how much slower
it makes the planner. I'm not sure that there's much point in that until
the code is less bogus around the above two areas; although if it's really
horridly slower as-is on any complex queries, we've got a problem.
* There are several places where we intentionally lobotomized subquery
pullup to prevent the worst cases of 8.3-to-8.4 plan regressions as a
result of being unable to pass parameters through intermediate joins.
Probably that could get undone now, but I need to do some testing with
the example cases that persuaded us to put those hacks in.
* I think it should now be a fairly localized matter to support nestloop
joins with inner TID scans, which is a capability that's been requested
more than once, even though the use-cases aren't real wide.
* There's assorted now-dead code that I've not bothered to remove yet,
and the costsize.c APIs could stand some more refactoring. This is
just in the nature of cleanup so I left it for a separate patch.
Anyway, I'm hoping to keep hacking at this for a couple more days before
the CF gets into full swing, and hopefully arrive at something committable
for 9.2. I'd really like to get this fixed in this cycle, since it's
been a problem for several releases now.
Comments?
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
parameterized-paths-1.patch | text/x-patch | 178.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2012-01-17 05:33:22 | Re: Should we add crc32 in libpgport? |
Previous Message | Robert Haas | 2012-01-17 03:48:54 | Re: Why is CF 2011-11 still listed as "In Progress"? |