From: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: Resjunk sort columns, Heikki's index-only quals patch, and bug #5000 |
Date: | 2009-09-15 09:47:37 |
Message-ID: | 4AAF62B9.2000003@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Robert Haas wrote:
> Hi, I'm reviewing this patch for the 2009-09 CommitFest.
Thank you!
> It doesn't seem to compile.
Looks like the patch has bitrotted, sorry about that. Attached is an
updated version. I've also pushed this to my git repository at
git://git.postgresql.org/git/users/heikki/postgres.git, branch "heapfetch".
> Actually, before I even tried compiling this, I was looking through
> the joinpath.c changes, since that is an area of the code with which I
> have some familiarity. As I'm sure you're aware, the lack of
> commenting makes it quite difficult to understand what this is trying
> to do, and the functions are poorly named. It isn't self-explanatory
> what "bubbling up" means, even in the limited context of joinpath.c.
Yeah, understood :-(. The bubbling up refers to moving HeapFetch nodes
above a join, which I explained earlier in this thread:
http://archives.postgresql.org/message-id/4A90448F.4060601@enterprisedb.com,
> Leaving that aside, I think that the approach here is likely wrong;
> the decision about when to perform a heap fetch doesn't seem to be
> based on cost, which I think it needs to be.
Right, cost estimation and making choices based on it still missing.
> Consider A IJ B, with
> the scan over A implemented as an index scan. It seems to me that if
> the join selectivity is < 1, then assuming there's a choice, we
> probably want to join A to B and then do the heap fetches against A
> afterwards. But if the join selectivity is > 1 (consider, for
> example, a cross join), we probably want to do the heap fetches first.
Hmm, good point. I didn't consider that join selectivity can be > 1.
A more common scenario is that there's an additional filter condition on
the HeapFetch, with a selectivity < 1. It can then cheaper to perform
the heap fetches first and only join the remaining rows that satisfy the
filter condition.
It could also be cheaper to perform HeapFetch first if there's a lot of
dead tuples in the table, as the heap fetch weeds them out. I'm not sure
if we can estimate that in a meaningful way.
> Am I right to think that the heap fetch node is not optional? i.e.
> even if only columns in the index are ever used, we need to make sure
> that a heap fetch node gets inserted to check visibility. Is that
> right?
Correct. The plan is to eventually use the visibility map to skip the
heap access altogether, but we'll need to make the visibility map 100%
reliable first. We'll still need a HeapFetch node to perform the
visibility check, but it could perform it against the visibility map
instead of the heap.
> I also think that the use of the term index-only quals is misleading.
> It seemed good when you first posted to -hackers about it, but looking
> at the patch, it means we have both index quals and index-only quals,
> and it seems like that might not be the greatest naming. Maybe we
> could call them index-tuple quals or index-evaluated quals or
> something.
Hmm, I'm not too fond of the naming either. Not sure I like those
alternatives any better, though.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
indexam-api-changes-3.patch | text/x-diff | 23.2 KB |
heapfetch-2.patch | text/x-diff | 118.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Meskes | 2009-09-15 10:06:01 | Re: clang's static checker report. |
Previous Message | std pik | 2009-09-15 09:34:29 | hardware information |