Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Pardon my ignorance, but where exactly is the extra overhead
> coming from? Searching for a predicate lock?

Right. As each tuple is read we need to ensure that there is a
predicate lock to cover it. Since finer-grained locks can be
combined into coarser-grained locks, we need to start with the
finest grain and move toward checking the coarser grains, to avoid
missing a lock during promotion. So for each tuple we calculate a
hash, find a partition, lock it, and look up the tuple as a lock
target. When that's not found we do the same thing for the page,
and when that's not found either, for the relation.
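
Something like this, in very rough C (the tag and lookup helpers
are invented for illustration -- they are not the real predicate.c
API -- but the fine-to-coarse order is the point):

/*
 * Sketch of the per-tuple check.  Each TargetIsLocked() call stands
 * for: hash the tag, take the matching partition lock, and look the
 * tag up in the shared lock target hash table.
 */
static bool
CoveringPredicateLockExists(Oid relid, BlockNumber blkno,
                            OffsetNumber offnum)
{
    LockTargetTag tag;

    /* finest grain first: the tuple itself */
    SetTagForTuple(&tag, relid, blkno, offnum);
    if (TargetIsLocked(&tag))
        return true;

    /* not found -- check the page it sits on */
    SetTagForPage(&tag, relid, blkno);
    if (TargetIsLocked(&tag))
        return true;

    /* not found -- check the whole relation */
    SetTagForRelation(&tag, relid);
    return TargetIsLocked(&tag);
}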

But we already acquired a relation-level predicate lock up front,
when we determined that this would be a heap scan, so we could
short-circuit this whole lookup if, within the heapgettup_pagemode
function, we could determine that this was a scan of the whole
relation.
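
Something along these lines (the flag name is invented and the call
spellings are approximate; the point is just to hoist the check out
of the per-tuple path):

/* At scan startup, once the whole-relation predicate lock is taken: */
scan->rs_relpredicatelocked = true;    /* invented flag on the scan desc */

/* In heapgettup_pagemode(), per tuple: */
if (!scan->rs_relpredicatelocked)
    PredicateLockTuple(scan->rs_rd, tuple);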

The profiling also showed that it was spending an obscene amount of
time calculating hash values (over 10% of total run time!). I'm
inclined to think that a less sophisticated algorithm (like just
adding the oid, page, and tuple offset numbers) would generate very
*real* savings, with the downside being a very hypothetical
*possible* cost from longer chains in the HTAB. But that's a
separate issue, best settled on the basis of benchmarks rather than
theoretical discussions.
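
For what it's worth, I mean something as dumb as this (untested):

/*
 * Untested sketch: hash a tuple lock target by just summing its
 * identifying numbers, rather than running them through the
 * general-purpose hash function.
 */
static inline uint32
cheap_target_hash(Oid relid, BlockNumber blkno, OffsetNumber offnum)
{
    return (uint32) relid + (uint32) blkno + (uint32) offnum;
}
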
-Kevin