From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: EvalPlanQual seems a tad broken |
Date: | 2009-10-23 13:41:33 |
Message-ID: | 9679.1256305293@sss.pgh.pa.us |
Lists: | pgsql-hackers |
I wrote:
> On further review it seems that a better way to do this is to make
> things happen inside the EPQ machinery. We need to "freeze" the rows
> returned by *all* scan nodes, not only the ones referencing real tables
> --- for example, a join against a VALUES scan node would still be a
> problem if we don't lock down the VALUES output, since we could end up
> getting multiple join rows out. This means we can't assume that there
> is a CTID associated with every scan node that EPQ needs to lock down.
> What looks like it would work instead is to pass through the current
> scan tuple for every scan plan node, not only the ones that are FOR
> UPDATE targets. I'm tempted to try to move the responsibility for this
> into execScan.c instead of having all the individual scan plan types
> know about it.
What I had been thinking of when I wrote that was to pass down the
ScanTupleSlots from the outer query's scan nodes. That codes up
nicely but doesn't work at all :-(. As is obvious in hindsight,
the scan nodes are not necessarily still returning the same tuples
that contribute to the current join tuple --- for instance if you
have a sort-and-mergejoin type of plan, all the scans will be at
EOF by the time the top level sees any tuples.
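To make the failure mode concrete, here is a minimal toy sketch in Python (not PostgreSQL code; `Scan`, `current`, `at_eof`, and `sort_merge_join` are all hypothetical names invented for illustration). It assumes a scan node that remembers only the last tuple it returned, and a merge join that sorts both inputs before emitting anything:

```python
from typing import Iterator, List, Optional, Tuple

Row = Tuple[int, str]


class Scan:
    """Toy scan node (hypothetical). `current` plays the role of a scan
    tuple slot: it holds only the last tuple this node returned."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = list(rows)
        self._pos = 0
        self.current: Optional[Row] = None
        self.at_eof = False

    def __iter__(self) -> "Scan":
        return self

    def __next__(self) -> Row:
        if self._pos >= len(self._rows):
            self.current = None
            self.at_eof = True
            raise StopIteration
        self.current = self._rows[self._pos]
        self._pos += 1
        return self.current


def sort_merge_join(left: Scan, right: Scan) -> Iterator[Tuple[Row, Row]]:
    """Join on the first column. Sorting drains both scans completely
    before the first join tuple is produced."""
    lrows = sorted(left, key=lambda r: r[0])
    rrows = sorted(right, key=lambda r: r[0])
    i = j = 0
    while i < len(lrows) and j < len(rrows):
        lk, rk = lrows[i][0], rrows[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Pair this left row with every right row sharing the key.
            k = j
            while k < len(rrows) and rrows[k][0] == lk:
                yield lrows[i], rrows[k]
                k += 1
            i += 1


orders = Scan([(2, "b"), (1, "a")])
items = Scan([(1, "x"), (2, "y")])
first_join_tuple = next(sort_merge_join(orders, items))

# The join tuple is built from (1, "a") and (1, "x"), but neither scan
# still holds those rows: both are already at EOF.
print(first_join_tuple, orders.current, orders.at_eof)
```

In this toy model, looking at the scan nodes' slots when the first join tuple arrives tells you nothing about which rows contributed to it, which is the problem described above.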
So we need to be able to recover the original scan tuples from the
join tuple, even for scans that are not to be locked. For real
tables this isn't hard: we can pass up the CTID as a junk column
the same as we do for tables that are to be locked. It's harder
for non-table scans though (VALUES, functions, etc). I can see
two conceivably workable alternatives:
1. Pass up the entire row value as a junk whole-row Var.
2. Invent some equivalent to CTID that would allow the row to be
recovered later --- for instance, the row number in a tuplestore.
One problem with #2 is that not all those scan types use a tuplestore
now, so we'd be adding significant overhead. Also, I'm not entirely
sure that it can work for scan nodes that get reset and rescanned
repeatedly. Some of them clear and refill the tuplestore when they
do that, and the refill isn't necessarily the same row values.
Fortunately, this case probably doesn't arise that much in practice,
so while it needs to work I doubt that performance is critical.
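The difference between the two alternatives can be sketched with a toy model in Python (again not PostgreSQL code; `JoinTuple`, `RefillingScan`, `wholerow`, and `rownum` are hypothetical names, and the refill behavior is an assumption chosen to mirror the rescan concern above). Alternative #1 carries the row itself as a hidden column; alternative #2 carries only a position in a store that may be refilled:

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

Row = Tuple[Any, ...]


@dataclass
class JoinTuple:
    visible: Row            # columns the query actually returns
    junk: Dict[str, Any]    # hidden columns carried only for later recovery


class RefillingScan:
    """Toy non-table scan (hypothetical) whose tuplestore is cleared and
    refilled on every rescan, possibly with different row values."""

    def __init__(self, batches: List[List[Row]]) -> None:
        self._batches = batches
        self._next = 0
        self.store: List[Row] = []
        self.rescan()

    def rescan(self) -> None:
        self.store = list(self._batches[self._next % len(self._batches)])
        self._next += 1


def recover_by_whole_row(jt: JoinTuple) -> Row:
    # Alternative #1: the row value itself travels with the join tuple.
    return jt.junk["wholerow"]


def recover_by_row_number(jt: JoinTuple, scan: RefillingScan) -> Row:
    # Alternative #2: only a position travels; the store must still match.
    return scan.store[jt.junk["rownum"]]


scan = RefillingScan([[("a", 1), ("b", 2)], [("z", 9), ("a", 1)]])
original = scan.store[0]
jt1 = JoinTuple(visible=original, junk={"wholerow": original})
jt2 = JoinTuple(visible=original, junk={"rownum": 0})

scan.rescan()  # the store is refilled with different contents

print(recover_by_whole_row(jt1))         # still the original row
print(recover_by_row_number(jt2, scan))  # whatever now sits at position 0
```

Under these assumptions the whole-row approach survives a rescan because it does not depend on the scan node's state, while the row-number approach silently returns a different row once the store has been refilled. The cost of #1 in this model is that every join tuple carries a full copy of the row.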
I'm planning to try alternative #1 next, but I wonder if anyone
has a better idea?
regards, tom lane