From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: How to share the result data of separated plan |
Date: | 2010-11-08 17:26:22 |
Message-ID: | 26443.1289237182@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> writes:
> On 2010-11-08 6:38 PM +0200, Tom Lane wrote:
>> My opinion is still the same as here:
>> http://archives.postgresql.org/pgsql-hackers/2010-02/msg00688.php
>>
>> namely, that all we should be worrying about is a tuplestore full of
>> RETURNING tuples. Any other side-effects of a DML subquery should
>> *not* be visible to the calling query, and therefore all this argument
>> about snapshots and seqscan limits is beside the point.
> What happened to:
> http://archives.postgresql.org/pgsql-hackers/2009-10/msg00566.php ?
On the whole, I think that's overspecifying the behavior. Compare what
happens if you do the same thing in one query now; that is, you have
say an UPDATE query and within that you invoke some function that looks
directly at the target table. Is it going to see a consistent view of
the data? No, it's going to see a partially updated table.
The argument I'm making at the moment is that a query containing DML
CTEs should all execute with the same snapshot, meaning that no part of
the query should see the effects of any other part. If, within the
query, you execute volatile functions that look at the target tables,
they will see an unspecified subset of the updates as having been
applied already. This is the same as what happens with a volatile
function in an updating query now, and we've not heard many complaints
about it.
The alternative is to artificially serialize the DML CTEs, which
while it does have some advantages doesn't seem like a win overall.
In particular, defining that as the behavior will greatly constrain
our abilities to optimize anything in future. I think that a typical
use-case will be something like
WITH u AS (UPDATE data_table SET ... RETURNING ...)
INSERT INTO log_table SELECT * FROM u;
If that UPDATE touches a lot of rows, users will be unhappy if the
RETURNING data builds up in memory rather than getting streamed directly
to log_table. (I'm not saying this has to work that way on day zero,
but I am thinking we'd better be able to do it eventually.)
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2010-11-08 17:32:16 | Re: How to share the result data of separated plan |
Previous Message | Hitoshi Harada | 2010-11-08 17:16:33 | Re: How to share the result data of separated plan |