Quick Links

Re: How to share the result data of separated plan

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: How to share the result data of separated plan
Date:	2010-11-08 17:26:22
Message-ID:	26443.1289237182@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> writes:
> On 2010-11-08 6:38 PM +0200, Tom Lane wrote:
>> My opinion is still the same as here:
>> http://archives.postgresql.org/pgsql-hackers/2010-02/msg00688.php
>>
>> namely, that all we should be worrying about is a tuplestore full of
>> RETURNING tuples. Any other side-effects of a DML subquery should
>> *not* be visible to the calling query, and therefore all this argument
>> about snapshots and seqscan limits is beside the point.

> What happened to:
> http://archives.postgresql.org/pgsql-hackers/2009-10/msg00566.php ?

On the whole, I think that's overspecifying the behavior. Compare what
happens if you do the same thing in one query now; that is, you have
say an UPDATE query and within that you invoke some function that looks
directly at the target table. Is it going to see a consistent view of
the data? No, it's going to see a partially updated table.

The argument I'm making at the moment is that a query containing DML
CTEs should all execute with the same snapshot, meaning that no part of
the query should see the effects of any other part. If, within the
query, you execute volatile functions that look at the target tables,
they will see an unspecified subset of the updates as having been
applied already. This is the same as what happens with a volatile
function in an updating query now, and we've not heard many complaints
about it.

The alternative is to artificially serialize the DML CTEs, which
while it does have some advantages doesn't seem like a win overall.
In particular, defining that as the behavior will greatly constrain
our abilities to optimize anything in future. I think that a typical
use-case will be something like

WITH u AS (UPDATE data_table SET ... RETURNING ...)
INSERT INTO log_table SELECT * FROM u;

If that UPDATE touches a lot of rows, users will be unhappy if the
RETURNING data builds up in memory rather than getting streamed directly
to log_table. (I'm not saying this has to work that way on day zero,
but I am thinking we'd better be able to do it eventually.)

regards, tom lane

In response to

Re: How to share the result data of separated plan at 2010-11-08 17:03:10 from Marko Tiikkaja

Responses

Re: How to share the result data of separated plan at 2010-11-08 17:51:58 from Marko Tiikkaja
Re: How to share the result data of separated plan at 2010-11-08 17:55:49 from Robert Haas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2010-11-08 17:32:16	Re: How to share the result data of separated plan
Previous Message	Hitoshi Harada	2010-11-08 17:16:33	Re: How to share the result data of separated plan