Re: RETURNING syntax for COPY

From: Ryan Kelly <rpkelly22(at)gmail(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Karol Trzcionka <karlikt(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: RETURNING syntax for COPY
Date: 2013-05-08 18:16:59
Message-ID: 20130508181659.GA9650@llserver.lakeliving.com
Lists: pgsql-hackers

On Wed, May 08, 2013 at 10:55:40AM -0700, David Fetter wrote:
> On Wed, May 08, 2013 at 01:16:14PM -0400, Tom Lane wrote:
> > Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> > > On 08.05.2013 19:44, Tom Lane wrote:
> > >> No there isn't; what you suggest would require FE/BE protocol
> > >> extensions, making it several orders of magnitude more work than the
> > >> other thing.
> >
> > > I'd imagine that the flow would go something like this:
> >
> > > BE                      FE
> >
> > > CopyInResponse
> > >                         CopyData
> > >                         CopyData
> > >                         ...
> > >                         CopyDone
> > > RowDescription
> > > DataRow
> > > DataRow
> > > CommandComplete
> >
> > That would require the backend to buffer the entire query response,
> > which isn't a great idea. I would expect that such an operation would
> > need to interleave CopyData to the backend with DataRow responses. Such
> > a thing could possibly be built on COPY_BOTH mode, but it would be a lot
> > of work (at both ends) for extremely debatable value.
> >
> > The general idea of COPY is to load data as fast as possible,
>
> With utmost respect, that is one of several use cases, and any change
> would need to keep that use case unburdened. A sometimes overlapping
> set of use cases move data in and out of the database in a simple
> manner. In some of these, people might wish to trade some performance
> for the feature.

99% of my uses of COPY at work are as a general data import and export
facility. I often load CSV files into our database for analysis and
further cleanup, and then use COPY to output query results as CSV files
for consumption by other members of the business.
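To make that workflow concrete, the round trip usually looks something
like the following (table and file names here are hypothetical):

```sql
-- Load a CSV file for analysis:
COPY raw_orders FROM '/data/incoming/orders.csv' WITH (FORMAT csv, HEADER);

-- Export a query result as CSV for others in the business:
COPY (SELECT customer_id, sum(total) AS revenue
        FROM raw_orders
       GROUP BY customer_id)
  TO '/data/outgoing/revenue.csv' WITH (FORMAT csv, HEADER);
```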

The recent work for (PRE|POST)PROCESSOR options to COPY is indicative of
the fact that users are not merely using COPY to "load data as fast as
possible".

Other discussions around a COMPRESSED option are about more than just a
performance enhancement, in my view: I often receive files compressed,
and decompressing them is just one more step standing between me and
getting the data into the database.
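Assuming a PROGRAM-style preprocessor option along the lines being
discussed, that extra step could be folded into the COPY itself (paths
and names here are hypothetical, and the exact option spelling is not
settled):

```sql
-- Decompress on the fly rather than as a separate manual step,
-- assuming a PROGRAM/preprocessor option is available:
COPY raw_orders
  FROM PROGRAM 'gunzip -c /data/incoming/orders.csv.gz'
  WITH (FORMAT csv, HEADER);
```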

Additionally, once I have the data imported, I often take many steps to
clean up and format it before applying actual typing to a table (which
invariably fails due to invalid dates and other nonsense).

COPY ... RETURNING would certainly be useful to apply additional
transformations to the data before finally sending it to its ultimate
destination.
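Today that means a staging table and a second statement; with the
proposed syntax it could be a single pipeline. All names below are
illustrative, and the RETURNING grammar shown is hypothetical, not
implemented:

```sql
-- Today: stage everything as text, then clean up while retyping.
CREATE TEMP TABLE staging (event_date text, amount text);
COPY staging FROM '/data/incoming/events.csv' WITH (FORMAT csv, HEADER);
INSERT INTO events (event_date, amount)
SELECT to_date(nullif(event_date, ''), 'MM/DD/YYYY'),
       nullif(amount, '')::numeric
  FROM staging;

-- Hypothetical: transform rows as they stream through COPY.
-- COPY staging FROM '/data/incoming/events.csv' WITH (FORMAT csv, HEADER)
--   RETURNING to_date(nullif(event_date, ''), 'MM/DD/YYYY'), ...;
```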

> A particular example would be one where there are several tables to be
> loaded, some with generated columns that the future ones would depend
> on. Yes, it's possible (kinda) to do this with the FDW machinery, but
> the burden is much higher as it requires DDL permission in general
> each time.

I find using the FDW machinery to perform many queries to be much slower
than importing the data once and then running my queries. There is also
no ability to use indexes.
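For comparison, the file_fdw route looks something like this (names
hypothetical). Note that it requires extension/server/foreign-table DDL
each time, and every query re-parses the flat file with no possibility
of an index:

```sql
CREATE EXTENSION file_fdw;
CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE orders_ext (customer_id int, total numeric)
  SERVER csv_files
  OPTIONS (filename '/data/incoming/orders.csv', format 'csv', header 'true');

-- Each query rescans the file from scratch:
SELECT count(*) FROM orders_ext;
```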

> > so weighing it down with processing options seems like a pretty
> > dubious idea even if the implementation were easy.
>
> Totally agreed that the "fast load/unload" code path must not be
> affected by any such changes.

Agreed here as well.

-Ryan P. Kelly
