From: | David Fetter <david(at)fetter(dot)org> |
---|---|
To: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
Cc: | nasbyj(at)amazon(dot)com, surafel3000(at)gmail(dot)com, cmt(at)burggraben(dot)net, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: COPY FROM WHEN condition |
Date: | 2018-11-02 14:21:32 |
Message-ID: | 20181102142132.GJ12677@fetter.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Nov 01, 2018 at 10:57:25PM -0400, Corey Huinker wrote:
> >
> > > Are you thinking something like having a COPY command that provides
> > > results in such a way that they could be referenced in a FROM clause
> > > (perhaps a COPY that defines a cursor…)?
> >
> > That would also be nice, but what I was thinking of was that some
> > highly restricted subset of cases of SQL in general could lend
> > themselves to levels of optimization that would be impractical in
> > other contexts.
>
> If COPY (or a syntactical equivalent) can return a result set, then the
> whole of SQL is available to filter and aggregate the results and we don't
> have to invent new syntax, or endure confusion whenCOPY-WHEN syntax behaves
> subtly different from a similar FROM-WHERE.
That's an excellent point.
> Also, what would we be saving computationally? The whole file (or program
> output) has to be consumed no matter what, the columns have to be parsed no
> matter what. At least some of the columns have to be converted to their
> assigned datatypes enough to know whether or not to filter the row, but we
> might be able push that logic inside a copy. I'm thinking of something like
> this:
>
> SELECT x.a, sum(x.b)
> FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer, b numeric, c text, d date, e json) )
Apologies for bike-shedding, but wouldn't the following be a better
fit with the current COPY?
COPY t(a integer, b numeric, c text, d date, e json) FROM '/path/to/foo.txt' WITH (FORMAT CSV, INLINE)
> WHERE x.d >= '2018-11-01'
>
>
> In this case, there is the *opportunity* to see the following optimizations:
> - columns c and e are never referenced, and need never be turned into a
> datum (though we might do so just to confirm that they conform to the data
> type)
That sounds like something that could go inside the WITH extension
I'm proposing above.
[STRICT_TYPE boolean DEFAULT true]?
This might not be something that has to be in version 1.
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2018-11-02 14:34:04 | Re: Hash Joins vs. Bloom Filters / take 2 |
Previous Message | Tom Lane | 2018-11-02 14:07:10 | Re: WIP: Avoid creation of the free space map for small tables |