From: | Merlin Moncure <mmoncure(at)gmail(dot)com> |
---|---|
To: | Florian Weimer <fweimer(at)redhat(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Claudio Freire <klaussfreire(at)gmail(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: RFC: Async query processing |
Date: | 2014-01-03 16:31:39 |
Message-ID: | CAHyXU0wWfGcXx8T8ZquEZDe4XqohQBiskysPOz=AtzsV8gXvVA@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Jan 3, 2014 at 9:46 AM, Florian Weimer <fweimer(at)redhat(dot)com> wrote:
> On 01/03/2014 04:20 PM, Tom Lane wrote:
>
>> I think Florian has a good point there, and the reason is this: what
>> you are talking about will be of exactly zero use to applications that
>> want to see the results of one query before launching the next. Which
>> eliminates a whole lot of apps. I suspect that almost the *only*
>> common use case in which a stream of queries can be launched without
>> feedback is going to be bulk data loading. It's not clear at all
>> that pipelining the PQexec code path is the way to better performance
>> for that --- why not use COPY, instead?
>
>
> The data I encounter has to be distributed across multiple tables. Switching
> between the COPY TO commands would again need client-side buffering and
> heuristics for sizing these buffers. Lengths of runs vary a lot in my case.
>
> I also want to use binary mode as far as possible to avoid the integer
> conversion overhead, but some columns use custom enum types and are better
> transferred in text mode.
>
> Some INSERTs happen via stored procedures, to implement de-duplication.
>
> These issues could be addressed by using temporary staging tables. However,
> when I did that in the past, this caused pg_shdepend bloat. Carefully
> reusing them when possible might avoid that. Again, due to the variance in
> lengths of runs, the staging tables are not always beneficial.
>
> I understand that pipelining introduces complexity. But solving the issues
> described above is no picnic, either.
Maybe consider using libpqtypes (http://libpqtypes.esilo.com/)? It
transfers almost everything in binary (enums, notably, are handled as
strings). A typical usage of libpqtypes would be to arrange multiple
records into an array on the client, then hand them off to a stored
procedure on the server side (perhaps over an asynchronous call while
you assemble the next batch). libpqtypes was written for C
applications with very high performance requirements (for
non-performance-critical cases we might use json instead). In my
experience it's not too difficult to arrange an assembly/push loop
that amortizes the round-trip overhead to zero; it's not as efficient
as COPY, but it is much more flexible and will blow away any scheme that
sends one row per query.
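To make the assembly/push loop a bit more concrete, here is a minimal C sketch. It is not libpqtypes code, and every name in it (batcher, push_fn, count_push, BATCH_ROWS, ROW_MAX) is hypothetical. For simplicity it builds a text-format array literal rather than a binary array, and the push callback is only a stand-in for the real network call; an actual client might issue PQsendQueryParams() there, binding the batch as one array parameter of a stored procedure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define BATCH_ROWS 4   /* hypothetical batch size; tune for the real workload */
#define ROW_MAX    64  /* hypothetical limit on the raw length of one value */
/* Per element: separator, two quotes, worst-case escaping; plus '}' and NUL. */
#define BATCH_BUF  (BATCH_ROWS * (2 * ROW_MAX + 3) + 2)

/* Stand-in for the network push (assumption: one call == one round trip). */
typedef int (*push_fn)(const char *array_literal, size_t nrows, void *ctx);

typedef struct {
    char    buf[BATCH_BUF];
    size_t  len;
    size_t  nrows;
    push_fn push;
    void   *ctx;
} batcher;

void batcher_init(batcher *b, push_fn push, void *ctx)
{
    b->len = 0;
    b->nrows = 0;
    b->push = push;
    b->ctx = ctx;
}

/* Close the array literal and push it; does nothing if the batch is empty. */
int batcher_flush(batcher *b)
{
    if (b->nrows == 0)
        return 0;
    b->buf[b->len++] = '}';
    b->buf[b->len] = '\0';
    int rc = b->push(b->buf, b->nrows, b->ctx);
    b->len = 0;
    b->nrows = 0;
    return rc;
}

/* Append one value as a quoted array element; push when the batch is full. */
int batcher_add(batcher *b, const char *value)
{
    if (strlen(value) > ROW_MAX)
        return -1;
    b->buf[b->len++] = (b->nrows == 0) ? '{' : ',';
    b->buf[b->len++] = '"';
    for (const char *p = value; *p != '\0'; p++) {
        if (*p == '"' || *p == '\\')
            b->buf[b->len++] = '\\';   /* escape quote and backslash */
        b->buf[b->len++] = *p;
    }
    b->buf[b->len++] = '"';
    b->nrows++;
    assert(b->len + 2 <= sizeof b->buf);  /* room left for '}' and NUL */
    return (b->nrows == BATCH_ROWS) ? batcher_flush(b) : 0;
}

/* Demo push target: counts round trips and remembers the last literal. */
typedef struct {
    size_t round_trips;
    size_t rows;
    char   last[BATCH_BUF];
} push_stats;

int count_push(const char *array_literal, size_t nrows, void *ctx)
{
    push_stats *st = ctx;
    st->round_trips++;
    st->rows += nrows;
    snprintf(st->last, sizeof st->last, "%s", array_literal);
    return 0;
}
```

With BATCH_ROWS set to 4, feeding six values through batcher_add() and then calling batcher_flush() results in two pushes instead of six. In a real client the push would be asynchronous, so the next batch can be assembled while the previous one is in flight.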
I agree with Tom that major changes to the libpq network stack are
probably not a good idea.
merlin