From: | Craig Ringer <craig(at)2ndquadrant(dot)com> |
---|---|
To: | "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com> |
Cc: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Manuel Kniep <m(dot)kniep(at)web(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, "fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> |
Subject: | Re: foreign table batch inserts |
Date: | 2016-05-20 15:18:17 |
Message-ID: | CAMsr+YFgDUiJ37DEfPRk8WDBuZ58psdAYJd8iNFSaGxtw=wU3g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 20 May 2016 at 15:35, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>
> You can, however, omit Sync from between messages and send a series of
> protocol messages, like
>
> Parse/Bind/Execute/Bind/Execute/Bind/Execute/Sync
>
> to avoid round-trip overheads.
>
>
I implemented what I think is a pretty solid proof of concept of this for
kicks this evening. Attached, including basic test program. Patch attached.
The performance difference over higher latency links is huge, see below.
Demo/test program in src/test/examples/testlibpqbatch.c.
github: https://github.com/2ndQuadrant/postgres/tree/dev/libpq-async-batch
I still need to add the logic for handling an error during a batch by
discarding all input until the next Sync, but otherwise I think it's pretty
reasonable.
The time difference for 10k inserts on the local host over a unix socket
shows a solid improvement:
batch insert elapsed: 0.244293s
sequential insert elapsed: 0.375402s
... but over, say, a connection to a random AWS RDS instance fired up for
the purpose that lives about 320ms away the difference is huge:
batch insert elapsed: 9.029995s
sequential insert elapsed: (I got bored after 10 minutes; it should take a
bit less then an hour based on the latency numbers)
With 500 rows on the remote AWS RDS instance, once the I/O quota is already
saturated:
batch insert elapsed: 1.229024s
sequential insert elapsed: 156.962180s
which is an improvement by a factor of over 120
I didn't compare vs COPY. I'm sure COPY will be faster, but COPY doesn't
let you do INSERT ... ON CONFLICT, do UPDATE, do DELETE, etc. Not without
temp tables and a bunch of hoop jumping anyway. If COPY solved everything
there'd be no point having pipelining.
No docs yet, but if folks think the interface is reasonable I can add them
easily since the comments on each of the new functoins should be easy to
adapt into the SGML docs.
With a bit of polishing I think this can probably go in the next CF, though
I only wrote it as an experiment. Can I get opinions on the API?
The TL;DR API, using the usual async libpq routines, is:
PQbeginBatchMode(conn);
PQsendQueryParams(conn, "BEGIN", 0, NULL, NULL, NULL, NULL, 0);
PQsendPrepare(conn, "my_update", "UPDATE ...");
PQsetnonblocking(conn, 1);
while (!all_responses_received)
{
select(...)
if (can-write)
{
if (app-has-more-data-to-send)
{
PQsendQueryPrepared(conn, "my_update", params-go-here);
}
else if (havent-sent-commit-yet)
{
PQsendQueryParams(conn, "COMMIT", ...);
}
else if (havent-sent-endbatch-yet)
{
PqEndBatch(conn);
}
PQflush(conn);
}
if (can-read)
{
PQconsumeInput(conn);
if (PQisBusy(conn))
continue;
res = PQgetResult(conn);
if (res == NULL)
{
PQgetNextQuery(conn);
continue;
}
/* process results in the same order we sent the commands */
/* client keeps track of that, libpq just supplies the results */
...
}
}
PQendBatch(conn);
Note that:
* PQsendQuery cannot be used as it uses simple query protocol, use
PQsendQueryParams instead;
* Batch supports PQsendQueryParams, PQsendPrepare, PQsendQueryPrepared,
PQsendDescribePrepared, PQsendDescribePortal;
* You don't call PQgetResult after dispatching each query
* Multiple batches may be pipelined, you don't have to wait for one to end
to start another (an advantage over JDBC's API)
* non-blocking mode isn't required, but is strongly advised
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment | Content-Type | Size |
---|---|---|
0001-Draft-of-libpq-async-pipelining-support.patch | text/x-patch | 44.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Ildar Musin | 2016-05-20 15:29:00 | Re: Declarative partitioning |
Previous Message | Ronan Dunklau | 2016-05-20 13:41:38 | Possible regression regarding estimating relation width in FDWs |