Re: performance libpq vs JDBC

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Richard Huxton <dev(at)archonet(dot)com>
Cc: Werner Scholtes <Werner(dot)Scholtes(at)heuboe(dot)de>, Divakar Singh <dpsmails(at)yahoo(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: performance libpq vs JDBC
Date: 2010-12-16 15:09:21
Message-ID: AANLkTik-VoMb-8ok3dFTj8nsnAirigAcA9_41BZ2Lbca@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, Dec 16, 2010 at 7:14 AM, Richard Huxton <dev(at)archonet(dot)com> wrote:
> On 16/12/10 09:21, Werner Scholtes wrote:
>>
>> I assume that the wire protocol of PostgreSQL allows to transmit
>> multiple rows at once, but libpq doesn't have an interface to access it.
>> Is that right?
>
> Sounds wrong to me. The libpq client is the default reference implementation
> of the protocol. If there were large efficiencies that could be copied, they
> would be.
>
> Anyway - you don't need to assume what's in the protocol. It's documented
> here:
>  http://www.postgresql.org/docs/9.0/static/protocol.html
>
> I'd stick wireshark or some other network analyser on the two sessions - see
> exactly what is different.

There is only one explanation for the difference: they are slamming
data across the wire without waiting for the result. libpq queries
are synchronous: you send a query, wait for the result. This means
for very simple queries like the above you can become network bound.

In C/C++ you can work around this using a couple of different methods.
COPY of course is the fastest, but extremely limiting in what it can
do. We developed libpqtypes (I love talking about libpqtypes) to deal
with this problem. In the attached example, it stacks data into an
array in the client, sends it to the server which unnests and inserts
it. The attached example inserts a million rows in about 11 seconds
on my workstation (client side prepare could knock that down to 8 or
so).

If you need to do something fancy, the we typically create a receiving
function on the server in plpgsql which unnests() the result and makes
decisions, etc. This is extremely powerful and you can compose and
send very rich data to/from postgres in a single query.

merlin

#include "libpq-fe.h"
#include "libpqtypes.h"

#define INS_COUNT 1000000

int main()
{
int i;

PGconn *conn = PQconnectdb("dbname=pg9");
PGresult *res;
if(PQstatus(conn) != CONNECTION_OK)
{
printf("bad connection");
return -1;
}

PQtypesRegister(conn);

PGregisterType type = {"ins_test", NULL, NULL};
PQregisterComposites(conn, &type, 1);

PGparam *p = PQparamCreate(conn);
PGarray arr;
arr.param = PQparamCreate(conn);
arr.ndims = 0;

PGparam *t = PQparamCreate(conn);

for(i=0; i<INS_COUNT; i++)
{
PGint4 a=i;
PGtext b = "some_text";
PGtimestamp c;
PGbytea d;

d.len = 8;
d.data = b;

c.date.isbc = 0;
c.date.year = 2000;
c.date.mon = 0;
c.date.mday = 19;
c.time.hour = 10;
c.time.min = 41;
c.time.sec = 6;
c.time.usec = 0;
c.time.gmtoff = -18000;

PQputf(t, "%int4 %text %timestamptz %bytea", a, b, &c, &d);
PQputf(arr.param, "%ins_test", t);
PQparamReset(t);
}

if(!PQputf(p, "%ins_test[]", &arr))
{
printf("putf failed: %s\n", PQgeterror());
return -1;
}
res = PQparamExec(conn, p, "insert into ins_test select * from
unnest($1) r(a, b, c, d)", 1);

if(!res)
{
printf("got %s\n", PQgeterror());
return -1;
}
PQclear(res);
PQparamClear(p);
PQfinish(conn);
}

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Pierre C 2010-12-16 15:22:40 Re: Help with bulk read performance
Previous Message Krzysztof Nienartowicz 2010-12-16 13:39:11 Re: Help with bulk read performance