Re: fast read of binary data

From: Eildert Groeneveld <eildert(dot)groeneveld(at)fli(dot)bund(dot)de>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: fast read of binary data
Date: 2012-11-22 07:54:04
Message-ID: 1353570844.30293.8.camel@eno.tzv.local
Lists: pgsql-performance

On Mo, 2012-11-12 at 12:18 +0100, Albe Laurenz wrote:
> Eildert Groeneveld wrote:
> > I am currently implementing using a compressed binary storage scheme
> > genotyping data. These are basically vectors of binary data which may be
> > megabytes in size.
> >
> > Our current implementation uses the data type bit varying.
> >
> > What we want to do is very simple: we want to retrieve such records from
> > the database and transfer it unaltered to the client which will do
> > something (uncompressing) with it. As massive amounts of data are to be
> > moved, speed is of great importance, precluding any to and fro
> > conversions.
> >
> > Our current implementation uses Perl DBI; we can retrieve the data ok,
> > but apparently there is some converting going on.
> >
> > Further, we would like to use ODBC from Fortran90 (wrapping the
> > C-library) for such transfers. However, all sorts funny things happen
> > here which look like conversion issues.
> >
> > In old fashioned network database some decade ago (in pre SQL times)
> > this was no problem. Maybe there is someone here who knows the PG
> > internals sufficiently well to give advice on how big blocks of memory
> > (i.e. bit varying records) can between transferred UNALTERED between
> > backend and clients.
>
> Using the C API you can specify binary mode for your data, which
> means that they won't be converted.
>
> I don't think you will be able to use this with DBI or ODBC,
> but maybe binary cursors can help
> (http://www.postgresql.org/docs/current/static/sql-declare.html),
> though I don't know if DBI or ODBC handles them well.
>
> If you can avoid DBI or ODBC, that would be best.
OK, I did have a look at the libpq library, and you are right: there is
a way to obtain binary data from the backend, through PQexecParams:

res = PQexecParams(conn,
                   "DECLARE myportal CURSOR FOR "
                   "select genotype_bits from v_genotype_data",
                   0,           /* zero params */
                   NULL,        /* let the backend deduce param types */
                   paramValues,
                   NULL,        /* don't need param lengths since text */
                   NULL,        /* default to all text params */
                   1);          /* ask for binary results */

genotype_bits is defined as bit varying in the backend. When writing the
results:
for (i = 0; i < PQntuples(res); i++)
{
    for (j = 0; j < nFields; j++)
        /* write the actual value length rather than a fixed size */
        fwrite(PQgetvalue(res, i, j), PQgetlength(res, i, j), 1, f);
}

it is clear that the results are NOT in binary format:
eg(eno,snp): od -b junk |head
0000000 061 060 061 060 061 060 061 060 061 060 061 060 061 060 061 060

clearly, these are nice ASCII '0' and '1' characters, not the bit
stream I need.

Also (and in line with this), PQgetvalue(res, i, j) seems to return
text.
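
One thing I have not tried yet is declaring the cursor BINARY, which,
if I read the DECLARE page you linked correctly, should make FETCH
return data in binary rather than text format. Something like (untested
on my side):

```sql
-- sketch based on the DECLARE documentation; not yet verified here
DECLARE myportal BINARY CURSOR FOR
    SELECT genotype_bits FROM v_genotype_data;
```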

What am I missing?
