From: | Eildert Groeneveld <eildert(dot)groeneveld(at)fli(dot)bund(dot)de> |
---|---|
To: | Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: fast read of binary data |
Date: | 2012-11-22 07:54:04 |
Message-ID: | 1353570844.30293.8.camel@eno.tzv.local |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On Mo, 2012-11-12 at 12:18 +0100, Albe Laurenz wrote:
> Eildert Groeneveld wrote:
> > I am currently implementing using a compressed binary storage scheme
> > genotyping data. These are basically vectors of binary data which may be
> > megabytes in size.
> >
> > Our current implementation uses the data type bit varying.
> >
> > What we want to do is very simple: we want to retrieve such records from
> > the database and transfer it unaltered to the client which will do
> > something (uncompressing) with it. As massive amounts of data are to be
> > moved, speed is of great importance, precluding any to and fro
> > conversions.
> >
> > Our current implementation uses Perl DBI; we can retrieve the data ok,
> > but apparently there is some converting going on.
> >
> > Further, we would like to use ODBC from Fortran90 (wrapping the
> > C-library) for such transfers. However, all sorts funny things happen
> > here which look like conversion issues.
> >
> > In old fashioned network database some decade ago (in pre SQL times)
> > this was no problem. Maybe there is someone here who knows the PG
> > internals sufficiently well to give advice on how big blocks of memory
> > (i.e. bit varying records) can between transferred UNALTERED between
> > backend and clients.
>
> Using the C API you can specify binary mode for your data, which
> meand that they won't be converted.
>
> I don't think you will be able to use this with DBI or ODBC,
> but maybe binary corsors can help
> (http://www.postgresql.org/docs/current/static/sql-declare.html)
> but I don't know if DBI or ODBC handles them well.
>
> If you can avoid DBI or ODBC, that would be best.
ok, I did have a look at the libpq librar, and you are right, there is a
way to obtain binary data from the backend through the PQexecParams
res = PQexecParams(conn,
"DECLARE myportal CURSOR FOR select genotype_bits
from v_genotype_data",
0, /* zero param */
NULL, /* let the backend deduce param type */
paramValues,
NULL, /* don't need param lengths since text*/
NULL, /* default to all text params */
1); /* ask for binary results */
genotype_bits is defined as bit varying in the backend. When writing the
results:
for (i = 0; i < PQntuples(res); i++)
{
for (j = 0; j < nFields; j++)
fwrite(PQgetvalue(res, i, j),100000,1,f);
}
it is clear that the results are NOT in binary format:
eg(eno,snp): od -b junk |head
0000000 061 060 061 060 061 060 061 060 061 060 061 060 061 060 061 060
clearly, these are nice 0 and 1 in ASCII and not as I need it as a bit
stream.
Also, (and in line with this) PQgetvalue(res, i, j) seems to be of type
text.
What am I missing?
From | Date | Subject | |
---|---|---|---|
Next Message | Heikki Linnakangas | 2012-11-22 09:10:57 | Re: Hints (was Poor performance using CTE) |
Previous Message | Jeff Janes | 2012-11-22 00:53:56 | Re: Hints (was Poor performance using CTE) |