Re: Best way to reduce server rounds getting big BLOBs

From: Jorge Arévalo <jorgearevalo(at)libregis(dot)org>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org, Eduardo Morrás <emorrasg(at)yahoo(dot)es>
Subject: Re: Best way to reduce server rounds getting big BLOBs
Date: 2013-05-16 11:18:00
Message-ID: B94F2D0562D94FFBA94470CD13FFD3EB@libregis.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello,

First of all, sorry for the duplicate. I accidentally sent the same message twice, and I got responses in both of them. I include here the person who responded to the other messsage

El miércoles 15 de mayo de 2013 a las 20:43, Merlin Moncure escribió:

> On Wed, May 15, 2013 at 11:31 AM, Jorge Arévalo
> <jorgearevalo(at)libregis(dot)org (mailto:jorgearevalo(at)libregis(dot)org)> wrote:
> > Hello,
> >
> > I'd like to know what's the best way to reduce the number of server rounds in a libpq C app that fetches BLOBs from a remote PostgreSQL server.
> >
> > About 75% of the time my app uses is spent querying database. I basically get binary objects (images). I have to fetch all the images from a table. This table can be really big (in number of rows) and each image can be big too.
>
> #1 thing to make sure of when getting big blobs is that you are
> fetching data in binary. If you are not, do so before changing
> anything else (I wrote a library to help do that, libpqtypes).

Yes, I get data in binary. But thanks for the reference to the library.

>
> > I guess I should go for cursors. If I understood the concept of "cursor", basically the query is executed, a ResultSet is generated inside the database server, and the client receives a "pointer" to this ResultSet. You can get all the rows by moving this pointer over the ResultSet, calling the right functions. But you still have to go to the database for each chunk of data. Am I right?
>
> cursors are a way to page through a query result without fetching all
> the data at once. this would be most useful if you are processing one
> row at a time on the client side. but if the client needs all the
> data held in memory, cursors will only help in terms of reducing the
> temporary memory demands while doing the transfer. So it's hard to
> say if it's worth using them until you describe the client side
> requirements a little better.
>
Ok, more context here. The images stored in the database, are really PostGIS Raster images. So, they are stored in this format [1], and are read in this another one [2], as binary objects (you can get a string representation of the format too, but is slower).

My point is to make the GDAL PostGIS Raster driver [3] faster, and my approach is: "instead of one server round per each user's data request, try to get the whole raster data with the fewer number of queries".

GDAL library raises a read operation for each chunk of data, in case of tiled raster formats. PostGIS Raster is a (special) tiled raster format, but one read operation per data chunk means one query (= one trip to server), and that's expensive. Even the simplest app (transform one raster table into a GeoTIFF file) spends 75% of its time in database queries, because this architecture.

Plus, each data read's request requires an intersection between the area desired and the data stored in database (using ST_Intersects [4]). An intersection in the database is more expensive than in memory.

So, my goal is to save server rounds. My thought was: using a cursor, I still have to go to the database, but my data is stored in a ResultSet, and I just need to seek over it, instead of raising expensive queries. Am I wrong? Is there any better approach?

Eduardo, about your response, many thanks. I think blob data compression doesn't apply here. And out-db storage is an option of the PostGIS Raster format, but it's not mandatory.

[1] http://trac.osgeo.org/postgis/browser/trunk/raster/doc/RFC1-SerializedFormat
[2] http://trac.osgeo.org/postgis/browser/trunk/raster/doc/RFC2-WellKnownBinaryFormat
[3] http://trac.osgeo.org/gdal/wiki/frmts_wtkraster.html
[4] http://postgis.net/docs/manual-2.0/ST_Intersects.html

Best regards,
Jorge

> merlin

In response to

Browse pgsql-general by date

  From Date Subject
Next Message sumita 2013-05-16 12:31:22 Re: FATAL: database "a/system_data" does not exist
Previous Message David Demelier 2013-05-16 08:56:49 Undefined reference with libpq on Visual Studio 2012