Re: reading large BYTEA type is slower than expected

From: "Mark Harris" <mharris(at)esri(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: reading large BYTEA type is slower than expected
Date: 2007-05-18 18:51:58
Message-ID: D7BFFE348C53EF4E8AA0698B1E395FA9085ABE79@flybywire.esri.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Tom,

No it is not 3 seconds to read a single value. Multiple records are
read, approximately 120 records if the raster dataset is created with
our application's default configuration.

Please read on to understand why, if you need to.

We are a GIS software company and have two basic types of data, raster
and vector. Both types of data are stored in a BYTEA.

Vector data are representations of geometry stored as a series of
vertices to represent points, lines and polygons. This type of data is
typically 30 to 200 bytes, but can be very large (consider how many
vertices would be required to represent the Pacific Ocean at a detailed
resolution). Vector data does not seem to exhibit the cold fetch issue
(fetch from disk as opposed to fetch from buffer cache).

It is with raster data that we see the problem. Raster data is image
data stored in the database. When we store a georeferenced image in the
database we block it up into tiles. The default tile size is 128 by 128
pixels.
We compress the data using either: LZ77, JPEG or JPEG2000. Typically the
pixel blocks stored in the BYTEA range in size from 800 bytes to 16000
bytes for 8-bit data stored with the default tile size, depending on the
type of compression and the variability of the data.

Our application is capable of mosaicking source images together into
huge raster datasets that can grow into terabytes. Consider the entire
landsat imagery with a resolution of 15 meters mosaicked into one raster
dataset. It requires less than a terabyte to store that data.

For practical reasons, as you can imagine, we construct a reduced
resolution pyramid on the raster base level, allowing applications to
view a reduced resolution level of the raster dataset as the user zooms
out, and a higher resolution level as the user zooms in. The pyramid
levels are also stored as pixel blocks in the table. Each pyramid level
is reduced in resolution by 1/2 in the X and Y dimension. Therefore
pyramid level 1 will be 1/4 of pyramid level 0 (the base).

As the application queries the raster blocks table which stores the
raster data tiles, it will request a raster tiles that fall within the
spatial extent of the window for a particular pyramid level. Therefore
the number of records queried from the raster blocks table containing
the BYTEA column of pixel data is fairly constant. For the screen
resolution of 1680 by 1050 that I am testing with about 120 records will
be fetched from the raster blocks table each time the user pans or
zooms.

Mark

-----Original Message-----
From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
Sent: Friday, May 18, 2007 10:48 AM
To: Mark Harris
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] reading large BYTEA type is slower than expected

"Mark Harris" <mharris(at)esri(dot)com> writes:
> We have recently ported our application to the postgres database. For
> the most part performance has not been an issue; however there is one
> situation that is a problem and that is the initial read of rows
> containing BYTEA values that have an average size of 2 kilobytes or
> greater. For BYTEA values postgres requires as much 3 seconds to read
> the values from disk into its buffer cache.

How large is "large"?

(No, I don't believe it takes 3 sec to fetch a single 2Kb value.)

regards, tom lane

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Brian Hurt 2007-05-18 18:53:42 Re: 121+ million record table perf problems
Previous Message Joshua D. Drake 2007-05-18 18:51:04 Re: 121+ million record table perf problems