pg_largeobject high overhead

From: Jason Newton <nevion(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: pg_largeobject high overhead
Date: 2014-06-04 07:45:23
Message-ID: CAGou9Mg=9qPYTdh18NDO3LTJtwQN8uRdTwABfkcyMRUt6D_fJw@mail.gmail.com
Lists: pgsql-general

Hi,

Some scope setting: I use postgres to manage metadata about field tests and
simulations, and part of that involves HDF5 files. The HDF5 files are
generated both by field testing and by simulations, so there will be a
modest number of them - in the 10k region eventually; an older database
I've not yet migrated is at 3k after a year. The DB is stored on a 14-drive
ZFS raidz2 pool (two groups of 7 x 3TB disks), alongside lots of field data
(mainly images) that is kept out of the database; the rest of the inputs go
in the database because managing them outside it (versions, variants,
garbage collection) gets too complicated.

After running into 1GB limits while trying to post 300MB compressed binary
hdf5 files into postgres 9.2 (under transactions) on a 16GB 64-bit linux
machine, and coming upon the following thread:

http://www.postgresql.org/message-id/CAFj8pRAcfKoiNp2uXeiZOd5kRX29n2ofsoLDh0w6ej7RxKoZyA@mail.gmail.com
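
For concreteness, what was failing is just an ordinary bytea insert; a
minimal sketch (table and statement names are made up):

-- A single bytea value is capped at 1GB, and the hex text form of a 300MB
-- file is already ~600MB, so (as I found) inserts can trip 1GB allocation
-- limits well before a file itself reaches 1GB.
CREATE TABLE hdf5_file (id serial PRIMARY KEY, payload bytea);
PREPARE put_file(bytea) AS INSERT INTO hdf5_file (payload) VALUES ($1);
-- EXECUTE put_file(...);  -- with ~300MB of bytes bound from the client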

I spent some time trying to get things to work as is, raising what limits I
could, to no avail. So I decided to upgrade to 9.3 and use large objects
rather than build another file store, given the convenience of keeping
everything in the database. I noticed that my 35GB of files has become
46GB, and that there are 18522822 entries in pg_largeobject, whereas I only
have 257 files ranging from 30MB to 400MB. To reiterate, the data is
already compressed via several HDF filters, so postgres isn't going to
compress it any better. It looks like there's roughly 30% overhead for
using pg_largeobject, which is pretty expensive!
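
In case it's useful, the counts and sizes above can be reproduced with
something like this (I believe pg_largeobject itself is only readable as
superuser on 9.x):

SELECT count(*) AS large_objects
FROM pg_largeobject_metadata;           -- 257 here

SELECT count(*) AS chunks,
       pg_size_pretty(pg_total_relation_size('pg_largeobject')) AS on_disk
FROM pg_largeobject;                    -- ~18.5M rows / ~46GB here;
                                        -- each chunk holds at most
                                        -- LOBLKSIZE = BLCKSZ/4 = 2048 bytes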

I also came across this thread, which mentions disabling compression; that
could improve the efficiency:
http://www.postgresql.org/message-id/CAHyXU0w_tNxBFHuWGHKCZF7GkE_jUQaT0f+dNcgTX8yx0+z3ew@mail.gmail.com
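
For question 1 below, my guess is that the relevant knob is the storage
mode of pg_largeobject's data column; something along these lines (the
ALTER is untested, and altering a system catalog presumably needs superuser
plus allow_system_table_mods, so very much at-your-own-risk):

-- 'x' (extended) means inline pglz compression is attempted,
-- 'e' (external) means it is not:
SELECT attstorage
FROM pg_attribute
WHERE attrelid = 'pg_largeobject'::regclass AND attname = 'data';

-- untested guess at turning compression off:
ALTER TABLE pg_largeobject ALTER COLUMN data SET STORAGE EXTERNAL;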

So this leads to the following questions:

1. How can I disable TOAST compression or whatever is going on in
pg_largeobject?
2. How in the world is a 2KB block size for _large_ binary objects a
reasonable choice? As far as I can tell, it introduces a very large
overhead (rough math below, after this list).
3. Should I be changing LOBLKSIZE if it is the main factor in the 30%
overhead? Is that straightforward? Long term, is it going to come back to
bite me? I could maintain a package on openSUSE's OBS, carrying a patch and
rebuilding against upstream, but I don't really have a lot of ongoing time
to deal with fallout, if any.
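
The rough math I have in mind for question 2 (back-of-envelope, assuming
the default 8kB block size, LOBLKSIZE = BLCKSZ/4 = 2048, a 24-byte page
header, and roughly 44 bytes of per-row overhead for the tuple header,
loid, pageno, varlena header, item pointer and alignment):

SELECT (current_setting('block_size')::int - 24) / (2048 + 44)
           AS chunks_per_page,
       round(100 * (1 - 3 * 2048.0 / current_setting('block_size')::int), 1)
           AS pct_of_page_lost;
-- gives 3 and 25.0: incompressible 2048-byte chunks fit only 3 to a page,
-- so the heap is ~1.33x the payload, which lines up with 35GB -> 46GB.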

I'd appreciate any insights,
-Jason
