From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Hanno Schlichting <hanno(at)hannosch(dot)eu>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Tweaking bytea / large object block sizes?
Date: 2011-06-13 01:27:45
Message-ID: BANLkTinCLh1hN_Xiq_S9sqOsWzB37fJhVg@mail.gmail.com
Lists: pgsql-general

On Sun, Jun 12, 2011 at 11:00 AM, Hanno Schlichting <hanno(at)hannosch(dot)eu> wrote:
> Hi.
>
> I'm a new Postgres user. If I happen to ask stupid questions please
> feel free to point me to any documentation I should read or guidelines
> for asking questions.
>
> I'm looking into storing binary data in Postgres and trying to
> understand how data is actually stored in the database. The dataset
> I'm looking at consists of images, photos, and PDF documents, which
> are commonly at least 100kb, on average around 10mb, and can reach
> 100mb per document. I want to store this data in the database, as I need
> transactional integrity and want to avoid the extra complexity of
> managing shared filesystems between a number of frontend application
> servers and database backends.
>
> The binary data will only be accessed as a whole: either a complete
> new file is written to the DB, or a complete file is read and
> cached on a frontend server. I don't need streaming access or the
> ability to read partial data. The use case seems to be well supported by
> Oracle 11g with the introduction of "secure files" (pdf: [1]).
>
> But from what I read of Postgres, my best bet is to store data as
> large objects [2]. Going all the way down this means storing the
> binary data as 2kb chunks and adding table row overhead for each of
> those chunks. Using the bytea type with the TOAST backend [3], it
> seems to come down to the same thing: data is actually stored in 2kb
> chunks given the 8kb page size.
>
> I'm assuming I'll be able to get ~8kb jumbo frame packets over a
> gigabit network connection and would be able to use an ext4 volume with
> a block size of either 32kb or 64kb for the volume housing the binary
> data, but a smaller block size for the one housing the relational
> data.
>
> Given those other constraints, it seems silly to split data up into
> 2kb chunks at the database level. Is there any way the chunk size for
> binary data can be increased here independently of the one for "normal
> relational" data?
I would not even consider tweaking the internal block sizes until
you've determined there is a problem you expect to solve by doing so.
The single most important factor affecting blob performance in
postgres is how you send and receive the data -- you absolutely want
to use the binary protocol mode (especially for postgres versions that
don't support hex mode). The next thing to look at is the choice
between bytea and large objects -- large objects are a bit faster and
have a higher theoretical size limit, but bytea is a standard type,
which offers a lot of conveniences -- I'd say stick with bytea unless
you've determined there is a reason not to. That said, if you are not
writing C, some client-side drivers may only allow binary transmission
through the lo interface, so that's something to think about.
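
To make the binary protocol point concrete, here is a minimal libpq
sketch of pulling a bytea value back in binary result format -- the
documents(id int, data bytea) table, the connection string, and the id
value are all hypothetical, and error handling is trimmed down:

    /* minimal sketch: fetch a bytea column over the binary protocol;
       assumes a hypothetical table documents(id int, data bytea) */
    #include <stdio.h>
    #include <libpq-fe.h>

    int main(void)
    {
        PGconn *conn = PQconnectdb("dbname=test");  /* hypothetical DSN */
        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }

        const char *paramValues[1] = { "42" };      /* hypothetical id */

        /* resultFormat = 1 (the last argument) requests binary results,
           so the bytea comes back as raw bytes with no escape/hex
           encoding overhead on the wire or in the client */
        PGresult *res = PQexecParams(conn,
                                     "SELECT data FROM documents WHERE id = $1",
                                     1,           /* one parameter */
                                     NULL,        /* let server infer types */
                                     paramValues,
                                     NULL, NULL,  /* param sent as text */
                                     1);          /* binary result format */

        if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) == 1)
        {
            int   len  = PQgetlength(res, 0, 0);
            char *blob = PQgetvalue(res, 0, 0);   /* raw bytes, not a string */
            fprintf(stderr, "fetched %d bytes\n", len);
            (void) blob;   /* write to a cache file, hand to the app, ... */
        }

        PQclear(res);
        PQfinish(conn);
        return 0;
    }

(builds with something like:
cc fetch.c -o fetch -I$(pg_config --includedir) -L$(pg_config --libdir) -lpq)

For comparison, a sketch of whole-file round trips through the lo
interface, continuing with the conn from above -- the file paths are
placeholders, and the explicit transaction matters because large
object operations are only valid inside a transaction block:

    /* whole-file write and read via the large object interface */
    Oid oid;
    PQclear(PQexec(conn, "BEGIN"));
    oid = lo_import(conn, "photo.jpg");          /* file -> large object */
    if (oid != InvalidOid)
        lo_export(conn, oid, "/tmp/photo.jpg");  /* large object -> file */
    PQclear(PQexec(conn, "COMMIT"));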
merlin