RE: pg_dump out of memory for large table with LOB

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Jean-Marc Lessard" <Jean-Marc(dot)Lessard(at)ultra-ft(dot)com>
Cc: "Adrien Nayrat" <adrien(dot)nayrat(at)anayrat(dot)info>,"pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: RE: pg_dump out of memory for large table with LOB
Date: 2018-11-15 14:16:45
Message-ID: 65acf49d-5f76-45f1-a0bc-9dad953b902f@manitou-mail.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Jean-Marc Lessard wrote:

> Another area where LOB hurts is the storage. LOB are broken and stored in 2K
> pieces.
> Due to the block header, only three 2k pieces fit in an 8k block wasting 25%
> of space (in fact pgstattuple reports ~ 20%).

Yes. bytea stored as TOAST is sliced into pieces of 2000 bytes, versus
2048 bytes for large objects. And that makes a significant difference
when packing these slices because 2000*4+page overhead+
4*(row overhead) is just under the default size of 8192 bytes per page,
whereas 2048*4+(page overhead)+4*(row overhead)
is obviously a bit over 8192, since 2048*4=8192.

If the data is compressible, the difference may be less obvious because
the slices in pg_largeobject are compressed individually
(as opposed to bytea that gets compressed as a whole),
so more than 3 slices can fit in a page inside pg_largeobject
The post-compression size can be known with pg_column_size(),
versus octet_length() that gives the pre-compression size.

> Would you recommend bytea over LOB considering that the max LOB size is well
> bellow 1GB?
> Are bytea preferable in terms of support by the community, performance,
> feature, etc?

For the storage and pg_dump issues, bytea seems clearly preferable
in your case.
As for the performance aspect, large objects are excellent because their
API never requires a binary<->text conversion.
This may be different with bytea. The C API provided by libpq allows to
retrieve and send bytea in binary format, for instance through
PQexecParams(), but most drivers implemented on top of libpq use only
the text representation for all datatypes, because it's simpler for them.
So you may want to check the difference in sending and retrieving
your biggest binary objects with your particular app/language/framework
stored in a bytea column versus large objects.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message dclark 2018-11-15 14:24:56 Re: Java UnsatisfiedLinkError exception when connecting to Postgresql database
Previous Message Ian Lawrence Barwick 2018-11-15 12:05:37 Re: VM Instance to Google Cloud SQL Migration