Re: Tweaking bytea / large object block sizes?

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Hanno Schlichting <hanno(at)hannosch(dot)eu>, pgsql-general(at)postgresql(dot)org
Subject: Re: Tweaking bytea / large object block sizes?
Date: 2011-06-13 06:58:53
Message-ID: 4DF5B52D.9000102@postnewspapers.com.au
Lists: pgsql-general

On 13/06/11 09:27, Merlin Moncure wrote:

> want to use the binary protocol mode (especially for postgres versions
> that don't support hex mode)

Allowing myself to get a wee bit sidetracked:

I've been wondering lately why hex was chosen as the new input/output
format when the bytea_output change went in. The Base64 encoding is
trivial to implement, already supported by standard libraries for many
languages and add-ons for the rest, fast to encode/decode, and much more
compact than a hex encoding, so it seems like a more attractive option.
PostgreSQL already supports base64 in explicit encode() and decode() calls.
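
To put rough numbers on the size difference, here is a minimal Python sketch using only the standard library. The sample payload and variable names are illustrative assumptions; nothing here touches PostgreSQL itself.

```python
import base64
import binascii

# Hypothetical sample payload: 768 bytes covering every byte value.
payload = bytes(range(256)) * 3

hex_text = binascii.hexlify(payload)   # 2 output chars per input byte
b64_text = base64.b64encode(payload)   # 4 output chars per 3 input bytes

print(len(payload), len(hex_text), len(b64_text))   # 768 1536 1024

# Both encodings round-trip losslessly.
assert binascii.unhexlify(hex_text) == payload
assert base64.b64decode(b64_text) == payload
```

So for the same data, hex output is 2x the raw size and unwrapped base64 is about 1.33x.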

Was concern about input format ambiguity a motivator for avoiding
base64? Checking the archives:

http://archives.postgresql.org/pgsql-hackers/2009-05/msg00238.php
http://archives.postgresql.org/pgsql-hackers/2009-05/msg00192.php

... it was considered but knocked back for two reasons: it's enough
more complex to encode that it could matter on big dumps, and
standards-compliant base64 appears to require newlines, something that
was viewed as ugly and problematic. Concerns about reliably detecting
the input format were also raised, but since the same solution used for
hex input would apply to base64 input too, it doesn't look like that was
a big factor.

Personally, even with the newline 'ick factor' I think it'd be pretty
nice to have as an option for dumps and COPY.

Ascii85 (base85) would be another alternative. It's used in PostScript
and PDF, but isn't anywhere near as widespread as base64. It's still
trivial to implement and is 7-8% more space-efficient than base64.
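
As a rough check of that figure, the sketch below compares Ascii85 against base64 with and without MIME-style line breaks, using Python's standard `base64` module. The payload and the `saving` helper are hypothetical, chosen only for illustration. Note that Python's Ascii85 encoder abbreviates all-zero 4-byte groups, so the sample deliberately avoids them.

```python
import base64

# Hypothetical 768-byte sample with no all-zero 4-byte groups.
payload = bytes(range(256)) * 3

b64_plain = base64.b64encode(payload)       # no line breaks
b64_wrapped = base64.encodebytes(payload)   # newline after every 76 output chars
a85 = base64.a85encode(payload)             # 5 output chars per 4 input bytes

def saving(smaller: bytes, larger: bytes) -> float:
    """Percentage by which `smaller` is shorter than `larger`."""
    return 100.0 * (1 - len(smaller) / len(larger))

print(f"a85 vs plain base64:   {saving(a85, b64_plain):.2f}% smaller")
print(f"a85 vs wrapped base64: {saving(a85, b64_wrapped):.2f}% smaller")
```

On this sample the saving is a little over 6% against unwrapped base64 and about 7.5% once the base64 line breaks are counted, so the 7-8% figure seems to hold mainly for line-wrapped base64.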

After a bit of digging, though, I can't help wondering whether a binary
dump format that's machine-representation independent, fast and compact
wouldn't be more practical. Tools like Thrift (http://thrift.apache.org),
Protocol Buffers, etc. might make it less painful. Maybe an interesting
GSoC project? Supporting binary COPY with a machine-independent format
would be a natural extension of that, too.
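
To illustrate what "machine-representation independent" could mean in practice, here is a minimal Python sketch of length-prefixed fields with a fixed big-endian byte order. The framing and the `pack_field` / `unpack_field` helpers are hypothetical; this is not PostgreSQL's binary COPY format, nor the Thrift or Protocol Buffers wire format.

```python
import struct

def pack_field(data: bytes) -> bytes:
    """Prefix a byte string with a 4-byte big-endian unsigned length."""
    return struct.pack("!I", len(data)) + data

def unpack_field(buf: bytes, offset: int = 0):
    """Return (data, next_offset) for the field starting at `offset`."""
    (length,) = struct.unpack_from("!I", buf, offset)
    start = offset + 4
    end = start + length
    if end > len(buf):
        raise ValueError("truncated field")
    return buf[start:end], end

# Two fields: arbitrary binary data (no escaping needed) and an empty value.
record = pack_field(b"\x00\xff binary \n data") + pack_field(b"")

first, pos = unpack_field(record)
second, pos = unpack_field(record, pos)
assert first == b"\x00\xff binary \n data"
assert second == b""
assert pos == len(record)
```

Because the byte order and field widths are fixed by the format rather than by the host, the same bytes decode identically on any architecture, and the payload is stored at its raw size plus a 4-byte prefix.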

--
Craig Ringer
