Re: Database restore speed

From: Mitch Skinner <mitch(at)egcrc(dot)net>
To: Luke Lonergan <LLonergan(at)greenplum(dot)com>
Cc: sfrost(at)snowman(dot)net, dlang(at)invendra(dot)net, soualline(at)stbernard(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: Database restore speed
Date: 2005-12-03 23:29:15
Message-ID: 1133652555.4333.41.camel@firebolt
Lists: pgsql-performance

On Fri, 2005-12-02 at 23:03 -0500, Luke Lonergan wrote:
> And how do we compose the binary data on the client? Do we trust that the client encoding conversion logic is identical to the backend's?

Well, my newbieness is undoubtedly showing already, so I might as well
continue with my line of dumb questions. I did a little mail archive
searching, but had a hard time coming up with unique query terms.

This is a slight digression, but my question about binary-format query
results wasn't rhetorical. Do I have to worry about platform differences
when I'm getting binary DataRow messages back from the server? Or when
I'm sending Bind messages with binary parameters?
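
To make the question concrete, here is a minimal sketch of what a client would do if binary values arrive in network byte order. The protocol documentation states this for integers; for float8 it is an assumption here that should be checked against the backend source. The function names are hypothetical, not part of any client library.

```python
import struct

def decode_int4(raw):
    """Decode a 4-byte signed integer sent most significant byte first."""
    return struct.unpack("!i", raw)[0]

def decode_float8(raw):
    """Decode an 8-byte double, ASSUMING big-endian IEEE 754 on the wire."""
    return struct.unpack("!d", raw)[0]
```

If the wire format is fixed like this, the client's native byte order never enters the picture, because struct does the swap explicitly.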

Regarding whether or not the client has identical encoding/conversion
logic, how about a fast path that starts out by checking for
compatibility? In addition to a BOM, you could add a "float format
mark" that was an array of things like +0.0, -0.0, min, max, +Inf, -Inf,
NaN, etc.
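
A rough sketch of the "float format mark" idea, under the assumption that both sides pack the same agreed list of probe values and compare the bytes. The probe list and the function names are made up for illustration; nothing like this exists in the protocol today.

```python
import struct
import sys

# Hypothetical probe list, following the values named above.
PROBES = [0.0, -0.0, sys.float_info.min, sys.float_info.max,
          float("inf"), float("-inf"), float("nan")]

def float_format_mark(byteorder="@"):
    """Pack the probe values as 8-byte doubles.

    "@" asks struct for the platform's native layout; "<" and ">"
    force little- and big-endian IEEE 754 binary64.
    """
    return struct.pack("%s%dd" % (byteorder, len(PROBES)), *PROBES)

def fast_path_ok(local_mark, remote_mark):
    """True only when both sides produced byte-identical marks."""
    return local_mark == remote_mark
```

One caveat: NaN bit patterns can differ between platforms that are otherwise compatible, so a real check would probably have to compare the NaN entry more loosely than byte for byte.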

It looks like XDR specifies byte order for floats and otherwise punts to
IEEE. I have no experience with SQL*Loader, but a quick read of the
docs appears to divide data types into "portable" and "nonportable"
groups, where loading nonportable data types requires extra care.

This may be overkill, but have you looked at HDF5? Only one hit came up
in the mail archives.
http://hdf.ncsa.uiuc.edu/HDF5/doc/H5.format.html
For (e.g.) floats, the format includes metadata that specifies byte
order, padding, normalization, the location of the sign, exponent, and
mantissa, and the size of the exponent and mantissa. The format appears
not to require length information on a per-datum basis. A cursory look
at the data format page gives me the impression that there's a useful
streamable subset. The license of the implementation is BSD-style (no
advertising clause), and it appears to support a large variety of
platforms. Currently, the format spec only mentions ASCII, but since
the library doesn't do any actual string manipulation (just storage and
retrieval, AFAICS) it may be UTF-8 clean.
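
To illustrate what that kind of float metadata buys you, here is a small sketch of a decoder driven entirely by a layout description. This is not the HDF5 API or its on-disk encoding; the FloatLayout fields are hypothetical names for the properties listed above, and the sketch assumes an implied leading mantissa bit and IEEE-style Inf/NaN/subnormal conventions.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class FloatLayout:
    byteorder: str   # "big" or "little"
    sign_pos: int    # bit index of the sign bit (0 = least significant)
    exp_pos: int     # bit index where the exponent field starts
    exp_bits: int    # width of the exponent field
    man_pos: int     # bit index where the mantissa field starts
    man_bits: int    # width of the mantissa field
    exp_bias: int    # exponent bias

IEEE_DOUBLE_BE = FloatLayout("big", 63, 52, 11, 0, 52, 1023)
IEEE_DOUBLE_LE = FloatLayout("little", 63, 52, 11, 0, 52, 1023)

def decode(raw, layout):
    """Decode one float from raw bytes using only the layout metadata."""
    bits = int.from_bytes(raw, layout.byteorder)
    sign = -1.0 if (bits >> layout.sign_pos) & 1 else 1.0
    exp = (bits >> layout.exp_pos) & ((1 << layout.exp_bits) - 1)
    man = (bits >> layout.man_pos) & ((1 << layout.man_bits) - 1)
    if exp == (1 << layout.exp_bits) - 1:
        # All-ones exponent: infinity or NaN.
        return sign * math.inf if man == 0 else math.nan
    if exp == 0:
        # Zero or subnormal: no implied leading bit.
        return sign * math.ldexp(man, 1 - layout.exp_bias - layout.man_bits)
    # Normal number: restore the implied leading 1 bit.
    return sign * math.ldexp(man | (1 << layout.man_bits),
                             exp - layout.exp_bias - layout.man_bits)
```

Because the layout is stated once per column rather than per value, the data stream itself carries no per-datum length or format information, which matches the impression above.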

Mitch
