Re: Database restore speed

From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Stephen Frost" <sfrost(at)snowman(dot)net>, "David Lang" <dlang(at)invendra(dot)net>, "Steve Oualline" <soualline(at)stbernard(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Database restore speed
Date: 2005-12-03 19:42:02
Message-ID: BFB7350A.14FC7%llonergan@greenplum.com
Lists: pgsql-performance

Tom,

On 12/2/05 3:00 PM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Sure it does ... at least as long as you are willing to assume everybody
> uses IEEE floats, and if they don't you have semantic problems
> translating float datums anyhow.
>
> What we lack is documentation, more than functionality.

Cool - sounds like the transport part might be there - the thing we desire
is a file format that allows for efficient representation of portable binary
datums.

Last I looked at the Postgres binary dump format, it was neither portable nor
efficient enough to suit the need. The efficiency problem was that
descriptive information was attached to each individual data item, as
opposed to the approach where that information is specified once for the
data group as a template for input.
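
To make the size difference concrete, here is a rough sketch of the two approaches. It is not the actual Postgres binary format; the three-column row layout, the 1-byte type tag, and the 4-byte length prefix are all assumptions chosen purely for illustration.

```python
import struct

# Hypothetical row layout: int32, IEEE float64, int16 (little-endian, unpadded).
# In the template approach this description is stated once for the data group.
ROW_TEMPLATE = struct.Struct("<idh")

def pack_with_template(rows):
    """Emit only raw values; the column types live in the shared template."""
    return b"".join(ROW_TEMPLATE.pack(*row) for row in rows)

def pack_per_item(rows):
    """Emit every value with its own (assumed) 1-byte type tag and 4-byte length."""
    out = bytearray()
    for row in rows:
        for code, value in zip("idh", row):
            payload = struct.pack("<" + code, value)
            out += struct.pack("<cI", code.encode(), len(payload)) + payload
    return bytes(out)

rows = [(1, 2.0, 3), (4, 5.0, 6)]
print(len(pack_with_template(rows)), len(pack_per_item(rows)))
```

With these assumed sizes, the per-item descriptors add 5 bytes to every value, which more than doubles the size of a narrow row.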

Oracle's format allows for the expression of fixed width fields within the
input file, and specifies the data type of the fields in the metadata. We
could choose to support exactly the specification of the SQL*Loader format,
which would certainly be general enough, and would have the advantage of
providing a compatibility option with Oracle SQL*Loader input.
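
As a sketch of the general idea (fixed-width fields whose types come from separate metadata), something like the following would do. This is not SQL*Loader control-file syntax; the `FIELDS` list, the field names, and the big-endian byte order are hypothetical choices for illustration.

```python
import struct

# Hypothetical field metadata: (name, width in bytes, struct type code).
FIELDS = [("id", 4, "i"), ("price", 8, "d"), ("code", 3, "3s")]

def record_struct(fields):
    """Build one fixed-width record layout from the metadata."""
    layout = struct.Struct(">" + "".join(code for _, _, code in fields))
    # Guard against metadata whose declared widths disagree with the type codes.
    assert layout.size == sum(width for _, width, _ in fields)
    return layout

def read_fixed_width(data, fields):
    """Decode every record in `data` into a dict keyed by field name."""
    layout = record_struct(fields)
    names = [name for name, _, _ in fields]
    return [dict(zip(names, values)) for values in layout.iter_unpack(data)]

def read_nth(data, fields, n):
    """Fixed widths allow jumping straight to record n by offset."""
    layout = record_struct(fields)
    return layout.unpack_from(data, n * layout.size)
```

Because every record has the same width, no per-item description is needed in the data itself, and any record can be located by simple arithmetic.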

Note that Oracle does not provide similar functionality for *output* files,
i.e. those that can be dumped from an Oracle database. Their mechanism for
database dump is the exp/imp utility pair, which uses a proprietary
"shifting sands" specification AFAIK. This limits the benefit of
implementing Oracle SQL*Loader compatibility to those customers who have
built utilities to emit that format, which may still be valuable.

The alternative is to design a Postgres portable binary input file format.
I'd like to see a record-oriented format like that of FORTRAN unformatted,
which uses bookends around each record to identify the length of each
record. This allows for fast record-oriented positioning within the file,
and provides some self-description for integrity checking, etc.
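
A minimal sketch of that bookend scheme, assuming a 4-byte little-endian length marker before and after each record (marker width and byte order vary between FORTRAN compilers, so treat both as assumptions; the function names are made up for this example):

```python
import io
import struct

MARK = struct.Struct("<I")  # assumed 4-byte little-endian length marker

def write_record(f, payload):
    """Write one record framed by matching length markers."""
    f.write(MARK.pack(len(payload)))
    f.write(payload)
    f.write(MARK.pack(len(payload)))

def read_records(f):
    """Yield payloads, checking that both bookends agree (integrity check)."""
    while True:
        head = f.read(MARK.size)
        if not head:
            return  # clean end of file
        if len(head) != MARK.size:
            raise ValueError("truncated leading marker")
        (n,) = MARK.unpack(head)
        payload = f.read(n)
        tail = f.read(MARK.size)
        if len(payload) != n or len(tail) != MARK.size or MARK.unpack(tail)[0] != n:
            raise ValueError("record length markers do not match")
        yield payload

def skip_records(f, count):
    """Position past `count` records without reading their payloads."""
    for _ in range(count):
        (n,) = MARK.unpack(f.read(MARK.size))
        f.seek(n + MARK.size, io.SEEK_CUR)

buf = io.BytesIO()
for rec in (b"alpha", b"", b"gamma!"):
    write_record(buf, rec)
buf.seek(0)
skip_records(buf, 2)
print(next(read_records(buf)))
```

The trailing marker is what makes backward positioning possible as well: a reader sitting at the end of a record can read the preceding 4 bytes to find where that record began.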

- Luke
