Re: Re: COPY BINARY file format proposal

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: COPY BINARY file format proposal
Date: 2000-12-09 03:58:15
Message-ID: 10336.976334295@sss.pgh.pa.us
Lists: pgsql-hackers

Philip Warner <pjw(at)rhyme(dot)com(dot)au> writes:
> More a matter of not thinking it was important enough to worry about, and
> not really wanting to drag the MD5/MD4/CRC64/etc debate into this one.

I'd just as soon not drag that debate in here either ;-) ... but once we
settle on an appropriate CRC method for WAL it's easy enough to call the
same routine for this code.

> Sounds good to me. I'm not sure you need it on a per-tuple basis - but it
> can't hurt, assuming it's cheap to generate. Does the backend send tuples
> or blocks of tuples? If the latter, and if CRC is expensive, then maybe 1
> CRC for each group of tuples.

Extending the CRC over multiple tuples would just complicate life,
I think. The per-byte cost is the biggest factor, so you don't really
save all that much.
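
To illustrate the per-byte point, here is a minimal sketch. It uses zlib's CRC-32 purely as a stand-in, since the CRC method for WAL has not been settled; the function names and sample data are made up for illustration. Either way every byte goes through the CRC routine exactly once, so grouping only saves the few bytes of stored checksum per tuple.

```python
import zlib

def per_tuple_crcs(tuples):
    """One CRC per tuple: each tuple can be verified on its own."""
    return [zlib.crc32(t) for t in tuples]

def group_crc(tuples):
    """One running CRC over a group of tuples.

    Every byte is still fed through the CRC once, so the per-byte
    cost is the same as in per_tuple_crcs().
    """
    crc = 0
    for t in tuples:
        crc = zlib.crc32(t, crc)  # continue from the previous value
    return crc

# Hypothetical tuple payloads, for illustration only.
sample = [b"alpha", b"beta", b"gamma"]
bytes_hashed = sum(len(t) for t in sample)  # identical for both schemes
```

With per-tuple CRCs a reader can also tell which tuple is damaged, which a single group CRC cannot do.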

>> Next 4 bytes: length of remainder of header, not including self. In
>> the initial version this will be zero, and the first tuple follows
>> immediately. Future changes to the format might allow additional data
>> to be present in the header. A reader should silently ignore any header
>> extension data it does not know what to do with.

> Don't you need to at least define how to specify non-essential chunks,
> since the flags are not to be used to describe the header extensions. Or
> are we going to make the initial version barf when it encounters any header
> extension?

No, the initial version will just silently skip the whole header
extension; it's defined so that that's a legal behavior (everything
in the header extension is inessential). We can come back and define
a format for the entries in the header extension area when we need some.
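
A rough sketch of that reader behavior, under assumptions that are not part of the proposal text quoted here: the length word is read as a big-endian unsigned 32-bit integer, and the function name is hypothetical. The reader does not need to understand anything inside the extension area; it only needs the length to step over it.

```python
import io
import struct

def skip_header_extension(stream):
    """Read the 4-byte extension length, then skip that many bytes.

    Returns the number of extension bytes skipped. The stream is left
    positioned at the first tuple.
    """
    raw = stream.read(4)
    if len(raw) != 4:
        raise ValueError("truncated header: missing extension length")
    # Assumption: big-endian unsigned int; the real byte order may differ.
    (ext_len,) = struct.unpack("!I", raw)
    skipped = stream.read(ext_len)
    if len(skipped) != ext_len:
        raise ValueError("truncated header extension")
    return ext_len

# Initial version of the format: zero-length extension, tuples follow at once.
initial = io.BytesIO(struct.pack("!I", 0) + b"TUPLES")
skip_header_extension(initial)
```

A file written by some later version with, say, three bytes of extension data would be handled by the same code: the three bytes are read and discarded.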

> Another option would be to:
> - dump the field sizes in the header somewhere (they will all be the same),
> - for each row output a bitmap of non-null fields, followed by the data.
> - varlena would have a -1 length in the header, and an int32 length in the row.

That would work if you are willing to assume that all the tuples indeed
always have the same set of fields --- you're not, for example, doing an
inheritance-tree-walk "COPY FROM foo*". But Chris Bitmead still has a
gleam in his eye about that sort of thing, so we might want it someday.
I think it's worth a small amount of extra space to avoid that
assumption, especially since it simplifies the code too.
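
As a sketch of why the self-describing layout is simpler, assume (hypothetically; these widths are not the proposed format) that each tuple starts with an int16 field count and each field with an int32 length, where -1 marks NULL. The decoder then needs no per-table state, and tuples with different field sets can sit in the same file.

```python
import struct

def encode_tuple(fields):
    """Encode one tuple as: int16 field count, then per field an int32
    length (-1 for NULL) followed by the raw bytes. Layout is illustrative."""
    out = [struct.pack("!h", len(fields))]
    for value in fields:
        if value is None:
            out.append(struct.pack("!i", -1))
        else:
            out.append(struct.pack("!i", len(value)))
            out.append(value)
    return b"".join(out)

def decode_tuple(buf, offset=0):
    """Decode one tuple starting at offset; return (fields, next_offset)."""
    (count,) = struct.unpack_from("!h", buf, offset)
    offset += 2
    fields = []
    for _ in range(count):
        (length,) = struct.unpack_from("!i", buf, offset)
        offset += 4
        if length < 0:
            fields.append(None)  # NULL field carries no data bytes
        else:
            fields.append(buf[offset:offset + length])
            offset += length
    return fields, offset

# A parent-table row and a child-table row with an extra column,
# as might come out of an inheritance-tree walk.
data = encode_tuple([b"1", b"x"]) + encode_tuple([b"2", None, b"extra"])
first, pos = decode_tuple(data)
second, end = decode_tuple(data, pos)
```

The cost is the repeated count and length words in every tuple, which is the small amount of extra space mentioned above.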

regards, tom lane
