From: Decibel! <decibel(at)decibel(dot)org>
To: Markus Wanner <markus(at)bluegap(dot)ch>
Cc: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: Postgres-R: tuple serialization
Date: 2008-07-22 21:32:36
Message-ID: D697DBAF-D495-4602-A1EF-E013DB804B0F@decibel.org
Lists: pgsql-hackers
On Jul 22, 2008, at 3:04 AM, Markus Wanner wrote:
> yesterday, I promised to outline the requirements of Postgres-R for
> tuple serialization, which we have been talking about before. There
> are basically three types of how to serialize tuple changes,
> depending on whether they originate from an INSERT, UPDATE or
> DELETE. For updates and deletes, it saves the old pkey as well as
> the origin (a global transaction id) of the tuple (required for
> consistent serialization on remote nodes). For inserts and updates,
> all added or changed attributes need to be serialized as well.
>
>          pkey+origin   changes
> INSERT        -           x
> UPDATE        x           x
> DELETE        x           -
>
> Note that the pkey attributes may never be null, so an isnull bit
> field can be skipped for those attributes. For the insert case, all
> attributes (including primary key attributes) are serialized.
> Updates require an additional bit field (well, I'm using chars ATM)
> to store which attributes have changed. Only those should be
> transferred.
>
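A minimal sketch of the change record this layout implies, in
PostgreSQL-style C (the type and field names here are illustrative
assumptions, not actual Postgres-R code):

#include "postgres.h"       /* Datum, TransactionId, uint8, bool */

typedef enum ChangeType
{
    CHANGE_INSERT,
    CHANGE_UPDATE,
    CHANGE_DELETE
} ChangeType;

typedef struct TupleChange
{
    ChangeType     type;          /* kind of change */
    TransactionId  origin;        /* global txn id of the tuple's
                                   * origin; UPDATE/DELETE only */
    Datum         *pkey_values;   /* old pkey attributes, UPDATE/
                                   * DELETE only; pkey columns are
                                   * never null, so no isnull bits
                                   * are kept for them */
    uint8         *changed_bits;  /* one bit per attribute, set if it
                                   * is present in new_values (all
                                   * set for an INSERT) */
    Datum         *new_values;    /* added/changed attribute data,
                                   * INSERT/UPDATE only */
    bool          *new_isnull;    /* isnull flags for new_values */
} TupleChange;
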
> I'm tempted to unify that, so that inserts are serialized as the
> difference against the default values or NULL. That would make
> things easier for Postgres-R. However, how about other uses of such
> a fast tuple applicator? Does such a use case exist at all? I mean,
> for parallelizing COPY FROM STDIN, one certainly doesn't want to
> serialize all input tuples into that format before feeding multiple
> helper backends. Instead, I'd recommend letting the helper backends
> do the parsing and therefore parallelize that as well.
>
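A concrete example of that unification: an INSERT into a five-column
table that supplies only two of the columns would then carry just
those two attributes plus the changed-bits, exactly like a sparse
UPDATE, and the applying backend would fill in defaults or NULL for
the rest.
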
> For other features, like parallel pg_dump or even parallel query
> execution, this tuple serialization code doesn't help much, IMO. So
> I'm thinking that optimizing it for Postgres-R's internal use is
> the best way to go.
>
> Comments? Opinions?
ISTM that both londiste and Slony would be able to make use of these
improvements as well. A modular replication system should be able to
use a variety of methods for logging data changes and then applying
them on a subscriber, so long as some kind of common transport can be
agreed upon (such as text). So a change capture and apply mechanism
that isn't dependent on a lot of extra stuff would be generally
useful to any replication system.
--
Decibel!, aka Jim C. Nasby, Database Architect decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828