Re: Some questions about mammoth replication

From: Hannu Krosing <hannu(at)skype(dot)net>
To: Alexey Klyukin <alexk(at)commandprompt(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some questions about mammoth replication
Date: 2007-10-12 10:47:44
Message-ID: 1192186064.16408.7.camel@hannu-laptop
Lists: pgsql-hackers

On Fri, 2007-10-12 at 12:39, Alexey Klyukin wrote:
> Hannu Krosing wrote:
>
> > > We have hooks in the executor calling our own collecting functions, so
> > > we don't need the trigger machinery to launch replication.
> >
> > But where do you store the collected info - in your own replication_log
> > table, or do you reuse data in the WAL, extracting it on the master
> > before replicating to the slave (or on the slave after moving the WAL)?
>
> We use neither a log table in the database nor the WAL. The data to
> replicate is stored in disk files, one per transaction.

Clever :)

How well does it scale? That is, at what transaction rate can your
replication keep up with the database?
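
Just to check that I understand the mechanism, I imagine the collecting
hook doing something roughly like this - every name below is invented
by me, not your actual API:

    /* Hypothetical sketch of an executor-level collecting hook. */
    #include <stdio.h>
    #include <stdint.h>

    static FILE *txn_spool = NULL;   /* one spool file per transaction,
                                      * reset at commit (not shown) */

    /* imagined to be called by the executor for each changed tuple */
    void
    collect_tuple_for_replication(uint32_t xid, char cmd_type,
                                  const void *tuple_data, size_t len)
    {
        if (txn_spool == NULL)
        {
            char path[64];
            snprintf(path, sizeof(path), "replicator/txn_%u.dat",
                     (unsigned) xid);
            txn_spool = fopen(path, "ab");
        }
        fwrite(&cmd_type, 1, 1, txn_spool);       /* I/U/D marker */
        fwrite(&len, sizeof(len), 1, txn_spool);  /* payload length */
        fwrite(tuple_data, 1, len, txn_spool);    /* binary tuple image */
    }

Is that about right?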

> As Joshua said,
> the WAL is used to ensure that only those transactions that are recorded
> as committed in the WAL are sent to slaves.

How do you force the correct commit order when applying the transactions?
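
If I read that right, the commit-time filter would conceptually look
something like the following, using the same clog check core does -
both helper functions are names I made up:

    #include "postgres.h"
    #include "access/transam.h"

    extern void enqueue_for_slaves(TransactionId xid);   /* invented */
    extern void discard_spool_file(TransactionId xid);   /* invented */

    void
    maybe_ship_transaction(TransactionId xid)
    {
        if (TransactionIdDidCommit(xid))
            enqueue_for_slaves(xid);      /* committed: ship the file */
        else if (TransactionIdDidAbort(xid))
            discard_spool_file(xid);      /* aborted: drop the file */
        /* otherwise still in progress - check again later */
    }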

> >
> > > > Do you make use of snapshot data to determine which parts of the WAL
> > > > are worth migrating to slaves, or do you just apply everything in the
> > > > WAL in separate transactions and abort if you find out that the
> > > > original transaction aborted?
> > >
> > > We check if a data transaction is recorded in the WAL before sending
> > > it to a slave. For an aborted transaction we just discard all data
> > > collected from that transaction.
> >
> > Do you duplicate PostgreSQL's MVCC code for that, or does this happen
> > automatically by using MVCC itself for the collected data?
>
> Every transaction command that changes data in a replicated relation is
> stored on disk. PostgreSQL MVCC code is used on a slave in a natural way
> when transaction commands are replayed there.

Do you replay several transaction files in the same transaction on the
slave?

Can you replay several transaction files in parallel?
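
To make the question concrete, the slave-side loop I have in mind is
roughly this - the types and helpers are invented, only the xact.h
calls are real:

    #include "postgres.h"
    #include "access/xact.h"

    typedef struct TxnFile TxnFile;                /* opaque, invented */
    extern TxnFile *next_file_in_commit_order(void);
    extern bool     read_command(TxnFile *f, void *cmd);
    extern void     apply_command(void *cmd);      /* heap/index calls */

    /* one spooled transaction file becomes one transaction on the
     * slave, applied strictly in commit order */
    void
    replay_loop(void)
    {
        TxnFile *f;
        char     cmd[8192];

        while ((f = next_file_in_commit_order()) != NULL)
        {
            StartTransactionCommand();
            while (read_command(f, cmd))
                apply_command(cmd);
            CommitTransactionCommand();
        }
    }

Or can two such loops run at once?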

> > How do you handle really large inserts/updates/deletes, which change say 10M
> > rows in one transaction ?
>
> We produce really large disk files ;). When a transaction commits, a
> special queue lock is acquired and the transaction is enqueued to a
> sending queue.
> Since the locking mode for that lock is exclusive, a commit of a
> very large transaction would delay commits of other transactions while
> the lock is held. We are working on minimizing the time this lock is
> held in the new version of Replicator.

Why does it take longer to queue a large file? Do you copy data from
one file to another?
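
I picture the commit path as something like the sketch below - the lock
and the helper are invented names, just to show where my question comes
from:

    #include "postgres.h"
    #include "storage/lwlock.h"

    extern LWLockId ReplicationQueueLock;                  /* invented */
    extern void append_to_send_queue(const char *path);    /* invented */

    void
    enqueue_at_commit(const char *spool_path)
    {
        LWLockAcquire(ReplicationQueueLock, LW_EXCLUSIVE);
        append_to_send_queue(spool_path);  /* ideally just a rename or
                                            * pointer swap, not a copy */
        LWLockRelease(ReplicationQueueLock);
    }

If the enqueue is just a rename, the file size should not matter while
the lock is held.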

> > > > Do you extract / generate full sql DML queries from data in WAL logs, or
> > > > do you apply the changes at some lower level ?
> > >
> > > We replicate the binary data along with a command type. Only the data
> > > necessary to replay the command on a slave are replicated.
> >
> > Do you replay it as SQL insert/update/delete commands, or directly on
> > heap/indexes ?
>
> We replay the commands directly using heap/index functions on a slave.

Does that mean that the table structures will be exactly the same on
both master and slave? That is, do you replicate a physical table image
(maybe not including the transaction ids on the master)?

Or do you just use lower-level versions of INSERT/UPDATE/DELETE?
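
In other words, is the apply side closer to this sketch? The core calls
are real, the surrounding plumbing is my invention:

    #include "postgres.h"
    #include "access/heapam.h"
    #include "utils/rel.h"

    /* replay one replicated INSERT by writing the tuple directly */
    void
    replay_insert(Relation rel, HeapTuple tup)
    {
        simple_heap_insert(rel, tup);   /* write the heap tuple */
        /* index entries would have to be maintained separately,
         * e.g. through the index_insert() machinery */
    }

If so, the physical tuple layout would have to match on both sides.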

---------------------
Hannu
