From: | Hannu Krosing <hannu(at)2ndQuadrant(dot)com> |
---|---|
To: | Greg Stark <stark(at)mit(dot)edu> |
Cc: | Christopher Browne <cbbrowne(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL Mailing Lists <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility |
Date: | 2012-10-23 10:41:06 |
Message-ID: | 50867442.7070309@2ndQuadrant.com |
Lists: | pgsql-hackers |
On 10/23/2012 01:31 AM, Greg Stark wrote:
> On Wed, Oct 17, 2012 at 7:48 PM, Christopher Browne <cbbrowne(at)gmail(dot)com> wrote:
>> Well, replication is arguably a relevant case.
>>
>> For Slony, the origin/master node never cares about logged changes - that
>> data is only processed on replicas. Now, that's certainly a little weaselly
>> - the log data (sl_log_*) has got to get read to get to the replica.
> Well this is a clever way for Slony to use existing infrastructure to
> get data into the WAL. But wouldn't it be more logical for an in-core
> system to just annotate the existing records with enough information
> to replay them logically?
The QUEUE / LOG ONLY TABLES / WRITE ONLY TABLES :) proposal
was _not_ for use in standard replication - it is already covered by
what is being done - but for cases where the data is needed _only_
on the slave/replay side.
One typical case is sending e-mail on some database actions, like
sending a greeting or confirmation mail when creating a new user.
On a busy system you often want to offload the things that can be
done asynchronously to other hosts.
My RFC was for a proposal to skip writing the unneeded info in local
tables and put it _only_ in WAL.
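As a rough illustration only (not the proposed implementation, and not
any existing PostgreSQL API), the idea can be sketched as below. All names
here (WalStream, LogOnlyQueue, replay) are hypothetical, and a plain
in-memory list stands in for the WAL: the producer side appends events to
the stream without keeping a local table, and only the replay side reads
them back and acts on them.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class WalStream:
    """Stand-in for the WAL: an append-only sequence of records (assumption)."""
    records: List[dict] = field(default_factory=list)

    def append(self, record: dict) -> int:
        self.records.append(record)
        return len(self.records) - 1  # position of the new record in the stream


@dataclass
class LogOnlyQueue:
    """Hypothetical log-only queue: inserts go to the stream, nothing is kept locally."""
    name: str
    wal: WalStream

    def insert(self, payload: dict) -> int:
        # No local table write; the event exists only in the stream.
        return self.wal.append({"queue": self.name, "payload": payload})


def replay(wal: WalStream, queue_name: str, start: int,
           handler: Callable[[dict], None]) -> int:
    """Run handler on every event of one queue from position `start` onward.

    Returns the position to resume from on the next call.
    """
    pos = start
    for record in wal.records[start:]:
        if record["queue"] == queue_name:
            handler(record["payload"])
        pos += 1
    return pos


if __name__ == "__main__":
    wal = WalStream()
    mail_queue = LogOnlyQueue("welcome_mail", wal)
    mail_queue.insert({"user": "alice", "template": "greeting"})

    # On the replay side: hand each event to whatever sends the mail.
    next_pos = replay(wal, "welcome_mail", 0, lambda event: print("send", event))
```

The replay side only has to remember the position it got to, which is
the kind of bookkeeping a queue consumer would need anyway.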
> Instead of synthesizing inserts into an
> imaginary table containing data that can be extracted to retrieve info
> about some other record, just add the info needed to the relevant
> record.
This is more or less how the current system is being designed,
only the "add enough relevant info" part is offloaded to a logical
version of WALSender.
> The minimum needed for DML afaict is DELETE and UPDATE records need
> the primary key of the record being deleted and updated. It might make
> sense to include the whole tupledesc or at least key parts of it like
> the attlen and atttyp array so that replay can be more robust. But the
> logical place for this data, it seems to me, is *in* the update or
> insert record that already exists. Otherwise managing logical
> standbies will require a whole duplicate set of infrastructure to keep
> track of what has and hasn't been replayed. For instance what if an
> update record is covered by a checkpoint but the logical record falls
> after the checkpoint and the system crashes before writing it out?
>
This complexity (which is really a lot more than you briefly
described here) is the reason the construction of the "update records"
from WAL records was moved back to the master side. In the original design
it was hoped that it could all be done on the slave by keeping its own
time-synced copy of the system catalog.
Currently it seems to play out reasonably well, but I'd not completely
rule out some new complexities arising which would force the creation
of (more of the) full logical DML records as part of WAL.
The downside would be performance, which in the current case is mostly
unaffected on the write side, but would be affected a lot more if the WAL
volume had to increase significantly to accommodate all the info needed for
LogRep.
---------------
Hannu