Re: logical changeset generation v6.8

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical changeset generation v6.8
Date: 2013-12-12 00:08:39
Message-ID: 20131212000839.GA22776@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Lots of sensible comments removed, I plan to make changes to
address them.

On 2013-12-10 22:17:44 -0500, Robert Haas wrote:
> - I think this needs SGML documentation, same kind of thing we have
> for background workers, except probably significantly more. A design
> document with ASCII art in a different patch does not constitute
> usable documentation. I think it would fit best under section VII,
> internals, perhaps entitled "Writing a Logical Decoding Plugin".
> Looking at it, I rather wonder if the "Background Worker Processes"
> ought to be there as well, rather than under section V, server
> programming.

Completely agreed it needs that. I'd like the UI decisions in
http://www.postgresql.org/message-id/20131205001520.GA8935@awork2.anarazel.de
resolved before though.

> + logical_xmin =
> + ((volatile LogicalDecodingCtlData*) LogicalDecodingCtl)->xmin;
>
> Ugly.

We really should have a macro for this...

> Message style. I suggest treating a short write as ENOSPC; there is
> precedent elsewhere.
>
> I don't think there's much point in including "remapping" in all of
> the error messages. It adds burden for translators and users won't
> know what a remapping file is anyway.

It helps in locating wich part of the code caused a problem. I utterly
hate to get reports with error messages that I can't correlate to a
sourcefile. Yes, I know that verbose error output exists, but usually
you don't get it that way... That said, I'll try to make the messages
simpler.

> + /* use *GetUpdateXid to correctly deal with multixacts */
> + xmax = HeapTupleHeaderGetUpdateXid(new_tuple->t_data);
>
> I don't feel enlightened by that comment.

Well, it will return the update xid of a tuple locked with NO KEY SHARE
and updated (with NO KEY UPDATE). I.e. 9.3+ foreign key locking stuff.

> + if (!TransactionIdIsNormal(xmax))
> + {
> + /*
> + * no xmax is set, can't have any permanent ones, so
> this check is
> + * sufficient
> + */
> + }
> + else if (HEAP_XMAX_IS_LOCKED_ONLY(new_tuple->t_data->t_infomask))
> + {
> + /* only locked, we don't care */
> + }
> + else if (!TransactionIdPrecedes(xmax, cutoff))
> + {
> + /* tuple has been deleted recently, log */
> + do_log_xmax = true;
> + }
>
> Obfuscated. Rewrite without empty blocks.

I don't understand why having an empty block is less clear than
including a condition in several branches? Especially if the individual
conditions might need to be documented?

> + /*
> + * Write out buffer everytime we've too many in-memory entries.
> + */
> + if (state->rs_num_rewrite_mappings >= 1000 /* arbitrary number */)
> + logical_heap_rewrite_flush_mappings(state);
>
> What happens if the number of items per hash table entry is small but
> the number of entries is large?

rs_num_rewrite_mappings is the overall number of in-memory mappings, not
the number of per-entry mappings. That means we flush, even if all
entries have only a couple of mappings, as soon as 1000 in memory
entries have been collected. A bit simplistic, but seems okay enough?

> + /* XXX: should we warn about such files? */
>
> Nah.

Ok, will remove that comment from a couple locations then...

> + /* XXX: we could replay the transaction and
> prepare it as well. */
>
> Should we do that?

It would allow a neat feature, namely using 2PC to make sure that a
transaction commit on all the nodes connected using changeset
extraction. But I think that's a feature for later. Its use would have
to be optional anyway.

> All right, I'm giving up for now. These patches need a LOT of work on
> comments and documentation to be committable, even if the underlying
> architecture is basically sound.

> + * clog. If we're doing logical replication we can't do that though, so
> + * hold the lock for a moment longer.
>
> ...because why?

That's something pretty icky imo. But more in the existing HS code than
in this. Without the changes in the locking we can have the situation
that transactions are marked as running in the xl_running_xact record,
but are actually already committed. There's some code for HS that tries
to cope with that situation but I don't trust it very much, and it'd be
more complicated to make it work for logical decoding. I could reproduce
problems for the latter without those changes.

I'll add a comment explaining this.

> I'm still unhappy that we're introducing logical decoding slots but no
> analogue for physical replication. If we had the latter, would it
> share meaningful amounts of structure with this?

Yes, I think we could mostly reuse it, we'd probably want to add a field
or two more (application_name, sync_prio?). I have been wondering
whether some of the code in replication/logical/logical.c shouldn't be
in replication/slot.c or similar. So far I've opted for leaving it in
its current place since it would have to change a bit for a more general
role.

I still think that the technical parts of properly supporting persistent
slots for physical rep aren't that hard, but that the behavioural
decisions are. I think there are primarily two things for SR that we
want to prevent using slots:
a) removal of still used WAL (i.e. maintain knowledge about a standby's
last required REDO location)
b) make hot_standby_feedback work across disconnections of the walsender
connection (i.e peg xmin, not just for catalogs though)
c) Make sure we can transport those across cascading
replication.
once those are there it's also useful to keep a bit more information
about the state of replicas:
* write, flush, apply lsn

The hard questions that I see are like:
* How do we manage standby registration? Does the admin have to do that
manually? Or does a standby register itself automatically if some config
paramter is set?
* If automatically, how do we deal with the situation that registrant
dies before noting his own identifier somewhere persistent? My best idea
is a two phase registration process where registration in phase 1 are
thrown away after a restart, but yuck.
* How do we deal with the usability wart that people *will* forget to
delete a slot when removing a node?
* What requirements do we have for transporting knowlede about this
across a failover?

I have little hope that we can resolve all that for 9.4.

> + * noncompatible way, but those are prevented both on catalog + *
> tables and on user tables declared as additional catalog + * tables.
>
> Really?

Yes, I think so, c.f. ATRewriteTables():
/*
* We don't support rewriting of system catalogs; there are too
* many corner cases and too little benefit. In particular this
* is certainly not going to work for mapped catalogs.
*/
if (IsSystemRelation(OldHeap))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot rewrite system relation \"%s\"",
RelationGetRelationName(OldHeap))));

if (RelationIsUsedAsCatalogTable(OldHeap))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot rewrite table \"%s\" used as a catalog table",
RelationGetRelationName(OldHeap))));

Do you see situations where that's not sufficient? It's not even
dependent on allow_system_table_mods ...

> My eyes are starting to glaze over, so really stopping here.

There's quite a bit to do from that point, so I think you've more than
done your duty... Thanks!

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Claudio Freire 2013-12-12 00:30:37 Re: Optimize kernel readahead using buffer access strategy
Previous Message desmodemone 2013-12-12 00:06:44 Re: In-Memory Columnar Store