From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical changeset generation v6.8
Date: 2013-12-16 05:53:10
Message-ID: CA+TgmobBmuAAgy8JV=veCtDK9Nj-5Bjo_4VF+=TtfpXS+JfKgQ@mail.gmail.com
Lists: pgsql-hackers
On Wed, Dec 11, 2013 at 7:08 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> I don't think there's much point in including "remapping" in all of
>> the error messages. It adds burden for translators and users won't
>> know what a remapping file is anyway.
>
> It helps in locating which part of the code caused a problem. I utterly
> hate to get reports with error messages that I can't correlate to a
> source file. Yes, I know that verbose error output exists, but usually
> you don't get it that way... That said, I'll try to make the messages
> simpler.
Well, we could adopt a policy of not making messages originating from
different locations in the code the same. However, it looks to me
like that's NOT the current policy, because some care has been taken
to reuse messages rather than distinguish them. There's no hard and
fast rule here, because some cases are distinguished, but my gut
feeling is that all of the errors your patch introduces are
sufficiently obscure cases that separate messages with separate
translations are not warranted. The time when this is likely to fail
is when someone borks the permissions on the data directory, and the
user probably won't need to care exactly which file we can't write.
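For instance, the write failures could just use the same generic wording we
already use elsewhere, something like this (illustrative only, not the exact
call sites in your patch):

    ereport(ERROR,
            (errcode_for_file_access(),
             errmsg("could not write file \"%s\": %m", path)));

And with log_error_verbosity = verbose the source file and line of the
ereport() call come out anyway, so the location isn't really lost.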
>> +    if (!TransactionIdIsNormal(xmax))
>> +    {
>> +        /*
>> +         * no xmax is set, can't have any permanent ones, so this check is
>> +         * sufficient
>> +         */
>> +    }
>> +    else if (HEAP_XMAX_IS_LOCKED_ONLY(new_tuple->t_data->t_infomask))
>> +    {
>> +        /* only locked, we don't care */
>> +    }
>> +    else if (!TransactionIdPrecedes(xmax, cutoff))
>> +    {
>> +        /* tuple has been deleted recently, log */
>> +        do_log_xmax = true;
>> +    }
>>
>> Obfuscated. Rewrite without empty blocks.
>
> I don't understand why having an empty block is less clear than
> including a condition in several branches? Especially if the individual
> conditions might need to be documented?
It's not a coding style we typically use, to my knowledge. Of course,
what is actually clearest is a matter of opinion, but my own
experience is that such blocks typically end up seeming clearer to me
when the individual comments are joined together into a paragraph that
explains in full sentences what's going on.
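To be concrete, here is roughly the shape I have in mind, with the same logic
(an untested sketch):

    /*
     * Log the tuple's xmax if it has been deleted recently: xmax must be a
     * normal transaction ID (if none is set, there can't be any permanent
     * xmax either), it must not be a locker only, and it must not precede
     * the cutoff.
     */
    if (TransactionIdIsNormal(xmax) &&
        !HEAP_XMAX_IS_LOCKED_ONLY(new_tuple->t_data->t_infomask) &&
        !TransactionIdPrecedes(xmax, cutoff))
        do_log_xmax = true;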
>> + /*
>> + * Write out buffer everytime we've too many in-memory entries.
>> + */
>> + if (state->rs_num_rewrite_mappings >= 1000 /* arbitrary number */)
>> + logical_heap_rewrite_flush_mappings(state);
>>
>> What happens if the number of items per hash table entry is small but
>> the number of entries is large?
>
> rs_num_rewrite_mappings is the overall number of in-memory mappings, not
> the number of per-entry mappings. That means we flush, even if all
> entries have only a couple of mappings, as soon as 1000 in memory
> entries have been collected. A bit simplistic, but seems okay enough?
Possibly, not sure yet. I need to examine that logic in more detail,
but I had trouble following it and was hoping the next version would
be better-commented.
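Just to check that I have the intent right, the accounting seems to amount to
roughly this (identifier names approximate, not necessarily the patch's):

    /* one more mapping queued for this xid's entry, and one more overall */
    entry->num_mappings++;
    state->rs_num_rewrite_mappings++;

    /* flush all entries, not just the large ones, once the overall count is big */
    if (state->rs_num_rewrite_mappings >= 1000)
        logical_heap_rewrite_flush_mappings(state);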
>> I'm still unhappy that we're introducing logical decoding slots but no
>> analogue for physical replication. If we had the latter, would it
>> share meaningful amounts of structure with this?
>
> Yes, I think we could mostly reuse it; we'd probably want to add a field
> or two more (application_name, sync_prio?). I have been wondering
> whether some of the code in replication/logical/logical.c shouldn't be
> in replication/slot.c or similar. So far I've opted for leaving it in
> its current place since it would have to change a bit for a more general
> role.
I strongly favor moving the slot-related code to someplace with "slot"
in the name, and replication/slot.c seems about right. Even if we
don't extend them to cover non-logical replication in this release,
we'll probably do it eventually, and it'd be better if that didn't
require moving large amounts of code between files.
> I still think that the technical parts of properly supporting persistent
> slots for physical rep aren't that hard, but that the behavioural
> decisions are. I think there are primarily two things for SR that we
> want to prevent using slots:
> a) removal of still used WAL (i.e. maintain knowledge about a standby's
> last required REDO location)
Check.
> b) make hot_standby_feedback work across disconnections of the walsender
> connection (i.e. peg xmin, not just for catalogs though)
Check; might need to be optional.
> c) Make sure we can transport those across cascading
> replication.
Not sure I follow.
> Once those are there, it's also useful to keep a bit more information
> about the state of replicas:
> * write, flush, apply lsn
Check.
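To make (a), (b), and those status fields concrete, this is the sort of
per-slot state I would expect to keep in shared memory (a hypothetical
sketch, not your patch's actual struct):

    typedef struct PhysicalSlotData
    {
        NameData        name;           /* DBA-assigned identifier */
        XLogRecPtr      restart_lsn;    /* (a) oldest WAL the standby still needs */
        TransactionId   xmin;           /* (b) oldest xid whose tuples must be kept */
        XLogRecPtr      write_lsn;      /* last location written by the standby */
        XLogRecPtr      flush_lsn;      /* last location flushed by the standby */
        XLogRecPtr      apply_lsn;      /* last location applied by the standby */
    } PhysicalSlotData;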
> The hard questions that I see are like:
> * How do we manage standby registration? Does the admin have to do that
> manually? Or does a standby register itself automatically if some config
> parameter is set?
> * If automatically, how do we deal with the situation that the registrant
> dies before noting its own identifier somewhere persistent? My best idea
> is a two-phase registration process where registrations from phase 1 are
> thrown away after a restart, but yuck.
If you don't know the answers to these questions for the kind of
replication that we have now, then how do you know the answers for
logical replication? Conversely, what makes the answers that you've
selected for logical replication unsuitable for our existing
replication?
I have to admit that before I saw your design for the logical
replication slots, the problem of making this work for our existing
replication stuff seemed almost intractable to me; I had no idea how
that was going to work. GUCs didn't seem suitable because I thought we
might need some data that is tabular in nature - i.e. configuration
specific to each slot. And GUC has problems with that sort of thing.
And a new system catalog didn't seem good either, because you are
highly likely to want different configuration on the standby vs. on
the master. But after I saw your design for the logical slots I said,
dude, get me some of that. Keeping the data in shared memory,
persisting them across shutdowns, and managing them via either
function calls or the replication command language seems perfect.
Now, in terms of how registration works, whether for physical
replication or logical, it seems to me that the DBA will have to be
responsible for telling each client the name of the slot to which that
client should connect (omitting it at said DBA's option if, as for
physical replication, the slot is not mandatory). Assuming such a
design, clients could elect for one of two possible behaviors when the
anticipated slot doesn't exist: either they error out, or they create
it. Either is reasonable; we could even support both. A third
alternative is to make each client responsible for generating a name,
but I think that's probably not as good. If we went that route, the
name would presumably be some kind of random string, which will
probably be a lot less usable than a DBA-assigned name. The client
would first generate it, second save it somewhere persistent (like a
file in the data directory), and third create a slot by that name. If
the client crashes before creating the slot, it will find the name in
its persistent store after restart and, upon discovering that no such
slot exists, try again to create it.
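Sketched out, that third option would look something like this on the client
side (every helper here is hypothetical, just to show the shape of it):

    /* hypothetical client-side logic for self-assigned slot names */
    slotname = read_slot_name_from_state_file();
    if (slotname == NULL)
    {
        slotname = generate_random_slot_name();
        save_slot_name_to_state_file(slotname);    /* persist before creating */
    }

    /* safe to retry after a crash: if the slot already exists, just use it */
    if (!slot_exists_on_server(slotname))
        create_slot_on_server(slotname);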
But note that no matter which of those three options we pick, the
server support really need not work any differently. I can imagine
any of them being useful and I don't care all that much which one we
end up with. My personal preference is probably for manual
registration: if the DBA wants to use slots, said DBA will need to set
them up. This also mitigates the issue you raise in your next point:
what is manually created will naturally also need to be manually
destroyed.
> * How do we deal with the usability wart that people *will* forget to
> delete a slot when removing a node?
Aside from the point mentioned in the previous paragraph, we don't.
The fact that a standby which gets too far behind can't recover is a
usability wart, too. I think this is a bit like asking what happens
if a user keeps inserting data of only ephemeral value into a table
and never deletes any of it as they ought to have done. Well, they'll
be wasting disk space; potentially, they will fill the disk. Sucks to
be them. The solution is not to prohibit inserting data.
> * What requirements do we have for transporting knowledge about this
> across a failover?
>
> I have little hope that we can resolve all that for 9.4.
I wouldn't consider this a requirement for a useful feature, and if it
is a requirement, then how are you solving it for logical replication?
For physical replication, there's basically only one event that needs
to be considered: master dead, promote standby. That scenario can
also happen in logical replication... but there's a whole host of
other things that can happen, too. An individual replication solution
may be single-writer but may, like Slony, allow that write authority
to be moved around. Or it may have multiple writers.
Suppose A, B, and C replicate between themselves using multi-master
replication for write availability across geographies. Within each
geo, A physically replicates to A', B to B', and C to C'. There is
also a reporting server D which replicates selected tables from each
of A, B, and C. On a particularly bad day, D is switched to replicate
a particular table from B rather than A while at the same time B is
failed over to B' in the midst of a network outage that renders B and
B' only intermittently network-accessible, leading to a high rate of
multi-master conflicts with C that require manual resolution, which
doesn't happen until the following week. In the meantime, B' is
failed back to B and C is temporarily removed from the multi-master
cluster and allowed to run standalone. If you can develop a
replication solution that leaves D with the correct data at the end of
all of that, my hat is off to you ... and the problem of getting all
this to work when only physical replication is in use, and the number
of possible scenarios is much smaller, ought to seem simple by comparison.
But whether it does or not, I don't see why we have to solve it in
this release. Following standby promotions, etc. is a whole feature,
or several, unto itself.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company