From: | Craig Ringer <craig(at)2ndquadrant(dot)com> |
---|---|
To: | Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> |
Cc: | Kevin Grittner <kgrittn(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Alfred Perlstein <alfred(at)freebsd(dot)org>, Geoff Winkless <pgsqladmin(at)geoff(dot)dj>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Why we lost Uber as a user |
Date: | 2016-08-17 05:27:18 |
Message-ID: | CAMsr+YFXG_Y8gnhXd2_FLvpqRBLV0LTHYFHcKvfWg8rt_Yv-iA@mail.gmail.com |
Lists: | pgsql-hackers |
On 17 August 2016 at 08:36, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
> Something I didn't see mentioned that I think is a critical point: last I
> looked, HOT standby (and presumably SR) replays full page writes.
Yes, that's right, all WAL-based physical replication replays FPWs.
We could, at the cost of increased WAL size, retain both the original WAL
buffer that triggered the FPW and the FPW page image. That's what wal_level
= logical does in some cases. I'm not sure it's that compelling though; it
just introduces another redo path that can go wrong.
> Ultimately, people really need to understand the trade-offs to the
> different solutions so they can make an informed decision on which ones
> (yes, plural) they want to use. The same can be said about pg_upgrade vs
> something else, and the different ways of doing backups.
>
Right.
It's really bugging me that people are talking about "statement based"
replication in MySQL as if it's just sending SQL text around. MySQL's
statement based replication is a lot smarter than that, and in the
actually-works-properly form it's a hybrid of row and statement based
replication ("MIXED" mode). As I understand it, it lobs around something
closer to parsetrees with some values pre-computed rather than SQL text
where possible. It stores some computed values of volatile functions in the
binlog and reads them from there rather than computing them again when
running the statement on replicas, which is why AUTO_INCREMENT etc works.
It also falls back to row based replication where necessary for
correctness. Even then it has a significant list of caveats, but it's
pretty damn impressive. I didn't realise how clever the hybrid system was
until recently.
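
To make the "store the computed values in the log" idea concrete, here is a
toy sketch in Python. It is not MySQL's binlog format or API; the names
StatementEvent, Node, execute and replay are made up for illustration. The
only point is that the replica reuses the auto-increment id and timestamp the
primary picked instead of computing its own.

```python
import itertools
import time
from dataclasses import dataclass


@dataclass
class StatementEvent:
    """Toy log entry (hypothetical): statement argument plus values fixed on the primary."""
    name: str          # user-supplied value from the statement
    insert_id: int     # auto-increment value chosen by the primary
    timestamp: float   # clock reading taken by the primary


class Node:
    """Toy server holding one table as a list of (id, name, created) rows."""

    def __init__(self, first_id=1):
        self.rows = []
        self._ids = itertools.count(first_id)

    def execute(self, name):
        """Run the insert locally, computing the volatile values here."""
        event = StatementEvent(name, next(self._ids), time.time())
        self.rows.append((event.insert_id, event.name, event.timestamp))
        return event

    def replay(self, event):
        """Re-run the insert using the logged values, not local ones."""
        self.rows.append((event.insert_id, event.name, event.timestamp))


primary = Node()
replica = Node(first_id=1000)  # deliberately different local counter
log = [primary.execute(n) for n in ("alice", "bob")]
for ev in log:
    replica.replay(ev)
```

Even though the replica's own counter starts at 1000 and its clock reads
differently, it ends up with the same rows as the primary, because nothing
volatile is re-evaluated during replay.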
I can see it being desirable to do something like that eventually as an
optimisation to logical decoding based replication. Where we can show that
the statement is safe, or make it safe by doing things like evaluating and
substituting volatile function calls, we could xlog a modified parsetree with
oids changed to qualified object names etc, send that when decoding, and
execute that on the downstream(s). If there's something we can't show to be
safe then replay the logical rows instead. That's way down the track though; I
think it's more important to focus on completing logical row-based
replication to the point where we handle table rewrites seamlessly and it
"just works" first.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services