Re: Streaming replication status

From:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Greg Smith <greg(at)2ndQuadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 21:02:38
Message-ID:	4B4CE36E.3010603@kaltenbrunner.cc
Lists:	pgsql-hackers

Simon Riggs wrote:
> On Tue, 2010-01-12 at 15:11 -0500, Bruce Momjian wrote:
>> Stefan Kaltenbrunner wrote:
>>> Simon Riggs wrote:
>>>> On Tue, 2010-01-12 at 08:24 +0100, Stefan Kaltenbrunner wrote:
>>>>> Fujii Masao wrote:
>>>>>> On Tue, Jan 12, 2010 at 1:21 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
>>>>>>> I don't think anybody can deploy this feature without at least some very
>>>>>>> basic monitoring here. I like the basic proposal you made back in September
>>>>>>> for adding a pg_standbys_xlog_location to replace what you have to get from
>>>>>>> ps right now:
>>>>>>> http://archives.postgresql.org/pgsql-hackers/2009-09/msg00889.php
>>>>>>>
>>>>>>> That's basic, but enough that people could get by for a V1.
>>>>>> Yeah, I have no objection to add such simple capability which monitors
>>>>>> the lag into the first release. But I guess that, in addition to that,
>>>>>> Simon wanted the capability to collect the statistical information about
>>>>>> replication activity (e.g., a transfer time, a write time, replay time).
>>>>>> So I'd like to postpone it.
>>>>> yeah getting that would all be nice and handy but we have to remember
>>>>> that this is really our first cut at integrated replication. Being able
>>>>> to monitor lag is what is needed as a minimum, more advanced stuff can
>>>>> and will emerge once we get some actual feedback from the field.
>>>> Though there won't be any feedback from the field because there won't be
>>>> any numbers to discuss. Just "it appears to be working". Then we will go
>>>> into production and the problems will begin to be reported. We will be
>>>> able to do nothing to resolve them because we won't know how many people
>>>> are affected.
>>> field is also production usage in my pov, and I'm not sure how we would
>>> know how many people are affected by some imaginary issue just because
>>> there is a column that has some numbers in it.
>>> All of the large features we added in the past got fine-tuned and
>>> improved in the following releases, and I expect SR to be one of them
>>> that will see a lot of improvement in 8.5+n.
>>> Adding detailed monitoring of some random stuff (I don't think there was
>>> a clear proposal of what kind of stuff you would like to see) while we
>>> don't really know what the performance characteristics are might easily
>>> lead to us providing a ton of data and nothing relevant :(
>>> What I really think we should do for this first cut is to make it as
>>> foolproof and easy to set up as possible and add the minimum required
>>> monitoring knobs but not going overboard with doing too many stats.
>> I totally agree. If SR isn't going to be useful without being
>> feature-complete, we might as well just drop it for 8.5 right now.
>>
>> Let's get a reasonable feature set implemented and then come back in 8.6
>> to improve it. For example, there is no need for a special
>> 'replication' user (just use super-user), and monitoring should be
>> minimal until we have field experience of exactly what monitoring we
>> need.
>>
>> The final commit-fest is in 5 days --- this is not the time for design
>> discussion and feature additions. If we wait for SR to be feature
>> complete, with design discussions, etc, we will hopelessly delay 8.5 and
>> people will get frustrated. I am not saying we can't talk about design,
>> but none of this should be a requirement for 8.5.
>
> We can't add monitoring until we know what the performance
> characteristics are. Hmmm. And how will we know what the performance
> characteristics are, I wonder?

Well, I would say we do exactly what we have done in the past with other
features: debug the stuff with low-level tools until we fully
understand what is really going on, and then we can always add more
"accessible" stats.

Stefan

In response to

Re: Streaming replication status at 2010-01-12 20:49:01 from Simon Riggs

Responses

Re: Streaming replication status at 2010-01-12 21:35:05 from Bruce Momjian
