From: | Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> |
---|---|
To: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
Cc: | Bruce Momjian <bruce(at)momjian(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Greg Smith <greg(at)2ndQuadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Streaming replication status |
Date: | 2010-01-12 21:02:38 |
Message-ID: | 4B4CE36E.3010603@kaltenbrunner.cc |
Lists: | pgsql-hackers |
Simon Riggs wrote:
> On Tue, 2010-01-12 at 15:11 -0500, Bruce Momjian wrote:
>> Stefan Kaltenbrunner wrote:
>>> Simon Riggs wrote:
>>>> On Tue, 2010-01-12 at 08:24 +0100, Stefan Kaltenbrunner wrote:
>>>>> Fujii Masao wrote:
>>>>>> On Tue, Jan 12, 2010 at 1:21 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
>>>>>>> I don't think anybody can deploy this feature without at least some very
>>>>>>> basic monitoring here. I like the basic proposal you made back in September
>>>>>>> for adding a pg_standbys_xlog_location to replace what you have to get from
>>>>>>> ps right now:
>>>>>>> http://archives.postgresql.org/pgsql-hackers/2009-09/msg00889.php
>>>>>>>
>>>>>>> That's basic, but enough that people could get by for a V1.
>>>>>> Yeah, I have no objection to add such simple capability which monitors
>>>>>> the lag into the first release. But I guess that, in addition to that,
>>>>>> Simon wanted the capability to collect the statistical information about
>>>>>> replication activity (e.g., a transfer time, a write time, replay time).
>>>>>> So I'd like to postpone it.
>>>>> yeah getting that would all be nice and handy but we have to remember
>>>>> that this is really our first cut at integrated replication. Being able
>>>>> to monitor lag is what is needed as a minimum, more advanced stuff can
>>>>> and will emerge once we get some actual feedback from the field.
>>>> Though there won't be any feedback from the field because there won't be
>>>> any numbers to discuss. Just "it appears to be working". Then we will go
>>>> into production and the problems will begin to be reported. We will be
>>>> able to do nothing to resolve them because we won't know how many people
>>>> are affected.
>>> field is also production usage in my pov, and I'm not sure how we would
>>> know how many people are affected by some imaginary issue just because
>>> there is a column that has some numbers in it.
>>> All of the large features we added in the past got fine-tuned and
>>> improved in the following releases, and I expect SR to be one of them
>>> that will see a lot of improvement in 8.5+n.
>>> Adding detailed monitoring of some random stuff (I don't think there was
>>> a clear proposal of what kind of stuff you would like to see) while we
>>> don't really know what the performance characteristics are might easily
>>> lead to us providing a ton of data and nothing relevant :(
>>> What I really think we should do for this first cut is to make it as
>>> foolproof and easy to set up as possible and add the minimum required
>>> monitoring knobs but not going overboard with doing too many stats.
>> I totally agree. If SR isn't going to be useful without being
>> feature-complete, we might as well just drop it for 8.5 right now.
>>
>> Let's get a reasonable feature set implemented and then come back in 8.6
>> to improve it. For example, there is no need for a special
>> 'replication' user (just use super-user), and monitoring should be
>> minimal until we have field experience of exactly what monitoring we
>> need.
>>
>> The final commit-fest is in 5 days --- this is not the time for design
>> discussion and feature additions. If we wait for SR to be feature
>> complete, with design discussions, etc, we will hopelessly delay 8.5 and
>> people will get frustrated. I am not saying we can't talk about design,
>> but none of this should be a requirement for 8.5.
>
> We can't add monitoring until we know what the performance
> characteristics are. Hmmm. And how will we know what the performance
> characteristics are, I wonder?
Well, I would say we do exactly what we have done in the past with other
features: debug it with low-level tools until we fully understand how it
really behaves, and then we can always add more "accessible" stats.
Stefan