Re: pg_ctl reload breaks our client

From: Marc Munro <marc(at)bloodnok(dot)com>
To: Michael Fuhr <mike(at)fuhr(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: pg_ctl reload breaks our client
Date: 2005-09-16 23:34:46
Message-ID: 1126913686.15766.49.camel@bloodnok.com
Lists: pgsql-general

Michael,
Many thanks for your response; it is much appreciated. My responses are
embedded below:

On Fri, 2005-09-16 at 17:10 -0600, Michael Fuhr wrote:
> On Fri, Sep 16, 2005 at 02:16:29PM -0700, Marc Munro wrote:
> > It is Postgres 7.3.6. The client is a multi-threaded C++ client. The
> > breakage was that one group of connections simply stopped. Others
> > continued without problem. It is not clear exactly what was going on.
>
> How did the connections "stop"? Were the connections broken, causing
> queries to fail? Or did queries block and never return? Or something
> else? What was happening that shouldn't happen, or what wasn't
> happening that should happen?

From the server side, there were simply connections (1 or 2) that
appeared idle. From the client side it looked like a query had been
initiated but the client thread was stuck in a library call (as near as
we can tell). This, vague though it is, is as much as I know right now.
We were unable to do much debugging as it is a production system and the
priority was to get it back up.

> If the connections were still active but not returning, did you do
> a process trace on the connection's postmaster or attach a debugger
> to it to see what it was doing?

No, time pressure prevented this.

> Could the timing of the problem have been coincidence? Have you
> ever seen the problem without a reload? How often do you see the
> problem after a reload? Do you know for certain that the application
> was working immediately before the reload and not working immediately
> after it?

It *could* be coincidence, but the problem began within 5 seconds of the
reload. Coincidence is unlikely.

> What operating system are you using?

Linux 2.4.20 smp i686

>
> > Nothing in our application logs gives us any clue to this.
>
> What about the postmaster logs?

Ah, now there's another story. Unavailable I'm afraid. Resolving that
is also on my priority list.

> > As for reproducibility, it has happened before in test environments when
> > we have bounced the database. This is not too shocking as I would
> > expect the client to notice this :-) It is a little more shocking when
> > it's a reload. Or maybe I have simply misunderstood what reload does.
>
> Can you reproduce the problem with a reload? A stop and start will
> terminate client connections, but a reload shouldn't.

This is not currently seen as a priority (the work-around of "don't do
that" is seen as sufficient). I'm simply hoping to get someone to say
for sure that the client app should not be able to tell that a reload
has happened. At that point I may be able to raise the priority of this
issue.

I would certainly like to do more investigation. If postgresql hackers
are interested in this strange event (please tell me for sure that it
*is* strange) that may also help me to get the necessary resources to
run more tests.

Thanks again.

__
Marc Munro
