Re: Having trouble with backups (was: Re: Crash Recovery)

From: Carlos Moreno <moreno(at)mochima(dot)com>
To: PgSQL Performance ML <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Having trouble with backups (was: Re: Crash Recovery)
Date: 2003-01-25 15:22:51
Message-ID: 3E32ABCB.6070003@mochima.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


Tom Lane wrote:

>I said:
>
>>Carlos Moreno <moreno(at)mochima(dot)com> writes:
>>
>>>One quick question, Tom, before some general comments and
>>>reply to the other messages... Where would I specify any
>>>locks the software wants to do?
>>>
>
>>If you are not issuing any explicit "LOCK" SQL commands, then you can
>>disregard my theory.
>>

Well, it was a good thing that you brought it to my attention.
Yes, two minutes after I wrote the message I found the docs
that told me it is an SQL command -- which means that I'm
positively sure that I'm not doing any of those :-) I guess
a well-developed software could use some locks here and there,
and the risk of making a mistake and "over-blocking" things is
there...

>>Actually, that's too simple. Are you creating and dropping tables,
>>or issuing schema-change commands (such as ADD COLUMN or RENAME)?
>>All of those things take exclusive locks on the tables they modify.
>>Ordinary SELECT/INSERT/UPDATE/DELETE operations can run in parallel with
>>pg_dump, but messing with the database structure is another story.
>>

I do that (changing the database schema while the system is
running) once in a while -- but not on a regular basis, and
definitely never during the time a pg_dump is in progress
(*that* would have scared me to death ;-))

>
>I guess the real question here is whether your app is actually stopped
>dead (as it would be if waiting for a lock), or just slowed to a crawl
>(as a performance problem could do). I cannot tell if your "frozen"
>description is hyperbole or literal truth.
>

Actually, you got me on that one... From the "practical" point of
view, you could say it's literal truth (i.e., the system responsiveness
falls to ZERO). The system is an online multi-player game, where the
main things the database is doing is holding the users information
to process the login authentications, and logging results and the
progress of games (to later -- offline -- compute statistics,
rankings, etc.). Logging is done on a separate worker thread, so
it shouldn't matter if that stops for a few minutes (the lists of
SQL's pending to be executed would just grow during that time)...

But the thing is, when I run pg_dump, the games freeze, you are
absolutely unable to connect (the server does not respond, period),
and the players that are in a game, playing, massively abandon
games, and you then see comments in the chat window that the
server went down, etc. (i.e., I take it the server stopped
responding to them and they abandoned thinking that the connection
had dropped, or that the server had died).

Now, I guess a more specific answer to your question is important
(i.e., is the above behaviour the result of the system slowing to
a crawl, or is it that the software just hung on a single db.Exec
statement in the main loop and no single line of code is being
executed until the pg_dump finishes? -- according to the comments
so far, I would say this last option is not possible), and I think
I'll get such an answer when running some tests as suggested by you
and others that replied.

>One thing that might help diagnose it is to look at the output of ps
>auxww (or ps -ef on SysV-ish platforms) to see what all the backends are
>currently doing while the problem exists.
>

We have done (IIRC) top (the command "top", that is), and yes, the
postmaster process takes a lot of CPU... (not sure of the exact
numbers, but it was at the top).

Anyway, thanks again guys for the valuable comments and ideas!!

Carlos
--

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2003-01-25 16:16:12 Re: WAL replay logic (was Re: [PERFORM] Mount options for Ext3?)
Previous Message Andrew Sullivan 2003-01-25 15:20:49 Re: Having trouble with backups (was: Re: Crash Recovery)