Re: performance enhancements for PostgreSQL: update

From: "Johnson, Shaunn" <SJohnson6(at)bcbsm(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: performance enhancements for PostgreSQL: update
Date: 2002-11-18 21:58:02
Message-ID: 73309C2FDD95D11192E60008C7B1D5BB04C74259@snt452.corp.bcbsm.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

--thanks for the reply all

--this is what i have so far:

--[snip]

> In an effort to enhance / streamline performance, I've done
> the following:
>
> * memory upgrade from 512M to 1G
> * move RAID5 to scsi drives (10K RPM)
> * set up cron script to vacuum database weekly
> * set number of client connects ( i.e., /usr/bin/postmaster -i -B 128 -N
64
> -d 4 )

--in the startup script, i launch / end postgres this way:

--[snip from startup]

su -l postgres -s /bin/sh -c "/usr/bin/pg_ctl -D $PGDATA -o '-i -B 128 -N 64
-d 2' \
-p /usr/bin/postmaster start >/dev/null "

--and--

su -l postgres -s /bin/sh -c "/usr/bin/pg_ctl stop -D $PGDATA -s -m fast" >
/dev/null 2>&1

--[/snip]

--i have been trying to use the postgres.conf file but i haven't
--had much success with it. will put that up higher in the priority
--food chain.

--can someone explain what the benefits for that (allocating
--more buffer blocks) will be? rather, can someone post a link
--to the docs that explains why it would benefit me?

You could probably allocate WAY more buffer blocks than that. I run 4000
on most of my medium weight machines, and with a gig of ram you could get
away with quite a bit more, but you'll need to increads shmmax and shmall
to go very high. But I don't think that's your major problem.

--working on it now ... had to take the '-d 4' part of my start up options
--down to a '-d 2' ... the system was thinking / writing for too long just
--to test a simple query ...

More important, don't start the postmaster that way. Edit the
$PGDATA/postgresql.conf file, then use the pg_ctl command to start and
stop it.

> But I'm at the point now that I can't kill some jobs. Yes, I know
> I shouldn't use 'kill' in any forceful way, but just a kill seems to do
> nothing (or, if it is doing something, it's not fast enough
> for the user community and it's stopping production).

--i've seen it first hand why it's not good, but, i got this not
--too long ago:

--[excerpt from old email]
Use kill -9. Do a:

killall -9 postgres
killall -9 postmaster

The tip message

'Don't kill -9 the postmaster'

is old and was added when havoc could be caused by an old
postgres backend process. Consider the scenrio:

1. postmaster started
2. postgres started (Session #1)
3. postmaster killed (-9)
4. postmaster restarted
5. postgres started (Session #2)

Now there isn't any synchronization between Sesison #1 and
Session #2 at all, which would lead to data corruption. This
scenario was fixed a long time ago (7.1?). The whole 'Don't kill
-9 the postmaster' comment was actually a tongue-in-cheek remark
by me regarding a parallel discussion of RedHat init scripts.
The corruption possibility has long-since been fixed. Since I've
seen FUD claiming PostgreSQL doesn't have sufficient
crash-recovery because of the tip, I suggest the tip be changed to:

'Feel free to kill -9 the postmaster'

--[/excerpt]

--is this true? perhaps someone can verify this.

You can kill individual backends pretty safely, it's the postmaster you
can't kill -9 safely. If you see a postgres child running away with all
your memory etc... you can kill -9 that pid pretty safely.

> I got this from a co-worker:
>
> [snip]
>
> the server is blocking on access to the metadata tables. not even
> logins are being processed. I'm not sure what caused the
> problem, but I think a database restart is the best course,
> which I have been trying to do. Do not kill -9, as it will corrupt the
> WAL.

--yup. tried that and nothing changed for about 20 minutes or so ... so
--i had to reboot the server (i *hate* doing that).

If you want to shut down the server and it doesn't seem to respond to
pg_ctl stop, try 'pg_ctl -m fast stop' and see if that works.

--[snip raid errors]

--this is what the /proc/mdstat says

--[snip from /proc/mdstats]

Personalities :
read_ahead not set
unused devices: <none>

--[/snip from /proc/mdstats]

That looks like a dead drive in your RAID array. What does 'cat
/proc/mdstat' say about the drive

--i'm doing a linux software raid

(are you using linux software raid, or a hardware controller?)

> * have can I figure out how access to the metadata tables
> are being stopped? (my guess is the error on the scsi drive, but ... )

Sounds like processes are hanging, and users are just trying to
reconnect over and over and you're running out of connections.
This is a symptom, not the problem, which is your machine is
having issues.

[snip rest of email]

--i'm thinking more and more that while someone was trying to
--update / insert data into a table the scsi disk stopped writing
--(or just gave a lot of errors while writing). the thing is, *some* people
--could do work (albeit very little).

--at any rate, i'll have to investigate more later (probably the
--weekend) and stress test the array.

--also, i hope to get more messages in the log file with a
--better debug level to help track the problem.

--thanks again!

-X

Responses

Browse pgsql-general by date

  From Date Subject
Next Message scott.marlowe 2002-11-18 22:30:57 Re: performance enhancements for PostgreSQL: update
Previous Message Justin Clift 2002-11-18 21:51:17 Looking for a "Linux on Playstation 2" person to compile PostgreSQL RPM's