From: | "Roberts, Jon" <Jon(dot)Roberts(at)asurion(dot)com> |
---|---|
To: | "Magnus Hagander" <magnus(at)hagander(dot)net> |
Cc: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: pg crashing |
Date: | 2008-07-02 13:26:45 |
Message-ID: | 1A6E6D554222284AB25ABE3229A92762E9A68F@nrtexcus702.int.asurion.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-hackers |
> Roberts, Jon wrote:
> >> Tom Lane wrote:
> >>> "Roberts, Jon" <Jon(dot)Roberts(at)asurion(dot)com> writes:
> >>>> Version: "PostgreSQL 8.3.0, compiled by Visual C++ build 1400"
> >>> Well, there are plenty of known bugs in 8.3.0 by now. You really
> >>> should update before complaining, not after.
> >> Yes. And the traditional question should be asked - is there any
> >> antivirus or other "personal security" software running on tihs
> > machine?
> >> If so, uninstall (not just disable!) it and see if the problem goes
> > away.
> >
> > I am not able to un-install this. However, this problem only
started as
> > the database grew in size and usage. It is 232 MB in size now and
it
> > has 30 or so active sessions 24x7.
>
> But you do run antivirus on the machine? Which antivirus?
We are running Computer Associates eTrust Antivirus. The realtime
scanner has an exclusion of e:\PostgreSQL and all subdirectories.
>
> It's quite possible that this is the reason. As the files grow, the AV
> may take longer to do whatever crap it's doing to them, thus exceeding
> timeouts.
>
The log files are located here: e:\PostgreSQL\data\pg_log
Before I had that exclusion on the directory, the database would crash
because data files would get locked. We don't get those errors now at
all so I don't think this problem is caused by the antivirus program.
We do have lots of sessions running at once. It seems that the extra
load on the server could also cause it to exceed the timeout.
> In general, you can get all sorts of strange things when you run AV on
> your database server. The general recommendation is never to do that
if
> you want things to work, and it will be the first thing you're told to
> remove.
>
I agree to an extent. I've used Oracle and SQL Server on Windows and
when the antivirus program scans any database files, it screws it up.
However, I've never had a problem with an antivirus program once the
database directories were excluded.
> It would be good if you could at least temporarily remove it and see
if
> it fixes the issue.
>
I can't do that. In this corporate environment, I would get shot if I
did that.
>
> >> It would also be interesting to know if there are actually any
other
> >> processes running at this time.
> >>
> >
> > Yes, there were about 30 active sessions executing functions.
>
> Right, but are the processes still active at the moment whrere it
> crashes, or has the postmaster managed to kill them off for the
restart?
>
Yes, there are active sessions.
Could this be the problem?
We have a queue table in which we only allow x number of concurrent
jobs. A job is first inserted into the Queue with a Status of "Queued".
Next the job executes a function to update the Status to "Processing".
Function Details:
1. Creates a db_link
2. "lock table gp_load.queue in access exclusive mode"
3. executes this in the linked session:
select count(*) from gp_load.queue where status = 'Processing'
4. if current jobs less than max jobs allowed then
Update gp_load.queue set status = 'Processing'
Commit
5. Close connection
6. Returns the status of still Queued or Processing
I have to use a dblink because PostgreSQL doesn't have autonomous
transactions. We call a single function to do all of this work and the
other jobs need to see the current Status so they know if they should
wait or run right now.
The calling function uses a while loop to execute this function. If the
function returns Queued, it means it was unable to change the status so
it uses pg_sleep to wait 10 seconds and then it trys again until it is
able to set the status to Processing.
Sorry for the lengthy answer but yes, we could have active sessions
using pg_sleep waiting for a slot in the Queue to process a job. The
log file shows that this happened too.
Jon
From | Date | Subject | |
---|---|---|---|
Next Message | Dave Coventry | 2008-07-02 13:43:30 | Re: SAST FATAL: could not access private key file "server.key" |
Previous Message | Zbigniew Baniewski | 2008-07-02 13:26:06 | SET CLIENT_ENCODING |
From | Date | Subject | |
---|---|---|---|
Next Message | Magnus Hagander | 2008-07-02 14:10:10 | Re: Location for pgstat.stat |
Previous Message | Bernd Helmle | 2008-07-02 13:09:55 | Re: A Windows x64 port of MySQL |