Could not read directory "pg_xlog": Invalid argument (on SSD Raid)

From: Data Growth Pty Ltd <datagrowth(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Could not read directory "pg_xlog": Invalid argument (on SSD Raid)
Date: 2009-11-04 04:30:28
Message-ID: 51549ea20911032030n31ae047r227ce9b0a1c821aa@mail.gmail.com
Lists: pgsql-general

I'm frequently getting these errors in my console:

4/11/09 2:25:04 PM org.postgresql.postgres[192] ERROR: could not read
directory "pg_xlog": Invalid argument
4/11/09 2:25:56 PM org.postgresql.postgres[192] ERROR: could not read
directory "pg_xlog": Invalid argument
4/11/09 2:36:03 PM org.postgresql.postgres[192] ERROR: could not read
directory "pg_xlog": Invalid argument

and rarely:

3/11/09 10:32:31 PM org.postgresql.postgres[217] ERROR: could not
read directory "pg_clog": Invalid argument

It is clearly not failing all the time, as the pg_xlog directory is full of
files that keep being touched and updated. I have not experienced data loss
(yet), but large queries are taking orders of magnitude longer than I would
like.

System:

Mac Pro Quad Nehalem 2.93GHz, 16GB RAM running Snow Leopard OS X 10.6.1 in
64-bit mode

Postgres 8.4.1 (Intel 64 bit) from
http://www.kyngchaos.com/software:postgres
(I have also tried compiling from source; I get the same problems plus a
few extra installation issues. The "official" postgresql binary from
http://www.enterprisedb.com/ is not 64-bit.)

The postgres data directory is on an SSD Raid 0 array. It can support
around 10K random read I/O per second, or 5K random write I/Os, sustained,
in other applications. pg_xlog and pg_clog are on the same SSD raid array as
the postgres DB.

Under postgres it does several thousand I/Os per second for about 1-2
seconds, then drops back to only about 50 I/Os per second for about 10
seconds, before repeating the cycle. CPU usage is usually only a couple of
percent. The console often records the error message 'could not read
directory "pg_xlog": Invalid argument' during those short activity bursts.

I've looked at the source code in src/port/dirmod.c:

pgfnames(const char *path)
{
    ....
    while ((file = readdir(dir)) != NULL)
    {
        ....
        errno = 0;
    }
    ....
    if (errno)
    {
        ....
        fprintf(stderr, _("could not read directory \"%s\": %s\n"),
                path, strerror(errno));
        ....
    }
    ....
}

So it seems that readdir is returning "Invalid argument" occasionally. But
I do not understand how this error could possibly occur in this location.
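
To make the readdir()/errno logic easier to reason about, here is a minimal
standalone sketch of the same idiom. It is not the PostgreSQL code:
count_dir_entries is a hypothetical helper written only for illustration. It
clears errno immediately before every readdir() call, so a non-zero errno
after readdir() returns NULL can only have been set by that final call.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of PostgreSQL): count the entries in a
 * directory, skipping "." and "..".
 *
 * Returns 0 and stores the count in *count on success.
 * Returns -1 on failure, with errno describing the problem.
 */
int
count_dir_entries(const char *path, long *count)
{
    DIR           *dir;
    struct dirent *file;
    long           n = 0;
    int            saved_errno;

    dir = opendir(path);
    if (dir == NULL)
        return -1;              /* errno was set by opendir() */

    for (;;)
    {
        errno = 0;              /* clear before every readdir() call */
        file = readdir(dir);
        if (file == NULL)
            break;              /* end of directory, or an error */

        if (strcmp(file->d_name, ".") != 0 &&
            strcmp(file->d_name, "..") != 0)
            n++;
    }

    saved_errno = errno;        /* closedir() may overwrite errno */
    closedir(dir);

    if (saved_errno != 0)
    {
        fprintf(stderr, "could not read directory \"%s\": %s\n",
                path, strerror(saved_errno));
        errno = saved_errno;
        return -1;
    }

    *count = n;
    return 0;
}
```

Calling count_dir_entries("pg_xlog", &n) in a loop from a small test program
while the server is busy should show whether readdir() itself reports
"Invalid argument" on this volume, independently of postgres.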

I've searched for "pg_xlog": Invalid argument, and the only other mention I
have found was on Linux running on a ram disk.

Could this be a race condition? Suggestions?

Stephen
