Re: New Linux xfs/reiser file systems

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Trond Eivind Glomsrød <teg(at)redhat(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, sct(at)redhat(dot)com
Subject: Re: New Linux xfs/reiser file systems
Date: 2001-05-04 17:49:54
Message-ID: 200105041749.f44HnsJ29002@candle.pha.pa.us
Lists: pgsql-hackers

> I got some information from Stephen Tweedie on this - please keep him
> "Cc:" as he's not on this list
>
> ************************************************************************
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
>
> > I was talking to a Linux user yesterday, and he said that performance
> > using the xfs file system is pretty bad. He believes it has to do with
> > the fact that fsync() on log-based file systems requires more writes.
>
>
> Performance doing what? XFS has known performance problems doing
> unlinks and truncates, but not synchronous IO. The user should be
> using fdatasync() for databases, btw, not fsync().

This is hugely helpful. In PostgreSQL 7.1, we do use fdatasync() by
default if it is available on the platform.
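
To illustrate the "use fdatasync() if the platform has it" choice, here is
a minimal sketch. It is written in Python rather than the backend's C, the
helper name sync_data is hypothetical, and it is not PostgreSQL's actual
code:

```python
import os

def sync_data(fd):
    """Flush a file's data to disk and report which call was used.

    Prefers fdatasync(), which may skip the inode write when only the
    timestamps changed; falls back to fsync() on platforms without it.
    """
    if hasattr(os, "fdatasync"):
        os.fdatasync(fd)
        return "fdatasync"
    os.fsync(fd)  # fallback: flushes data and all inode metadata
    return "fsync"
```

A caller would write to an open file descriptor and then call sync_data(fd)
before treating the write as durable.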

> First, XFS, ext3 and reiserfs are *NOT* log-based filesystems. They
> are journaling filesystems. They have a log, but they are not
> log-based because they do not store data permanently in a log
> structure. Berkeley LFS, Sprite and Spiralog are log-based
> filesystems.

Sorry, I get those mixed up.

> > With a standard BSD/ext2 file system, WAL writes can stay on the same
> > cylinder to perform fsync. Is that true of log-based file systems?
>
> Not true on ext2 or BSD. Write-aheads are _usually_ close to the
> inode, but not always. For true log-based filesystems, writes are
> always completely sequential, so the issue just goes away. For
> journaling filesystems, depending on the setup there may be a seek to
> the journal involved, but some journaling filesystems can use a
> separate disk for the journal so no seek is required.
>
> > I know xfs and reiser are both log based. Do we need to be concerned
> > about PostgreSQL performance on these file systems? I use BSD FFS with
> > soft updates here, so it doesn't affect me.
>
> A database normally preallocates its data files and then performs most
> of its writes using update-in-place. In such cases, fsync() is almost
> always the wrong thing to be doing --- the data writes have changed
> nothing in the inode except for the timestamps, and there's no need to
> flush the timestamps to disk for every write. fdatasync() is
> designed for this --- if the only inode change is timestamps,
> fdatasync() will skip the seek to the inode and will only update the
> data. If any significant inode fields have been changed, then a full
> flush is done.

We do pre-allocate our log file space in chunks to avoid inode/block
index writes.
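
The preallocate-then-update-in-place pattern can be sketched roughly as
follows. This is an illustration in Python under stated assumptions, not
the 7.1 source: the function names preallocate and write_in_place, the
8192-byte block size, and the file sizes are all hypothetical.

```python
import os

def preallocate(path, size, block=8192):
    """Create `path` filled with `size` zero bytes, then flush fully once."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        zeros = bytes(block)
        remaining = size
        while remaining > 0:
            # os.write may write fewer bytes than asked; count what it wrote.
            remaining -= os.write(fd, zeros[:min(block, remaining)])
        os.fsync(fd)  # one full flush so the file size and block map are on disk
    finally:
        os.close(fd)

def write_in_place(path, offset, data):
    """Overwrite bytes inside the preallocated file; the file never grows."""
    fd = os.open(path, os.O_WRONLY)
    try:
        view = memoryview(data)
        written = 0
        while written < len(view):
            written += os.pwrite(fd, view[written:], offset + written)
        # Only timestamps changed in the inode, so a data-only sync suffices.
        getattr(os, "fdatasync", os.fsync)(fd)
    finally:
        os.close(fd)
```

Because the file already has its final size, the later writes change
nothing in the inode except timestamps, which is the case fdatasync() is
designed for.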

> Using fdatasync, most filesystems will incur no seeks for data flush,
> regardless of whether the filesystem is journaling or not.

Thanks. That is a big help. I wonder if people reporting performance
problems were using 7.0.3. We only added fdatasync() in 7.1.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
