Re: Slow count(*) again...

From: Scott Carey <scott(at)richrelevance(dot)com>
To: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Joshua Tolley <eggyknap(at)gmail(dot)com>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "mladen(dot)gogala(at)vmsinfo(dot)com" <mladen(dot)gogala(at)vmsinfo(dot)com>, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 16:02:39
Message-ID: 0CA660FA-F59C-494E-BFA9-98CE8E527378@richrelevance.com
Lists: pgsql-hackers pgsql-performance


On Oct 11, 2010, at 9:21 PM, Samuel Gendler wrote:

On Mon, Oct 11, 2010 at 9:06 PM, Scott Carey <scott(at)richrelevance(dot)com> wrote:
I can't speak to documentation, but it is something that helps as your I/O subsystem gets more powerful. How much it helps depends more on your hardware, which may have adaptive read-ahead on its own, and on your file system, which may be more or less efficient at sequential I/O. For example, ext3 out of the box gets a much bigger gain from tuning read-ahead than XFS on a DELL PERC6 RAID card (but still ends up slower).

Geez. I wish someone would have written something quite so bold as 'xfs is always faster than ext3' in the standard tuning docs. I couldn't find anything that made a strong filesystem recommendation. How does xfs compare to ext4? I wound up on ext4 on a dell perc6 raid card when an unexpected hardware failure on a production system caused my test system to get thrown into production before I could do any serious testing of xfs. If there is a strong consensus that xfs is simply better, I could afford the downtime to switch.

As it happens, this is a system where all of the heavy workload is in the form of sequential scan type load. The OLTP workload is very minimal (tens of queries per minute on a small number of small tables), but there are a lot of reporting queries that wind up doing sequential scans of large partitions (millions to tens of millions of rows). We've sized the new hardware so that the most commonly used partitions fit into memory, but if we could speed the queries that touch less frequently used partitions, that would be good. I'm the closest thing our team has to a DBA, which really only means that I'm the one person on the dev team or the ops team to have read all of the postgres docs and wiki and the mailing lists. I claim no actual DBA experience or expertise and have limited cycles to devote to tuning and testing, so if there is an established wisdom for filesystem choice and read ahead tuning, I'd be very interested in hearing it.

ext4 is a very fast file system. It's faster than ext2, but has many more features and has the all-important journaling.

However, for large reporting queries and sequential scans, XFS will win in the long run if you use the online defragmenter. Otherwise, your sequential scans won't be all that sequential on any file system over time if your tables aren't written once, forever, serially. Parallel restore will result in a system that is fragmented -- ext4 will do best at limiting this on the restore, but only xfs has online defragmentation. We schedule ours daily and it noticeably improves sequential scan I/O.

Supposedly, an online defragmenter is in the works for ext4, but it may be years before it's available.
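
For anyone who wants to see how fragmented their table files are before deciding, here is a minimal Python sketch. It assumes the filefrag utility from e2fsprogs is installed and that its summary lines look like "path: N extents found"; the function names are made up for illustration, and the paths you pass in would be the relation files under your own data directory. It only reports extent counts, it does not defragment anything (on XFS that job belongs to xfs_fsr).

```python
import re
import subprocess

# Assumed filefrag summary format, one line per file, e.g.
#   /path/to/file: 37 extents found
#   /path/to/other: 1 extent found
_SUMMARY = re.compile(r"^(?P<path>.+): (?P<extents>\d+) extents? found")


def parse_filefrag(output):
    """Map each path in filefrag's summary output to its extent count."""
    counts = {}
    for line in output.splitlines():
        match = _SUMMARY.match(line.strip())
        if match:
            counts[match.group("path")] = int(match.group("extents"))
    return counts


def extent_counts(paths):
    """Run filefrag on the given files (usually needs root for a data directory)."""
    result = subprocess.run(
        ["filefrag", *paths], capture_output=True, text=True, check=True
    )
    return parse_filefrag(result.stdout)


def most_fragmented(counts, top=10):
    """Return the `top` (path, extents) pairs with the highest extent counts."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:top]
```

Running most_fragmented(extent_counts(files)) before and after a defragmentation pass gives a rough idea of whether the large partitions are still laid out sequentially. A high extent count on a 1 GB segment file is a hint, not proof, that sequential scans are paying for seeks.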
