From: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improvement of checkpoint IO scheduler for stable transaction responses
Date: 2013-07-03 19:23:03
Message-ID: 51D47A17.6000809@archidevsys.co.nz
Lists: pgsql-hackers
On 04/07/13 01:31, Robert Haas wrote:
> On Wed, Jul 3, 2013 at 4:18 AM, KONDO Mitsumasa
> <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> I tested with segsize=0.25GB, the maximum partitioned-table file size
>> (the default is 1GB), set via the configure option
>> (./configure --with-segsize=0.25). I thought a small segsize would help
>> the fsync phase and the OS's background disk writes during checkpoints,
>> and I got a significant improvement in the DBT-2 results!
> This is interesting. Unfortunately, it has a significant downside:
> potentially, there will be a lot more files in the data directory. As
> it is, the number of files that exist there today has caused
> performance problems for some of our customers. I'm not sure off-hand
> to what degree those problems have been related to overall inode
> consumption vs. the number of files in the same directory.
>
> If the problem is mainly with the number of files in the same
> directory, we could consider revising our directory layout. Instead
> of:
>
> base/${DBOID}/${RELFILENODE}_{FORK}
>
> We could have:
>
> base/${DBOID}/${FORK}/${RELFILENODE}
>
> That would move all the vm and fsm forks to separate directories,
> which would cut down the number of files in the main-fork directory
> significantly. That might be worth doing independently of the issue
> you're raising here. For large clusters, you'd even want one more
> level to keep the directories from getting too big:
>
> base/${DBOID}/${FORK}/${X}/${RELFILENODE}
>
> ...where ${X} is two hex digits, maybe just the low 16 bits of the
> relfilenode number. But this would be not as good for small clusters
> where you'd end up with oodles of little-tiny directories, and I'm not
> sure it'd be practical to smoothly fail over from one system to the
> other.
>
16 bits ==> 4 hex digits

Could you perhaps start with 1 hex digit and automagically increase it
to 2, 3, ... as needed? There could be a status file at that level
indicating the current number of hex digits, plus a temporary mapping
file during the transition.
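To make the idea concrete, here is a rough sketch (in Python, with purely hypothetical names; nothing like this exists in PostgreSQL) of how the bucket subdirectory might be derived from the relfilenode and the current digit count read from that status file:

```python
def bucket_path(dboid, fork, relfilenode, hex_digits):
    """Hypothetical sketch of the proposed layout with a variable digit
    count: bucket on the low 4*hex_digits bits of the relfilenode,
    zero-padded to hex_digits hex digits."""
    bucket = relfilenode & ((1 << (4 * hex_digits)) - 1)
    return f"base/{dboid}/{fork}/{bucket:0{hex_digits}x}/{relfilenode}"

# Growing from 1 to 2 digits moves the same file into a finer bucket:
print(bucket_path(16384, 0, 173511, 1))  # base/16384/0/7/173511
print(bucket_path(16384, 0, 173511, 2))  # base/16384/0/c7/173511
```

Because each wider digit count refines the previous buckets (the low 4 bits are a prefix of the low 8 bits), a transition only ever moves files one level deeper, which is what the temporary mapping file would have to track.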
Cheers,
Gavin