Re: Filesystem vs. Postgres for images

From: Anton Nikiforov <anton(at)nikiforov(dot)ru>
To: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
Cc: postgres list <pgsql-general(at)postgresql(dot)org>
Subject: Re: Filesystem vs. Postgres for images
Date: 2004-04-13 17:24:18
Message-ID: 407C2242.4050708@nikiforov.ru
Lists: pgsql-general

scott.marlowe wrote:

>On Tue, 13 Apr 2004, Christopher Petrilli wrote:
>
>
>
>>2. Retrieval time is limited not by disk bandwidth, but by I/O seek
>>performance. More spindles = more concurrent I/O in flight. Also, this
>>is where SCSI takes a massive lead with tag-command-queuing.
>>
>>In our case, we ended up using a three-tier directory structure, so
>>that we could manage the number of files per directory, and then
>>because load was relatively even across the top 20 "directories", we
>>split them onto 5 spindle-pairs (i.e. RAID-1). This is a place where
>>RAID-5 is your enemy. RAID-1, when implemented with read-balancing, is
>>a substantial performance increase.
>>
>>
>
>Please explain why RAID 5 is so bad here. I would think that on a not
>very heavily updated fs, RAID-5 would be the functional equivalent of a
>RAID 0 array with one fewer disks, wouldn't it? Or is RAID 0 also a bad
>idea (other than the unreliability of it) because it only puts the data on
>one spindle, unlike RAID-1 which puts it on many.
>
>In that case >2 drive RAID 1 setups might be a huge win. The linux kernel
>certainly supports them, and I think some RAID cards do too.
>
>Just wondering.
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 7: don't forget to increase your free space map settings
>
>
Hello All.
I'll try to explain the RAID scheme.
First of all, head movement takes 99% of the total data retrieval time
when you want to get a small block of data (actually one FS block).
Moving the HDD's heads takes something like 4-20 ms, while reading a
block of data (actually a whole cylinder will hit the disk's cache)
takes 1000 times less time.
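
The arithmetic behind that 99% figure can be sketched as follows. This is
only an illustration using the numbers quoted above (a 4-20 ms seek and a
block read that is about 1000 times shorter); the function name
seek_fraction is made up for this example and is not from any tool.

```python
def seek_fraction(seek_ms: float, transfer_ms: float) -> float:
    """Fraction of one small-block access that is spent moving the heads."""
    return seek_ms / (seek_ms + transfer_ms)


# Assumed figures, taken from the message: seek time of 4-20 ms and a
# block read that takes roughly 1000 times less than the seek.
for seek_ms in (4.0, 20.0):
    transfer_ms = seek_ms / 1000.0
    share = seek_fraction(seek_ms, transfer_ms)
    print(f"seek {seek_ms:4.1f} ms, read {transfer_ms:.3f} ms: "
          f"{share:.1%} of the access is head movement")
```

With a 1000:1 ratio the share is the same for any seek time, which is why
more spindles (more seeks in flight) help more than more bandwidth.
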
Now to the RAIDs:
It is true that the only RAID level that increases write speed is
RAID0. That is why all database developers recommend RAID0+1 or RAID 10
(they are different, but that is not the topic here). So if you need
write speed, you know the way.
Almost all RAID levels give you a read performance gain. Whether RAID5
is slower than RAID1 (or anything else) is a matter of disk subsystem
planning and configuration.
If your FS block size is 4k, then all disk I/O from the OS point of
view is reading 4k blocks, while in the RAID you could have a block
size configured to 4, 8, 16, 32, 64 or 128k.
Let's imagine three situations:
1. RAID block size is 4k and we have 3 disks in RAID5
The controller will read data by blocks, so it could get 2 blocks at a
time (the 3rd disk stores redundancy information). The situation is
exactly like using RAID1 with 2 disks.
2. RAID block size is 128k and we have 3 disks in RAID5
The controller will read the whole block even if you asked to read
only 4k (and you did, because of the FS request size). As you can see,
124k will hit the cache but will be useless.
But if your files are comparable in size to the block, or much larger
than a block, you will increase read performance dramatically (as with
video files that were written to the disks contiguously and are read
block by block).
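
To put numbers on the cases above, here is a small sketch. It assumes, as
the description above does, that the controller always reads whole RAID
blocks and that every request starts on a block boundary; real controllers
may behave differently. The name useless_kib is hypothetical.

```python
def useless_kib(request_kib: int, raid_block_kib: int) -> int:
    """KiB the controller reads beyond what was actually requested.

    Assumes whole-block reads and a request aligned to a block boundary.
    """
    blocks = -(-request_kib // raid_block_kib)  # ceiling division
    return blocks * raid_block_kib - request_kib


# (request size, RAID block size) in KiB
cases = [
    (4, 4),       # case 1: 4k FS request on a 4k RAID block
    (4, 128),     # case 2: 4k FS request on a 128k RAID block
    (1024, 128),  # a 1 MiB sequential read of a large file
]
for request_kib, raid_block_kib in cases:
    wasted = useless_kib(request_kib, raid_block_kib)
    print(f"request {request_kib}k, RAID block {raid_block_kib}k: "
          f"{wasted}k read but not used")
```

The small request on the large block wastes 124k per read, while the large
sequential read uses everything the controller fetched.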

So, if you have some time, try to "play" with your RAID 5 and you will
see the differences when you change the block size of your FS or the
RAID's stripe size. But you will see that a single disk always writes
data faster than RAID 5.

If you are talking about software RAID (supported by the kernel), it
will always be slower than a hardware one (you will lose at least 30%
of your system bus and CPU power to calculations and internal RAID5
data computing). With RAID 0/1 it is not so dramatic, but remember that
RAID1 support is in the kernel not to improve I/O performance but for
redundancy. And software RAID does not decrease your system downtime.

--
Best regards,
Anton Nikiforov
