Quick Links

Re: Read/Write block sizes

From:	Chris Browne <cbbrowne(at)acm(dot)org>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: Read/Write block sizes
Date:	2005-08-23 22:09:09
Message-ID:	60k6icpapm.fsf@dba2.int.libertyrms.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

J(dot)K(dot)Shah(at)Sun(dot)COM (Jignesh Shah) writes:
>> Does that include increasing the size of read/write blocks? I've
>> noticedthat with a large enough table it takes a while to do a
>> sequential scan, even if it's cached; I wonder if the fact that it
>> takes a million read(2) calls to get through an 8G table is part of
>> that.
>
> Actually some of that readaheads,etc the OS does already if it does
> some sort of throttling/clubbing of reads/writes. But its not enough
> for such types of workloads.
>
> Here is what I think will help:
>
> * Support for different Blocksize TABLESPACE without recompiling the
> code.. (Atlease support for a different Blocksize for the whole
> database without recompiling the code)
>
> * Support for bigger sizes of WAL files instead of 16MB files
> WITHOUT recompiling the code.. Should be a tuneable if you ask me
> (with checkpoint_segments at 256.. you have too many 16MB files in
> the log directory) (This will help OLTP benchmarks more since now
> they don't spend time rotating log files)
>
> * Introduce a multiblock or extent tunable variable where you can
> define a multiple of 8K (or BlockSize tuneable) to read a bigger
> chunk and store it in the bufferpool.. (Maybe writes too) (Most
> devices now support upto 1MB chunks for reads and writes)
>
> *There should be a way to preallocate files for TABLES in
> TABLESPACES otherwise with multiple table writes in the same
> filesystem ends with fragmented files which causes poor "READS" from
> the files.
>
> * With 64bit 1GB file chunks is also moot.. Maybe it should be
> tuneable too like 100GB without recompiling the code.
>
> Why recompiling is bad? Most companies that will support Postgres
> will support their own binaries and they won't prefer different
> versions of binaries for different blocksizes, different WAL file
> sizes, etc... and hence more function using the same set of binaries
> is more desirable in enterprise environments

Every single one of these still begs the question of whether the
changes will have a *material* impact on performance.

What we have been finding, as RAID controllers get smarter, is that it
is getting increasingly futile to try to attach knobs to 'disk stuff;'
it is *way* more effective to add a few more spindles to an array than
it is to fiddle with which disks are to be allocated to what database
'objects.'

The above suggested 'knobs' are all going to add to complexity and it
is NOT evident that any of them will forcibly help.

I could be wrong; code contributions combined with Actual Benchmarking
would be the actual proof of the merits of the ideas.

But it also suggests another question, namely...

Will these represent more worthwhile improvements to speed than
working on other optimizations that are on the TODO list?

If someone spends 100h working on one of these items, and gets a 2%
performance improvement, that's almost certain to be less desirable
than spending 50h on something else that gets a 4% improvement.

And we might discover that memory management improvements in Linux
2.6.16 or FreeBSD 5.5 allow some OS kernels to provide some such
improvements "for free" behind our backs without *any* need to write
database code. :-)
--
let name="cbbrowne" and tld="ntlug.org" in name ^ "@" ^ tld;;
http://www3.sympatico.ca/cbbrowne/postgresql.html
Wiener's Law of Libraries:
There are no answers, only cross references.

In response to

Re: Read/Write block sizes (Was: Caching by Postgres) at 2005-08-23 21:29:01 from Jignesh Shah

Responses

Re: Read/Write block sizes at 2005-08-23 23:24:24 from Michael Stone
Re: Read/Write block sizes at 2005-08-23 23:36:08 from Jim C. Nasby
Re: Read/Write block sizes at 2005-08-24 01:25:43 from Steve Poe

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Michael Stone	2005-08-23 23:12:38	Re: Read/Write block sizes (Was: Caching by Postgres)
Previous Message	William Yu	2005-08-23 21:59:42	Re: Caching by Postgres