Re: parametric block size?

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parametric block size?
Date: 2014-07-26 15:40:50
Message-ID: 20140726154050.GF17793@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2014-07-26 12:50:30 +0200, Fabien COELHO wrote:
> >>The default blocksize is currently 8k, which is not necessarily optimal for
> >>all setups, especially with SSDs where the latency is much lower than HDD.
> >
> >I don't think that really follows.
>
> The rationale, which may be proven false, is that with a SSD the latency
> penalty for reading and writing randomly vs sequentially is much lower than
> for HDD, so there is less incentive to group stuff in larger chunks on that
> account.

A higher number of blocks has overhead unrelated to this though:
Increased waste/lower storage density as it becomes more frequent that
tuples don't fit into a page; more locks; higher number of buffer
headers; more toasted rows; smaller toast chunks; more vacuuming/heap
pruning WAL records, ...
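To make the storage density point concrete, here is a rough sketch rather
than a model of the real page layout. It assumes fixed-size tuples, a
24-byte page header and a 4-byte line pointer per tuple, and it ignores
alignment padding, fillfactor and TOAST. The name page_fill and the
600-byte tuple size are hypothetical, chosen only for illustration.

```python
# Rough estimate of how much of a page is not tuple data.
# Assumptions (simplified): 24-byte page header, 4-byte line pointer
# per tuple, fixed-size tuples, no alignment padding or fillfactor.
PAGE_HEADER = 24
LINE_POINTER = 4


def page_fill(block_size, tuple_size):
    """Return (tuples_per_page, fraction_of_page_not_holding_tuple_data)."""
    usable = block_size - PAGE_HEADER
    per_tuple = tuple_size + LINE_POINTER
    n = usable // per_tuple
    wasted = block_size - n * tuple_size
    return n, wasted / block_size


if __name__ == "__main__":
    # 600 bytes is an arbitrary, hypothetical tuple size.
    for kb in (1, 2, 4, 8, 16, 32):
        n, frac = page_fill(kb * 1024, 600)
        print(f"{kb:>2} kB page: {n:>3} tuples, {frac:.1%} not tuple data")
```

Under these assumptions the smaller the page, the larger the share of it
that holds no tuple data, since fewer tuples amortize the fixed overhead
and the leftover space at the end of the page.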

Now obviously there's also an inverse to this, otherwise we'd all be
using 1GB page sizes. But I don't think storage latency has much to do
with it - it's imo more about write amplification (i.e. turning a single
row update into an 8/4/16/32 kB write).
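As a back-of-the-envelope illustration of that write amplification, the
sketch below assumes the worst case where every single-row update causes
one full block to be written, and ignores WAL, batching of several
updates into one page write, and device-level effects. The function name
and the 128-byte row size are hypothetical.

```python
# Worst-case write amplification: one full block written per row update.
# This is a simplification; real systems batch changes per page and
# also write WAL, so measured numbers will differ.


def write_amplification(block_size, row_size):
    """Bytes written per byte of row data changed, under the assumption above."""
    return block_size / row_size


if __name__ == "__main__":
    row_size = 128  # hypothetical row size in bytes
    for kb in (4, 8, 16, 32):
        factor = write_amplification(kb * 1024, row_size)
        print(f"{kb:>2} kB block: {factor:.0f}x")
```

Under this assumption the factor grows linearly with the block size,
which is the pressure against very large pages.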

> >>There is a case for different values with significant impact on performance
> >>(up to a not-to-be-sneezed-at 10% on a pgbench run on SSD, see
> >>http://www.cybertec.at/postgresql-block-sizes-getting-started/) and ISTM
> >>that the ability to align PostgreSQL block size to the underlying FS/HW
> >>block size would be nice.
> >
> >I don't think that benchmark is very meaningful. Way too small scale, way
> >too short runtime (there'll be barely any checkpoints, hot pruning, vacuum
> >at all).
>
> These benchmarks have the merit to exist, to be consistent (the smaller the
> blocksize, the better the performance), and ISTM that the performance
> results suggest that this is worth investigating.

Well, it's easy to make claims that aren't meaningful with bad
benchmarks.

Those numbers are *far* too low for the presented SSD - invalidating the
entire thing. That's the speed you'd expect for rotating media, not an
SSD. My laptop has the 1TB variant of that disk and I get nearly 10 times
that number of TPS. With a parallel make running, a profiler started, and
assertions enabled.

This isn't an actual benchmark, sorry. It's SEO.

> Possibly the "small" scale means that data fit in memory, so the benchmarks
> as run emphasize write performance linked to the INSERT/UPDATE.

Well, the generated data is 160MB in size. Nobody with a concurrent,
write-heavy OLTP load has that little data.

> What would you suggest as meaningful for scale and run time, say on a
> dual-core 8GB memory 256GB SSD laptop?

At the very least scale 100 - then it likely doesn't fit into
internal caches on common consumer drives anymore. But more importantly
the test has to run over several checkpoint cycles, so hot pruning and
vacuuming are also measured.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
