Re: Initdb-time block size specification

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: David Christensen <david(dot)christensen(at)crunchydata(dot)com>
Cc: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: Initdb-time block size specification
Date: 2023-09-01 14:57:36
Message-ID: CA+Tgmoa5o-Zf+xADmOUWArZKb=8hD-rTonYP5cEXN_FwGUbRpQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 31, 2023 at 2:32 PM David Christensen
<david(dot)christensen(at)crunchydata(dot)com> wrote:
> Here's a patch atop the series which converts to 16-bit uints and
> passes regressions, but I don't consider well-vetted at this point.

For what it's worth, my gut reaction to this patch series is similar
to that of Andres: I think it will be a disaster. If the disaster is
not evident to us, that's far more likely to mean that we've failed to
test the right things than it is to mean that there is no disaster.

I don't see that there is a lot of upside, either. I don't think we
have a lot of evidence that changing the block size is really going to
help performance. In fact, my guess is that there are large amounts of
code that are heavily optimized, without the authors even realizing
it, for 8kB blocks, because that's what we've always had. If we had
much larger or smaller blocks, the structure of heap pages or of the
various index AMs used for blocks might no longer be optimal, or might
be less optimal than they are for an 8kB block size. If you use really
large blocks, your blocks may need more internal structure than we
have today in order to avoid CPU inefficiencies. I suspect there's
been so little testing of non-default block sizes that I wouldn't even
count on the code to not be outright buggy.
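
To make the point about limits derived from the block size concrete, here is a minimal sketch of how a per-page tuple bound scales with it. It follows the general shape of PostgreSQL's MaxHeapTuplesPerPage calculation, but the byte sizes below are assumptions for a typical 64-bit build and are not taken from this thread; the function name is hypothetical.

```python
# Assumed sizes for a typical 64-bit PostgreSQL build (not stated in this thread).
PAGE_HEADER_SIZE = 24    # fixed page header at the start of every block
TUPLE_HEADER_SIZE = 24   # 23-byte heap tuple header rounded up to 8-byte alignment
LINE_POINTER_SIZE = 4    # one line pointer (item identifier) per tuple


def max_heap_tuples_per_page(block_size: int) -> int:
    """Upper bound on heap tuples that fit in one block of the given size.

    Hypothetical helper: every tuple costs at least a header plus a line
    pointer, so the bound is the usable space divided by that minimum cost.
    """
    if block_size <= PAGE_HEADER_SIZE:
        raise ValueError("block size must exceed the page header size")
    return (block_size - PAGE_HEADER_SIZE) // (TUPLE_HEADER_SIZE + LINE_POINTER_SIZE)


# Print the bound for a few power-of-two block sizes.
for block_size in (4096, 8192, 16384, 32768):
    print(block_size, max_heap_tuples_per_page(block_size))
```

Under these assumptions the bound is 291 for an 8kB block and 1169 for a 32kB block. Any array, bitmap, or loop sized from such a constant scales with it, which is one way code can end up tuned to the default without anyone intending it.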

If we could find a safe way to get rid of full page writes, I would
certainly agree that that was worth considering. I'm not sure that
anything in this thread adds up to that being a reasonable way to go,
but the savings would be massive.

I feel like the proposal here is a bit like deciding to change the
speed limit on all American highways from 65 mph or whatever it is to
130 mph or 32.5 mph and see which way works out best. The whole
infrastructure has basically been designed around the current rules.
The rate of curvature of the roads is appropriate for the speed that
you're currently allowed to drive on them. The vehicles are optimized
for long-term operation at about that speed. The people who drive the
vehicles are accustomed to driving at that speed, and the people who
maintain them are accustomed to the problems that happen when you
drive them at that speed. Just changing the speed limit doesn't change
all that other stuff, and changing all that other stuff is a truly
massive undertaking. Maybe this example somewhat overstates the
difficulties here, but I do think the difficulties are considerable.
The fact that we have 8kB block sizes has affected the thinking of
hundreds of developers over decades in making thousands or tens of
thousands or hundreds of thousands of decisions about algorithm
selection and page format and all kinds of stuff. Even if some other
page size seems to work better in a certain context, it's pretty hard
to believe that it has much chance of being better overall, even
without the added overhead of run-time configuration.

--
Robert Haas
EDB: http://www.enterprisedb.com
