From: David Christensen <david(dot)christensen(at)crunchydata(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: Initdb-time block size specification
Date: 2023-06-30 21:40:20
Message-ID: CAOxo6XLz0O-UtFBPsc-whzooJ2mXHYmkgXm64OKEDwEbP=vLkg@mail.gmail.com
Lists: pgsql-hackers
On Fri, Jun 30, 2023 at 4:17 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2023-06-30 15:05:54 -0500, David Christensen wrote:
> > > I am fairly certain this is going to be causing substantial performance
> > > regressions. I think we should reject this even if we don't immediately find
> > > them, because it's almost guaranteed to cause some.
> >
> > What would be considered substantial? Some overhead would be expected,
> > but I think having an actual patch to evaluate lets us see what
> > potential there is.
>
> Anything beyond 1-2%, although even that imo is a hard sell.
I'd agree that that threshold seems like a reasonable target; anything
much above it would be regressive.
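Methodology-wise, I'd assume the comparison is just standard pgbench runs
against stock and patched builds. A minimal sketch, with scale, client
count, and duration picked arbitrarily:

    createdb bench
    pgbench -i -s 50 bench           # initialize at scale factor 50
    pgbench -c 16 -j 4 -T 60 bench   # 16 clients, 4 threads, 60s run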
> > > Besides this, I've not really heard any convincing justification for needing
> > > this in the first place.
> >
> > Doing this would open up experiments with larger block sizes, so we
> > could have larger indexable tuples, say, or store data types that
> > currently exceed the per-tuple row limit without dropping to TOAST
> > (native vector data types come to mind as a candidate here).
>
> You can do experiments today with the compile time option. Which does not
> require regressing performance for everyone.
Sure, not arguing that this is more performant than the current approach.
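For reference, the compile-time route here is configure's --with-blocksize
switch (value in kB). A minimal sketch of such an experimental build, with
paths purely illustrative:

    ./configure --with-blocksize=16 --prefix=/tmp/pg16k
    make -j8 install
    /tmp/pg16k/bin/initdb -D /tmp/pg16k/data
    /tmp/pg16k/bin/pg_ctl -D /tmp/pg16k/data -l /tmp/pg16k/log start
    /tmp/pg16k/bin/psql -c 'SHOW block_size;' postgres   # reports 16384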
> > We've had 8k blocks for a long time while hardware has improved over 20+
> > years, and it would be interesting to see how tuning things would open up
> > additional avenues for performance without requiring packagers to make a
> > single choice on this regardless of use-case.
>
> I suspect you're going to see more benefits from going to a *lower* setting
> than a higher one. Some practical issues aside, plenty of storage hardware
> these days would allow getting rid of FPIs if you go to 4k blocks (although
> it often requires explicit sysadmin action to reformat the drive into that
> mode etc). But obviously that's problematic from the "postgres limits" POV.
>
>
> If we really wanted to do this - but I don't think we do - I'd argue for
> working on the buildsystem support to build the postgres binary multiple
> times, for 4, 8, 16 kB BLCKSZ and having a wrapper postgres binary that just
> exec's the relevant "real" binary based on the pg_control value. I really
> don't see us ever wanting to make BLCKSZ runtime configurable within one
> postgres binary. There's just too much intrinsic overhead associated with
> that.
You may well be right, but until we've actually tried that and measured
it, it's hard to say for sure. More parts of the system than just the
backend depend on BLCKSZ, and every extension would also need to be
built per block size, so quite a bit more would have to change. (That's
neither here nor there, since my approach touches all of those places
too, so the churn isn't inherently bad; I'm just saying that iterating
the build over block sizes isn't by itself a trivial solution.)
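For what it's worth, the dispatch half of that wrapper idea looks nearly
trivial on its own. A rough sketch, with binary names and install layout
hypothetical, reading the block size back out of pg_control:

    #!/bin/sh
    # Hypothetical wrapper: exec the postgres binary whose compiled-in
    # BLCKSZ matches what pg_control records for this cluster.
    # Assumes PGDATA is set; real code would also have to handle -D etc.
    blcksz=$(pg_controldata "$PGDATA" | awk '/Database block size/ {print $4}')
    exec "/usr/lib/postgresql/postgres-$((blcksz / 1024))k" "$@"

The exec itself is cheap; it's the build matrix and the per-size extension
builds above that make it non-trivial.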
David