Re: pgsql: Optimize btree insertions for common case of increasing values

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-committers <pgsql-committers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgsql: Optimize btree insertions for common case of increasing values
Date: 2018-04-09 19:48:22
Message-ID: CAH2-Wzm171z4Q7=waBeTRyZp9-JPYwEYLCWWoTvr9_YzCQeEAQ@mail.gmail.com
Lists: pgsql-committers

On Thu, Apr 5, 2018 at 10:16 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I think you can take that wording almost verbatim. Obviously it should
> refer to the optimization by name, and blend into the surrounding text
> in the README. I suggest putting a small section before "On-the-Fly
> Deletion Of Index Tuples", but after the main discussion of deletion +
> recycling. It's essentially an exception to the general rule, so that
> placement makes sense to me.

I also think that we should say something about extent-based
storage. This optimization relies on the assumption that reading some
stale block cannot read a block from some other relation (which could
perhaps be its own rightmost leaf page). If we ever wanted to share
storage between small relations as extents, that would invalidate the
optimization.
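
To make the assumption concrete, here is a toy model, not PostgreSQL code. Every name in it (Page, looks_like_rightmost_leaf, index_a, index_b, block 7) is hypothetical, and the check is a simplified stand-in for whatever the real fastpath verifies. It contrasts a per-relation block address space with a shared one, and shows that a check which only inspects page contents cannot tell that a stale block number now points at another relation's rightmost leaf page.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Page:
    owner: str          # relation that currently owns the page (toy bookkeeping)
    is_leaf: bool
    is_rightmost: bool
    first_key: int


def looks_like_rightmost_leaf(page: Optional[Page], new_key: int) -> bool:
    """Hypothetical content-only check: it never asks who owns the page."""
    return (
        page is not None
        and page.is_leaf
        and page.is_rightmost
        and new_key > page.first_key
    )


# Model 1: every relation has its own block address space, so block 7 of
# index_a and block 7 of index_b are different pages.
per_relation: Dict[Tuple[str, int], Page] = {
    ("index_b", 7): Page("index_b", True, True, 100),
}

# Model 2 (assumed extent-based storage): one shared block address space.
shared: Dict[int, Page] = {
    7: Page("index_b", True, True, 100),
}

# index_a once cached block 7 as its rightmost leaf; that block is now stale.
cached_block = 7

stale_private = per_relation.get(("index_a", cached_block))
stale_shared = shared.get(cached_block)

private_accepts = looks_like_rightmost_leaf(stale_private, 500)
shared_accepts = looks_like_rightmost_leaf(stale_shared, 500)
wrong_owner = stale_shared is not None and stale_shared.owner != "index_a"

print(private_accepts, shared_accepts, wrong_owner)
```

In the shared model the stale block passes the content-only check even though it belongs to index_b, which is the case the per-relation layout rules out.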

This came up recently on the "PostgreSQL's handling of fsync() errors
is unsafe and risks data loss at least on XFS" thread, and what I
describe is in fact how other database systems manage storage, so this
seems like a real practical consideration.

--
Peter Geoghegan
