Re: SP-GiST confusing introductory paragraph

From: Alex Matchneer <amatchneer(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: pgsql-docs(at)lists(dot)postgresql(dot)org
Subject: Re: SP-GiST confusing introductory paragraph
Date: 2023-10-16 23:29:07
Message-ID: CANB4_QEKZfZhMvuFiVRYx-Yc7SHxQLHbhkC6r5ZyyUNxaEAMRg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs

I realized later that "these data structures" was referring to
"non-balanced data structures, such as quad-trees, k-d trees, and radix
trees (tries)" from the previous paragraph, and not SP-GiST, which isn't a
data structure per se. I feel a bit silly about this confusion and am not
sure others would have the same confusion I did.

Alex

On Mon, Oct 16, 2023 at 11:09 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
wrote:

> On 2023-Oct-16, PG Doc comments form wrote:
>
> > I'm confused by this paragraph:
> >
> > > These popular data structures were originally developed for in-memory
> > > usage. In main memory, they are usually designed as a set of
> dynamically
> > > allocated nodes linked by pointers. This is not suitable for direct
> storing
> > > on disk, since these chains of pointers can be rather long which would
> > > require too many disk accesses. In contrast, disk-based data structures
> > > should have a high fanout to minimize I/O. The challenge addressed by
> > > SP-GiST is to map search tree nodes to disk pages in such a way that a
> > > search need access only a few disk pages, even if it traverses many
> nodes.
> >
> > In particular, "These popular datastructures" is ambiguous -- based on
> how
> > the previous paragraph ends, it sounds like the "popular datastructures"
> or
> > SP-GiSTs, but then it goes on to say they were designed for in-memory,
> and
> > then it mentions that SP-GiST's space partitioning (with high fanout) is
> > more appropriate to minimize disk access. I think maybe the solution here
> > would be to replace "These popular data structures" with something like
> > "Classic Postgres indexes such as B-tree indexes..." or something like
> > that.
>
> Yeah, this paragraph is a rewording of Oleg's[1]
>
> SP-GiST is an abbreviation of space-partitioned GiST - the search
> tree, which allows to implement a wide range of different
> non-balanced disk-based data structures, such as quadtree, kd-tree,
> tries - popular data structures, originally developed for memory
> storage. Main memory access structures usually designed as a set of
> dynamically allocated nodes linked by pointers, which is not suitable
> for direct storing on disk, since these chains of pointers can be
> rather long and require too many disk accesses. In opposite, disk
> based data structures have a high fanout to minimize I/O. The
> challenge is to map nodes of tree to disk pages in such a way, so
> search algorithm accesses only a few disk pages, even if it traverse
> many nodes.
>
> where the point is (IMO) much clearer.
>
> [1] http://www.sai.msu.su/~megera/wiki/spgist_dev
>
> > Also, I think a short parenthetical definition of "fanout" would be
> useful
> > here, something like "high fanout (i.e. where each node has a potentially
> > large number of child nodes that reside in the same disk page)". Took me
> > some googling to realize what fanout meant in this context.
>
> Hmm. We also use the term (hypenated as fan-out) in the reference to
> recursive_worktable_factor. Maybe we should list it in the glossary in
> a way that works for both.
>
> --
> Álvaro Herrera PostgreSQL Developer —
> https://www.EnterpriseDB.com/
> "La primera ley de las demostraciones en vivo es: no trate de usar el
> sistema.
> Escriba un guión que no toque nada para no causar daños." (Jakob Nielsen)
>

In response to

Browse pgsql-docs by date

  From Date Subject
Next Message Laurenz Albe 2023-10-17 01:12:18 Re: "20.16. Customized Options" – cannot be set by `ALTER SYSTEM`
Previous Message Tom Lane 2023-10-16 20:21:09 Re: "20.16. Customized Options" – cannot be set by `ALTER SYSTEM`