Re: [QUESTION/PROPOSAL] loose quadtree in spgist

From: Peter Griggs <petergriggs33(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [QUESTION/PROPOSAL] loose quadtree in spgist
Date: 2020-01-28 06:23:33
Message-ID: CACEwj4q91T-Z3Cc=9-Z3EwnPSacvAponrwX3tBG_6aefKFTrJw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, I was wondering if someone could help me understand what the
"allTheSame" attribute in the spgChooseIn struct is.
Does it mean that the inner tuple contains either only inner tuples or only
leaf nodes? Or is it saying that the tuples in an inner tuple are all in
the same quadrant?

This is a code snippet from /src/include/access/spgist.h:
/*
* Argument structs for spg_choose method
*/
typedef struct spgChooseIn
{
Datum datum; /* original datum to be indexed */
Datum leafDatum; /* current datum to be stored at leaf */
int level; /* current level (counting from zero) */

/* Data from current inner tuple */
bool allTheSame; /* tuple is marked all-the-same? */
bool hasPrefix; /* tuple has a prefix? */
Datum prefixDatum; /* if so, the prefix value */
int nNodes; /* number of nodes in the inner tuple */
Datum *nodeLabels; /* node label values (NULL if none) */
} spgChooseIn;

On Thu, Jan 16, 2020 at 12:32 AM Peter Griggs <petergriggs33(at)gmail(dot)com>
wrote:

> As an update, you were totally right Tom, SPGIST loads all of the tuples
> onto the root page which doesn't call picksplit until there's a page's
> worth of data. I just wasn't inserting enough tuples to see my elog values
> appear in the log, but now I am and they do!
>
> The hint from Tomas before to use the CREATE OPERATOR CLASS command was
> spot on. That documentation lead me to this page (
> https://www.postgresql.org/docs/11/xindex.html) which looked like the
> sql I need to include in the extension to get the new loose quadtree index
> to build. What I did was create a "loose_quadtree" folder for my extension
> in the /contrib folder and followed the format of another extension by
> including a Makefile, loose_quadtree.control file, loose_quadtree.sql, and
> my loose_quadtree.c file, in which I just put copies of the box quadtree
> functions from geo_spgist.c with a bit of extra logging code. Now, I can
> build the extension with make and then run 'CREATE EXTENSION
> loose_quadtree;', then index using 'spgist(b loose_quadtree_ops)' operator
> class! Now, I have to actually figure out how to change the logic within
> the functions from geo_spgist.c.
>
> I was wondering if you had advice on how to handle implementing insertion
> for the loose quadtree index. For some background, a loose quadtree is
> similar to a quadtree over boxes, except that the length of a side becomes
> k*L where k>1. Throughout this, I assume that our space is a square (take
> the minimum bounding square of all of the boxes). Usually, a value of K=2
> is used. Since, each loose quadtree cell is 2x its normal size, a given
> level can hold any object that has a radius of <=1/4 of the cell side
> length, regardless of the object's position. We can do a bit of math and
> figure out what level an object should be inserted into the tree in O(1)
> time. I'm including a picture of the level selection algorithm below, but
> its just a re-formulation of what i've said above. My overall question is
> how to do this in spgist. From what I understand in the spgist insertion
> algorithm, the level selection would should done in the choose() function
> because choose() is called when we are trying to insert a leaf tuple into a
> inner tuple that has one or more levels under it. Currently, it seems like
> the spg_box_quad_choose() function descends recursively into a quadtree
> node. What I would like to do is to have it jump straight to the level it
> wants to insert into using the loose quadtree level selection algorithm and
> then find which cell it should add to by comparing its center coordinates.
>
> [Following image from: Thatcher Ulrich. Loose octrees. In Mark DeLoura,
> editor, Game Programming Gems, pages 444–453. Charles River Media, 2000]
>
> [image: Screen Shot 2020-01-14 at 6.15.09 PM.png]
>
> Best,
> Peter
>
>
>
>
>
> On Wed, Jan 8, 2020 at 3:07 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Peter Griggs <petergriggs33(at)gmail(dot)com> writes:
>> > In the getQuadrant function in the file
>> src/backend/utils/adt/geo_spgist.c,
>> > I only added some elog statements to see the quadrant that a box is
>> placed
>> > into using the current code. getQuadrant is called several times by the
>> > spg_box_quad_picksplit function, which is used when inserting into the
>> > quadtree. With this change, I can still build postgres but when I try to
>> > trigger the code, nothing gets printed to my logfile.
>>
>> Perhaps you're looking in the wrong logfile. elog(LOG) should definitely
>> produce output unless you're using very strange settings.
>>
>> Another possibility is that the specific test case you're using doesn't
>> actually reach this function. I'm not totally sure, but I think that
>> SPGiST might not call the datatype-specific choose or picksplit functions
>> until it's got more than one index page's worth of data.
>>
>> > elog(LOG, "BOX (minx, miny) = (%d, %d)\n", centroid->low.x,
>> centroid->low.y);
>>
>> A couple thoughts here:
>>
>> * From memory, the x and y values of a BOX are float8, so don't you want
>> to use %g or %f instead of %d?
>>
>> * You don't want to end an elog with \n, that'll just add extra blank
>> lines to the log.
>>
>> regards, tom lane
>>
>
>
> --
> Peter Griggs
> Masters of Engineering (Meng) in Computer Science
> Massachusetts Institute of Technology | 2020
>

--
Peter Griggs
Masters of Engineering (Meng) in Computer Science
Massachusetts Institute of Technology | 2020

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-01-28 06:26:07 Adding one CheckTableNotInUse() for REINDEX CONCURRENTLY
Previous Message Amit Kapila 2020-01-28 06:12:58 Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions