From: | Julien Rouhaud <rjuju123(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Claudio Freire <klaussfreire(at)gmail(dot)com> |
Subject: | Re: Sparse bit set data structure |
Date: | 2019-03-14 15:37:16 |
Message-ID: | CAOBaU_ZS9bPogDHPs8KhP+vQNZayY2Z5rOc5PdP4R0zRwnZ6Ag@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Mar 13, 2019 at 8:18 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> I started to consider rewriting the data structure into something more
> like B-tree. Then I remembered that I wrote a data structure pretty much
> like that last year already! We discussed that on the "Vacuum: allow
> usage of more than 1GB of work mem" thread [2], to replace the current
> huge array that holds the dead TIDs during vacuum.
>
> So I dusted off that patch, and made it more general, so that it can be
> used to store arbitrary 64-bit integers, rather than ItemPointers or
> BlockNumbers. I then added a rudimentary form of compression to the leaf
> pages, so that clusters of nearby values can be stored as an array of
> 32-bit integers, or as a bitmap. That would perhaps be overkill, if it
> was just to conserve some memory in GiST vacuum, but I think this will
> turn out to be a useful general-purpose facility.
I had a quick look at it, so I thought some initial comments could be helpful.
+ * If you change this, you must recalculate MAX_INTERVAL_LEVELS, too!
+ * MAX_INTERNAL_ITEMS ^ MAX_INTERNAL_LEVELS >= 2^64.
I think that MAX_INTERVAL_LEVELS was a typo for MAX_INTERNAL_LEVELS,
which has probably been renamed to MAX_TREE_LEVELS in this patch.
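For what it's worth, the relationship stated in that comment can be checked mechanically. The sketch below is not taken from the patch: min_tree_levels is a hypothetical helper, and it assumes only that the fanout (what the comment calls MAX_INTERNAL_ITEMS) is at least 2. It returns the smallest number of levels L such that fanout ^ L >= 2^64, without overflowing 64-bit arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: smallest number of levels L
 * such that fanout^L >= 2^64.
 *
 * This counts the base-"fanout" digits of 2^64 - 1.  A number with L digits
 * in base b is strictly less than b^L, and a number with more than L digits
 * is at least b^L, so the digit count of 2^64 - 1 is exactly the smallest L
 * with fanout^L > 2^64 - 1, i.e. fanout^L >= 2^64.
 */
static int
min_tree_levels(uint64_t fanout)
{
	uint64_t	remaining = UINT64_MAX;
	int			levels = 0;

	assert(fanout >= 2);

	while (remaining > 0)
	{
		remaining /= fanout;
		levels++;
	}
	return levels;
}
```

With a fanout of 64 this gives 11 levels (64^10 = 2^60 is too small, 64^11 = 2^66 is enough), and with a fanout of 256 it gives 8. Those fanout values are only examples, I did not check what the patch actually uses.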
+ * with varying levels of "compression". Which one is used depending on the
+ * values stored.
Should that be "depends on"?
+ if (newitem <= sbs->last_item)
+ elog(ERROR, "cannot insert to sparse bitset out of order");
Is there any reason to disallow inserting duplicates? AFAICT nothing
prevents that in the current code. If that's intended, it should
probably be documented.
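To illustrate the question: with the "<=" test, a repeated value is rejected exactly like an out-of-order one. Below is a minimal sketch of an alternative that tells the two cases apart, assuming a duplicate could simply be ignored. ToySet, AddResult and toy_add are hypothetical names for illustration only, not the patch's API, and the toy only tracks the last value rather than storing anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes, for illustration only. */
typedef enum
{
	ADD_OK,						/* value accepted */
	ADD_DUPLICATE,				/* same as the last value, ignored */
	ADD_OUT_OF_ORDER			/* smaller than the last value, rejected */
} AddResult;

/* Toy stand-in for the real structure: remembers only the last value. */
typedef struct
{
	uint64_t	last_item;
	uint64_t	num_items;
} ToySet;

static AddResult
toy_add(ToySet *set, uint64_t newitem)
{
	assert(set != NULL);

	/* last_item is meaningful only once something has been added */
	if (set->num_items > 0)
	{
		if (newitem == set->last_item)
			return ADD_DUPLICATE;
		if (newitem < set->last_item)
			return ADD_OUT_OF_ORDER;
	}

	set->last_item = newitem;
	set->num_items++;
	return ADD_OK;
}
```

Starting from an empty ToySet, adding 5, 5, 3, 9 yields ADD_OK, ADD_DUPLICATE, ADD_OUT_OF_ORDER, ADD_OK, and num_items ends at 2. Whether silently ignoring duplicates is the right behaviour for the real callers is a separate question.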
Nothing else struck me. That's a pretty nice new lib :)