Re: Sequential vs. random values - number of pages in B-tree

From: Francisco Olarte <folarte(at)peoplecall(dot)com>
To: Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc: pinker <pinker(at)onet(dot)eu>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Sequential vs. random values - number of pages in B-tree
Date: 2016-08-19 17:02:59
Message-ID: CA+bJJbywO9hknJyqK7x5kixyFwyWRwah1Wn801f4YP_VpdyhRA@mail.gmail.com
Lists: pgsql-general

On Fri, Aug 19, 2016 at 3:20 PM, Daniel Verite <daniel(at)manitou-mail(dot)org> wrote:
> There's a simple technique that works on top of a Feistel network,
> called the cycle-walking cipher. Described for instance at:
> http://web.cs.ucdavis.edu/~rogaway/papers/subset.pdf
> I'm using the opportunity to add a wiki page:
> https://wiki.postgresql.org/wiki/Pseudo_encrypt_constrained_to_an_arbitrary_range
> with sample plpgsql code for the [0..10,000,000] range that might be useful
> for other cases.

Nice reference, nice wiki page, nice job. Bookmarking everything.
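
For the archives, the walking loop itself is tiny; something like this
sketch (the name is mine, and it assumes the pseudo_encrypt() permutation
from the wiki, Daniel's page has the real, tested code):

create or replace function bounded_pseudo_encrypt(value int, maxval int)
returns int as $$
begin
  loop
    value := pseudo_encrypt(value);        -- walk along the permutation's cycle
    exit when value between 0 and maxval;  -- stop once we land inside the range
  end loop;
  return value;
end
$$ language plpgsql strict immutable;

Since pseudo_encrypt is a bijection, the loop always terminates and the
restriction to [0, maxval] is still a bijection on that range.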

> But for the btree fragmentation and final size issue, TBH I don't expect
> that constraining the values within that smaller range will make any
> difference
> in the tests, because it's the dispersion that matters, not the values
> themselves.

Neither do I; that is why I said it is probably good enough for the tests, but...

> I mean that, whether the values are well dispersed in the [0..1e7] range or
> equally well dispersed in the [0..2**32] range, the probability of a newly
> inserted value to compare greater or lower to any previous values of the list
> should be the same, so shouldn't the page splits be the same, statistically
> speaking?

I know some btrees do prefix-coding of the keys, some do it even with
long integers, and some others do delta-coding for integer keys. I seem
to recall Postgres does not, which is why I do not think it will make a
difference.

But anyway, to compare two things like that, as the original poster was
doing, I normally prefer to test just one thing at a time. That is why I
would try to do it by writing a sorted file, shuffling it with sort -R,
and copying both versions in, server side if possible, to eliminate
everything else, so both runs insert exactly the same values and the
only difference is the insertion order.
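
Roughly along these lines, I mean (just a sketch, the table names and
paths are invented, and the server-side COPY needs superuser):

-- generate the data once, so both runs insert exactly the same values:
copy (select g from generate_series(1, 10000000) g) to '/tmp/sorted.txt';
-- then, on the server host:  sort -R /tmp/sorted.txt > /tmp/shuffled.txt

create table t_seq (id integer primary key);
copy t_seq from '/tmp/sorted.txt';        -- sequential insertion order

create table t_rnd (id integer primary key);
copy t_rnd from '/tmp/shuffled.txt';      -- same values, random order

-- compare the index sizes only (8 kB pages with the default block size):
select pg_relation_size('t_seq_pkey') / 8192 as seq_pages,
       pg_relation_size('t_rnd_pkey') / 8192 as rnd_pages;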

Francisco Olarte.
