From: | Francisco Olarte <folarte(at)peoplecall(dot)com> |
---|---|
To: | Daniel Verite <daniel(at)manitou-mail(dot)org> |
Cc: | pinker <pinker(at)onet(dot)eu>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Sequential vs. random values - number of pages in B-tree |
Date: | 2016-08-19 17:02:59 |
Message-ID: | CA+bJJbywO9hknJyqK7x5kixyFwyWRwah1Wn801f4YP_VpdyhRA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Fri, Aug 19, 2016 at 3:20 PM, Daniel Verite <daniel(at)manitou-mail(dot)org> wrote:
> There's a simple technique that works on top of a Feistel network,
> called the cycle-walking cipher. Described for instance at:
> http://web.cs.ucdavis.edu/~rogaway/papers/subset.pdf
> I'm using the opportunity to add a wiki page:
> https://wiki.postgresql.org/wiki/Pseudo_encrypt_constrained_to_an_arbitrary_range
> with sample plgsql code for the [0..10,000,000] range that might be useful
> for other cases.
Nice reference, nice WikiPage, nice job. Bookmarking every thing.
> But for the btree fragmentation and final size issue, TBH I don't expect
> that constraining the values within that smaller range will make any
> difference
> in the tests, because it's the dispersion that matters, not the values
> themselves.
Neither do I, that is why I stated probabley good enough for tests, but....
> I mean that, whether the values are well dispersed in the [0..1e7] range or
> equally well dispersed in the [0..2**32] range, the probability of a newly
> inserted value to compare greater or lower to any previous values of the list
> should be the same, so shouldn't the page splits be the same, statistically
> speaking?
I know some btrees do prefix-coding of the keys, and some do it even
with long integers, and some others do delta-coding for integer keys.
I seem to recall postgres does not, that's whay I do not think it will
make a difference.
But anyway, to compare two things like that, as the original poster
was doing, I normally prefer to test just one thing at a time, that's
why I would normally try to do it by writing a sorted file, shuffling
it with sort -R, and copying it, server side if posible, to eliminate
so both
Francisco Olarte.
From | Date | Subject | |
---|---|---|---|
Next Message | Francisco Olarte | 2016-08-19 17:13:44 | Re: Limit Heap Fetches / Rows Removed by Filter in Index Scans |
Previous Message | Victor Blomqvist | 2016-08-19 17:02:58 | Re: Limit Heap Fetches / Rows Removed by Filter in Index Scans |