Re: Sequential vs. random values - number of pages in B-tree

From: Francisco Olarte <folarte(at)peoplecall(dot)com>
To: pinker <pinker(at)onet(dot)eu>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Sequential vs. random values - number of pages in B-tree
Date: 2016-08-18 14:02:28
Message-ID: CA+bJJbzHT4SSzE2MTw=1d_u1TFo12w5GzRnbuJR0BjH3eHPrLw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi:

On Thu, Aug 18, 2016 at 1:32 PM, pinker <pinker(at)onet(dot)eu> wrote:
...
> create table t01 (id bigint);
> create index i01 on t01(id);
> insert into t01 SELECT s from generate_series(1,10000000) as s;
>
> and random values:
> create table t02 (id bigint);
> create index i02 on t02(id);
> insert into t02 SELECT random()*100 from generate_series(1,10000000) as s;

It's already been told that btrees work that way, if you find it
strange read a bit about them, this is completely normal, but ...

... what I come to point is your test is severely flawed. It probably
does not matter in this case, but you are inserting 10M DIFFERENT
VALUES in the first case and only 100 in the second one, which an
average of 100K DUPLICATES of each. This affects btrees too. You could
try using random*1G, or at least 100M, for a better test ( which may
have even worse behaviour, ideally I would just write 10M integers to
a disk file, then shuffle it and compare COPY FROM times from both ) (
unless you know of an easy way to generate a random permutation on the
fly without using a lot of memory, I do not ).

Francisco Olarte.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Daniel Verite 2016-08-18 15:24:37 Re: Sequential vs. random values - number of pages in B-tree
Previous Message Francisco Olarte 2016-08-18 13:51:10 Re: SQL help - multiple aggregates