Quick Links

Re: BTree index

From:	Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
To:	vjanand(at)uwm(dot)edu
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: BTree index
Date:	2003-11-09 21:57:28
Message-ID:	20031109215728.GA29601@dcc.uchile.cl
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, Nov 05, 2003 at 09:08:31AM -0600, vjanand(at)uwm(dot)edu wrote:

> I am trying to find information regarding creation of B-tree index in postgres
> for variable length character data (Char/varchar type). Specifically, what
> pagination policy is used, does it use prefix BTree, or any other form of
> compression (encoding)?

I was very surprised while writing this answer:

The whole key is stored (no prefix, no pagination). If the key is too
big, it won't fit into the index and the insertion will be rejected:

regression=# create table test (a text);
CREATE TABLE
regression=# create index test_idx on test(a);
CREATE INDEX
regression=# insert into test values ('hello world');
INSERT 17115 1
regression=# insert into test select repeat(a,10) from test;
INSERT 17116 1
regression=# insert into test select repeat(a,10) from test;
INSERT 0 2
regression=# insert into test select repeat(a,10) from test;
INSERT 0 4
regression=# insert into test select repeat(a,10) from test;
INSERT 0 8
regression=# insert into test select repeat(a,10) from test;
ERROR: fila de índice requiere 12624 bytes, tamaño máximo es 8191
-- oops
regression=# set lc_messages to 'C';
SET
regression=# insert into test select repeat(a,10) from test;
ERROR: index row requires 12624 bytes, maximum size is 8191

So, what size were the tuples inserted:
regression=# select max(length(a)) from test;
max
--------
110000
(1 fila)

What!? 110000 bytes? I have always had the idea that the tuples were
uncompressed, so how can 110000 bytes be stored in 8191 bytes? After
tracking into the sourcecode, I found that in
src/backend/access/common/indextuple.c the index_formtuple routine seems
to compress the key before insertion. In src/include/tuptoaster.h there
is a symbol for activation of this feature that is set at least on my
sources (TOAST_INDEX_HACK).

So, there you are: the compression used is the same "lousy fast" LZ
algorithm used elsewhere in the TOAST code (toast_compress_datum()).

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
FOO MANE PADME HUM

In response to

BTree index at 2003-11-05 15:08:31 from vjanand

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Larry Rosenman	2003-11-09 22:18:43	Re: OSR5: Passes without the setsockopt() calls...
Previous Message	Hannu Krosing	2003-11-09 21:56:51	Re: UPPER()/LOWER() and UTF-8