From: | Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl> |
---|---|
To: | vjanand(at)uwm(dot)edu |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: BTree index |
Date: | 2003-11-09 21:57:28 |
Message-ID: | 20031109215728.GA29601@dcc.uchile.cl |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Nov 05, 2003 at 09:08:31AM -0600, vjanand(at)uwm(dot)edu wrote:
> I am trying to find information regarding creation of B-tree index in postgres
> for variable length character data (Char/varchar type). Specifically, what
> pagination policy is used, does it use prefix BTree, or any other form of
> compression (encoding)?
I was very surprised while writing this answer:
The whole key is stored (no prefix, no pagination). If the key is too
big, it won't fit into the index and the insertion will be rejected:
regression=# create table test (a text);
CREATE TABLE
regression=# create index test_idx on test(a);
CREATE INDEX
regression=# insert into test values ('hello world');
INSERT 17115 1
regression=# insert into test select repeat(a,10) from test;
INSERT 17116 1
regression=# insert into test select repeat(a,10) from test;
INSERT 0 2
regression=# insert into test select repeat(a,10) from test;
INSERT 0 4
regression=# insert into test select repeat(a,10) from test;
INSERT 0 8
regression=# insert into test select repeat(a,10) from test;
ERROR: fila de índice requiere 12624 bytes, tamaño máximo es 8191
-- oops
regression=# set lc_messages to 'C';
SET
regression=# insert into test select repeat(a,10) from test;
ERROR: index row requires 12624 bytes, maximum size is 8191
So, what size were the tuples inserted:
regression=# select max(length(a)) from test;
max
--------
110000
(1 fila)
What!? 110000 bytes? I have always had the idea that the tuples were
uncompressed, so how can 110000 bytes be stored in 8191 bytes? After
tracking into the sourcecode, I found that in
src/backend/access/common/indextuple.c the index_formtuple routine seems
to compress the key before insertion. In src/include/tuptoaster.h there
is a symbol for activation of this feature that is set at least on my
sources (TOAST_INDEX_HACK).
So, there you are: the compression used is the same "lousy fast" LZ
algorithm used elsewhere in the TOAST code (toast_compress_datum()).
--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
FOO MANE PADME HUM
From | Date | Subject | |
---|---|---|---|
Next Message | Larry Rosenman | 2003-11-09 22:18:43 | Re: OSR5: Passes without the setsockopt() calls... |
Previous Message | Hannu Krosing | 2003-11-09 21:56:51 | Re: UPPER()/LOWER() and UTF-8 |