Re: Subject: bool / vacuum full bug followup part 2

From: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Subject: bool / vacuum full bug followup part 2
Date: 2002-05-03 23:30:47
Message-ID: Pine.LNX.4.33.0205031702200.1471-100000@css120.ihs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Fri, 3 May 2002, Tom Lane wrote:

> Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com> writes:
> > Well, my keys aren't changing and the index is growing like they are.
>
> Could we see the exact details of your test case?

Sure. I think I posted most of it here already...

Here's my table:

scott.marlowe=# \d test
Table "test"
Column | Type | Modifiers
--------+---------+-----------
buf | text |
yn | boolean |
id | integer |
Indexes: test_id_dx,
test_yn_dx

Here's the indexes:

scott.marlowe=# \di test_id_dx
List of relations
Name | Type | Owner
------------+-------+---------------
test_id_dx | index | scott.marlowe

scott.marlowe=# \di test_yn_dx
List of relations
Name | Type | Owner
------------+-------+---------------
test_yn_dx | index | scott.marlowe
(1 row)

Here's the php script that creates the test data, it makes $count number
of rows and sets the bool to true or false randomly:

set_time_limit(3600);
$conn = pg_connect("dbname=scott.marlowe host=css120.ihs.com");
$count = 100000;
pg_exec($conn,"begin");
$flag[0]="false";
$flag[1]="true";
for ($i=0;$i<$count;$i++){
if ($i%1000==0) {
pg_exec($conn,"end");
pg_exec($conn,"begin");
}
$letter = chr(rand(65,91));
$tf = rand(0,1);
$query = "insert into test (buf,yn) values ('";
$query.= $letter."',".$flag[$tf].")";
pg_exec($conn,$query);
}
pg_exec($conn,"end");

Here's the files in my database directory, and their size by du in
kbytes after vacuum full;
vacuum;
reindex index test_id_dx;
reindex index test_yn_dx;

[postgres(at)css120 16556]$ oid2name -d scott.marlowe
All tables from database "scott.marlowe":
---------------------------------
126572 = accounts
126574 = accounts_pkey
126566 = branches
126568 = branches_pkey
126575 = history
126569 = tellers
126571 = tellers_pkey
16557 = test
1126687 = test_id_dx
1126688 = test_yn_dx

[postgres(at)css120 16556]$ du -s 16557 1126687 1126688
11448 16557 (test)
1772 1126687 (test_id_dx)
1772 1126688 (test_yn_dx)

WHAT I DID:

scott.marlowe=# update test set yn=true where yn=true;
UPDATE 50080

[postgres(at)css120 16556]$ du -s 16557 1126687 1126688
17176 16557
3516 1126687
2924 1126688

scott.marlowe=# vacuum;
VACUUM

[postgres(at)css120 16556]$ du -s 16557 1126687 1126688
17176 16557
3516 1126687
2924 1126688

scott.marlowe=# vacuum full;
VACUUM

[postgres(at)css120 16556]$ du -s 16557 1126687 1126688
11448 16557
3516 1126687
4052 1126688 <-- Notice that the index here just GREW

scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;

[postgres(at)css120 16556]$ du -s 16557 1126687 1126688
68744 16557
13980 1126687
15660 1126688

scott.marlowe=# vacuum;
VACUUM

[postgres(at)css120 16556]$ du -s 16557 1126687 1126688
68736 16557
13964 1126687
15652 1126688

scott.marlowe=# vacuum full;
VACUUM

[postgres(at)css120 16556]$ du -s 16557 1126687 1126688
11448 16557
13964 1126687
16808 1126688

***************
So, now thinking the problem might be just vacuum full, I try plain old
vacuums
***************

scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;

scott.marlowe=# vacuum;
VACUUM

[postgres(at)css120 16556]$ du -s 16557 1126687 1126688
22908 16557
13964 1126687
20088 1126688

scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;

scott.marlowe=# vacuum;
VACUUM

[postgres(at)css120 16556]$ du -s 16557 1126687 1126688
22908 16557
13964 1126687
22380 1126688

*****************************
Nope, the index on the bools just keeps growing and growing.
Given a few million updates and it will be bigger than the data it is
supposed to index.

scott.marlowe=# reindex index test_yn_dx;
REINDEX
scott.marlowe=# reindex index test_id_dx;
REINDEX

oid2name -d scott.marlowe |grep dx
1126690 = test_id_dx
1126689 = test_yn_dx
[postgres(at)css120 16556]$ du -s 16557 1126690 1126689
22908 16557
1772 1126690
1772 1126689

and now they're small again.

It would at least be nice if reindex was operational in a transaction so
it would be safe to use on a live database, since it appears to not be
intended for this purpose really, but for fixing broken indexes. Til then
I'll write a script that asks pg_indexes that drops the index and
recreates it in a transaction to keep my data store svelt and clean.

Thanks for the attention to this. Scott.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Uros Gruber 2002-05-03 23:31:54 Re: storing intermediate results in recursive plpgsql f
Previous Message Tom Lane 2002-05-03 22:36:31 Re: Subject: bool / vacuum full bug followup part 2