Index question

From: Troy D <ugly_hippo(at)yahoo(dot)ca>
To: pgadmin-support(at)postgresql(dot)org
Subject: Index question
Date: 2002-07-19 19:19:30
Message-ID: 3D386642.9000501@yahoo.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgadmin-support

Hi,

I am running pg7.1 on SuSE 7.2.
I have a table with three varchar fields making up the primary key, plus
four data fields.

create table "foo"(
"pk1" varchar(7) not null,
"pk2" varchar(7) not null,
"pk3" varchar(7) not null,
"data1" float8 default 0 not null,
"data2" float8 default 0 not null,
"data3" float8 default 0 not null,
"data4" float8 default 0 not null
constraint "foo_pkey" primary key ("pk1", "pk2", "pk3")
);

This automatically creates an index (so says pgadmin)

create unique index foo_pkey on foo using btree ( pk1 varchar_ops, pk2
varchar_ops, pk3 varchar_ops );

On this table, I run an insert/update routine that reads from table bar
and inserts or updates table foo. I have used php to read in (about
2000) records from table bar and then insert or update table foo, then
print the record number (of result from bar) and the time.

I was getting about one transaction (each bar record requires [4
selects, an insert and an update]*64) every two to four seconds, getting
obviously progressively slower with each batch.

So, I did something stupid(?) - I created an index:

create index foo_index2 on foo using btree ( pk1 varchar_ops, pk2
varchar_ops, pk3 varchar_ops );

and now I get better than one transaction per second, with minor
degradation from one batch to the next.

Why?
The second index is hardly different than the first. (This is the
opposite of what Josh Berkus recommended doing in 'The Joy of Indexing'. )
Did the index not get created in the first place? Or did the stats get
updated when I created the second index?

Other info:
The empty table foo was vacuum'd before the first batch insert. Not
vacuum analyze'd.
The index was created halfway through the first batch - the time stamp
shows the interval change.
I wiped the table (delete from foo), removed the index and did it again,
hence, repeatable at least twice.
Before the index was created, an 'explain' on the insert statement
showed a seq scan on the table...

Insert into foo
select 'pk1value', 'pk2value', 'pk3value', 0, 0, 0, 0
where (select count(*) from foo where pk1 = 'pk1value' and pk2 =
'pk2value' and pk3 = 'pk3value') = 0

i.e don't insert if it's already there.

After the index was created, an 'explain' on the insert statement showed
an index scan on the table using foo_index2.

Version 0.01 of the query took three and a half days, but applying Bruce
Momjian's optimizing techniques (PostgreSQL Hardware Performance Tuning)
and other optimizing techniques got it down to three hours. Adding the
rogue index dropped it to an hour.

Any other suggestions would be greatly appreciated.

Thanks in advance,
Troy

______________________________________________________________________
Post your ad for free now! http://personals.yahoo.caa

Browse pgadmin-support by date

  From Date Subject
Next Message cdxalegal_ins 2002-07-20 19:42:31 If you had a legal problem today, what would you do? afnak
Previous Message Roger Fisher 2002-07-19 17:45:20 Newbie - PG_DUMP