Asking advice on speeding up a big table

From: felix(at)crowfix(dot)com
To: pgsql-general(at)postgresql(dot)org
Subject: Asking advice on speeding up a big table
Date: 2006-04-10 06:17:36
Message-ID: 20060410061736.GA14027@crowfix.com
Lists: pgsql-general

I have a simple benchmark which runs too slow on a 100M row table, and
I am not sure what my next step is to make it faster.

It's a simple setup, part of a larger system. There are three data
tables, each with a BIGINT id column and a data column of a different
type. A fourth table holds BIGINT foreign key references to the other
three tables' id columns.

felix=> \d key
      Table "oddtimes.key"
 Column |  Type  | Modifiers
--------+--------+-----------
 id     | bigint | not null
 data   | text   | not null
Indexes:
    "key_pkey" PRIMARY KEY, btree (id)
    "key_data_key" UNIQUE, btree (data)

felix=> \d val
      Table "oddtimes.val"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | bigint  | not null
 data   | integer | not null
Indexes:
    "val_pkey" PRIMARY KEY, btree (id)
    "val_data_key" UNIQUE, btree (data)

felix=> \d sid
         Table "oddtimes.sid"
 Column |     Type      | Modifiers
--------+---------------+-----------
 id     | bigint        | not null
 data   | character(40) | not null
Indexes:
    "sid_pkey" PRIMARY KEY, btree (id)

felix=> \d glue
     Table "oddtimes.glue"
 Column |  Type  | Modifiers
--------+--------+-----------
 key    | bigint |
 val    | bigint |
 sid    | bigint |
Indexes:
    "glue_key_idx" btree ("key")
    "glue_key_val_idx" btree ("key", val)
    "glue_val_idx" btree (val)
    "glue_val_key_idx" btree (val, "key")
Foreign-key constraints:
    "glue_key" FOREIGN KEY ("key") REFERENCES "key"(id)
    "glue_val" FOREIGN KEY (val) REFERENCES val(id)
    "glue_sid" FOREIGN KEY (sid) REFERENCES sid(id)
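For reference, the schema above corresponds roughly to this DDL (a
sketch reconstructed from the \d output; the original constraint and
index definitions may differ in detail):

```sql
CREATE TABLE "key" (
    id   bigint NOT NULL PRIMARY KEY,
    data text   NOT NULL UNIQUE
);

CREATE TABLE val (
    id   bigint  NOT NULL PRIMARY KEY,
    data integer NOT NULL UNIQUE
);

CREATE TABLE sid (
    id   bigint        NOT NULL PRIMARY KEY,
    data character(40) NOT NULL
);

CREATE TABLE glue (
    "key" bigint REFERENCES "key"(id),
    val   bigint REFERENCES val(id),
    sid   bigint REFERENCES sid(id)
);

-- "key" is quoted because KEY is a reserved word
CREATE INDEX glue_key_idx     ON glue ("key");
CREATE INDEX glue_key_val_idx ON glue ("key", val);
CREATE INDEX glue_val_idx     ON glue (val);
CREATE INDEX glue_val_key_idx ON glue (val, "key");
```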

Usage is to match data from the key and val tables to fetch the data
value from the sid table. It's sort of a glorified Berkeley DB, except
that you can also match ranges, not just exact values. If I can make
it fast enough, I may add two more types, date and text.
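The lookup pattern is easy to sketch in miniature. The following uses
an in-memory SQLite database in place of PostgreSQL purely so the
example is self-contained; the table and column names follow the
schema above, while the toy rows ('color', 'session-a', etc.) are
invented for illustration:

```python
import sqlite3

# Build a tiny version of the key/val/sid/glue schema described above.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE "key" (id INTEGER PRIMARY KEY, data TEXT    NOT NULL UNIQUE);
    CREATE TABLE val   (id INTEGER PRIMARY KEY, data INTEGER NOT NULL UNIQUE);
    CREATE TABLE sid   (id INTEGER PRIMARY KEY, data TEXT    NOT NULL);
    CREATE TABLE glue  ("key" INTEGER REFERENCES "key"(id),
                        val   INTEGER REFERENCES val(id),
                        sid   INTEGER REFERENCES sid(id));
    INSERT INTO "key" VALUES (1, 'color');
    INSERT INTO val   VALUES (10, 1984186373), (11, 1984186374);
    INSERT INTO sid   VALUES (100, 'session-a'), (101, 'session-b');
    INSERT INTO glue  VALUES (1, 10, 100), (1, 11, 101);
""")

# Exact match: key.data + val.data -> sid.data, through the glue table.
cur.execute("""
    SELECT sid.data FROM "key", val, sid, glue
    WHERE "key".data = 'color'  AND "key".id = glue."key"
      AND val.data = 1984186373 AND val.id   = glue.val
      AND glue.sid = sid.id
""")
exact = cur.fetchall()

# Range match on val.data -- the "match ranges" case.
cur.execute("""
    SELECT sid.data FROM "key", val, sid, glue
    WHERE "key".data = 'color' AND "key".id = glue."key"
      AND val.data BETWEEN 1984186373 AND 1984186374
      AND val.id = glue.val    AND glue.sid = sid.id
    ORDER BY sid.data
""")
ranged = cur.fetchall()

print(exact)   # -> [('session-a',)]
print(ranged)  # -> [('session-a',), ('session-b',)]
```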

The key table has 1K rows, val has 100K, and sid and glue have 100M
rows. They take about 31G space, last time I checked.

felix=> SELECT relname, relpages FROM pg_class ORDER BY relpages DESC LIMIT 20;
             relname             | relpages
---------------------------------+----------
 sid                             |  1086957
 glue                            |   735295
 glue_key_val_idx                |   385042
 glue_val_key_idx                |   385042
 sid_pkey                        |   274194
 glue_key_idx                    |   274194
 glue_val_idx                    |   274194
 val                             |      589
 val_pkey                        |      382
 val_data_key                    |      283

My benchmark times SQL lookups like the one below. This example was a
repeat and so ran at a reasonable speed, but fresh values take around
3-4 seconds. This is 8.0.3 on a dual-core dual Opteron machine with
only 2G RAM (it is meant for compute-intensive work and was idle while
I ran these tests). The disks are plain vanilla IDE, maybe SATA, but
nothing at all special.

felix=> explain analyze SELECT sid.data, val.data FROM key, val, sid, glue WHERE key.data = 'UajzAQjTJPevVJBuuerjU4pcl8eJcyrIxzkC' AND key.id = glue.key AND val.data = 1984186373 AND val.id = glue.val AND glue.sid = sid.id;
                                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=5.84..4480.89 rows=1 width=48) (actual time=32.157..32.157 rows=0 loops=1)
   ->  Hash Join  (cost=5.84..4477.87 rows=1 width=12) (actual time=32.149..32.149 rows=0 loops=1)
         Hash Cond: ("outer"."key" = "inner".id)
         ->  Nested Loop  (cost=0.00..4467.01 rows=1001 width=20) (actual time=0.205..28.304 rows=1000 loops=1)
               ->  Index Scan using val_data_key on val  (cost=0.00..6.01 rows=1 width=12) (actual time=0.059..0.066 rows=1 loops=1)
                     Index Cond: (data = 1984186373)
               ->  Index Scan using glue_val_idx on glue  (cost=0.00..4447.15 rows=1108 width=24) (actual time=0.131..20.670 rows=1000 loops=1)
                     Index Cond: ("outer".id = glue.val)
         ->  Hash  (cost=5.84..5.84 rows=1 width=8) (actual time=0.123..0.123 rows=0 loops=1)
               ->  Index Scan using key_data_key on "key"  (cost=0.00..5.84 rows=1 width=8) (actual time=0.100..0.105 rows=1 loops=1)
                     Index Cond: (data = 'UajzAQjTJPevVJBuuerjU4pcl8eJcyrIxzkC'::text)
   ->  Index Scan using sid_pkey on sid  (cost=0.00..3.01 rows=1 width=52) (never executed)
         Index Cond: ("outer".sid = sid.id)
 Total runtime: 32.293 ms
(14 rows)

I realize these indices and tables take a lot of space, and that is
at least mostly the reason for the slow speed. But still, several
seconds seems like an eternity. Are there things I can do to speed
this up? Or is this time so far out of line that it ought to be
faster, meaning I have a bug somewhere?

--
... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.
Felix Finch: scarecrow repairman & rocket surgeon / felix(at)crowfix(dot)com
GPG = E987 4493 C860 246C 3B1E 6477 7838 76E9 182E 8151 ITAR license #4933
I've found a solution to Fermat's Last Theorem but I see I've run out of room o
