From: | hamann(dot)w(at)t-online(dot)de |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | how can this get faster |
Date: | 2013-04-22 17:26:34 |
Message-ID: | wolfgang-1130422192633.A0128168@amadeus3.local |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi,
I have a table with mainly a list of keywords and a few odd columns referring to their origin.
There is a standard btree index, and searches are carried out as anchored regex.
Entries in the table and queries are uppercased; this turned out to have a big speed
advantage over a ~* condition.
So queries are like:
select * from mytable where word ~ '^CARNEVAL'
The table currently holds 15.000.000 rows, and query times are on the order of 10-20 msec
Now I am trying something different: I have a temp table with words appearing in the table
plus some extra data, and I want to get the other columns from that table too
select aux.col1, aux.col2, mytable.inf1, mytable.inf2 from aux, mytable where aux.word = mytable.word;
I specifically do
analyze aux;
before the operation. On a sample my aux table has 433 entries (different words), and I get
this behaviour
Hash Join (cost=12.74..321957.88 rows=472279 width=12) (actual time=0.530..2242.837 rows=29342 loops=1)
Hash Cond: (mytable.word = aux.word)
-> Seq Scan on mytable (cost=0.00..259779.89 rows=15317989 width=12) (actual time=0.009..808.577 rows=15317717 loops=1)
-> Hash (cost=7.33..7.33 rows=433 width=14) (actual time=0.173..0.173 rows=433 loops=1)
-> Seq Scan on aux (cost=0.00..7.33 rows=433 width=14) (actual time=0.004..0.063 rows=433 loops=1)
Total runtime: 2243.924 ms
experimenting with different query data (i.e. different collections of words) reveals a
strong dependency on actual data. In particular, I split the above in two subtables.
The first one containing 45 rows produced
Nested Loop (cost=31.41..169444.22 rows=49082 width=12) (actual time=0.040..37.739 rows=590 loops=1)
-> Seq Scan on aux1 (cost=0.00..1.45 rows=45 width=18) (actual time=0.003..0.011 rows=45 loops=1)
-> Bitmap Heap Scan on mytable (cost=31.41..3751.76 rows=1091 width=12) (actual time=0.014..0.054 rows=13 loops=45)
Recheck Cond: (mytable.word = aux1.word)
-> Bitmap Index Scan on mytable_word_idx (cost=0.00..31.14 rows=1091 width=0) (actual time=0.008..0.008 rows=13 loops=45)
Index Cond: (mytable.word = aux1.word)
Total runtime: 37.800 ms
while the second one (remaining 388 rows) resulted in another sequential scan on the large table,
(and even no result within 3 minutes, when omitting the analyze step
Is there a way to "help" the planner choose the index scan
Regards
Wolfgang Hamann
From | Date | Subject | |
---|---|---|---|
Next Message | Steven Schlansker | 2013-04-22 17:37:26 | Using an index for IS DISTINCT FROM queries |
Previous Message | Rafał Pietrak | 2013-04-22 17:12:35 | Re: using text search |