how can this get faster

From: hamann(dot)w(at)t-online(dot)de
To: pgsql-general(at)postgresql(dot)org
Subject: how can this get faster
Date: 2013-04-22 17:26:34
Message-ID: wolfgang-1130422192633.A0128168@amadeus3.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


Hi,

I have a table with mainly a list of keywords and a few odd columns referring to their origin.
There is a standard btree index, and searches are carried out as anchored regex.
Entries in the table and queries are uppercased; this turned out to have a big speed
advantage over a ~* condition.
So queries are like:
select * from mytable where word ~ '^CARNEVAL'
The table currently holds 15.000.000 rows, and query times are on the order of 10-20 msec

Now I am trying something different: I have a temp table with words appearing in the table
plus some extra data, and I want to get the other columns from that table too

select aux.col1, aux.col2, mytable.inf1, mytable.inf2 from aux, mytable where aux.word = mytable.word;

I specifically do
analyze aux;
before the operation. On a sample my aux table has 433 entries (different words), and I get
this behaviour
Hash Join (cost=12.74..321957.88 rows=472279 width=12) (actual time=0.530..2242.837 rows=29342 loops=1)
Hash Cond: (mytable.word = aux.word)
-> Seq Scan on mytable (cost=0.00..259779.89 rows=15317989 width=12) (actual time=0.009..808.577 rows=15317717 loops=1)
-> Hash (cost=7.33..7.33 rows=433 width=14) (actual time=0.173..0.173 rows=433 loops=1)
-> Seq Scan on aux (cost=0.00..7.33 rows=433 width=14) (actual time=0.004..0.063 rows=433 loops=1)
Total runtime: 2243.924 ms

experimenting with different query data (i.e. different collections of words) reveals a
strong dependency on actual data. In particular, I split the above in two subtables.
The first one containing 45 rows produced
Nested Loop (cost=31.41..169444.22 rows=49082 width=12) (actual time=0.040..37.739 rows=590 loops=1)
-> Seq Scan on aux1 (cost=0.00..1.45 rows=45 width=18) (actual time=0.003..0.011 rows=45 loops=1)
-> Bitmap Heap Scan on mytable (cost=31.41..3751.76 rows=1091 width=12) (actual time=0.014..0.054 rows=13 loops=45)
Recheck Cond: (mytable.word = aux1.word)
-> Bitmap Index Scan on mytable_word_idx (cost=0.00..31.14 rows=1091 width=0) (actual time=0.008..0.008 rows=13 loops=45)
Index Cond: (mytable.word = aux1.word)
Total runtime: 37.800 ms
while the second one (remaining 388 rows) resulted in another sequential scan on the large table,
(and even no result within 3 minutes, when omitting the analyze step

Is there a way to "help" the planner choose the index scan

Regards
Wolfgang Hamann

Browse pgsql-general by date

  From Date Subject
Next Message Steven Schlansker 2013-04-22 17:37:26 Using an index for IS DISTINCT FROM queries
Previous Message Rafał Pietrak 2013-04-22 17:12:35 Re: using text search