| From: | Gordan Bobic <gordan(at)bobich(dot)net> | 
|---|---|
| To: | pgsql-general(at)postgresql(dot)org | 
| Subject: | FTI Queries and Explain | 
| Date: | 2001-10-15 15:25:00 | 
| Message-ID: | 200110151525.f9FFP1T14870@sentinel.bobich.net | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-general | 
Hi.
I've been playing with Full Text Indexing for a few days now, and there is 
something in the behaviour of the queries that I don't fully understand. I 
have set up a little test database that contains sample job adverts (it's the 
sample data I had lying around froma different project)
Selecting on 1 field returns results blindingly fast, as one would expect 
when indices are used. However, selecting on 2 fields takes forever.
I have done SET ENABLE_SEQSCAN=OFF.
Here's output of explain:
postgres=> explain select jobs.title from jobs, jobs_description_fti, 
jobs_title_fti where (jobs_description_fti.string = 'linux' or 
jobs_title_fti.string = 'linux') and (jobs_description_fti.id = jobs.oid and 
jobs_title_fti.id = jobs.oid);
NOTICE:  QUERY PLAN:
Nested Loop  (cost=200000018.60..200000027.18 rows=1 width=48)
  ->  Merge Join  (cost=200000018.60..200000024.31 rows=1 width=32)
        ->  Sort  (cost=100000004.09..100000004.09 rows=75 width=16)
              ->  Seq Scan on jobs_description_fti  
(cost=100000000.00..100000001.75 rows=75 width=16)
        ->  Sort  (cost=100000014.51..100000014.51 rows=251 width=16)
              ->  Seq Scan on jobs_title_fti  
(cost=100000000.00..100000004.51 rows=251 width=16)
  ->  Index Scan using jobs_description_oid_index on jobs  (cost=0.00..2.01 
rows=1 width=16)
EXPLAIN
This means, if I am understanding things correctly, that jobs_description_fti 
is scanned with a sequential scan. That would explain the slowness.
Hwever, doing a:
explain select jobs.title from jobs, jobs_description_fti where 
(jobs_description_fti.string = 'linux') and (jobs_description_fti.id = 
jobs.oid);
NOTICE:  QUERY PLAN:
Nested Loop  (cost=0.00..4.04 rows=1 width=20)
  ->  Index Scan using jobs_description_fti_index on jobs_description_fti  
(cost=0.00..2.01 rows=1 width=4)
  ->  Index Scan using jobs_description_oid_index on jobs  (cost=0.00..2.01 
rows=1 width=16)
yields lightning fast results, as one would expect. Why does selecting from 
two fields on an "or" basis cause both scans to be sequential? Even when 
sequential scans are "disabled"? Because of the breakdown of descriptions 
into thousands of lookup rows in the FTI tables, this is actually slower than 
doing an index-less "ILIKE" search on both of the fields because of the huge 
number of records in the lookup tables...
Can anyone suggest a way to do a two field "or" match using the FTI and 
indices?
Cheers.
Gordan
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Nick Fankhauser | 2001-10-15 15:33:48 | Re: Newbie | 
| Previous Message | Denis Gasparin | 2001-10-15 15:06:05 | Re: VACUUM, 24/7 availability and 7.2 |