From: | Steve Atkins <steve(at)blighty(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: For full text indexing, which is better, tsearch2 or |
Date: | 2003-11-28 05:04:17 |
Message-ID: | 20031128050417.GA14227@gp.word-to-the-wise.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On Wed, Nov 26, 2003 at 09:12:30PM -0800, Steve Atkins wrote:
> On Thu, Nov 27, 2003 at 12:41:59PM +0800, Christopher Kings-Lynne wrote:
> > >Does anyone have any metrics on how fast tsearch2 actually is?
> > >
> > >I tried it on a synthetic dataset of a million documents of a hundred
> > >words each and while insertions were impressively fast I gave up on
> > >the search after 10 minutes.
> > >
> > >Broken? Unusable slow? This was on the last 7.4 release candidate.
> >
> > I just created a 1.1million row dataset by copying one of our 30000 row
> > production tables and just taking out the txtidx column. Then I
> > inserted it into itself until it had 1.1 million rows.
> >
> > Then I created the GiST index - THAT took forever - seriously like 20
> > mins or half an hour or something.
> >
> > Now, to find a word:
> >
> > select * from tsearchtest where ftiidx ## 'curry';
> > Time: 9760.75 ms
>
> > So, I have no idea why you think it's slow? Perhaps you forgot the
> > 'create index using gist' step?
>
> No, it was indexed.
>
> Thanks, that was the datapoint I was looking for. It _can_ run fast, so
> I just need to work out what's going on. (It's hard to diagnose a slow
> query when you've no idea whether it's really 'slow').
Looking at it further, something is very broken, possibly with GIST
indices, possibly with tsearch2s use of 'em.
This is on a newly built 7.4 installation, built with 64 bit
datetimes, but completely stock other than that. Stock gcc 3.3.2,
Linux, somewhat elderly 2.4.18 kernel. Running on a 1.5GHz single
processor Athlon with a half gig of RAM. Configuration set to use 20%
of RAM as shared buffers (amongst other settings, this was the last of
a range I tried looking for variation).
Software RAID0 across two 7200RPM SCSI drives, reiserfs (it's a
development box, not a production system). System completely idle
apart from postgresql.
269000 rows, each row having 400 words. Analyzed.
Running the select query given below appears to pause a process trying
to insert into the table completely (locking issue? I/O bandwidth?).
top shows the select below consuming <2% of CPU and iostat shows it reading
~2800 blocks/second from each of the two RAID drives.
Physical size of the database is under 3 gigs, including toast and index
tables.
The select query takes around 6 minutes (consistently, even if the same
identical query is repeated).
For entertainment, I turned off indexscan and the query takes 1
minute with a simple seqscan.
Any thoughts?
Cheers,
Steve
=> select count(*) from ftstest;
count
--------
269000
(1 row)
=> \d ftstest
Table "public.ftstest"
Column | Type | Modifiers
--------+----------+----------------------------------------------------------
idx | integer | not null default nextval('public.ftstest_idx_seq'::text)
words | text | not null
idxfti | tsvector | not null
Indexes:
"ftstest_idx" gist (idxfti)
=> explain analyze select idx from ftstest where idxfti @@ 'dominican'::tsquery;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Index Scan using ftstest_idx on ftstest (cost=0.00..515.90 rows=271 width=4) (actual time=219.694..376042.428 rows=4796 loops=1)
Index Cond: (idxfti @@ '\'dominican\''::tsquery)
Filter: (idxfti @@ '\'dominican\''::tsquery)
Total runtime: 376061.541 ms
(4 rows)
((Set enable_indexscan=false))
=> explain analyze select idx from ftstest where idxfti @@ 'dominican'::tsquery;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Seq Scan on ftstest (cost=0.00..5765.88 rows=271 width=4) (actual time=42.589..62158.285 rows=4796 loops=1)
Filter: (idxfti @@ '\'dominican\''::tsquery)
Total runtime: 62182.277 ms
(3 rows)
From | Date | Subject | |
---|---|---|---|
Next Message | Christopher Kings-Lynne | 2003-11-28 05:18:48 | Re: For full text indexing, which is better, tsearch2 or |
Previous Message | Stefan Champailler | 2003-11-27 19:57:06 | Re: Impossibly slow DELETEs |