Tsearch2 performance on big database

From: Rick Jansen <rick(at)rockingstone(dot)nl>
To: pgsql-performance(at)postgresql(dot)org
Subject: Tsearch2 performance on big database
Date: 2005-03-22 12:28:07
Message-ID: 42400F57.3040604@rockingstone.nl
Lists: pgsql-performance

Hi,

I'm looking for a *fast* solution to search through ~ 4 million records of
book descriptions. I've installed PostgreSQL 8.0.1 on a dual Opteron
server with 8G of memory, running Linux 2.6. I haven't done a lot of
tuning on PostgreSQL itself, but here are the settings I've changed so far:

shared_buffers = 2000 (anything much bigger and the kernel says it isn't
allowed; I still have to look into that)
effective_cache_size = 32768
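
If I understand it correctly, the limit I'm hitting is the SysV shared
memory maximum, so getting shared_buffers any higher would mean raising
that first. Something along these lines, I think (the values below are
just illustrative, not what I intend to use):

  # as root, on Linux 2.6: raise the SysV shared memory limits
  sysctl -w kernel.shmmax=536870912    # max size of one segment (512 MB here)
  sysctl -w kernel.shmall=131072       # system-wide total, in 4 kB pages

and then in postgresql.conf something like:

  shared_buffers = 30000               # 8 kB buffers each, so roughly 240 MB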

Here's my table:

ilab=# \d books
                                        Table "public.books"
    Column     |          Type          |                         Modifiers
---------------+------------------------+----------------------------------------------------------
 recordnumber  | integer                | not null default nextval('books_recordnumber_seq'::text)
 membernumber  | integer                | not null default 0
 booknumber    | character varying(20)  | not null default ''::character varying
 author        | character varying(60)  | not null default ''::character varying
 titel         | text                   | not null
 description   | character varying(100) | not null default ''::character varying
 descriprest   | text                   | not null
 price         | bigint                 | not null default 0::bigint
 keywords      | character varying(100) | not null default ''::character varying
 dollarprice   | bigint                 | not null default 0::bigint
 countrynumber | smallint               | not null default 0::smallint
 entrydate     | date                   | not null
 status        | smallint               | not null default 0::smallint
 recordtype    | smallint               | not null default 0::smallint
 bookflags     | smallint               | not null default 0::smallint
 year          | smallint               | not null default 0::smallint
 firstedition  | smallint               | not null default 0::smallint
 dustwrapper   | smallint               | not null default 0::smallint
 signed        | smallint               | not null default 0::smallint
 cover         | smallint               | not null default 0::smallint
 specialfield  | smallint               | not null default 0::smallint
 idxfti        | tsvector               |
Indexes:
    "recordnumber_idx" unique, btree (recordnumber)
    "idxfti_idx" gist (idxfti)

idxfti is a tsvector of the concatenated description and descriprest fields.
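
For completeness, the column was filled and indexed more or less like this
(reconstructed from memory, so the exact statements may differ a bit):

  update books
     set idxfti = to_tsvector('default', description || ' ' || descriprest);
  vacuum full analyze books;
  create index idxfti_idx on books using gist (idxfti);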

ilab=# select avg(character_length(description)),
       avg(character_length(descriprest)) from books;
         avg         |         avg
---------------------+----------------------
 89.1596992873947218 | 133.0468689304200538

Queries take forever to run. Right now we run a MySQL server, on which
we maintain our own indices: we split the description fields by word and
keep separate tables for the words and the book descriptions they appear in.
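
Very roughly, that MySQL schema looks like this (a simplified sketch; the
real table and column names are different):

  create table words (
    wordid  int unsigned not null auto_increment primary key,
    word    varchar(60)  not null,
    unique key word_uq (word)
  );

  create table wordmatches (
    wordid       int unsigned not null,
    recordnumber int unsigned not null,
    primary key (wordid, recordnumber)
  );

  -- a one-word search is then two indexed lookups plus a join:
  select b.*
    from words w
    join wordmatches m on m.wordid = w.wordid
    join books b       on b.recordnumber = m.recordnumber
   where w.word = 'terminology';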

For example, a query for the word 'terminology' on our MySQL search
takes 5.8 seconds and returns 375 results. The same query on PostgreSQL,
using the tsearch2 index, takes 30802.105 ms and returns 298 results.
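
For clarity, the tsearch2 query boils down to something like this (the
exact column list isn't important here):

  select recordnumber, author, titel
    from books
   where idxfti @@ to_tsquery('default', 'terminology');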

How do I speed this up? Should I change settings, add or change indexes,
or... what?

Rick Jansen
--
Systems Administrator for Rockingstone IT
http://www.rockingstone.com
http://www.megabooksearch.com - Search many book listing sites at once
