Re:

From: Ron Johnson <ron(dot)l(dot)johnson(at)cox(dot)net>
To: PgSQL Performance ML <pgsql-performance(at)postgresql(dot)org>
Subject: Re:
Date: 2002-12-03 03:56:04
Message-ID: 1038887764.4660.49.camel@haggis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, 2002-12-02 at 12:30, typea(at)l-i-e(dot)com wrote:
> [I hope job postings are kosher...]
>
> I need help optimizing a PostgreSQL application:
>
> Full-text search
> ~17,000 records
> Articles (text) are about 10K long on average, ranging from 0 to 278K.
>
> I don't know if we need to throw more RAM, more hard drive, more
> comparison RAM in postmaster.conf or build a concordance or if this is
> just not something that can be done within our budget.
>
> I can't even seem to get the PostgreSQL profiling output using "-s" in the
> startup of postmaster and client to determine what the db engine is doing.
>
> I don't understand why PostgreSQL sometimes chooses not to use the
> existing INDEXes to do an index scan instead of sequential scan -- Does it
> really think sequential will be faster, or does it eliminate an index scan
> because there won't be enough hard drive or swap space to do it?
>
> Currently, full text search queries take on the order of 2 minutes to
> execute.
> We need them to be happening in 5 seconds, if at all possible.
>
> Unfortunately, this needs to happen EARLY THIS WEEK, if at all possible.
>
> Contact me off-list with some idea of price/availability/references if you
> are interested in taking on this task.

After reading the thread to see that your box has what looks like
1GB RAM, and firing up bc(1) to see that 17K articles each of
which is ~10KB == 166MB, it seems to this simple mind that given
enough buffers, you could suck all of the articles into the
buffers. Thus, no more disk IO, but boy would it burn up the CPU!

Also, I think that I might write some sort of "book index pre-processor"
to run against each article, to create, for each article, a list of
words plus byte offsets. (Some tweaking would have to occur in order
to handle capitalization vagaries. Probably capitalize all "index
words".) (Yes, this method has the limitation of [sub-]word searches
instead of arbitrary string searches,

Then, insert all that data into a 3rd table (T_LOOKUP) whose structure
is:
val TEXT (primary key)
article_name TEXT
byte_offset INTEGER

Then, 'EINSTEIN%' queries would go against T_LOOKUP instead of the
articles table.

--
+------------------------------------------------------------+
| Ron Johnson, Jr. mailto:ron(dot)l(dot)johnson(at)cox(dot)net |
| Jefferson, LA USA http://members.cox.net/ron.l.johnson |
| |
| "they love our milk and honey, but preach about another |
| way of living" |
| Merle Haggard, "The Fighting Side Of Me" |
+------------------------------------------------------------+

In response to

  • at 2002-12-02 18:30:50 from typea

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message li li 2002-12-03 05:46:43 Is there any limitations
Previous Message ir. F.T.M. van Vugt bc. 2002-12-02 23:51:03 Re: v7.2.3 versus v7.3 -> huge performance penalty for JOIN with UNION