Re: Tsearch2 performance on big database

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Mike Rylander <mrylander(at)gmail(dot)com>
Cc: Rick Jansen <rick(at)rockingstone(dot)nl>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Tsearch2 performance on big database
Date: 2005-03-22 15:45:17
Message-ID: Pine.GSO.4.62.0503221840480.5508@ra.sai.msu.su
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Mike,

no comments before Rick post tsearch configs and increased buffers !
Union shouldn't be faster than (term1|term2).
tsearch2 internals description might help you understanding tsearch2 limitations.
See http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_internals
Also, don't miss my notes:
http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_Notes

Oleg
On Tue, 22 Mar 2005, Mike Rylander wrote:

> On Tue, 22 Mar 2005 14:25:19 +0100, Rick Jansen <rick(at)rockingstone(dot)nl> wrote:
>>
>> ilab=# explain analyze select count(titel) from books where idxfti @@
>> to_tsquery('default', 'buckingham | palace');
>> QUERY PLAN
>> ----------------------------------------------------------------------------------------------------------------------------------------
>> Aggregate (cost=35547.99..35547.99 rows=1 width=56) (actual
>> time=125968.119..125968.120 rows=1 loops=1)
>> -> Index Scan using idxfti_idx on books (cost=0.00..35525.81
>> rows=8869 width=56) (actual time=0.394..125958.245 rows=3080 loops=1)
>> Index Cond: (idxfti @@ '\'buckingham\' | \'palac\''::tsquery)
>> Total runtime: 125968.212 ms
>> (4 rows)
>>
>> Time: 125969.264 ms
>> ilab=#
>
> Ahh... I should have qualified my claim. I am creating a google-esqe
> search interface and almost every query uses '&' as the term joiner.
> 'AND' queries and one-term queries are orders of magnitude faster than
> 'OR' queries, and fortunately are the expected default for most users.
> (Think, "I typed in these words, therefore I want to match these
> words"...) An interesting test may be to time multiple queries
> independently, one for each search term, and see if the combined cost
> is less than a single 'OR' search. If so, you could use UNION to join
> the results.
>
> However, the example you originally gave ('terminology') should be
> very fast. On a comparable query ("select count(value) from
> metabib.full_rec where index_vector @@ to_tsquery('default','jane');")
> I get 12ms.
>
> Oleg, do you see anything else on the surface here?
>
> Try:
>
> EXPLAIN ANALYZE
> SELECT titel FROM books WHERE idxfti @@
> to_tsquery('default', 'buckingham')
> UNION
> SELECT titel FROM books WHERE idxfti @@
> to_tsquery('default', 'palace');
>
> and see if using '&' instead of '|' where you can helps out. I
> imagine you'd be surprised by the speed of:
>
> SELECT titel FROM books WHERE idxfti @@
> to_tsquery('default', 'buckingham&palace');
>
>
>>
>> > As an example of what I think you *should* be seeing, I have a similar
>> > box (4 procs, but that doesn't matter for one query) and I can search
>> > a column with tens of millions of rows in around a second.
>> >
>>
>> That sounds very promising, I'd love to get those results.. could you
>> tell me what your settings are, howmuch memory you have and such?
>
> 16G of RAM on a dedicated machine.
>
>
> shared_buffers = 15000 # min 16, at least max_connections*2, 8KB each
> work_mem = 10240 # min 64, size in KB
> maintenance_work_mem = 1000000 # min 1024, size in KB
> # big m_w_m for loading data...
>
> random_page_cost = 2.5 # units are one sequential page fetch cost
> # fast drives, and tons of RAM
>
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Hannu Krosing 2005-03-22 15:59:16 Re: What needs to be done for real Partitioning?
Previous Message Mike Rylander 2005-03-22 15:30:03 Re: Tsearch2 performance on big database