Re: Full text search with ORDER BY performance issue

From: valgog <valgog(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Full text search with ORDER BY performance issue
Date: 2009-07-21 10:32:39
Message-ID: d6b6f094-e46f-43c2-bbc5-f66da847957a@q11g2000yqi.googlegroups.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Jul 21, 6:06 am, scott(dot)marl(dot)(dot)(dot)(at)gmail(dot)com (Scott Marlowe) wrote:
> On Mon, Jul 20, 2009 at 9:35 PM, Krade<kr(dot)(dot)(dot)(at)krade(dot)com> wrote:
> > But I think I might just do:
> > select * from a where comment_tsv @@ plainto_tsquery('query') and timestamp
> >> cast(floor(extract(epoch from CURRENT_TIMESTAMP) - 864000) as integer)
> > order by timestamp desc limit 24 offset 0;
>
> > And if I get less than 24 rows, issue the regular query:
>
> > select * from a where comment_tsv @@ plainto_tsquery('query') order by
> > timestamp desc limit 24 offset 0;
>
> Couldn't you do tge second query as a with query then run another
> query to limit that result to everything greater than now()-xdays ?
>
> --
> Sent via pgsql-performance mailing list (pgsql-performa(dot)(dot)(dot)(at)postgresql(dot)org)
> To make changes to your subscription:http://www.postgresql.org/mailpref/pgsql-performance

Hi,

There is a problem with GIN and GIST indexes, that they cannot be used
by the ORDER BY. Maybe it will be a nice idea to ask Oleg to make it
possible to use the b-tree columns in GIST or GIN to make the sort
easier, but I have no idea how difficult it will be to implement it in
current GIN or GIST structures. I think Oleg or even Tom will be the
right people to ask it :) But even if it is possible it will not be
implemented at least until 8.5 that will need a year to come, so until
then...

It is possible to strip your table in several smaller ones putting
them on different machines and then splitting your query with DBLINK.
This will distribute the burden of sorting to several machines that
will have to sort smaller parts as well. After you have your 25 ids
from each of the machines, you can merge them, sort again and limit as
you wish. Doing large offsets will be still problematic but faster
anyway in most reasonable offset ranges. (Load balancing tools like
pg_pool can automate this task, but I do not have practical experience
using them for that purposes)

Yet another very interesting technology -- sphinx search (http://
www.sphinxsearch.com/) It can distribute data on several machines
automatically, but it will be probably too expensive to start using
(if your task is not your main one :)) as they do not have standard
automation scripts, it does not support live updates (so you will
always have some minutes delay), and this is a standalone service,
that needs to be maintained and configured and synchronized with our
main database separately (though you can use pg/python to access it
from postgres).

Good luck with your task :)

-- Valentine Gogichashvili

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Doug Hunley 2009-07-21 12:42:51 hyperthreaded cpu still an issue in 8.4?
Previous Message Scott Marlowe 2009-07-21 04:06:49 Re: Full text search with ORDER BY performance issue