From: Nicolas Grilly <nicolas(at)gardentechno(dot)com>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: ts_rank seems very slow (140 ranked documents / second on my machine)
Date: 2011-07-13 16:55:49
Message-ID: CAG3yVS7Jn7oF+jfmsJhvLYvMErR+co+Q0HOqJ=zeFR66KEgq6g@mail.gmail.com
Lists: pgsql-general
On Wed, Jul 13, 2011 at 17:36, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> wrote:
> I didn't notice, reading 40K tuples in random order takes a long time and
> this is a problem of any database. Can you measure time to read all
> documents found?
As you asked, I measured the time required to read all documents.
For reference, after dropping the operating system cache, my
machine can read a 1 GB file in 20 seconds, i.e. 50 MB/s.
Here are the stats for table posts_1000000:
Table size: 117 MB
TOAST table size: 8356 MB
Index size: 1720 MB
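For reference, these numbers can be reproduced with the usual size
functions (assuming PostgreSQL 9.0+ for pg_indexes_size; the TOAST
relation is looked up via pg_class.reltoastrelid):

```sql
-- Heap size of the main table
SELECT pg_size_pretty(pg_relation_size('posts_1000000'));

-- Size of the associated TOAST table
SELECT pg_size_pretty(pg_relation_size(reltoastrelid))
FROM pg_class
WHERE relname = 'posts_1000000';

-- Combined size of all indexes on the table
SELECT pg_size_pretty(pg_indexes_size('posts_1000000'));
```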
I forced PostgreSQL to read all documents using the following query,
which doesn't involve ts_rank:
explain analyze select sum(length(document_vector)) from posts_1000000;
Aggregate  (cost=27472.52..27472.53 rows=1 width=18)
           (actual time=346952.556..346952.557 rows=1 loops=1)
  ->  Seq Scan on posts_1000000  (cost=0.00..24975.01 rows=999001 width=18)
                                 (actual time=0.023..1793.523 rows=999001 loops=1)
Total runtime: 346952.595 ms
Then I ran a similar query that involves ts_rank:
explain analyze select sum(ts_rank_cd(document_vector,
to_tsquery('english', 'crare'), 32)) from posts_1000000;
Aggregate  (cost=27472.52..27472.53 rows=1 width=18)
           (actual time=373713.957..373713.958 rows=1 loops=1)
  ->  Seq Scan on posts_1000000  (cost=0.00..24975.01 rows=999001 width=18)
                                 (actual time=20.045..1847.897 rows=999001 loops=1)
Total runtime: 373714.031 ms
The first query ran in 347 seconds; the second one in 374 seconds.
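Dividing the difference by the row count gives the per-document cost of
the ranking itself; a quick sanity check using the runtimes reported
above:

```sql
SELECT (373714.031 - 346952.595) / 999001 AS ms_per_row;
-- roughly 0.027 ms, i.e. about 27 µs of ranking work per document
```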
Conclusion: There is no significant overhead in the ts_rank function
itself. It's slow because ts_rank has to read, in random order, 40,000
tsvector values stored in the TOAST table. The slow execution time looks
like a direct consequence of storing tsvector values in the TOAST table...
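One thing that might be worth trying (a sketch, not something I have
benchmarked): changing the column's storage strategy so the tsvector is
stored out of line but uncompressed, which makes detoasting cheaper.
Note this only applies to rows written after the change:

```sql
-- EXTERNAL: out-of-line storage without compression
ALTER TABLE posts_1000000
  ALTER COLUMN document_vector SET STORAGE EXTERNAL;

-- Rewrite existing rows so they pick up the new strategy
UPDATE posts_1000000 SET document_vector = document_vector;
```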
> :( The only solution I see is to store enough information for ranking in index.
Is this the expected behavior? How can I improve it?
Thanks,
Nicolas