From: | "Pierre C" <lists(at)peufeu(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org, "Neil Whelchel" <neil(dot)whelchel(at)gmail(dot)com> |
Subject: | Re: Slow count(*) again... |
Date: | 2010-10-11 10:09:02 |
Message-ID: | op.vkeo5cw2eorkce@apollo13 |
Lists: pgsql-hackers pgsql-performance
> I ran into a fine example of this when I was searching this mailing list,
> "Searching in 856,646 pages took 13.48202 seconds. Site search powered by
> PostgreSQL 8.3." Obviously at some point count(*) came into play here
Well, tsearch full text search is excellent, but it has to work inside the
limits of the postgres database itself, which means row visibility checks,
and therefore, yes, extremely slow count(*) on large result sets when the
tables are not cached in RAM.
Also, if you want custom sorting (by date, thread, etc.), possibly all the
matching rows will have to be read and sorted.
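For instance (the table and column names here are made up), a count over a
full text match has to visit and visibility-check every row the index
returns, so it scales with the size of the result set rather than with the
cost of the index lookup:

SELECT count(*)
FROM messages
WHERE body_tsv @@ to_tsquery('english', 'postgres & performance');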
Consider, for example, the Xapian full text search engine. It is not MVCC
(it is single-writer, multiple-reader, so only one process can update the
index at a time, but readers are not locked out during writes). Of course,
you would never want something like that for your main database! However,
in its particular application, which is multi-criteria full text search
(and flexible sorting of results), it just nukes tsearch2 on datasets not
cached in RAM, simply because everything in it, including the disk layout,
has been optimized for that application. Lucene is similar (but I have not
benchmarked it against tsearch2, so I can't say).
So, if your full text search is a problem, just use Xapian. You can update
the Xapian index from a postgres trigger (using an independent process, or
simply a plpython trigger using the python Xapian bindings). You can query
it using an extra process acting as a server, or you can write a
set-returning plpython function which performs Xapian searches and join the
results to your tables.
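As a rough, untested sketch (the table name, column names and index path are
invented; it assumes plpythonu and the python xapian bindings are installed,
and remember Xapian is single-writer, so concurrent writers would contend for
the write lock), an indexing trigger could look like this:

CREATE OR REPLACE FUNCTION messages_xapian_index() RETURNS trigger AS $$
    import xapian
    # open (or create) the Xapian index on disk
    db = xapian.WritableDatabase('/var/lib/xapian/messages',
                                 xapian.DB_CREATE_OR_OPEN)
    doc = xapian.Document()
    tg = xapian.TermGenerator()
    tg.set_stemmer(xapian.Stem("english"))
    tg.set_document(doc)
    tg.index_text(TD["new"]["body"])
    idterm = "Q%s" % TD["new"]["id"]    # unique term: updates replace, not duplicate
    doc.add_term(idterm)
    doc.set_data(str(TD["new"]["id"]))  # store the pk so results can be joined back
    db.replace_document(idterm, doc)
$$ LANGUAGE plpythonu;

CREATE TRIGGER messages_xapian
    AFTER INSERT OR UPDATE ON messages
    FOR EACH ROW EXECUTE PROCEDURE messages_xapian_index();

The set-returning plpython function for querying would be similar in spirit:
open the database read-only, run the search, and yield the stored primary
keys so you can join them against your tables.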
> Pg will never have such a fast count() as MyISAM does or the same
> insanely fast read performance,
Benchmark it and you'll see: MyISAM is faster than postgres for "small simple
selects" only if:
- pg doesn't use prepared queries (planning time takes longer than a really
simple select; see the sketch after this list)
- myisam can use index-only access
- no one is writing to the myisam table at the moment, obviously
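To test the first point fairly, prepare the statement once so later
executions skip parse and plan (hypothetical table and column names):

PREPARE fetch_by_pk (integer) AS
    SELECT * FROM items WHERE id = $1;

EXECUTE fetch_by_pk(42);   -- planning already done, only execution left
EXECUTE fetch_by_pk(43);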
On equal grounds (i.e., SELECT * FROM table WHERE pk = value) there is no
difference. The TCP/IP overhead is larger than the query itself anyway; you
have to use unix sockets on both to get valid timings. Since by default on
localhost MySQL seems to use unix sockets and PG uses TCP/IP, PG seems 2x
slower, which is in fact not true.