Re: Slow count(*) again...

From: "Pierre C" <lists(at)peufeu(dot)com>
To: pgsql-performance(at)postgresql(dot)org, "Neil Whelchel" <neil(dot)whelchel(at)gmail(dot)com>
Subject: Re: Slow count(*) again...
Date: 2010-10-11 10:09:02
Message-ID: op.vkeo5cw2eorkce@apollo13
Lists: pgsql-hackers pgsql-performance


> I ran into a fine example of this when I was searching this mailing list,
> "Searching in 856,646 pages took 13.48202 seconds. Site search powered by
> PostgreSQL 8.3." Obviously at some point count(*) came into play here

Well, tsearch full text search is excellent, but it has to work inside the
limits of the postgres database itself, which means row visibility checks,
and therefore, yes, extremely slow count(*) on large result sets when the
tables are not cached in RAM.

Also, if you want custom sorting (by date, thread, etc.), all the matching
rows may have to be read and sorted.

Consider, for example, the Xapian full text search engine. It is not MVCC
(it is single writer, multiple reader, so only one process can update the
index at a time, but readers are not locked out during writes). Of course,
you would never want something like that for your main database! However,
in its particular application, which is multi-criteria full text search
(and flexible sorting of results), it just nukes tsearch2 on datasets not
cached in RAM, simply because everything in it, including the disk layout,
has been optimized for that application. Lucene is similar (but I have not
benchmarked it against tsearch2, so I can't say).

So, if your full text search is a problem, just use Xapian. You can update
the Xapian index from a postgres trigger (using an independent process, or
simply a plpython trigger using the Python Xapian bindings). You can query
it through an extra process acting as a server, or you can write a
set-returning plpython function which performs Xapian searches and join
the results to your tables.
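
For instance, here is a minimal sketch of the plpython trigger approach
(the table posts(id, body), the index path and the names below are made up
for illustration; it assumes the Python Xapian bindings and plpythonu are
installed):

CREATE OR REPLACE FUNCTION posts_xapian_index() RETURNS trigger AS $$
    import xapian

    # Only one writer can hold the Xapian database at a time (single
    # writer, multiple readers), so keep this short.
    db = xapian.WritableDatabase("/var/lib/xapian/posts",
                                 xapian.DB_CREATE_OR_OPEN)
    doc = xapian.Document()
    doc.set_data(str(TD["new"]["id"]))   # store the pk so results can be joined back

    tg = xapian.TermGenerator()
    tg.set_stemmer(xapian.Stem("english"))
    tg.set_document(doc)
    tg.index_text(TD["new"]["body"] or "")

    # Unique ID term: an UPDATE then replaces the old entry instead of
    # adding a duplicate.
    idterm = "Q%s" % TD["new"]["id"]
    doc.add_term(idterm)
    db.replace_document(idterm, doc)
    db.flush()
$$ LANGUAGE plpythonu;

CREATE TRIGGER posts_xapian_trg
    AFTER INSERT OR UPDATE ON posts
    FOR EACH ROW EXECUTE PROCEDURE posts_xapian_index();

The set-returning search function works the same way in reverse: open the
database read-only, run the query, and return the stored ids so the caller
can join them back to the table.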

> Pg will never have such a fast count() as MyISAM does or the same
> insanely fast read performance,

Benchmark it and you'll see: MyISAM is faster than postgres for small,
simple selects only if:
- pg doesn't use prepared queries (planning takes longer than a really
simple select itself; see the PREPARE example below)
- MyISAM can use index-only access
- no one is writing to the MyISAM table at the moment, obviously
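
For example, with a prepared statement the plan is built once and reused
(table and column names here are made up):

PREPARE fetch_row (int) AS
    SELECT * FROM mytable WHERE pk = $1;
EXECUTE fetch_row(42);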

On equal grounds (i.e., SELECT * FROM table WHERE pk = value) there is no
difference. The TCP/IP overhead is larger than the query itself anyway, so
you have to use unix sockets on both to get valid timings. Since, by
default on localhost, MySQL seems to use unix sockets while PG uses TCP/IP,
PG seems 2x slower, which is in fact not true.
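
To compare apples to apples, force the same transport on both clients, for
example (the socket path and database name are illustrative and depend on
the distribution):

  # postgres: unix socket vs TCP
  psql -h /var/run/postgresql mydb
  psql -h 127.0.0.1 mydb

  # mysql: unix socket vs TCP
  mysql --protocol=SOCKET mydb
  mysql --protocol=TCP -h 127.0.0.1 mydb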
