From: | Grzegorz Blinowski <g(dot)blinowski(at)gmail(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | poor pefrormance with regexp searches on large tables |
Date: | 2011-08-10 14:26:18 |
Message-ID: | CAF=aNMHHzBKE8_P0LykDb8Bvk7eHKdytcZ9xAkz=gtpeTj87+w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Dear All,
I have some problems with regexp queries performance - common sense tells me
that my queries should run faster than they do.
The database - table in question has 590 K records, table's size is 3.5GB. I
am effectively querying a single attribute "subject" which has an average
size of 2KB, so we are doing a query on ~1GB of data. The query looks more
or less like this:
SELECT T.tender_id FROM archive_tender T WHERE
(( T.subject !~* '\\mpattern1.*\\M' ) AND ( T.subject ~* '\\mpattern2\\M' OR
[4-5 more similar terms] ) AND T.erased = 0 AND T.rejected = 0
ORDER BY
tender_id DESC
LIMIT
10000;
The planner shows seq scan on subject which is OK with regexp match.
Now, the query above takes about 60sec to execute; exactly: 70s for the
first run and 60s for the next runs. In my opinion this is too long: It
should take 35 s to read the whole table into RAM (assuming 100 MB/s
transfers - half the HDD benchmarked speed). With 12 GB of RAM the whole
table should be easily buffered on the operating system level. The regexp
match on 1 GB of data takes 1-2 s (I benchmarked it with a simple pcre
test). The system is not in the production mode, so there is no additional
database activity (no reads, no updates, effectively db is read-only)
To summarize: any idea how to speed up this query? (please, don't suggest
regexp indexing - in this application it would be too time consuming to
implement them, and besides - as above - I think that Postgres should do
better here even with seq-scan).
Server parameters:
RAM: 12 GB
Cores: 8
HDD: SATA; shows 200 MB/s transfer speed
OS: Linux 64bit; Postgres 8.4
Some performance params from postgresql.conf:
max_connections = 16
shared_buffers = 24MB
temp_buffers = 128MB
max_prepared_transactions = 50
work_mem = 128MB
maintenance_work_mem = 1GB
effective_cache_size = 8GB
Database is vacuumed.
Regards,
Greg
From | Date | Subject | |
---|---|---|---|
Next Message | Tomas Vondra | 2011-08-10 15:08:29 | Re: poor pefrormance with regexp searches on large tables |
Previous Message | Craig Ringer | 2011-08-10 06:05:56 | Re: Fwd: Suspected Postgres Datacorruption |