Quick Links

Re: poor pefrormance with regexp searches on large tables

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Grzegorz Blinowski" <g(dot)blinowski(at)gmail(dot)com>, <pgsql-performance(at)postgresql(dot)org>, "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject:	Re: poor pefrormance with regexp searches on large tables
Date:	2011-08-10 18:10:16
Message-ID:	4E428338020000250003FD5C@gw.wicourts.gov
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:

> So far I haven't seen any real indication that the time is spent
> in evaluating the regular expressions

Just as a reality check here, I ran some counts against a
moderately-sized table (half a million rows). Just counting the
rows unconditionally was about five times as fast as having to pick
out even a small column for a compare. Taking a substring of a
bigger (but normally non-TOASTed) value and doing a compare was only
a little slower. Using a regular expression anchored to the front
of the string to do the equivalent of the compare to the substring
took about twice as long as the substring approach. For a
non-anchored regular expression where it would normally need to scan
in a bit, it took twice as long as the anchored regular expression.
These times seem like they might leave some room for improvement,
but it doesn't seem too outrageous.

Each test run three times.

select count(*) from "Case";
count
--------
527769
(1 row)

Time: 47.696 ms
Time: 47.858 ms
Time: 47.687 ms

select count(*) from "Case" where "filingCtofcNo" = '0878';
count
--------
198645
(1 row)

Time: 219.233 ms
Time: 225.410 ms
Time: 226.723 ms

select count(*) from "Case"
where substring("caption" from 1 for 5) = 'State';
count
--------
178142
(1 row)

Time: 238.160 ms
Time: 237.114 ms
Time: 240.388 ms

select count(*) from "Case" where "caption" ~ '^State';
count
--------
178142
(1 row)

Time: 532.821 ms
Time: 535.341 ms
Time: 529.121 ms

select count(*) from "Case" where "caption" ~ 'Wisconsin';
count
--------
157483
(1 row)

Time: 1167.433 ms
Time: 1172.282 ms
Time: 1170.562 ms

-Kevin

In response to

Re: poor pefrormance with regexp searches on large tables at 2011-08-10 17:17:44 from Kevin Grittner

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Alexis Lê-Quôc	2011-08-10 18:47:30	Re: Autovacuum running out of memory
Previous Message	Kevin Grittner	2011-08-10 17:22:49	Re: poor pefrormance with regexp searches on large tables