From: | Andres Freund <andres(at)2ndquadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: procost for to_tsvector |
Date: | 2015-03-11 16:26:04 |
Message-ID: | 20150311162604.GL12445@alap3.anarazel.de |
Lists: | pgsql-hackers |
On 2015-03-11 12:07:20 -0400, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > On 2015-03-11 14:40:16 +0000, Andrew Gierth wrote:
> >> ,but even without doing that, there's a strong
> >> argument that it should be increased to at least the order of 100.
>
> Nyet ... at least not without you actually making that argument, with
> numbers, rather than just handwaving. We use 100 for plpgsql and suchlike
> functions. I'd be OK with making it 10 just on general principles, but
> claiming that it's as expensive as a plpgsql function requires
> evidence.
I'll note that you proposed a cost higher than 10 some years back ;):
http://www.postgresql.org/message-id/8971.1255891843@sss.pgh.pa.us
What you said back then makes sense to me:
On 2009-10-18 14:50:43 -0400, Tom Lane wrote:
> In another case I was looking at just now, it seems that to_tsquery()
> and to_tsvector() are noticeably slower than most other built-in
> functions, which is not surprising given the amount of mechanism that
> gets invoked inside them. It would be useful to tell the planner
> about that to discourage it from picking seqscan plans that involve
> repeated execution of these functions.
A trivial comparison with a simple plpgsql function shows:
CREATE FUNCTION a_simple_plpgsql_function(a text) RETURNS text LANGUAGE plpgsql AS $$BEGIN RETURN repeat(a, 3);END;$$;
SELECT a_simple_plpgsql_function('This is a long sentence in english. Or maybe not so long after all. But it includes a Metal Ümlaut. And parens: ()! Also a number: ' ||g.i)
FROM generate_series(1, 10000) g(i);
Time: 32.898 ms
and
SELECT to_tsvector('english',
'This is a long sentence in english. Or maybe not so
long after all. But it includes a Metal Ümlaut. And
parens: ()! Also a number: ' ||g.i)
FROM generate_series(1, 10000) g(i);
Time: 450.996 ms
Given that this is a short sentence and a simple text search
configuration, a factor of 10 between them doesn't sound wrong (the
timings above differ by roughly 14x). This is obviously completely
unscientific, but ...
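For anyone who wants to experiment with this locally, here is a minimal sketch of how the cost could be inspected and tried out. The value 100 is just the number under discussion in this thread, not a recommendation. The ALTER FUNCTION runs inside a transaction that is rolled back, so nothing is changed permanently, and altering a built-in function is assumed to need superuser rights.

```sql
-- Planner cost estimates currently assigned to the text search functions.
-- procost is expressed in units of cpu_operator_cost.
SELECT p.oid::regprocedure AS function_signature, p.procost
FROM pg_proc p
WHERE p.proname IN ('to_tsvector', 'to_tsquery')
ORDER BY p.proname, p.oid;

-- Ratio of the two timings quoted above (to_tsvector vs. the plpgsql function).
SELECT round(450.996 / 32.898, 1) AS slowdown_factor;

-- Try a higher cost without making it permanent.
BEGIN;
ALTER FUNCTION to_tsvector(regconfig, text) COST 100;
-- Run EXPLAIN on the queries of interest here and compare the plans.
ROLLBACK;
```

Comparing EXPLAIN output before and inside the transaction should show whether the planner moves away from plans that evaluate to_tsvector() once per row.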
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services