From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Cc: | Teodor Sigaev <teodor(at)sigaev(dot)ru> |
Subject: | WIP: shared ispell dictionary |
Date: | 2010-03-18 10:33:46 |
Message-ID: | 162867791003180333s1933e5b7g9208dd9a2bb681c6@mail.gmail.com |
Lists: | pgsql-hackers |
Hello
The attached patch adds the possibility of sharing an ispell dictionary
between processes. The motivation is the slowness of the first tsearch
query in a session and the amount of memory allocated per process. When I
tested loading the ispell dictionary for Czech, it took about 500 ms and
48 MB. With a simple allocator it uses only 25 MB. If we remove some checks
and the tolower string transformation from the loading stage, loading takes
only 200 ms, but with a broken dict or affix file it can then produce wrong
results. This patch significantly reduces the load on servers that use
ispell dictionaries.
I know that Tom worries about the use of shared memory. I think that worry
is unnecessary here: once loaded, the dictionary data are only read, never
modified. A second idea: this dictionary template could be distributed as a
separate project (it needs a few changes in core, plus the simple
allocator).
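To illustrate why a simple allocator can cut the memory footprint, here is a minimal sketch of a bump allocator over a fixed region. It is not the code from the attached patch: the names `Arena`, `arena_init`, `arena_alloc` and `arena_strdup` are hypothetical, the 8-byte alignment is an assumption, and the caller-supplied buffer merely stands in for a shared memory segment. The point is that data which is loaded once and never freed piece by piece needs no per-chunk bookkeeping, so each allocation costs only its aligned size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed alignment for every allocation (illustrative choice). */
#define ARENA_ALIGN 8

/* Hypothetical bump allocator state; not taken from the patch. */
typedef struct Arena
{
    unsigned char *base;    /* start of the region */
    size_t         size;    /* total bytes in the region */
    size_t         used;    /* bytes handed out so far */
} Arena;

/* Attach the arena to a caller-supplied region, e.g. a shared segment. */
void
arena_init(Arena *a, void *region, size_t size)
{
    assert(a != NULL && region != NULL);
    a->base = region;
    a->size = size;
    a->used = 0;
}

/*
 * Hand out len bytes, rounded up to ARENA_ALIGN, or NULL when the region
 * is exhausted. There is no per-chunk header and no way to free a single
 * chunk: the whole region is released at once.
 */
void *
arena_alloc(Arena *a, size_t len)
{
    size_t  aligned = (len + ARENA_ALIGN - 1) & ~(size_t) (ARENA_ALIGN - 1);
    void   *p;

    /* The first test catches overflow in the rounding above. */
    if (aligned < len || aligned > a->size - a->used)
        return NULL;
    p = a->base + a->used;
    a->used += aligned;
    return p;
}

/* Copy a NUL-terminated string into the arena. */
char *
arena_strdup(Arena *a, const char *s)
{
    size_t  len = strlen(s) + 1;
    char   *p = arena_alloc(a, len);

    if (p != NULL)
        memcpy(p, s, len);
    return p;
}
```

With a 32-byte region, storing "voda" and "zluty" through `arena_strdup` consumes 8 bytes each, and a request larger than the remaining 16 bytes returns NULL. A general-purpose allocator typically adds a header to every chunk, which matters when a dictionary is made of a very large number of small strings and nodes.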
Usage:
a) set shared_data = 26MB in postgresql.conf
b) restart the server
c) register the dictionary with the option "share=yes":
CREATE TEXT SEARCH DICTIONARY cspell
(template=ispell, dictfile = czech, afffile=czech, stopwords=czech,
share = yes);
[pavel(at)nemesis src]$ psql-dev3 postgres
Timing is on.
psql-dev3 (9.0devel)
Type "help" for help.
postgres=# select * from ts_debug('cs','Příliš žluťoučký kůň se napil žluté vody');
   alias   |    description    |   token   |  dictionaries   | dictionary |   lexemes
-----------+-------------------+-----------+-----------------+------------+-------------
 word      | Word, all letters | Příliš    | {cspell,simple} | cspell     | {příliš}
 blank     | Space symbols     |           | {}              |            |
 word      | Word, all letters | žluťoučký | {cspell,simple} | cspell     | {žluťoučký}
 blank     | Space symbols     |           | {}              |            |
 word      | Word, all letters | kůň       | {cspell,simple} | cspell     | {kůň}
 blank     | Space symbols     |           | {}              |            |
 asciiword | Word, all ASCII   | se        | {cspell,simple} | cspell     | {}
 blank     | Space symbols     |           | {}              |            |
 asciiword | Word, all ASCII   | napil     | {cspell,simple} | cspell     | {napít}
 blank     | Space symbols     |           | {}              |            |
 word      | Word, all letters | žluté     | {cspell,simple} | cspell     | {žlutý}
 blank     | Space symbols     |           | {}              |            |
 asciiword | Word, all ASCII   | vody      | {cspell,simple} | cspell     | {voda}
(13 rows)
Time: 8,178 ms <<-- without patch 500ms
Limits and ToDo:
a) it supports only simple regular expressions
b) it doesn't handle cache reset and shared memory deallocation
Regards
Pavel Stehule
Attachment | Content-Type | Size |
---|---|---|
shared_dictionary_02.diff | application/octet-stream | 40.9 KB |