From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Teodor Sigaev <teodor(at)sigaev(dot)ru> |
Subject: | Re: WIP: shared ispell dictionary |
Date: | 2010-03-18 15:08:39 |
Message-ID: | 162867791003180808p49a047cfj72d1d89ce5121d9e@mail.gmail.com |
Lists: | pgsql-hackers |
2010/3/18 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
>> I know that Tom worries about the use of shared memory.
>
> You're right, and if I have any say in the matter no patch like this
> will ever go in.
>
> What I would suggest looking into is some way of preprocessing the raw
> text dictionary file into a format that can be slurped into memory
> quickly. The main problem compared to the way things are done now
> is that the current internal format relies heavily on pointers.
> Maybe you could replace those by offsets?
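To make the pointer-to-offset idea concrete, here is a minimal sketch in C. It is not code from PostgreSQL or from the patch: the names FlatNode, FlatDict, flat_insert and flat_lookup are hypothetical, and the real ispell structures are considerably richer than this plain prefix tree. The sketch only shows that when nodes reference each other by offset from the start of one buffer, the buffer can be grown with realloc, copied, or written to a file and read back without fixing up any links.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node layout: links are byte offsets from the buffer start.
 * The root always sits at offset 0, so 0 can double as "no link". */
typedef struct FlatNode {
    uint32_t first_child;   /* offset of first child, 0 = none */
    uint32_t next_sibling;  /* offset of next sibling, 0 = none */
    uint8_t  ch;            /* byte matched by this node */
    uint8_t  is_word;       /* 1 if a word ends at this node */
} FlatNode;

typedef struct FlatDict {
    uint8_t *base;          /* one contiguous buffer holding every node */
    uint32_t used;          /* bytes in use */
    uint32_t cap;           /* bytes allocated */
} FlatDict;

FlatNode *node_at(uint8_t *base, uint32_t off) {
    return (FlatNode *)(base + off);
}

/* Append a zeroed node and return its offset; the buffer may move. */
uint32_t new_node(FlatDict *d, uint8_t ch) {
    if (d->used + sizeof(FlatNode) > d->cap) {
        uint32_t ncap = d->cap * 2;
        uint8_t *p = realloc(d->base, ncap);
        assert(p != NULL);
        d->base = p;        /* offsets stay valid after the move */
        d->cap = ncap;
    }
    uint32_t off = d->used;
    memset(d->base + off, 0, sizeof(FlatNode));
    node_at(d->base, off)->ch = ch;
    d->used += (uint32_t)sizeof(FlatNode);
    return off;
}

void flat_init(FlatDict *d) {
    d->cap = 4 * (uint32_t)sizeof(FlatNode);
    d->base = malloc(d->cap);
    assert(d->base != NULL);
    d->used = 0;
    new_node(d, 0);         /* root node at offset 0 */
}

void flat_insert(FlatDict *d, const char *word) {
    uint32_t cur = 0;
    for (const char *p = word; *p; p++) {
        uint8_t ch = (uint8_t)*p;
        uint32_t child = node_at(d->base, cur)->first_child;
        while (child != 0 && node_at(d->base, child)->ch != ch)
            child = node_at(d->base, child)->next_sibling;
        if (child == 0) {
            /* new_node may realloc, so nodes are re-resolved afterwards */
            child = new_node(d, ch);
            node_at(d->base, child)->next_sibling =
                node_at(d->base, cur)->first_child;
            node_at(d->base, cur)->first_child = child;
        }
        cur = child;
    }
    node_at(d->base, cur)->is_word = 1;
}

/* Lookup needs only the buffer, wherever it currently lives. */
int flat_lookup(const uint8_t *base, const char *word) {
    uint32_t cur = 0;
    for (const char *p = word; *p; p++) {
        uint32_t child = ((const FlatNode *)(base + cur))->first_child;
        while (child != 0 &&
               ((const FlatNode *)(base + child))->ch != (uint8_t)*p)
            child = ((const FlatNode *)(base + child))->next_sibling;
        if (child == 0)
            return 0;
        cur = child;
    }
    return ((const FlatNode *)(base + cur))->is_word;
}
```

Because flat_lookup takes only the buffer address, the same bytes could in principle be loaded from a preprocessed file in a single read. A real on-disk format would also need to pin down endianness and struct layout, which this sketch ignores.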
Then you have to maintain a new application :( It can also bring new kinds of bugs.
I am playing with the preload solution now, and I have found a new issue.
I don't know why, but when I preload a library that uses a lot of memory, like
ispell, all subsequent operations are far slower :( (select 10 goes from about
0.3 ms to about 12.6 ms in the session below).
[pavel(at)nemesis tsearch]$ psql-dev3 postgres
Timing is on.
psql-dev3 (9.0devel)
Type "help" for help.
postgres=# select 10;
?column?
----------
10
(1 row)
Time: 0,611 ms
postgres=# select 10;
?column?
----------
10
(1 row)
Time: 0,277 ms
postgres=# select 10;
?column?
----------
10
(1 row)
Time: 0,266 ms
postgres=# select 10;
?column?
----------
10
(1 row)
Time: 0,348 ms
postgres=# select * from ts_debug('cs','Jmenuji se Pavel Stěhule a bydlím ve Skalici');
   alias   |    description    |  token  |       dictionaries        |    dictionary    |     lexemes
-----------+-------------------+---------+---------------------------+------------------+----------------
 asciiword | Word, all ASCII   | Jmenuji | {preloaded_cspell,simple} | preloaded_cspell | {jmenovat}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | se      | {preloaded_cspell,simple} | preloaded_cspell | {}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | Pavel   | {preloaded_cspell,simple} | preloaded_cspell | {pavel,pavla}
 blank     | Space symbols     |         | {}                        |                  |
 word      | Word, all letters | Stěhule | {preloaded_cspell,simple} | preloaded_cspell | {stěhule}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | a       | {preloaded_cspell,simple} | preloaded_cspell | {}
 blank     | Space symbols     |         | {}                        |                  |
 word      | Word, all letters | bydlím  | {preloaded_cspell,simple} | preloaded_cspell | {bydlet,bydlit}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | ve      | {preloaded_cspell,simple} | preloaded_cspell | {}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | Skalici | {preloaded_cspell,simple} | preloaded_cspell | {skalice}
(15 rows)
Time: 24,495 ms
postgres=# select * from ts_debug('cs','Jmenuji se Pavel Stěhule a bydlím ve Skalici');
   alias   |    description    |  token  |       dictionaries        |    dictionary    |     lexemes
-----------+-------------------+---------+---------------------------+------------------+----------------
 asciiword | Word, all ASCII   | Jmenuji | {preloaded_cspell,simple} | preloaded_cspell | {jmenovat}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | se      | {preloaded_cspell,simple} | preloaded_cspell | {}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | Pavel   | {preloaded_cspell,simple} | preloaded_cspell | {pavel,pavla}
 blank     | Space symbols     |         | {}                        |                  |
 word      | Word, all letters | Stěhule | {preloaded_cspell,simple} | preloaded_cspell | {stěhule}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | a       | {preloaded_cspell,simple} | preloaded_cspell | {}
 blank     | Space symbols     |         | {}                        |                  |
 word      | Word, all letters | bydlím  | {preloaded_cspell,simple} | preloaded_cspell | {bydlet,bydlit}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | ve      | {preloaded_cspell,simple} | preloaded_cspell | {}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | Skalici | {preloaded_cspell,simple} | preloaded_cspell | {skalice}
(15 rows)
Time: 18,426 ms
postgres=# select 10;
?column?
----------
10
(1 row)
Time: 12,700 ms
postgres=# select 10;
?column?
----------
10
(1 row)
Time: 12,465 ms
postgres=# select 10;
?column?
----------
10
(1 row)
Time: 12,603 ms
postgres=# select 10;
?column?
----------
10
(1 row)
Time: 12,901 ms
postgres=# select 10;
?column?
----------
10
(1 row)
Time: 12,642 ms
When I reduce the memory usage with a simple allocator, the issue goes
away, but it is still strange.
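For readers wondering what a "simple allocator" might look like here, one common reading is a bump (arena) allocator that carves many small objects out of a few large blocks, so each object avoids the per-chunk bookkeeping of a general-purpose allocator. The sketch below is an assumption for illustration only, not the allocator from the patch; Arena, arena_alloc, ARENA_BLOCK_SIZE and the other names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ARENA_ALIGN      8              /* assumed alignment for all objects */
#define ARENA_BLOCK_SIZE (64 * 1024)    /* assumed payload size per block */

typedef struct ArenaBlock {
    struct ArenaBlock *next;            /* previously filled block */
    size_t used;                        /* payload bytes handed out */
    size_t cap;                         /* payload bytes available */
} ArenaBlock;

/* Header size rounded up so the payload starts on an aligned address. */
#define ARENA_HDR \
    ((sizeof(ArenaBlock) + ARENA_ALIGN - 1) & ~(size_t)(ARENA_ALIGN - 1))

typedef struct Arena {
    ArenaBlock *head;                   /* block currently being filled */
    size_t nblocks;                     /* number of blocks obtained */
} Arena;

void arena_init(Arena *a) {
    a->head = NULL;
    a->nblocks = 0;
}

/* Hand out `size` bytes by bumping an offset; no per-object header. */
void *arena_alloc(Arena *a, size_t size) {
    size = (size + ARENA_ALIGN - 1) & ~(size_t)(ARENA_ALIGN - 1);
    ArenaBlock *b = a->head;
    if (b == NULL || b->cap - b->used < size) {
        /* Current block is full: start a new one (oversized if needed). */
        size_t cap = size > ARENA_BLOCK_SIZE ? size : ARENA_BLOCK_SIZE;
        b = malloc(ARENA_HDR + cap);
        if (b == NULL)
            return NULL;
        b->next = a->head;
        b->used = 0;
        b->cap = cap;
        a->head = b;
        a->nblocks++;
    }
    void *p = (char *)b + ARENA_HDR + b->used;
    b->used += size;
    return p;
}

/* Objects cannot be freed individually; everything is released at once. */
void arena_free_all(Arena *a) {
    ArenaBlock *b = a->head;
    while (b != NULL) {
        ArenaBlock *next = b->next;
        free(b);
        b = next;
    }
    a->head = NULL;
    a->nblocks = 0;
}
```

The trade-off is that nothing can be released until the whole arena is dropped, which suits a dictionary that is built once and then only read.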
Pavel
>
> regards, tom lane
>