From: | Oleg Bartunov <obartunov(at)gmail(dot)com> |
---|---|
To: | Magnus Hagander <magnus(at)hagander(dot)net> |
Cc: | PostgreSQL WWW <pgsql-www(at)postgresql(dot)org> |
Subject: | Re: Searching for pgweb |
Date: | 2017-04-03 13:37:36 |
Message-ID: | CAF4Au4xcS_t2ryygQP2Mbms+TmUS=WouzS6vpm=g6kfXhVWPYw@mail.gmail.com |
Lists: | pgsql-www |
On Sun, Apr 2, 2017 at 9:37 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>
>
> On Fri, Mar 31, 2017 at 2:46 PM, Oleg Bartunov <obartunov(at)gmail(dot)com>
> wrote:
>
>>
>>
>> On Fri, Mar 31, 2017 at 8:04 AM, Magnus Hagander <magnus(at)hagander(dot)net>
>> wrote:
>>
>>> On Wed, Mar 29, 2017 at 3:55 PM, Oleg Bartunov <obartunov(at)gmail(dot)com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On 29 Mar 2017 09:49, "Magnus Hagander" <magnus(at)hagander(dot)net> wrote:
>>>>
>>>>
>>>>
>>>> On Fri, Mar 24, 2017 at 8:56 AM, Oleg Bartunov <obartunov(at)gmail(dot)com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 22, 2017 at 7:51 PM, Magnus Hagander <magnus(at)hagander(dot)net>
>>>>> wrote:
>>>>>
>>>>>> Right now our main website search uses plainto_tsquery() to generate
>>>>>> the searches.
>>>>>>
>>>>>> Should we consider switching that to phraseto_tsquery() now that we
>>>>>> have phrase searching?
>>>>>>
>>>>>
>>>>> +1
>>>>>
>>>>> Also, I suggest using the new parser, which handles _ and - better, for
>>>>> example:
>>>>>
>>>>> 1.
>>>>> select ts_parse('tsparser', 'btree_gin');
>>>>> ts_parse
>>>>> ----------------
>>>>> (16,btree_gin)
>>>>> (11,btree)
>>>>> (12,_)
>>>>> (11,gin)
>>>>> (4 rows)
>>>>>
>>>>> select ts_parse('default', 'btree_gin');
>>>>> ts_parse
>>>>> -----------
>>>>> (1,btree)
>>>>> (12,_)
>>>>> (1,gin)
>>>>> (3 rows)
>>>>>
>>>>> The default parser produces too much noise; just check the difference:
>>>>>
>>>>> https://postgrespro.ru/search/?area=version&q=btree_gin&product=postgresql&version=9.6
>>>>>
>>>>> https://www.postgresql.org/search/?u=%2Fdocs%2F9.6%2F&q=btree_gin
>>>>>
>>>>>
>>>>> 2.
>>>>> select ts_parse('tsparser', 'utc-5');
>>>>> ts_parse
>>>>> ------------
>>>>> (15,utc-5)
>>>>> (11,utc)
>>>>> (12,-)
>>>>> (9,5)
>>>>> (4 rows)
>>>>>
>>>>> select ts_parse('default', 'utc-5');
>>>>> ts_parse
>>>>> ----------
>>>>> (1,utc)
>>>>> (21,-5)
>>>>> (2 rows)
>>>>>
>>>>> again, compare
>>>>>
>>>>> https://postgrespro.ru/search/?area=version&q=utc-5&product=postgresql&version=9.6
>>>>>
>>>>> https://www.postgresql.org/search/?u=%2Fdocs%2F9.6%2F&q=utc-5
>>>>>
>>>>>
>>>>> We also have better parsing of email addresses, but I'm not sure we
>>>>> need it on the postgres site.
>>>>>
>>>>> We'll publish it on GitHub soon; let me know if you need it.
>>>>>
>>>>>
>>>> That sounds interesting. Two questions:
>>>>
>>>> 1. Do you have plans for contributing this one for upstream postgres,
>>>> or is it intended to be run separately?
>>>>
>>>>
>>>> We would love to do this, but for now it lives here:
>>>> https://github.com/postgrespro/pg_tsparser
>>>>
>>>
>>>
>>> Right, found that one. But if your long term plan is to contribute it
>>> upstream, that makes it easier to rely on :)
>>>
>>
>> I'd love it if you tested it and gave us feedback on what to improve and
>> what to fix. Then we could try to convince the community to accept it.
>>
>>
> I've applied this one for testing on the main website search.
>
> At the same time I realized we didn't setweight() on the title on regular
> webpages, so I fixed that too (setting title to weight A).
>
> Basically the conf is:
>
> CREATE TEXT SEARCH DICTIONARY english_ispell (
> TEMPLATE = pg_catalog.ispell,
> dictfile = 'en_us', afffile = 'en_us', stopwords = 'english' );
> CREATE TEXT SEARCH DICTIONARY pg_dict (
> TEMPLATE = pg_catalog.synonym,
> synonyms = 'pg_dict' );
> CREATE TEXT SEARCH CONFIGURATION pg (
> PARSER = tsparser );
> ALTER TEXT SEARCH CONFIGURATION pg
> ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
> word, hword, hword_part
> WITH pg_dict, english_ispell, english_stem;
> ALTER TEXT SEARCH CONFIGURATION pg
> ALTER MAPPING FOR email, file, float, host, hword_numpart, int,
> numhword, numword, sfloat, uint, url, url_path, version WITH simple;
>
> If you have any other suggestions of things we should change there, please
> let me know!
>
Depending on the load, I'd also use shared_ispell
(https://github.com/postgrespro/shared_ispell), which saves a lot of memory.
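
A minimal sketch of what that could look like on top of the configuration
quoted above. It assumes the shared_ispell extension is installed and
preloaded; the dictionary name english_shared_ispell is hypothetical, the
memory size is only an example value, and the option names follow the
extension's README as I understand it, so check them against the version you
install:

```sql
-- Assumes postgresql.conf already contains (per the shared_ispell README):
--   shared_preload_libraries = 'shared_ispell'
--   shared_ispell.max_size = 32MB   -- example value, size it to your dictionaries
CREATE EXTENSION shared_ispell;

-- Same dictionary files as english_ispell, but loaded once into shared memory
-- instead of once per backend. The name english_shared_ispell is hypothetical.
CREATE TEXT SEARCH DICTIONARY english_shared_ispell (
    TEMPLATE = shared_ispell,
    DictFile = en_us,
    AffFile = en_us,
    StopWords = english
);

-- Swap the shared dictionary in for english_ispell in the word mappings.
ALTER TEXT SEARCH CONFIGURATION pg
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH pg_dict, english_shared_ispell, english_stem;
```

The gain comes from not re-parsing the ispell files in every new connection,
so it matters most when the site opens many short-lived backends.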
>
> So far, this is on the main website search and *not* on the archives
> search. Let's try it there first, but in the long run we should use similar
> configurations.
>
> --
> Magnus Hagander
> Me: http://www.hagander.net/
> Work: http://www.redpill-linpro.com/
>
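
For reference, the two changes discussed in this thread (phraseto_tsquery()
instead of plainto_tsquery(), and weighting the title with setweight()) can be
sketched as below. This is only an illustration: it uses the built-in
'english' configuration rather than the 'pg' configuration above so that it
runs without any extension, and the sample title and body values are made up.

```sql
-- plainto_tsquery ANDs the normalized words together; word order is ignored.
SELECT plainto_tsquery('english', 'The Fat Rats');
--  'fat' & 'rat'

-- phraseto_tsquery requires the words to be adjacent and in order (<->).
SELECT phraseto_tsquery('english', 'The Fat Rats');
--  'fat' <-> 'rat'

-- Title gets weight A, body keeps the default weight D, so a phrase match in
-- the title ranks above the same match in the body. Sample rows are made up.
SELECT title,
       ts_rank(setweight(to_tsvector('english', title), 'A') ||
               setweight(to_tsvector('english', body),  'D'),
               phraseto_tsquery('english', 'phrase search')) AS rank
FROM (VALUES
        ('Phrase search', 'How to match adjacent words.'),
        ('Release notes', 'This release adds phrase search.')
     ) AS pages(title, body)
ORDER BY rank DESC;
```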