From: | Magnus Hagander <magnus(at)hagander(dot)net> |
---|---|
To: | Oleg Bartunov <obartunov(at)gmail(dot)com> |
Cc: | PostgreSQL WWW <pgsql-www(at)postgresql(dot)org> |
Subject: | Re: Searching for pgweb |
Date: | 2017-04-05 16:52:25 |
Message-ID: | CABUevExcUE0ggT_aPadQWDwU5KaNgNP_4bFvOMc7u7JMrBQLVQ@mail.gmail.com |
Lists: | pgsql-www |
On Mon, Apr 3, 2017 at 3:37 PM, Oleg Bartunov <obartunov(at)gmail(dot)com> wrote:
>
>
> On Sun, Apr 2, 2017 at 9:37 AM, Magnus Hagander <magnus(at)hagander(dot)net>
> wrote:
>
>>
>>
>> On Fri, Mar 31, 2017 at 2:46 PM, Oleg Bartunov <obartunov(at)gmail(dot)com>
>> wrote:
>>
>>>
>>>
>>> On Fri, Mar 31, 2017 at 8:04 AM, Magnus Hagander <magnus(at)hagander(dot)net>
>>> wrote:
>>>
>>>> On Wed, Mar 29, 2017 at 3:55 PM, Oleg Bartunov <obartunov(at)gmail(dot)com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On 29 Mar 2017 09:49, "Magnus Hagander" <magnus(at)hagander(dot)net> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Mar 24, 2017 at 8:56 AM, Oleg Bartunov <obartunov(at)gmail(dot)com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 22, 2017 at 7:51 PM, Magnus Hagander <magnus(at)hagander(dot)net>
>>>>>> wrote:
>>>>>>
>>>>>>> Right now our main website search uses plainto_tsquery() to generate
>>>>>>> the searches.
>>>>>>>
>>>>>>> Should we consider switching that to phraseto_tsquery() now that we
>>>>>>> have phrase searching?
>>>>>>>
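For reference, a minimal sketch of how the two functions differ. It assumes PostgreSQL 9.6 or later and uses the stock 'english' configuration rather than the site's own; the sample strings are made up for illustration.

```sql
-- plainto_tsquery() ANDs the normalized words together:
-- they only have to occur somewhere in the document.
SELECT plainto_tsquery('english', 'The Fat Rats');
-- 'fat' & 'rat'

-- phraseto_tsquery() joins them with <-> (FOLLOWED BY):
-- they have to be adjacent and in this order.
SELECT phraseto_tsquery('english', 'The Fat Rats');
-- 'fat' <-> 'rat'

-- Same document, two different answers.
SELECT to_tsvector('english', 'fat cats chase rats')
           @@ plainto_tsquery('english', 'fat rats')  AS plain_match,   -- true
       to_tsvector('english', 'fat cats chase rats')
           @@ phraseto_tsquery('english', 'fat rats') AS phrase_match;  -- false
```

So the switch would trade recall for precision: multi-word searches would only match documents containing the words as a phrase.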
>>>>>>
>>>>>> +1
>>>>>>
>>>>>> Also, I suggest using the new parser, which handles _ and - better, for
>>>>>> example:
>>>>>>
>>>>>> 1.
>>>>>> select ts_parse('tsparser', 'btree_gin');
>>>>>> ts_parse
>>>>>> ----------------
>>>>>> (16,btree_gin)
>>>>>> (11,btree)
>>>>>> (12,_)
>>>>>> (11,gin)
>>>>>> (4 rows)
>>>>>>
>>>>>> select ts_parse('default', 'btree_gin');
>>>>>> ts_parse
>>>>>> -----------
>>>>>> (1,btree)
>>>>>> (12,_)
>>>>>> (1,gin)
>>>>>> (3 rows)
>>>>>>
>>>>>> The default parser produces too much noise; just check the difference:
>>>>>>
>>>>>> https://postgrespro.ru/search/?area=version&q=btree_gin&product=postgresql&version=9.6
>>>>>>
>>>>>> https://www.postgresql.org/search/?u=%2Fdocs%2F9.6%2F&q=btree_gin
>>>>>>
>>>>>>
>>>>>> 2.
>>>>>> select ts_parse('tsparser', 'utc-5');
>>>>>> ts_parse
>>>>>> ------------
>>>>>> (15,utc-5)
>>>>>> (11,utc)
>>>>>> (12,-)
>>>>>> (9,5)
>>>>>> (4 rows)
>>>>>>
>>>>>> select ts_parse('default', 'utc-5');
>>>>>> ts_parse
>>>>>> ----------
>>>>>> (1,utc)
>>>>>> (21,-5)
>>>>>> (2 rows)
>>>>>>
>>>>>> again, compare
>>>>>>
>>>>>> https://postgrespro.ru/search/?area=version&q=utc-5&product=postgresql&version=9.6
>>>>>>
>>>>>> https://www.postgresql.org/search/?u=%2Fdocs%2F9.6%2F&q=utc-5
>>>>>>
>>>>>>
>>>>>> We also have better parsing of email, but I'm not sure we need it on
>>>>>> the postgres site.
>>>>>>
>>>>>> We'll publish it on github soon, let me know if you need it.
>>>>>>
>>>>>>
>>>>> That sounds interesting. Two questions:
>>>>>
>>>>> 1. Do you have plans for contributing this one for upstream postgres,
>>>>> or is it intended to be run separately?
>>>>>
>>>>>
>>>>> We would love to do this, but for now it lives here:
>>>>> https://github.com/postgrespro/pg_tsparser
>>>>>
>>>>
>>>>
>>>> Right, found that one. But if your long term plan is to contribute it
>>>> upstream, that makes it easier to rely on :)
>>>>
>>>
>>> I'd love it if you tested it and gave us feedback on what to improve and
>>> what to fix. Then we could try to convince the community to accept it.
>>>
>>>
>> I've applied this one for testing on the main website search.
>>
>> At the same time I realized we didn't apply setweight() to the title on
>> regular webpages, so I fixed that too (setting title to weight A).
>>
>> Basically the conf is:
>>
>> CREATE TEXT SEARCH DICTIONARY english_ispell (
>> TEMPLATE = pg_catalog.ispell,
>> dictfile = 'en_us', afffile = 'en_us', stopwords = 'english' );
>> CREATE TEXT SEARCH DICTIONARY pg_dict (
>> TEMPLATE = pg_catalog.synonym,
>> synonyms = 'pg_dict' );
>> CREATE TEXT SEARCH CONFIGURATION pg (
>> PARSER = tsparser );
>> ALTER TEXT SEARCH CONFIGURATION pg
>> ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
>> word, hword, hword_part
>> WITH pg_dict, english_ispell, english_stem;
>> ALTER TEXT SEARCH CONFIGURATION pg
>> ALTER MAPPING FOR email, file, float, host, hword_numpart, int,
>> numhword, numword, sfloat, uint, url, url_path, version WITH simple;
>>
>> If you have any other suggestions of things we should change there,
>> please let me know!
>>
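As a rough sketch of how the title weighting and the 'pg' configuration above fit together at query time: the pages table and its title and body columns are hypothetical names used only for illustration (not the actual pgweb schema), and leaving the body at the default weight D is an assumption. It also needs the pg_tsparser extension and the dictionary files referenced in the configuration to be installed.

```sql
-- Hypothetical table: pages(title text, body text).
WITH docs AS (
    SELECT title,
           -- Title lexemes get weight A so they rank higher;
           -- body lexemes keep the default weight D (assumption).
           setweight(to_tsvector('pg', coalesce(title, '')), 'A') ||
           to_tsvector('pg', coalesce(body, '')) AS fti
    FROM pages
)
SELECT title, ts_rank(fti, q) AS rank
FROM docs, plainto_tsquery('pg', 'btree_gin') AS q
WHERE fti @@ q
ORDER BY rank DESC
LIMIT 10;
```

In practice the combined tsvector would be stored in a column with a GIN index instead of being computed per query as it is here.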
>
>
> Depending on the load, I'd also use shared_ispell
> (https://github.com/postgrespro/shared_ispell), which saves a lot of memory.
>
Our load is pretty low, so we don't really have a need for that one at this
point. But I'll try to remember it :)
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/