Re: Searching for pgweb

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Oleg Bartunov <obartunov(at)gmail(dot)com>
Cc: PostgreSQL WWW <pgsql-www(at)postgresql(dot)org>
Subject: Re: Searching for pgweb
Date: 2017-04-02 13:37:02
Message-ID: CABUevEzPfCtbH1Qg9nDQNkwgzw2vUqg7yQgCEgygpRy4f45_HQ@mail.gmail.com
Lists: pgsql-www

On Fri, Mar 31, 2017 at 2:46 PM, Oleg Bartunov <obartunov(at)gmail(dot)com> wrote:

>
>
> On Fri, Mar 31, 2017 at 8:04 AM, Magnus Hagander <magnus(at)hagander(dot)net>
> wrote:
>
>> On Wed, Mar 29, 2017 at 3:55 PM, Oleg Bartunov <obartunov(at)gmail(dot)com>
>> wrote:
>>
>>>
>>>
>>> On 29 Mar 2017 09:49, "Magnus Hagander" <magnus(at)hagander(dot)net> wrote:
>>>
>>>
>>>
>>> On Fri, Mar 24, 2017 at 8:56 AM, Oleg Bartunov <obartunov(at)gmail(dot)com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Wed, Mar 22, 2017 at 7:51 PM, Magnus Hagander <magnus(at)hagander(dot)net>
>>>> wrote:
>>>>
>>>>> Right now our main website search uses plainto_tsquery() to generate
>>>>> the searches.
>>>>>
>>>>> Should we consider switching that to phraseto_tsquery() now that we
>>>>> have phrase searching?
>>>>>
>>>>
>>>> +1
>>>>
>>>> Also, I suggest using the new parser, which handles _ and - better, for
>>>> example:
>>>>
>>>> 1.
>>>> select ts_parse('tsparser', 'btree_gin');
>>>> ts_parse
>>>> ----------------
>>>> (16,btree_gin)
>>>> (11,btree)
>>>> (12,_)
>>>> (11,gin)
>>>> (4 rows)
>>>>
>>>> select ts_parse('default', 'btree_gin');
>>>> ts_parse
>>>> -----------
>>>> (1,btree)
>>>> (12,_)
>>>> (1,gin)
>>>> (3 rows)
>>>>
>>>> Default parser produces too much noise, just check the difference:
>>>>
>>>> https://postgrespro.ru/search/?area=version&q=btree_gin&product=postgresql&version=9.6
>>>>
>>>> https://www.postgresql.org/search/?u=%2Fdocs%2F9.6%2F&q=btree_gin
>>>>
>>>>
>>>> 2.
>>>> select ts_parse('tsparser', 'utc-5');
>>>> ts_parse
>>>> ------------
>>>> (15,utc-5)
>>>> (11,utc)
>>>> (12,-)
>>>> (9,5)
>>>> (4 rows)
>>>>
>>>> select ts_parse('default', 'utc-5');
>>>> ts_parse
>>>> ----------
>>>> (1,utc)
>>>> (21,-5)
>>>> (2 rows)
>>>>
>>>> again, compare
>>>>
>>>> https://postgrespro.ru/search/?area=version&q=utc-5&product=postgresql&version=9.6
>>>>
>>>> https://www.postgresql.org/search/?u=%2Fdocs%2F9.6%2F&q=utc-5
>>>>
>>>>
>>>> We also have better parsing of email, but I'm not sure we need it on the
>>>> postgres site.
>>>>
>>>> We'll publish soon on github, let me know if you know it.
>>>>
>>>>
>>> That sounds interesting. Two questions:
>>>
>>> 1. Do you have plans for contributing this one for upstream postgres, or
>>> is it intended to be run separately?
>>>
>>>
>>> We would love to do this, but for now it lives here:
>>> https://github.com/postgrespro/pg_tsparser
>>>
>>
>>
>> Right, found that one. But if your long term plan is to contribute it
>> upstream, that makes it easier to rely on :)
>>
>
> I'd love it if you tested it and gave us feedback on what to improve and
> what to fix. Then we could try to convince the community to accept it.
>
>
I've applied this one for testing on the main website search.

At the same time I realized we weren't calling setweight() on the title of
regular web pages, so I fixed that too (setting the title to weight A).
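
For anyone unfamiliar with weighting, here is a minimal sketch of the idea.
The page data and the names page, title, body and fti are made up for
illustration and are not the actual pgweb schema. It uses the stock english
configuration so that it runs on an unmodified install:

```sql
-- Hypothetical page data; the real pgweb table and column names may differ.
WITH page(title, body) AS (
    VALUES ('btree_gin'::text,
            'btree_gin provides sample GIN operator classes'::text)
)
SELECT ts_rank(fti, q) AS rank
FROM (
    -- Title lexemes get weight A, body lexemes get the default weight D,
    -- so a match in the title ranks higher than a match in the body only.
    SELECT setweight(to_tsvector('english', title), 'A') ||
           setweight(to_tsvector('english', body), 'D') AS fti
    FROM page
) AS doc,
     plainto_tsquery('english', 'btree_gin') AS q
WHERE fti @@ q;
```

ts_rank() takes the weights into account by default, which is why leaving the
title unweighted made title matches count no more than body matches.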

Basically the conf is:

CREATE TEXT SEARCH DICTIONARY english_ispell (
    TEMPLATE = pg_catalog.ispell,
    dictfile = 'en_us', afffile = 'en_us', stopwords = 'english' );

CREATE TEXT SEARCH DICTIONARY pg_dict (
    TEMPLATE = pg_catalog.synonym,
    synonyms = 'pg_dict' );

CREATE TEXT SEARCH CONFIGURATION pg (
    PARSER = tsparser );

ALTER TEXT SEARCH CONFIGURATION pg
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH pg_dict, english_ispell, english_stem;

ALTER TEXT SEARCH CONFIGURATION pg
    ALTER MAPPING FOR email, file, float, host, hword_numpart, int,
                      numhword, numword, sfloat, uint, url, url_path, version
    WITH simple;
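
A rough way to check what this configuration does with the examples from
earlier in the thread, and to compare plainto_tsquery() with
phraseto_tsquery(). This is only a sketch: it assumes the pg_tsparser
extension is installed, that the dictionary and synonym files referenced
above are present, and that the pg configuration has been created as shown.
The search strings are just samples:

```sql
-- Assumes pg_tsparser is installed and the "pg" configuration above exists.

-- Compare the AND-style query with the phrase query for the same input.
SELECT plainto_tsquery('pg', 'btree_gin index')  AS plain_query,
       phraseto_tsquery('pg', 'btree_gin index') AS phrase_query;

-- Show which token types the parser emits and which dictionary handles each.
SELECT alias, token, dictionaries, lexemes
FROM ts_debug('pg', 'btree_gin utc-5');
```

phraseto_tsquery() requires PostgreSQL 9.6 or later, since that is where
phrase search was introduced.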

If you have any other suggestions of things we should change there, please
let me know!

So far, this is on the main website search and *not* on the archives
search. Let's try it on the main site first, but in the long run both
should use similar configurations.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
