Re: Out of the box, full text search feature suggestion for postgresql

From: aa <ghevge(at)gmail(dot)com>
To: Artur Zakirov <zaartur(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Out of the box, full text search feature suggestion for postgresql
Date: 2024-01-02 17:56:42
Message-ID: CA+hGcwLQodzWhvr+7Y6c-gwbvh1tPrLnOs_xVb7at4ZeZ5xQqQ@mail.gmail.com
Lists: pgsql-bugs

The PGroonga project seems to have solved that problem, as it supports any
language out of the box.
As for the "pgsql-hackers" you are looking for, I would say Kou (the main
developer of PGroonga) would be the right candidate.

I guess it is just a matter of you guys convincing him to join forces.

IMO, integrating PGroonga's logic into Postgres would be a huge benefit for the
whole Postgres community with regard to full text search functionality.
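
To illustrate why a tokenizer that does not depend on spaces helps here, below
is a minimal Python sketch. It is not PGroonga's code: the function names are
made up for illustration, and it only mimics the general idea of overlapping
two-character (bigram) tokens, which, as far as I understand, is the kind of
tokenizer Groonga uses by default. It reuses the Japanese sentence from Artur's
example quoted below.

```python
def whitespace_tokens(text: str) -> list[str]:
    """Split on whitespace only, roughly what happens to text without spaces."""
    return text.split()


def bigram_tokens(text: str) -> list[str]:
    """Emit overlapping two-character tokens for each whitespace-separated chunk.

    Hypothetical helper for illustration; a real tokenizer also handles
    normalization, punctuation, and token positions.
    """
    tokens: list[str] = []
    for chunk in text.split():
        if len(chunk) < 2:
            tokens.append(chunk)
        else:
            tokens.extend(chunk[i:i + 2] for i in range(len(chunk) - 1))
    return tokens


def bigram_match(query: str, document: str) -> bool:
    """True if every bigram of the query occurs somewhere in the document.

    This ignores token order and position, so it is looser than a real
    phrase search.
    """
    doc_tokens = set(bigram_tokens(document))
    return all(token in doc_tokens for token in bigram_tokens(query))


if __name__ == "__main__":
    sentence = "これはペンです"
    print(whitespace_tokens(sentence))            # the whole sentence is one token
    print(bigram_tokens(sentence))                # six overlapping bigrams
    print("ペン" in whitespace_tokens(sentence))   # whole-token lookup
    print(bigram_match("ペン", sentence))          # bigram lookup
```

With whitespace splitting the word "ペン" cannot be found as a token, because
the sentence stays in one piece; with bigrams it can.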

On Tue, Jan 2, 2024 at 12:21 PM Artur Zakirov <zaartur(at)gmail(dot)com> wrote:

> On Thu, 28 Dec 2023 at 17:46, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> >
> > On Thu, Dec 28, 2023 at 10:15:07AM -0500, aa wrote:
> > > Hello Postgres Team!
> > >
> > > First of all, a big THANK YOU for the great work you folks are doing!
> > >
> > > The reason I am writing to you is to suggest a feature in future Postgres
> > > versions, a feature that is partially there but is not quite where it should be
> > > in my opinion: the full text search functionality. This functionality in my
> > > opinion, should be available out of the box, for any possible language
> > > available, including east Asia character based languages. You would probably
> > > say that this will require a huge amount of work, and I would say, a postgres
> > > extension which does exactly this, already exists, and it is called : pgroonga
> > > (https://pgroonga.github.io/)
> >
> > Please explain how this is different from what we already have:
> >
> > https://www.postgresql.org/docs/current/textsearch.html
>
> I'm not familiar with pgroonga, but the main issue with built-in text
> search is that it cannot tokenize Asian and many other languages
> properly.
>
> Here the default parser cannot tokenize Japanese text:
>
> =# select * from ts_parse('default', 'これはペンです');
>  tokid |     token
> -------+----------------
>      2 | これはペンです
>
> Unlike Latin:
>
> =# select * from ts_parse('default', 'this is a pen');
>  tokid | token
> -------+-------
>      1 | this
>     12 |
>      1 | is
>     12 |
>      1 | a
>     12 |
>      1 | pen
>
> To add support for Japanese (and other languages) it is necessary to
> write a new parser or fix the existing default parser.
>
> On the other hand pgroonga's source code looks complex, and I doubt
> that there are pgsql-hackers who know it and target languages well and
> who will be able to port it to Postgres core.
>
> --
> Artur
>
