Re: Has there been any discussion of custom dictionaries being defined in the database?

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Morris de Oryx <morrisdeoryx(at)gmail(dot)com>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Has there been any discussion of custom dictionaries being defined in the database?
Date: 2019-10-19 13:08:26
Message-ID: 20191019130826.usuxx5k7rhwmmnr5@development
Lists: pgsql-general

On Thu, Oct 17, 2019 at 11:52:39AM +0200, Tom Lane wrote:
>Morris de Oryx <morrisdeoryx(at)gmail(dot)com> writes:
>> Given that Amazon is bragging this week about turning off Oracle, it seems
>> like they could kick some resources towards contributing something to the
>> Postgres project. With that in mind, is the idea of defining dictionaries
>> within a table somehow meritless, or unexpectedly difficult?
>
>Well, it'd just be totally different. I don't think anybody cares to
>provide two separate definitions of common dictionaries (which'd have to
>somehow be kept in sync).
>
>As for why we did it with external text files in the first place ---
>for at least some of the dictionary types, the point is that you can
>drop in data files that are available from upstream sources, without any
>modification. Getting the same info into a table would require some
>nonzero amount of data transformation.
>

IMHO being able to load dictionaries from a table would be quite
useful, and not just because of RDS. For example, it's not entirely true
we're just using the upstream dictionaries verbatim - it's quite common
to add new words, particularly in specialized fields. That's way easier
when you can do that through a table and not through a file.

>Having said that ... in the end a dictionary is really just a set of
>functions implementing the dictionary API; where they get their data
>from is their business. So in theory you could roll your own
>dictionary that gets its data out of a table. But the dictionary API
>would be pretty hard to implement except in C, and I bet RDS doesn't
>let you install your own C functions either :-(
>

Not sure. Of course, if we expect the dictionary to work just like the
ispell one, with preprocessing the dictionary into shmem, then that
requires C. I don't think that's entirely necessary, though - we could
use the table directly. Yes, that would be slower, but maybe it'd be
sufficient.
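
To illustrate what "use the table directly" might look like, here is a
rough sketch in Python. It is not PostgreSQL code: sqlite3 stands in for
the database only so the example is self-contained, and the table name
custom_dict, its columns word and lexeme, and the helper make_dictionary
are all hypothetical. The one thing it borrows from the real dictionary
API is the lexize convention of returning NULL (None here) for a token
the dictionary does not recognize.

```python
import sqlite3


def make_dictionary(conn):
    """Return a lexize-style function backed directly by a table.

    Hypothetical sketch: custom_dict(word, lexeme) is an assumed layout,
    not something PostgreSQL defines.
    """
    def lexize(token):
        # Query the table on every call; nothing is preprocessed or cached,
        # which is simpler but slower than the ispell approach.
        rows = conn.execute(
            "SELECT lexeme FROM custom_dict WHERE word = ? ORDER BY lexeme",
            (token.lower(),),
        ).fetchall()
        # None means "not recognized", so a later dictionary could try it.
        return [r[0] for r in rows] if rows else None

    return lexize


# sqlite3 is only a stand-in so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE custom_dict ("
    "word TEXT NOT NULL, lexeme TEXT NOT NULL, PRIMARY KEY (word, lexeme))"
)
conn.executemany(
    "INSERT INTO custom_dict VALUES (?, ?)",
    [("stents", "stent"), ("angioplasties", "angioplasty")],
)

lexize = make_dictionary(conn)

# Adding a specialized term is a plain INSERT; no file to edit or reload.
conn.execute(
    "INSERT INTO custom_dict VALUES (?, ?)", ("tachycardias", "tachycardia")
)
```

Because each lookup is a query, a word added by INSERT is visible to the
next call, which is the convenience argued for above; the cost is one
index lookup per token instead of an in-memory structure.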

But I think the idea is ultimately that we'd implement a new dict type
in core, and people would just specify which table to load data from.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
