Re: Overcoming Initcap Function limitations?

From: Steve Midgley <science(at)misuse(dot)org>
To: Greg Sabino Mullane <htamfids(at)gmail(dot)com>
Cc: Bo Guo <bo(dot)guo(at)gisticinc(dot)com>, pgsql-sql(at)lists(dot)postgresql(dot)org
Subject: Re: Overcoming Initcap Function limitations?
Date: 2023-12-04 18:39:12
Message-ID: CAJexoSK3grH+4khLYMDH=dBwbZxx=F-5GUM1rzHh5fzT_GGuwg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On Mon, Dec 4, 2023 at 10:09 AM Greg Sabino Mullane <htamfids(at)gmail(dot)com>
wrote:

> It's not clear exactly what you are trying to achieve, but you can use
> Postgres' built-in text searching system to exclude stopwords. For example:
>
> CREATE FUNCTION initcap_realword(myword TEXT)
> returns TEXT language SQL AS
> $$
> SELECT CASE WHEN length(to_tsvector(myword)) < 1
> THEN myword ELSE initcap(myword) END;
> $$;
>
> You could extend that to multi-word strings with a little effort. However,
> knowing that macdonald should be MacDonald requires a lot more intelligence
> than is provided by any Postgres built-in system or extension that I know
> of. What you are looking at is the field of science known as Natural
> Language Processing, which can get very complex very quickly. But for a
> Postgres answer, you might combine plpython3u with spacy (
> https://spacy.io/usage/spacy-101)
>
> Cheers,
> Greg
>
> I've been having some pretty good experiences with "hard" text
transformations such as correct capitalization of names like MacDonald
using GPT 3.5 Turbo API which is pretty cheap for the volume of data I've
been working with.. Seems like Spacy might do similar things, and if it can
be run locally, might be much cheaper than a rental API..

Steve

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Bo Guo 2023-12-05 01:39:18 Re: Overcoming Initcap Function limitations?
Previous Message Greg Sabino Mullane 2023-12-04 18:08:03 Re: Overcoming Initcap Function limitations?