Re: Multiple word synonyms (maybe?)

From: rob stone <floriparob(at)gmail(dot)com>
To: Tim van der Linden <tim(at)shisaa(dot)jp>, pgsql-general(at)postgresql(dot)org
Subject: Re: Multiple word synonyms (maybe?)
Date: 2015-10-20 10:57:59
Message-ID: 1445338679.1853.30.camel@gmail.com
Lists: pgsql-general

On Tue, 2015-10-20 at 19:35 +0900, Tim van der Linden wrote:
> Hi All
>
> I have a question regarding PostgreSQL's full text capabilities and
> (presumably) the synonym dictionary.
>
> I'm currently implementing FTS on a medical-themed setup which uses
> domain-specific jargon to denote a bunch of stuff. One specific request
> I wish to implement here is support for the jargon synonyms that are
> heavily used.
>
> Of course, I can simply go ahead and create my own synonym dictionary
> with a jargon-specific synonym file to feed it. However, most of the
> synonyms consist of more than a single word.
>
> The term "heart attack" for example has the following "synonyms":
>
> - Acute MI
> - MI
> - Myocardial infarction
>
> As far as I understand it, the tokenizer within PostgreSQL's FTS engine
> splits words on spaces to generate tokens which are then proposed to
> each dictionary. I think it is therefore impossible to have
> "multi-word synonyms" in this sense as multiple words cannot reach the
> dictionary. The term "heart attack" would be presented as the tokens
> "heart" and "attack".
>
> From a technical standpoint I understand FTS is about looking at
> individual words and lexemizing them ... yet from a natural language
> lookup perspective you still wish to tie "Heart attack" to "Acute MI"
> so when a client searches on one, the other will turn up as well.
>
> Should I write my own tokenizer to catch all these words and present
> them as a single token? Or is this completely outside the realm of
> FTS (or FTS within PostgreSQL)?
>
> Cheers,
> Tim
>
>

Looking at this from an entirely different perspective, why are you not
using ICD codes to identify patient events?
It is a one-to-many relationship between a patient and their events,
each identified by the relevant ICD code and date.
Given that MI has several applicable ICD codes, you can use a SELECT
along the lines of:
WHERE icd_code IN (  . . . )
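
A minimal sketch of that layout, assuming a hypothetical `patient` table and `patient_event` table (the table, column, and function names are made up for illustration and are not from the original setup). It uses Python's built-in sqlite3 only so that it runs standalone; the same query shape carries over to PostgreSQL. The ICD-10 codes in `MI_CODES` are illustrative examples from the acute myocardial infarction range, not a complete or authoritative list.

```python
import sqlite3

# Hypothetical schema: one patient has many dated events, each tagged with an ICD code.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (
        patient_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE patient_event (
        event_id   INTEGER PRIMARY KEY,
        patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
        icd_code   TEXT NOT NULL,
        event_date TEXT NOT NULL
    );
""")

# Made-up sample rows, purely for demonstration.
conn.executemany(
    "INSERT INTO patient (patient_id, name) VALUES (?, ?)",
    [(1, "Patient A"), (2, "Patient B")],
)
conn.executemany(
    "INSERT INTO patient_event (event_id, patient_id, icd_code, event_date) "
    "VALUES (?, ?, ?, ?)",
    [
        (1, 1, "I21.0", "2015-01-10"),
        (2, 1, "J18.9", "2015-03-02"),
        (3, 2, "I21.9", "2015-02-20"),
    ],
)

# Illustrative subset of codes that would count as "MI"; a real list would be longer.
MI_CODES = ["I21.0", "I21.9"]


def find_events(conn, codes):
    """Return (name, icd_code, event_date) rows whose code is in `codes`."""
    # One bound parameter per code, so the values are never spliced into the SQL.
    placeholders = ", ".join("?" for _ in codes)
    sql = f"""
        SELECT p.name, e.icd_code, e.event_date
        FROM patient_event AS e
        JOIN patient AS p USING (patient_id)
        WHERE e.icd_code IN ({placeholders})
        ORDER BY e.event_date
    """
    return conn.execute(sql, codes).fetchall()


for row in find_events(conn, MI_CODES):
    print(row)
```

With this approach the "synonym" problem moves out of the text search: "heart attack", "Acute MI" and "Myocardial infarction" would all be recorded under the same set of codes, and the search matches on the codes rather than on the wording.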

I know it doesn't answer your question!

Cheers,
Rob
