From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: tsearch2 dictionary for statute cites |
Date: | 2009-03-11 00:24:44 |
Message-ID: | 17558.1236731084@sss.pgh.pa.us |
Lists: | pgsql-general |
"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> People are likely to search for statute cites, which tend to have a
> hierarchical form. I'm not sure the prefix approach will work for
> this. For example, there is a section 939.64 in the state statutes
> dealing with commission of a crime while wearing a bulletproof
> garment. If someone searches for that, they should find subsections
> like 939.64(1) or 939.64(2) but not different sections which start
> with the same characters like 939.641 (the section on concealing
> identity) or 939.645 (the section on hate crimes). A search for
> chapter 939 should return any of the above.
I think what you need is a custom parser that treats these similarly to
hyphenated words. If I pretend that the dot is a hyphen I get matching
behavior that seems to meet all those requirements.

Unfortunately we don't seem to have any really easy way to plug in a
custom parser, other than copy-paste-modify the existing one, which would
be a PITA from a maintenance standpoint. Perhaps you could pass the
texts and the queries through a regexp substitution that converts
digit-dot-digit to digit-dash-digit?
regards, tom lane
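
For illustration, the digit-dot-digit to digit-dash-digit substitution suggested above might look roughly like this. It is only a sketch in Python: the helper name `normalize_cites` is hypothetical, and it assumes the rewrite happens before the text reaches the text-search parser. A dot is replaced only when it has a digit on both sides, so a sentence-ending period is left alone.

```python
import re

# A dot with a digit immediately before and after it. Lookarounds keep the
# digits out of the match, so a run like "1.2.3" is handled in a single pass.
_DIGIT_DOT_DIGIT = re.compile(r"(?<=\d)\.(?=\d)")


def normalize_cites(text: str) -> str:
    """Hypothetical helper: rewrite digit-dot-digit as digit-dash-digit."""
    return _DIGIT_DOT_DIGIT.sub("-", text)


if __name__ == "__main__":
    for sample in ("939.64(1)", "939.641", "See s. 939.64."):
        print(sample, "->", normalize_cites(sample))
```

The same function would have to be applied to both the stored texts and the search queries, otherwise the two sides would no longer match. Note that this pattern also rewrites ordinary decimal numbers such as 3.14, which may or may not be acceptable for a given corpus.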