From: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: tsearch2 dictionary for statute cites |
Date: | 2009-03-11 06:58:46 |
Message-ID: | Pine.LNX.4.64.0903110952430.31919@sn.sai.msu.ru |
Lists: | pgsql-general |
On Tue, 10 Mar 2009, Tom Lane wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
>> People are likely to search for statute cites, which tend to have a
>> hierarchical form. I'm not sure the prefix approach will work for
>> this. For example, there is a section 939.64 in the state statutes
>> dealing with commission of a crime while wearing a bulletproof
>> garment. If someone searches for that, they should find subsections
>> like 939.64(1) or 939.64(2) but not different sections which start
>> with the same characters like 939.641 (the section on concealing
>> identity) or 939.645 (the section on hate crimes). A search for
>> chapter 939 should return any of the above.
>
> I think what you need is a custom parser that treats these similarly to
> hyphenated words. If I pretend that the dot is a hyphen I get matching
> behavior that seems to meet all those requirements.
>
> Unfortunately we don't seem to have any really easy way to plug in a
> custom parser, other than copy-paste-modify the existing one which would
> be a PITA from a maintenance standpoint. Perhaps you could pass the
> texts and the queries through a regexp substitution that converts
> digit-dot-digit to digit-dash-digit?
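Tom's suggested workaround rewrites digit-dot-digit as digit-dash-digit in both the indexed text and the queries, before either reaches the parser. Below is a minimal sketch of only that substitution step, in Python. The helper name `dots_to_dashes` is hypothetical. The same rewrite could presumably also be done server-side with PostgreSQL's `regexp_replace`.

```python
import re

# Match a dot only when it sits between two digits (lookbehind/lookahead),
# so ordinary punctuation such as "s." or a sentence-final "." is left alone.
_DIGIT_DOT_DIGIT = re.compile(r"(?<=\d)\.(?=\d)")


def dots_to_dashes(text: str) -> str:
    """Hypothetical helper: rewrite digit-dot-digit as digit-dash-digit.

    Apply it to both the text being indexed and the search string,
    so that both are tokenized the same way.
    """
    return _DIGIT_DOT_DIGIT.sub("-", text)


if __name__ == "__main__":
    for cite in ["939.64", "939.64(1)", "939.641", "see s. 939.645."]:
        print(cite, "->", dots_to_dashes(cite))
```

Because the pattern requires a digit on both sides, "see s. 939.645." becomes "see s. 939-645.": the abbreviation dot and the trailing period are untouched.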
Perhaps for 8.4 it's better to use prefix search; for example,
to_tsquery('939.645:*') will find what Kevin needs. The problem is with the
parser, so I'd preprocess the text before indexing to convert every
digit.digit(digit) cite into digit.digit.digit, which the parser recognizes as
a single token of type 'version'. Here is just an illustration:
qq=# select * from ts_parse('default',translate('939.64(1)','()','. '));
tokid | token
-------+----------
8 | 939.64.1
12 |
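Below is a sketch of the preprocessing described above, in Python. It assumes only numeric subsections in parentheses. `normalize_cites` is a hypothetical helper that rewrites 939.64(1) as 939.64.1. Its effect is similar to the `translate()` call above, which maps '(' to '.' and ')' to a space. `cite_matches` is also hypothetical. It illustrates the matching rule from Kevin's message by comparing dot-separated components rather than raw characters. It is not a model of how the text search parser or any dictionary actually matches tokens.

```python
import re

# A numeric subsection in parentheses directly after a dotted section
# number, e.g. "939.64(1)". Lettered subsections like "(a)" are not handled.
_SUBSECTION = re.compile(r"(\d+(?:\.\d+)+)\((\d+)\)")


def normalize_cites(text: str) -> str:
    """Hypothetical helper: turn "939.64(1)" into "939.64.1"."""
    return _SUBSECTION.sub(r"\1.\2", text)


def cite_matches(query: str, cite: str) -> bool:
    """Hypothetical matching rule: component-wise prefix comparison.

    "939.64" matches "939.64" and "939.64.1" but not "939.641",
    because "64" and "641" are different components even though
    one is a string prefix of the other.
    """
    q_parts = query.split(".")
    c_parts = cite.split(".")
    return c_parts[: len(q_parts)] == q_parts


if __name__ == "__main__":
    raw = ["939.64(1)", "939.64(2)", "939.641", "939.645"]
    cites = [normalize_cites(c) for c in raw]
    print(cites)
    for query in ["939.64", "939"]:
        print(query, [c for c in cites if cite_matches(query, c)])
```

With these four cites, a query for 939.64 keeps 939.64.1 and 939.64.2 and drops 939.641 and 939.645. A query for chapter 939 keeps all four.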
BTW, once the cite is parsed as a 'version' token, it's possible to use dict_regex with 8.3.
>
> regards, tom lane
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83