From: | Alastair McKinley <a(dot)mckinley(at)analyticsengines(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>, "teodor(at)sigaev(dot)ru" <teodor(at)sigaev(dot)ru> |
Subject: | Re: websearch_to_tsquery() and apostrophe inside double quotes |
Date: | 2019-10-10 14:10:04 |
Message-ID: | DB6PR0202MB2904B23F822E4157B4105649E3940@DB6PR0202MB2904.eurprd02.prod.outlook.com |
Lists: | pgsql-general |
Hi Tom,
Thank you for looking at this. You are right; I couldn't find anything in the docs that would explain this.
I can't think of any rationale for producing a query like this so it does look like a bug.
Best regards,
Alastair
________________________________
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Sent: 10 October 2019 14:35
To: Alastair McKinley <a(dot)mckinley(at)analyticsengines(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org <pgsql-general(at)lists(dot)postgresql(dot)org>; teodor(at)sigaev(dot)ru <teodor(at)sigaev(dot)ru>
Subject: Re: websearch_to_tsquery() and apostrophe inside double quotes
Alastair McKinley <a(dot)mckinley(at)analyticsengines(dot)com> writes:
> I am a little confused about what is being generated by websearch_to_tsquery() in the case of an apostrophe inside double quotes.
> ...
> select websearch_to_tsquery('"peter o''toole"');
> websearch_to_tsquery
> ------------------------------
> 'peter' <-> ( 'o' & 'tool' )
> (1 row)
> I am not quite sure what text this will actually match?
I believe it's impossible for that to match anything :-(.
It would require 'o' and 'tool' to match the same lexeme
(one immediately after a 'peter'), which of course is impossible.
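To see this concretely, the same query can be built by hand with to_tsquery() and tested against the document text with the @@ match operator. This is a minimal sketch that assumes the built-in 'english' configuration (the one that stems 'toole' to 'tool'). Because the & sits inside a phrase, both 'o' and 'tool' would have to occupy the single position right after 'peter', so the match is expected to come back false.

```sql
-- Assumes the built-in 'english' text search configuration.
-- Hand-built equivalent of the query websearch_to_tsquery() produced above.
select to_tsvector('english', 'peter o''toole')
       @@ to_tsquery('english', 'peter <-> (o & tool)') as matches;
-- 'o' is at position 2 and 'tool' at position 3, so no single position
-- after 'peter' carries both lexemes: this returns false.
```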
The underlying tsvector type seems to treat the apostrophe the
same as whitespace; it separates 'o' and 'toole' into
distinct words:
# select to_tsvector('peter o''toole');
to_tsvector
--------------------------
'o':2 'peter':1 'tool':3
(1 row)
So it seems to me that this is a bug: websearch_to_tsquery
should also treat "'" like whitespace. There's certainly
not anything in its documentation that suggests it should
treat "'" specially. If it didn't, you'd get
# select websearch_to_tsquery('"peter o toole"');
websearch_to_tsquery
----------------------------
'peter' <-> 'o' <-> 'tool'
(1 row)
which would match this tsvector.
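Until websearch_to_tsquery() itself treats "'" like whitespace, one possible stopgap on the caller's side is to make that substitution before the text reaches the function. This is a sketch under stated assumptions, not a documented recommendation: it again assumes the 'english' configuration, and the blanket replace() also rewrites apostrophes that appear outside double quotes, which may or may not be acceptable for a given application.

```sql
-- Hypothetical caller-side workaround: turn every apostrophe into a space
-- before parsing, so '"peter o''toole"' is handled like '"peter o toole"'.
-- In SQL, '''' is a string literal containing a single apostrophe.
select websearch_to_tsquery('english', replace('"peter o''toole"', '''', ' '));

-- Checking the rewritten query against the tsvector of the original text.
select to_tsvector('english', 'peter o''toole')
       @@ websearch_to_tsquery('english', replace('"peter o''toole"', '''', ' ')) as matches;
```

Since both sides use the same configuration, the resulting phrase 'peter' <-> 'o' <-> 'tool' lines up with positions 1, 2 and 3 of the tsvector, so the second query should report a match.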
regards, tom lane