From: | valgog(at)gmail(dot)com |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | BUG #6375: tsearch does not recognize all valid emails |
Date: | 2012-01-03 18:04:23 |
Message-ID: | E1Ri8il-0008Ct-9p@wrigleys.postgresql.org |
Lists: | pgsql-bugs |
The following bug has been logged on the website:
Bug reference: 6375
Logged by: Valentine Gogichashvili
Email address: valgog(at)gmail(dot)com
PostgreSQL version: 9.1.1
Operating system: Debian 4.4.5-8
Description:
Hello,
The default tsearch parser does not recognize all valid email addresses. It
tokenizes some of them as plain text, splitting them into several tokens.
For example:
postgres=# select to_tsquery('simple', 'normal@email.com');
to_tsquery
────────────────────
'normal@email.com'
(1 row)
Here it behaves correctly.
postgres=# select to_tsquery('simple', '-still-normal@email.com');
to_tsquery
──────────────────────────
'still-normal@email.com'
(1 row)
Here it trims the leading '-' from the email address. This is not correct,
but the query will at least still find that email.
postgres=# select to_tsquery('simple', '-not-normal-with-dash-@email.com');
to_tsquery
───────────────────────────────────────────────────────────────────────────────
'not-normal-with-dash' & 'not' & 'normal' & 'with' & 'dash' & 'email.com'
(1 row)
This is a real problem, because the resulting query also matches email
addresses that are not the same address but are "super-sets" of it.
Other valid email characters that are not handled correctly include at
least '+' and '.'.
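To see where the parser splits an address, the built-in ts_debug, ts_parse
and ts_token_type functions can be used to list the token type assigned to
each piece of the input. The following is only a sketch: it reuses the
dash-terminated address from this report, and no output is shown because it
was not captured here. An address recognized as a whole would be expected to
come back as a single row of type 'email'.

```sql
-- How the 'simple' configuration classifies each piece of the address,
-- and which lexemes it produces for it.
SELECT alias, token, lexemes
FROM ts_debug('simple', '-not-normal-with-dash-@email.com');

-- The same check at the parser level: raw tokens from the default parser,
-- joined with the parser's token type names.
SELECT t.alias, p.token
FROM ts_parse('default', '-not-normal-with-dash-@email.com') AS p
JOIN ts_token_type('default') AS t USING (tokid);
```

Substituting addresses that contain '+' or '.' in the local part allows the
other cases mentioned above to be checked in the same way.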
With my best regards,
-- Valentine Gogichashvili