From: | Martin Dubé <martin(dot)dube(at)gmail(dot)com> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Email parsing in Text Search |
Date: | 2016-09-07 17:51:31 |
Message-ID: | CAGny-cMH0s4Q-Ob=Ebn+-yDchLMVEm8bZ9PBP88vEvppsh5BPw@mail.gmail.com |
Lists: | pgsql-bugs |
Hi,
I'm seeing odd behavior from the email parser and wonder whether it is a bug
or a feature.
When I use the default regconfig to parse an email address whose first part
consists only of digits, it is not parsed as an email address.
db=# select * from ts_debug('pg_catalog.english', '000000001@asdf.com');
 alias |   description    |   token   | dictionaries | dictionary |   lexemes
-------+------------------+-----------+--------------+------------+-------------
 uint  | Unsigned integer | 000000001 | {simple}     | simple     | {000000001}
 blank | Space symbols    | @         | {}           |            |
 host  | Host             | asdf.com  | {simple}     | simple     | {asdf.com}
(3 rows)
However, if I add a letter, it is parsed as an email.
db=# select * from ts_debug('pg_catalog.english', '000000001a@asdf.com');
 alias |  description  |        token        | dictionaries | dictionary |        lexemes
-------+---------------+---------------------+--------------+------------+-----------------------
 email | Email address | 000000001a@asdf.com | {simple}     | simple     | {000000001a@asdf.com}
(1 row)
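For a side-by-side comparison, the two cases can be put into a single query. This is only a sketch: the VALUES list and the names `t` and `addr` are made up for illustration, and it assumes a server that supports LATERAL (9.3 or later).

```sql
-- Run ts_debug over both test addresses and show how each token is classified.
-- "t" and "addr" are illustrative names, not part of any PostgreSQL API.
SELECT t.addr, d.alias, d.token
FROM (VALUES ('000000001@asdf.com'),
             ('000000001a@asdf.com')) AS t(addr),
     LATERAL ts_debug('pg_catalog.english', t.addr) AS d;
```

Each result row pairs the input address with one token and its alias, so the difference between the digits-only case and the case with a letter shows up in one result set.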
According to the RFC and several forums, an email address whose first part
contains only digits is valid.
Is this the expected behavior?
I ran the test on OpenBSD 5.9 with PostgreSQL 9.4.6.
Thanks,
--
Mart