Email parsing in Text Search

From: Martin Dubé <martin(dot)dube(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Email parsing in Text Search
Date: 2016-09-07 17:51:31
Message-ID: CAGny-cMH0s4Q-Ob=Ebn+-yDchLMVEm8bZ9PBP88vEvppsh5BPw@mail.gmail.com
Lists: pgsql-bugs

Hi,

I'm seeing odd behavior with the email parser and wonder whether it is a bug
or a feature.

When I use the default regconfig to parse an email address whose local part
consists only of digits, it is not parsed as an email address.

db=# select * from ts_debug('pg_catalog.english', '000000001@asdf.com');
 alias |   description    |   token   | dictionaries | dictionary |   lexemes
-------+------------------+-----------+--------------+------------+-------------
 uint  | Unsigned integer | 000000001 | {simple}     | simple     | {000000001}
 blank | Space symbols    | @         | {}           |            |
 host  | Host             | asdf.com  | {simple}     | simple     | {asdf.com}
(3 rows)

However, if I add a letter to the local part, it is parsed as an email address.

db=# select * from ts_debug('pg_catalog.english', '000000001a@asdf.com');
 alias |  description  |        token        | dictionaries | dictionary |        lexemes
-------+---------------+---------------------+--------------+------------+-----------------------
 email | Email address | 000000001a@asdf.com | {simple}     | simple     | {000000001a@asdf.com}
(1 row)
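
For a more compact side-by-side comparison, the two inputs can also be fed straight to the parser. This is only a sketch: it assumes the built-in 'default' parser (the one pg_catalog.english uses) and a server that supports LATERAL (9.3 or later). The aliases d, p and tt are just illustrative names.

```sql
-- Tokenize both addresses with the default text search parser and
-- label each token with its type alias (uint, blank, host, email, ...).
-- ts_parse() returns (tokid, token); ts_token_type() maps tokid to alias.
SELECT d.doc, tt.alias, p.token
FROM (VALUES ('000000001@asdf.com'),
             ('000000001a@asdf.com')) AS d(doc)
CROSS JOIN LATERAL ts_parse('default', d.doc) AS p
JOIN ts_token_type('default') AS tt ON tt.tokid = p.tokid;
```

If the parser is responsible for the difference, the token types here should match the alias column in the two ts_debug outputs above.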

According to the RFC and several forums, an email address whose local part
contains only digits is valid.

Is this the expected behavior?

I ran the test on OpenBSD 5.9 with PostgreSQL 9.4.6.

Thanks,

--
Mart
