Re: Email data type

From: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
To: Steve Atkins <steve(at)blighty(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Email data type
Date: 2004-05-17 16:21:54
Message-ID: 200405170921.54981.scrawford@pinpointresearch.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Monday 17 May 2004 8:45 am, Steve Atkins wrote:
> Also, a(at)10(dot)11(dot)12(dot)13 is a syntactically valid email address, in the
> .13 TLD. It does not deliver to 10.11.12.13, or anywhere else, as
> of today, unless the MTA or local recursive resolver is broken (a
> common case). a(at)[10(dot)11(dot)12(dot)13] is a whole other thing. As is
> a(at)[::10.11.12.13] and various other IPv6 variants.

Along those lines a(at)foo(dot)bar(dot)com and a(at)[foo(dot)bar(dot)com] are valid but they
don't necessarily refer to the same mailbox (depends on the mx for
foo.bar.com)

> Parsing email addresses is a significant part of my day job, and
> email address validation is a lot harder than it looks at first
> sight.

Yes, indeed.

> Don't forget quoting, whitespace, escaping and nesting
> parenthetical comments....

The "just looking for an @" breaks pretty quickly, say with
"joe@"some.dom or a myriad of other variations.

> In some contexts the empty string is a valid email address. In some
> contexts "Postmaster" is a valid email address.

As are postmaster and pOsTmaSTeR and POSTmaster and they are all the
same address.

> I'm not entirely convinced that an email address is a simple and
> well-defined enough datatype to handle comprehensively within the
> DB. The validation decisions are complex and vary from application
> to application.
>
> (I use two text columns - localpart and domainpart, with an index
> on reverse(lower(domainpart)) and leave validation to the
> application, myself).

Indeed. A problem with the "email address" datatype is that it hinders
normalization:

"Joe User" <joe(at)user(dot)dom> is valid but a database designer would
probably prefer columns for name and email, or if the addresses were
all people, firstname, middlename, lastname, email.

As you mentioned, the email can be broken into localpart and
domainpart but if the app requires it, the domainpart could be
further rendered into toplevel (so you could find all the .gov or
.edu), secondlevel (at least corresponds to a registrant) and
subdomain(s) as necessary.

Cheers,
Steve

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Steve Atkins 2004-05-17 16:22:20 Re: Email data type
Previous Message Dave Page 2004-05-17 16:17:59 Re: Email data type