Re: extracting the domain from an email address

From: Rui DeSousa <rui(at)crazybean(dot)net>
To: Mark Steben <mark(dot)steben(at)drivedominion(dot)com>
Cc: pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: extracting the domain from an email address
Date: 2019-01-16 18:28:08
Message-ID: 8B3FC027-0DC2-4A7B-9511-F6A06824B1BE@crazybean.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

> On Jan 16, 2019, at 8:20 AM, Mark Steben <mark(dot)steben(at)drivedominion(dot)com> wrote:
>
> Good morning,
> I am searching for a better more efficient way to extract the domain portion of an email address. The two I have been using are very expensive.
> The one extract I mostly use: substring(email from '@(.*)$')
> also has an index on the email column which is the full email address:
> btree (email DESC) WHERE email::text = "substring"(email::text, '@(.*)$'::text) AND length(email::text) > 0,
> The other extract:
> substr(e.email,(strpos(e.email, '@') + 1))
> Currently has no index.
>
> The referenced table has 72 million rows. The email column can be empty (ergo the length = 0 check. I am running postgresql 9.4.
>
> Any insights/observations welcome.
>
>

Not sure of the exact problem you are trying to solve; is it just querying the table via domain name? Does the index that you created work?

A few things; the regex given does not get domain name for some valid emails addresses;

i.e. foo(at)bar@nowhere.com is a valid email; however, the regex expression would return bar(at)nowhere(dot)com <mailto:bar(at)nowhere(dot)com> instead of nowhere.com <http://nowhere.com/>.

Assuming the emails are valid:

create or replace function get_domainname(_value text)
returns text
as $$
begin
_value := reverse(_value);

return nullif(reverse(substring(_value, 0, strpos(_value, '@'))), '');
end;
$$ language plpgsql
immutable returns null on null input
;

create index table_name_idx1 on table_name (get_domainname(email));

Then in the where clause:

where get_domainname(email) = 'nowhere.com'

p.s.

If you want to validate that the email address conforms to the RFC 2822 you can use this function which makes use of Perl’s Email::Address module. You would need to install plperl and Email::Address module.

create or replace function is_valid_email_address(eaddr text)
returns boolean
as
$body$
use Email::Address;

return Email::Address->parse($_[0]);
$body$
language 'plperlu' immutable
;

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Achilleas Mantzios 2019-01-16 19:26:48 Re: Backup solution over unreliable network
Previous Message Pepe TD Vo 2019-01-16 18:15:00 how to store data file in Postgres