From: | Paul Lambert <paul(dot)lambert(at)autoledgers(dot)com(dot)au> |
---|---|
To: | sql pgsql <pgsql-sql(at)postgresql(dot)org> |
Subject: | Re: Extracting hostname from URI column |
Date: | 2007-09-16 22:54:53 |
Message-ID: | 46EDB43D.9060405@autoledgers.com.au |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
Paul Lambert wrote:
> chester c young wrote:
>>> I'm trying to use substr() and position() functions to extract the
>>> full host name (and later a domain) from a column that holds URLs.
>>
>> substring( href from '.*://\([^/]*)' );
>>
>
> Ok, your solution looks better than mine... but I have no idea how to
> interpret that, time to consult some manuals.
>
OK - following on from this, I'm doing a similar thing to the OP to
analyze my proxy's traffic (never occured to me to do it in a db until
that post)
I've done the above regex to pull out the domain part of the URL and am
left with results such as:
"acvs.mediaonenetwork.net"
"profile.ak.facebook.com"
"www.bankwest.com.au"
What I want to do next is pull out the major domain part of the URL
I.e. for the above three records I should end up with
"mediaonenetwork.net"
"facebook.com"
"bankwest.com.au"
What would be the best way to do something like that? I assume it won't
be a simple regex like the above due to the country codes on the end of
some domains. My thought is look at the last portion of the domain, if
it's 2 characters long then assume it's a country code and grab the last
three sections, if it's not three characters long then assume it's an
international domain and grab the last two... but that sounds a bit dodgy.
--
Paul Lambert
Database Administrator
AutoLedgers
From | Date | Subject | |
---|---|---|---|
Next Message | Gregory Stark | 2007-09-16 23:18:56 | Re: Extracting hostname from URI column |
Previous Message | Andreas Joseph Krogh | 2007-09-16 19:32:47 | Re: Format intervall as hours/minutes etc |