Re: Shorter archive URLs

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL WWW <pgsql-www(at)lists(dot)postgresql(dot)org>
Subject: Re: Shorter archive URLs
Date: 2019-07-16 08:49:41
Message-ID: CABUevEwu3RMgQKqO1WFaaaMt+WtU52g=VsjhwM-72vGRxU0dBg@mail.gmail.com
Lists: pgsql-www

On Tue, Jul 16, 2019 at 5:49 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > On Sun, Jul 14, 2019 at 12:52:46PM +0200, Magnus Hagander wrote:
> >> This means that instead of being:
> >> https://www.postgresql.org/message-id/CABUevEyqGVV-s1yXQBsTpoPDCHy79j-yDtJcucrPb9Hh4CFTNg%40mail.gmail.com
> >>
> >> The url would be:
> >> https://www.postgresql.org/message-id/Z0oaTfo56bV4tke6-r_PKJstHF8=
>
> > It would be nice if I could easily compute the hash if I know the
> > message-id --- I assume I can just run it through sha1. This would
> > allow me to shorten commit URLs, which would be a win for GMail.
>
> Now that I look closer, Magnus' example shows that this proposal
> is underspecified: exactly how would the message-ID be rendered
> before being fed into sha1? In particular it's not clear from
> this whether "@" should be spelled "@" or "%40". The existing
> archive website is quite forgiving about that, you can write
> either --- but the sha1 transform would be utterly unforgiving.
> Instead of opaque hash X you'd get opaque hash Y, and there'd
> be no way even to see what caused the mismatch.
>

It should always be @. The %40 is a side effect of @ not being allowed in a
URL.
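
A minimal sketch of that rule, assuming the lookup side undoes URL escaping
before hashing. The helper name short_id is hypothetical and is not taken
from the archive code; it only illustrates that both spellings would then
map to the same hash:

```python
import base64
import hashlib
from urllib.parse import unquote


def short_id(message_id: str) -> str:
    """Hypothetical helper: hash a Message-ID after undoing URL escaping."""
    # '%40' becomes '@'; an ID that already contains '@' is left unchanged.
    normalized = unquote(message_id)
    digest = hashlib.sha1(normalized.encode('utf-8')).digest()
    return base64.urlsafe_b64encode(digest).decode('ascii')


escaped = 'CABUevEyqGVV-s1yXQBsTpoPDCHy79j-yDtJcucrPb9Hh4CFTNg%40mail.gmail.com'
literal = 'CABUevEyqGVV-s1yXQBsTpoPDCHy79j-yDtJcucrPb9Hh4CFTNg@mail.gmail.com'

# Both spellings hash identically once the escaping is removed.
print(short_id(escaped) == short_id(literal))
```

One caveat with this sketch: a Message-ID that contains a literal '%'
followed by two hex digits would be altered by unquote(), so a real
implementation would have to decide where unescaping happens exactly once.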

>
> (BTW, after some experimentation I'm totally unable to reproduce
> Magnus' example using sha1sum(1) and base64(1), so that is not
> the only underspecified point here.)
>

The problem is that sha1sum prints the hex form of the digest, not the raw
binary digest, so piping its output into base64 encodes the hex text. You
also need to be careful about newlines in the input.
Here is how I did it (in Python):

>>> import hashlib, base64
>>> base64.urlsafe_b64encode(hashlib.sha1(
...     b'CABUevEyqGVV-s1yXQBsTpoPDCHy79j-yDtJcucrPb9Hh4CFTNg@mail.gmail.com'
... ).digest())
b'Z0oaTfo56bV4tke6-r_PKJstHF8='
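
To make the two pitfalls concrete, here is a rough sketch of what probably
goes wrong in a shell pipeline along the lines of echo | sha1sum | base64.
That pipeline is an assumption; the exact commands tried are not shown in
this thread. The variable names are illustrative only:

```python
import base64
import hashlib

msgid = b'CABUevEyqGVV-s1yXQBsTpoPDCHy79j-yDtJcucrPb9Hh4CFTNg@mail.gmail.com'

# Intended form: base64 of the raw 20-byte SHA-1 digest.
good = base64.urlsafe_b64encode(hashlib.sha1(msgid).digest())

# Pitfall 1: a trailing newline (as a plain `echo` would add) is part of
# the hashed input and produces an unrelated digest.
with_newline = base64.urlsafe_b64encode(hashlib.sha1(msgid + b'\n').digest())

# Pitfall 2: base64-encoding the 40-character hex text instead of the
# raw bytes, which is what feeding sha1sum output to base64 amounts to.
hex_text = hashlib.sha1(msgid).hexdigest().encode('ascii')
hex_encoded = base64.urlsafe_b64encode(hex_text)

print(good != with_newline)         # the newline changes the result
print(len(good), len(hex_encoded))  # raw-digest form vs. hex-text form
```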

We could use a hex digest instead of a base64 of course, but that would
make the URLs longer.
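
For a sense of the difference, a small sketch comparing the two encodings
of the same SHA-1 digest. The input value here is an arbitrary placeholder,
not a real Message-ID:

```python
import base64
import hashlib

# Placeholder input; any byte string yields a 20-byte SHA-1 digest.
digest = hashlib.sha1(b'example@example.com').digest()

hex_form = digest.hex()
b64_form = base64.urlsafe_b64encode(digest).decode('ascii')

print(len(digest))    # 20 raw bytes
print(len(hex_form))  # 40 characters as hex
print(len(b64_form))  # 28 characters as URL-safe base64, including '=' padding
```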

(FWIW, I'm not wedded to making this change -- that's why I posted here
first -- this is just explaining how it was actually calculated)

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
