Re: Fixing Google Search on the docs (redux)

From: Andres Freund <andres(at)anarazel(dot)de>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Dave Page <dpage(at)pgadmin(dot)org>, PostgreSQL WWW <pgsql-www(at)postgresql(dot)org>
Subject: Re: Fixing Google Search on the docs (redux)
Date: 2020-11-21 19:45:34
Message-ID: 20201121194534.bthxe2nrc7pvjelo@alap3.anarazel.de
Lists: pgsql-www

Hi,

On 2020-11-21 15:57:28 +0100, Magnus Hagander wrote:
> On Thu, Nov 19, 2020 at 8:50 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On 2020-11-18 18:28:49 +0100, Magnus Hagander wrote:
> > > We've discussed this many times before, and I think so far they've all
> > > bogged down at "google suck" :) The problem is that they don't even
> > > consider the case like we have where the pages *aren't* identical, but
> > > yet related.
> >
> > Is any search engine better at this? I don't think so?
>
> I doubt it, most tend to copy Google. And in either case it doesn't
> matter that much -- the *vast* majority of our inbound search traffic
> is google vs the other searches. By such a margin that it's not even a
> point in considering the others.

I was more wondering whether it's "search engines suck" or "Google
sucks" - obviously Google search is dominant...

> > > The problem it usually comes down to is that if we do that, then you
> > > will no longer be able to say search for something in the old docs *at
> > > all*.
> >
> > I think that'd still be better than the current situation. But I hope we
> > can do better:
> >
> > > A good example right now might be that recovery.conf stuff goes
> > > away. Even if you explicitly search for "postgresql recovery.conf 11".
> > > And I'd guess the majority of people are actually looking for things
> > > in versions that are NOT the latest (though an even bigger majority of
> > > people will be looking for things in versions that are not 9.1).
> >
> > E.g. not applying canonical when there's no newer version.
>
> That we can definitely go. So for recovery.conf it would still work,
> but anything that goes on a page where the page still exists, I don't
> see how we could separate that out and not do a canonical for that...

Compute a similarity metric ;). No, I'm not serious.
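
More seriously, the rule agreed on above (no canonical when no newer
version has the page) could be sketched roughly as below. This is only
an illustration: the function name, the set of current page names, and
checking against /docs/current/ alone (rather than any newer version)
are assumptions, not how pgweb actually does it.

```python
from typing import Optional, Set

# Assumed base URL for the "current" docs alias.
DOCS_BASE = "https://www.postgresql.org/docs"


def canonical_url(page: str, current_pages: Set[str]) -> Optional[str]:
    """Return the canonical URL for a docs page, or None for no canonical.

    page          -- file name of the page, e.g. "sql-select.html"
    current_pages -- hypothetical set of page names in the newest docs
    """
    if page in current_pages:
        # The page still exists in the newest version: point at it.
        return f"{DOCS_BASE}/current/{page}"
    # The page went away (e.g. the recovery.conf page): emit no
    # canonical link, so the old page stays searchable.
    return None


if __name__ == "__main__":
    current = {"sql-select.html"}
    print(canonical_url("sql-select.html", current))
    print(canonical_url("recovery-config.html", current))
```

That keeps the recovery.conf case working, but does nothing for pages
that still exist with changed content.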

I wonder if it's worth adding some more metadata to our pages for
Google's benefit. Perhaps it'd be *slightly* less annoying to navigate
to the right version of the docs if we added breadcrumb annotations:
https://developers.google.com/search/docs/data-types/breadcrumb#json-ld_1
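
To make that concrete, here is a minimal sketch of generating such an
annotation. The BreadcrumbList / ListItem structure follows the
schema.org vocabulary the page above describes; the helper name and
the example crumb trail are made up for illustration.

```python
import json
from typing import List, Tuple


def breadcrumb_jsonld(crumbs: List[Tuple[str, str]]) -> str:
    """Build a schema.org BreadcrumbList as a JSON-LD string.

    crumbs -- (name, url) pairs, outermost level first.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {
                "@type": "ListItem",
                "position": position,  # positions are 1-based
                "name": name,
                "item": url,
            }
            for position, (name, url) in enumerate(crumbs, start=1)
        ],
    }
    return json.dumps(data, indent=2)


if __name__ == "__main__":
    # Hypothetical trail for a versioned docs page.
    trail = [
        ("Documentation", "https://www.postgresql.org/docs/"),
        ("PostgreSQL 11", "https://www.postgresql.org/docs/11/"),
    ]
    print(breadcrumb_jsonld(trail))
```

The output would go into a <script type="application/ld+json"> element
in the page head.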

I can imagine - but have nothing but intuition to back that up - that we
also make Google's job harder by having a very recent timestamp on each
version of the docs. Perhaps we ought to add datePublished /
dateModified annotations, and freeze datePublished at the release date?

And probably also not update dateModified when the page didn't change,
but I think you were discussing that elsewhere.
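
A rough sketch of what I mean, assuming we keep a content hash per page
from the previous docs load. datePublished and dateModified are
schema.org properties; the function, its parameters, and the use of
the WebPage type are assumptions for illustration only.

```python
import hashlib
from datetime import date
from typing import Dict, Optional, Tuple


def date_annotations(
    release_date: date,
    content: str,
    prev_hash: Optional[str],
    prev_modified: Optional[date],
    today: date,
) -> Tuple[Dict[str, str], str]:
    """Return (JSON-LD dict, content hash) for one docs page.

    datePublished is frozen at the release date of the version.
    dateModified only moves when the page content actually changed.
    """
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    changed = prev_hash is None or digest != prev_hash
    if changed or prev_modified is None:
        modified = today
    else:
        modified = prev_modified
    annotations = {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "datePublished": release_date.isoformat(),
        "dateModified": modified.isoformat(),
    }
    return annotations, digest


if __name__ == "__main__":
    release = date(2018, 10, 18)  # example release date
    first, h = date_annotations(release, "page text", None, None, date(2020, 11, 21))
    # Reloading unchanged content keeps the earlier dateModified.
    again, _ = date_annotations(release, "page text", h, date(2020, 11, 21), date(2021, 1, 1))
    print(first["dateModified"], again["dateModified"])
```

The hash and the last dateModified would have to be stored next to the
page between loads.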

Greetings,

Andres Freund
