Re: an attempt to fix the Google search problem

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL WWW <pgsql-www(at)postgresql(dot)org>
Subject: Re: an attempt to fix the Google search problem
Date: 2016-11-10 15:05:31
Message-ID: F8F6AA24-7951-4938-99B9-933B4AC1A9C9@yesql.se
Lists: pgsql-www

> On 09 Nov 2016, at 18:07, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>
> On Wed, Nov 9, 2016 at 6:34 PM, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
> It is a well-known problem that a Google search for something in the
> PostgreSQL documentation will usually return hits in old documentation
> versions first, because those pages have been around for the longest.
>
> I believe I have a promising fix for that. By adding a <link
> rel="canonical"> to the documentation pages that point to the "current"
> version, search engines will be encouraged to return the current version
> search results.
>
> I had heard that the Django project had the same problem and got this
> solution from there. See for example the source of this page:
> <https://docs.djangoproject.com/en/1.10/topics/db/models/>. Here is
> also some information from Google about this:
> <https://webmasters.googleblog.com/2013/04/5-common-mistakes-with-relcanonical.html>
>
> I think this is worth trying. A one-line patch is attached.
>
> By that article you linked, it's important not to link to pages that don't exist. So we should at least verify that the page does exist in the current version (the same way that we do for the links at the top of the pages for old versions). IIRC someone (sorry, this is a long time ago, can't remember who or why) mentioned that the pages can get severely punished if the canonical link goes to a 404.

While I can’t cite a source confirming that Google punishes 4XX responses, I have
first-hand experience that they in fact do (or at least have done).
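
To make the existence check concrete, here is a minimal sketch of emitting
the canonical link only when the page also exists in the current docs. The
function name, the set of current page names, and the base URL layout are
all assumptions for illustration, not how pgweb actually does it.

```python
from typing import Optional, Set

# Assumed URL layout for the "current" docs; adjust to the real site structure.
CURRENT_DOCS_BASE = "https://www.postgresql.org/docs/current"


def canonical_link(filename: str, current_pages: Set[str]) -> Optional[str]:
    """Return a <link rel="canonical"> tag for an old-version doc page.

    Returns None when the page has no counterpart in the current version,
    so the template can skip the tag instead of pointing at a 404.
    """
    if filename not in current_pages:
        return None
    return f'<link rel="canonical" href="{CURRENT_DOCS_BASE}/{filename}">'
```

A template would then print the tag only when the function returns a value,
much like the version links at the top of old pages are only shown for
versions where the page exists.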

> We did try this at some point ages and ages ago and it didn't help, but I agree it's probably worth another try. But we definitely need to be careful not to destroy existing google ranking.

The backing RFC (RFC 6596) states that the target document must be a duplicate
or superset of the context document, and Google says something similar. The
current version of a doc page fits that, but we should be careful when doc pages
have been substantially rewritten: targeting a completely different page could
lead to punishment.
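
One rough way to spot substantially rewritten pages would be to compare the
text of the old page against the current one and skip the canonical link
when they differ too much. The sketch below uses difflib from the Python
standard library; the function name and the 0.5 threshold are arbitrary
assumptions, and a similarity ratio is only a heuristic for "duplicate or
superset", not what the RFC or Google actually measure.

```python
from difflib import SequenceMatcher


def is_near_duplicate(old_text: str, current_text: str,
                      threshold: float = 0.5) -> bool:
    """Heuristically decide whether two page texts are similar enough.

    SequenceMatcher.ratio() returns a float in [0, 1], where 1.0 means the
    two sequences are identical. The threshold is a guess and would need
    tuning against real documentation pages.
    """
    ratio = SequenceMatcher(None, old_text, current_text).ratio()
    return ratio >= threshold
```

Pages that fail the check could simply be left without a canonical link,
which keeps the old behaviour for them.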

cheers ./daniel
