Re: WIP: Access method extendability

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: WIP: Access method extendability
Date:	2014-10-28 14:22:38
Message-ID:	CA+U5nMKNcbcy1wFZVfr3S9QREJSVJ_xNuU1VS9o0McTjFgtZxg@mail.gmail.com
Lists:	pgsql-hackers

On 15 October 2014 13:08, Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:

> Postgres was initially designed to support access method extendability.
> This extendability survives to the present day. However, it is mostly internal,
> in-core extendability. One can quite easily add a new access method to
> PostgreSQL core. But if one tries to implement an access method as an external
> module, one will face the following difficulties:

...

> The problem of WAL is a bit more complex. According to previous discussions, we
> don't want to let extensions declare their own xlog records. If we let them,
> then the recovery process will depend on extensions. That badly violates
> reliability. The solution is to implement some generic xlog record which is able
> to represent the difference between blocks in some general manner.

Thank you for progressing with these thoughts.

I'm still a little uncertain about the approach, now my eyes are open
to the problems of extendability.

The main problem we had in the past was that GiST and GIN indexes both
had faulty implementations for redo, which in some cases caused severe
issues. Adding new indexes will also suffer the same problems, so I
see a different starting place.

The faults there raised the need for us to be able to mark specific
indexes as corrupt, so that they could be avoided during Hot Standby
and in normal running after promotion.

Here's the order of features I think we need

1. A mechanism to mark an index as corrupt so that it won't be usable
by queries. That needs to work during recovery, so we can persist a
data structure which tells us which indexes are corrupt. Then
something that checks whether an index is known corrupt during
relcache access. So if we decide an index is bad, we record the index
as corrupt and then fire a relcache invalidation.

2. Some additional code in Autovacuum to rebuild corrupt indexes at
startup, using AV worker processes to perform a REINDEX CONCURRENTLY.

This will give us what we need to allow an AM to behave sensibly, even
in the face of its own bugs. It also gives us UNLOGGED indexes for
free. Unlogged indexes mean we can change the way unlogged tables
behave to allow them to truncate down to the highest unchanged data at
recovery, so we don't lose all the data when we crash.
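
As a rough illustration of items 1 and 2, here is a toy Python model, not PostgreSQL code. The class name, the method names and the use of a plain list to stand in for relcache invalidation messages are all assumptions made for the sketch; the mail itself does not specify any of them.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class CorruptIndexRegistry:
    """Hypothetical stand-in for a persisted list of indexes known to be corrupt."""

    corrupt_oids: Set[int] = field(default_factory=set)
    # Stands in for relcache invalidation messages sent to other backends.
    invalidations: List[int] = field(default_factory=list)

    def mark_corrupt(self, index_oid: int) -> None:
        """Record the index as corrupt, then fire an invalidation (item 1)."""
        if index_oid not in self.corrupt_oids:
            self.corrupt_oids.add(index_oid)
            self.invalidations.append(index_oid)

    def usable_indexes(self, candidate_oids: Iterable[int]) -> List[int]:
        """What a relcache-level check would expose to queries."""
        return [oid for oid in candidate_oids if oid not in self.corrupt_oids]

    def pending_rebuilds(self) -> List[int]:
        """Work list a background worker could pick up at startup (item 2)."""
        return sorted(self.corrupt_oids)

    def mark_rebuilt(self, index_oid: int) -> None:
        """Clear the corrupt flag after a rebuild and invalidate again."""
        if index_oid in self.corrupt_oids:
            self.corrupt_oids.remove(index_oid)
            self.invalidations.append(index_oid)
```

The point of the sketch is only the ordering: the corrupt flag is recorded first, the invalidation follows, and queries never see an index that is still on the list.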

3. That then allows us to move towards having indexes that are marked
"changed" when we perform first DML on the table in any checkpoint
cycle. Which allows us to rebuild indexes which were in the middle of
being changed when we crashed. (The way we'd do that is to have an LSN
on the metapage and then only write WAL for the metapage). The
difference here is that they are UNLOGGED but do not get trashed on
recovery unless they were in the process of changing.
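
One possible reading of item 3, again as a toy Python model rather than PostgreSQL code: the metapage carries the LSN of the last "changed" record, that record is written only for the first DML in a checkpoint cycle, and recovery rebuilds the index only if it was marked changed at or after the redo point of the last completed checkpoint. The names `IndexMeta`, `note_dml` and `needs_rebuild_after_crash`, and the assumption that a completed checkpoint leaves the index consistent on disk, are illustrative and not taken from the mail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexMeta:
    """Hypothetical metapage state for an index that is otherwise not WAL-logged."""

    oid: int
    changed_lsn: Optional[int] = None  # LSN of the last "changed" record, if any


def note_dml(meta: IndexMeta, current_lsn: int, checkpoint_redo_lsn: int) -> bool:
    """Mark the index changed on the first DML of a checkpoint cycle.

    Returns True when a metapage WAL record would have to be written.
    """
    if meta.changed_lsn is None or meta.changed_lsn < checkpoint_redo_lsn:
        meta.changed_lsn = current_lsn
        return True
    return False  # already marked in this cycle, nothing to log


def needs_rebuild_after_crash(meta: IndexMeta, checkpoint_redo_lsn: int) -> bool:
    """Rebuild only if the index was changing since the last completed checkpoint."""
    return meta.changed_lsn is not None and meta.changed_lsn >= checkpoint_redo_lsn
```

Under these assumptions an index that saw no DML since the last checkpoint survives a crash untouched, while one that was being modified is queued for a rebuild.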

If we do those things, then we won't even need to worry about needing
AMs to write their own WAL records. Recovery will be safe AND we won't
need to go through problems of buggy persistence implementations in
new types of index.

Or, to put it another way, it will be easier to write new index AMs
because we'll be able to skip the WAL part until we know we want it.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

WIP: Access method extendability at 2014-10-15 12:08:38 from Alexander Korotkov

Responses

Re: WIP: Access method extendability at 2014-10-28 14:53:49 from Robert Haas
Re: WIP: Access method extendability at 2014-10-28 17:04:00 from Simon Riggs
Re: WIP: Access method extendability at 2014-10-28 17:50:44 from Jim Nasby