Re: alternative back-end block formats

From: Christian Convey <christian(dot)convey(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: alternative back-end block formats
Date: 2014-01-27 18:42:29
Message-ID: CAPfS4ZzwxnQuYjEBnmd0eiYW3t85o4YOvGXfqK=AcNOgKc77rQ@mail.gmail.com
Lists: pgsql-hackers

Hi Craig,

On Sun, Jan 26, 2014 at 5:47 AM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:

> On 01/21/2014 07:43 PM, Christian Convey wrote:
> > Hi all,
> >
> > I'm playing around with Postgres, and I thought it might be fun to
> > experiment with alternative formats for relation blocks, to see if I can
> > get smaller files and/or faster server performance.
>
> It's not clear how you'd do this without massively rewriting the guts of
> Pg.
>
> Per the docs on internal structure, Pg has a block header, then tuples
> within the blocks, each with a tuple header and list of Datum values for
> the tuple. Each Datum has a generic Datum header (handling varlena vs
> fixed length values etc) then a type-specific on-disk representation
> controlled by the type output function for that type.
>

I'm still in the process of getting familiar with the pg backend code, so I
don't have a concrete plan yet. However, I'm working on the assumption
that some set of macros and functions encapsulates the page layout.

If/when I tackle this, I expect to add a layer of indirection somewhere
around that boundary, so that some non-catalog tables, whose schemas meet
certain simplifying assumptions, are read and modified using specialized
code.
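
To make the idea concrete, here is a minimal sketch of what such a layer of indirection might look like. It is not based on the actual PostgreSQL source: every name in it (PageFormatOps, choose_page_format, count_rows_that_fit, SKETCH_PAGE_SIZE) is hypothetical, both page layouts are toy versions, and the only "simplifying assumption" modeled is that a non-catalog table has nothing but fixed-width columns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 8192   /* same size as PostgreSQL's default block */

/* Hypothetical per-format operations; callers never touch page bytes directly. */
typedef struct PageFormatOps {
    const char *name;
    void        (*init)(uint8_t *page, uint16_t row_width);
    int         (*add_row)(uint8_t *page, const void *row, uint16_t len); /* slot or -1 */
    const void *(*get_row)(const uint8_t *page, int slot, uint16_t *len);
} PageFormatOps;

/* memcpy-based accessors avoid alignment and aliasing problems. */
static uint16_t get_u16(const uint8_t *p) { uint16_t v; memcpy(&v, p, sizeof v); return v; }
static void put_u16(uint8_t *p, uint16_t v) { memcpy(p, &v, sizeof v); }

/* Toy generic layout: [nslots][free_end][(offset,len)...] with row data
 * growing down from the end of the page. */
static void slotted_init(uint8_t *page, uint16_t row_width) {
    (void) row_width;                       /* rows may vary in length */
    memset(page, 0, SKETCH_PAGE_SIZE);
    put_u16(page + 2, SKETCH_PAGE_SIZE);
}
static int slotted_add_row(uint8_t *page, const void *row, uint16_t len) {
    uint16_t n = get_u16(page), free_end = get_u16(page + 2);
    size_t slots_end = 4 + (size_t)(n + 1) * 4;   /* slot array after this insert */
    if (slots_end + len > free_end) return -1;    /* page full */
    free_end = (uint16_t)(free_end - len);
    memcpy(page + free_end, row, len);
    put_u16(page + 4 + (size_t)n * 4, free_end);
    put_u16(page + 4 + (size_t)n * 4 + 2, len);
    put_u16(page, (uint16_t)(n + 1));
    put_u16(page + 2, free_end);
    return n;
}
static const void *slotted_get_row(const uint8_t *page, int slot, uint16_t *len) {
    uint16_t n = get_u16(page);
    if (slot < 0 || slot >= n) return NULL;
    const uint8_t *s = page + 4 + (size_t)slot * 4;
    *len = get_u16(s + 2);
    return page + get_u16(s);
}

/* Toy specialized layout: [nrows][row_width] followed by packed rows,
 * so there is no per-row slot overhead. */
static void fixed_init(uint8_t *page, uint16_t row_width) {
    assert(row_width > 0);
    memset(page, 0, SKETCH_PAGE_SIZE);
    put_u16(page + 2, row_width);
}
static int fixed_add_row(uint8_t *page, const void *row, uint16_t len) {
    uint16_t n = get_u16(page), w = get_u16(page + 2);
    if (len != w || 4 + (size_t)(n + 1) * w > SKETCH_PAGE_SIZE) return -1;
    memcpy(page + 4 + (size_t)n * w, row, w);
    put_u16(page, (uint16_t)(n + 1));
    return n;
}
static const void *fixed_get_row(const uint8_t *page, int slot, uint16_t *len) {
    uint16_t n = get_u16(page), w = get_u16(page + 2);
    if (slot < 0 || slot >= n) return NULL;
    *len = w;
    return page + 4 + (size_t)slot * w;
}

static const PageFormatOps slotted_ops = {"slotted", slotted_init, slotted_add_row, slotted_get_row};
static const PageFormatOps fixed_ops   = {"fixed",   fixed_init,   fixed_add_row,   fixed_get_row};

/* The indirection point: catalog tables and anything that fails the
 * simplifying assumption keep the generic layout. */
const PageFormatOps *choose_page_format(int is_catalog, int all_columns_fixed_width) {
    return (!is_catalog && all_columns_fixed_width) ? &fixed_ops : &slotted_ops;
}

/* Fill one page with zeroed rows of the given width; return how many fit. */
int count_rows_that_fit(const PageFormatOps *ops, uint8_t *page, uint16_t row_width) {
    uint8_t row[256] = {0};
    assert(row_width > 0 && row_width <= sizeof row);
    ops->init(page, row_width);
    int n = 0;
    while (ops->add_row(page, row, row_width) >= 0) n++;
    return n;
}
```

With these toy layouts, a page holds 511 rows of 16 bytes in the fixed format versus 409 in the slotted one, which is the kind of difference I'd want to measure properly before claiming anything about the real backend.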

I'd rather not get into the specific optimizations I'd like to try yet:
I haven't fully studied the code, and I don't want to put my foot in my
mouth.

> What concrete problem do you mean to tackle? What idea do you want to
> explore or implement?
>

My real motivation is that I'd like to get more familiar with the pg
backend codebase, and tilting at this windmill seemed like an interesting
way to accomplish that.

If I were focused on solving a real-world problem, I'd say that this
lays the groundwork for table-schema-specific storage optimizations and
optimized record-filtering code. But I'd only make that argument if I
planned to (a) perform a careful study with statistically significant
benchmarks, and/or (b) produce a merge-worthy patch. At this point I have
no intention of doing either. My main goal really is just to have fun
with the code.

> > Does anyone know if this has been done before with Postgres? I would
> > have assumed yes, but I'm not finding anything in Google about people
> > having done this.
>
> AFAIK (and I don't know much in this area) the storage manager isn't
> very pluggable compared to the rest of Pg.
>

Thanks for the warning. Duly noted.

Kind regards,
Christian
