Re: On How To Shorten the Steep Learning Curve Towards PG Hacking...

From: Kevin Grittner <kgrittn(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Kang Yuzhe <tiggreen87(at)gmail(dot)com>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: On How To Shorten the Steep Learning Curve Towards PG Hacking...
Date: 2017-04-17 15:53:50
Message-ID: CACjxUsO09uiXUrjr4-OuyytuVveWgSuxDJDnQ-AttF6CJG4Dhw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 28, 2017 at 10:36 PM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:

> Personally I have to agree that the learning curve is very steep. Some
> of the docs and presentations help, but there's a LOT to understand.

Some small patches can be kept to a fairly narrow set of areas, and
if you can find a similar capability you can crib techniques for
handling some of the more mysterious areas the patch might brush up
against. When I started working on my first *big* patch that was
bound to touch many areas (around the start of development for 9.1)
I counted lines of code and found over a million lines just in .c
and .h files. We're now closing in on 1.5 million lines. That's
not counting over 376,000 lines of documentation in .sgml files,
over 12,000 lines of text in README* files, over 26,000 lines of
Perl code, over 103,000 lines of .sql code (60% of which is in
regression tests), over 38,000 lines of .y code (for flex/bison
parsing), about 9,000 lines of various types of code just for
generating the configure file, and over 439,000 lines of .po files
(for message translations). I'm sure I missed a lot of important
stuff there, but it gives some idea of the challenge it is to get your
head around it all.
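
For anyone who wants to repeat that kind of count on their own
checkout, here is a minimal sketch in Python. It is not the method
used to produce the figures above; the function name and the choice
to skip the .git directory are assumptions made for illustration.
It counts every line, including blanks and comments, and groups files
by extension (files with no extension, such as README, are keyed by
their full name).

```python
import os
from collections import Counter


def count_lines_by_extension(root):
    """Walk `root` and return a Counter mapping file extension to total lines.

    Hypothetical helper for illustration; counts raw lines, not "logical" code.
    """
    totals = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: version-control metadata should not be counted.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            # Files without an extension (e.g. README) are keyed by name.
            ext = os.path.splitext(name)[1] or name
            path = os.path.join(dirpath, name)
            try:
                # Binary mode avoids decoding errors on non-UTF-8 files.
                with open(path, "rb") as f:
                    totals[ext] += sum(1 for _ in f)
            except OSError:
                # Unreadable files (permissions, broken symlinks) are skipped.
                continue
    return totals
```

Calling count_lines_by_extension("/path/to/postgresql") and adding the
".c" and ".h" entries gives a number comparable to the first figure
quoted above, though the exact result depends on the checkout.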

My first advice is to try to identify which areas of the code you
will need to touch, and read those over. Several times. Try to
infer the API to areas *that* code needs to reference from looking
at other code (as similar to what you want to work on as you can
find), reading code comments and README files, and asking
questions. Secondly, there is a lot that is considered to be
"coding rules" that is, as far as I've been able to tell, only
contained inside the heads of veteran PostgreSQL coders, with
occasional references in the discussion list archives. Asking
questions, proposing approaches before coding, and showing work in
progress early and often will help a lot in terms of discovering
these issues and allowing you to rearrange things to fit these
conventions. If someone with the "gift of gab" is able to capture
these and put them into a readily available form, that would be
fantastic.

> * SSI (haven't gone there yet myself)

For anyone wanting to approach this area, there is a fair amount to
look at. There is some overlap, but in rough order of "practical"
to "theoretical foundation", you might want to look at:

https://www.postgresql.org/docs/current/static/transaction-iso.html

https://wiki.postgresql.org/wiki/SSI

The SQL standard

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob_plain;f=src/backend/storage/lmgr/README-SSI;hb=refs/heads/master

http://www.vldb.org/pvldb/vol5.html

http://hdl.handle.net/2123/5353

Papers cited in these last two. I have found papers authored by
Alan Fekete or Atul Adya particularly enlightening.

If any of the other areas that Craig listed have similar work
available, maybe we should start a Wiki page where we list areas of
code (starting with the list Craig included) as section headers, and
put links to useful reading below each?

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
