The History of PostgreSQL Development

Draft


PostgreSQL is the most advanced open-source database server.  It is Object-Relational (ORDBMS), and is supported by a team of Internet developers.  PostgreSQL began as Ingres, developed at the University of California at Berkeley.  The Ingres code was taken and enhanced by Ingres Corporation, which produced one of the first commercially successful relational database servers.  (Ingres Corp. was later purchased by Computer Associates.)  The Ingres code was taken by Michael Stonebraker as part of a Berkeley project to develop an object-relational database server called Postgres.  The Postgres code was taken by Illustra and developed into a commercial product.  (Illustra was later purchased by Informix and integrated into Informix's Universal Server.)  Several graduate students added SQL capabilities to Postgres, and called it Postgres95.  The graduate students left Berkeley, but the code was maintained by one of them, Jolly Chen, and had an active mailing list.

In the summer of 1996, it became clear that the demand for an open-source SQL database server was great, and that a team should be formed to continue development.  Marc G. Fournier, in Toronto, Canada, offered to host the mailing list, and to provide a server to host the source tree.  The 1,000 mailing list subscribers were moved to the new list, and a server was configured, giving a few people login accounts to apply patches to the source tree using CVS.

Jolly Chen had stated, "This project needs a few people with lots of time, not many people with a little time."  With 250,000 lines of C code, we understood what he meant.  In the early days, there were four major people involved: Marc; Thomas Lockhart in Pasadena, California; Vadim Mikheev in Krasnoyarsk, Russia; and myself.  We all had full-time jobs, so we were doing this in our spare time.  It certainly was a challenge.

Our first goal was to scour the old mailing list, evaluating patches that had been posted to fix various problems.  The system was quite fragile then, and not easily understood.  During the first six months of development, there was fear that some patch would break the system, and we would never be able to correct the problem.  Many problem reports had us scratching our heads, trying to figure out not only what was wrong, but how the system even performed many functions.

We inherited a huge installed base.  A typical bug report was, "When I do this, it crashes the database backend."  We had a whole list of them.  It became clear that some organization was needed.  Most bug reports required significant research to fix, and many were duplicates, so our TODO list reported every buggy SQL query.  It helped us identify our bugs, and made users aware of them too, cutting down on duplicate bug reports.  We had many eager developers, but the learning curve in understanding how the backend worked was significant.  Many developers got involved in the edges of the source code, like language interfaces or database tools, where things were easier to understand.  Other developers focused on specific problem queries, trying to locate the source of the bug.  It was amazing to see that many bugs were fixed with just one line of C code.  Postgres had evolved in an academic environment, and had not been exposed to the full spectrum of real-world queries.  During that time, there was talk of adding features, but the instability of the system made bug fixing our major focus.

We changed our name from Postgres95 to PostgreSQL.  It is a mouthful, but touts our SQL capabilities.  We started distributing our source tree using sup, which allowed people to keep up-to-date copies of the development tree without downloading a whole tarball.  We later switched to remote CVS.

Releases were every 3-5 months.  Each cycle consisted of 2-3 months of development, one month of beta testing, a major release, and a few weeks to issue subreleases to correct serious bugs.  We were never tempted to adopt a more aggressive schedule with more releases.  A database server is not like a word processor or a game, where you can easily restart it if there is a problem.  Databases are multi-user, and lock user data inside our servers, so we have to be very careful that released software is as reliable as possible.

Development of source code of this scale and complexity is not for the novice.  We had trouble getting developers interested in a project with such a steep learning curve.  However, our civilized atmosphere, and our improved reliability and performance, finally helped attract the experienced talent we needed.

Getting our developers the knowledge they needed to assist with PostgreSQL was clearly a priority.  We had a TODO list that outlined what needed to be done, but with 250,000 lines of code, taking on any TODO item was a major project.  We realized developer education would pay major benefits in helping people get started.  We wrote a flowchart of the backend modules, outlining the purpose of each.  We wrote a developers' FAQ to answer some of the common questions and troubles of PostgreSQL developers.  With this, developers became productive much more quickly.

The source code we inherited from Berkeley was very modular, but suffered from bit rot, and some Berkeley coders hadn't understood the proper way to handle certain tasks.  Their coding styles were also quite varied.  We wrote a tool to format/indent the entire source tree in a consistent manner.  We wrote a script to find functions that could be marked as static, or never-called functions that could be removed completely.  Both are run just before each release.  A release checklist reminds us of the things that have to be changed for each release.

As we gained knowledge of the code, we became able to perform more complicated fixes and feature additions.  We started to redesign poorly structured code.  We moved into a mode where each release had major features, instead of just fixes for previous bugs.  We improved SQL conformance, added subselects, improved locking, and added major missing SQL functionality.

The Usenet discussion group archives started touting us.  In the previous year, we had searched for PostgreSQL, and found that many people were recommending other databases, even though we were addressing user concerns as rapidly as possible.  One year later, Usenet clearly recommended us to users who needed transaction support, complex queries, commercial-grade SQL support, complex data types, and reliability.  Other databases were recommended when speed was the overriding concern.  This more clearly portrayed our strengths.  RedHat's shipment of PostgreSQL as part of their Linux distribution quickly multiplied our user base.

Every release is a major improvement over the last.  Our upcoming 6.5 release marks the development team's final mastery of the source code we inherited from Berkeley.  Finally, every code module is understood by at least one development team member.  We are now easily adding major features, thanks to the increasing size and experience of our world-wide development team.  Like most open-source projects, we don't know how many people are using our software, but our increased functionality, visibility and mailing list traffic clearly point to continued growth for PostgreSQL.