Re: Questions about PostgreSQL implementation details

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Julien Delplanque <julien(dot)delplanque(at)inria(dot)fr>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Questions about PostgreSQL implementation details
Date: 2019-12-13 05:48:46
Message-ID: CAMsr+YGdO6WsuXPABEWE3aT3a0Juwx1WpP=VrJv0j7qhDhjf+A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 9 Dec 2019 at 23:35, Julien Delplanque <julien(dot)delplanque(at)inria(dot)fr>
wrote:

> Hello PostgreSQL hackers,
>
> I hope I am posting on the right mailing-list.
>
> I am actually doing a PhD related to relational databases and software
> engineering.
>
> I use PostgreSQL for my research.
>
> I have a few questions about the internals of PostgreSQL and I think they
> require experts knowledge.
>
> I could not find documentation about that in the nice PostgreSQL
> documentation but maybe I missed something? Tell me if it is the case.
>

There are a bunch of README files in the source tree that concern various
innards of PostgreSQL. They're not always referred to by any comments etc,
so you have to know they exist. They're usually well worth reading, though
it can take a while before you understand enough of PostgreSQL's
architecture for them to make sense...

Try

find src/ -name README\*

> Q1. Are PostgreSQL's meta-description tables (such as pg_class) the
> "reality" concerning the state of the DB or are they just a virtual
> representation ?
>

That's been largely answered. But I want to point out an important caveat
that isn't obvious to new people: The oid of a relation (pg_class.oid) is
not the same thing as the pg_class.relfilenode, which is usually the base
of the filename of the on-disk storage for the relation. On an idle or new
database most relations are created with an equal oid and relfilename, so
it's easy to think the oid maps to the on-disk name of a relation, but it
doesn't. The relation oid will not change so long as the relation exists,
but the relfilenode may change if the table contents are rewritten, etc.
Additionally, there are special tables that are "relmapped" such that they
don't have a normal relfilenode at all, instead access is indirected via a
separate mapping. IIRC that's mainly necessary so we can bootstrap access
to the catalog tables that tell us how to read the catalogs.

What I would like to know with this question is: would it be possible to
> implement DDL queries (e.g. CREATE TABLE, DROP TABLE, CREATE VIEW, ALTER
> TABLE, etc.) as DML queries that modify the meta-data stored in
> meta-description tables?
>

Not really.

PostgreSQL has a caching layer - sycache, relcache, catcache - and
invalidation scheme that it relies on. It doesn't execute regular queries
on the system catalogs. It also has simplifying rules around how they are
updated and accessed. See the logic in genam.c etc. Catalogs may also
represent things that aren't just other DB rows - for example, pg_class
entries are associated with files on disk for individual database tables.

You can't just insert into pg_class, pg_attribute, etc and expect that to
safely create a table. Though it's surprising how much you can get away
with by hacking the catalogs if you're very careful and you trick
PostgreSQL into firing appropriate invalidations. I'd quite like to have a
SQL-exposed way to do a forced global cache flush and invalidation for use
in emergency scary catalog hacking situations.

So you can do quite a bit with direct catalog surgery, but it's dangerous
and if you break the database, you get to keep the pieces.

Q1.1 If it is possible, is what is done in reality? I have the feeling that
> it is not the case and that DDL queries are implemented in C directly.
>

Right. See standard_ProcessUtility() and friends.

Q1.2 If it is possible and not done, what is the reason?
>

Speed - no need to run the full executor. Simplification of catalog access.
Caching and invalidations. Chicken/egg problems: how do you "CREATE TABLE
pg_class"? . Lots more.

> Q2. Are PostgreSQL's "meta-constraints" (i.e. constraints related to
> database structure such as "a table can only have a single primary key")
> implemented in C code or via data constraints on PostgreSQL's
> meta-description tables?
>

System catalogs are not permitted to have CONSTRAINTs (CHECK constraints,
UNIQUE constraints, PRIMARY KEY constraints, FOREIGN KEY constraints, etc).

All such management is done in C level logic with the assistance of the
pg_depend catalog and the relationships it tracks.

> Q2.1 If they are not implemented via data constraints on meta-description
> tables, why ?
>

Same as above.

> Q2.2 Is there somewhere in the documentation a list of such
> "meta-constraints" implemented by PostgreSQL?
>

Not AFAIK.

Why?

--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Surafel Temesgen 2019-12-13 05:49:20 Re: Conflict handling for COPY FROM
Previous Message Andres Freund 2019-12-13 05:45:29 Re: A varint implementation for PG?