Quick Links

Re: Schema version management

From:	Joel Jacobson <joel(at)trustly(dot)com>
To:	Daniel Farina <daniel(at)heroku(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Schema version management
Date:	2012-05-23 01:16:20
Message-ID:	CAASwCXcbjwDb6NEqAab9WxHs9MOdaAeYg53Hr_XbZATLvx-Eqw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, May 23, 2012 at 3:24 AM, Daniel Farina <daniel(at)heroku(dot)com> wrote:
> Is there a reason why the current directory format could not be
> adjusted to become more human-readable friendly for mechanical
> reasons? I realize there is a backwards compatibility problem, but it
> may be better than bloating a new option.

I like your idea, then the format would be directory, while the option
would be something like, --human-friendly?

Currently, the directory format only dumps the data of tables
into separate files. Everything else goes into the toc.dat file.
To make it work, also the stuff written to the toc.dat file must
be written to separate files.

> But I don't see how making the directory output format more
> human-friendly could be seen as a bad thing overall, except in the
> notable axis of implementation complexity. Silly issues like naming
> files on different platforms, case sensitivity, and file length
> restrictions may rear their ugly head.

If the entire function identity arguments would be included in the filename,
two dumps of the same schema in two different databases
would be guaranteed to produce the same dump.

This would render some very long filenames for functions with many arguments,
but this problem could at least be reduced by using the shorter aliases for each
data type, as "varchar" instead of "character varying" and "timestamptz"
instead of "timestamp with time zone", etc.

http://www.postgresql.org/docs/devel/static/datatype.html#DATATYPE-TABLE

Also, to get even more space, as the name of the function can be long too,
the function name could be made a directory, and the different overloaded types
different files, e.g:

/public/FUNCTION/myfunc/int.sql
/public/FUNCTION/myfunc/int_timestamptz.sql

And functions with no arguments are written to a single file
(suffic .sql to avoid conflict with eventual directory name for function):

/public/FUNCTION/myfunc.sql

> I think about this because in addition to the data types and operators
> defined in the development process, there are often small tables that
> need to be loaded with content and version controlled as well, rather
> like userland-equivalents pg_enum entries.

Is there a term for such tables? I use the term "lookup tables", but perhaps
there is a better one?

In my schema, they typically maps statusids, stateids,
etc to human friendly names.

E.g., if Orders is a huge table for all orders, I might have a
OrderStatuses table to
lookup all the OrderStatusID columns in Orders.
Orders.OrderStatusID -fk-> OrderStatuses.OrderStatusID
OrderStatuses.Name is unqiue and contains the human friendly name of the status.

These small lookup tables also needs to be version controlled of course.

This is a tricky one though, because you might have small tables with base data,
but with references to other huge tables, which you don't want to
include in your
automatically version controlled schema dump.

I solved this problem by creating a quite complex recursive plpgsql function,
resolving all dependencies and joining only the rows from each table required,
allowing you to specify a regex matching a list of tables, which in
turn resolves
to all tables they have references to, and dumps these tables too, but only the
required rows. The result is a dump of each such table into a separate file,
in a restorable order not causing any dependency problems.
Then I have a similar function to do the restoring.

I use this approach to build a restorable clean test database of any version of
the system, may it be the production or some developer's local version of it.

And also, not to forget, to make it work all the sequences also needs
to be restarted
to the same values as in the original database after the dump is restored.

In response to

Re: Schema version management at 2012-05-22 20:24:39 from Daniel Farina

Responses

Re: Schema version management at 2012-05-23 02:09:52 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tatsuo Ishii	2012-05-23 02:04:23	Re: Proposal: add new field to ErrorResponse and NoticeResponse
Previous Message	Tom Lane	2012-05-23 00:05:03	Re: Proposal: add new field to ErrorResponse and NoticeResponse