pg_dump versus views and opclasses

From: "Brendan Jurd" <direvus(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: pg_dump versus views and opclasses
Date: 2009-01-15 21:05:45
Message-ID: 37ed240d0901151305k18bd98aciabf78a7207dbab33@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hey all,

I recently had pg_dump produce a non-restorable dump for one of my
databases. I can't share the dump itself, but I can describe what
went wrong.

This database had a user-defined composite type in it. I wanted to be
able to GROUP BY this type, so I added a family of comparison
functions and operators, and created an opclass for btree. Then I
created a view which grouped on the composite type. All was well.

Then I tried to create a copy of the database. Nothing out of the
ordinary, I just did a pg_dump and piped the results into a psql
connecting to a different database name. The restore failed on
creation of the view; the opclass hadn't been created yet and
therefore the view couldn't figure out how to group on that composite
type.

I had a closer look at the output of pg_dump, and yes indeed, the
opclass and all its operators were listed after the view which needed
them to live.

Then I poked around in the pg_dump source code to get a better feel
for how it orders objects. That lead me to poke around in pg_depend.
A couple of observations that might be relevant to this case:

* It seems there's no pg_depend entry for
types/functions/operators/opclasses that the view depends on, unless
they are part of the SELECT list. I don't yet know enough about the
dependency system to understand if this is really a defect (and if so
whether there is a pratical fix for it), but it does seem a bit
limited. If, for example, you call a user-defined operator in the
WHERE clause of a view, Postgres has no idea about that dependency.

* pg_dump queries the information from pg_depend sorting first by
classid -- the oid of the system catalog the dependent object lives
in, then by the oid of the dependent object. I ran a similar query in
psql and the results were reminiscent of the order of objects produced
by pg_dump. pg_opclass and pg_operator have larger oids than
pg_class, and in the absence of dependency information to guide it,
pg_dump output the view (pg_class) first, followed by the opclass.
Coincidence?

Assuming for the moment that I'm on the right track with these
hunches, it seems that pg_dump needs some help. We could try to make
the dependency information about views more complete, in which case
everything should Just Work in pg_dump. If that's not practical (and
perhaps even if it is practical) we could also try to give opclasses
and operators a higher default priority in the dump order than views.
This seems logical since a view can depend on an opclass, but an
opclass can't depend on a view. At least, not directly.

So, am I on to something here, or is it just the ridiculously hot .au
weather talking?

If it would be helpful, I could write up a proper minimal test case to
demonstrate the error. Let me know.

Cheers,
BJ

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2009-01-15 21:26:37 Re: FWD: Re: Updated backslash consistency patch
Previous Message Dimitri Fontaine 2009-01-15 20:48:51 Re: Updated backslash consistency patch