Re: [GSOC] questions about idea "rewrite pg_dump as library"

From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, <shuai900217(at)126(dot)com>, 'PostgreSQL-development' <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GSOC] questions about idea "rewrite pg_dump as library"
Date: 2013-04-12 12:07:33
Message-ID: 5167F905.1020108@2ndQuadrant.com
Lists: pgsql-hackers

On 04/11/2013 12:17 AM, Tom Lane wrote:
> Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
>> Hannu Krosing wrote:
>>> Natural solution to this seems to move most of pg_dump functionality
>>> into backend as functions, so we have pg_dump_xxx() for everything
>>> we want to dump plus a topological sort function for getting the
>>> objects in right order.
>> This idea doesn't work because of back-patch considerations (i.e. we
>> would not be able to create the functions in back branches, and so this
>> new style of pg_dump would only work with future server versions). So
>> pg_dump itself would have to retain capability to dump stuff from old
>> servers. This seems unlikely to fly at all, because we'd be then
>> effectively maintaining pg_dump in two places, both backend and the
>> pg_dump source code.
> There are other issues too, in particular that most of the backend's
> code tends to work on SnapshotNow time whereas pg_dump would really
> prefer it was all done according to the transaction snapshot.
I was just thinking of moving the queries that pg_dump currently
uses into UDFs, which do _not_ use the catalog cache but run
the same SQL against the catalogs as pg_dump does now,
under whatever snapshot mode is currently set.

pg_dump will still need to keep the same queries for older
versions of PostgreSQL, but for new versions pg_dump can become
catalog-agnostic.

And I think that we can retire pg_dump support for older
PostgreSQL versions the same way we drop support for
older versions of PostgreSQL itself.

Hannu

> We have
> got bugs of that ilk already in pg_dump, but we shouldn't introduce a
> bunch more. Doing this right would therefore mean that we'd have to
> write a lot of duplicative code in the backend, ie, it's not clear that
> we gain any synergy by pushing the functionality over. It might
> simplify cross-backend-version issues (at least for backend versions
> released after we'd rewritten all that code) but otherwise I'm afraid
> it'd just be pushing the problems somewhere else.
>
> In any case, "push it to the backend" offers no detectable help with the
> core design issue here, which is figuring out what functionality needs
> to be exposed with what API.
The main things I see would be:

* get_list_of_objects(object_type, pattern or namelist)
* get_sql_def_for_object(object_type, object_name)
* sort_by_dependency(list of [obj_type, obj_name])

From these you could easily construct most uses, especially if
sort_by_dependency(list of [obj_type, obj_name])
were smart enough to break circular dependencies, for example by
turning two tables with mutual FKs into table definitions without
FKs plus separate constraints.
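
To make the cycle-breaking idea concrete, here is a minimal sketch in Python. It is an illustration only, not how pg_dump orders objects today: the argument shapes (`deps` as a dict of sets, `fk_edges` keyed by (table, referenced table)) and the constraint names are assumptions made up for the example. It does a plain topological sort and, whenever it gets stuck on a cycle, drops one foreign-key edge and emits that constraint separately at the end.

```python
def sort_by_dependency(objects, deps, fk_edges):
    """Order objects so that each one comes after what it depends on.

    objects  -- list of (obj_type, obj_name) tuples
    deps     -- dict: object -> set of objects it depends on
    fk_edges -- dict: (table, referenced_table) -> FK constraint name
                (hypothetical shape, chosen for this sketch)
    """
    wanted = set(objects)
    # Only dependencies on objects that are part of this dump matter here.
    remaining = {obj: set(deps.get(obj, ())) & wanted for obj in objects}
    ordered, deferred = [], []

    while remaining:
        ready = sorted(o for o, d in remaining.items() if not d)
        if not ready:
            # Stuck on a cycle: drop one FK edge between unsorted tables
            # and emit that constraint separately after everything else.
            edge = next(((t, r) for (t, r) in sorted(fk_edges)
                         if t in remaining and r in remaining[t]), None)
            if edge is None:
                raise ValueError("dependency cycle not caused by foreign keys")
            table, ref = edge
            remaining[table].discard(ref)
            deferred.append(("constraint", fk_edges[edge]))
            continue
        for obj in ready:
            ordered.append(obj)
            del remaining[obj]
        for d in remaining.values():
            d.difference_update(ready)

    return ordered + deferred


# Two tables with mutual FKs, plus a view on one of them.
a, b, v = ("table", "a"), ("table", "b"), ("view", "v")
result = sort_by_dependency(
    [a, b, v],
    {a: {b}, b: {a}, v: {a}},
    {(a, b): "a_b_fkey", (b, a): "b_a_fkey"},
)
```

The sketch does not try to drop the smallest possible set of FKs; it simply takes the first usable FK edge, which is enough to show the shape of the API.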

Or we could always emit constraints separately, so that
the ones depending on non-exported objects would be easy
to leave out.
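
A rough sketch of that alternative, again in Python and again with made-up data shapes: if every constraint is a separate item that lists the objects it needs, leaving out the ones that touch non-exported objects is a simple subset test. The function name and the (name, needs) pairs are hypothetical.

```python
def exportable_constraints(constraints, exported):
    """Keep only constraints whose dependencies are all being exported.

    constraints -- list of (constraint_name, set of objects it depends on)
    exported    -- iterable of (obj_type, obj_name) being dumped
    """
    exported = set(exported)
    return [name for name, needs in constraints if needs <= exported]


orders, customers = ("table", "orders"), ("table", "customers")
kept = exportable_constraints(
    [
        ("orders_customer_fkey", {orders, customers}),
        ("orders_pkey", {orders}),
    ],
    exported=[orders],  # customers is not part of this dump
)
```

Here only the constraint that depends solely on the exported table survives; the FK pointing at the non-exported table is dropped.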

Maybe the dependency API analysis itself is something
worth a GSoC effort?

Hannu
>
> regards, tom lane
