Re: CSV mode option for pg_dump

From: PFC <lists(at)peufeu(dot)com>
To: "Volkan YAZICI" <yazicivo(at)ttnet(dot)net(dot)tr>, "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: CSV mode option for pg_dump
Date: 2006-06-13 16:25:47
Message-ID: op.ta3crgb9cigqcu@apollo13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


From what I gather, the CSV format dump would only contain data.
I think pg_dump is the friend of pg_restore. It dumps everything
including user defined functions, types, schemas etc. CSV does not fit
with this.

Besides, people will probably want to dump into CSV the result of any
query, to load it into excel, not just the full contents of a table.

So, why not create a separate tool, someone suggested pg_query for that,
I second it.
This tool would take a query and format options, and would output a file
in whatever format chosen by the user (CSV, COPY format, xml, whatever)

A script language (python) can be used, which will significantly shorten
development times and allow easy modularity, as it is easier to add a
module to a python program than a C program.
I would vote for Python because I love it and it has a very good postgres
adapter (psycopg2) which knows how to convers every postgres type to a
native language type (yes, even multidimensional arrays of BOX get
converted). And it's really fast at retrieving large volumes of data.

So you have a stable, fast tool for backup and restore (pg_dump) and a
rapidly evolving, user-friendly and extendable tool for exporting data,
and everyone is happy.

Mr Momijan talks about adding modular functionality to pg_dump. Is it
really necessary ? What is the objective ? Is it to reuse code in pg_dump
? I guess not ; if a user wants to dump, for instance, all the tables in a
schema, implementing this logic in python is only a few lines of code
(select from information_schema...)

To be realistic, output format modules should be written in script
languages. Noone sane is eager to do string manipulation in C. Thus these
modules would have to somehow fit with pg_dump, maybe with a pipe or
something. This means designing another protocol. Reimplementing in a
scripting langage the parts of pg_dump which will be reused by this
project (mainly, enumerating tables and stuff) will be far easier.

Just look.

Python 2.4.2 (#1, Mar 30 2006, 14:34:35)
[GCC 3.4.4 (Gentoo 3.4.4-r1, ssp-3.4.4-1.0, pie-8.7.8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

...opens a db connection...

>>> c.execute( "SELECT * FROM test.csv" )
>>> data = c.fetchall()
>>> data
[[1, datetime.date(2006, 6, 13), 'this\tcontains\ttabulations'], [2,
datetime.date(2006, 6, 13), "this'contains'quotes"], [3,
datetime.date(2006, 6, 13), 'this"contains"double quotes']]
>>> import csv, sys
>>> c = csv.writer( sys.stdout, dialect = csv.excel )
>>> c.writerows( data )
1,2006-06-13,this contains tabulations
2,2006-06-13,this'contains'quotes
3,2006-06-13,"this""contains""double quotes"

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2006-06-13 16:30:54 Re: CSV mode option for pg_dump
Previous Message Joshua D. Drake 2006-06-13 16:14:48 Re: CSV mode option for pg_dump