Re: xml import/export tools and performance

From: Ries van Twisk <pg(at)rvt(dot)dds(dot)nl>
To: Ivan Sergio Borgonovo <mail(at)webthatworks(dot)it>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: xml import/export tools and performance
Date: 2009-11-06 14:12:11
Message-ID: FF8197B1-A645-489C-9DC1-CE2BE801C181@rvt.dds.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


On Nov 6, 2009, at 2:52 AM, Ivan Sergio Borgonovo wrote:

> I need to import/export through xml files.
>
> I was looking for tools/examples/HOWTO for postgresql.
>
> Right now I still don't know the details of the requirements.
>
> I know I won't need a GUI.
>
> I know one of the exporting parties will be a MS SQL 2005 server, so
> it would be nice if there was an easy way to import xml generated
> with the FOR XML AUTO, XMLSCHEMA ('forpg').
>
> I'd like to have a tool that can write XSD from queries
> automatically.
>
> I may need to strictly specify one of the xml output format since
> one of the importing parties pretend to support xml with something
> that is more like a tagged csv.
>
> Currently I'm mostly looking around to see where it would be
> convenient to move the boundaries of the problem.
>
> I'd be curious about what kind of performance impact they may have
> compared to COPY.
>
> thanks
>

Ivan,

I have been using JasperETL for this purpose. But like what others say,
XML is very in efficient for this task.

Currently I am in the process if importing XML feeds into PostgreSQL.
At first I used the ETL tool to write to the correct table by creating
joins
into the ETL tool.

However, this slows things down quite a bit (but the process is VERY
manageable).
Currently I keep teh route within JasperETL as short as possible in
teh following steps

- load XML
- remove columns I don't need
- cleanup some data using a javarow
- load into a staging table

From there I execute a stored procedure to normalize the data (create
additional records where needed in
other foreign tables) and push data into production. Triggers on my
production table
do some additional work for tsearch2 and some other small stuff.

In my case, on a 32Bit system loading a 200Mb XML file consumes 1Gb
real memory.
On 64Bit systems with Java 1.6 you can push this further down the road
though.

if you can export to CSV and import using copy into PG, I would go for
that route
if you can control the format of the CSV well and reliable, because
detecting errors
in a CSV is much harder then in XML (missing column in CSV can mess up
data)

Ries

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2009-11-06 14:27:52 Re: PostgreSQL for 64 Bit Windows Version
Previous Message Stephen Tyler 2009-11-06 14:02:32 Re: Re: Bench marking performance or experience using Solid State Disk Drives (SSD) with postgres