Re: XMLEXISTS on legacy XML with malformed xmlns

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Edson Richter <edsonrichter(at)hotmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org >> PG-General Mailing List" <pgsql-general(at)postgresql(dot)org>
Subject: Re: XMLEXISTS on legacy XML with malformed xmlns
Date: 2016-02-12 17:32:07
Message-ID: CAFj8pRDMGkK0en0WdmMTYzErkSo1qNR8vhN-H8KGW6_W-q4XGg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

2016-02-12 17:53 GMT+01:00 Edson Richter <edsonrichter(at)hotmail(dot)com>:

>
> 2016-02-12 1:53 GMT+01:00 Edson Richter <edsonrichter(at)hotmail(dot)com>:
>
>> Hi!
>>
>> I've some (about 1M records) containing legacy XML I would like to parse
>> and apply XMLEXISTS.
>>
>> This is the query:
>>
>> select * from xmllog
>> where xpath_exists(('//MyDocument[(at)DocNum = ''000411828'']'::text),
>> xmlparse(document cdataout));
>>
>> This is the error:
>>
>> ERRO: could not parse XML document
>> SQL state: 2200M
>> Detail: line 2: xmlns:leg: 'LEGACYAPP - SEND MSG EVENTS ABOUT' is not a
>> valid URI
>> <leg:sendmsgeventsabout xmlns:leg="LEGACYAPP - SEND MSG EVENTS ABOUT">
>> ^
>>
>> This is the sample XML with malformed xmlns (I've shortenet the data, but
>> the important thing here is the malformed xmlns):
>>
>> "<?xml version="1.0" encoding="utf-8" ?>
>> <leg:sendmsgeventsabout xmlns:leg="LEGACYAPP - SEND MSG EVENTS ABOUT">
>> <carrier xmlns="" controlnum="04503660000146">
>> <MyDocument DocNum="000511852">
>> <other_info>0</other_info>
>> <complement info (...)"
>>
>>
>> I can easly read this XML in Notepad++, and also in Java - but PostgreSQL
>> always throw error.
>>
>> Can you plase tell me how can make PostgreSQL ignore this malformed xmlns
>> and proceed processing the XML?
>>
>
> PostgreSQL uses libxml2, but the usage isn't too configurable. So my
> advice is using defensive strategy and clean/fix wrong namespace with
> string tools - replace function.
>
> Regards
>
> Pavel
>
>
> Thanks, Pavel.
>
> I did suspect that. But then I have about 10.000 new records each week,
> and I've no control over the system that generates it.
> It is a shame, but sometimes we have to live with such problems.
>

I understand - you can handle this error and broken xml you can ignore. Or
you can use a parser from Python, Perl - PLPerlu or PLPythonu is great for
it.

Regards

Pavel

>
> Regards,
>
> Edson Richter
>
>
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Kaushal Shriyan 2016-02-12 17:59:28 Re: Unrecognized configuration parameter in bdr 0.9.3
Previous Message Yury Zhuravlev 2016-02-12 17:01:38 Re: Test CMake build