From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | Edson Richter <edsonrichter(at)hotmail(dot)com> |
Cc: | "pgsql-general(at)postgresql(dot)org >> PG-General Mailing List" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: XMLEXISTS on legacy XML with malformed xmlns |
Date: | 2016-02-12 17:32:07 |
Message-ID: | CAFj8pRDMGkK0en0WdmMTYzErkSo1qNR8vhN-H8KGW6_W-q4XGg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
2016-02-12 17:53 GMT+01:00 Edson Richter <edsonrichter(at)hotmail(dot)com>:
>
> 2016-02-12 1:53 GMT+01:00 Edson Richter <edsonrichter(at)hotmail(dot)com>:
>
>> Hi!
>>
>> I've some (about 1M records) containing legacy XML I would like to parse
>> and apply XMLEXISTS.
>>
>> This is the query:
>>
>> select * from xmllog
>> where xpath_exists(('//MyDocument[(at)DocNum = ''000411828'']'::text),
>> xmlparse(document cdataout));
>>
>> This is the error:
>>
>> ERRO: could not parse XML document
>> SQL state: 2200M
>> Detail: line 2: xmlns:leg: 'LEGACYAPP - SEND MSG EVENTS ABOUT' is not a
>> valid URI
>> <leg:sendmsgeventsabout xmlns:leg="LEGACYAPP - SEND MSG EVENTS ABOUT">
>> ^
>>
>> This is the sample XML with malformed xmlns (I've shortenet the data, but
>> the important thing here is the malformed xmlns):
>>
>> "<?xml version="1.0" encoding="utf-8" ?>
>> <leg:sendmsgeventsabout xmlns:leg="LEGACYAPP - SEND MSG EVENTS ABOUT">
>> <carrier xmlns="" controlnum="04503660000146">
>> <MyDocument DocNum="000511852">
>> <other_info>0</other_info>
>> <complement info (...)"
>>
>>
>> I can easly read this XML in Notepad++, and also in Java - but PostgreSQL
>> always throw error.
>>
>> Can you plase tell me how can make PostgreSQL ignore this malformed xmlns
>> and proceed processing the XML?
>>
>
> PostgreSQL uses libxml2, but the usage isn't too configurable. So my
> advice is using defensive strategy and clean/fix wrong namespace with
> string tools - replace function.
>
> Regards
>
> Pavel
>
>
> Thanks, Pavel.
>
> I did suspect that. But then I have about 10.000 new records each week,
> and I've no control over the system that generates it.
> It is a shame, but sometimes we have to live with such problems.
>
I understand - you can handle this error and broken xml you can ignore. Or
you can use a parser from Python, Perl - PLPerlu or PLPythonu is great for
it.
Regards
Pavel
>
> Regards,
>
> Edson Richter
>
>
>
From | Date | Subject | |
---|---|---|---|
Next Message | Kaushal Shriyan | 2016-02-12 17:59:28 | Re: Unrecognized configuration parameter in bdr 0.9.3 |
Previous Message | Yury Zhuravlev | 2016-02-12 17:01:38 | Re: Test CMake build |