Re: BUG #14584: Segmentation fault importing large XML file

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Jorge Solórzano <jorsol(at)gmail(dot)com>
Cc: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #14584: Segmentation fault importing large XML file
Date: 2017-03-08 19:36:08
Message-ID: CAFj8pRDt-9TnisDrvaB2KRtyxEbC5-f_E_Co7Sq0yOwdNkbjVw@mail.gmail.com
Lists: pgsql-bugs

2017-03-08 20:17 GMT+01:00 Jorge Solórzano <jorsol(at)gmail(dot)com>:

> Hi Pavel,
>
> By large I mean big in size: 935M
> Posts.xml: XML 1.0 document, UTF-8 Unicode text, with very long lines
>
>
> I installed debug symbols for libxml2 if this helps:
>
> #0 xmlParserPrintFileContextInternal (input=input@entry=0x55afc89ef4b0,
> channel=0x55afc636ca30 <appendStringInfo>, data=0x55afc89e5cc0) at
> ../../error.c:181
> cur = <optimized out>
> base = <optimized out>
> n = <optimized out>
> col = <optimized out>
> content = "\000\004\000\000\000\000\000\
> 000\260\364\236ȯU\000\000\000\000\000\000\000\000\000\000\
> 330\340\236ȯU\000\000`\203D*\375\177\000\000M(at)XƯU\000\000\
> 300\\\236ȯU\000\000\260\364\236ȯU\000\000\"\000\000\000\
> 000\000\000\000\332\370\023\340\241\177\000\000\240"
> ctnt = <optimized out>
> #1 0x00007fa1e00a587a in xmlParserPrintFileContext__internal_alias
> (input=input@entry=0x55afc89ef4b0) at ../../error.c:231
> No locals.
> #2 0x000055afc6542a1c in xml_errorHandler (data=0x55afc89e59b0,
> error=<optimized out>) at
> /build/postgresql-9.6-ZHxyhz/postgresql-9.6-9.6.2/build/../src/backend/utils/adt/xml.c:1661
> errFuncSaved = 0x7fa1e00a41b0 <xmlGenericErrorDefaultFunc>
> errCtxSaved = 0x0
> xmlerrcxt = 0x55afc89e59b0
> ctxt = <optimized out>
> input = 0x55afc89ef4b0
> node = <optimized out>
> name = <optimized out>
> domain = <optimized out>
> level = <optimized out>
> errorBuf = 0x55afc89e5cc0
> __func__ = "xml_errorHandler"
> #3 0x00007fa1e00a5fa4 in __xmlRaiseError (schannel=0x55afc6542920
> <xml_errorHandler>, schannel@entry=0x0, channel=channel@entry=0x0,
> data=0x55afc89e59b0, data@entry=0x0, ctx=ctx@entry=0x55afc89ede80,
> nod=nod@entry=0x0, domain=domain@entry=1, code=1,
> level=XML_ERR_FATAL, file=0x0, line=838090,
> str1=0x7fa1e01cc24d "Huge input lookup", str2=0x0, str3=0x0, int1=0, col=4,
> msg=0x7ffd2a4485e0 "internal error: %s\n") at ../../error.c:604
> ctxt = <optimized out>
> node = 0x0
> str = 0x55b08f70e270 "internal error: Huge input lookup\n"
> input = <optimized out>
> to = 0x55afc89ee0d8
> baseptr = 0x0
> #4 0x00007fa1e00aa900 in xmlFatalErr (ctxt=ctxt@entry=0x55afc89ede80,
> error=error@entry=XML_ERR_INTERNAL_ERROR, info=info@entry=0x7fa1e01cc24d
> "Huge input lookup") at ../../parser.c:546
> errmsg = <optimized out>
> errstr = "internal error: %s\n", '\000' <repeats 109 times>
> #5 0x00007fa1e00acf14 in xmlGROW (ctxt=0x55afc89ede80) at
> ../../parser.c:2084
> curEnd = <optimized out>
> curBase = <optimized out>
> #6 0x00007fa1e00c1338 in xmlParseContent__internal_alias
> (ctxt=0x55afc89ede80) at ../../parser.c:10101
> test = <optimized out>
> cons = 0
> cur = <optimized out>
> #7 0x00007fa1e00c1c13 in xmlParseElement__internal_alias
> (ctxt=ctxt@entry=0x55afc89ede80) at ../../parser.c:10255
> name = 0x55afc89ef577 "posts"
> prefix = 0x0
> URI = 0x0
> node_info = {node = 0x0, begin_pos = 140333225866765, begin_line =
> 94213473493376, end_pos = 140333225866817, end_line = 94213473493376}
> line = 2
> tlen = 5
> ret = 0x55afc89efab0
> nsNr = 0
> #8 0x00007fa1e00c266a in xmlParseDocument__internal_alias
> (ctxt=ctxt@entry=0x55afc89ede80) at ../../parser.c:10952
> start = "<?xm"
> enc = <optimized out>
> #9 0x00007fa1e00c9fd9 in xmlDoRead (reuse=1, options=0, encoding=0x0,
> URL=0x0, ctxt=0x55afc89ede80) at ../../parser.c:15430
> ret = <optimized out>
> #10 xmlCtxtReadMemory__internal_alias (ctxt=0x55afc89ede80,
> buffer=buffer@entry=0x7fa09306f040 "<?xml version=\"1.0\"
> encoding=\"utf-8\"?>\n<posts>\n <row Id=\"1\" PostTypeId=\"1\"
> AcceptedAnswerId=\"727273\" CreationDate=\"2009-07-15T06:27:46.723\"
> Score=\"155\" ViewCount=\"92736\" Body=\"&lt;p&gt;A Vista virtua"...,
> size=size@entry=979473840, URL=URL@entry=0x0, encoding=encoding@entry=0x0,
> options=options@entry=0) at ../../parser.c:15719
> input = 0x55afc89ef410
>

It looks like the libxml2 fatal error "internal error: Huge input
lookup" is not handled well.

So you have hit a libxml2 limit, but this should not end in a segfault.

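PostgreSQL probably doesn't use the huge_tree feature of libxml2 (the
XML_PARSE_HUGE parser option); frames #9 and #10 of the backtrace show
the document being parsed with options=0.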

Maybe this is a new bug, because in
https://www.postgresql.org/message-id/20140304155421.GM23803@rdorte.org
the behavior was reported as correct.

At 935 MB you are very close to PostgreSQL's 1 GB maximum size for an
XML value.
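The margin is easy to quantify. Frame #10 of the backtrace reports size=979473840 bytes; the sketch below compares that with PostgreSQL's MaxAllocSize (0x3fffffff bytes, i.e. 1 GB minus one byte), taken here as the assumed limit. The constant name PG_MAX_ALLOC_SIZE and both helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed limit: PostgreSQL's MaxAllocSize, 1 GB - 1 byte. */
#define PG_MAX_ALLOC_SIZE ((uint64_t) 0x3fffffff)

/* Sanity check on the constant (C11 static_assert from <assert.h>). */
static_assert(PG_MAX_ALLOC_SIZE == 1073741823u, "1 GB minus one byte");

/* Hypothetical helper: bytes left before the limit, 0 if at or over it. */
uint64_t headroom_bytes(uint64_t size)
{
    return size >= PG_MAX_ALLOC_SIZE ? 0 : PG_MAX_ALLOC_SIZE - size;
}

/* Hypothetical helper: size as a percentage of the limit. */
double percent_of_limit(uint64_t size)
{
    return 100.0 * (double) size / (double) PG_MAX_ALLOC_SIZE;
}
```

For the file in this report, headroom_bytes(979473840) is 94267983 bytes (about 90 MB), and percent_of_limit(979473840) is roughly 91.2%.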

Regards

Pavel
