Re: Problem with restoring dump (may be tsearch-related)

From: "Markus Wollny" <Markus(dot)Wollny(at)computec(dot)de>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Problem with restoring dump (may be tsearch-related)
Date: 2002-09-05 16:45:19
Message-ID: 2266D0630E43BB4290742247C8910575014CE3C5@dozer.computec.de
Lists: pgsql-general

Hi!

The &uuml; is literally in the file: we parse all editorial input to
produce optimal HTML output, and the German umlauts are represented as
&[v]uml;, where [v] is the corresponding vowel. Now that you mention it, I
believe that every string that shows up in one of these "parse error
at or near" messages is actually preceded by an HTML umlaut entity or something similar:

Just a snippet from my first example:
psql:alldb1.sql:1122826: ERROR: parser: parse error at or near "ußerst"
    would be "äußerst" -> "&auml;ußerst"
psql:alldb1.sql:1122826: ERROR: parser: parse error at or near "chst"
    could be "höchst" -> "h&ouml;chst"
psql:alldb1.sql:1122826: ERROR: parser: parse error at or near "mmern"
    could be "kümmern" -> "k&uuml;mmern"
psql:alldb1.sql:1122827: ERROR: parser: parse error at or near "ren"
    could be "Türen" -> "T&uuml;ren"
psql:alldb1.sql:1122827: ERROR: parser: parse error at or near "rfer"
    could be "Dörfer" -> "D&ouml;rfer"
psql:alldb1.sql:1122827: ERROR: parser: parse error at or near "ndig"
    could be "hintergründig" -> "hintergr&uuml;ndig"
psql:alldb1.sql:1122828: ERROR: parser: parse error at or near "henvorteile"
    could be "Höhenvorteile" -> "H&ouml;henvorteile"
psql:alldb1.sql:1122828: ERROR: parser: parse error at or near "hten"
    could be "blühten" -> "bl&uuml;hten"
psql:alldb1.sql:1122829: ERROR: parser: parse error at or near "berqueren"
    could be "überqueren" -> "&uuml;berqueren"
psql:alldb1.sql:1122829: ERROR: parser: parse error at or near "cken"
    could be "Lücken" -> "L&uuml;cken"
psql:alldb1.sql:1122830: ERROR: parser: parse error at or near "ck"
    could be "zurück" -> "zur&uuml;ck"
psql:alldb1.sql:1122831: ERROR: parser: parse error at or near "hrend"
    could be "führend" -> "f&uuml;hrend"
psql:alldb1.sql:1122831: ERROR: parser: parse error at or near "ude"
    could be "Gebäude" -> "Geb&auml;ude"
psql:alldb1.sql:1122831: ERROR: parser: parse error at or near "nnen"
    could be "können" -> "k&ouml;nnen"
psql:alldb1.sql:1122831: ERROR: parser: parse error at or near "berzeugen"
    could be "überzeugen" -> "&uuml;berzeugen"
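
The pattern above can be checked mechanically. This is a minimal Python
sketch, assuming each reported token is exactly the text that follows the
last HTML entity in the entity-encoded word. It uses the standard
library's html.unescape to recover the plain word; the helper name
suffix_after_last_entity is hypothetical, and the sample pairs are copied
from the list above.

```python
import html
import re

# Matches a named HTML entity such as "&uuml;" or "&ouml;".
ENTITY = re.compile(r"&[A-Za-z]+;")


def suffix_after_last_entity(encoded: str) -> str:
    """Return the text after the last HTML entity in `encoded`.

    Hypothetical helper for illustration. If the word contains no
    entity, the whole string is returned unchanged.
    """
    matches = list(ENTITY.finditer(encoded))
    if not matches:
        return encoded
    return encoded[matches[-1].end():]


# (token from the psql error, entity-encoded word as stored in the dump)
SAMPLES = [
    ("ußerst", "&auml;ußerst"),
    ("chst", "h&ouml;chst"),
    ("mmern", "k&uuml;mmern"),
    ("ren", "T&uuml;ren"),
    ("henvorteile", "H&ouml;henvorteile"),
    ("berzeugen", "&uuml;berzeugen"),
]

if __name__ == "__main__":
    for token, encoded in SAMPLES:
        suffix = suffix_after_last_entity(encoded)
        plain = html.unescape(encoded)  # e.g. "T&uuml;ren" -> "Türen"
        print(f"{encoded:22} {plain:14} suffix={suffix!r} "
              f"matches_error={suffix == token}")
```

If every sample reports matches_error=True, the parser is resuming
exactly at the character after the entity's closing semicolon.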

Since txtidx only contains substrings and ignores the HTML umlaut entities
(a slight disadvantage we are quite happy to live with), it stores only
the substrings before the ampersand or after the semicolon. Those
shouldn't cause any problems whatsoever, so I think we can rule out
tsearch as the cause. But why would ordinary plain text cause these
parse errors?
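
As a rough model of the txtidx behaviour described above (an
approximation written for illustration, not tsearch's actual parser),
treat each HTML entity as a separator that is dropped and keep the runs
of word characters around it. The function name approx_txtidx_fragments
is hypothetical.

```python
import re

# Named HTML entities such as "&uuml;" act as separators and are dropped.
ENTITY = re.compile(r"&[A-Za-z]+;")
# Runs of word characters (Unicode-aware for str patterns in Python 3).
WORD = re.compile(r"\w+")


def approx_txtidx_fragments(text: str) -> list:
    """Approximate the substrings txtidx would keep for `text`.

    Assumption: entities split words and are discarded; everything
    else is cut into runs of word characters.
    """
    fragments = []
    for piece in ENTITY.split(text):
        fragments.extend(WORD.findall(piece))
    return fragments


if __name__ == "__main__":
    print(approx_txtidx_fragments("Wie sich die R&uuml;ckenmuskeln anspannen,"))
```

Under this model the sentence from the dump yields Wie, sich, die, R,
ckenmuskeln and anspannen, none of which contains a backslash, tab or
newline, the characters that are special in COPY text data.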

What should I do next to get to the bottom of this?

Regards,

Markus

> -----Original Message-----
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> Sent: Thursday, September 5, 2002 18:23
> To: Markus Wollny
> Cc: pgsql-general(at)postgresql(dot)org
> Subject: Re: [GENERAL] Problem with restoring dump (may be tsearch-related)
>
>
> "Markus Wollny" <Markus(dot)Wollny(at)computec(dot)de> writes:
> > The entries are quite long, and I don't want to cause too much traffic,
> > so I don't dare to give you more than this one example:
>
> > Restore-attempt outputs e.g.:
> > psql:alldb1.sql:1434914: ERROR: parser: parse error at or near
> > "ckenmuskeln"
>
> Hmm. I see that string in the context
>
> > Wie sich die R&uuml;ckenmuskeln anspannen, wird im Bild aber nicht
>
> What exactly is the string that you've represented here as &uuml; ?
> Is that literally what's in the dump file, or has something helpfully
> html-ized some weird Unicode sequence?
>
> As far as I can tell, what must be happening is that the COPY data
> transfer has been terminated and the regular SQL parser is trying to
> make sense of the input starting at "ckenmuskeln anspannen,". I'm
> wondering if something is misreading the &uuml; sequence as "\." ...
> which would probably be a character-set-encoding kind of problem.
>
> regards, tom lane
>
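
One possible way to test the early-termination theory is to scan the
plain-text dump, note where each COPY block starts and ends, and flag
data rows that contain a backslash followed by a dot or that fail to
decode in an expected encoding, then compare those line numbers with the
ones psql reports. This is a sketch under assumptions: it expects COPY
data to follow a line that starts with "COPY " and ends in "FROM stdin;"
and to end at a line consisting only of \., and the encoding to check
against (if any) has to be supplied by the user. The name
scan_copy_blocks is hypothetical.

```python
import sys
from typing import Iterable, Iterator, Optional, Tuple

Finding = Tuple[str, int, bytes]  # (kind, line number, line content)


def scan_copy_blocks(lines: Iterable[bytes],
                     encoding: Optional[str] = None) -> Iterator[Finding]:
    """Report COPY block boundaries and suspicious data rows."""
    in_copy = False
    line_no = 0
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip(b"\r\n")
        if not in_copy:
            # Assumed header form, e.g.: COPY "tablename" FROM stdin;
            if line.startswith(b"COPY ") and line.endswith(b"FROM stdin;"):
                in_copy = True
                yield ("copy-start", line_no, line)
            continue
        if line == b"\\.":
            # End-of-data marker: COPY stops reading rows here.
            in_copy = False
            yield ("copy-end", line_no, line)
        elif b"\\." in line:
            # Backslash-dot inside a data row. May be a false positive,
            # e.g. an escaped backslash (two backslashes) before a dot.
            yield ("backslash-dot", line_no, line)
        elif encoding is not None:
            try:
                line.decode(encoding)
            except UnicodeDecodeError:
                yield ("bad-encoding", line_no, line)
    if in_copy:
        yield ("unterminated", line_no, b"")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_dump.py alldb1.sql [encoding]
    enc = sys.argv[2] if len(sys.argv) > 2 else None
    with open(sys.argv[1], "rb") as dump:
        for kind, num, text in scan_copy_blocks(dump, enc):
            print(f"{num}: {kind}: {text[:60]!r}")
```

If a copy-end appears well before line 1122826 while the rows around
that line still look like table data, the COPY block really was closed
early, and the flagged rows just before it are the ones to inspect.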
