Re: Problem with restoring dump (may be tsearch-related)

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Markus Wollny <Markus(dot)Wollny(at)computec(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Problem with restoring dump (may be tsearch-related)
Date: 2002-09-05 18:20:41
Message-ID: Pine.GSO.4.44.0209052114260.10266-100000@ra.sai.msu.su
Lists: pgsql-general

On Thu, 5 Sep 2002, Markus Wollny wrote:

> Hi!
>
> The &uuml; is literally in the file - we are parsing all of our editor's
> input for optimal HTML output. The German umlauts are represented as
> &[v]uml; where [v] is the corresponding vowel. Now you mention it, I
> believe that all of the strings which are in one of these "parse error
> at or near"-messages are actually preceded by a HTML-umlaut or the like:
>
> Just a snippet from my first example:
> psql:alldb1.sql:1122826: ERROR: parser: parse error at or near "ußerst"
> would be "äußerst" -> &auml;ußerst
> psql:alldb1.sql:1122826: ERROR: parser: parse error at or near "chst"
> could be "höchst" -> h&ouml;chst
> psql:alldb1.sql:1122826: ERROR: parser: parse error at or near "mmern"
> could be "kümmern" -> "k&uuml;mmern"
> psql:alldb1.sql:1122827: ERROR: parser: parse error at or near "ren"
> could be "Türen" -> "T&uuml;ren"
> psql:alldb1.sql:1122827: ERROR: parser: parse error at or near "rfer"
> could be "Dörfer" -> "D&ouml;rfer"
> psql:alldb1.sql:1122827: ERROR: parser: parse error at or near "ndig"
> could be "hintergründig" -> "hintergr&uuml;ndig"
> psql:alldb1.sql:1122828: ERROR: parser: parse error at or near
> "henvorteile" could be "Höhenvorteile" -> "H&ouml;henvorteile"
> psql:alldb1.sql:1122828: ERROR: parser: parse error at or near "hten"
> could be "blühten" -> "bl&uuml;hten"
> psql:alldb1.sql:1122829: ERROR: parser: parse error at or near
> "berqueren" could be "überqueren" -> "&uuml;berqueren"
> psql:alldb1.sql:1122829: ERROR: parser: parse error at or near "cken"
> -> "Lücken" -> "L&uuml;cken"
> psql:alldb1.sql:1122830: ERROR: parser: parse error at or near "ck" ->
> "zurück" -> "zur&uuml;ck"
> psql:alldb1.sql:1122831: ERROR: parser: parse error at or near "hrend"
> -> "führend" -> "f&uuml;hrend"
> psql:alldb1.sql:1122831: ERROR: parser: parse error at or near "ude" ->
> "Gebäude" -> "Geb&auml;ude"
> psql:alldb1.sql:1122831: ERROR: parser: parse error at or near "nnen"
> -> "können" -> "k&ouml;nnen"
> psql:alldb1.sql:1122831: ERROR: parser: parse error at or near
> "berzeugen" -> "überzeugen" -> "&uuml;berzeugen"
>
> As txtidx actually just contains substrings and ignores the HTML-umlauts
> (a slight disadvantage we are quite happy to live with), it only stores

Hmm, the default parser indeed doesn't understand &uuml;.
I think the right way is to move away from flex, because it doesn't
work with locales (or we just don't know how to make it work; HELP NEEDED!).
Would locale support solve the problem?

Or you could try adding support for &uuml; to parser.l (look around line 125).
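
As an alternative to patching parser.l, the entities could be decoded before the text ever reaches the indexer. The following is only a sketch of that idea in Python, not the tsearch parser itself; the function name index_terms and the simple word-splitting rule are assumptions made for illustration.

```python
import html
import re

def index_terms(text):
    """Decode HTML character references, then split into lowercase words.

    Hypothetical helper: it only illustrates decoding "&uuml;" and friends
    before tokenizing, so that a word is not cut at '&' or ';'.
    """
    decoded = html.unescape(text)  # "R&uuml;ckenmuskeln" -> "Rückenmuskeln"
    # In Python 3, \w on str patterns is Unicode-aware, so umlauts stay in the word.
    return [word.lower() for word in re.findall(r"\w+", decoded)]

print(index_terms("Wie sich die R&uuml;ckenmuskeln anspannen"))
# ['wie', 'sich', 'die', 'rückenmuskeln', 'anspannen']
```

With this kind of preprocessing the index would hold whole words such as "rückenmuskeln" instead of the fragments "r" and "ckenmuskeln".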

> those substrings before or after ampersand or semicolon anyway - which
> shouldn't cause any problems whatsoever, so I think we might rule out
> tsearch being the cause. But why would ordinary plain text cause these
> parse-errors?
>
> What shall I do next in order to get down to the problem itself?
>

Try dumping with the --inserts option, so that pg_dump writes INSERT
statements instead of COPY data.
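
Before re-dumping, it may be worth checking Tom's theory (quoted further down) that a COPY block is being terminated early. The sketch below lists where each COPY block in a dump starts and where its "\." end marker sits, using the same 1-based line numbers that psql prints. It assumes the blocks start with a line of the form COPY ... FROM stdin; and the function name copy_blocks is made up for this example.

```python
import re

# Assumed form of the block header: COPY <table> ... FROM stdin;
COPY_START = re.compile(rb"^COPY\s+(\S+)", re.IGNORECASE)

def copy_blocks(lines):
    """Yield (table, first_line, last_line) for each COPY ... FROM stdin block.

    `lines` is an iterable of bytes lines. Working on bytes avoids guessing
    the dump's character encoding. last_line is None if no end marker is found.
    """
    table, start = None, None
    for lineno, line in enumerate(lines, start=1):
        stripped = line.rstrip(b"\r\n")
        if table is None:
            match = COPY_START.match(stripped)
            if match and stripped.lower().endswith(b"from stdin;"):
                table, start = match.group(1).decode("ascii", "replace"), lineno
        elif stripped == b"\\.":  # a line holding only backslash-period ends the data
            yield table, start, lineno
            table = None
    if table is not None:
        yield table, start, None

sample = [
    b'COPY "articles" FROM stdin;\n',
    b"1\tfirst row\n",
    b"\\.\n",
    b"SELECT 1;\n",
]
print(list(copy_blocks(sample)))
```

To run it against a real dump, open the file in binary mode, for example open("alldb1.sql", "rb"), and pass the file object to copy_blocks. If a block is reported as ending just before line 1122826, the data after that point is being read as SQL, which would match the parse errors above.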

> Regards,
>
> Markus
>
> > -----Original Message-----
> > From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> > Sent: Thursday, 5 September 2002 18:23
> > To: Markus Wollny
> > Cc: pgsql-general(at)postgresql(dot)org
> > Subject: Re: [GENERAL] Problem with restoring dump (may be
> > tsearch-related)
> >
> >
> > "Markus Wollny" <Markus(dot)Wollny(at)computec(dot)de> writes:
> > > The entries are quite long, and I don't want to cause too much
> > > traffic, so I don't dare to give you more than this one example:
> >
> > > Restore-attempt outputs e.g.:
> > > psql:alldb1.sql:1434914: ERROR: parser: parse error at or near
> > > "ckenmuskeln"
> >
> > Hmm. I see that string in the context
> >
> > > Wie sich die R&uuml;ckenmuskeln anspannen, wird im Bild aber nicht
> >
> > What exactly is the string that you've represented here as &uuml; ?
> > Is that literally what's in the dump file, or has something helpfully
> > html-ized some weird Unicode sequence?
> >
> > As far as I can tell, what must be happening is that the COPY data
> > transfer has been terminated and the regular SQL parser is trying to
> > make sense of the input starting at "ckenmuskeln anspannen,". I'm
> > wondering if something is misreading the &uuml; sequence as "\." ...
> > which would probably be a character-set-encoding kind of problem.
> >
> > regards, tom lane
> >
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
