| From: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)oss(dot)ntt(dot)co(dot)jp> | 
|---|---|
| To: | pgsql-bugs(at)postgresql(dot)org | 
| Cc: | pgsql-hackers(at)postgresql(dot)org | 
| Subject: | Re: make_greater_string() does not return a string in some cases | 
| Date: | 2011-07-08 09:21:16 | 
| Message-ID: | 20110708.182116.44187733.horiguchi.kyotaro@oss.ntt.co.jp | 
| Lists: | pgsql-bugs pgsql-hackers | 
Hello. May I continue with this topic?
For those of us using CJK (Chinese, Japanese, and Korean)
characters in a database, this glitch is hard to ignore.
In Japanese under standard usage, roughly a hundred characters
out of seven thousand make make_greater_string() fail. That is
not frequent, but it is not rare enough to ignore either.
I think this glitch arises because deriving the `next character'
is fundamentally encoding-specific, yet make_greater_string()
currently handles all encodings together with a method extended
from that of the single-byte ASCII charset.
So I think it is reasonable for the encoding info table (struct
pg_wchar_tbl) to hold a function that does this.
How about this idea?
The points needed to realize this are as follows:
- pg_wchar_tbl(at)pg_wchar(dot)c gains a new element `charinc'
  that holds a function to increment a character of that encoding.
- By default, charinc points to a `generic' increment function
  that does what make_greater_string() does in the current
  implementation.
- make_greater_string() now calls the database encoding's charinc
  to increment characters, instead of the code written directly
  in it.
- UTF-8 gets a special increment function.
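To make the first points concrete, here is a rough sketch in C of what the table hook and the generic fallback could look like. The struct layout and the carry-on-overflow behavior are my illustrative assumptions, not the patch itself; only the names pg_wchar_tbl and charinc come from the proposal above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical signature for the proposed hook: increment the
 * last character of a string in place, returning false when no
 * greater character exists. */
typedef bool (*mbcharacter_incrementer)(unsigned char *charptr, int len);

/* Illustrative shape of an encoding info table entry carrying
 * the new charinc element. */
typedef struct
{
	const char *name;
	mbcharacter_incrementer charinc;
} pg_wchar_tbl_entry;

/* Generic fallback, in the spirit of what make_greater_string()
 * does today: bump the last byte, carrying leftward on overflow.
 * (On total failure the bytes are left zeroed; a real
 * implementation would restore them.) */
static bool
pg_generic_charinc(unsigned char *charptr, int len)
{
	unsigned char *lastbyte = charptr + len - 1;

	while (len > 0)
	{
		if (*lastbyte < 0xFF)
		{
			(*lastbyte)++;
			return true;
		}
		/* byte overflowed: reset it and carry into the previous byte */
		*lastbyte = 0;
		lastbyte--;
		len--;
	}
	return false;				/* every byte was 0xFF */
}
```

make_greater_string() would then look up the entry for the database encoding and call entry->charinc instead of manipulating bytes inline.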
As a consequence of this modification, make_greater_string()
becomes somewhat simpler, since the sequence that handles bare
bytes in the string disappears.  And incrementing a character
with knowledge of the encoding can be straightforward, light,
and backtrack-free, with fewer glitches than the generic method.
# But the processing for BYTEAOID disappointingly remains.
Some glitches still remain, but I think it would be overkill to
do a conversion that changes the length of the character.  Only 5
code points out of 17 thousand (roughly all BMP characters, with
the current method) remain, and none of them are Japanese
characters :-)
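To illustrate why a UTF-8-aware increment avoids length-changing conversion, here is a sketch that works on the code point directly: decode, add one (skipping the surrogate range), and re-encode only if the result has the same byte length. The helper names and the decision to reject length changes at the single-character level are my assumptions for illustration; this is not the patch's actual function.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Minimal UTF-8 decode for 1- to 3-byte sequences (4-byte form
 * omitted for brevity).  Returns the sequence length, or 0. */
static int
utf8_decode(const unsigned char *s, int len, unsigned int *cp)
{
	if (len >= 1 && s[0] < 0x80)
	{
		*cp = s[0];
		return 1;
	}
	if (len >= 2 && (s[0] & 0xE0) == 0xC0)
	{
		*cp = ((s[0] & 0x1F) << 6) | (s[1] & 0x3F);
		return 2;
	}
	if (len >= 3 && (s[0] & 0xF0) == 0xE0)
	{
		*cp = ((s[0] & 0x0F) << 12) | ((s[1] & 0x3F) << 6) | (s[2] & 0x3F);
		return 3;
	}
	return 0;
}

/* Minimal UTF-8 encode; returns the byte length, or 0. */
static int
utf8_encode(unsigned int cp, unsigned char *s)
{
	if (cp < 0x80)
	{
		s[0] = cp;
		return 1;
	}
	if (cp < 0x800)
	{
		s[0] = 0xC0 | (cp >> 6);
		s[1] = 0x80 | (cp & 0x3F);
		return 2;
	}
	if (cp < 0x10000)
	{
		s[0] = 0xE0 | (cp >> 12);
		s[1] = 0x80 | ((cp >> 6) & 0x3F);
		s[2] = 0x80 | (cp & 0x3F);
		return 3;
	}
	return 0;
}

/* Sketch of a UTF-8-specific charinc: succeed only when the next
 * code point encodes to the same number of bytes. */
static bool
pg_utf8_charinc(unsigned char *charptr, int len)
{
	unsigned int cp;
	unsigned char buf[4];

	if (utf8_decode(charptr, len, &cp) != len)
		return false;
	cp++;
	if (cp >= 0xD800 && cp <= 0xDFFF)	/* skip the surrogate range */
		cp = 0xE000;
	if (utf8_encode(cp, buf) != len)	/* reject length changes */
		return false;
	memcpy(charptr, buf, len);
	return true;
}
```

The few remaining failure points are exactly the boundaries where the successor needs more bytes than the original character (for example, the last 2-byte code point U+07FF).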
The attached patch is a sample implementation of this idea.
What do you think about it?
-- 
Kyotaro Horiguchi
NTT Open Source Software Center
| Attachment | Content-Type | Size | 
|---|---|---|
| unknown_filename | text/plain | 16.2 KB | 