From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: [v9.2] make_greater_string() does not return a string in some cases |
Date: | 2011-09-23 13:11:59 |
Message-ID: | CA+TgmoavQP28OY0QRkXSQiS131u2sEFc7e72yU3x=pjr0BPU=w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs pgsql-hackers |
On Fri, Sep 23, 2011 at 8:51 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Thu, Sep 22, 2011 at 10:36 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Anyway, I won't stand in the way of the patch as long as it's modified
>>> to limit the number of values considered for any one character position
>>> to something reasonably small.
>
>> I think that limit in both the old and new code is 1, except that the
>> new code does it more efficiently.
>
>> Am I confused?
>
> Yes, or else I am. Consider a 4-byte UTF8 character at the end of the
> string. The existing code increments the last byte up to 255 (rejecting
> everything past 0xBF), then gives up and truncates that character away.
> So the maximum number of tries for that character position is between 0
> and 127 depending on what the original character was (with at most 63 of
> the incremented values getting past the verifymbstr test).
>
> The proposed patch is going to iterate through all Unicode code points
> up to U+7FFFFF before giving up. Since it's possible that we need to
> increment something further left to succeed at all, this doesn't seem
> like a good plan.
I think you're misreading the code. It does this:
while (len > 0)
{
boring stuff;
if (charincfunc(lastchar, charlen))
{
more boring stuff;
if (we made a greater string)
return it;
cleanup;
}
truncate away last character;
}
I don't see how that's ever going to try more than one character in
the same position.
What may be confusing you is that the old code has two loops: an outer
loop that tests whether we've made a greater string, and an inner loop
that tests whether we've made a validly encoded string at all. In the
new code, at least in the UTF-8 case, the inner loop is GONE
altogether. Instead of iterating until we construct a valid
character, we just use our mad UTF-8 skillz to assemble one, and
return it.
Or else I need to go drink a few cups of tea and look at this again.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Magnus Hagander | 2011-09-23 13:39:46 | Re: Re: [BUGS] BUG #6189: libpq: sslmode=require verifies server certificate if root.crt is present |
Previous Message | Lou Picciano | 2011-09-23 13:01:37 | Re: Re: [BUGS] BUG #6189: libpq: sslmode=require verifies server certificate if root.crt is present |
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2011-09-23 13:29:01 | Re: new createuser option for replication role |
Previous Message | Lou Picciano | 2011-09-23 13:01:37 | Re: Re: [BUGS] BUG #6189: libpq: sslmode=require verifies server certificate if root.crt is present |