From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: [v9.2] make_greater_string() does not return a string in some cases |
Date: | 2011-09-22 15:46:43 |
Message-ID: | 21348.1316706403@sss.pgh.pa.us |
Lists: | pgsql-bugs pgsql-hackers |
Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> One thing I was thinking about is that it would be useful to have some
> metric for judging how well any given algorithm that we might pick
> here actually works.
Well, the metric that we were indirectly using earlier was the
number of characters in a given locale for which the algorithm
fails to find a greater one (excluding whichever character is "last",
I guess, or you could just recognize there's always at least one).
> For example, if we were to try all possible
> three character strings in some encoding and run make_greater_string()
> on each one of them, we could then measure the failure percentage. Or
> if that's too many cases to crank through then we could limit it some
> way -
Even in UTF8 there are only about 1.1 million possible code points, and
far fewer are actually assigned, so for test purposes anyway it seems
feasible to crank through them all. Also, in many cases you could
probably figure it out by analysis instead of brute-force testing every case.
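A minimal sketch of that brute-force count, under loud assumptions: `increment_last_byte` is a hypothetical stand-in (not the actual make_greater_string() code) that bumps only the final UTF-8 byte, and plain code-point order stands in for the locale's sort order. `count_failures` is likewise an illustrative name.

```python
from typing import Callable, Iterable, Optional, Tuple


def increment_last_byte(ch: str) -> Optional[str]:
    """Hypothetical incrementer: bump only the final UTF-8 byte of ch.

    Returns a greater character, or None if no bump of the last byte
    yields valid UTF-8 (a "failure" in the sense of the metric).
    """
    raw = bytearray(ch.encode("utf-8"))
    while raw[-1] < 0xFF:
        raw[-1] += 1
        try:
            candidate = raw.decode("utf-8")
        except UnicodeDecodeError:
            continue  # not valid UTF-8, keep bumping
        # Assumption: code-point order approximates the sort order.
        if candidate > ch:
            return candidate
    return None


def count_failures(
    incrementer: Callable[[str], Optional[str]],
    code_points: Iterable[int],
) -> Tuple[int, int]:
    """Return (failures, characters tested) for the given code points."""
    failures = 0
    total = 0
    for cp in code_points:
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates cannot be encoded as UTF-8
        total += 1
        if incrementer(chr(cp)) is None:
            failures += 1
    return failures, total


if __name__ == "__main__":
    # All two-byte UTF-8 characters; use range(0x110000) for the full space.
    failures, total = count_failures(increment_last_byte, range(0x80, 0x800))
    print(f"{failures} of {total} characters have no greater one")
```

With this stand-in, the failures in the two-byte range are exactly the characters whose last byte is already 0xBF, which is the sort of result one could also reach by analysis rather than enumeration.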
A more reasonable objection might be that a whole lot of those code
points are things nobody cares about, and so we need to weight the
results somehow by the actual popularity of the character. Not sure
how to take that into account.
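One possible way to weight the results, sketched under assumptions: given some table of character frequencies (the numbers below are invented purely for illustration, not measured), the failure rate could be computed as the share of total weight carried by characters the incrementer cannot handle. `weighted_failure_rate` and `increment_last_byte` are hypothetical names, and the incrementer is the same last-byte stand-in rather than the real make_greater_string().

```python
from typing import Callable, Mapping, Optional


def increment_last_byte(ch: str) -> Optional[str]:
    """Hypothetical incrementer: bump only the final UTF-8 byte of ch."""
    raw = bytearray(ch.encode("utf-8"))
    while raw[-1] < 0xFF:
        raw[-1] += 1
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            continue  # not valid UTF-8, keep bumping
    return None


def weighted_failure_rate(
    incrementer: Callable[[str], Optional[str]],
    frequency: Mapping[str, float],
) -> float:
    """Fraction of total weight on characters with no greater character."""
    total = sum(frequency.values())
    if total == 0:
        return 0.0
    failed = sum(w for ch, w in frequency.items() if incrementer(ch) is None)
    return failed / total


if __name__ == "__main__":
    # Invented weights for illustration only.
    sample = {"a": 70, "\u00e9": 20, "\u00ff": 10}
    print(weighted_failure_rate(increment_last_byte, sample))
```

Where a trustworthy frequency table would come from is left open here, as it is in the message.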
Another issue here is that we need to consider not just whether we find
a greater character, but "how much greater" it is. This would apply to
my suggestion of incrementing the top byte without considering
lower-order bytes --- we'd be skipping quite a lot of code space for
each increment, and it's conceivable that that would be quite hurtful in
some cases. Not sure how to account for that either. An extreme
example here is an "incrementer" that just immediately returns the last
character in the sort order for any lesser input.
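To make "how much greater" concrete, here is a rough sketch that measures the distance in code points between the input and the result. Everything in it is an assumption for illustration: `increment_top_byte` is one reading of the top-byte idea (bump the first UTF-8 byte, leave the rest alone), `always_last` is the extreme incrementer, and code-point distance stands in for distance in the real sort order.

```python
from typing import Callable, Optional

LAST_CODE_POINT = 0x10FFFF


def increment_top_byte(ch: str) -> Optional[str]:
    """Hypothetical: bump the first UTF-8 byte, keep lower-order bytes."""
    raw = bytearray(ch.encode("utf-8"))
    raw[0] += 1  # lead bytes of valid UTF-8 never exceed 0xF4
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


def always_last(ch: str) -> Optional[str]:
    """Extreme incrementer: jump straight to the last code point."""
    return None if ord(ch) == LAST_CODE_POINT else chr(LAST_CODE_POINT)


def gap(ch: str, incrementer: Callable[[str], Optional[str]]) -> Optional[int]:
    """Code points between ch and the result; 1 would be a perfect step."""
    result = incrementer(ch)
    return None if result is None else ord(result) - ord(ch)


if __name__ == "__main__":
    for ch in ("a", "\u00e9", "\u20ac"):
        print(repr(ch), gap(ch, increment_top_byte), gap(ch, always_last))
```

Under these assumptions the top-byte step moves 64 code points for a two-byte character and 4096 for a three-byte one, while `always_last` never fails but skips nearly the whole code space, so a failure count alone would rate it as the best of the lot.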
regards, tom lane