From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
---|---|
To: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Implementation of SASLprep for SCRAM-SHA-256 |
Date: | 2017-04-05 16:33:13 |
Message-ID: | bcdd548d-04ce-69a2-1328-29627104d212@iki.fi |
Lists: | pgsql-hackers |
On 04/05/2017 07:23 AM, Michael Paquier wrote:
> On Wed, Apr 5, 2017 at 7:05 AM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> I will continue tomorrow, but I wanted to report on what I've done so far.
>> Attached is a new patch version, quite heavily modified. Notable changes so
>> far:
>
> Great, thanks!
>
>> * Use Unicode codepoints, rather than UTF-8 bytes packed in 32-bit ints.
>> IMHO this makes the tables easier to read (to a human), and they are also
>> packed slightly more tightly (see next two points), as you can fit more
>> codepoints in a 16-bit integer.
>
> Using codepoints directly is not very consistent with the existing
> backend, but for the sake of packing things more tightly, OK.

Oh, I see, we already have similar functions in wchar.c:
unicode_to_utf8() and utf8_to_unicode(). We should probably move those
to src/common, rather than reinvent the wheel.

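For readers following along, here is a minimal sketch of what such codepoint/UTF-8 conversion helpers do. This is not the wchar.c code: the names sketch_unicode_to_utf8() and sketch_utf8_to_unicode() are made up for illustration, and the decoder assumes its input has already been validated as well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the wchar.c implementation): encode one Unicode
 * code point as UTF-8. Returns the number of bytes written (1-4), or 0 if
 * cp is a surrogate or lies beyond U+10FFFF. "out" needs room for 4 bytes.
 */
static int
sketch_unicode_to_utf8(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);

    if (cp <= 0x7F)
    {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp <= 0x7FF)
    {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF)
    {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;           /* surrogates are not valid scalar values */
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF)
    {
        out[0] = (unsigned char) (0xF0 | (cp >> 18));
        out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/*
 * Hypothetical helper: decode the first UTF-8 sequence in "s" to a code
 * point. Assumes "s" is already known to be well-formed; returns
 * 0xFFFFFFFF if the lead byte is not a valid start byte.
 */
static uint32_t
sketch_utf8_to_unicode(const unsigned char *s)
{
    assert(s != NULL);

    if ((s[0] & 0x80) == 0)
        return (uint32_t) s[0];
    if ((s[0] & 0xE0) == 0xC0)
        return ((uint32_t) (s[0] & 0x1F) << 6) |
               (uint32_t) (s[1] & 0x3F);
    if ((s[0] & 0xF0) == 0xE0)
        return ((uint32_t) (s[0] & 0x0F) << 12) |
               ((uint32_t) (s[1] & 0x3F) << 6) |
               (uint32_t) (s[2] & 0x3F);
    if ((s[0] & 0xF8) == 0xF0)
        return ((uint32_t) (s[0] & 0x07) << 18) |
               ((uint32_t) (s[1] & 0x3F) << 12) |
               ((uint32_t) (s[2] & 0x3F) << 6) |
               (uint32_t) (s[3] & 0x3F);
    return 0xFFFFFFFFu;
}
```

With helpers of this shape, the normalization code can work purely on arrays of codepoints and convert at the edges.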
> pg_utf8_islegal() and pg_utf_mblen() should also be moved into their
> own file, I think, and wchar.c can use that.

Yeah.

>> * The list of characters excluded from recomposition is currently hard-coded
>> in utf_norm_generate.pl. However, that list is available in machine-readable
>> format, in file CompositionExclusions.txt. Since we're reading most of the
>> data from UnicodeData.txt, would be good to read the exclusion table from a
>> file, too.
>
> Ouch. Those are present here...
> http://www.unicode.org/reports/tr41/tr41-19.html#Exclusions
> Definitely it makes more sense to read them from a file.

Did that.

>> * SASLPrep specifies normalization form KC, but it also specifies that some
>> characters are mapped to space or nothing. Should do those mappings, too.
>
> Ah, right. Those ones are here:
> https://tools.ietf.org/html/rfc3454#appendix-B.1

Yep.

Attached is a new version. Notable changes since yesterday:

* Implemented the rest of the SASLPrep, mapping some characters to
spaces, leaving out others, and checking for prohibited characters and
bidirectional strings.
* Moved things around. There's now a separate directory,
src/common/unicode, which contains the perl scripts and the test code.
Those are not needed to build from source, as the pre-generated tables
are put in src/include/common. Similar to the scripts in
src/backend/utils/mb/Unicode, really.
* Renamed many things from utf_* to unicode_*, since they don't deal
with UTF-8 input anymore.
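
To illustrate the mapping step mentioned in the first item above, here is a rough sketch operating on an array of codepoints. It is not taken from the attached patch: the function and table names are hypothetical, and the two tables are only small subsets of RFC 3454 tables C.1.2 (non-ASCII spaces) and B.1 (commonly mapped to nothing). Normalization, the prohibited-character check and the bidi check are not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive range of codepoints. */
typedef struct
{
    uint32_t first;
    uint32_t last;
} sketch_cp_range;

/* Illustrative subset of RFC 3454 table C.1.2 (non-ASCII space characters). */
static const sketch_cp_range sketch_space_ranges[] = {
    {0x00A0, 0x00A0}, {0x1680, 0x1680}, {0x2000, 0x200A}, {0x3000, 0x3000}
};

/* Illustrative subset of RFC 3454 table B.1 (commonly mapped to nothing). */
static const sketch_cp_range sketch_nothing_ranges[] = {
    {0x00AD, 0x00AD}, {0x200C, 0x200D}, {0xFE00, 0xFE0F}, {0xFEFF, 0xFEFF}
};

/* Linear search; fine for a sketch with tiny tables. */
static int
sketch_in_ranges(uint32_t cp, const sketch_cp_range *tab, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        if (cp >= tab[i].first && cp <= tab[i].last)
            return 1;
    }
    return 0;
}

/*
 * Apply the mapping step in place: non-ASCII spaces become U+0020,
 * "mapped to nothing" characters are dropped, everything else is kept.
 * Returns the new length of the array.
 */
static size_t
sketch_saslprep_map(uint32_t *cps, size_t len)
{
    size_t out = 0;

    assert(cps != NULL || len == 0);

    for (size_t i = 0; i < len; i++)
    {
        uint32_t cp = cps[i];

        if (sketch_in_ranges(cp, sketch_space_ranges,
                             sizeof(sketch_space_ranges) / sizeof(sketch_space_ranges[0])))
            cps[out++] = 0x0020;
        else if (sketch_in_ranges(cp, sketch_nothing_ranges,
                                  sizeof(sketch_nothing_ranges) / sizeof(sketch_nothing_ranges[0])))
            continue;           /* mapped to nothing */
        else
            cps[out++] = cp;
    }
    return out;
}
```

A real implementation would use the complete tables from the RFC and probably a binary search over them.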

This is starting to shape up, but there is still some cleanup work to do.
I will continue tomorrow.

- Heikki

Attachment | Content-Type | Size |
---|---|---|
implement-SASLprep-3.patch.gz | application/gzip | 68.5 KB |