
[PATCH] Expand character set for ltree labels

From:	Garen Torikian <gjtorikian(at)gmail(dot)com>
To:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	[PATCH] Expand character set for ltree labels
Date:	2022-10-04 16:54:46
Message-ID:	CAGXsc+-mNg9Gc0rp-ER0sv+zkZSZp2wE9-LX6XcoWSLVz22tZA@mail.gmail.com
Lists:	pgsql-hackers

Dear hackers,

I am submitting a patch to expand the set of characters allowed in ltree
labels.

Labels are currently restricted to alphanumeric characters plus the
underscore (_). Unfortunately, this set is insufficient for non-English
labels. Rather than figure out how to extend the set beyond ASCII, I have
instead opted to give users a widely used mechanism for storing encoded
Unicode text: punycode (https://en.wikipedia.org/wiki/Punycode).

Punycode output needs only the characters ltree already accepts, plus a
hyphen (-). Within this system, text in any human language can be encoded
using just A-Za-z0-9-.
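
To illustrate the encoding step on the client side, here is a minimal
sketch using Python's built-in "punycode" codec. It is not part of the
patch; the helper names to_punycode_label and from_punycode_label are
hypothetical and exist only for this example.

```python
import string

# Characters that punycode output is expected to stay within (A-Za-z0-9-).
PUNYCODE_CHARS = set(string.ascii_letters + string.digits + "-")


def to_punycode_label(text: str) -> str:
    """Encode arbitrary Unicode text with Python's built-in punycode codec."""
    return text.encode("punycode").decode("ascii")


def from_punycode_label(label: str) -> str:
    """Reverse of to_punycode_label."""
    return label.encode("ascii").decode("punycode")


encoded = to_punycode_label("münchen")
print(encoded)                        # mnchen-3ya
print(from_punycode_label(encoded))   # münchen
print(set(encoded) <= PUNYCODE_CHARS) # True
```

The encoded form would then be stored as the ltree label, and decoded
again by the application when it is displayed.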

On top of this, I added support for two more characters, # and ;, which
are used in HTML entities. Note that & and % already have special
significance in the existing ltree logic, so users would have to encode
items as #20; (rather than %20). This seems a fair compromise.

Since the encoding could make a regular slug even longer, I have also
doubled the character limit, from 256 to 512.
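
As a rough sketch of the rules described above (not the patch's actual C
code), a client-side check for a single label could look like this. The
names PROPOSED_LABEL_RE, MAX_LABEL_LEN and is_valid_proposed_label are
hypothetical, and treating 512 as an inclusive upper bound is an
assumption; the patch itself defines the exact boundary.

```python
import re

# Proposed set: existing A-Za-z0-9_ plus hyphen, '#' and ';'.
# '&' and '%' are deliberately excluded, since they already have
# special meaning in ltree.
PROPOSED_LABEL_RE = re.compile(r"[A-Za-z0-9_\-#;]+")

# Assumption: 512 is taken as an inclusive maximum length here.
MAX_LABEL_LEN = 512


def is_valid_proposed_label(label: str, max_len: int = MAX_LABEL_LEN) -> bool:
    """Return True if label uses only the proposed characters and fits the limit."""
    if len(label) > max_len:
        return False
    return PROPOSED_LABEL_RE.fullmatch(label) is not None


print(is_valid_proposed_label("mnchen-3ya"))   # True
print(is_valid_proposed_label("hello#20;world"))  # True
print(is_valid_proposed_label("hello%20world"))   # False
```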

Please let me know if I can provide any more information or changes.

Very sincerely,
Garen

Attachment	Content-Type	Size
0001-Expand-character-set-for-ltree-labels.patch	application/octet-stream	15.3 KB

Responses

Re: [PATCH] Expand character set for ltree labels at 2022-10-04 22:32:24 from Nathan Bossart
Re: [PATCH] Expand character set for ltree labels at 2022-10-05 18:59:01 from Tom Lane
