Quick Links

Re: Ltree syntax improvement

From:	Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc:	Dmitry Belyavsky <beldmit(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Nikolay Shaplov <dhyan(at)nataraj(dot)su>
Subject:	Re: Ltree syntax improvement
Date:	2020-03-26 21:14:37
Message-ID:	a066844c-5546-b9fe-c776-bf17bd26bf82@postgrespro.ru
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 25.03.2020 2:08, Tom Lane wrote:

> Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru> writes:
>> Attached new version of the patch.
> I spent a little bit of time looking through this, and have a few
> comments:
>
> * You have a lot of places where tests for particular ASCII characters
> are done like this:
>
> if ((charlen == 1) && t_iseq(src, '\\'))
>
> This is a tedious and expensive way to spell
>
> if (*src == '\\')
>
> because charlen is necessarily 1 if you are looking at an ASCII character;
> there is no allowed backend encoding in which an ASCII character can be
> the first byte of a multibyte character. Aside from the direct
> simplifications of the tests that this makes possible, I see some places
> where you'd not have to pass charlen around, either.

All unnecessary checks of charlen were removed, but t_iseq() were left for
consistency.

> * I spent a fair amount of time thinking that a lot of the added code
> was wrong because it was only considering escaping and not
> double-quoting. I eventually concluded that the idea is to convert
> double-quoting to a pure escape-based representation during input
> and store it that way. However, I don't really see why that is either
> necessary or a good idea --- the internal storage already has a length
> counter for each label. So I think what you really ought to be doing
> here is simplifying out both quotes and escapes during ltree_in
> and just storing the notionally-represented string internally.
> (If I've misunderstood what the plan is, well the utter lack of
> documentation in the patch isn't helping.)

ltree_in() removes quotes and escapes before storing strings (see
copy_unescaped()), just as you suggest.

ltree_out() adds escapes and quotes if necessary (see copy_escaped(),
extra_bytes_for_escaping()).

I have refactored code a bit, removed duplicated code, fixed several
bugs in reallocation of output strings, and added some comments.

> * The added test cases seem a bit excessive and repetitive.

I have removed some tests that have become redundant after changes in
parsing.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
0001-Replace-if-with-switch-in-ltree-code-20200326.patch	text/x-patch	10.4 KB
0002-Ltree-syntax-improvements-20200326.patch	text/x-patch	66.1 KB

In response to

Re: Ltree syntax improvement at 2020-03-24 23:08:15 from Tom Lane

Responses

Re: Ltree syntax improvement at 2020-04-01 23:46:50 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	David Rowley	2020-03-26 21:18:40	Re: Berserk Autovacuum (let's save next Mandrill)
Previous Message	Stephen Frost	2020-03-26 21:00:00	Re: backup manifests