Re: Hide exposed impl detail of wchar.c

From: Jubilee Young <workingjubilee(at)gmail(dot)com>
To: John Naylor <johncnaylorls(at)gmail(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Hide exposed impl detail of wchar.c
Date: 2023-11-20 18:50:36
Message-ID: CAPNHn3pGK2cvg4xACegaXjbRkun26F--7d2T4aT5ASvExMZpjg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Nov 17, 2023 at 2:26 AM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
>
> On Fri, Nov 17, 2023 at 5:54 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
> >
> > It looks like is_valid_ascii() was originally added to pg_wchar.h so that
> > it could easily be used elsewhere [0] [1], but that doesn't seem to have
> > happened yet.
> >
> > Would moving this definition to a separate header file be a viable option?
>
> Seems fine to me. (I believe the original motivation for making it an
> inline function was for in pg_mbstrlen_with_len(), but trying that
> hasn't been a priority.)

In that case, I took a look across the codebase and saw a
utils/ascii.h that doesn't
seem to have gotten much love, but I suppose one could argue that it's intended
to be a backend-only header file?

As the codebase is growing some enhanced UTF-8 support, you'll want somewhere
that contains the optimized US-ASCII routines, because, as US-ASCII is
a subset of
UTF-8, and often faster to handle, it's typical for such codepaths to look like

```c
while (i < len && no_multibyte_chars) {
i = i + ascii_op_version(i, buffer, &no_multibyte_chars);
}

while (i < len) {
i = i + utf8_op_version(i, buffer);
}
```

So it should probably end up living somewhere near the UTF-8 support, and
the easiest way to make it not go into something pgrx currently
includes would be
to make it a new header file, though there's a fair amount of API we
don't touch.

From the pgrx / Rust perspective, Postgres function calls are passed
via callback
to a "guard function" that guarantees that longjmp and setjmp don't
cause trouble
(and makes sure we participate in that). So we only want to call
Postgres functions
if we "can't replace" them, as the overhead is quite a lot. That means
UTF-8-per-se
functions aren't very interesting to us as the Rust language already
supports it, but
we do benefit from access to transcoding to/from UTF-8.

—Jubilee

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2023-11-20 19:03:01 Re: trying again to get incremental backup
Previous Message Robert Haas 2023-11-20 18:44:26 Re: Add recovery to pg_control and remove backup_label