From: | John Naylor <john(dot)naylor(at)enterprisedb(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Cc: | Greg Stark <stark(at)mit(dot)edu>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: speed up verifying UTF-8 |
Date: | 2021-06-03 19:10:35 |
Message-ID: | CAFBsxsGU9osh5j16FdzrFHLPTV0sR0ccxHx5p_gRwxqEFAjsbA@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, Jun 3, 2021 at 3:08 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> On 03/06/2021 17:33, Greg Stark wrote:
> >> 3. It's probably cheaper to perform the HAS_ZERO check just once on
> >> (half1 | half2). We have to compute (half1 | half2) anyway.
> >
> > Wouldn't you have to check (half1 & half2) ?
>
> Ah, you're right of course. But & is not quite right either, it will
> give false positives. That's ok from a correctness point of view here,
> because we then fall back to checking byte by byte, but I don't think
> it's a good tradeoff.
Ah, of course.
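
To make both failure modes concrete, here is a minimal sketch. has_zero_byte() is a hypothetical stand-in for HAS_ZERO, written as the classic zero-byte bit trick, so it is an assumption and may differ from the macro in the patch. The constants are made-up test words.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for HAS_ZERO: returns 1 if any byte of v is zero.
 * This is the classic bit trick, assumed here for illustration only.
 */
int
has_zero_byte(uint64_t v)
{
	return ((v - UINT64_C(0x0101010101010101)) & ~v &
			UINT64_C(0x8080808080808080)) != 0;
}

/* Illustrative inputs (not from the patch). */
void
combine_demo(void)
{
	uint64_t	a = UINT64_C(0x0101010101010101);	/* no zero bytes */
	uint64_t	b = UINT64_C(0x0202020202020202);	/* no zero bytes */
	uint64_t	c = UINT64_C(0x4141414141414100);	/* lowest byte is zero */

	/* Checked separately, each word gives the right answer. */
	assert(!has_zero_byte(a));
	assert(!has_zero_byte(b));
	assert(has_zero_byte(c));

	/* OR hides the zero byte in c: a false negative. */
	assert(!has_zero_byte(a | c));

	/* AND invents zero bytes (0x01 & 0x02 == 0x00): a false positive. */
	assert(has_zero_byte(a & b));
}
```

Calling combine_demo() from a small main() should pass every assertion. The false positive only costs a trip through the byte-by-byte path, but the false negative would let a zero byte through.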
> /*
>  * Check if there are any zero bytes in this chunk.
>  *
>  * First, add 0x7f to each byte. This sets the high bit in each byte,
>  * unless it was a zero. We already checked that none of the bytes had
>  * the high bit set previously, so the max value each byte can have
>  * after the addition is 0x7f + 0x7f = 0xfe, and we don't need to
>  * worry about carrying over to the next byte.
>  */
> x1 = half1 + UINT64CONST(0x7f7f7f7f7f7f7f7f);
> x2 = half2 + UINT64CONST(0x7f7f7f7f7f7f7f7f);
>
> /* then check that the high bit is set in each byte of both halves. */
> x = (x1 & x2);
> x &= UINT64CONST(0x8080808080808080);
> if (x != UINT64CONST(0x8080808080808080))
>     return 0;
That seems right, I'll try that and update the patch. (Forgot to attach
earlier anyway)
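
For reference, a self-contained sketch of the whole 16-byte check as I understand it. The function name chunk_is_nonzero_ascii() and the use of memcpy() and UINT64_C are my assumptions for a standalone example, not what the patch does. The sketch combines the two sums with &, so that a zero byte in either half clears the corresponding high bit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the 16 bytes at s are all ASCII and
 * none of them is zero, otherwise 0. On 0 the caller would fall back to
 * checking byte by byte.
 */
int
chunk_is_nonzero_ascii(const unsigned char *s)
{
	uint64_t	half1;
	uint64_t	half2;
	uint64_t	x1;
	uint64_t	x2;

	/* memcpy avoids unaligned loads; byte order does not matter here. */
	memcpy(&half1, s, sizeof(half1));
	memcpy(&half2, s + sizeof(half1), sizeof(half2));

	/* Reject the chunk if any byte has its high bit set (non-ASCII). */
	if ((half1 | half2) & UINT64_C(0x8080808080808080))
		return 0;

	/*
	 * Every byte is now <= 0x7f, so adding 0x7f cannot carry into the
	 * next byte. A byte ends up with its high bit set exactly when it
	 * was nonzero.
	 */
	x1 = half1 + UINT64_C(0x7f7f7f7f7f7f7f7f);
	x2 = half2 + UINT64_C(0x7f7f7f7f7f7f7f7f);

	/* A high bit survives the AND only if the byte is nonzero in both. */
	if ((x1 & x2 & UINT64_C(0x8080808080808080)) !=
		UINT64_C(0x8080808080808080))
		return 0;

	return 1;
}
```

With this, a chunk like "abcdefghijklmnop" should return 1, and the same chunk with any single byte replaced by 0x00 or by a byte >= 0x80 should return 0.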
--
John Naylor
EDB: http://www.enterprisedb.com