From: | Przemysław Sztoch <przemyslaw(at)sztoch(dot)pl> |
---|---|
To: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | FTS parser - missing UUID token type |
Date: | 2022-09-14 09:26:41 |
Message-ID: | 53476871-233c-f30d-2168-d3a2e89c99a6@sztoch.pl |
Lists: | pgsql-hackers |
I miss a UUID token type in the FTS parser. UUIDs are more and more
popular and people want to search for them, but they are currently
tokenized very strangely.
See: https://www.postgresql.org/docs/current/textsearch-parsers.html
UUID is fairly easy to parse:
A UUID is written as 32 hexadecimal digits in five groups separated by
four hyphens: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX.
The group lengths are 8-4-4-4-12. The first digit of the last group of
four (the N position) indicates the variant, i.e. the format and
encoding, in its one to three most significant bits.
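To make the format concrete, here is a minimal Python sketch of a recognizer for the 8-4-4-4-12 form. It is only an illustration and not parser code; the names UUID_RE, is_uuid, group_lengths and n_digit are made up for this example.

```python
import re

# Canonical textual form: 8-4-4-4-12 hexadecimal digits joined by hyphens.
# UUID_RE and the helpers below are hypothetical names for this sketch.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def is_uuid(text):
    """Return True if text is exactly one UUID in 8-4-4-4-12 form."""
    return UUID_RE.fullmatch(text) is not None


def group_lengths(text):
    """Return the length of each hyphen-separated group."""
    return [len(part) for part in text.split("-")]


def n_digit(text):
    """Return the N position: the first hex digit of the fourth group."""
    return text.split("-")[3][0]
```

For example, is_uuid("00633f1d-1fff-409e-8294-40a21f565904") holds, while a fragment such as "00633f1d-1fff-409e" is rejected.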
Currently, UUIDs are parsed differently from one another, depending on
whether the individual parts begin with digits or letters:
00633f1d-1fff-409e-8294-40a21f565904
    '-40':6 '00633f1d':2 '00633f1d-1fff-409e':1 '1fff':3 '409e':4
    '8294':5 'a21f565904':7
00856c28-2251-4aaf-82d3-e4962f5b732d
    '-2251':2 '-4':3 '00856c28':1 '82d3':6 'aaf':5
    'aaf-82d3-e4962f5b732d':4 'e4962f5b732d':7
00a1cc84-816a-490a-a99c-8a4c637380b0
    '00a1cc84':2 '00a1cc84-816a-490a-a99c-8a4c637380b0':1 '490a':4
    '816a':3 '8a4c637380b0':6 'a99c':5
As a result, such identifiers cannot be found in the database later.
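For comparison, here is a rough Python sketch of what I would expect from a dedicated token type: every UUID comes out as a single token regardless of which characters its parts start with. This is not how the PostgreSQL parser works internally; the "uuid" and "word" type names and the tokenize function are assumptions made up for this illustration.

```python
import re

# Hypothetical patterns for this sketch: a UUID in 8-4-4-4-12 form,
# and a plain run of word characters for everything else.
UUID_RE = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
)
WORD_RE = re.compile(r"\w+")


def tokenize(text):
    """Return (type, token) pairs, keeping each UUID as one token."""
    tokens = []
    pos = 0
    for match in UUID_RE.finditer(text):
        # Text before the UUID is split into ordinary words.
        for word in WORD_RE.findall(text[pos:match.start()]):
            tokens.append(("word", word.lower()))
        # The UUID itself is emitted whole, lowercased.
        tokens.append(("uuid", match.group().lower()))
        pos = match.end()
    # Remaining text after the last UUID.
    for word in WORD_RE.findall(text[pos:]):
        tokens.append(("word", word.lower()))
    return tokens
```

With this, all three identifiers above would each produce exactly one "uuid" token, so a later search for the full identifier would match.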
What is your opinion on missing tokens for FTS?
--
Przemysław Sztoch | Mobile +48 509 99 00 66