| From: | Alvaro Herrera <alvherre(at)commandprompt(dot)com> | 
|---|---|
| To: | Hackers <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | text search and "filenames" | 
| Date: | 2007-10-25 13:47:40 | 
| Message-ID: | 20071025134740.GK5661@alvh.no-ip.org | 
| Lists: | pgsql-hackers | 
Hi,
I noticed that the default parser does not recognize Windows-style
filenames:
alvherre=# SELECT alias, description, token FROM ts_debug(e'c:\\archivos');
   alias   |   description   |  token   
-----------+-----------------+----------
 asciiword | Word, all ASCII | c
 blank     | Space symbols   | :\
 asciiword | Word, all ASCII | archivos
(3 rows)
I played with it a bit (see the attached patch; basically I added \ in all
places where a / was being accepted in the file-path states) and managed
to get it to parse some simple cases, such as
alvherre=# SELECT alias, description, token FROM ts_debug(e'c:\\archivos\\foo');
 alias |    description    |      token      
-------+-------------------+-----------------
 file  | File or path name | c:\archivos\foo
(1 row)
However, it fails as soon as the path contains a space, which is quite
common on Windows. For example:
alvherre=# SELECT alias, description, token FROM ts_debug(e'c:\\Program Files\\');
   alias   |    description    |   token    
-----------+-------------------+------------
 file      | File or path name | c:\Program
 blank     | Space symbols     |  
 asciiword | Word, all ASCII   | Files
 blank     | Space symbols     | \
(4 rows)
It also fails to recognize "network" (UNC) file names, such as
alvherre=# SELECT alias, description, token FROM ts_debug(e'\\\\server\\archivos\\foo');
   alias   |   description   |  token   
-----------+-----------------+----------
 blank     | Space symbols   | \\
 asciiword | Word, all ASCII | server
 blank     | Space symbols   | \
 asciiword | Word, all ASCII | archivos
 blank     | Space symbols   | \
 asciiword | Word, all ASCII | foo
(6 rows)
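To make the three cases above concrete outside the parser, here is a toy Python sketch. It is not the attached patch and not the real parser state machine: the function name `first_path_token`, the character class allowed in a path component, and the optional drive-letter prefix are all assumptions made for illustration. It only shows the general idea of accepting \ wherever / is accepted as a separator.

```python
import re


def first_path_token(text, separators="/\\"):
    """Return the path-like token at the start of text, or None.

    Hypothetical toy model: an optional drive prefix such as "c:",
    followed by one or more "<separator><component>" pairs.
    """
    # Character class matching any of the accepted separators.
    sep = "[" + re.escape(separators) + "]"
    # Assumed component alphabet: ASCII letters, digits, "_", "." and "-".
    component = r"[A-Za-z0-9_.-]+"
    pattern = re.compile(r"(?:[A-Za-z]:)?(?:" + sep + component + r")+")
    match = pattern.match(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    # Only "/" accepted: the Windows-style path is not recognized.
    print(first_path_token("c:\\archivos\\foo", separators="/"))
    # Both separators accepted: the whole path is one token.
    print(first_path_token("c:\\archivos\\foo"))
    # A space ends the token.
    print(first_path_token("c:\\Program Files\\"))
    # A leading double backslash is not followed by a component.
    print(first_path_token("\\\\server\\archivos\\foo"))
```

In this toy model the space case stops at `c:\Program` and the network name yields no token at all, which mirrors the two remaining failures: both need more than adding a second separator character.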
Is this something worth worrying about?
-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
| Attachment | Content-Type | Size | 
|---|---|---|
| tsearch-win-files.patch | text/x-diff | 2.6 KB | 