Re: tsearch filenames unlikes special symbols and numbers

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tsearch filenames unlikes special symbols and numbers
Date: 2007-09-09 04:38:37
Message-ID: Pine.LNX.4.64.0709090817530.2767@sn.sai.msu.ru
Lists: pgsql-docs pgsql-hackers

On Sun, 2 Sep 2007, Tom Lane wrote:

> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
>> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>>> I made it reject all but latin letters, which is the same restriction
>>> that's in place for timezone set filenames. That might be overly
>>> strong, but we definitely have to forbid "." and "/" (and "\" on
>>> Windows). Do we want to restrict it to letters, digits, underscore?
>>> Or does it need to be weaker than that?
>
>> What's the problem with "."?
>
> ../../../../etc/passwd
>
> Possibly we could allow '.' as long as we forbade /, but the other
> trouble with allowing . is that it encourages people to try to specify
> the filetype suffix (as indeed Oleg was doing). I'd prefer to keep the
> suffixes out of the SQL object definitions, with an eye to possibly
> someday migrating all the configuration data inside the database.
> There's a reasonable argument for restricting the names used for these
> things in the SQL definitions to be valid SQL identifiers, so that that
> will work nicely...

So, what's the current policy? Still a-z, A-Z? I think we should allow
'.' and forbid '/'. Look how ugly our current ispell setup is: it
depends on three files, a stop word list, a .dict file and an .aff file.

Right now, I can use something like

CREATE TEXT SEARCH DICTIONARY en_ispell (
    TEMPLATE = ispell,
    DictFile = englishDict,
    AffFile = englishAff,
    StopWords = english
);

I'd rather use english.dict, english.aff and english.stop, which is what
any user would expect, without dictating names to the user here. We have
already imposed a lot of restrictions.

I hope we won't require special extensions like .dict and .aff, since we
can't know in advance what files other dictionaries will use.
If we allow '.' but not '/', then we'd be happy.
I'd also remove the extension requirement for the stop word list, which
looks rather artificial to me.
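
As a rough illustration of the rule proposed here (allow '.', forbid '/'),
a minimal Python sketch follows. The function name
is_safe_tsearch_filename and the exact allowed character set (ASCII
letters, digits, underscore, dot) are assumptions made for the example,
not what the PostgreSQL source does. It also rejects the bare names "."
and "..", which would otherwise pass once dots are allowed.

```python
import string

# Assumed allowed set: ASCII letters, digits, underscore and dot.
ALLOWED_CHARS = frozenset(string.ascii_letters + string.digits + "_.")


def is_safe_tsearch_filename(name: str) -> bool:
    """Return True if name is acceptable as a bare file name (hypothetical rule)."""
    # Empty names and the directory entries "." and ".." are never valid files.
    if not name or name in (".", ".."):
        return False
    # '/' and '\' are not in ALLOWED_CHARS, so no path separator can appear.
    return all(ch in ALLOWED_CHARS for ch in name)
```

Under this rule english.dict is accepted, while ../../../../etc/passwd is
rejected because it contains '/'.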

Oh my god, I see we dictate extensions!

STATEMENT: CREATE TEXT SEARCH DICTIONARY en_ispell (
TEMPLATE = ispell,
DictFile = englishDict,
AffFile = englishAff,
StopWords = englishStop
);
ERROR: could not open dictionary file "/usr/local/pgsql-dev/share/tsearch_data/englishdict.dict": No such file or directory
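
The path in this error suggests how the name is resolved: the unquoted
value englishDict is folded to lower case (as unquoted SQL identifiers
are), the extension .dict is appended, and the file is looked up under
share/tsearch_data. A minimal Python sketch of that mapping, with the
function name and parameters as assumptions for illustration (the real
logic lives in the server's C code and may differ):

```python
import posixpath


def tsearch_data_path(sharedir: str, basename: str, extension: str) -> str:
    """Build the path the error message implies (hypothetical helper)."""
    # Unquoted SQL identifiers are folded to lower case before use.
    folded = basename.lower()
    # The extension is appended by the server, not supplied by the user.
    return posixpath.join(sharedir, "tsearch_data", f"{folded}.{extension}")
```

Called with "/usr/local/pgsql-dev/share", "englishDict" and "dict", this
yields the path shown in the error above.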

Folks, this is too much! Now we dictate the extensions '.dict', '.affix'
and '.stop'. What else?

Is this defined by the ispell template only, or is it a global requirement?

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
