| From: | Chris Angelico <rosuav(at)gmail(dot)com> |
|---|---|
| To: | pgsql-general(at)postgresql(dot)org |
| Subject: | Re: Finding matching words in a word game |
| Date: | 2013-03-06 13:57:28 |
| Message-ID: | CAPTjJmoQ9sdw81Hfb7eGd+Uvxx66N-62J8BnPdjM8z7upmtSpA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-general |
On Tue, Mar 5, 2013 at 8:29 PM, Alexander Farber
<alexander(dot)farber(at)gmail(dot)com> wrote:
> is there maybe a clever way of finding all possible words
> from a given set of letters by means of PostgreSQL
> (i.e. inside the database vs. scanning all database
> rows by a PHP script, which would take too long) -
> if the dictionary is kept in a simple table like:
>
> create table good_words (
> word varchar(16) primary key,
> stamp timestamp default current_timestamp
> );
How many words are you looking at, in your dictionary? I wrote an
anagramming program in C++ that works off a fairly small dictionary of
common words (~60K words) and gives adequate performance without any
indexing - the time is dwarfed by just scrolling the text up the
console. The only thing I'd recommend doing differently is the
language - use one that has a proper hash/tree type, saving you the
trouble of implementing one (I implemented my own non-balancing binary
tree for the task... no wait, on examination, it seems to actually be
a linear search - and yet it has passable performance). PHP can quite
probably do everything you want here; otherwise, I'd recommend
something like Python or Pike.
Simple Python example:
words = {}
for word in dictionary: # provide a dictionary somehow - maybe from a file/db
words.setdefault(''.join(sorted(word)),[]).append(word)
# Voila! You now have your mapping. One-off initialization complete.
find_anagrams_of = "stop"
anagrams = words.get(''.join(sorted(find_anagrams_of)),[])
# anagrams is now a list of all known anagrams of the target -
possibly an empty list
print(anagrams)
['opts', 'post', 'pots', 'spot', 'stop', 'tops']
On my laptop, loading ~100K words took about 1 second, and the lookup
took effectively no time. I don't think there's any need for a heavy
database engine here, unless you're working with millions and millions
of words :)
Chris Angelico
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Alexander Farber | 2013-03-06 14:14:43 | Re: Finding matching words in a word game |
| Previous Message | Alvaro Herrera | 2013-03-06 13:19:48 | Re: Change owner for all tables in a database in one batch |