Re: Finding matching words in a word game

From: Chris Angelico <rosuav(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Finding matching words in a word game
Date: 2013-03-06 13:57:28
Message-ID: CAPTjJmoQ9sdw81Hfb7eGd+Uvxx66N-62J8BnPdjM8z7upmtSpA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, Mar 5, 2013 at 8:29 PM, Alexander Farber
<alexander(dot)farber(at)gmail(dot)com> wrote:
> is there maybe a clever way of finding all possible words
> from a given set of letters by means of PostgreSQL
> (i.e. inside the database vs. scanning all database
> rows by a PHP script, which would take too long) -
> if the dictionary is kept in a simple table like:
>
> create table good_words (
> word varchar(16) primary key,
> stamp timestamp default current_timestamp
> );

How many words are you looking at, in your dictionary? I wrote an
anagramming program in C++ that works off a fairly small dictionary of
common words (~60K words) and gives adequate performance without any
indexing - the time is dwarfed by just scrolling the text up the
console. The only thing I'd recommend doing differently is the
language - use one that has a proper hash/tree type, saving you the
trouble of implementing one (I implemented my own non-balancing binary
tree for the task... no wait, on examination, it seems to actually be
a linear search - and yet it has passable performance). PHP can quite
probably do everything you want here; otherwise, I'd recommend
something like Python or Pike.

Simple Python example:

words = {}
for word in dictionary: # provide a dictionary somehow - maybe from a file/db
words.setdefault(''.join(sorted(word)),[]).append(word)
# Voila! You now have your mapping. One-off initialization complete.

find_anagrams_of = "stop"
anagrams = words.get(''.join(sorted(find_anagrams_of)),[])
# anagrams is now a list of all known anagrams of the target -
possibly an empty list
print(anagrams)

['opts', 'post', 'pots', 'spot', 'stop', 'tops']

On my laptop, loading ~100K words took about 1 second, and the lookup
took effectively no time. I don't think there's any need for a heavy
database engine here, unless you're working with millions and millions
of words :)

Chris Angelico

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Alexander Farber 2013-03-06 14:14:43 Re: Finding matching words in a word game
Previous Message Alvaro Herrera 2013-03-06 13:19:48 Re: Change owner for all tables in a database in one batch